Math Help Forum

  #11  
Old 01-06-2007, 07:10 PM
ThePerfectHacker

Here are some applications of optimization problems. First, most things in the universe come as a max or a min. I remember my professor asking the class a question. We were working on a beam problem and he asked: all of these equations work, so all of them are possible shapes for the curvature of the beam; how do we know it looks like this one if all of them work? I immediately raised my hand (the only one; for some reason in college nobody ever answers anything) and said it is minimized, and he said correct, but why? My purpose in telling this is to mention what I said: most things in the universe come as max/min problems, and it was my experience that told me to say that. For example, a planet is in the shape of a ball, and the sphere encloses the maximum volume for a given surface area, or you can think of it as having the minimum surface area for a given volume. Another useful thing about max/min solutions is that if something works for the worst-case scenario then it works for all cases, so we can sometimes just look at one situation (the worst case), show that it works, and conclude that it always works, instead of checking each possible case.

Shortest Distance:
The distance from a point to a line (assuming the point is not on it) is defined as the shortest distance from the point to the line, because distance is always measured as a minimum. To find that distance we draw the perpendicular from the point to the line (which is the shortest segment) and find its length. The standard way is this: 1) Given the equation of the line and a point, 2) Find the equation of the perpendicular line passing through the point, 3) Find the intersection of the line and its perpendicular, 4) Find the distance from the intersection point to the given point. This is an algebraic nightmare. I have indeed done out a full solution (it took me 2 hours) and posted it on the site, but the page is not loading.
We will use the concept of minimums/maximums to solve this problem.
If we are given a horizontal line and a point the solution is trivial (obvious).
Thus, it is safe to assume we have a non-horizontal line y=mx+b and a point (x_0,y_0) not on the line (otherwise the distance is zero). We are looking for the point (x,y) on the line that minimizes the distance to (x_0,y_0). Let the point on the line be (x,y)=(x,mx+b); then its distance to (x_0,y_0) is,
s=\sqrt{(x-x_0)^2+(mx+b-y_0)^2}.
The important observation is that if s is minimized then s^2 is minimized too (if the distance is as small as it can possibly be then the square of the distance is as small as it can possibly be).
Thus, we have a function depending on x,
s^2=(x-x_0)^2+(mx+b-y_0)^2.
Now we find the derivative and make it zero (you are probably thinking, how do we know that this gives the minimum value? Maybe it is the maximum value, as we have seen. Simple: it cannot possibly be a maximum, because we can move the point on the line as far away as we want).
Thus, chain rule,
(s^2)'=2(x-x_0)+2m(mx+b-y_0)
(s^2)'=2x-2x_0+2m^2x+2mb-2my_0=0
x(1+m^2)=x_0+my_0-mb
Thus, since 1+m^2\not = 0,
x=\frac{x_0+my_0-mb}{1+m^2}
That is the x-coordinate of the intersection point between the line and the perpendicular.
Note, the following will involve some algebraic manipulation, but it is not nearly as bad as the derivative-free approach described at the beginning.
Now we find the distance between that point and the given point.
Note that, (details omitted),
x-x_0=\frac{x_0+my_0-mb}{1+m^2}-x_0=\frac{m(y_0-b-mx_0)}{1+m^2}
And that, (details omitted),
mx+b-y_0=\frac{mx_0+m^2y_0-m^2b}{1+m^2}+b-y_0=\frac{mx_0+b-y_0}{1+m^2}.
Then the distance l between the points is,
l^2=(x-x_0)^2+(mx+b-y_0)^2
l^2= \frac{m^2(y_0-b-mx_0)^2}{(1+m^2)^2}+\frac{(mx_0+b-y_0)^2}{(1+m^2)^2}
Note,
(-x)^2=x^2
Thus,
(mx_0+b-y_0)^2=(y_0-b-mx_0)^2
Thus, (factoring the common expression)
l^2=\frac{(mx_0+b-y_0)^2(1+m^2)}{(1+m^2)^2}
Thus, (canceling)
l^2=\frac{(mx_0+b-y_0)^2}{1+m^2}.
Take square root and note, \sqrt{x^2}=|x|,
l=\frac{|mx_0+b-y_0|}{\sqrt{1+m^2}}.
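
Here is a small numerical check of the two results so far, the closest x-coordinate and the distance formula. It is only a sketch: the function names and the sample line y=2x+1 with the point (3,2) are my own illustrative choices, not part of the derivation.

```python
import math

def closest_x(m, b, x0, y0):
    """x-coordinate of the point on y = m*x + b closest to (x0, y0)."""
    return (x0 + m * y0 - m * b) / (1 + m * m)

def distance_slope_intercept(m, b, x0, y0):
    """Distance from (x0, y0) to the line y = m*x + b."""
    return abs(m * x0 + b - y0) / math.sqrt(1 + m * m)

# Illustrative values (assumptions): line y = 2x + 1 and point (3, 2).
m, b, x0, y0 = 2.0, 1.0, 3.0, 2.0

x_star = closest_x(m, b, x0, y0)
# Distance measured directly from (x0, y0) to the closest point on the line.
direct = math.hypot(x_star - x0, m * x_star + b - y0)
# Distance given by the closed formula.
formula = distance_slope_intercept(m, b, x0, y0)

print(x_star, direct, formula)
```

Both distances should agree, and any other point on the line should be at least as far from (3,2) as the one found by the derivative.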
But, I am going to make this formula even better!

There are two ways of expressing the equation of a line: slope intercept form and standard form,
y=mx+b---> Slope-Intercept Form.
Ax+By+C=0---> Standard-Form.
Now,
Standard > Slope-Intercept
Because in slope-intercept we can only express non-vertical lines.
In Standard Form we can express any line, both vertical and non-vertical.
To prove this assume we have a line. Divide the proof into two cases: a vertical line and a non-vertical line. If it is the first case then x=k, thus 1\cdot x+0\cdot y-k=0.
If it is non-vertical then we can write, y=mx+b, thus, mx-1\cdot y+b=0.

Returning back to the "shortest distance" problem. If we have a non-vertical line in the form Ax+By+C=0 then B\not = 0 (non-vertical) thus,
(A/B)x+y+(C/B)=0
y=-(A/B)x-(C/B)
And some point not on the line (x_0,y_0).
The formula says,
l=\frac{|-(A/B)x_0-(C/B)-y_0|}{\sqrt{1+(-A/B)^2}}
Thus,
l=\frac{|(A/B)x_0+(C/B)+y_0|}{\sqrt{1+A^2/B^2}}
Thus, multiplying the numerator and the denominator by |B|,
l=\frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}}.
Look how easy the formula looks! All you do is substitute the point into the line in Standard-Form and divide by \sqrt{A^2+B^2} (the Pythagorean length of the coefficients A and B)! So easy to remember! The other nice feature is that it also works when the line is vertical: then B=0, the line is x=-C/A, and the formula gives |Ax_0+C|/|A|=|x_0+C/A|, which is the distance between the x-coordinates. And it also works when the point is on the line, because then substituting gives a zero in the numerator. Thus, we can state this result as follows,

Theorem: Given ANY point (x_0,y_0) and ANY line Ax+By+C=0 then the distance between them is,
\boxed{ \frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}}}.

I would like to mention that when you learn 3-D plots, the analogue of a line in 2 dimensions is a plane in 3 dimensions. And there is a result that states that if you have a point (x_0,y_0,z_0) and a plane Ax+By+Cz+D=0 then the distance is,
\frac{|Ax_0+By_0+Cz_0+D|}{\sqrt{A^2+B^2+C^2}}

Example 26: Given a line 3x+4y=0 and a point (1,2) then the distance is,
\frac{|3(1)+4(2)|}{\sqrt{3^2+4^2}}=\frac{11}{5}
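
The boxed formula and its 3-D analogue are easy to turn into code. The sketch below is only an illustration; the function names are my own, and the vertical line x=2 and the plane z=0 are extra test cases I made up.

```python
import math

def point_line_distance(A, B, C, x0, y0):
    """Distance from (x0, y0) to the line A*x + B*y + C = 0."""
    if A == 0 and B == 0:
        raise ValueError("A and B cannot both be zero")
    return abs(A * x0 + B * y0 + C) / math.hypot(A, B)

def point_plane_distance(A, B, C, D, x0, y0, z0):
    """Distance from (x0, y0, z0) to the plane A*x + B*y + C*z + D = 0."""
    norm = math.sqrt(A * A + B * B + C * C)
    if norm == 0:
        raise ValueError("A, B and C cannot all be zero")
    return abs(A * x0 + B * y0 + C * z0 + D) / norm

# Example 26: line 3x + 4y = 0 and point (1, 2).
example_26 = point_line_distance(3, 4, 0, 1, 2)

# Made-up check: vertical line x = 2 (that is, 1*x + 0*y - 2 = 0) and point (5, 7).
vertical_case = point_line_distance(1, 0, -2, 5, 7)

# Made-up check: plane z = 0 and point (1, 2, -4).
plane_case = point_plane_distance(0, 0, 1, 0, 1, 2, -4)

print(example_26, vertical_case, plane_case)
```

The vertical case comes out as the difference of the x-coordinates, as claimed above.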

Method of Least Squares
In statistics a common problem is this: after an experiment is done, a set of points is collected and visually represented as an x-y plot.
The problem is to approximate this set of points by a curve, in our example a straight line. But the difficulty is that these points do not necessarily lie on a straight line, thus we need to find the best possible line. The question that you should ask is what "best" means. The following concept and solution were devised independently by Gauss and by his rival Legendre: Legendre published the method first, in 1805, while Gauss published it in 1809 and said he had been using it since 1795. Thus, some texts write that Gauss discovered it while others write that Legendre discovered it. The following explanation is my own, which I have never seen elsewhere; I like it because it is more detailed, thus suggesting what went through the minds of Gauss/Legendre.
Assume we have a set of points and we guess visually what the best fitting line is, and we draw it. Some error is created. The error is the difference between the actual value at the point and the approximated value on the line.
Below is an example; the set of points was \{(1,1.5),(2,1.75),(3,3),(4,5),(5,4)\}. I approximated it by y=x.
The black lines (the vertical distances) represent the error for each point. The total error is:
.5+.25+0+1+1=2.75
Note, errors are always measured as positive.
The dotted line is the "line of best fit" that I drew; its equation is y=.825x+.575.
Note, though it does not contain any point, its total error (sum of errors) is less than that of the full red line.
This is a very reasonable way to define such a line.
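
To make the comparison concrete, the total error of both lines can be computed directly. This is just a sketch with a helper name of my own; the points and the two lines are the ones from the example.

```python
points = [(1, 1.5), (2, 1.75), (3, 3), (4, 5), (5, 4)]

def total_abs_error(a, b, pts):
    """Sum of the vertical distances between the points and the line y = a*x + b."""
    return sum(abs(a * x + b - y) for x, y in pts)

err_guess = total_abs_error(1.0, 0.0, points)        # my first guess, y = x
err_dotted = total_abs_error(0.825, 0.575, points)   # the dotted line

print(err_guess, err_dotted)
```

The first number is the 2.75 computed by hand above, and the second one is smaller.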

Definition: The line of best fit for a set of points is a line such that the total error of the distances is minimized.

Note the word "minimized"; it suggests this is going to be a Calculus optimization problem.
We are given a set of points \{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\} and we are trying to minimize the total error with the line y=ax+b.
The total error is,
|ax_1+b-y_1|+|ax_2+b-y_2|+...+|ax_n+b-y_n|=\sum_{k=1}^n |ax_k+b-y_k|.
For simplicity's sake we will find the line of best fit that passes through the origin, meaning b=0.
Thus, f(a)=\sum_{k=1}^n |ax_k-y_k|
To minimize this function we need f'(a)=0.
The problem is the absolute value: it is too messy. We never discussed the derivative of y=|x|, and it happens not to exist at x=0, so we cannot simply take the derivative. What do we do? We square the errors (the vertical distances) and minimize the sum of the squares instead; this is where the name "least squares" comes from. (Strictly speaking, the line that minimizes the sum of the squares need not be exactly the line that minimizes the sum of the absolute errors, but it is a closely related measure of fit and far easier to work with.) Thus, we need to minimize,
f(a)=\sum_{k=1}^n (|ax_k-y_k|)^2.
But, (|n|)^2=n^2.
(Note, the reason why we used the square: it removes the absolute value. We could have also used any even exponent, but why work with higher exponents!)
Thus, we need to minimize,
f(a)=\sum_{k=1}^n (ax_k-y_k)^2.
If we find f'(a)=0 it is either the max error value (line of worst fit) or the min error value (line of best fit). It cannot be the line of worst fit because we can make the error as large as we like by choosing a steeper and steeper line. Thus the derivative equal to zero is going to give the line of best fit.
Thus, (chain rule; the derivative of ax_k-y_k with respect to a is x_k)
f'(a)=2\sum_{k=1}^n x_k(ax_k-y_k)=0
Divide by 2 and split the sum.
Thus,
\sum_{k=1}^n x_k(ax_k-y_k)=a(x_1^2+...+x_n^2)-(x_1y_1+...+x_ny_n)=0
Thus, (the sum of the squares of the x-coordinates is positive, hence non-zero, unless every point lies on the y-axis)
a=\frac{x_1y_1+...+x_ny_n}{x_1^2+...+x_n^2}.
If you want to make it look more elegant you can divide through by n and have averages in the numerator and denominator,
a=\frac{\overline{xy}}{\overline{x^2}}
The "bar" on top represents the average, here of the products x_ky_k and of the squares x_k^2.

Funny, I just realized that I could have developed this formula without any Calculus! See if you can figure it out (Possible Problem of the Week).

Example 27: The problem in the diagram shown below. The line of best fit (through the origin) has slope,
\frac{1(1.5)+2(1.75)+3(3)+4(5)+5(4)}{1^2+2^2+3^2+4^2+5^2}=\frac{54}{55}\approx .982. Thus, it is almost the line I drew but not exactly.
Note, it is still not the line of best fit because we purposely neglected the constant term in the line equation.
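
A numerical sanity check for the through-the-origin fit. The sketch evaluates the squared-error sum f(a) on a grid of slopes and compares the grid minimizer with the value sum(x*y)/sum(x*x), which is what solving f'(a)=0 gives when each term of f is differentiated as 2x_k(ax_k-y_k). The helper names and the grid spacing of 0.001 are my own choices.

```python
points = [(1, 1.5), (2, 1.75), (3, 3), (4, 5), (5, 4)]

def sum_sq_error(a, pts):
    """f(a): sum of squared vertical errors for the line y = a*x."""
    return sum((a * x - y) ** 2 for x, y in pts)

def slope_through_origin(pts):
    """Slope that solves f'(a) = 0 for a line through the origin."""
    sxy = sum(x * y for x, y in pts)
    sxx = sum(x * x for x, y in pts)
    return sxy / sxx

a_star = slope_through_origin(points)

# Crude check: scan slopes 0.000, 0.001, ..., 2.000 and keep the best one.
candidates = [k / 1000 for k in range(0, 2001)]
a_scan = min(candidates, key=lambda a: sum_sq_error(a, points))

print(a_star, a_scan, sum_sq_error(a_star, points))
```

The scanned slope should agree with the closed-form slope up to the grid spacing, and no other slope on the grid should give a smaller f(a).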

It is my hope that I will show you how to approximate a curve with another curve. Over here we had a finite set of points; when you approximate a curve with another curve you have an infinite set of points, and a different technique is used.
~~~
Exercises

1)Find the distance between the parabola y=x^2 and the point (3,3).

*2)Develop the full method of least squares for a general line y=ax+b. (Hint: the exact same procedure. You only need to take the partial derivatives this time and make them zero).

3)In the picture below use #2 and find the line of best fit.

4)A length of straight wire is L units long. You take a pair of rusty scissors and cut the wire somewhere; you take the first piece and turn it into a square, and you take the second piece and turn it into a circle. Where should you cut (assuming you want to) to maximize and to minimize the combined area of the circle and the square?

5)Draw point A on paper. Draw point B directly below it. Draw point C directly to the right of point B. Draw two horizontal lines, one through point A and one through point B. The distance between the two horizontal lines is 10 miles and the distance between B and C is 20 miles. You are a crazy ATV driver and want to get to point C as quickly as possible while driving blindfolded and upside down. You start at point A. The region in between the horizontal lines is a desert and your top speed there is 20 miles/hour. The line between B and C is the highway and you can travel 40 miles/hour on it. What path (in straight lines) should you take to minimize your driving time?
Attached Thumbnails
introduction-calculus-tutorial-picture1.gif  
Last edited by ThePerfectHacker; 05-21-2007 at 06:51 PM.
  #12  
Old 01-09-2007, 07:39 PM
ThePerfectHacker

Calculus itself is divided into three parts in universities. The first two parts are differential calculus and integral calculus. Differential Calculus is more commonly called Calculus I and Integral Calculus is more commonly called Calculus II. As you expect, Calculus I concentrates on the derivative, which is what we have been doing thus far in the lectures. Since I think you have sufficient understanding of what a derivative is and how it is used in math and applied math, we can talk about the other important part of Calculus, the integral. The integral is related to the "anti-derivative", meaning a function whose derivative gives back the original function. Think of it as an inverse operator on a function, like the square root is the opposite of the squaring function. But there is one problem. Say we want to solve y'=2x, meaning find a function whose derivative is 2x. If we think about it we note that y=x^2 works, but wait: any constant added disappears when we differentiate (similar to our discussion of the free falling bodies equations). This is contained in the following definition and theorem.

Definition: The anti-derivative of a function y=f(x) is expressed as y=\int f(x) dx. It represents all the functions whose derivatives are f(x). Thus, the integral is a set of functions that has the property that if g(x)\in \int f(x) dx (an element of this set) then g'(x)=f(x). And if g(x) is a function such that g'(x)=f(x) then g(x)\in \int f(x) dx. This symbol (a stretched S) is called the "indefinite integral".

I want to make a comment: note that dx appears at the end. I myself am not in favor of putting it there, but since that is the standard notation I do. It is unnecessary except when you are taking a multiple integral (which we will not discuss), where these "differentials" show in which order to take the integrals. In the single variable case, I really do not see any purpose (except possibly for one).

Theorem: Given a function y=f(x) and g(x) is a function such that g'(x)=f(x) then,
\int f(x) dx = g(x)+C where C is any constant on the open interval where we are differentiating.

Basically, this is saying that if we can find one function that is an anti-derivative, then every anti-derivative (the whole indefinite integral) is just that function plus some constant. It should seem understandable, but we will not be able to completely prove it.

Proof: Let y=f(x) and g'(x)=f(x), and let h(x) be another anti-derivative, h'(x)=f(x). Then g'(x)-h'(x)=0, and the property of the derivative says (g(x)-h(x))'=0. Now the only function whose derivative is always zero is a constant function. Think of it this way: the tangent line is always horizontal, so the graph is a horizontal line, hence a constant. Thus g(x)-h(x)=C, thus g(x)=h(x)+C. Meaning any anti-derivative can be expressed as a constant added to another one. (The actual proof is too advanced for us and relies on the most important theorem in Calculus, the Mean Value Theorem).

Linearity of Integral: We know (cy)'=cy' and (y_1+y_2)'=y_1'+y_2' (remember I said pay attention to these properties, they appear many times in math). The integral also has these properties. I leave that to you to prove.
\int [f(x) +g(x)] dx=\int f(x) dx+\int g(x) dx
\int k f(x) dx= k\int f(x) dx.

Example 28: Let f(x)=2x; then \int 2x dx is found by finding an anti-derivative, for example x^2, so \int 2x dx=x^2+C. Note, we could have chosen x^2+1 as our anti-derivative. That would mean that \int 2x dx=x^2+1+C, but in reality they are the same, because in the second case choosing a constant 1 less gives the same function as in the first case. Thus, these sets of functions are equal.

The following theorem should seem simple.

Power Rule: If f(x)=x^n and n\not = -1 then \int x^n dx=\frac{x^{n+1}}{n+1}+C.

Proof: Nothing to it, if n\not = -1 then we can define a function y=\frac{x^{n+1}}{n+1}.
Thus, y'=x^n by the power rule for derivatives. By our theorem since we found an anti-derivative all anti-derivatives differ by a constant. Thus, \int x^n dx=\frac{x^{n+1}}{n+1}+C.
The case n=-1 will be covered later.

I asked around in college which people found more difficult, Calculus II or Calculus III. More said Calculus II, because that is where the rules for integration are developed. Unlike derivatives, where any known function can be differentiated, the integral is much more complicated. Sometimes you cannot even find it! And there are many, many rules for how to deal with each case. Also, unlike basic algebra, where once you understand how it works you do not need to memorize anything, over here some memorization is required, because some of these derivations are clever and probably will not be found by just looking at them. I am not going to go through many different types of integrals. Just three very useful rules.

There are two functions that are fundamental in Calculus/Analysis, the exponential and logarithmic functions. I am going to show you an ugly way of deriving the main results of these functions, this is not a standard approach but I think it is important to at least have some idea where they come from.

Definition: Define the number e=\lim_{n\to \infty} (1+1/n)^n=2.718....

Of course, the main problem is that we need to show that this sequence converges to some number, which we define as e. One way of doing this is by using a famous theorem in Analysis, the Monotone Convergence Theorem (a close relative of the Bolzano-Weierstrass theorem), which seems obvious: if an increasing sequence is bounded above (always below some number) then it converges. Again, this is an existence theorem: it assures us that the sequence converges but does not tell us to what. Thus, we need to show that a_{n+1}>a_n where a_n=(1+1/n)^n and also show that a_n<3; then by the theorem such a number exists. But I am not going to do that derivation.
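
This is not a proof, but a few terms of the sequence already show the behavior described: the terms increase, stay below 3, and creep toward 2.718.... The sample values of n below are my own arbitrary choices.

```python
import math

def a(n):
    """n-th term of the sequence (1 + 1/n)^n."""
    return (1 + 1 / n) ** n

sample_n = (1, 2, 5, 10, 100, 1000, 10**6)
terms = [a(n) for n in sample_n]

for n, value in zip(sample_n, terms):
    print(n, value)

# How far the last sampled term still is from the library value of e.
gap = math.e - terms[-1]
print(gap)
```

The gap shrinks slowly (roughly like e/(2n)), which is why this limit is a poor way to actually compute e.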

Definition: We can define an exponential function y=e^x for the entire number line because e>0. All it is is an exponential function like y=2^x, only with a different base.

If we graph this function it only exists in the area above the x-axis. Hence the range of e^x is all positive numbers.

Definition: An inverse function (if it exists) is a function that undoes the original function. Say f(x) is a given invertible function. Its inverse is denoted by f^{-1}(x) (and it does not mean 1/f(x)) and satisfies f(f^{-1}(x))=x and f^{-1}(f(x))=x.

Example 29: The function y=x^2 does not have an inverse, however if we restrict the domain to x\geq 0 then the half-parabola does have an inverse, namely the square root function y=\sqrt{x}.

A way to determine if the inverse exists is by passing horizontal lines through the graph and seeing whether each one intersects the function at most once. This is not true of the parabola in the example above, because some horizontal lines cross it twice. But by restricting the domain to the non-negative numbers the half-parabola satisfies the condition. One way to show an inverse exists on an interval is by showing the function is continuous and strictly increasing or strictly decreasing (the derivative is always one sign); that assures that a horizontal line passes through at most once. The graph of the exponential function y=e^x is increasing and hence any horizontal line drawn intersects it (if it does at all) exactly once. Thus the inverse function exists, and it is called the natural logarithmic function y=\ln x.

Definition: The natural logarithm function y=\ln x is defined for all positive values and it is the inverse of the natural exponential. Its value answers the question: what does e have to be raised to, to result in x? Thus, \ln e =1 because e^1=e.

If the domain of an invertible function f is D and the range is R then the domain of f^{-1} is R and its range is D. Thus the natural logarithm is defined on the range of e^x, which is the positives, and its range is the domain of e^x, which is any value.

Both the exponential and logarithmic functions have important properties.

Properties:
e^{x+y}=e^xe^y
(e^x)^y=e^{xy}
e^{-x}=\frac{1}{e^x}
e^0=1
For x,y>0
\ln (xy)=\ln x+\ln y
\ln (x/y)=\ln x-\ln y
\ln x^n = n\ln x
\ln e=1
\ln 1 =0
e^{\ln x}=x
\ln e^x=x

Now we get to the derivative of the exponential function and the logarithm.

Theorem: The derivative of y=e^x is y'=e^x.

Proof: This is not really a proof but it should give some reason. We know that e=(1+1/n)^n as n\to \infty. Thus, 1/n\to 0. We can therefore write, e=\lim_{\Delta x\to 0}(1+\Delta x)^{1/\Delta x}. Thus, for very small \Delta x we have e\approx (1+\Delta x)^{1/\Delta x} thus e^{\Delta x}\approx 1+\Delta x. Thus, e^{\Delta x}-1\approx \Delta x. Thus,
\frac{e^{\Delta x}-1}{\Delta x}\approx 1.
Thus, the smaller the number the closer the value,
\lim_{\Delta x\to 0} \frac{e^{\Delta x}-1}{\Delta x}=1.
Now we use the limit definition for derivative on y=e^x,
\lim_{\Delta x\to 0} \frac{e^{x+\Delta x}-e^x}{\Delta x}
\lim_{\Delta x\to 0} e^x \cdot \frac{e^{\Delta x}-1}{\Delta x}
Using the above statement,
e^x(1)=e^x.
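
The key limit in this argument, and the resulting derivative, can be watched numerically. The sketch below is only an illustration; the function names, the step sizes and the sample point x=2 are my own choices.

```python
import math

def exp_quotient(h):
    """The quotient (e^h - 1)/h that should approach 1 as h -> 0."""
    return (math.exp(h) - 1) / h

def exp_difference_quotient(x, h):
    """Difference quotient of e^x at x with step h."""
    return (math.exp(x + h) - math.exp(x)) / h

for h in (0.1, 0.01, 0.001, 1e-6):
    print(h, exp_quotient(h), exp_difference_quotient(2.0, h))

print(math.exp(2.0))  # the value the last column should approach
```

As h shrinks the first quotient approaches 1 and the second approaches e^2.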
(The following derivation is not mine, I stole it from my Calculus book).

Corollary: The integral \int e^x dx=e^x+C.

Theorem: The derivative of y=\ln x, x>0 is y'=\frac{1}{x}.

Proof: We can write e^{\ln x}=x and take the derivative of both sides. The right hand side is trivial and its derivative is 1. For the left hand side we use the chain rule with y=e^u where u=\ln x.
Thus,
\frac{dy}{du}=e^u
\frac{du}{dx}=u'
Thus,
\frac{dy}{dx}=u'e^u=u'e^{\ln x}=u'x.
Thus, left hand equal to right hand,
u'x=1
(\ln x)'=u'=\frac{1}{x}.
Note, the gap in the proof is that I never showed that \ln x has a derivative; if it did not, the proof would fail. But there is a useful theorem that assures us that the inverse function does have a derivative if the original function has one (and that derivative is not zero).

Sometimes it is useful to consider the following derivative.

Theorem: The derivative for y=\ln |x|, x\not = 0 is y'=1/x.

Proof: The difference between this and the derivative I just stated above is the domain of the function. In the first case 1/x was just the right part of the hyperbola, while this derivative is both parts of the hyperbola. This is because the absolute value clears the sign: for x<0 we have \ln |x|=\ln (-x), and the chain rule gives \frac{-1}{-x}=\frac{1}{x}, so we have both parts.

Corollary: The integral \int \frac{1}{x} dx = \ln |x|+C
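
A quick way to convince yourself that \ln |x| really works on both sides of zero is to differentiate it numerically. This is a sketch under my own assumptions: the central-difference helper, its step size and the sample points are not part of the argument.

```python
import math

def log_abs(x):
    """The candidate anti-derivative ln|x|, defined for x != 0."""
    return math.log(abs(x))

def numeric_derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

sample_points = (-3.0, -0.5, 0.5, 2.0)
mismatches = [abs(numeric_derivative(log_abs, x) - 1 / x) for x in sample_points]

for x, err in zip(sample_points, mismatches):
    print(x, 1 / x, err)
```

The mismatch is tiny at negative points as well as positive ones, whereas plain \ln x is not even defined at the negative ones.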

Note, the fundamental property of the natural exponential is that its derivative is itself; of course zero also works but that is uninteresting. Thus the exponential satisfies the differential equation y'=y. The interesting property of the logarithmic function is that it is a "transcendental function", meaning it cannot be expressed using +,-,\cdot ,/ , \sqrt[n]{\,\,\,} (this can be proven, though we will not do it here). Algebraic functions, like polynomials and rationals, can be expressed with those operations. And yet the derivative of this transcendental function is an algebraic function!

We can extend the power rule for integrals.

Extended Power Rule: The integral \int x^n dx = \left\{ \begin{array}{ll} \frac{x^{n+1}}{n+1}+C, & n\not = -1 \\ \ln |x|+C, & n = -1 \end{array} \right.

Up to this point I have explained two very important functions. We will see that many anti-derivatives involve them. Right now I will concentrate on three powerful techniques for finding anti-derivatives. Throughout this I will be using my own style of the substitution rule, which I have never seen anybody else use. Mine is formal (mathematical) while the standard technique is not, and hence it makes me want to vomit. And mine is better because I use it.

Example 30: Assume we need to find \int (e^x +x + 1/x) dx. Integration is a linear operator (meaning we can do each term separately). Thus by the extended power rule and the rule for the exponential function we have,
e^x+\frac{1}{2}x^2+\ln |x|+C.

Substitution Rule
In Calculus it is standard to represent the anti-derivative of a function f(x) in capitals F(x). Let us assume we are given a standard function f(x) that has an anti-derivative F(x), that means, F'(x)=f(x). Let g(x) be some other function which we can take a derivative of. Then by the chain rule,
[F(g(x))]'=g'(x)F'(g(x))=g'(x)f(g(x)).
That means that F(g(x)) is an anti-derivative of g'(x)f(g(x)).
Thus, by the results we developed above,
F(g(x))+C=\int f(g(x))g'(x) dx.
Basically, this is what the theorem says.
1)We want to find \int h(x) dx for some function h(x).
2)Suppose we can express h(x)=f(g(x))g'(x) for some other functions f(x),g(x).
3)Then we need to find F(x), an anti-derivative of f(x).
4)Then the integral of h(x) is the composition F(g(x)), plus a constant.
This theorem is the reverse of the chain rule.

I will do an example using the official way above and then do it using my way, because it will be easier for you to follow.

Example 31: We need to find \int (1+x)^5 dx. If instead we had \int x^5 dx then everything would be easy. Thus, the inner function is g(x)=x+1 and the outer function is f(x)=x^5, but we also need g'(x)=1, which surely appears in the integral because,
\int (1+x)^5 dx=\int (1+x)^5 (1)dx. Hence it has the form mentioned above, f(g(x))g'(x)=(x+1)^5. The next step is to find an anti-derivative of the outer function f(x)=x^5, which is F(x)=\frac{1}{6}x^6. Thus the answer is,
F(g(x))+C=\frac{1}{6}(x+1)^6+C.
Because it is an anti-derivative. If you take the derivative you will get back the original function. A useful way to check.
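
That check can also be done numerically. The sketch below follows the recipe for Example 31 with the same f, g and F, and compares a central-difference derivative of F(g(x)) with the integrand f(g(x))g'(x). The helper names, the step size and the sample points are my own assumptions.

```python
def numeric_derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def g(x):
    """Inner function of Example 31."""
    return x + 1

def g_prime(x):
    """Derivative of the inner function."""
    return 1.0

def f(u):
    """Outer function of Example 31."""
    return u ** 5

def F(u):
    """An anti-derivative of the outer function."""
    return u ** 6 / 6

def candidate(x):
    """The proposed anti-derivative F(g(x))."""
    return F(g(x))

def integrand(x):
    """The function being integrated, written as f(g(x)) * g'(x)."""
    return f(g(x)) * g_prime(x)

sample_points = (-2.0, -0.5, 0.0, 1.0, 2.0)
max_mismatch = max(abs(numeric_derivative(candidate, x) - integrand(x))
                   for x in sample_points)
print(max_mismatch)
```

Swapping in a different f, g, g_prime and F lets you check any other substitution answer the same way.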

Hacker's Substitution Rule
We want to find,
\int f(g(x))g'(x)dx.
For simplicity call u=g(x) then we have,
\int f(u) u' dx.
Where f(u) is some expression of u.
Then we need to find the anti-derivative of the outer function.
That means,
\int f(u) du.
Thus, stated another way,
\int f(u) \frac{du}{dx} dx=\int f(u) du=F(u)+C=F(g(x))+C
(The mnemonic is that it is as if we can cancel the dx's. But we cannot, that is not a fraction.)

Example 32: Now we do it my simplified way. The idea is as follows. We call the inside function u, then we immediately find its derivative u' and make it appear in the product. Thus we have \int (1+x)^5 dx. We see that if we call u=x+1 we reduce the problem to a basic power, which is what we want. But as I said, we immediately find the derivative u'=1. Thus,
\int u^5 (1) dx = \int u^5 u' dx =\int u^5 du = \frac{1}{6}u^6 +C=\frac{1}{6}(1+x)^6 +C.

The substitution rule is most important. You need to get good with it. We need many, many more examples.

Example 33: We will find \int (2x+1)^3 dx. We see it is reasonable to define the inner function as u=2x+1. Now we immediately find its derivative u'=2. But there is no multiple of 2 in the integrand! Does it mean the rule fails? No. Remember we can factor a constant in and out of the integral. Thus we introduce the 2 multiplier.
Thus,
\int (2x+1)^3 dx= \frac{1}{2} \int (2x+1)^3 (2) dx= \frac{1}{2} \int u^3 u' dx=\frac{1}{2} \int u^3 du.
The integral is simple to find through the power rule,
\frac{1}{2} \cdot \frac{1}{4} u^4+C=\frac{1}{8} (2x+1)^4+C.

Example 34: We will find \int xe^{x^2} dx. This is trickier, but let us call u=x^2. Now we immediately find its derivative u'=2x. Look! It is almost in the special form, except for the 2 multiplier. Thus,
\int xe^{x^2}dx=\frac{1}{2}\int e^{x^2}(2x)dx=\frac{1}{2} \int e^u u' dx=\frac{1}{2} \int e^u du.
The integral is trivial.
\frac{1}{2}e^u+C=\frac{1}{2}e^{x^2}+C.
I would like to mention that if you were paying attention you would have said,
\frac{1}{2} \left( e^u+C \right) =\frac{1}{2}e^u+\frac{1}{2}C.
But that does not matter because