Moving Below the Surface (3): Gradient Descent — William

This week, we are going to talk about gradient decrease. The beauty of artificial neural network is that it utilizes a very simple algorithm to optimize itself which brings down the error of the system, the gradient decrease.

Gradient Decrease could quickly find the local minimum for a function. In the field of machine learning, this method is applied to the error function E(x) (or sometimes also called the cost function) so we could find the point of minimum error. That is exactly what we are looking for training the system.

To visualize this, let us assume a set of example data shown below. Our task is to find the line of best fit, performing a simple linear regression.

Data points (y, x)

Let us start with the function for a straight line:

Screen Shot 2018-01-02 at 23.05.37

With the data given above, we want to determine the best m and b that best represents the points the graph. To do this, we define the measure of the “representativeness” with the error function E(m,b) that takes the two coefficients as independent variables. In this case, we will utilize the average sum of squared differences as our error function. Essentially, we take the square of all differences between the predicted values of our best fit line and the actual values and average the sum over the all N data points (xi,yi) in the dataset. If we express our error function in a mathematical way, we will see something like this:

Screen Shot 2018-01-02 at 23.15.38

Now with the help of MatLab, we could plot the error function in a 3-dimensional graph. We could see straight away that the global minimum point lies at the point where m=5 and b=3.

The error surface generated using our data

Gradient Descent offers a systematic way to find the local minimum (global if you are lucky) without it being in a specific place. Of course, gradient descent has is own limitation, but we will put back this discussion to later posts. The goal of the algorithm is to find the minimum point (m*,b*) starting at a random point (m0,b0). Recall from the multivariable calculus class, the gradient of a function at a specific point is the vector formed by the partial derivatives of the function along both axes, i.e.

Screen Shot 2018-01-02 at 23.16.43

The gradient vector represents the direction where the function increases the most greatly. Therefore, to find the point of the minimum we need to move along the opposite or the negative direction of the gradient vector. A more formative way to express this shown as the following:

Screen Shot 2018-01-02 at 23.18.10

In this function group, mj and bj are the points we start with, while mj+1 and bj+1 are that of the next step. The γ parameter represents the learning rate which controls the effect of the variable movements. Again, we will leave how to find suitable learning rate for later blogs.

The challenge of this method comes when we find the partial derivatives of our error functions with respect to the coefficients. In this case, both derivatives of our error function are the following:

Screen Shot 2018-01-02 at 23.19.29

Note that the average sum of squared differences is not difficult to take derivatives comparing to other ones.

The following is a single run of the gradient descent method of 20 steps with a 0.01 learning rate and we could see the algorithm approaches the real minimum quite well.

20 steps of gradient descent with learning rate 0.01

We will explore more topics on machine learning next week. Stay put!

Work Cited

A Brief Introduction To Gradient Descent was published on November 08, 2015 and last modified on October 01, 2016. “A Brief Introduction To Gradient Descent.”,
Temming, Maria. “AI has found an 8-Planet system like ours in Kepler data.” Science News, 14 Dec. 2017,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.