You might remember linear regression from statistics as a method to produce a linear equation that models the relationship between two variables. Not surprisingly, linear regression plays a similar role in machine learning, except that the focus is on prediction rather than interpretation of the data. As you might remember from my previous blog, regression is a *supervised* learning algorithm that predicts a *real-valued* output when given an input. In this blog post, I will discuss the model representation of simple linear regression and introduce its cost function.

**Model Representation**

To better illustrate the model representation of linear regression, I will model the number of people waiting in line for lunch as a function of the time (in minutes) after lunch period 6 starts. I have collected the following data points:

| Minutes after lunch period 6 starts | Number of people waiting in line |
|---|---|
| 3 | 46 |
| 6 | 23 |
| 9 | 6 |
To represent a linear equation, we use x to denote the input variable(s) and y to denote the output variable(s). Each pair (x, y) is called a “training example.” A “training set” contains m pairs (x^(i), y^(i)). The superscript (i) here does not represent a power but rather denotes the index within the dataset. In the waiting line model, x would be the time after lunch period 6 starts, whereas y represents the number of people waiting in line. The “m” in this case equals 3 because there are 3 pairs of (x, y).
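As a minimal sketch, the waiting-line training set above can be stored as two Python lists (the variable names are my own, not from the lecture):

```python
# Waiting-line training set: each pair (x[i], y[i]) is one training example.
x = [3, 6, 9]    # minutes after lunch period 6 starts
y = [46, 23, 6]  # number of people waiting in line

m = len(x)       # number of training examples
print(m)             # 3
print(x[0], y[0])    # first training example: 3 46
```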

By putting the training set through a learning algorithm, we can then determine the hypothesis (denoted by “h” in the chart above), which is a function that takes an input x and predicts an output y. In simple linear regression, we can use the following function as the hypothesis:

hθ(x) = θ₀ + θ₁x

In the waiting line model, our hypothesis can then be represented by the plot:
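A minimal Python sketch of this hypothesis (the parameter values below are illustrative stand-ins, not fitted values):

```python
def h(x, theta0, theta1):
    """Hypothesis for simple linear regression: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With assumed example parameters theta0 = 50, theta1 = -5,
# the line predicts 35 people waiting 3 minutes in:
print(h(3, 50, -5))  # 35
```

Different choices of θ₀ and θ₁ give different lines; the learning algorithm's job is to pick the pair that best fits the training set.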

**Cost Function**

If you are familiar with economics or financial accounting, you might know that a cost function is a formula that estimates the total cost of production and, in turn, helps managers find the break-even point. In machine learning, however, a cost function, also known as a “loss function,” describes the difference between the prediction and reality. It is used to “measure the accuracy of a hypothesis function.” The goal is, therefore, to minimize the output of the cost function. In other words, the goal is to find a pair (θ₀, θ₁) so that hθ(x) is the best fit of y in the training set.

In simple linear regression, we can use a squared error function to represent the cost function:

J(θ₀, θ₁) = (1/2m) · Σ (hθ(x^(i)) − y^(i))², summed over i = 1 to m

Algebraically, our goal is to find the minimum of J(θ₀, θ₁). Through calculation, we can find that J(θ₀, θ₁) has a minimum when θ₀ = 65 and θ₁ = −20/3 ≈ −6.67.
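To make the cost function concrete, here is a short Python sketch that evaluates J on the waiting-line data and checks that the fitted parameters give a lower cost than a nearby alternative (the variable names are my own):

```python
def J(theta0, theta1, x, y):
    """Squared-error cost: J = (1/2m) * sum of (h(x_i) - y_i)^2."""
    m = len(x)
    return sum((theta0 + theta1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

# Waiting-line training set
x = [3, 6, 9]
y = [46, 23, 6]

cost_at_min = J(65, -20/3, x, y)   # cost at the fitted parameters
cost_nearby = J(60, -5, x, y)      # cost for a worse-fitting line
print(cost_at_min)  # 1.0
print(cost_nearby)  # noticeably larger
```

The fitted line misses the three points by only 1, −2, and 1 people respectively, which is why the cost at the minimum is so small.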

In next week’s blog, I will most likely continue the discussion of simple linear regression and touch on the topic of gradient descent. Thanks for reading and see you next week!

Works Cited

Ng, Andrew. “Cost Function.” *Coursera*, http://www.coursera.org/learn/machine-learning/lecture/rkTp3/cost-function. Accessed 1 Oct. 2017. Lecture.

—. “Cost Function – Intuition I.” *Coursera*, http://www.coursera.org/learn/machine-learning/lecture/N09c6/cost-function-intuition-i. Accessed 1 Oct. 2017. Lecture.

—. “Model Representation.” *Coursera*, http://www.coursera.org/learn/machine-learning/lecture/db3jS/model-representation. Accessed 1 Oct. 2017. Lecture.

“Residual.” *Statistics How To*, http://www.statisticshowto.com/rmse/. Accessed 1 Oct. 2017.

Stern, David. “Energy and Economic Growth: The Stylized Facts.” *Stochastic Trend*, 19 June 2014, stochastictrend.blogspot.com/2014/06/energy-and-economic-growth-animated-gif.html. Accessed 1 Oct. 2017.
