You might remember linear regression from statistics as a method for producing a linear equation that models the relationship between two variables. Not surprisingly, linear regression is quite similar in machine learning, except that the focus is on prediction rather than interpretation of the data. Regression is a supervised learning algorithm (as discussed in my previous blog post) that predicts a real-valued output for a given input. In this blog post, I will discuss the model representation of simple linear regression and introduce its cost function.
To better illustrate the model representation of linear regression, I will model the number of people waiting in line for lunch as a function of the time (in minutes) after lunch period 6 starts. I have collected the following data points:
|Minutes after lunch period 6 starts|Number of people waiting in line|
|---|---|
To represent a linear equation, we use x to denote the input variable and y to denote the output variable. Each pair (x, y) is called a “training example.” A “training set” contains m pairs (x^(i), y^(i)). The superscript (i) here does not represent a power but rather the index of an example in the dataset. In the waiting line model, x would be the time after lunch period 6 starts, whereas y represents the number of people waiting in line. The “m” in this case equals 3 because there are 3 pairs of (x, y).
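As a concrete sketch, the training set can be stored as two parallel lists. Note that the y values below are hypothetical placeholders I made up for illustration, not the actual numbers from the table:

```python
# Hypothetical training set for the waiting-line model.
# x: minutes after lunch period 6 starts (input variable)
# y: number of people waiting in line (output variable)
# NOTE: these y values are made-up placeholders, not the post's actual data.
x = [0, 5, 10]
y = [40, 25, 10]

m = len(x)  # number of training examples; here m = 3, as in the post
# The i-th training example is the pair (x[i], y[i]).
print(m)
```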
By putting the training set through a learning algorithm, we can then determine the hypothesis (denoted by “h”), which is a function that takes an input x and predicts an output y. In simple linear regression, we can use the following function as the hypothesis:

h_θ(x) = θ₀ + θ₁x
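The hypothesis is just a line parameterized by θ₀ (the intercept) and θ₁ (the slope). A minimal sketch, with illustrative parameter values of my own choosing:

```python
def h(theta0, theta1, x):
    """Hypothesis for simple linear regression: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With illustrative (not fitted) parameters theta0 = 40 and theta1 = -3,
# the model predicts the line starts at 40 people and shrinks by 3 per minute.
print(h(40, -3, 0))   # prediction at x = 0 minutes
print(h(40, -3, 5))   # prediction at x = 5 minutes
```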
In the waiting line model, our hypothesis is then a straight line fitted through the data points on a plot of the number of people waiting against time.
If you are familiar with economics or financial accounting, you might know that a cost function is a formula that estimates the total cost of production and, in turn, helps managers find the break-even point. In machine learning, however, a cost function, also known as a “loss function,” describes the difference between the prediction and reality. It is used to “measure the accuracy of a hypothesis function.” The goal is, therefore, to minimize the output of the cost function. In other words, the goal is to find a pair of parameters (θ₀, θ₁) so that h_θ(x) is the best fit of y over the training set.
In simple linear regression, we can use a squared-error function as the cost function:

J(θ₀, θ₁) = (1 / 2m) Σᵢ₌₁ᵐ (h_θ(x^(i)) − y^(i))²
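The squared-error cost averages the squared residuals over the m training examples (with the conventional extra factor of 2 in the denominator). A sketch using the same hypothetical data as before:

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost: J(theta0, theta1) = (1/(2m)) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical data lying exactly on the line y = 40 - 3x.
xs = [0, 5, 10]
ys = [40, 25, 10]
print(cost(40, -3, xs, ys))  # a perfect fit gives zero cost
print(cost(40, 0, xs, ys))   # a worse fit gives a larger cost
```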
Algebraically, our goal is to find the pair (θ₀, θ₁) that minimizes J(θ₀, θ₁). Through calculation, we can find the values of θ₀ and θ₁ at which J(θ₀, θ₁) attains its minimum.
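For simple linear regression, the minimizing pair actually has a closed form (the ordinary least-squares solution); the post does not derive it, so the following is a sketch of that standard calculation, again with hypothetical data:

```python
def fit_simple_linear(xs, ys):
    """Closed-form least-squares fit for simple linear regression:
    theta1 = sum((x - x_mean)*(y - y_mean)) / sum((x - x_mean)^2)
    theta0 = y_mean - theta1 * x_mean
    """
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    theta1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
              / sum((x - x_mean) ** 2 for x in xs))
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Hypothetical data lying exactly on a line recovers that line.
print(fit_simple_linear([0, 5, 10], [40, 25, 10]))  # (40.0, -3.0)
```

In practice, gradient descent (the topic previewed below) reaches the same minimum iteratively, which scales better when there are many features.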
In next week’s blog, I will most likely continue the discussion of simple linear regression and touch on the topic of gradient descent. Thanks for reading and see you next week!
Ng, Andrew. “Cost Function.” Coursera, http://www.coursera.org/learn/machine-learning/lecture/rkTp3/cost-function. Accessed 1 Oct. 2017. Lecture.
—. “Cost Function – Intuition I.” Coursera, http://www.coursera.org/learn/machine-learning/lecture/N09c6/cost-function-intuition-i. Accessed 1 Oct. 2017. Lecture.
—. “Model Representation.” Coursera, http://www.coursera.org/learn/machine-learning/lecture/db3jS/model-representation. Accessed 1 Oct. 2017. Lecture.
“Residual.” Statistics How To, http://www.statisticshowto.com/rmse/. Accessed 1 Oct. 2017.
Stern, David. Energy and Economic Growth: The Stylized Facts. Stochastic Trend, 19 June 2014, stochastictrend.blogspot.com/2014/06/energy-and-economic-growth-animated-gif.html. Accessed 1 Oct. 2017.