Linear Regression
Suppose that we have a set of data on two variables $x$ and $y$. For example, we could have data on both the age and weight of a group of individuals, data on height and average hours of sleep, etc. We may hypothesize that the variable $y$ depends on the variable $x$. In other words, we may guess that there is a linear relationship between our two variables. In math, we say $y = mx + b$. Our goal is to find the set of parameters, $m$ and $b$, that best fit our data. It turns out that there is a unique set of parameters $m$ and $b$ that best fit our data. All that means is there is one and only one $m$ and one and only one $b$ that will give us the best formula $y = mx + b$.
Of course, sometimes our best-fit line will be good and other times it will not be so good.
In this section, we will learn how to find the least-squares regression line. In the section on correlation you have already learned how to quantify how good the fit is!
To find the slope:

$$m = \frac{E[xy] - E[x]\,E[y]}{E[x^2] - E[x]^2}$$

This may look intimidating at first, but remember the notation $E[x]$ just means "find the expectation value of $x$" (the average of the $x$ values).
To find the y-intercept:

$$b = E[y] - m\,E[x]$$
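A short numerical sketch of these two formulas, using made-up $(x, y)$ data (the expectation values here are just sample means):

```python
# Least-squares slope and intercept via expectation values.
# Data below is made up purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
E_x = sum(xs) / n                                  # E[x]
E_y = sum(ys) / n                                  # E[y]
E_xy = sum(x * y for x, y in zip(xs, ys)) / n      # E[xy]
E_x2 = sum(x * x for x in xs) / n                  # E[x^2]

# Slope: m = (E[xy] - E[x]E[y]) / (E[x^2] - E[x]^2)
m = (E_xy - E_x * E_y) / (E_x2 - E_x ** 2)

# Intercept: b = E[y] - m * E[x]
b = E_y - m * E_x

print(f"best-fit line: y = {m:.3f}x + {b:.3f}")
```

For this data the formulas give a slope of 1.96 and an intercept of 0.14.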
The least-squares method works by minimizing the sum of the squares of the errors. An error is defined as the vertical distance from an actual data point to the corresponding point on the least-squares line.
Note that we want the total sum of the squares of the errors minimized, not just individual errors. We square the errors to avoid cancellations between positive and negative errors. One line may minimize the error with respect to a certain data point, but we are interested in minimizing the total error.
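The point above can be checked numerically. A minimal sketch, using made-up data: a line chosen to pass exactly through one data point has zero error there, yet a much larger total squared error than the least-squares line.

```python
# Compare total squared error for two candidate lines on made-up data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def total_squared_error(m, b):
    # Error at each point = vertical distance y_i - (m*x_i + b);
    # square each one and sum over all points.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# A line through the single point (1.0, 2.1) with an arbitrary slope of 3:
# zero error at that one point, but a large total error.
sse_one_point = total_squared_error(3.0, 2.1 - 3.0 * 1.0)

# The least-squares line for this data (slope 1.96, intercept 0.14,
# computed from the formulas above).
sse_least_squares = total_squared_error(1.96, 0.14)

print(sse_one_point, sse_least_squares)
```

The one-point line gives a total squared error of about 32.5, while the least-squares line gives about 0.09, even though the least-squares line does not pass exactly through any single point better than the other line does.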