Linear Regression
Linear regression sounds and looks intimidating at first utterance or glance. It is not a psychology term for reverting in a straight line back to my old self or to an earlier stage of development. It is one of those highfalutin terms (as Granny in The Beverly Hillbillies would say) that describes a great modeling tool. The tool can be used to predict the numerical value of “Y” for any given “X” based upon a dataset. Please note that your predictions are most trustworthy within the range of your data, between the lowest and highest data points. You can also “extrapolate” outside that range to predict a value, but your confidence in the prediction drops the farther you move beyond your data.
Pictured is a classic example of a correlation that can be characterized by linear regression. Let’s say you are a real estate agent and you want to determine the sale price of homes in an area based upon their square footage. In our example, we have collected 30 data points. I suggest anywhere from 30 to 60 data points for this type of analysis. The closer the data points fall to the blue line, the stronger the correlation, and that strength can be characterized by the Pearson correlation coefficient (r). We can perform hypothesis testing to determine whether the correlation is statistically significant. Our null hypothesis (Ho), alternative hypothesis (Ha), and significance level are:
Ho: r = 0; there is no correlation
Ha: r ≠ 0; there is some correlation
Alpha = 0.05; 95% confidence
Our software (Minitab) returns a p-value of 0.00 (that is, a value so small it rounds to zero). Because the p-value is less than our alpha of 0.05, the relationship between square footage and sale price is statistically significant. Therefore, we reject our null hypothesis and conclude there is some correlation between square footage and the sale price of homes in a particular area.
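If you are curious what the software is doing behind the scenes, here is a minimal Python sketch of the same correlation test. The square footage and price figures are made up for illustration (they are not the 30 data points from the example), and the function names pearson_r and t_statistic are my own. Instead of computing a p-value, the sketch compares the t statistic with the two-tailed critical value for alpha = 0.05 at n - 2 = 6 degrees of freedom, which is 2.447 in a standard t table.

```python
import math

# Hypothetical illustration data (NOT the article's dataset).
square_feet = [1200, 1500, 1800, 2100, 2400, 2700, 3000, 3300]
sale_price = [122000, 149000, 183000, 208000, 243000, 268000, 301000, 330000]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing Ho: r = 0, with n - 2 degrees of freedom.

    Assumes r is not exactly +1 or -1 (that would divide by zero).
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


n = len(square_feet)
r = pearson_r(square_feet, sale_price)
t_stat = t_statistic(r, n)

# Two-tailed critical value for alpha = 0.05 and 6 degrees of freedom,
# taken from a standard t table.
T_CRITICAL = 2.447
reject_null = abs(t_stat) > T_CRITICAL

print(f"r = {r:.4f}, t = {t_stat:.2f}, reject Ho: {reject_null}")
```

With your own data, remember to look up the critical value that matches your sample size; for 30 data points you would use 28 degrees of freedom.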
Now let us consider the linear regression aspect of the analysis, which is given by an equation that describes the best-fit line. In short, we repeat the hypothesis testing, this time considering the slope of the line, and our software returns an equation to help us predict future outcomes. In our example, Y = 99.7 * X + 1020.4 is the predicting equation. Therefore, if a house with 3,100 square feet comes on the market, I can plug in 3100 for "X" in the equation: 99.7 * 3100 + 1020.4 = 310,090.4. This returns a market value for the house of about $310,090. In other words, for a given square footage X, I can go back to my real estate sellers or buyers with a predicted sale price Y based upon the data collected. Don’t be perturbed if you don’t have Minitab or other statistical software. You can also use Excel charting. Add a trendline to your scatter chart and check the option to display the equation on the chart in order to obtain your equation for predictive modeling.
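For those who would rather script it than chart it, here is a small Python sketch of both halves of the job: fitting a best-fit line by ordinary least squares and plugging a square footage into the equation. The slope and intercept defaults come straight from the example above; the function names fit_line and predict_price are my own, and the four check points are generated from the example's equation itself rather than being real sales data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(square_feet, slope=99.7, intercept=1020.4):
    """Predicted sale price Y for a given square footage X."""
    return slope * square_feet + intercept


# Prediction for the 3,100 square foot house from the example.
predicted = predict_price(3100)
print(f"Predicted sale price: ${predicted:,.0f}")

# Sanity check of fit_line: points generated from the example's own
# equation (synthetic, not real sales) should give back that equation.
check_x = [1000, 2000, 3000, 4000]
check_y = [predict_price(x) for x in check_x]
slope, intercept = fit_line(check_x, check_y)
print(f"Recovered equation: Y = {slope:.1f} * X + {intercept:.1f}")
```

Real data will not sit exactly on the line, so the fitted slope and intercept will be estimates, which is why the hypothesis test on the slope still matters.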
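In conclusion, linear regression is a powerful predicting tool. It can also be extended to multiple explanatory variables (multiple regression) and, with transformed or polynomial terms, to datasets that follow a curve rather than a straight line. Simply put, you can predict Y (the response variable) for any given X (the explanatory variable) across a wide range of datasets, as long as the fitted equation describes your data well. So launch out and start predicting future outcomes with your data.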