LINEAR REGRESSION: THE IDEA
Linear Regression (LR) is one of the simplest yet most powerful tools we use in both deductive and predictive analyses. For years now, I have used it for parameter estimation, hypothesis testing and, of course, generating predictions in the context of my academic research. The idea is to determine the slope (b1) and intercept (b0) of the straight line y = b0 + b1*x that best describes the relationship between variables x and y in a data set D = {(x, y)} of, say, N records. Unlike many other tools of predictive analytics, LR is special in that the mathematics affords closed-form formulas for the parameters b0 and b1, which makes it very easy to understand and to code, especially in Python. Minimizing the sum of squared residuals sum_i (y_i - b0 - b1*x_i)^2 gives:
b1 = Covariance(x, y) / Variance(x)
b0 = mean(y) - b1 * mean(x)
Coding this is a piece of cake:
def LinearRegression1D_fit(x, y):
    # "stat" is a small helper module of mine (mean, variance,
    # covariance, ...); a sketch of it is given further below.
    try:
        b0 = b1 = None
        bNum = stat.covariance(x, y)   # slope numerator
        bDen = stat.variance(x)        # slope denominator
        xmean = stat.mean(x)
        ymean = stat.mean(y)
        if bDen > 0:
            b1 = bNum/bDen
            # The fitted line passes through (mean(x), mean(y))
            b0 = ymean - b1*xmean
        return b0, b1
    except Exception as err:
        print(err)
One can easily generate the model line now:
yLine = [b0+b1*xi for xi in x]
Each Y_i in the list yLine is the modeled value of y at the regressor value x_i, and with a little research one can find the analytical formula for its variance:

Variance(Y_i) = MSE * (1/N + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2)

where MSE is the mean squared residual. The variance of Y_i is an important idea since it allows us to construct confidence intervals about the model line.
def LinearRegression1D_LineVariance(x, mse):
    '''
    x   : list of regressors
    mse : mean squared residual
    '''
    try:
        n = len(x)
        xmean = stat.mean(x)
        # Sum of squared deviations of x about its mean
        xdevsq = stat.sq_sum([xi - xmean for xi in x])
        if xdevsq > 0 and n > 0:
            ret = []
            for xh in x:
                # Variance of the fitted value at xh
                term = 1./n + math.pow(xh - xmean, 2)/xdevsq
                ret.append(mse*term)
            return ret
        return None
    except Exception as err:
        print(err)
Both of the above snippets rely on a few commonly used helpers that I have collected into a generic module called "stat".
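For readers who want to run the snippets immediately, here is a minimal sketch of what those helpers might look like; mean, variance and covariance mirror Python's built-in statistics module (sample versions, with N-1 denominators), and sq_sum is taken to be the sum of squared values:

def mean(v):
    #Arithmetic mean
    return sum(v)/len(v)

def sq_sum(v):
    #Sum of squares of the elements
    return sum(vi*vi for vi in v)

def variance(v):
    #Sample variance (denominator N-1)
    m = mean(v)
    return sq_sum([vi - m for vi in v])/(len(v) - 1)

def covariance(x, y):
    #Sample covariance (denominator N-1)
    xm, ym = mean(x), mean(y)
    return sum((xi - xm)*(yi - ym) for xi, yi in zip(x, y))/(len(x) - 1)

Since b1 divides a covariance by a variance, the N-1 denominators cancel, so population versions (denominator N) would give the same slope.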
The picture above shows the data points, the model line (broken red) and the 95% confidence interval on y for a toy data set. Below is the test script for the small module, which I can happily share on request.
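The script calls two helpers, LinearRegression1D_predict and LinearRegression1D_residuals, that were not listed above; their behavior is clear from how they are used, so here is a minimal sketch of them (the bodies are a reconstruction, not the original module):

def LinearRegression1D_predict(x, b0, b1):
    #Sketch: evaluate the fitted line at each regressor value
    return [b0 + b1*xi for xi in x]

def LinearRegression1D_residuals(x, y, b0, b1):
    #Sketch: observed minus fitted values
    return [yi - (b0 + b1*xi) for xi, yi in zip(x, y)]

And the test script itself: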
import math
import matplotlib.pyplot as plt
import linreg   #the small module containing the functions above

#Toy data set
x = [1,2,3,4,5,6,7]
y = [2,7,8,13,14,20,19]
#Set and Fit Model Parameters
b0,b1 = linreg.LinearRegression1D_fit(x,y)
#Obtain Model Line, and Residuals
y_fit = linreg.LinearRegression1D_predict(x,b0,b1)
y_res = linreg.LinearRegression1D_residuals(x,y,b0,b1)
#Mean square of residuals (N-2 degrees of freedom: two fitted parameters)
mse = sum([yi*yi for yi in y_res])/(len(y_res)-2)
#Variance (Y) on points of fit line
y_fit_var = linreg.LinearRegression1D_LineVariance(x,mse)
#Constructing the upper and lower CI (95%; 2 approximates the normal quantile 1.96)
y_ci_low = [yi-2*math.sqrt(yvar) for yi,yvar in zip(y_fit,y_fit_var)]
y_ci_hi = [yi+2*math.sqrt(yvar) for yi,yvar in zip(y_fit,y_fit_var)]
#Plot the results
plt.scatter(x,y)
plt.plot(x,y_fit,'r--')
plt.plot(x,y_ci_low,'r')
plt.plot(x,y_ci_hi,'r')
plt.ylabel('Y-variable')
plt.xlabel('X-variable')
plt.show()
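One refinement worth noting: the multiplier 2 above approximates the 97.5th percentile of the normal distribution (about 1.96). With only seven points the residual variance is itself quite uncertain, and the textbook interval uses a Student-t quantile with N-2 degrees of freedom instead. A sketch, assuming scipy is available:

from scipy.stats import t

#97.5th percentile of t with N-2 degrees of freedom;
#for N = 7 this is about 2.57, noticeably wider than 2
tval = t.ppf(0.975, len(x) - 2)
y_ci_low = [yi - tval*math.sqrt(yvar) for yi, yvar in zip(y_fit, y_fit_var)]
y_ci_hi = [yi + tval*math.sqrt(yvar) for yi, yvar in zip(y_fit, y_fit_var)]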
This seems like a nice point to close this basic introduction to linear regression modeling, but there is definitely a lot more to it. What else can be done? Can we use LR to model a quadratic or higher-order relationship between x and y (y = a*x**2 + b*x + c)? What assumptions are imposed when we use this algorithm? Hopefully, I will write a bit more in the future to examine all that :).