Machine Learning: Regression Analysis

Dinesh Patel

Published Jan 20, 2018

A few important things about regression models.

Regression is a predictive modeling technique that estimates the relationship between a dependent variable (the target) and one or more independent variables (the predictors).

Types of Regression:

Linear Regression: Useful when there is a linear relationship between independent and dependent variables.

Logistic Regression: Useful when the dependent variable is binary (0/1, True/False, Yes/No) in nature.

Polynomial Regression: Useful when the relationship is curved, so the model includes an independent variable raised to a power greater than 1 (for example, x squared).
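
As a rough illustration of the three types above, the following sketch fits each one to R's built-in mtcars data set. The data set and the columns used (mpg, wt, am, hp) are assumptions chosen for this example, not something the article prescribes.

```r
# Illustrative sketch: mtcars and the chosen columns are assumptions for this example.
data(mtcars)

# Linear regression: mpg modelled as a straight-line function of weight
linear_fit <- lm(mpg ~ wt, data = mtcars)

# Logistic regression: binary outcome am (0 = automatic, 1 = manual)
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Polynomial regression: horsepower enters with a linear and a squared term
poly_fit <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)

print(coef(linear_fit))
print(coef(logistic_fit))
print(coef(poly_fit))
```

The logistic model returns fitted probabilities between 0 and 1, while the two lm() fits return predictions on the original scale of mpg.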

Linear Regression - Relationship:

The Linear Regression model assumes a linear relationship between the input variables and the outcome variable.

We can express this as: y = B0 + B1 x + e

where

y = outcome variable

x = input variable

e = random error

B0 = intercept (the value of y when x = 0)

B1 = slope of the line
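
To see the equation at work, the sketch below simulates data from a known line and checks that lm() recovers the constant term and the coefficient on x. The true values (2 and 3), the sample size, and the noise level are assumptions made purely for illustration.

```r
# Simulated data: the constant 2, the coefficient 3, and the noise level
# are assumptions chosen for this sketch.
set.seed(42)
x <- runif(200, min = 0, max = 10)   # input variable
e <- rnorm(200, mean = 0, sd = 1)    # random error
y <- 2 + 3 * x + e                   # outcome variable

fit <- lm(y ~ x)
estimates <- coef(fit)

# "(Intercept)" estimates the constant term; "x" estimates the coefficient on x
print(estimates)
```

Because of the random error, the estimates will be close to the true values but not exactly equal to them.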

You can check the relationship between variables by using the splom() function in library(lattice) or the corrplot() function in library(corrplot).

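library(corrplot)
# Correlation matrix of the built-in mtcars data set
M <- cor(mtcars)
corrplot(M, method = "circle")
# Display the correlation coefficients as numbers
corrplot(M, method = "number")
# p-values for each pairwise correlation test (cor.mtest() ships with recent corrplot versions)
p.mat <- cor.mtest(mtcars)$p
# Mark the correlations that are not significant at the 0.01 level
corrplot(M, type = "upper", order = "hclust",
         p.mat = p.mat, sig.level = 0.01)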



library(lattice)
splom(~iris[1:4])

Variance Inflation Factor (VIF): It measures the increase in the variance (the square of the estimate's standard deviation) of an estimated regression coefficient due to multicollinearity.

library(car)  # vif() is available in the car package
vif(model)

A VIF of 1 means a predictor is not correlated with the other predictors. A VIF between 1 and 5 indicates moderate correlation, and a VIF between 5 and 10 indicates high correlation. A high VIF value indicates high multicollinearity.
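
To make the definition concrete, a VIF can be computed by hand as 1 / (1 - R^2), where R^2 comes from regressing one predictor on the remaining predictors. The sketch below does this in base R. The helper name manual_vif and the choice of mtcars predictors are hypothetical and used only for illustration.

```r
# Hypothetical helper: VIF of each predictor as 1 / (1 - R^2),
# where R^2 comes from regressing that predictor on the other predictors.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Predictor choice is an assumption for this example.
vifs <- manual_vif(mtcars, c("wt", "hp", "disp"))
print(vifs)
```

For a linear model fitted on these same three numeric predictors, the values should agree with what vif() reports.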

Model Summary:

summary(model)

A p-value below 0.05 is conventionally treated as statistically significant. A high p-value means your data are likely under a true null hypothesis. A low p-value means your data are unlikely under a true null hypothesis.

For example, in the Pr(>|t|) column, a value of 0.0007 marks a strongly significant variable and is normally shown with three stars (***). A value of 0.004 gets two stars (**) and 0.011 gets one star (*). A variable with a value above 0.05, such as 0.20, gets no star and is a weak candidate for the model.

The R-squared value indicates the proportion of the variance in the outcome that the model explains. The closer the value is to 1, the better the model fits the data.
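
The quantities discussed above can also be read directly from the summary object rather than from the printed table. The following is a minimal sketch; the model formula mpg ~ wt + hp + qsec on mtcars is an assumption chosen for this example.

```r
# Example model: the formula mpg ~ wt + hp + qsec is an assumption for this sketch.
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
model_summary <- summary(model)

# Coefficient table: estimate, standard error, t value, and Pr(>|t|)
coef_table <- model_summary$coefficients
p_values <- coef_table[, "Pr(>|t|)"]

# Predictors with a p-value below 0.05, excluding the intercept
significant <- setdiff(names(p_values)[p_values < 0.05], "(Intercept)")
print(significant)

# R-squared and adjusted R-squared
print(model_summary$r.squared)
print(model_summary$adj.r.squared)
```

Adjusted R-squared is printed alongside R-squared because plain R-squared never decreases when another predictor is added to the model.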
