Machine Learning: Regression Analysis

A few important things to know about regression models.

Regression is a predictive modeling technique that estimates the relationship between a dependent variable (the target) and one or more independent variables (the predictors).

Types of Regression:

Linear Regression: Useful when there is a linear relationship between independent and dependent variables.

Logistic Regression: Useful when the dependent variable is binary (0/1, True/False, Yes/No) in nature.

Polynomial Regression: Useful when the relationship is curved, so the model includes powers of the independent variable greater than 1 (for example, x^2).
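
As a rough sketch, the three types listed above could be fitted in R as shown here. The built-in mtcars dataset and the columns used (mpg, wt, am) are assumptions chosen for illustration; they are not taken from the original analysis.

```r
# Illustrative only: mtcars and the chosen columns are assumptions.
data(mtcars)

# Linear regression: mpg modeled as a straight-line function of wt
linear_fit <- lm(mpg ~ wt, data = mtcars)

# Logistic regression: am is binary (0 = automatic, 1 = manual)
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Polynomial regression: adds a squared term of wt
poly_fit <- lm(mpg ~ poly(wt, 2), data = mtcars)

coef(linear_fit)
coef(logistic_fit)
coef(poly_fit)
```

The only difference between the linear and polynomial fits is the formula; both use lm(), while the binary outcome needs glm() with family = binomial.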

Linear Regression - Relationship:

The Linear Regression model assumes a linear relationship between the input variables and the outcome variable.

We can express it as: y = B0 + B1 x + e

where

y = outcome variable

x = input variable

e = random error

B0 = intercept (the value of y when x = 0)

B1 = slope of the line (the change in y for a one-unit increase in x)
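
To make the equation concrete, here is a minimal sketch that simulates data from a known line and checks that lm() recovers the point where the line crosses the y-axis and the change in y per unit of x. The true values 2 and 3, the sample size, and the seed are arbitrary assumptions for illustration.

```r
# Simulated data: the true values 2 and 3 are arbitrary illustration choices.
set.seed(42)
x <- runif(200, min = 0, max = 10)   # input variable
e <- rnorm(200, mean = 0, sd = 1)    # random error
y <- 2 + 3 * x + e                   # outcome: crosses the y-axis at 2, rises 3 per unit of x

fit <- lm(y ~ x)
b0 <- coef(fit)[["(Intercept)"]]     # estimated value of y at x = 0
b1 <- coef(fit)[["x"]]               # estimated change in y per unit of x
c(intercept = b0, slope = b1)
```

The estimates should land close to 2 and 3, with the gap shrinking as the sample grows or the error variance falls.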

You can check the relationships between variables with the splom() function in library(lattice) or the corrplot() function in library(corrplot).

library(corrplot)
# Correlation matrix of the numeric columns in mtcars
M <- cor(mtcars)
corrplot(M, method = "circle")
# Display the correlation coefficients
corrplot(M, method = "number")
# Matrix of p-values, one per pairwise correlation test
p.mat <- cor.mtest(mtcars)$p
# Mark the insignificant correlations according to the significance level
corrplot(M, type = "upper", order = "hclust",
         p.mat = p.mat, sig.level = 0.01)

library(lattice)
splom(~iris[1:4])
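
The same relationships can also be checked numerically. The sketch below uses only base R; the choice of the two petal columns for the significance test is an assumption made for illustration.

```r
# Numeric check of the same relationships, using only base R.
num_cols <- iris[1:4]
cor_matrix <- cor(num_cols)          # pairwise Pearson correlations
round(cor_matrix, 2)

# Test whether one specific correlation differs from zero
ct <- cor.test(iris$Petal.Length, iris$Petal.Width)
ct$estimate   # correlation coefficient
ct$p.value    # p-value for the null hypothesis of zero correlation
```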

Variance Inflation Factor (VIF): It measures the increase in the variance (the square of the estimate's standard deviation) of an estimated regression coefficient due to multicollinearity.

library(car)  # one package that provides vif()
vif(model)    # model is a fitted regression, e.g. from lm()

A VIF of 1 means a predictor is not correlated with the other predictors, a VIF between 1 and 5 means it is moderately correlated, and a VIF between 5 and 10 means it is highly correlated. A high VIF value indicates high multicollinearity.
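
To show where these numbers come from, here is a minimal base R sketch of the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. The mtcars columns used are an assumption for illustration, and manual_vif is a hypothetical name.

```r
# Illustrative predictors from mtcars; the column choice is an assumption.
predictors <- mtcars[, c("wt", "hp", "disp")]

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors.
manual_vif <- sapply(names(predictors), function(name) {
  others <- setdiff(names(predictors), name)
  r2 <- summary(lm(reformulate(others, response = name),
                   data = predictors))$r.squared
  1 / (1 - r2)
})
manual_vif
```

For a model with only numeric predictors, such as lm(mpg ~ wt + hp + disp, data = mtcars), these values should agree with what vif() reports.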

Model Summary:

summary(model)

A p-value below 0.05 is conventionally treated as statistically significant. A high p-value means your data are likely under a true null hypothesis. A low p-value means your data are unlikely under a true null hypothesis.

For example, in the Pr(>|t|) column, a value of 0.0007 marks a strongly significant variable and normally carries three stars (***), a value of 0.004 carries two stars (**), and a value of 0.011 carries one star (*). A variable with a value above 0.05, such as 0.20, carries no star and is a weak candidate for the model.

The R-squared value indicates how much of the variance in the outcome the model explains. The closer it is to 1, the better the model fits the data.
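
The values discussed above can also be pulled out of the summary object directly. In this sketch the model formula is a hypothetical example on mtcars, and the star labels are rebuilt with cut() using the significance codes that R prints under the coefficient table.

```r
# Hypothetical model for illustration; the formula is an assumption.
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coefs <- summary(model)$coefficients
p_values <- coefs[, "Pr(>|t|)"]

# R's significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stars <- cut(p_values,
             breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "),
             include.lowest = TRUE)
data.frame(p_value = round(p_values, 4), stars = stars)

summary(model)$r.squared       # share of variance explained
summary(model)$adj.r.squared   # adjusted for the number of predictors
```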