Machine Learning: Regression Analysis
A few important things about regression models.
Regression is a predictive modelling technique that estimates the relationship between a dependent variable (target) and one or more independent variables (predictors).
Types of Regression:
Linear Regression: Useful when there is a linear relationship between independent and dependent variables.
Logistic Regression: Useful when the dependent variable is binary (0/1, True/False, Yes/No) in nature.
Polynomial Regression: Useful when the relationship is curved, so the model includes independent variables raised to a power greater than 1.
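As a rough sketch of how these three types look in R, the snippet below fits each one with base R functions. The built-in mtcars data set and the chosen columns (mpg, wt, am) are assumptions made purely for illustration.

```r
# Illustrative only: the data set and variables are assumptions, not from the text above.
data(mtcars)

# Linear regression: mpg modelled as a straight-line function of weight
linear_fit <- lm(mpg ~ wt, data = mtcars)

# Logistic regression: binary outcome am (0 = automatic, 1 = manual)
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Polynomial regression: adds a squared weight term (power greater than 1)
poly_fit <- lm(mpg ~ wt + I(wt^2), data = mtcars)

print(coef(linear_fit))
print(coef(logistic_fit))
print(coef(poly_fit))
```

Note that only the formula and the fitting function change between the three cases; the data frame stays the same.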
Linear Regression - Relationship:
The linear regression model assumes a linear relationship between the input variable and the outcome variable.
We can express this as: y = Bo + B1 x + e
where
y = outcome variable
x = input variable
e = random error
Bo = intercept
B1 = slope of the line
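To make the equation concrete, here is a minimal sketch that fits a one-variable model with lm() and checks that the outcome equals the constant term (the intercept) plus the coefficient on x (the slope) times x, plus the residual. The mtcars data set and the variables mpg and wt are assumptions chosen for illustration.

```r
# Illustrative only: mtcars, mpg and wt are assumed example data.
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)

b0 <- coef(fit)[["(Intercept)"]]  # constant term: the intercept
b1 <- coef(fit)[["wt"]]           # coefficient on x: the slope
e  <- residuals(fit)              # estimated random error for each row

# Rebuild y from the fitted equation y = b0 + b1 * x + e
reconstructed <- b0 + b1 * mtcars$wt + e
print(all.equal(unname(reconstructed), mtcars$mpg))
```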
You can check the relationship between variables with the splom() function in library(lattice) or the corrplot() function in library(corrplot):
library(corrplot)
M <- cor(mtcars)
corrplot(M, method = "circle")
# Display the correlation coefficient
corrplot(M, method = "number")
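# Matrix of p-values for the pairwise correlation tests (cor.mtest() ships with corrplot)
p.mat <- cor.mtest(mtcars)$p
# Mark correlations that are insignificant at the 0.01 level
corrplot(M, type = "upper", order = "hclust",
         p.mat = p.mat, sig.level = 0.01)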
library(lattice)
splom(~iris[1:4])
Variance Inflation Factor (VIF): It measures the increase in the variance (the square of the estimate's standard deviation) of an estimated regression coefficient due to multicollinearity.
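library(car)  # assumption: vif() is taken from the car package here
vif(model)    # model is a fitted regression, e.g. from lm()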
A VIF of 1 means a predictor is not correlated with the other predictors, a VIF between 1 and 5 means it is moderately correlated, and a VIF between 5 and 10 means it is highly correlated. A high VIF value indicates high multicollinearity.
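To show where these numbers come from, the sketch below computes the VIF by hand in base R using the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The mtcars data set and the three predictors are assumptions chosen for illustration.

```r
# Illustrative only: mtcars and the predictor set are assumed example data.
data(mtcars)
predictors <- c("wt", "hp", "disp")

manual_vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Auxiliary regression: this predictor explained by the remaining ones
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

print(round(manual_vif, 2))
```

A predictor that the others explain well has an auxiliary R² close to 1, which is what pushes its VIF up.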
Model Summary:
summary(model)
A p-value below 0.05 is conventionally treated as statistically significant. A high p-value means your data are consistent with a true null hypothesis; a low p-value means your data are unlikely under a true null hypothesis.
For example, in the Pr(>|t|) column a value of 0.0007 marks a strongly significant variable and is normally shown with three stars (***), 0.004 gets two stars (**), and 0.011 gets one star (*). A variable with a value well above 0.05, such as 0.20, gets no star and is not well suited to the model.
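The star codes can be reproduced with a small helper. This is a sketch only: stars_for() is a hypothetical function written for this example, using the cutpoints R prints in the "Signif. codes" legend of summary(), and the mtcars model is an assumed example.

```r
# stars_for() is a hypothetical helper, not part of base R.
stars_for <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

# Three small p-values plus one value well above 0.05
example_p <- c(0.0007, 0.004, 0.011, 0.20)
print(data.frame(p_value = example_p, stars = stars_for(example_p)))

# The same idea applied to a fitted model (mtcars is an assumed example)
data(mtcars)
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
p_values <- summary(model)$coefficients[, "Pr(>|t|)"]
print(data.frame(p_value = signif(p_values, 3), stars = stars_for(p_values)))
```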
The R-squared value indicates how much of the variance in the outcome the model explains; the closer it is to 1, the better the model fits the data.
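As a quick check of what R-squared measures, the sketch below computes it by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with the value reported by summary(). The mtcars model is an assumption chosen for illustration.

```r
# Illustrative only: mtcars and the chosen predictors are assumed example data.
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(model)^2)                 # variation left unexplained
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total variation in the outcome
r_squared <- 1 - rss / tss

print(c(manual       = r_squared,
        from_summary = summary(model)$r.squared,
        adjusted     = summary(model)$adj.r.squared))
```

The adjusted value is printed alongside because plain R-squared never decreases when more predictors are added.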