Logistic & Linear Regression: The Difference
Linear regression and logistic regression are two of the most commonly used classic machine learning algorithms, so it helps to know where they differ. Let us look at their differences.
Type of variable
In linear regression, the dependent/y variable is always continuous.
In binary or multinomial logistic regression, the dependent/y variable is categorical (discrete).
Purpose
Linear regression is used to predict the value of the dependent/y variable from changes in the independent/x variables. Example: how does a car's mileage change as its weight changes?
Logistic regression is used to estimate the probability that an event occurs. Example: will it rain today in Bengaluru (yes/no)?
Relationship
Linear regression assumes a linear relationship between the independent/x variables and the dependent/y variable.
Logistic regression does not require a linear relationship between the independent/x variables and the dependent/y variable. It does, however, assume that the independent variables are linearly related to the log-odds (logit) of the outcome.
Error
When it comes to residuals, linear regression assumes that the error term is normally distributed. This assumption matters mainly for hypothesis tests and confidence intervals on the coefficients.
Logistic regression does not require the error term to be normally distributed.
Distribution
Linear regression assumes that the dependent variable, for given values of the independent variables, is normally distributed (Gaussian distribution).
Logistic regression assumes that the dependent variable follows a binomial distribution (success or failure).
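Each distribution assumption defines the likelihood the model is fitted to. A minimal pure-Python sketch, assuming one trial per observation (Bernoulli) for the binomial case; the function names are hypothetical, chosen for illustration:

```python
import math

def gaussian_log_likelihood(y, fitted, sigma):
    """Log-likelihood of y under a normal distribution centred on the fitted values."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Depends on the data only through the sum of squared errors (sse).
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

def bernoulli_log_likelihood(y, probs):
    """Log-likelihood of 0/1 outcomes y given predicted success probabilities."""
    return sum(
        math.log(p) if yi == 1 else math.log(1 - p)
        for yi, p in zip(y, probs)
    )

print(gaussian_log_likelihood([1.0, 2.0], [1.5, 2.0], sigma=1.0))
print(bernoulli_log_likelihood([1, 0, 1], [0.9, 0.2, 0.6]))
```

For a fixed sigma, the Gaussian log-likelihood is a constant minus the sum of squared errors divided by 2σ², so maximizing it gives the same coefficients as least squares. The Bernoulli log-likelihood is the quantity logistic regression maximizes.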
Curve
Linear regression tries to find the best-fit straight line, popularly known as the regression line.
In logistic regression, the fitted curve is S-shaped (the sigmoid). Changing a coefficient changes the steepness and direction of the curve: a positive coefficient gives a rising, S-shaped curve, while a negative coefficient gives a falling, Z-shaped curve.
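A minimal sketch of that effect in pure Python, with arbitrary coefficient values chosen only for illustration: the same sigmoid evaluated with a positive, a negative and a larger positive coefficient.

```python
import math

def logistic_curve(x, b0, b1):
    """Predicted probability p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

xs = [-4, -2, 0, 2, 4]

# Positive coefficient: probability rises with x (S-shaped curve).
rising = [logistic_curve(x, 0.0, 1.5) for x in xs]
# Negative coefficient: probability falls with x (Z-shaped curve).
falling = [logistic_curve(x, 0.0, -1.5) for x in xs]
# Larger positive coefficient: same S shape, but steeper around the midpoint.
steeper = [logistic_curve(x, 0.0, 3.0) for x in xs]

for x, r, f, s in zip(xs, rising, falling, steeper):
    print(f"x={x:+d}  b1=+1.5: {r:.3f}  b1=-1.5: {f:.3f}  b1=+3.0: {s:.3f}")
```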
Sample size
As a rule of thumb, linear regression needs at least five cases for each independent variable.
Logistic regression, on the other hand, needs at least 10 events per independent variable, where events are counted in the less frequent outcome class.
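The two rules of thumb can be turned into a rough calculator. A minimal sketch in pure Python with hypothetical function names, assuming the usual convention that logistic-regression events are counted in the less frequent outcome class, so the total sample needed grows as that outcome becomes rarer:

```python
import math

def min_cases_linear(n_predictors, cases_per_predictor=5):
    """Rule-of-thumb minimum sample size for linear regression."""
    return n_predictors * cases_per_predictor

def min_cases_logistic(n_predictors, event_rate, events_per_predictor=10):
    """Rule-of-thumb minimum sample size for logistic regression.

    event_rate is the proportion of one outcome class; events are counted
    in whichever class is less frequent.
    """
    rarer = min(event_rate, 1 - event_rate)
    events_needed = n_predictors * events_per_predictor
    return math.ceil(events_needed / rarer)

print(min_cases_linear(4))           # 4 predictors
print(min_cases_logistic(4, 0.25))   # 4 predictors, 25% of cases are events
```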
Algorithm
Linear regression uses the least squares estimation method: it chooses the coefficients that minimize the sum of the squared distances between each observed response and its fitted value.
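For a single predictor, the least-squares coefficients have a closed form: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A minimal pure-Python sketch; the car weight and mileage figures are made up purely for illustration:

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x using the closed-form solution."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # b1 = cov(x, y) / var(x); the 1/n factors cancel, so plain sums suffice.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

def sum_squared_residuals(x, y, b0, b1):
    """The quantity least squares minimizes."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: car weight (tonnes) vs mileage (km per litre).
weight = [1.0, 1.2, 1.5, 1.8, 2.0]
mileage = [18.0, 16.5, 15.0, 12.5, 11.0]

b0, b1 = fit_simple_ols(weight, mileage)
print(f"mileage = {b0:.2f} + ({b1:.2f}) * weight")
```

Moving either coefficient away from the fitted values increases the sum of squared residuals, which is what "best fit" means here.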
Logistic regression uses maximum likelihood estimation, which means choosing the coefficients that maximize the probability of the observed Y given X (the likelihood). There is no closed-form solution, so it is an iterative process: the algorithm tries successively better solutions until it converges on the maximum likelihood estimate.
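One common iterative method is Newton-Raphson (statistical packages often implement it as iteratively reweighted least squares, while other libraries use different optimizers). A minimal sketch for a single predictor in pure Python, on hypothetical 0/1 data; it has no step-size safeguards and assumes the data are not perfectly separable, since otherwise the maximum likelihood estimate does not exist:

```python
import math

def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Maximum likelihood fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))
    by Newton-Raphson. Returns (b0, b1, iterations used)."""
    b0, b1 = 0.0, 0.0
    for it in range(1, max_iter + 1):
        # Gradient of the log-likelihood (g) and the weighted information matrix (h).
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = g.
        # The Hessian is the negative of this matrix, so adding the step climbs.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            return b0, b1, it
    return b0, b1, max_iter

# Hypothetical 0/1 data (not perfectly separable, so the MLE exists).
x_obs = [1, 2, 3, 4, 5, 6]
y_obs = [0, 0, 1, 0, 1, 1]
b0, b1, iters = fit_logistic_newton(x_obs, y_obs)
print(f"b0={b0:.3f}, b1={b1:.3f} after {iters} iterations")
```

Each iteration passes over the whole dataset, and the loop repeats until the update becomes negligible.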
Interpretation
A linear regression coefficient is interpreted as how much the dependent variable increases or decreases with a one-unit increase in that independent variable, keeping all other independent variables constant.
For logistic regression, we interpret odds ratios: keeping the other variables in the model constant, a one-unit increase in X multiplies the predicted odds of the event by exp(β), where β is the coefficient of X.
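A quick numerical check of that interpretation, with hypothetical coefficients chosen only for illustration (pure Python): the ratio of the predicted odds at x + 1 and at x is the same at every x and equals exp(β), even though the change in the probability itself is not constant.

```python
import math

def predicted_prob(x, b0, b1):
    """P(event) from a fitted logistic model with one predictor."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical fitted coefficients (illustration only).
b0, b1 = -3.0, 0.7

xs = [1.0, 2.0, 5.0]
ratios = [odds(predicted_prob(x + 1, b0, b1)) / odds(predicted_prob(x, b0, b1)) for x in xs]
prob_changes = [predicted_prob(x + 1, b0, b1) - predicted_prob(x, b0, b1) for x in xs]

for x, r, dp in zip(xs, ratios, prob_changes):
    print(f"x={x}: odds ratio={r:.4f}, change in probability={dp:+.4f}")
print(f"exp(b1)={math.exp(b1):.4f}")
```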
Function Used
In generalized linear model (GLM) terms, linear regression uses the identity link function of the Gaussian family.
Logistic regression uses the logit link function of the binomial family.
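The link function connects the mean of y to the linear combination of the predictors. A minimal pure-Python sketch of the two links (the function names are hypothetical) shows why the predictions differ in range: inverting the identity link leaves the linear predictor unbounded, while inverting the logit squeezes it into (0, 1).

```python
import math

# A link maps the mean of y (mu) to the linear predictor eta = b0 + b1*x + ...

def identity_link(mu):          # Gaussian family (linear regression)
    return mu

def identity_inverse(eta):
    return eta

def logit_link(mu):             # binomial family (logistic regression), 0 < mu < 1
    return math.log(mu / (1.0 - mu))

def logit_inverse(eta):         # the sigmoid: maps any real eta into (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))

for eta in [-10.0, 0.0, 10.0]:
    print(eta, identity_inverse(eta), logit_inverse(eta))
```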
Computational Time
Least squares linear regression has a closed-form solution, so it can be computed directly and takes less time.
Logistic regression finds the maximum likelihood estimate iteratively, so it usually takes more time.
That is it.