Logistic & Linear regression: The Difference

Linear regression and logistic regression are among the most commonly used classic machine learning algorithms, so it helps to know where they differ. Let us look at their differences.

Type of variable

In linear regression, the dependent/y variable is always continuous.

In binary or multinomial logistic regression, the dependent/y variable is discrete (categorical).

Purpose

Linear regression is used to estimate the dependent/y variable as the independent/x variables change. Example: how does mileage respond to a change in car weight?

Logistic regression is used to calculate the probability of an event occurring. Example: will it rain today in Bengaluru (Yes/No)?

Relationship

Linear regression assumes that there is a linear relationship between the independent/x variables and dependent/y variable.

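Logistic regression does not require a linear relationship between the independent/x variables and the dependent/y variable itself. It does, however, assume that the independent variables are linearly related to the log-odds of the outcome.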

Error

When it comes to residuals, linear regression requires the error term to be normally distributed.

Logistic regression does not require the error term to be normally distributed.

Distribution

Linear regression assumes that the dependent variable, for given values of the independent variables, is normally distributed (Gaussian distribution).

Logistic regression assumes a binomial distribution (success or failure) of the dependent variable.

Curve

Linear regression tries to find the best-fit straight line, popularly known as the regression line.

In logistic regression, the curve is S-shaped. Changing the coefficient changes the steepness and direction of the curve: a positive coefficient gives a rising, S-shaped curve, while a negative coefficient gives a falling, Z-shaped (mirrored S) curve.
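The S-shaped and Z-shaped curves can be illustrated with a minimal sketch. The function names and the coefficient values (1.5 and -1.5) are assumptions chosen for illustration, not values from any fitted model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_probability(x, intercept, coefficient):
    """P(y = 1 | x) for a logistic model with a single predictor."""
    return sigmoid(intercept + coefficient * x)

xs = [-4, -2, 0, 2, 4]

# Hypothetical positive coefficient: probabilities rise with x (S-shape).
rising = [predicted_probability(x, 0.0, 1.5) for x in xs]

# Hypothetical negative coefficient: probabilities fall with x (Z-shape).
falling = [predicted_probability(x, 0.0, -1.5) for x in xs]

for x, up, down in zip(xs, rising, falling):
    print(f"x={x:+d}  positive coef: {up:.3f}  negative coef: {down:.3f}")
```

A coefficient with a larger absolute value makes the transition between 0 and 1 steeper, while its sign decides whether the curve rises or falls.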

Sample size

As a rule of thumb, linear regression should have at least five cases per independent variable.

Logistic regression, on the other hand, needs at least 10 events per independent variable.
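The events-per-variable check for logistic regression is simple arithmetic, sketched below. Counting "events" as the rarer of the two outcome classes is an assumption (a common convention, not something stated above), and the data set is hypothetical.

```python
def events_per_variable(outcomes, n_predictors):
    """Count of the rarer outcome class divided by the number of predictors.

    Assumption: an "event" is taken to be the less frequent of the two classes.
    """
    events = sum(1 for y in outcomes if y == 1)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_predictors

# Hypothetical data set: 200 observations, 30 of them events, 4 predictors.
outcomes = [1] * 30 + [0] * 170
epv = events_per_variable(outcomes, n_predictors=4)

# Compare against the rule of thumb of 10 events per independent variable.
meets_rule = epv >= 10
print(epv, meets_rule)
```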

Algorithm

Linear regression uses the least-squares estimation method, which chooses the coefficients that minimize the sum of squared distances between each observed response and its fitted value.

Logistic regression uses maximum likelihood estimation: the coefficients are chosen so that they maximize the probability of the observed Y given X (the likelihood). It is an iterative process that tries successive candidate solutions before arriving at the maximum likelihood estimate.
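The contrast between the two estimation methods can be sketched for a single predictor. This is a minimal illustration, not how any particular library implements it: the logistic fit uses plain gradient ascent on the log-likelihood (statistical packages typically use faster iterative schemes), and the data, learning rate, and iteration count are assumptions.

```python
import math

def fit_least_squares(xs, ys):
    """Closed-form least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(xs, ys, intercept, slope):
    """Log of the probability of the observed 0/1 outcomes given the coefficients."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + slope * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Iterative maximum likelihood via gradient ascent: returns (intercept, slope)."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_intercept = grad_slope = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(intercept + slope * x)
            grad_intercept += error
            grad_slope += error * x
        # Step uphill on the average log-likelihood.
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope

# Hypothetical data for illustration only.
linear_x = [1, 2, 3, 4, 5]
linear_y = [3.1, 4.9, 7.2, 8.8, 11.0]
print(fit_least_squares(linear_x, linear_y))

logit_x = [1, 2, 3, 4, 5, 6, 7, 8]
logit_y = [0, 0, 1, 0, 1, 0, 1, 1]  # classes overlap, so a finite estimate exists
intercept_hat, slope_hat = fit_logistic(logit_x, logit_y)
print(intercept_hat, slope_hat)
```

The least-squares fit is computed in a single pass, while the logistic fit repeats the same update thousands of times, each step raising the log-likelihood a little.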

Interpretation

A linear regression coefficient is interpreted as how much the dependent variable increases or decreases with a one-unit increase in a particular independent variable, keeping all other independent variables constant.

For logistic regression, we interpret odds ratios: with the other variables in the model held constant, what is the effect of a one-unit change in X on the predicted odds? The exponential of the coefficient gives this odds ratio.
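A short sketch makes the two interpretations concrete. All coefficient values here are hypothetical, picked only to show the arithmetic.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Linear regression (hypothetical coefficients): a one-unit increase in x
# changes the prediction by exactly the slope.
linear_intercept, linear_slope = 40.0, -5.0
linear_change = (linear_intercept + linear_slope * 4) - (linear_intercept + linear_slope * 3)

# Logistic regression (hypothetical coefficients): a one-unit increase in x
# multiplies the odds by exp(coefficient).
intercept, coefficient = -2.0, 0.7
p_at_3 = sigmoid(intercept + coefficient * 3)
p_at_4 = sigmoid(intercept + coefficient * 4)
odds_ratio = odds(p_at_4) / odds(p_at_3)

print(linear_change, linear_slope)
print(odds_ratio, math.exp(coefficient))
```

The odds ratio is the same whichever pair of adjacent x values is compared, even though the change in probability itself depends on where along the curve you start.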

Function Used

Linear regression uses the identity link function of the Gaussian family.

Logistic regression uses the logit link function of the binomial family.
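Written out in the usual generalized linear model notation, the two link functions relate the expected response to the linear predictor as follows. The symbols are assumed notation, not taken from the text above: $\mu = E[Y \mid x]$ is the mean response, $p = P(Y = 1 \mid x)$ is the event probability, and $\beta_0, \dots, \beta_k$ are the coefficients.

```latex
\begin{align*}
\text{Linear (identity link):}\quad
  & g(\mu) = \mu = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
\text{Logistic (logit link):}\quad
  & g(p) = \ln\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\end{align*}
```

In both cases the right-hand side is the same linear combination of the predictors; only the transformation applied to the expected response differs.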

Computational Time

Because ordinary least squares has a closed-form solution, linear regression takes less time.

Logistic regression finds the maximum likelihood estimate iteratively, and hence takes more time.

That is it.

