When to apply Linear Regression?
Two conditions must be satisfied before applying Linear Regression to a given data set.
1) The independent and dependent variables must be quantitative, i.e. numerical. If they are not, they must first be converted into numerical form.
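One common way to convert a categorical variable into numbers is one-hot encoding, where each category becomes its own 0/1 column. The sketch below assumes a hypothetical "city" column and a hypothetical helper `one_hot_encode`; it illustrates the idea and is not the only possible encoding.

```python
# Hypothetical categorical feature, e.g. the city a house is located in.
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"]

def one_hot_encode(values):
    """Map each category to its own 0/1 indicator column (hypothetical helper)."""
    categories = sorted(set(values))  # sorted so the column order is reproducible
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

columns, encoded = one_hot_encode(cities)
print(columns)  # ['Delhi', 'Mumbai', 'Pune']
for row in encoded:
    print(row)  # exactly one 1 per row, marking that row's city
```

When the regression model has an intercept, one of these columns is usually dropped to avoid perfectly collinear features; in pandas, `pandas.get_dummies(..., drop_first=True)` does both steps.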
2) The variables must be correlated with each other, i.e. there must be a relationship between them. To check whether two or more variables are correlated, one can use a correlation coefficient.
Process: The correlation coefficient is a statistical measure of the strength of the relationship between the relative movements of the independent and dependent variables.
Before going into detail, it is important to understand the basic statistical building block of the correlation coefficient, i.e. covariance.
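Covariance is a measure of how much two random variables vary together. For a sample of n paired observations (xᵢ, yᵢ) with means x̄ and ȳ, it is commonly calculated as cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1); the population version divides by n instead.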
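The sign of the covariance tells whether two variables are positively or negatively related, while a value of zero indicates that there is no linear relationship between them (zero covariance alone does not prove that the variables are independent). Covariance is, however, not easily interpreted: its magnitude depends on the units and scale of the variables, and with more than two variables it has to be presented as a matrix. So, instead of covariance, the correlation coefficient is calculated.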
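The sign behaviour of covariance, and the scale dependence that makes it hard to read, can be seen in a small sketch. It assumes the sample (n − 1) form of covariance; the data values and the function name `sample_covariance` are hypothetical, chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - mean_x) * (y - mean_y), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]         # hypothetical study hours
score = [52, 55, 61, 64, 70]    # hypothetical exam scores (rise with hours)
errors = [9, 7, 6, 4, 2]        # hypothetical mistakes made (fall with hours)

print(sample_covariance(hours, score))   # positive: the variables move together
print(sample_covariance(hours, errors))  # negative: they move in opposite directions

# Expressing hours in minutes multiplies the covariance by 60,
# even though the underlying relationship is exactly the same.
minutes = [h * 60 for h in hours]
print(sample_covariance(minutes, score))
```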
Two commonly used correlation coefficients are described below.
Pearson’s Correlation Coefficient: It is one of the most commonly used correlation coefficients and measures the linear relationship between two variables, typically one independent and one dependent variable. It is the covariance of the two variables divided by the product of their standard deviations, and its values vary between -1 and 1.
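The definition above translates directly into code. This is a minimal sketch with a hypothetical `pearson_r` helper and hypothetical data; it also shows that, unlike covariance, Pearson's coefficient does not change when one variable is rescaled (hours versus minutes).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (std_x * std_y)

hours = [1, 2, 3, 4, 5]            # hypothetical study hours
score = [52, 55, 61, 64, 70]       # hypothetical exam scores
minutes = [h * 60 for h in hours]  # same data in a different unit

r_hours = pearson_r(hours, score)
r_minutes = pearson_r(minutes, score)
print(round(r_hours, 4))    # 0.9934
print(round(r_minutes, 4))  # same value: r is unit-free
```

In practice, `numpy.corrcoef` or `scipy.stats.pearsonr` compute the same quantity.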
If the correlation coefficient is +1, the variables are said to be perfectly positively correlated. Variables are positively correlated when an increase in one variable is accompanied by an increase in the other, which is reflected by a positive correlation coefficient.
If the correlation coefficient is -1, the variables are said to be perfectly negatively correlated. Variables are negatively correlated when an increase in one variable is accompanied by a decrease in the other, which is reflected by a negative correlation coefficient.
If the correlation coefficient is 0, the variables are not linearly correlated, i.e. Linear Regression cannot usefully be applied.
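The three cases (+1, -1 and 0) can be reproduced with small synthetic data sets. This sketch uses a hypothetical `pearson_r` helper; the (n − 1) factors cancel, so plain sums of deviations are enough. The last case also shows that r = 0 only rules out a linear relationship: y = x² is fully determined by x, yet its Pearson coefficient is 0 on these symmetric x values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations (the n - 1 factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # y rises in a straight line with x
r_neg = pearson_r(x, [10 - 3 * v for v in x])  # y falls in a straight line with x
r_zero = pearson_r(x, [v ** 2 for v in x])     # y depends on x, but not linearly

print(r_pos)   # 1.0
print(r_neg)   # -1.0
print(r_zero)  # 0.0
```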
Spearman’s Correlation Coefficient: It is a statistical measure of the strength of a monotonic relationship between two variables and is the non-parametric version of Pearson’s Correlation Coefficient. It determines the strength and direction of a monotonic relationship between two variables, rather than the strength and direction of a linear relationship, which is what Pearson’s Correlation Coefficient determines.
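For data without tied values, Spearman's coefficient has a well-known closed form, where n is the number of observations and dᵢ is the difference between the rank of xᵢ and the rank of yᵢ. When ties are present, the coefficient is instead computed as Pearson's coefficient applied to the ranks.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
```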
A monotonic relationship between two variables x and y can be classified as follows:
- Monotonically increasing - as the x variable increases the y variable never decreases.
- Monotonically decreasing - as the x variable increases the y variable never increases.
- Not monotonic - as the x variable increases the y variable sometimes decreases and sometimes increases.
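The three categories above can be checked mechanically by sorting the pairs by x and looking at how y changes from one point to the next. This is a minimal sketch with a hypothetical `monotonic_type` helper; it assumes the x values are distinct, and a constant y is reported as increasing because it never decreases.

```python
def monotonic_type(xs, ys):
    """Classify the x-y relationship; assumes distinct x values (hypothetical helper)."""
    ys_by_x = [y for _, y in sorted(zip(xs, ys))]  # y values in order of increasing x
    steps = list(zip(ys_by_x, ys_by_x[1:]))        # consecutive (previous, next) pairs
    if all(nxt >= prev for prev, nxt in steps):
        return "monotonically increasing"  # y never decreases as x increases
    if all(nxt <= prev for prev, nxt in steps):
        return "monotonically decreasing"  # y never increases as x increases
    return "not monotonic"                 # y sometimes rises and sometimes falls

print(monotonic_type([1, 2, 3, 4], [1, 4, 9, 16]))  # monotonically increasing
print(monotonic_type([1, 2, 3, 4], [10, 8, 8, 1]))  # monotonically decreasing
print(monotonic_type([1, 2, 3, 4], [1, 3, 2, 4]))   # not monotonic
```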
Spearman's Correlation Coefficient is a good choice when one is unsure of the distribution of the variables and of the form of the relationship between them.
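Spearman's coefficient can be computed as Pearson's coefficient applied to the ranks of the data, with tied values sharing the average of their positions. The sketch below uses hypothetical helpers (`average_ranks`, `pearson_r`, `spearman_rho`) and hypothetical data: for y = x³, which is monotonic but not linear, Spearman's coefficient is exactly 1 while Pearson's is noticeably lower.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations (the n - 1 factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear

print(pearson_r(x, y))     # below 1: the relationship is not a straight line
print(spearman_rho(x, y))  # 1.0: the relationship is perfectly monotonic
```

Library versions exist as well, for example `scipy.stats.spearmanr` or `pandas.Series.corr(other, method="spearman")`.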
I suggest going through the given YouTube link to get a better idea of Spearman's Correlation Coefficient.