Super Simple Machine Learning — Simple Linear Regression Part 2 [Math and Python]
Originally posted on 16th Jan '18 here. Part 1 can be found here.
This is part of a series of articles I am writing, covering ML algorithms explained in a simple and light-hearted way for easy understanding. I may gloss over the more technical aspects and terms, as the goal here is to help myself and others understand concepts rather than just follow steps and throw out terms blindly.
However, if my explanations are fundamentally incorrect, let me know.
Now that you’ve sort of got the basic concept for Simple Linear Regression down from Part 1, let’s get to the nitty gritty.
In this post, I will go into some Python coding and the math behind it, plus touch on certain characteristics of a dataset.
LET’S BEGIN!
Fantastic Parameters/Statistics and Where to Find them
I want to get some terms out of the way first:
- Parameters
- Statistics
Big thanks to stats god Michael for correcting my misconceptions in the first version.
- Parameters: Characteristics of a POPULATION (e.g. all possible outcomes). They are most likely impossible to derive.
- Statistics: Characteristics of a SAMPLE (e.g. outcomes we can record). Statistics allow you to estimate parameters: "Inferential statistics enables you to make an educated guess about a population parameter."
Examples of characteristics you should be familiar with by now:
- Mean : average
- Median: middle value
- Variance: average of the squared difference between each x and the Mean of x. It describes how far spread out the data is. If variance is high, your 'low numbers' are low and your 'high numbers' are high; imagine an elastic band being stretched further and further apart, with the variance increasing accordingly. The lower the variance, the more 'stable' the data is as it converges to the Mean.
- Standard Deviation: the square root of Variance. It also describes how wide the spread of x is, same as Variance, BUT it is a matter of units. If you are looking at a dataset of heights (in cm), Variance will give you cm², but Standard Deviation, being its square root, gives you an answer in cm, which is sometimes better to calculate with and sits better with your OCD.
Get your head around Variance and Standard Deviation first, because you'll encounter them A LOT in statistical modelling. This explanation is pretty good… plus there's doggies!
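If it helps to see these statistics in code, here's a minimal sketch in Python. The heights are made-up numbers purely for illustration, and I'm computing the 'population' versions (dividing by n):

```python
import numpy as np

# Made-up sample of heights in cm, purely for illustration
heights = np.array([150.0, 160.0, 165.0, 170.0, 185.0])

mean = heights.mean()                      # the average
variance = ((heights - mean) ** 2).mean()  # average squared deviation from the mean (units: cm^2)
std_dev = variance ** 0.5                  # the square root brings the units back down to cm

print(mean, variance, std_dev)             # same as np.var(heights) and np.std(heights)
```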
Alright, So …Regression?
Remember in Part 1 how I spoke about trying to plot different lines to find the one with the least squared error, and how R and Python packages can just solve it for you?
Well, let’s look into what these packages are doing.
KEEP THIS IN MIND:
y = ax + b
The equations behind the Ordinary Least Squares method (finding the best-fit line) look like this:

y = ax + b

a = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

b = ȳ − ax̄

The first equation is basically the line equation.
The 2nd and 3rd equations are what you need to find a and b.
a and b are referred to as the "betas" of the linear regression, and are considered learned parameters, as they are eventually "learnt" by running the linear regression algorithm.
You can find the above equation here and the math behind it can be found here.
What’s going on in those formulae/formulas?
We are trying to minimise the Sum of Squared Errors (the squared differences between your actual and predicted values. Refer to Part 1 if unsure; are you even paying attention?!).
To do this, the partial derivatives of the SSE with respect to a and b have to be 0, because the slope is 0 at the bottom of the curve, where the error is at its least. Yada yada, you get those two equations.
Derivatives are not for me to explain, but you can do a quick revision here.
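For the curious, here is roughly how setting those partial derivatives to zero produces the two equations (this is the standard OLS derivation, so nothing here is specific to any particular dataset):

```latex
\text{SSE}(a, b) = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2

\frac{\partial\, \text{SSE}}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0
\quad \Rightarrow \quad b = \bar{y} - a \bar{x}

\frac{\partial\, \text{SSE}}{\partial a} = -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0
\quad \Rightarrow \quad a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
```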
X and Y have Cooler Hats than You
You’ll notice that in the 2nd and 3rd equation, the x and y have funny things on their heads.
x̄ = x bar
ȳ = y bar
Other than being difficult to type out properly, the bars basically mean mean.
x̄ refers to the mean (average) of x.
(x − x̄) is the difference between that value of x and the average value of all the different x values. This is referred to as the deviation score, meaning how far it deviates from the mean.
(xᵢ − x̄) looks really familiar, doesn't it? That's because it's used in calculating Variance and Standard Deviation as well.
See how useful the mean values are! This explains why parameters/statistics are so important.
Another symbol to take note of is the hat:
ŷ = y-hat
This refers to the predicted value of y from a prediction equation.
In other words, to be more correct,
y = ax + b
should be
ŷ = ax + b
And the error is basically
REAL Y - PREDICTED Y
which can be written as
y - ŷ
This is also referred to as the residual. (Remember that step in Part 1 about checking that your residuals are random and should not show a pattern?)
Anyway, because life is hard and complicated, the sum of squared errors in prediction is written as SSE,
but can also be called:
- residual sum of squares (RSS)
- sum of squared residuals (SSR)
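To make this concrete, here's a tiny sketch with made-up actual and predicted values:

```python
import numpy as np

# Made-up actual and predicted y values, purely for illustration
y_actual = np.array([5.0, 7.0, 9.0, 11.0])
y_predicted = np.array([5.5, 6.5, 9.5, 10.5])

residuals = y_actual - y_predicted  # real y minus predicted y (y - y-hat)
sse = np.sum(residuals ** 2)        # the sum of squared errors, a.k.a. RSS / SSR

print(residuals, sse)               # residuals: [-0.5  0.5 -0.5  0.5], SSE: 1.0
```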
OKAY, THE COOL CODING BEGINS HERE
Now that we’ve got all that math out of the way, let’s start coding.
We are going to start off using the Linear Regression module from the sklearn library in Python. This is similar to the one-line code I gave in Part 1 for R, in which I use something already pre-coded to find my regression line. Code can be found here. Does anyone know how to embed code into LinkedIn posts?!
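Since the original code lives behind that link, here's a minimal sketch of what it looks like with sklearn. The x and y values below are placeholders I made up, so the numbers you get will differ from the equation further down:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data; the post's actual dataset is behind the link above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)  # sklearn wants a 2D array of features
y = np.array([6, 7, 8, 9, 11, 12, 14, 15])

model = LinearRegression()
model.fit(x, y)                     # this is where the least-squares magic happens

print("slope (a):", model.coef_[0])
print("intercept (b):", model.intercept_)
```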
Yay and you’re done with the modelling!
Look at that gorgeous line. Whether it is a good line or not has not been decided yet (wait for Part 3.. just wait for it), but for now it has been decided that this line has the least SSE (or RSS, or SSR).
However, since I spent a substantial amount of time going through the equations behind LinearRegression(), I want to show that it really is the math behind the Python module we just used.
The equation I will use in the following Python code is the slope formula from above, a = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², together with b = ȳ − ax̄.
Also note that the power sign in Python is NOT "^", it's "**".
Full code can be found here
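Since the full code is also behind that link, here's a sketch of the manual version, using the same made-up placeholder data as the sklearn sketch above (note the ** for squaring):

```python
import numpy as np

# Same placeholder data as the sklearn sketch above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([6, 7, 8, 9, 11, 12, 14, 15], dtype=float)

x_bar = x.mean()  # mean of x
y_bar = y.mean()  # mean of y

# slope: a = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
a = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# intercept: b = y_bar - a * x_bar
b = y_bar - a * x_bar

print("slope (a):", a)
print("intercept (b):", b)  # matches model.coef_ and model.intercept_ from sklearn
```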
Both give the exact same results, as seen below:
The regression equation is y = 1.37x + 4.27
AAAND WE’RE DONE.
Hopefully you have a better idea about how Simple Linear Regression works now :) I certainly do.
This is only the first step to Linear Regression, but feel free to try it yourself. I used this post as a guide and it’s proven to be very comprehensive, especially for the math part.
You can set up and code in Jupyter Notebook or just use a Python IDE.
In the next episode, I will be touching on evaluating the accuracy of the model and how to derive predictions from it.
STAY TUNED!
EXTRA EXTRA!! Context is important!!
In Part 1, the example I gave about ‘Tears Shed’ vs ‘Exam Score’ was, in hindsight, a bad one.
As much as statistics can prove a correlation, ALWAYS REMEMBER THAT
**correlation is not causation**
Perhaps the number of tears shed really did affect the score, but it could also have all just been a coincidence, with the two not directly related at all.
The correlation between number of tears shed and the exam score could have been a ….
*drum roll*
SPURIOUS CORRELATION! Check out the super fun spurious correlations collected by Tyler Vigen here.
This is where business knowledge and common sense come in. What’s relevant and what isn’t is not solely defined by a program.
Feature selection is very much a job for both humans and computers.
We’ve reached the end of PART 2! Thank you for sticking around so far. Keep those gorgeous eyes out for Part 3 which I will post tomorrow, and remember to let me know if you spot any errors.