Machine Learning Basics – Linear Regression Model #AIMLSeries-2
As a follow-up to my previous post, let’s build a basic regression model that can help us make some predictions.
Let’s look at a scenario where a used car dealer wants to estimate a car’s selling price (y) based on its odometer reading (x). The goal is to set the actual selling price a bit higher than the predicted one, so the seller can make a profit.
Below is a sample dataset derived from past car sales records.
Just by common sense, we might guess that the more a car has been driven (higher odometer reading), the lower its selling price will be. But how do we actually prove that using Linear Regression? For that, we can look at things like covariance and correlation.
Before we go further, let’s take a quick look at some basic statistics from the data using Microsoft Excel’s Analysis ToolPak. For now, let’s just focus on the mean and standard deviation of each variable.
Covariance helps us understand how two variables—X and Y—move together. If the value is positive, it means they increase or decrease together (directly related). If it's negative, it means when one goes up, the other goes down (inversely related).
So, how do we calculate covariance? Let’s break it down...
Covariance = Σ (x - x̄)(y - ȳ) / (n - 1)
Here x̄ and ȳ are the means of x and y, Σ adds up the products over all records, and n is the record count.
Covariance = -562 / (10 - 1) = -62.4
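To make the covariance steps concrete, here is a minimal Python sketch. The dataset table from this post is not reproduced here, so the `toy_odometer` and `toy_price` lists are made-up numbers used only for illustration, and the function name `sample_covariance` is my own choice. The last lines simply redo the arithmetic shown above.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar) * (y - y_bar), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical toy data, made up for illustration (NOT the post's dataset)
toy_odometer = [1, 2, 3, 4, 5]
toy_price = [10, 8, 6, 4, 2]
toy_cov = sample_covariance(toy_odometer, toy_price)
print("toy covariance:", toy_cov)  # negative: price falls as odometer rises

# The post's own arithmetic: sum of products -562, with n = 10 records
article_cov = -562 / (10 - 1)
print("covariance from the post:", round(article_cov, 1))
```

A negative result for the toy data mirrors what we see in the post: the two variables move in opposite directions.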
A negative sign means that as the odometer reading goes up, the selling price goes down. But how can we tell how strong this relationship is between the two? That’s where correlation comes in.
Correlation measures how strong the linear relationship is between two variables (like price and odometer reading). Its value always falls between -1 and +1, unlike covariance, whose size depends on the units of the data and can be any number.
So, how do we calculate correlation? Let’s take a look...
Correlation = Covariance / (Standard Deviation in x * Standard Deviation in y)
Standard Deviation in Odometer Reading (σ OD) = 15
Standard Deviation in Price (σ Price) = 5
σ OD * σ Price = 15 * 5 = 75
Correlation = -62.4 / 75 = -0.83
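The same calculation can be sketched in Python. The first function only divides the covariance by the product of the two standard deviations, using the figures quoted in this post. The second computes the correlation directly from raw data. Its toy lists are made up for illustration and are not the post's dataset, and both function names are my own.

```python
import math


def correlation_from_summary(covariance, sd_x, sd_y):
    """Correlation = covariance / (standard deviation of x * standard deviation of y)."""
    return covariance / (sd_x * sd_y)


def correlation(xs, ys):
    """Pearson correlation from raw data.

    The (n - 1) terms in the covariance and in the two standard
    deviations cancel, so only the raw sums are needed.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Figures quoted in the post: covariance -62.4, standard deviations 15 and 5
article_r = correlation_from_summary(-62.4, 15, 5)
print("correlation from the post:", round(article_r, 2))

# Hypothetical toy data (NOT the post's dataset): a perfectly straight, falling line
toy_r = correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print("toy correlation:", toy_r)
```

The toy points sit exactly on a falling straight line, so their correlation lands at the -1 end of the range. Real data, like the car prices here, usually falls somewhere in between.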
A correlation of -0.83 tells us there's a strong negative linear relationship between odometer reading and selling price. That means we’re good to go ahead and build the regression model.
If the correlation were close to zero, it wouldn’t be ideal to build a linear model, because the predictions wouldn’t be very reliable.
Now, let’s plot selling price on the Y-axis and odometer reading on the X-axis. You’ll notice that the points don’t line up perfectly in a straight line.
The goal is to draw a line that’s as close as possible to all the points on the chart.
When we look at each point (like the price), we can break down its height into two parts. One part runs from the bottom of the chart up to the line (this is the fitted value). The other part runs from the line to the point itself (this is the residual, or the error), and it is negative when the point sits below the line.
In simple terms: What we see (actual value) = What the model predicts (fitted value) + Error (residual)
We want to find the line where the total error is as small as possible. Because errors can be negative or positive, we square each one before adding them up, so they don’t cancel each other out. The line we are after is the one with the smallest sum of squared errors.
This special line is called the Least Squares Line—and that’s what regression gives us.
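Here is a small Python sketch of why squaring matters. The data is made up for illustration (it is not the post's dataset), and the helper names are my own. It compares two candidate lines: the least-squares line for this toy data, which I worked out by hand, and another line that also looks reasonable.

```python
# Hypothetical toy data, made up for illustration (NOT the post's dataset)
toy_odometer = [1, 2, 3, 4, 5]
toy_price = [10, 7, 7, 3, 3]


def residuals(xs, ys, intercept, slope):
    """Residual = actual value - fitted value, for every point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    """Square each value, then add them up."""
    return sum(v ** 2 for v in values)


# Two candidate lines, written as (intercept c, slope m) in Y = c + mX.
# (11.4, -1.8) is the least-squares line for this toy data (worked out by hand);
# (12, -2) is just another plausible-looking line.
res_least_squares = residuals(toy_odometer, toy_price, 11.4, -1.8)
res_other = residuals(toy_odometer, toy_price, 12, -2)

# Plain sums: positive and negative errors cancel out for both lines
print("plain sums:", round(sum(res_least_squares), 6), sum(res_other))

# Squared sums: now the two lines can be told apart
sse_least_squares = sum_of_squares(res_least_squares)
sse_other = sum_of_squares(res_other)
print("squared sums:", round(sse_least_squares, 6), sse_other)
```

For this toy data both plain sums come out at (essentially) zero, so they cannot tell the two lines apart. The squared sums are about 3.6 and 4, and the least-squares line is the one with the smaller value.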
To draw this line, we need just two things: the slope and the intercept—like the formula Y = c + mX, where m is the slope and c is where the line hits the Y-axis.
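For a single input variable, the slope and intercept of the least-squares line have a simple closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is ȳ - m * x̄. The sketch below applies that textbook formula to made-up data (not the post's dataset); the function name `fit_least_squares` is my own.

```python
def fit_least_squares(xs, ys):
    """Return (intercept c, slope m) of the least-squares line Y = c + mX."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx  # same as covariance / variance of x; the (n - 1) terms cancel
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data, made up for illustration (NOT the post's dataset)
toy_odometer = [1, 2, 3, 4, 5]
toy_price = [10, 7, 7, 3, 3]
c, m = fit_least_squares(toy_odometer, toy_price)
print("intercept:", round(c, 2), "slope:", round(m, 2))
```

As a rough cross-check with the rounded summary figures quoted earlier in this post, -62.4 / 15² is about -0.28, which is in the same ballpark as the slope Excel reports. A small gap is expected because the summary figures are rounded.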
In Excel, you can easily get these values using the Regression option in the Analysis ToolPak. You’ll see a lot of output, but for now, just focus on the Coefficients section.
So, the final equation for our best-fit line is:
Selling Price = 56.2 – 0.27 × Odometer Reading
This means that if we know the odometer reading, we can estimate the car’s selling price.
For example, the model says that if a car has 100,000 km on the odometer, it would likely sell for around 2.9 lakhs.
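The prediction step can be sketched in Python as follows. Two things here are my assumptions, not stated in the post: the odometer reading is entered in thousands of km (so 100,000 km is x = 100), and the price comes out in units of ten thousand rupees (so 29.2 corresponds to about 2.9 lakhs). The 5% markup is also a hypothetical figure, chosen only to illustrate pricing "a bit higher" than the prediction.

```python
# Coefficients from the post's best-fit line
INTERCEPT = 56.2
SLOPE = -0.27


def predict_price(odometer_thousand_km):
    """Selling Price = 56.2 - 0.27 * Odometer Reading.

    Assumption: the odometer reading is in thousands of km.
    """
    return INTERCEPT + SLOPE * odometer_thousand_km


def asking_price(predicted, markup=0.05):
    """Price set a bit above the prediction; the 5% markup is hypothetical."""
    return predicted * (1 + markup)


predicted = predict_price(100)  # a car with 100,000 km on the odometer
print("predicted price:", round(predicted, 1))
print("asking price with hypothetical 5% markup:", round(asking_price(predicted), 2))
```

Under these unit assumptions the model output is 56.2 - 0.27 * 100 = 29.2, which lines up with the roughly 2.9 lakhs mentioned above.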
More coming soon: the next post will cover how to do a health check on a model!