Machine Learning Basics – Linear Regression Model #AIMLSeries-2

As a follow-up to my previous post, let’s build a basic regression model that can help us make some predictions.

Let’s look at a scenario where a used car dealer wants to estimate a car’s selling price (y) based on its odometer reading (x). The goal is to set the actual selling price a bit higher than the predicted one, so the dealer can make a profit.

Below is a sample dataset derived from past car sales records.

[Image: Sample Data]

Just by common sense, we might guess that the more a car has been driven (higher odometer reading), the lower its selling price will be. But how do we check that against the data before building a Linear Regression model? For that, we can look at covariance and correlation.

Before we go further, let’s take a quick look at some basic statistics from the data using Microsoft Excel’s Analysis ToolPak. For now, let’s just focus on the mean and standard deviation in the output.

[Image: Descriptive Statistics]

Covariance helps us understand how two variables—X and Y—move together. If the value is positive, it means they increase or decrease together (directly related). If it's negative, it means when one goes up, the other goes down (inversely related).

So, how do we calculate covariance? Let’s break it down...

Covariance = Σ (x − x̄)(y − ȳ) / (n − 1)

Here x̄ and ȳ are the means of x and y, n is the record count, and Σ means we add up the products across all records.

For our 10 records, the products add up to -562, so:

Covariance = -562 / (10 - 1) = -62.4
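If you’d rather check this in Python than by hand, here is a minimal sketch of the same calculation. The five records below are made up for illustration (they are not the dataset from the post), and the function name `sample_covariance` is my own. The last two lines simply redo the arithmetic from the post.

```python
# Hypothetical records (NOT the dataset from the post):
# odometer in thousands of km, price in arbitrary units.
odometer = [20, 40, 60, 80, 100]
price = [52, 44, 41, 33, 30]


def sample_covariance(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y), divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


cov = sample_covariance(odometer, price)
print(cov)  # negative: price falls as the odometer reading rises

# The arithmetic from the post: products summing to -562 over 10 records.
article_cov = -562 / (10 - 1)
print(round(article_cov, 1))  # -62.4
```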

A negative sign means that as the odometer reading goes up, the selling price goes down. But how can we tell how strong this relationship is between the two? That’s where correlation comes in.

Correlation measures how strong the straight-line relationship is between one variable (like price) and another (like odometer reading). Its value always falls between -1 and +1, unlike covariance, which can be any number because it depends on the units of the data.

So, how do we calculate correlation? Let’s take a look...

Correlation = Covariance / (Standard Deviation of x × Standard Deviation of y)

Standard Deviation of Odometer Reading (σ OD) = 15

Standard Deviation of Price (σ Price) = 5

σ OD × σ Price = 15 × 5 = 75

Correlation = -62.4 / 75 = -0.83
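The same division in Python, as a quick sanity check. This is only a sketch: the function name `correlation` is my own, and the inputs are the covariance and standard deviations quoted above. One thing to keep in mind is that the covariance and the standard deviations should use the same convention (here, dividing by n - 1).

```python
def correlation(covariance, sd_x, sd_y):
    """Correlation from a covariance and the two standard deviations."""
    return covariance / (sd_x * sd_y)


# Values quoted in the post: covariance -62.4, sigma OD 15, sigma Price 5.
r = correlation(-62.4, 15, 5)
print(round(r, 2))  # -0.83
```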

A correlation of -0.83 tells us there's a strong negative relationship between odometer reading and selling price. That means we’re good to go ahead and build the regression model.

If the correlation were close to 0, it wouldn’t be ideal to build a model, because the predictions wouldn’t be very reliable.

Now, let’s plot selling price on the Y-axis and odometer reading on the X-axis. You’ll notice that the points don’t line up perfectly in a straight line.

[Image: Least Squares Line]

The goal is to draw a line that’s as close as possible to all the points on the chart.

When we look at each point (like the price), we can break down its height into two parts: one part is from the bottom of the chart up to the line (this is the fitted value), and the other part is from the line to the point itself (this is the residual, or the error). The residual is negative when the point sits below the line.

In simple terms: What we see (actual value) = What the model predicts (fitted value) + Error (residual)

We want to find the line where the total of all these errors is as small as possible. Because errors can be negative or positive, we square each one before adding them up, so they don’t cancel each other out.
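To see why squaring matters, here is a small sketch with made-up numbers (not the dataset from the post). It compares two candidate lines: the least squares line for these five points, and a flat line at the average price. For both lines the raw errors add up to zero, so the plain total can’t tell them apart, but the squared totals are very different.

```python
# Hypothetical records (NOT the dataset from the post).
odometer = [20, 40, 60, 80, 100]
price = [52, 44, 41, 33, 30]


def residuals(intercept, slope, xs, ys):
    """Actual value minus fitted value, for every point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Line A: the least squares line for this made-up data.
line_a = residuals(56.5, -0.275, odometer, price)
# Line B: a flat line at the average price (40), ignoring the odometer.
line_b = residuals(40.0, 0.0, odometer, price)

# Raw errors cancel out for both lines.
print(round(sum(line_a), 6), round(sum(line_b), 6))

# Squared errors do not cancel: line A is clearly the better fit.
sse_a = sum_of_squares(line_a)
sse_b = sum_of_squares(line_b)
print(sse_a < sse_b)  # True
```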

This special line is called the Least Squares Line—and that’s what regression gives us.

To draw this line, we need just two things: the slope and the intercept—like the formula Y = c + mX, where m is the slope and c is where the line hits the Y-axis.
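For a single input variable, the slope and intercept have simple closed-form answers: m is the covariance divided by the variance of X, and c is the mean of Y minus m times the mean of X. The sketch below applies that to a few made-up records (not the dataset from the post); `fit_line` is a name I picked for illustration, and this is not how Excel is confirmed to compute it internally. The last lines plug in the covariance (-62.4) and odometer standard deviation (15) quoted earlier, which gives a slope of roughly -0.277.

```python
# Hypothetical records (NOT the dataset from the post).
odometer = [20, 40, 60, 80, 100]
price = [52, 44, 41, 33, 30]


def fit_line(xs, ys):
    """Return (c, m) for the least squares line Y = c + mX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx               # slope
    c = mean_y - m * mean_x     # intercept
    return c, m


c, m = fit_line(odometer, price)
print(c, m)

# Slope from the summary statistics quoted in the post:
# covariance / (standard deviation of odometer reading) squared.
summary_slope = -62.4 / 15 ** 2
print(round(summary_slope, 3))  # -0.277
```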

In Excel, you can easily get these values using the Regression option in the Analysis ToolPak. You’ll see a lot of output, but for now, just focus on the Coefficients section.

[Image: Regression Reports]

So, the final equation for our best-fit line is:

Selling Price = 56.2 – 0.27 × Odometer Reading

This means that if we know the odometer reading, we can estimate the car’s selling price.

For example, the model says that if a car has 100,000 km on the odometer, it would likely sell for around 2.9 lakhs.
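Here is the equation as a tiny Python function, in case you want to try other odometer readings. A word of caution on units: they are not spelled out in the text, so I’m assuming the odometer column is in thousands of km (100,000 km is entered as 100) and that the price column is in units where 29.2 corresponds to the "around 2.9 lakhs" above. The name `predict_price` is my own.

```python
# Coefficients from the best-fit equation in the post.
INTERCEPT = 56.2
SLOPE = -0.27


def predict_price(odometer_reading):
    """Estimated selling price for an odometer reading.

    Assumption: odometer_reading is in thousands of km,
    and the result is in the same units as the price column.
    """
    return INTERCEPT + SLOPE * odometer_reading


# 100,000 km entered as 100 (assumed unit: thousands of km).
print(round(predict_price(100), 1))  # 29.2
```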


[Image: Selling Price Prediction based on Odometer Reading]

More learnings coming soon, where we will cover how to do a health check for models!



Article by Amrita Khatua