Predictive Modeling in Excel
Predictive Modeling in Excel

Predictive Modeling in Excel

For predictive analytics, Excel? Really Really?

Usually, that is the first reaction I get when I bring the subject up.

When I show how we can explore Excel's versatile nature to create predictive models for our data science and analytics ventures, this is accompanied by an incredulous look.

If the stores around you started gathering consumer data, should they follow a data-based approach to sell their goods?

Let me ask you a question. Can their revenue/sales be predicted or the number of goods estimated?

Now you have to wonder how they are going to construct a complex mathematical model in the world that can predict these things?

And it may be beyond their reach to study analytics or recruit an analyst. The good news is here-they don't need to.

Without having to write complicated code that flies over most people's heads, Microsoft Excel gives us the opportunity to conjure predictive models.

In MS Excel, we can easily construct a simple model such as linear regression that can help us perform analysis in a few simple steps.

What is Linear Regression?

It is a linear approach to statistically model the relationship between the dependent variable (the variable you want to predict) and the independent variables (the factors used for predicting). Linear regression gives us an equation like this:

Y=M1*X1+M2*X2+.............+MnXn+C

Here, we have Y as our dependent variable, X’s are the independent variables and all M’s are the coefficients. Coefficients are basically the weights assigned to the features, based on their importance and C is the constant which is basically the intercept.

Getting the All-Important Analysis ToolPack Excel Add-in

To perform a regression analysis in Excel, we first need to enable Excel’s Analysis ToolPak Add-in. The Analysis ToolPak in Excel is an add-in program that provides data analysis tools for statistical and engineering analysis.

To add it in your workbook, follow these steps.

Step 1 – Excel Options

Go to Files -> Options:

Step 2 – Locate Analytics ToolPak

Go to Add-ins on the left panel -> Manage Excel Add-ins -> Go:

Step 3 – Add Analytics ToolPak

Select the “Analysis ToolPak” and press OK:

You have successfully added the Analysis ToolPak in Excel! You can check it by going to the Data bar in the Ribbon.

Let’s start building our predictive model in Excel!

Implementing Linear Regression in Excel

Here is the problem statement we will be working with:

"The company Apple wants to predict the price of I-Pad by considering the following factors Screen(type), Storage capacity, Connectivity(type) and Gen"
No alt text provided for this image

->Encode the data in order to perform the regression analysis: Assigning numeric value to categorical data set.

No alt text provided for this image

Step 1 – Select Regression

Go to Data -> Data Analysis:

Go to Data Analysis in the Data ToolPak, select Regression and press OK:

Step 2 – Select Options

In this step, we will select some of the options necessary for our analysis, such as :

  • Input y range – The range of independent factor
  • Input x range – The range of dependent factors
  • Output range – The range of cells where you want to display the results

Analyzing our Predictive Model’s Results in Excel

The easy part was applying the linear regression model. Now comes the tricky part of our study, interpreting the outcomes of the predictive model in Excel. 

In the summary, we have 3 types of output and we will cover them one-by-one:

  • Regression statistics table
  • Regression coefficients table
  • Residual Table

 Regression Statistics Table

The regression statistics table tells us how well the line of best fit defines the linear relationship between the independent and dependent variables. Two of the most important measures are the R squared and Adjusted R squared values.

The R-squared statistic is the indicator of goodness of fit which tells us how much variance is explained by the line of best fit. R-squared value ranges from 0 to 1. In our case, we have the R-squared value of 0.93 which means that our line is able to explain 93% of the variance - a good sign.

No alt text provided for this image

But there is a problem - as we keep adding more variables, our R squared value will keep increasing even though the variable might not be having any effect. Adjusted R-squared solves this problem and is a much more reliable metric.

Regression Coefficient Table

The Coefficient table breaks down the components of the regression line in the form of coefficients.

No alt text provided for this image

Residual Table

The residual table reflects how much the predicted value varies from the actual value. It consists of the values predicted by our model:

No alt text provided for this image

Making Predictions in Excel!

No alt text provided for this image
No alt text provided for this image

--> The RMSE is 8% that indicates the variability in our predicted model is 8% when compared with the original model.

--> Our model has predicted the price range as per the specifications that was given as input. The detailed model will be reflected in the video attached.

That’s the power of linear regression done simply in Microsoft Excel.

#BusinessToys #DataScience #DataAnalytics #ExcelAnalytics #PassionForData #DataScientist

True, we undermine the possibilities and features of an excel spreadsheet. Many solutions, models, analysis are possible thru excel and it's add-ons. Thanks Nigamananda Mohapatra for bringing it up.

Many industries and businesses continue to emphasize the importance of Excel skills because it remains as an intelligent way to extract actionable insights, Loved it, thanks for sharing 👍

Awesome Nigamananda! Sometimes Excel / spreadsheets get a bad name among serious analytical people. However, I find that they are great for many things. They are simply one of many powerful tools that are best when used appropriately. Thanks for this post!

To view or add a comment, sign in

More articles by Nigamananda Mohapatra

Others also viewed

Explore content categories