Predictive Modeling in Excel
For predictive analytics, Excel? Really Really?
Usually, that is the first reaction I get when I bring the subject up.
When I show how we can explore Excel's versatile nature to create predictive models for our data science and analytics ventures, this is accompanied by an incredulous look.
If the stores around you started gathering consumer data, should they follow a data-based approach to sell their goods?
Let me ask you a question. Can their revenue/sales be predicted or the number of goods estimated?
Now you have to wonder how they are going to construct a complex mathematical model in the world that can predict these things?
And it may be beyond their reach to study analytics or recruit an analyst. The good news is here-they don't need to.
Without having to write complicated code that flies over most people's heads, Microsoft Excel gives us the opportunity to conjure predictive models.
In MS Excel, we can easily construct a simple model such as linear regression that can help us perform analysis in a few simple steps.
What is Linear Regression?
It is a linear approach to statistically model the relationship between the dependent variable (the variable you want to predict) and the independent variables (the factors used for predicting). Linear regression gives us an equation like this:
Y=M1*X1+M2*X2+.............+MnXn+C
Here, we have Y as our dependent variable, X’s are the independent variables and all M’s are the coefficients. Coefficients are basically the weights assigned to the features, based on their importance and C is the constant which is basically the intercept.
Getting the All-Important Analysis ToolPack Excel Add-in
To perform a regression analysis in Excel, we first need to enable Excel’s Analysis ToolPak Add-in. The Analysis ToolPak in Excel is an add-in program that provides data analysis tools for statistical and engineering analysis.
To add it in your workbook, follow these steps.
Step 1 – Excel Options
Go to Files -> Options:
Step 2 – Locate Analytics ToolPak
Go to Add-ins on the left panel -> Manage Excel Add-ins -> Go:
Step 3 – Add Analytics ToolPak
Select the “Analysis ToolPak” and press OK:
You have successfully added the Analysis ToolPak in Excel! You can check it by going to the Data bar in the Ribbon.
Let’s start building our predictive model in Excel!
Implementing Linear Regression in Excel
Here is the problem statement we will be working with:
"The company Apple wants to predict the price of I-Pad by considering the following factors Screen(type), Storage capacity, Connectivity(type) and Gen"
->Encode the data in order to perform the regression analysis: Assigning numeric value to categorical data set.
Step 1 – Select Regression
Go to Data -> Data Analysis:
Go to Data Analysis in the Data ToolPak, select Regression and press OK:
Step 2 – Select Options
In this step, we will select some of the options necessary for our analysis, such as :
- Input y range – The range of independent factor
- Input x range – The range of dependent factors
- Output range – The range of cells where you want to display the results
Analyzing our Predictive Model’s Results in Excel
The easy part was applying the linear regression model. Now comes the tricky part of our study, interpreting the outcomes of the predictive model in Excel.
In the summary, we have 3 types of output and we will cover them one-by-one:
- Regression statistics table
- Regression coefficients table
- Residual Table
Regression Statistics Table
The regression statistics table tells us how well the line of best fit defines the linear relationship between the independent and dependent variables. Two of the most important measures are the R squared and Adjusted R squared values.
The R-squared statistic is the indicator of goodness of fit which tells us how much variance is explained by the line of best fit. R-squared value ranges from 0 to 1. In our case, we have the R-squared value of 0.93 which means that our line is able to explain 93% of the variance - a good sign.
But there is a problem - as we keep adding more variables, our R squared value will keep increasing even though the variable might not be having any effect. Adjusted R-squared solves this problem and is a much more reliable metric.
Regression Coefficient Table
The Coefficient table breaks down the components of the regression line in the form of coefficients.
Residual Table
The residual table reflects how much the predicted value varies from the actual value. It consists of the values predicted by our model:
Making Predictions in Excel!
--> The RMSE is 8% that indicates the variability in our predicted model is 8% when compared with the original model.
--> Our model has predicted the price range as per the specifications that was given as input. The detailed model will be reflected in the video attached.
That’s the power of linear regression done simply in Microsoft Excel.
#BusinessToys #DataScience #DataAnalytics #ExcelAnalytics #PassionForData #DataScientist
#interestingfacts ! 😎
May we learnt it if yes how
True, we undermine the possibilities and features of an excel spreadsheet. Many solutions, models, analysis are possible thru excel and it's add-ons. Thanks Nigamananda Mohapatra for bringing it up.
Many industries and businesses continue to emphasize the importance of Excel skills because it remains as an intelligent way to extract actionable insights, Loved it, thanks for sharing 👍
Awesome Nigamananda! Sometimes Excel / spreadsheets get a bad name among serious analytical people. However, I find that they are great for many things. They are simply one of many powerful tools that are best when used appropriately. Thanks for this post!