Forecasting with 20 lines of Python
At some point you'll likely be asked to create a forecast for orders, sales, web visits, etc. for the coming year. You may also be asked to forecast the components of that overall forecast, such as product lines, web pages, or individual products.
Initially, you may want to use linear regression for the task. After visualizing your data, though, you notice that it is not linear and that a regression model would need many additional inputs just to produce a baseline forecast.
So, this is the point where you can leverage historical data and create indexes to set your baseline forecast.
But what is an index? In this case, an index expresses the growth (or decline) of orders, web visits, etc. in a given period as a ratio to the first period of the year. Values above 1.0 indicate growth and values below 1.0 indicate decline. Let's say that in Jan 2019 (period 1) I receive 835 orders of "widget a", and 1,200 and 940 in Feb and Mar respectively (periods 2 and 3). My indices for "widget a" are as follows:
- Jan 2019 = 1.0 (or 835/835)
- Feb 2019 = 1.44 (or 1200/835)
- Mar 2019 = 1.13 (or 940/835)
Establishing indexes outlines the seasonality of my products in a very simple way. With these indexes, I can take Jan 2020 sales (actuals or estimates) and multiply them by each index value to get a baseline forecast based on past performance.
You can make indices more sophisticated by continuously re-projecting orders based on recent performance, or by applying weights to account for unplanned growth or decline. We won't get into that here, though.
In the following example, we are given monthly orders of 3 products for the prior year. The task is to compute the indices so that I can multiply the following year's actual or estimated January sales by them and get a full year's forecast.
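To make the arithmetic concrete, here is a minimal plain-Python sketch that computes the three indices above and applies them to a Jan 2020 figure. The Jan 2020 value of 900 is a hypothetical placeholder, not a number from this post.

```python
# Monthly orders for "widget a", Jan-Mar 2019 (figures from the example above)
orders_2019 = {"Jan": 835, "Feb": 1200, "Mar": 940}

# Each index is that month's orders divided by the base month (January)
base = orders_2019["Jan"]
indices = {month: value / base for month, value in orders_2019.items()}

# Hypothetical Jan 2020 estimate; replace with your own actual or estimated value
jan_2020_estimate = 900
forecast_2020 = {month: round(jan_2020_estimate * idx) for month, idx in indices.items()}

for month in orders_2019:
    print(f"{month}: index={indices[month]:.2f}, forecast={forecast_2020[month]}")
```

This prints `Jan: index=1.00, forecast=900`, `Feb: index=1.44, forecast=1293` and `Mar: index=1.13, forecast=1013`.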
Step 1: Import your modules and read in your data:
The data for this exercise can be retrieved from the GitLab repo linked at the end of the post.
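The notebook in the repo is the reference for this step. As a self-contained stand-in, the sketch below assumes a long-format table with hypothetical `date`, `product`, and `orders` columns and reads a small made-up sample from a string. Only the widget-a Jan-Mar figures come from this post; `orders.csv` is a hypothetical file name.

```python
import io

import pandas as pd

# Stand-in for the repo's data file. The column names (date, product, orders)
# and the widget-b values are illustrative assumptions, not the actual dataset.
csv_text = """date,product,orders
2019-01-01,widget-a,835
2019-02-01,widget-a,1200
2019-03-01,widget-a,940
2019-01-01,widget-b,410
2019-02-01,widget-b,380
2019-03-01,widget-b,450
"""

# With a real file you would pass its path instead, e.g. pd.read_csv("orders.csv", ...)
# parse_dates converts the date column to datetime64 so .dt accessors work later
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)
```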
Step 2: Visualize your trends (optional, but recommended)
We clearly don't have linear growth (at the month level at least).
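A sketch of one way to plot the trends, assuming the same hypothetical long-format columns: pivot so each product becomes a column, then let pandas draw one line per product with matplotlib. The `Agg` backend and the output file name are choices for running headless, not something the post prescribes.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative monthly orders: widget-a Jan-Mar from the post, widget-b made up
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"] * 2),
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "orders": [835, 1200, 940, 410, 380, 450],
})

# One column per product and one row per month, so each product becomes a line
wide = df.pivot(index="date", columns="product", values="orders")

fig, ax = plt.subplots(figsize=(8, 4))
wide.plot(ax=ax, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Orders")
ax.set_title("Monthly orders by product")
fig.savefig("monthly_orders.png")  # hypothetical output file name
```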
Step 3: Extract the month number and store it in a column
I find this handy for numerous reasons. In this example, it serves as a convenient filter for selecting each product's January (base) row.
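Assuming the dates were parsed as datetimes, pandas' `.dt.month` accessor gives the month number directly. The `month` column name and the widget-b values are hypothetical.

```python
import pandas as pd

# Illustrative data: widget-a Jan-Mar from the post, widget-b made up
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"] * 2),
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "orders": [835, 1200, 940, 410, 380, 450],
})

# Month number (1 = January ... 12 = December) taken from the datetime column
df["month"] = df["date"].dt.month
print(df[["product", "month", "orders"]])
```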
Step 4: Extract the base value for computing your monthly indices for each product.
This is a very efficient way to extract the base value (denominator) for each product, especially if you are forecasting tens or hundreds of products!
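One way to do this, sketched under the same assumed column names: filter to the January rows and rename `orders` to a hypothetical `base` column. The result has one row per product, however many products there are.

```python
import pandas as pd

# Illustrative data with the month column from Step 3 (widget-b values made up)
df = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3] * 2,
    "orders": [835, 1200, 940, 410, 380, 450],
})

# Keep only January rows: their orders become each product's base (denominator)
base = (
    df.loc[df["month"] == 1, ["product", "orders"]]
    .rename(columns={"orders": "base"})
    .reset_index(drop=True)
)
print(base)
```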
Step 5: Merge DataFrames and compute index
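A minimal sketch of this step, again with assumed column names and made-up widget-b values: a left merge on `product` attaches each product's January base to all of its months, and dividing gives the index. The result column is called `month_index` (a hypothetical name) to avoid confusion with the DataFrame's own `.index` attribute.

```python
import pandas as pd

# Illustrative data with the month column from Step 3 (widget-b values made up)
df = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3] * 2,
    "orders": [835, 1200, 940, 410, 380, 450],
})

# Base values from Step 4: January orders per product
base = df.loc[df["month"] == 1, ["product", "orders"]].rename(columns={"orders": "base"})

# Attach each product's January base to every one of its monthly rows
merged = df.merge(base, on="product", how="left")

# Index = the month's orders relative to the January base (January is always 1.0)
merged["month_index"] = merged["orders"] / merged["base"]
print(merged.round({"month_index": 2}))
```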
Final Step: We made it! Multiply your Jan 2020 value (actual or estimate) by each monthly index. In this example, we already have our Jan 2020 estimates, so we're ready to rock.
And that is all you need to create a forecast for one or more targets! In the example above I only forecasted sales for 'widget-a'. Try doing the same for widgets b and c, and challenge yourself to loop through and forecast all products in one fell swoop!
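Sketched for widget-a under stated assumptions: the indices are computed as in Step 5 from Jan-Mar 2019 orders (widget-b values made up), and the Jan 2020 estimate of 900 is a hypothetical placeholder for your own actual or estimated figure. Forecasts are rounded to whole orders.

```python
import pandas as pd

# Monthly indices from Step 5 for two products (widget-b history is made up)
indices = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3] * 2,
    "month_index": [835 / 835, 1200 / 835, 940 / 835, 410 / 410, 380 / 410, 450 / 410],
})

# Hypothetical Jan 2020 estimate for widget-a
jan_2020_widget_a = 900

# Select widget-a's indices and scale them by the Jan 2020 figure
widget_a = indices[indices["product"] == "widget-a"].copy()
widget_a["forecast"] = (widget_a["month_index"] * jan_2020_widget_a).round()
print(widget_a)
```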
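For the all-products challenge, an explicit loop works, but a merge can do the same job in one step. This sketch swaps in that merge-based approach; the per-product Jan 2020 estimates (900 and 400) and the widget-b history are hypothetical.

```python
import pandas as pd

# Monthly indices for every product, as produced in Step 5 (widget-b history made up)
indices = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3] * 2,
    "month_index": [835 / 835, 1200 / 835, 940 / 835, 410 / 410, 380 / 410, 450 / 410],
})

# Hypothetical Jan 2020 estimates, one per product
jan_2020 = pd.DataFrame({"product": ["widget-a", "widget-b"], "jan_2020": [900, 400]})

# The merge replaces the explicit loop: each row gets its product's Jan 2020 figure
forecast = indices.merge(jan_2020, on="product", how="left")
forecast["forecast"] = (forecast["month_index"] * forecast["jan_2020"]).round()
print(forecast)
```

Adding a product only means adding rows to both tables; the code itself does not change.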
Notebook and Data: https://gitlab.com/oscarvalles/time_series_index
Happy forecasting & coding. Questions and comments always appreciated.