Forecasting with 20 lines of Python

At some point you'll likely be asked to create a forecast of orders, sales, web visits, or a similar measure for the coming year. You may also be asked to forecast the components of an overall forecast, such as product lines, web pages, or individual products.

Initially, you may want to leverage linear regression for the task, but after visualizing your data you notice that it is not linear, and that a regression would need many additional inputs just to produce a baseline forecast.

This is the point where you can leverage historical data and create indexes to set your baseline forecast.

But what is an index? In this case, an index is the ratio of orders, web visits, etc. in a given period to the first period of the year, so it captures growth (or decline) relative to that first period. Let's say that in Jan 2019 (period 1) I receive 835 orders of "widget a", followed by 1200 and 940 in Feb and March (periods 2 and 3). My indices for "widget a" are as follows:

  • Jan 2019 = 1.0 (or 835/835)
  • Feb 2019 = 1.44 (or 1200/835)
  • Mar 2019 = 1.13 (or 940/835)
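The arithmetic behind these indices can be checked with a few lines of plain Python. This is only a sketch using the three "widget a" figures quoted above; the variable names are illustrative, and the values are rounded to two decimals.

```python
# Orders for "widget a" from the example above, keyed by period.
orders = {"Jan 2019": 835, "Feb 2019": 1200, "Mar 2019": 940}

# The first period of the year is the denominator for every index.
base = orders["Jan 2019"]

# Index = orders in the period / orders in the first period.
indices = {period: round(value / base, 2) for period, value in orders.items()}

for period, index in indices.items():
    print(period, index)
```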

Establishing indexes helps me outline the seasonality of my products in a very simple way. With these indexes, I can take Jan 2020 sales (actuals or estimates) and multiply them by each index value to get a baseline forecast based on past performance.

You can even choose to become more sophisticated with indices by continuously projecting orders based on recent performance or applying weights to help account for unplanned growth or decline. We won't get into that here though.

In the following example, we are given monthly orders of 3 products for the prior year. The task is to compute the indices so that I can multiply the following year's actual or estimated January sales by them and get a full year's forecast.

Step 1: Import your modules and read in your data:

The data for this exercise can be retrieved from the gitlab repo linked at the end of the post.
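As a rough sketch of this step, the snippet below reads a small inline CSV so that it runs on its own. The column names (`date`, `product`, `orders`), the three-month extract, and the widget-b numbers are assumptions made for illustration; the file in the repo may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the repo's data file. Only the widget-a
# figures come from the article; widget-b is made up for illustration.
csv_text = """date,product,orders
2019-01-01,widget-a,835
2019-02-01,widget-a,1200
2019-03-01,widget-a,940
2019-01-01,widget-b,410
2019-02-01,widget-b,380
2019-03-01,widget-b,515
"""

# With a file on disk, pass its path to pd.read_csv instead of the StringIO.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.head())
```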

Step 2: Visualize your trends (optional, but recommended)

We clearly don't have linear growth (at the month level at least).
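One way to draw such a chart is to pivot the data so each product becomes its own line. The sketch below assumes the same hypothetical long-format columns (`date`, `product`, `orders`) and made-up widget-b numbers; it is not the original notebook's plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; drop this line in a notebook
import pandas as pd

# Hypothetical extract: widget-a from the article, widget-b made up.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"] * 2),
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "orders": [835, 1200, 940, 410, 380, 515],
})

# One row per month, one column per product.
wide = df.pivot(index="date", columns="product", values="orders")

# One line per product; markers make the month-to-month swings visible.
ax = wide.plot(marker="o", title="Monthly orders by product")
ax.set_xlabel("Month")
ax.set_ylabel("Orders")
```

In a notebook the chart renders inline; in a script, call `ax.get_figure().savefig(...)` with a path of your choice to write it to disk.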

Step 3: Extract month number and set a column value

I find this handy for numerous reasons; in this example, it serves as a good filter.
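A minimal sketch of this step, assuming the dates live in a datetime column named `date` (a hypothetical name) and using the `.dt` accessor to pull out the month number:

```python
import pandas as pd

# Hypothetical extract: widget-a from the article, widget-b made up.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"] * 2),
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "orders": [835, 1200, 940, 410, 380, 515],
})

# Month number (1 = January) as its own column, ready to filter on.
df["month"] = df["date"].dt.month
print(df)
```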

Step 4: Extract the base value for computing your monthly indices for each product.

This is a very efficient way to extract the base value (denominator) for each product, especially if you have tens or hundreds of products you are forecasting!
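One plausible way to do this is to filter on the month column and keep only the product and its January orders. The column names and the `base_orders` label below are assumptions for illustration, and the widget-b numbers are made up.

```python
import pandas as pd

# Hypothetical extract with the month number already added (Step 3).
df = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3, 1, 2, 3],
    "orders": [835, 1200, 940, 410, 380, 515],
})

# Keep only January rows: one base value (denominator) per product.
base = (
    df.loc[df["month"] == 1, ["product", "orders"]]
    .rename(columns={"orders": "base_orders"})
    .reset_index(drop=True)
)
print(base)
```

Because the filter works on the whole column at once, the same line handles three products or three hundred.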

Step 5: Merge DataFrames and compute index
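The merge attaches each product's base value to every one of its monthly rows, after which the index is a single column division. The sketch below uses the same hypothetical column names as assumptions; `order_index` is an illustrative name.

```python
import pandas as pd

# Hypothetical monthly orders: widget-a from the article, widget-b made up.
df = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3, 1, 2, 3],
    "orders": [835, 1200, 940, 410, 380, 515],
})

# Base value per product (January orders), as in Step 4.
base = pd.DataFrame({
    "product": ["widget-a", "widget-b"],
    "base_orders": [835, 410],
})

# Left merge keeps every monthly row and adds the matching base value.
indexed = df.merge(base, on="product", how="left")

# Index = orders in the month / orders in January, per product.
indexed["order_index"] = indexed["orders"] / indexed["base_orders"]
print(indexed.round(2))
```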

Final Step: We made it! Multiply your base value for the new year (the Jan 2020 figure) by each month's index. In this example, we already have our Jan 2020 estimates, so we're ready to rock.
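As an illustration for 'widget-a', the sketch below applies the first three monthly indices to a Jan 2020 estimate. The estimate of 900 orders is a made-up number, and the column names are assumptions.

```python
import pandas as pd

# Monthly indices for widget-a, built from the 2019 orders quoted earlier.
indices_a = pd.DataFrame({
    "month": [1, 2, 3],
    "order_index": [835 / 835, 1200 / 835, 940 / 835],
})

# Hypothetical Jan 2020 estimate; substitute your own actual or estimate.
jan_2020_estimate = 900

# Baseline forecast = Jan 2020 value * index for each month.
indices_a["forecast_2020"] = (
    (indices_a["order_index"] * jan_2020_estimate).round().astype(int)
)
print(indices_a)
```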

And that is all you need to create a forecast on one or more targets! In the example above I only forecasted sales for 'widget-a'. Try doing the same for widgets b and c. You can even challenge yourself to loop through and forecast all products in one fell swoop!
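For readers who want something to compare their attempt against, here is one possible approach. It swaps the explicit loop for a second merge, so every product is forecast at once. The Jan 2020 estimates and the widget-b orders are made-up numbers, and all column names are assumptions.

```python
import pandas as pd

# Hypothetical 2019 orders: widget-a from the article, widget-b made up.
orders = pd.DataFrame({
    "product": ["widget-a"] * 3 + ["widget-b"] * 3,
    "month": [1, 2, 3, 1, 2, 3],
    "orders": [835, 1200, 940, 410, 380, 515],
})

# January orders per product serve as the base value.
base = (
    orders.loc[orders["month"] == 1, ["product", "orders"]]
    .rename(columns={"orders": "base_orders"})
)

# Hypothetical Jan 2020 estimates, one row per product.
jan_2020 = pd.DataFrame({
    "product": ["widget-a", "widget-b"],
    "jan_2020_estimate": [900, 450],
})

# Attach the base value and the new-year estimate to every monthly row.
forecast = orders.merge(base, on="product").merge(jan_2020, on="product")
forecast["order_index"] = forecast["orders"] / forecast["base_orders"]
forecast["forecast_2020"] = (
    (forecast["order_index"] * forecast["jan_2020_estimate"]).round().astype(int)
)
print(forecast[["product", "month", "forecast_2020"]])
```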

Notebook and Data: https://gitlab.com/oscarvalles/time_series_index

Happy forecasting & coding. Questions and comments always appreciated.
