Forecasting Revenue with Python & Prophet
If you have the word “manager” in your professional title, there is a good chance that you are asked to make forecasts. Making a prediction is simple enough, but making an accurate prediction can often be challenging. However, there are plenty of great tools available nowadays and you don't need to be a data scientist to make a decent forecast. Today, I’d like to share one of my favorite tools for predicting revenue in retail environments: Prophet.
What is Prophet?
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. It is robust to missing data and shifts in the trend, and typically handles outliers well.
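As a compact summary of what "additive model" means here, the forecast is the sum of separately fitted components. The symbols below are the ones commonly used to describe Prophet and are not taken from this article: g(t) is the trend, s(t) the combined seasonal effects, h(t) the holiday effects, and the last term is the error the model does not explain.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```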
Prophet was developed by Facebook's Core Data Science team and is used within the company for planning, goal-setting, and forecasts. Facebook released Prophet as open-source under the BSD license in 2017 and it has been freely available ever since.
Why is it a good choice for retail forecasting?
Prophet really shines as a forecasting tool when dealing with multiple seasonalities, a common characteristic of almost all retail environments. In almost any retail store, strong seasonal patterns can be observed both across the year and across the days of the week.
Let’s take a look at some revenue charts that demonstrate seasonality:
Here I can see a seasonal pattern that repeats across the two years. For example, dips can be observed in August and from the end of December into the first week of January. The other months match up fairly well--although the numbers themselves may not match due to growth, the major peaks and valleys occur during the same time periods.
While the first couple of months seem a bit off, I happen to know that there was a construction detour during that time that routed an abnormally high amount of traffic by the outlet, resulting in higher sales. Prophet handles outliers well, so the impact on this particular forecast should be negligible. If it did affect the accuracy, however, no worries; Prophet lets you exclude specific dates from the history, if you choose to do so.
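One way to exclude dates, as a minimal sketch: blank out the revenue for the affected days while keeping the rows, since Prophet tolerates missing values in the history. The data and the detour window below are made up for illustration; the real dates are not given in this article.

```python
import pandas as pd

# Made-up daily history in Prophet's expected layout ("ds" = date, "y" = value).
history = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=90, freq="D"),
    "y": 1000.0,
})

# Hypothetical detour window (an assumption, not the client's real dates).
detour = (history["ds"] >= "2018-01-15") & (history["ds"] <= "2018-02-28")

# Blank out the revenue but keep the rows, so the dates stay in the frame.
history.loc[detour, "y"] = float("nan")

print(history["y"].isna().sum(), "days excluded")
```

Because the rows remain, the model is simply fit without those observations and can still produce a prediction for the blanked dates.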
In addition to the yearly seasonality, I can also see a very strong weekly seasonality:
For this particular location, revenue gradually increases throughout the work-week and then slumps on the weekend. I can infer two things at this point:
- Prophet is a good candidate for modeling a forecast on this data; and
- The location of this outlet is clearly not a shopping mall ;-)
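If you want to check for a weekly pattern without a chart, a rough sketch is to average revenue by day of the week. The numbers below are made up to mimic the shape described above (rising through the work-week, slumping on the weekend), and the column names "date" and "revenue" are assumptions.

```python
import pandas as pd

# Made-up four weeks of revenue; 2018-01-01 is a Monday.
dates = pd.date_range("2018-01-01", periods=28, freq="D")
pattern = [100, 110, 120, 130, 140, 80, 70]  # Monday ... Sunday
sales = pd.DataFrame({"date": dates, "revenue": pattern * 4})

# Average revenue per weekday (0 = Monday ... 6 = Sunday).
by_weekday = sales.groupby(sales["date"].dt.dayofweek)["revenue"].mean()
print(by_weekday)
```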
A Real-World Scenario
The revenue in the preceding charts was provided by a client that reached out for help in determining “great” sales days at their retail locations. (The dates and amounts have been modified to sufficiently anonymize the data, but it has been kept proportional.) The company wanted to reward managers for any day on which sales were significantly above expectations, but the seemingly simple task of setting the expectation was proving problematic. They had already tried a few basic estimating methods, which worked some of the time but not consistently. Consequently, managers were quickly losing faith in the reward system and were beginning to perceive it as unfair.
There are two objectives in this scenario:
- Forecast the revenue “baseline” expectations
- Define what is “significantly above” expectations
I will use Prophet to forecast the baseline and set an appropriate confidence interval to determine what “significantly” deviates from the baseline.
Prerequisites
The focus of this demonstration is Prophet, but I will need two other software libraries to support this process:
- Pandas, for manipulating tabular data (a.k.a., “dataframes”)
- Matplotlib, for visually plotting the data in charts
In the following code, I make heavy use of these two libraries. In fact, only 5 lines of code are relevant to Prophet itself; the rest is dedicated to loading, preparing, and plotting the data. If you aren’t at least a little familiar with these libraries, the rest of the demonstration may come across as a bit confusing.
Creating the Forecast
If I open the revenue data in Excel, I can see that it is a standard CSV file with two columns: date & revenue:
The first step will be to open a Jupyter notebook and import the CSV file into a Pandas dataframe.
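A minimal sketch of the import step. To keep it runnable on its own, it reads a few made-up rows from a string; the column names "date" and "revenue" and the file name in the comment are assumptions based on the description above.

```python
import io
import pandas as pd

# Stand-in for the real file: made-up rows in the two-column layout described above.
csv_text = """date,revenue
2018-01-01,1520.50
2018-01-02,1610.00
2018-01-03,1695.25
"""

# With a file on disk this would be pd.read_csv("revenue.csv", parse_dates=["date"]);
# "revenue.csv" is a hypothetical file name.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df.head()
```

Parsing the date column up front saves a conversion later, because Prophet needs real dates rather than text.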
Prophet expects the column names to be “ds” and “y” so I’ll rename them.
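The rename might look like this, assuming the original columns are called "date" and "revenue" (the two rows are made up so the snippet runs on its own):

```python
import pandas as pd

# Made-up rows standing in for the imported revenue data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "revenue": [1520.50, 1610.00],
})

# Prophet looks for a date column named "ds" and a value column named "y".
df = df.rename(columns={"date": "ds", "revenue": "y"})
```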
Now I am ready to provide Prophet the Pandas dataframe for fitting. Note the interval_width argument passed when the model is created. This sets the width of the interval around the forecast (Prophet defaults to 80%; I use 90% here), which I will refer to near the end of this article.
I will get a message that Prophet is disabling daily seasonality. This is expected as no intraday data is being provided.
At this point, the model has been fit and I am ready to make predictions with Prophet. First, however, I will need to create a dataframe that includes future dates that the predictions can be inserted into.
The next step is to pass the future dates to Prophet's predict method to generate the predictions.
That’s it! Now let me plot the predictions so that I can see what they look like. I will make the “baseline” expectation a black dotted line and fill the confidence bands with grey.
Prophet provides some built-in functions to check the accuracy, but I will forgo this for now. To keep it simple, I’ll just plot some of the original revenue data to see how it lines up.
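A plotting sketch along those lines. To keep it independent of a fitted model, it uses a made-up stand-in for the forecast with the same column names Prophet returns (ds, yhat, yhat_lower, yhat_upper) and made-up "actual" revenue; the colors and labels are my own choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for Prophet's forecast output (same column names).
ds = pd.date_range("2019-01-01", periods=60, freq="D")
yhat = 1000 + 50 * np.sin(2 * np.pi * ds.dayofweek.to_numpy() / 7)
forecast = pd.DataFrame({"ds": ds, "yhat": yhat,
                         "yhat_lower": yhat - 80, "yhat_upper": yhat + 80})

# Made-up observed revenue for the first 30 days.
rng = np.random.default_rng(1)
actual = pd.DataFrame({"ds": ds[:30], "y": yhat[:30] + rng.normal(0, 40, 30)})

fig, ax = plt.subplots(figsize=(10, 4))
# Grey band between the lower and upper bounds.
ax.fill_between(forecast["ds"], forecast["yhat_lower"], forecast["yhat_upper"],
                color="grey", alpha=0.3, label="90% interval")
# Baseline expectation as a black dotted line.
ax.plot(forecast["ds"], forecast["yhat"], color="black", linestyle=":", label="Baseline")
# Observed revenue overlaid for comparison.
ax.plot(actual["ds"], actual["y"], color="tab:blue", linewidth=1, label="Actual revenue")
ax.set_ylabel("Revenue")
ax.legend()
```

In a Jupyter notebook the figure renders when the cell finishes.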
Interpreting the Final Plot
The original objectives were to:
- Forecast the revenue “baseline” expectations; and
- Define what is “significantly above” expectations
The final plot satisfies both of these objectives.
The black dotted line represents the target “baseline” revenue. This should be self-explanatory so I won’t elaborate.
A confidence interval is ideal for determining what a “significant” departure from the baseline is, and it is a much better measure than simply applying a percent deviation. The shaded grey band is the 90% interval that Prophet estimates from the variability it finds in the data provided (the interval_width set earlier); if the model fits well, roughly 90% of daily revenue values should fall within this region. Therefore, any day on which the observed revenue is above or below this shaded area should be considered a particularly “good” or “bad” day, respectively.
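Turning that rule into labels takes only a couple of comparisons. The sketch below uses four made-up days with Prophet-style bound columns (yhat_lower, yhat_upper) next to observed revenue (y); the "rating" column and its labels are hypothetical names.

```python
import pandas as pd

# Made-up forecast bounds and observed revenue for four days.
days = pd.DataFrame({
    "ds": pd.to_datetime(["2019-03-04", "2019-03-05", "2019-03-06", "2019-03-07"]),
    "yhat_lower": [900.0, 950.0, 1000.0, 1050.0],
    "yhat_upper": [1100.0, 1150.0, 1200.0, 1250.0],
    "y": [1000.0, 1200.0, 950.0, 1250.0],
})

# Default label, then overwrite the days that fall outside the band.
days["rating"] = "expected"
days.loc[days["y"] > days["yhat_upper"], "rating"] = "good"
days.loc[days["y"] < days["yhat_lower"], "rating"] = "bad"

print(days[["ds", "y", "rating"]])
```

A day that lands exactly on a bound is treated as expected here; whether the boundary itself should count is a policy choice.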
Want to know more?
I hope this brief demonstration has encouraged you to explore Prophet, Python, and Jupyter notebooks. If you’d like to learn more, here are a few links to get you started:
- Python Programming for Beginners
- Python for Financial Analysis
- Prophet by Facebook
- Pandas Data Analysis Library
- Matplotlib Plotting Library
- Jupyter Notebooks
If you have any questions, feel free to reach out to me on LinkedIn or @BrianSRJ on Twitter.