Using Machine Learning to Predict Retail Gasoline Prices

For inflation traders, one of the key risks is energy prices. What makes this more interesting than your typical hedgeable risk is that your actual exposure is to what the BLS says gasoline prices did, not to the observable futures prices. Therefore every good TIPS trader has a model predicting retail gasoline prices.

So that is what we are trying to predict. If the spread between RBOB futures and retail prices were constant, this would be trivial. As you can see, it is not:

It varies from 40c to 120c, and while there is a seasonal pattern to the spread, the noise is quite large.

To tackle this problem, I created a Long Short-Term Memory (LSTM) recurrent neural network.

(I would like to thank Dr. Jason Brownlee and his https://machinelearningmastery.com site. I could not have done this without his tutorials. Much of the code used here was adapted from https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/)

To do this I will use Keras (https://keras.io/) with a TensorFlow backend.

Step 1 — Prepare the machine

Into my Python 3.6 virtual environment, I install: numpy, pandas, scipy, sklearn, tensorflow, and Keras. If you want to visualize the network, you also need to install pydot and graphviz into the venv, as well as the graphviz system package (via apt-get).
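A minimal sketch of that setup (note that sklearn installs via pip under the name scikit-learn; I also add matplotlib here, which the charts later in this post assume):

python3.6 -m venv venv && source venv/bin/activate
pip install numpy pandas scipy scikit-learn tensorflow keras matplotlib
# optional, only needed to visualize the network:
pip install pydot graphviz
sudo apt-get install graphviz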

Step 2 — Prepare the Data

The raw data is a CSV file with three columns: the date, the XB2 (RBOB futures) price, and the average retail gasoline price. That is all the data I am going to use. Since I know the date format, it is easy to create the dataframe:

import pandas as pd
from datetime import datetime

def parse(x):
    return datetime.strptime(x, '%Y-%m-%d')

dataset = pd.read_csv('gasoline.csv', index_col=0, date_parser=parse)

Which creates:

               XB2  Retail
Date                      
2011-12-30  265.74   327.8
2011-12-31  265.74   327.9
2012-01-01  265.74   327.9
2012-01-02  265.74   328.8
2012-01-03  275.34   331.9

Since I want the network to learn (hopefully) any seasonality in the data, I create two more numeric columns, one for the month and one for the day of the month.

# the index is already a DatetimeIndex, so month and day are available directly
datadates = dataset.index
datamonths = pd.Series(datadates.month, index=datadates, name='month')
datadays = pd.Series(datadates.day, index=datadates, name='day')
dataset = datamonths.to_frame().join(datadays).join(dataset)

Which now gives me this four-column dataframe:

             month  day     XB2  Retail
2011-12-30     12    30  265.74   327.8
2011-12-31     12    31  265.74   327.9
2012-01-01      1     1  265.74   327.9
2012-01-02      1     2  265.74   328.8
2012-01-03      1     3  275.34   331.9

I want to take advantage of the LSTM network, so I am going to pass in the prior 12 days of data to predict the current retail gas price. In other words, I am going to have 48 inputs (12 days of 4 columns each) into my model to predict my one output: every variable at t-12 through t-1, predicting Retail at time t.

Using Dr. Jason Brownlee's function series_to_supervised, we create our matrix.
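
For completeness, here is that function as it appears in the linked tutorial, so the snippet below is self-contained:

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # frame a time series as a supervised-learning matrix
    n_vars = 1 if type(data) is list else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = [], []
    # input sequence (t-n_in, ..., t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ..., t+n_out-1)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop the rows made incomplete by the shifting
    if dropnan:
        agg.dropna(inplace=True)
    return agg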

values = dataset.values.astype('float32')
# frame as supervised learning: 12 lags of 4 variables -> 48 inputs plus 4 time-t columns
reframed = series_to_supervised(values, 12, 1)
# drop the time-t columns we don't want to predict (ie month, day, XB2 on day t)
reframed.drop(reframed.columns[[48, 49, 50]], axis=1, inplace=True)

Now we have a matrix that looks like this (just a snippet):

var1(t-12)  var2(t-12)  var3(t-12)  var4(t-12)  var1(t-11)  var2(t-11)  \
12        12.0        30.0  265.739990  327.799988        12.0        31.0   
13        12.0        31.0  265.739990  327.899994         1.0         1.0   
14         1.0         1.0  265.739990  327.899994         1.0         2.0   
15         1.0         2.0  265.739990  328.799988         1.0         3.0   
16         1.0         3.0  275.339996  331.899994         1.0         4.0

The next step is to normalize everything to between 0 and 1. sklearn's MinMaxScaler seems like the right tool for the job:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(reframed)

Now it is time to split our data into train and test sets. We have five years of data, so the model will train on the first two years and test on the last three. We then reshape the data sets into the 3D shape that Keras expects.

# split into train and test sets
values = scaled
n_train_days = 2 * 365
train = values[:n_train_days, :]
test = values[n_train_days:, :]
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

If we look at the shapes of our data, we can see that:

print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
(730, 1, 48) (730,) (1363, 1, 48) (1363,)

So we have 730 training samples, each with 48 inputs and one target, and 1363 samples in the testing set. Now we need to create our model. This is based on Dr. Brownlee's Multivariate Time Series Forecasting with LSTMs in Keras model.

We will define the LSTM with 50 neurons in the first hidden layer and 1 neuron in the output layer for predicting the retail price. The input shape will be 1 time step with 48 features.

We will use the Mean Absolute Error (MAE) loss function and the efficient Adam version of stochastic gradient descent.

The model will be fit for 50 training epochs with a batch size of 91.

Finally, we keep track of both the training and test loss during training by setting the validation_data argument in the fit() function. At the end of the run both the training and test loss are plotted.

from keras.models import Sequential
from keras.layers import Dense, LSTM

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=91,
                    validation_data=(test_X, test_y), verbose=2, shuffle=False)

Or, in fancy machine learning speak, here is the network diagram.
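
This is what the pydot and graphviz installs from Step 1 are for; a minimal sketch of generating the diagram:

from keras.utils import plot_model

# render the network architecture to an image (requires pydot and graphviz)
plot_model(model, to_file='model.png', show_shapes=True)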

Running the model, we get the following loss chart:
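
The chart comes straight from the history object that fit() returns (matplotlib assumed installed, as in Step 1):

from matplotlib import pyplot

# plot training vs. validation loss per epoch
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()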


The total RMSE of this model is 5.45, or about five and a half cents. Not too bad for a basic, not highly tuned model. To show the prediction, we basically unscale the results of the model (remember, the model predicts values scaled to [0, 1]):

from math import sqrt
import numpy as np
from sklearn.metrics import mean_squared_error

# make a prediction
yhat = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast: rebuild the 49-column matrix the scaler was fit on
inv_yhat = np.concatenate((test_X[:, 0:], yhat), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:, -1]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_X[:, 0:], test_y), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:, -1]
# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))

As we can see, it does a very good job.
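
The comparison plot is a sketch along these lines, reusing the pyplot import from above:

# plot actual vs. predicted retail prices over the test period
pyplot.plot(inv_y, label='actual')
pyplot.plot(inv_yhat, label='predicted')
pyplot.legend()
pyplot.show()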

If we plot them against each other, we see very little bias and some room for improvement (especially in the lower left).
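
A minimal sketch of that scatter, again just matplotlib:

# scatter of actual vs. predicted, with a 45-degree reference line
pyplot.scatter(inv_y, inv_yhat, s=2)
lims = [min(inv_y.min(), inv_yhat.min()), max(inv_y.max(), inv_yhat.max())]
pyplot.plot(lims, lims)
pyplot.xlabel('actual')
pyplot.ylabel('predicted')
pyplot.show()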

As I mentioned, this is a basic model; there are lots of ways to make it better, but it is not bad for a start.
