Using an MLP to Predict CPI
Deep learning is a popular tool in the data scientist's toolkit. While I generally prefer gradient boosted trees, knowing how to use multi-layer perceptron (MLP) models is certainly useful. The following link at Machine Learning Mastery goes into this topic in more detail. Consider this an application of that site's methodology to real-world data.
The first step is getting the CPI data. This should be easy, but it's not: the BLS insists on putting semiannual and annual numbers in the raw time series. FRED is better, but it's difficult to find the right FRED codes for some of the more obscure series. Luckily, Python is here, and its scripting abilities are fantastic.
# first we need to get the data from the BLS website to create our dataframe
# we are going to get core and some level 2 series for another project
import urllib.request

import numpy as np
import pandas as pd

tickers = {'CUSR0000SAF11': 'Food at Home',
           'CUSR0000SEFV': 'Food away from Home',
           'CUSR0000SACE': 'Energy Commodities',
           'CUSR0000SEHF': 'Energy Services',
           'CUSR0000SACL1E': 'Core Goods',
           'CUSR0000SASLE': 'Core Services',
           'CUUR0000SA0L1E': 'Core'}
# monthly periods only; this drops the semiannual and annual points
VALID_PERIODS = ['M{:02d}'.format(m) for m in range(1, 13)]
# get the data from the BLS website
URL = 'https://download.bls.gov/pub/time.series/cu/cu.data.0.Current'
content = urllib.request.urlopen(URL)
# create holder lists
series = []
periods = []
values = []
i = 0
# skip the header line
first_line = True
for line in content:
    if (i % 25000) == 0:
        print('{} rows processed'.format(i))
    i += 1
    if first_line:
        first_line = False
    else:
        # the BLS file is tab-separated
        tokens = line.split(b'\t')
        # the series id is the first field
        series_id = tokens[0].decode('utf-8').strip()
        # now the date stuff
        if series_id in tickers:
            year = tokens[1].decode('utf-8').strip()
            period = tokens[2].decode('utf-8').strip()
            value = float(tokens[3].decode('utf-8').strip())
            # keep only the monthly periods
            if period in VALID_PERIODS:
                month = int(period[1:])
                # create a pandas monthly period
                period = pd.Period('{}-{}'.format(year, month), freq='M')
                series.append(series_id)
                periods.append(period)
                values.append(value)
data = pd.DataFrame({'SERIES': series,
                     'PERIOD': periods,
                     'VALUE': values})
data = data.pivot(index='PERIOD', columns='SERIES', values='VALUE')
# just to make things pretty
data.columns = [tickers[x] for x in data.columns]
When run, it processes the raw CPI data and creates a nice pandas DataFrame with the period as the index and the different indices as columns. There are a little more than 125,000 rows to process, so give it a little time to run through all the data.
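The long-to-wide pivot at the end of that script is the key reshaping step; here is a minimal sketch of what it does, using a toy long-format frame with made-up series names and values rather than the real BLS rows:

```python
import pandas as pd

# toy long-format rows shaped like the ones collected above (values are made up)
long_df = pd.DataFrame({
    'SERIES': ['A', 'A', 'B', 'B'],
    'PERIOD': [pd.Period('2015-01', freq='M'), pd.Period('2015-02', freq='M'),
               pd.Period('2015-01', freq='M'), pd.Period('2015-02', freq='M')],
    'VALUE': [100.0, 100.5, 200.0, 201.0],
})

# one row per period, one column per series
wide = long_df.pivot(index='PERIOD', columns='SERIES', values='VALUE')
print(wide.shape)  # (2, 2)
```

Each unique PERIOD becomes an index entry and each unique SERIES becomes a column, which is exactly the layout the modeling code below expects.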
The next step is to transform the data into the X and y data sets for the model to use. The key here is that we are going to use the prior 36 months of data to predict the next twelve. Machine Learning Mastery had the basic script to do this in a programmatic way.
def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
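The windowing is easy to see on a toy sequence; this self-contained check repeats the helper so the snippet runs on its own, with illustrative numbers rather than CPI data:

```python
import numpy as np

def split_sequence(sequence, n_steps_in, n_steps_out):
    # same helper as above: slide a window of n_steps_in inputs
    # followed by n_steps_out targets along the sequence
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return np.array(X), np.array(y)

X_demo, y_demo = split_sequence(list(range(1, 11)), n_steps_in=3, n_steps_out=2)
print(X_demo[0], y_demo[0])          # [1 2 3] [4 5]
print(X_demo.shape, y_demo.shape)    # (6, 3) (6, 2)
```

A sequence of length 10 yields 6 samples, since each sample consumes 3 input points plus 2 target points and the window slides one step at a time.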
Now, because I am using real data, it has to be split into train and test sets. I train on data through 2014; the test sequence starts in 2014 so that, once the 36-month input window is consumed, the first out-of-sample forecasts begin in 2017.
raw_seq = data.Core[:'2014'].values
test_seq = data.Core['2014':].values
# choose a number of time steps
n_steps_in, n_steps_out = 36, 12
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
test_X, test_y = split_sequence(test_seq, n_steps_in, n_steps_out)
print(test_X[-1], test_y[-1])
[239.413 239.248 238.775 239.248 240.083 241.067 241.802 242.119 242.354
242.436 242.651 243.359 243.985 244.075 243.779 244.528 245.68 246.358
246.992 247.544 247.794 247.744 248.278 248.731 249.218 249.227 249.134
250.083 251.143 251.29 251.642 251.835 252.014 251.936 252.46 252.941] [253.638 253.492 253.558 254.638 255.783 256.61 257.025 257.469 257.697
257.867 258.012 258.429]
So now we have as our input 36-month-long windows of core CPI index values, used to try and predict the next 12. Using Keras as the front end to TensorFlow, it is easy to create our model.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(1024, activation='relu', input_dim=n_steps_in))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=2000, verbose=0)
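Once fitted, the forecasts from model.predict(test_X) can be scored against test_y. A minimal sketch of the scoring step, using mean absolute error on small stand-in arrays (the numbers here are made up for illustration, not model output):

```python
import numpy as np

# stand-in arrays; in the post these would be model.predict(test_X) and test_y
pred = np.array([[253.0, 253.6, 254.1],
                 [254.2, 254.9, 255.5]])
actual = np.array([[253.6, 253.5, 253.6],
                   [254.6, 255.8, 256.6]])

# mean absolute error for each forecast horizon, and overall
mae_per_step = np.abs(pred - actual).mean(axis=0)
overall_mae = np.abs(pred - actual).mean()
print(mae_per_step)   # [0.5 0.5 0.8]
print(overall_mae)    # 0.6
```

Scoring per horizon is useful here because errors typically grow with the forecast distance, and a single aggregate number can hide that.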
So how did the model do? Well, here are the most recent twelve months, predicted and actual. Remember, it made these predictions with the data up to September 2017.
Not too bad. This is a trivially simple model for the task; going forward we will use multiple series as the input, not just prior core values. Still, it looks like MLP models are useful for modeling CPI.
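As a preview of that multivariate setup, one simple approach is to window each series separately and stack the input windows side by side as features, while still predicting core. A sketch with made-up arrays; the window sizes and series here are illustrative, not the final design:

```python
import numpy as np

n_steps_in, n_steps_out = 3, 2  # small for illustration; the post uses 36 and 12

# two made-up monthly series standing in for, say, Core and Core Goods
core = np.arange(10, dtype=float)
goods = np.arange(100, 110, dtype=float)

def windows(seq, n_in, n_out):
    # same sliding-window idea as split_sequence above
    X, y = [], []
    for i in range(len(seq) - n_in - n_out + 1):
        X.append(seq[i:i + n_in])
        y.append(seq[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

X_core, y_core = windows(core, n_steps_in, n_steps_out)
X_goods, _ = windows(goods, n_steps_in, n_steps_out)

# stack the two input windows side by side; the target is still core
X_multi = np.hstack([X_core, X_goods])
print(X_multi.shape, y_core.shape)  # (6, 6) (6, 2)
```

The input dimension of the first Dense layer would then be n_steps_in times the number of series, with everything else unchanged.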