Using an MLP to Predict CPI

Deep learning is a popular tool in the data scientist's toolkit. While I generally prefer gradient boosted trees, knowing how to use multi-layer perceptron (MLP) models is certainly useful. The following link at Machine Learning Mastery goes into this topic in more detail; consider this post an application of that methodology to real-world data.

The first step is getting the CPI data. This should be easy, but it's not: the BLS insists on mixing semi-annual and annual numbers into the raw time series. FRED is better, but it's difficult to find the right FRED codes for some of the more obscure series. Luckily, Python's scripting abilities make cleaning the raw file straightforward.

#first we need to get the data from the BLS website to create our dataframe
#we are going to get core and some level 2 series for another project
import urllib.request

import numpy as np
import pandas as pd


tickers = {'CUSR0000SAF11':'Food at Home',
            'CUSR0000SEFV':'Food away from Home',
            'CUSR0000SACE':'Energy Commodities',
            'CUSR0000SEHF':'Energy Services',
            'CUSR0000SACL1E':'Core Goods',
            'CUSR0000SASLE':'Core Services',
            'CUUR0000SA0L1E':'Core'}


#used to drop the semi-annual and annual observations
VALID_PERIODS = ['M01', 'M02', 'M03', 'M04', 'M05', 'M06', 'M07', 'M08', 'M09', 'M10', 'M11', 'M12']


#get the data from the BLS website
URL = 'https://download.bls.gov/pub/time.series/cu/cu.data.0.Current'
content = urllib.request.urlopen(URL)


# create holder lists
series = []
periods = []
values = []


i = 0
#ignore the first line
first_line = True
for line in content:
    
    if (i%25000) == 0:
        print('{} rows processed'.format(i))
        
    i+=1
    
    if first_line:
        first_line = False
    else:
        #the BLS flat file is tab-separated
        tokens = line.split(b'\t')
        #series id is the first field
        series_id = tokens[0].decode("utf-8").strip()
        #now the date fields
        if series_id in tickers:
            year = tokens[1].decode("utf-8").strip()
            period = tokens[2].decode("utf-8").strip()
            value = float(tokens[3].decode("utf-8").strip())
            
            #this is to get rid of the semi-annual and annual periods
            if period in VALID_PERIODS:
                month = int(period[1:])
                #create a monthly pandas period
                period = pd.Period('{}-{}'.format(year, month), freq='M')
                series.append(series_id)
                periods.append(period)
                values.append(value)
                
                
data = pd.DataFrame({'SERIES': series,
                     'PERIOD': periods,
                     'VALUE': values})  
data = data.pivot(index='PERIOD', columns='SERIES', values='VALUE')
#just to make things pretty
data.columns = [tickers[x] for x in data.columns]

When run, it will process the raw CPI data and create a nice pandas DataFrame with the period as the index and the different indices as columns. There are a little more than 125,000 rows to process, so give it some time to run through all the data.
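To see what the pivot step does, here is a toy version with made-up values (the series IDs match two of the tickers above, but the numbers are hypothetical):

```python
import pandas as pd

# toy long-format rows like the ones collected from the BLS file
raw = pd.DataFrame({
    'SERIES': ['CUUR0000SA0L1E', 'CUUR0000SA0L1E',
               'CUSR0000SASLE', 'CUSR0000SASLE'],
    'PERIOD': [pd.Period('2015-01', freq='M'),
               pd.Period('2015-02', freq='M')] * 2,
    'VALUE': [240.1, 240.5, 300.2, 300.9],
})

# pivot: one row per month, one column per series
wide = raw.pivot(index='PERIOD', columns='SERIES', values='VALUE')
print(wide.shape)  # (2, 2)
```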

The next step is to transform the data into the X and y arrays for the model to use. The key here is that we use the prior 36 months of data to predict the next twelve. Machine Learning Mastery has the basic script to do this programmatically:

def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
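To see what the windowing produces, here is a quick sanity check on a toy sequence of ten integers (the function is repeated so the snippet runs on its own):

```python
import numpy as np

def split_sequence(sequence, n_steps_in, n_steps_out):
    # same sliding-window logic as above
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        if out_end_ix > len(sequence):
            break
        X.append(sequence[i:end_ix])
        y.append(sequence[end_ix:out_end_ix])
    return np.array(X), np.array(y)

seq = np.arange(10)
X, y = split_sequence(seq, 3, 2)
print(X.shape, y.shape)  # (6, 3) (6, 2)
print(X[0], y[0])        # [0 1 2] [3 4]
```

Each input window of three steps is paired with the two steps that follow it, and the window slides forward one step at a time.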

Now, because I am using real data, we have to hold some of it out: I train on data through 2014 and test on data from 2014 onward.

raw_seq = data.Core[:'2014'].values
test_seq = data.Core['2014':].values
# choose a number of time steps
n_steps_in, n_steps_out = 36, 12
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
test_X, test_y = split_sequence(test_seq,  n_steps_in, n_steps_out)

print(test_X[-1], test_y[-1])

[239.413 239.248 238.775 239.248 240.083 241.067 241.802 242.119 242.354
 242.436 242.651 243.359 243.985 244.075 243.779 244.528 245.68  246.358
 246.992 247.544 247.794 247.744 248.278 248.731 249.218 249.227 249.134
 250.083 251.143 251.29  251.642 251.835 252.014 251.936 252.46  252.941] [253.638 253.492 253.558 254.638 255.783 256.61  257.025 257.469 257.697
 257.867 258.012 258.429]

So now our input is a series of 36 months of core CPI index values, used to predict the next 12. Using Keras as the front end to TensorFlow, it is easy to create the model.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1024, activation='relu', input_dim=n_steps_in))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=2000, verbose=0)
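Under the hood, this architecture is just two matrix multiplications with a ReLU in between. A pure-NumPy sketch of the forward pass, with random weights standing in for the trained parameters, shows how a 36-month window becomes a 12-month prediction:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps_in, n_steps_out = 36, 12

# random weights standing in for the trained parameters
W1 = rng.normal(size=(n_steps_in, 1024))
b1 = np.zeros(1024)
W2 = rng.normal(size=(1024, n_steps_out))
b2 = np.zeros(n_steps_out)

x = rng.normal(size=(1, n_steps_in))   # one 36-month input window
h = np.maximum(x @ W1 + b1, 0.0)       # Dense(1024, activation='relu')
yhat = h @ W2 + b2                     # Dense(12), the 12-month forecast
print(yhat.shape)  # (1, 12)
```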

So how did the model do? Here are the most recent twelve months, predicted and actual. Remember, it made these predictions using only data through September 2017.

Not too bad. This is a trivially simple model for the task; going forward we will use multiple series as input, not just prior core values. Even so, it looks like MLP models are useful for modeling CPI.
