Time Series Forecasting On Data

Lakitha H.

Published Sep 24, 2019

What is a Time series?

A time series is a sequence of data points recorded in time order, usually at successive, equally spaced points in time. A time series can be either stationary or non-stationary; forecasting methods are widely applied to non-stationary data such as stock prices, weather, retail sales and many more.

What is Time Series Forecasting?

It is the use of a model to predict future values based on previously observed values. It is one of the most important techniques for getting a glimpse of the future.

Let's Get Started

Apple Inc close stock prices from 2015-2019

In this article we will discuss the stages you need to follow to implement and validate a time series forecast. As an example, we will try to predict future values of the Apple Inc. closing stock price.

Finding Suitable Time Series Forecasting Method

This mostly depends on the characteristics of the data-set you are using. The main factors are:

Type of data: univariate or multivariate (a single variable or multiple variables)
Trend behavior of the data (moving up or moving down)
Seasonality of the data (patterns that repeat at regular intervals)

To apply your model successfully, you need to make the data stationary if it is not already (the mean and variance should stay constant over time). Some models have differencing built in (for example the ARIMA model, which is discussed below).

Furthermore, stationarity can be checked using a test called the Dickey-Fuller test.

Time Series Forecasting Methods

The models in this section are described using the following parameters:

p: Trend autoregression order.
d: Trend difference order.
q: Trend moving average order.
P: Seasonal autoregressive order.
D: Seasonal difference order.
Q: Seasonal moving average order.
m: The number of time steps for a single seasonal period.

Importantly, when selecting parameters, the m parameter influences the P, D, and Q parameters. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.

Autoregression (AR(p))-The autoregression (AR) method models the next step in the sequence as a linear function of the observations at prior time steps. Suitable for univariate time series without trend and seasonal components.
Moving Average (MA(q))-The moving average (MA) method models the next step in the sequence as a linear function of the residual errors from a mean process at prior time steps. Suitable for univariate time series without trend and seasonal components.
Autoregressive Moving Average (ARMA(p,q))-The Autoregressive Moving Average (ARMA) method models the next step in the sequence as a linear function of the observations and residual errors at prior time steps. It is formed by combining the AR and MA models described above. Suitable for univariate time series without trend and seasonal components.
Autoregressive Integrated Moving Average (ARIMA(p,d,q))-The Autoregressive Integrated Moving Average (ARIMA) method models the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps. It combines AR and MA with a differencing pre-processing step called integration (I), which makes the data stationary if it is not already. Suitable for univariate time series with trend and without seasonal components.
Seasonal Autoregressive Integrated Moving-Average (SARIMA(p,d,q)(P,D,Q)m)-This method models the next step in the sequence as a linear function of the differenced observations, errors, differenced seasonal observations, and seasonal errors at prior time steps. Suitable for univariate time series with trend and/or seasonal components.
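To make the "linear function of the observations at prior time steps" idea concrete, here is a small sketch that simulates an AR(1) process with NumPy and recovers its coefficient by least squares. The coefficient 0.7, the series length and the helper name fit_ar1 are assumptions chosen for illustration; a real analysis would normally rely on a library such as statsmodels.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate AR(1): y[t] = phi * y[t-1] + noise  (phi = 0.7 is an illustrative choice)
true_phi = 0.7
n = 2000
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + noise[t]

def fit_ar1(series):
    """Estimate the AR(1) coefficient by regressing y[t] on y[t-1] (no intercept)."""
    prev, curr = series[:-1], series[1:]
    return float(np.dot(prev, curr) / np.dot(prev, prev))

estimated_phi = fit_ar1(y)

# One-step-ahead forecast: a linear function of the last observation
next_value = estimated_phi * y[-1]
print("estimated phi:", round(estimated_phi, 3))
print("forecast for the next step:", round(next_value, 3))
```

With enough data the estimate should land close to the coefficient used in the simulation. The MA, ARMA, ARIMA and SARIMA models extend the same idea with error terms, differencing and seasonal lags.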

Now that you have a basic understanding of time series forecasting and analysis methods, let's predict the future stock prices of Apple Inc. (AAPL).

Steps to create your Time Series Forecasting.

The Apple stock data (AAPL.csv) we will be using can be downloaded from the link below. The data ranges from 2014 to 2019.

Pre-Processing/Model Selection

Once you have downloaded the data, open it and you will see a data structure like the one shown in the diagram. Now you need to rearrange the data and plot Close against Date. The code below accomplishes these tasks.

Code:

In this code we build a new series containing only the Date and the Close price of the stock, averaged on a weekly basis.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

# Load the daily prices and use the Date column as a DatetimeIndex
df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")
print(df.head())

# Keep only the Close price and average it over each week
sampled_data = df['Close'].resample('W').mean()

# A label is needed so that plt.legend() has something to show
plt.plot(sampled_data, label='Weekly mean Close')
plt.title('APPLE Inc Stock close Price Prediction')
plt.xlabel('Date')
plt.ylabel('AAPL -Apple Stock prices (Close)')
plt.legend()
plt.show()

Output:

Looking at the plot, the best forecasting method for the closing stock prices is SARIMA, since the series is non-stationary and consists of trends with seasonal components.

The Python library we will be using for time series forecasting is statsmodels. Please follow the link below for several time series examples.

Using this Python module, you have the option to decompose your data into several components to get a visual idea of its structure. Run the code below to obtain the decomposition plot.

Code:

decomposition = sm.tsa.seasonal_decompose(sampled_data, model='additive')
fig = decomposition.plot()
plt.show()

Output:

Y axis explanation:

Observed-The original stock price plot.

Trend-It shows the part of the stock data responsible for the trend and its direction.

Seasonal-The repeating patterns in the data.

Residual-What remains of the data after removing the trend and seasonal components.

After observing closely, we can conclude that this data-set consists of trend and seasonal components.

Now you need to find the best parameters for your model to deliver optimal results. Since we chose SARIMA(p,d,q)(P,D,Q) as the model for this data-set, we need to find the best order values (p,d,q) and seasonal order values (P,D,Q).

Parameter Selection For The Chosen Model

There are many ways to accomplish this, but in this article we will focus on using "Grid-search" to find the parameters that yield the best performance of this model.

What is Grid-search method?

It builds a model for every combination of the specified hyper-parameters and evaluates each model to find the parameters which provide the best performance.

The code below is a simple Grid-search algorithm which we can easily implement.

Code(part A):

Generating the combinations of order values that you need to create different models.

import itertools  # import this at the beginning of the code

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print(seasonal_pdq)

print(pdq)

Output:

[(0, 0, 0, 12), (0, 0, 1, 12), (0, 1, 0, 12), (0, 1, 1, 12), (1, 0, 0, 12), (1, 0, 1, 12), (1, 1, 0, 12), (1, 1, 1, 12)]

[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

Code(part B):

Evaluate a model for each combination of values generated above. The SARIMA model can be easily imported from the statsmodels Python module, which we will be using below. This section will take a while to run depending on your range of parameters and your PC hardware.

Evaluation of the model-The relative quality of statistical models for a given set of data is estimated using an estimator called AIC (Akaike information criterion). A lower AIC means a better model.
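For reference, the standard definition of AIC is given below, where k is the number of estimated parameters and \hat{L} is the maximised value of the model's likelihood. The symbols k and \hat{L} are notation introduced here, not taken from the article.

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```

The first term penalises models with more parameters and the second rewards a better fit, so the lowest AIC favours a model that fits well without being needlessly complex.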

In the code below, the AIC value of each model is calculated and recorded in a list. Finally, we choose the model with the lowest AIC as our best model.

parameters = []
aic_array = []

for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(sampled_data,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit(disp=0)
        except Exception:
            # Skip combinations that fail to fit
            continue
        print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        parameters.append(str(param) + 'x' + str(param_seasonal))
        aic_array.append(results.aic)

print(parameters)
print(aic_array)

# The model with the lowest AIC is treated as the best one
best_parameters = parameters[aic_array.index(min(aic_array))]
print("Best parameters with lowest AIC - " + best_parameters)

For this data-set, the best parameters, obtained at the lowest AIC, are SARIMA(0, 1, 1)x(0, 1, 1, 12).

Fitting The Model And Generating The Stat-model Diagnostic Plots

In this step we create a model with the parameters obtained above and fit it to our stock price data (sampled_data). As good practice, we should always run model diagnostics to investigate any unusual behavior.

Code: Model creation/Data fitting/Diagnostic plot

mod = sm.tsa.statespace.SARIMAX(sampled_data,
                                order=(0, 1, 1),
                                seasonal_order=(0, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit(disp=0)
# Print the coefficient table of the fitted model
print(results.summary().tables[1])
results.plot_diagnostics(figsize=(16, 8))

plt.show()

Output:

Diagnostic plot

Looking at the plot, the model is not perfectly fitted; however, the Normal Q-Q plot suggests that the residuals are nearly normally distributed. (The more points that lie on the line in the Q-Q plot, the closer the residuals are to a normal distribution.)

Validating The Forecasts

One step ahead forecast

With the code below you can obtain the one-step-ahead forecast from 2017-12-31 to 2019-09-22.

pred = results.get_prediction(start='2017-12-31', end='2019-09-22', dynamic=False)
pred_ci = pred.conf_int()
print(pred_ci)

ax = sampled_data['2014':].plot(label='Actual Stock Close Plot')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.9, figsize=(14, 7))
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)

plt.title('APPLE Inc Stock close Price Prediction')
ax.set_xlabel('Date')
ax.set_ylabel('AAPL -Apple Stock prices (Close)')
plt.legend()
plt.show()

# Compare the predictions only with the actual values inside the forecast window
mse = ((pred.predicted_mean - sampled_data['2017-12-31':]) ** 2).mean()
print('The Mean Squared Error is {}'.format(round(mse, 2)))
print('The Root Mean Squared Error is {}'.format(round(np.sqrt(mse), 2)))

Output:

The Mean Squared Error is 26.39

The Root Mean Squared Error is 5.14

Mean Squared Error-The average squared difference between the estimated values and the actual values.

Root Mean Squared Error-The square root of the mean squared error. It is a frequently used measure of the differences between the values predicted by a model or an estimator and the values observed.
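The two definitions can be worked through by hand on a tiny example. The numbers below are hypothetical values chosen only to illustrate the formulas; they are not taken from the Apple forecast.

```python
import numpy as np

# Hypothetical actual and predicted values, chosen only to illustrate the formulas
actual = np.array([100.0, 102.0, 105.0, 103.0])
predicted = np.array([98.0, 103.0, 104.0, 107.0])

errors = predicted - actual     # [-2, 1, -1, 4]
mse = np.mean(errors ** 2)      # (4 + 1 + 1 + 16) / 4 = 5.5
rmse = np.sqrt(mse)             # back in the same units as the data

print("MSE: {:.2f}".format(mse))    # MSE: 5.50
print("RMSE: {:.2f}".format(rmse))  # RMSE: 2.35
```

Because RMSE is expressed in the same units as the data, it is usually easier to interpret than MSE: here the predictions are off by roughly 2.35 price units on average.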

If you carefully observe the plot above, you will notice that the one-step-ahead forecast almost superimposes on the actual stock data. Furthermore, the mean squared error and root mean squared error are low. The grey band in the plot shows the confidence interval of each prediction; the narrower the band, the higher the confidence in the prediction. In my opinion, with these results, this is a well-fitted model.

Few steps ahead forecast

In this section we will try to predict a few steps ahead (20 steps, meaning 20 weeks), not just one step.

Code:

data_for_forcast = sampled_data[0:int(len(sampled_data) * 0.9)]
mod = sm.tsa.statespace.SARIMAX(data_for_forcast,
                                order=(0, 1, 1),
                                seasonal_order=(0, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit(disp=0)

# Forecast the next 20 weeks beyond the end of the training data
forcast_pred = results.get_forecast(steps=20)
print(forcast_pred.predicted_mean)
pred_ci = forcast_pred.conf_int()
print(pred_ci)

ax = data_for_forcast.plot(label='Actual Stock Close Plot')
forcast_pred.predicted_mean.plot(ax=ax, label='20-step ahead Forecast', alpha=.9, figsize=(14, 7))
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)

plt.title('APPLE Inc Stock close Price Prediction')
ax.set_xlabel('Date')
ax.set_ylabel('AAPL -Apple Stock prices (Close)')
plt.legend()
plt.show()

Output:

If you look closely, the confidence in the prediction decreases from start to end (over the 20 steps). So we can conclude that, for this data-set, trying to predict many steps ahead from past values does not deliver accurate results.
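One way to see why the band widens is to look at a much simpler model than the SARIMA model fitted above. For a random walk whose steps have standard deviation sigma, the h-step forecast error has standard deviation sigma * sqrt(h), so the 95% interval grows with the square root of the horizon. The sketch below assumes that simplified model, and sigma = 5.0 is a hypothetical number rather than an estimate from the Apple data.

```python
import numpy as np

sigma = 5.0                    # hypothetical one-step error standard deviation (assumption)
horizons = np.arange(1, 21)    # 1 to 20 steps ahead, matching the 20-week forecast
z = 1.96                       # normal quantile for a 95% interval

# Width of the 95% interval at each horizon for a random walk: 2 * z * sigma * sqrt(h)
interval_width = 2 * z * sigma * np.sqrt(horizons)

for h in (1, 5, 10, 20):
    print("h = {:2d}: interval width = {:.1f}".format(h, interval_width[h - 1]))
```

Under this assumption the interval at 20 steps is about 4.5 times wider than at one step, which mirrors the widening grey band in the plot.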

Finally, another article is coming soon that explains the time series methods discussed above from a mathematical point of view.
