Forecasting Demands of Multivariate TimeSeries Dataset using LSTM, TF2.0

Vighnesh Tiwari

Published Jul 7, 2021

Hi all, this blog is a special one. Many of us have worked with univariate time series datasets, but when it comes to multivariate data we end up hunting across different platforms for leads. This post aims to be a one-stop shop for learning and implementing multivariate time series forecasting using LSTM and TF2.0. Runnable code and references are added below.

1. Dataset used: https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset

2. Data description: The dataset has 10 columns, including the index, which is a datetime stamp.

timestamp - index as datetime value
cnt - count of new bike shares
t1 - actual temperature in Celsius
t2 - "feels like" temperature in Celsius
hum - humidity in %
wind_speed - wind speed in km/h
weather_code - weather category
is_holiday - holiday-1, non-holiday-0
is_weekend - weekend-1, weekday-0
season - 0-spring, 1-summer, 2-fall, 3-winter

3. Feature engineering to create new features from the datetime index.

More engineered features will be added here soon.
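As a rough illustration of this step, the sketch below derives a few calendar features from a DatetimeIndex with pandas. The helper name `add_datetime_features`, the chosen feature names (`hour`, `day_of_month`, `day_of_week`, `month`) and the tiny synthetic frame are assumptions made for the example; the notebook may build a different set.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with calendar features taken from its DatetimeIndex.

    The feature set here is an assumption for illustration only.
    """
    out = df.copy()
    out["hour"] = out.index.hour
    out["day_of_month"] = out.index.day
    out["day_of_week"] = out.index.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out.index.month
    return out


# Synthetic stand-in for the bike sharing data (made-up counts).
demo = pd.DataFrame(
    {"cnt": [10, 20, 30, 40]},
    index=pd.to_datetime(
        [
            "2015-01-04 00:00",
            "2015-01-04 01:00",
            "2015-01-04 02:00",
            "2015-01-04 03:00",
        ]
    ),
)
features = add_datetime_features(demo)
print(features)
```

With the real CSV, the `timestamp` column would first need to be parsed and set as the index, for example with `pd.read_csv(path, parse_dates=["timestamp"], index_col="timestamp")`, where `path` is wherever the file was downloaded.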

4. Data analysis.

When the data is grouped by day, a seasonal pattern is visible. Let's see what grouping by month shows.

The monthly view is consistent with the daily one, so it is reasonable to expect that the features created from the datetime index will be strongly correlated with the target value.

I have added a few more analyses in my notebook. Among the insights: demand is high in the summer months, and on weekdays demand is higher in the morning and again later in the afternoon.
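The grouping described above can be sketched with a pandas `groupby` on attributes of the datetime index. The helper `mean_demand_by` and the synthetic two-day frame are assumptions for illustration; the notebook's plots are built on the real dataset and may aggregate differently.

```python
import numpy as np
import pandas as pd


def mean_demand_by(df: pd.DataFrame, level: str) -> pd.Series:
    """Average `cnt` grouped by a calendar attribute of the DatetimeIndex.

    `level` is any DatetimeIndex attribute such as "hour", "day" or "month".
    """
    keys = getattr(df.index, level)
    return df.groupby(keys)["cnt"].mean()


# Synthetic data: two days of hourly rows where demand equals 10 * hour.
idx = pd.date_range("2015-01-05", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"cnt": np.asarray(idx.hour) * 10}, index=idx)

by_hour = mean_demand_by(demo, "hour")
by_month = mean_demand_by(demo, "month")
print(by_hour.head())
print(by_month)
```

Plotting `by_hour` or `by_month` (for example with `by_hour.plot()`) gives the kind of seasonal profile discussed in this section.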

5. Preprocessing and train-test data preparation.

The data is split into train and test sets in an 80:20 ratio.
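A minimal sketch of such a split is shown below. It assumes the split is chronological (the first 80% of rows for training, the last 20% for testing), which is the usual choice for time series because it keeps future rows out of the training set; the helper name `chronological_split` is hypothetical.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_fraction: float = 0.8):
    """Split df by position: the first rows go to train, the rest to test."""
    split_at = int(len(df) * train_fraction)
    return df.iloc[:split_at], df.iloc[split_at:]


# Synthetic frame with ten daily rows.
demo = pd.DataFrame(
    {"cnt": range(10)},
    index=pd.date_range("2015-01-04", periods=10, freq="D"),
)
train, test = chronological_split(demo)
print(len(train), len(test))
```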

Features are scaled with RobustScaler, which uses statistics that are robust to outliers. This scaler subtracts the median and scales the data according to a quantile range (by default the IQR: interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
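The sketch below shows what this scaling does on a toy column with one outlier, using scikit-learn's `RobustScaler`. Fitting on the training rows only and reusing the fitted scaler on the test rows is an assumption about the workflow, as are the toy values; which columns the notebook scales is not shown here.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy single-feature data; 100.0 plays the role of an outlier.
train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
test = np.array([[3.0], [5.0]])

scaler = RobustScaler()  # default quantile_range=(25.0, 75.0), i.e. the IQR
train_scaled = scaler.fit_transform(train)  # learn median and IQR on train only
test_scaled = scaler.transform(test)        # reuse the train statistics

# The fitted statistics: median and IQR of the training column.
print(scaler.center_, scaler.scale_)
print(train_scaled.ravel())
print(test_scaled.ravel())
```

Here the median is 3 and the IQR is 2, so each value becomes `(x - 3) / 2`. The outlier still maps to a large value, but it does not distort the scale applied to the other rows the way a mean and standard deviation would.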

A helper method then creates the input sequences for our model from the train and test dataframes.
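One common way to build such sequences is a sliding window: each sample holds `time_steps` consecutive rows of features, and its label is the target value of the row right after the window. The function name `create_sequences` and the default window length below are assumptions; the notebook's helper may differ in name and window size.

```python
import numpy as np


def create_sequences(features, target, time_steps=10):
    """Build (samples, time_steps, n_features) windows and next-step targets."""
    features = np.asarray(features)
    target = np.asarray(target)
    Xs, ys = [], []
    for i in range(len(features) - time_steps):
        Xs.append(features[i:i + time_steps])  # rows i .. i+time_steps-1
        ys.append(target[i + time_steps])      # value right after the window
    return np.array(Xs), np.array(ys)


# Synthetic data: 10 rows, 2 features, target equal to the row number.
demo_features = np.arange(20).reshape(10, 2)
demo_target = np.arange(10)
X, y = create_sequences(demo_features, demo_target, time_steps=3)
print(X.shape, y.shape)
```

The resulting 3D array has the `(samples, time_steps, n_features)` layout that Keras LSTM layers expect as input.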

6. Modelling using Bidirectional LSTM.

Let's go through it line by line. The first layer, which also acts as the input layer, takes the input shape from Xtrain and has 128 LSTM units. Because another LSTM layer follows it, return_sequences is set to True so that it passes on the full sequence. Next comes another LSTM layer with 64 units. A Dropout layer is then added to reduce overfitting, and finally a Dense layer produces the output.

The model is trained for 30 epochs with a batch size of 32.
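The architecture described above can be sketched in Keras roughly as follows. Several details are assumptions not stated in the text: the dropout rate of 0.2, wrapping the second LSTM in `Bidirectional` as well, the MSE loss, the Adam optimizer, and the 10% validation split. The helper names `build_model` and `train` are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_model(time_steps, n_features, dropout_rate=0.2):
    """Stacked bidirectional LSTM regressor (hyper-parameters partly assumed)."""
    model = keras.Sequential([
        keras.Input(shape=(time_steps, n_features)),
        # return_sequences=True so the next LSTM receives the whole sequence
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(dropout_rate),  # assumed rate
        layers.Dense(1),               # single regression output
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")  # assumed choices
    return model


def train(model, X_train, y_train, epochs=30, batch_size=32):
    """Fit without shuffling so the time order of the windows is preserved."""
    return model.fit(
        X_train, y_train,
        epochs=epochs, batch_size=batch_size,
        validation_split=0.1,  # assumed; last 10% of the training windows
        shuffle=False, verbose=0,
    )


model = build_model(time_steps=10, n_features=5)
model.summary()
```

Calling `train(model, X_train, y_train)` with the sequence arrays then runs the 30 epochs at batch size 32 mentioned above and returns the Keras `History` object, whose `history["loss"]` and `history["val_loss"]` can be plotted to check for overfitting.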

7. Let's predict on the test set and calculate the RMSE score against y_test.
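RMSE is the square root of the mean squared difference between the true and predicted values. The helper below is a minimal NumPy sketch with made-up numbers; the name `rmse` is an assumption. If the target was scaled before training, both arrays would need to be mapped back with the scaler's `inverse_transform` first so that the score is in bike counts.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same length."""
    # ravel() avoids accidental broadcasting between shapes (n,) and (n, 1),
    # which is the shape model.predict returns for a Dense(1) output.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up values: errors are 1, 0 and -2, so the RMSE is sqrt(5 / 3).
score = rmse([3.0, 5.0, 7.0], [[2.0], [5.0], [9.0]])
print(score)
```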

8. Some key takeaways

a. I have done only minimal feature engineering. It can be extended, and new features can be built from existing ones such as wind speed, humidity and temperature. If suitable mathematical formulas exist for combining them, they may help here.

b. On the modelling side, further experiments are possible: tuning hyper-parameters, adding more layers (which does not guarantee a better-performing model), changing the dropout values, etc.

9. References :

Code on Kaggle - https://www.kaggle.com/halfbloodprince16/predictingdemand-v1
https://paperswithcode.com/method/bilstm
https://towardsdatascience.com/lstm-and-bidirectional-lstm-for-regression-4fddf910c655
https://stackoverflow.com/questions/40331510/how-to-stack-multiple-lstm-in-keras
https://towardsdatascience.com/demand-prediction-with-lstms-using-tensorflow-2-and-keras-in-python-1d1076fc89a0

Ejigu Tefera (PhD) 4y

Thank you for your useful post. How can I map multivariate time series data containing multiple observations per time point (for example, many customers' item consumption values at the same time point) to input training data and target data for LSTM model training?
