Forecasting Demands of Multivariate TimeSeries Dataset using LSTM, TF2.0

Hi all. This post is a bit special: many of us have worked with univariate time series datasets, but when it comes to multivariate data we end up hunting across different platforms for leads. This post aims to be a one-stop shop for learning and implementing multivariate time series forecasting using LSTM in TF 2.0. Runnable code and references are added below.

1. Dataset Used: https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset


2. Data Description: The dataset has 10 columns, including the index, which is a datetime stamp value.

timestamp: index as datetime value
cnt: count of new bike shares
t1: actual temperature in Celsius
t2: "feels like" temperature in Celsius
hum: humidity in %
wind_speed: wind speed in km/h
weather_code: weather category
is_holiday: 1 for holiday, 0 for non-holiday
is_weekend: 1 for weekend, 0 for weekday
season: 0-spring, 1-summer, 2-fall, 3-winter


3. Feature Engineering: creating new features from the datetime index.


I'll be adding a few more engineered features here soon.
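As a minimal sketch of this step, assuming the dataframe is indexed by the parsed timestamp column, calendar features can be derived from the index with pandas. The helper name add_datetime_features and the new column names (hour, day_of_month, day_of_week, month) are illustrative assumptions, not necessarily the ones used in the notebook.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with calendar features taken from its DatetimeIndex."""
    out = df.copy()
    out["hour"] = out.index.hour
    out["day_of_month"] = out.index.day
    out["day_of_week"] = out.index.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out.index.month
    return out


# Tiny synthetic frame standing in for the Kaggle CSV. With the real file
# one would typically load it with something like:
#   pd.read_csv(<path to csv>, parse_dates=["timestamp"], index_col="timestamp")
index = pd.to_datetime(["2015-01-04 00:00", "2015-01-04 13:00", "2015-06-15 08:00"])
demo = pd.DataFrame({"cnt": [182, 1200, 3100]}, index=index)

features = add_datetime_features(demo)
print(features)
```

Tree-based models can use these integer features directly. For an LSTM they are usually scaled together with the other inputs.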

4. Data Analysis.


Here we can observe seasonality in the dataset when it is grouped by day. Let's see more when it is grouped by month.


This agrees with what we saw previously, so we can reasonably expect the features created from the datetime index to be strongly related to the target value.

I have added a few more analyses in my notebook, with insights such as: demand is high in the summer months, and on weekdays demand is higher in the morning and in the late afternoon.
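The grouped views discussed in this section can be sketched with a plain groupby on the target column. This is only an illustration on made-up numbers: the helper mean_demand_by and the month and hour columns are assumptions, standing in for whatever datetime features were actually engineered.

```python
import pandas as pd


def mean_demand_by(df: pd.DataFrame, key: str) -> pd.Series:
    """Average bike-share count (cnt) for each value of the grouping column."""
    return df.groupby(key)["cnt"].mean()


# Made-up rows, only to show the shape of the computation.
demo = pd.DataFrame(
    {
        "cnt": [100, 300, 200, 400],
        "month": [1, 1, 7, 7],
        "hour": [8, 17, 8, 17],
    }
)

by_month = mean_demand_by(demo, "month")
by_hour = mean_demand_by(demo, "hour")
print(by_month)
print(by_hour)
```

Plotting each resulting series (for example with its .plot() method) gives the kind of seasonal curve described above.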

5. Preprocessing and train-test data preparation.


Splitting the data into train and test sets in an 80:20 ratio.
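A minimal sketch of an 80:20 split follows. It assumes the split is chronological (the first 80% of rows for training, the last 20% for testing), which is the usual choice for time series so that the model is never evaluated on data older than what it was trained on. The helper name chronological_split is hypothetical.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_fraction: float = 0.8):
    """Split df by row order: the first part for training, the rest for testing."""
    split_at = int(len(df) * train_fraction)
    return df.iloc[:split_at], df.iloc[split_at:]


# Ten ordered rows standing in for the time-indexed dataset.
demo = pd.DataFrame({"cnt": range(10)})
train, test = chronological_split(demo)
print(len(train), len(test))
```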


Scaling features using statistics that are robust to outliers is the job of RobustScaler. This scaler removes the median and scales the data according to the quantile range (defaults to the IQR: interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
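To make the formula concrete, here is a small sketch on toy values with scikit-learn's RobustScaler. Each value x becomes (x - median) / IQR, so a single large outlier barely affects how the other values are scaled. The numbers are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature column with an obvious outlier (100.0).
train_values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # default quantile_range is (25.0, 75.0), i.e. the IQR
scaled = scaler.fit_transform(train_values)

# The same computation by hand: subtract the median, divide by the IQR.
median = np.median(train_values)
q1, q3 = np.percentile(train_values, [25, 75])
manual = (train_values - median) / (q3 - q1)

print(scaled.ravel())
print(manual.ravel())
```

In practice the scaler is fitted on the training split only and then applied to the test split with transform, so no test statistics leak into training.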


This method creates the input sequences for our model from the train and test dataframes.
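A sliding-window helper of this kind could look roughly like the sketch below. It assumes each sample is a window of time_steps consecutive rows of features and the label is the target value immediately after the window. The function name create_sequences and the window length are assumptions, since the post does not state them.

```python
import numpy as np


def create_sequences(features, target, time_steps=10):
    """Build (samples, time_steps, n_features) windows and next-step labels."""
    X, y = [], []
    for start in range(len(features) - time_steps):
        end = start + time_steps
        X.append(features[start:end])  # rows start .. end-1 form one window
        y.append(target[end])          # label is the value right after the window
    return np.array(X), np.array(y)


# Six time steps with two features each, and a matching target column.
features = np.arange(12, dtype=float).reshape(6, 2)
target = np.arange(6, dtype=float) * 10

X, y = create_sequences(features, target, time_steps=3)
print(X.shape, y)
```

The resulting three-dimensional array is the input layout that Keras LSTM layers expect: samples, time steps, features.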


6. Modelling using Bidirectional LSTM.


Let's understand it line by line. The first layer, which is also the input layer, takes the input shape from our Xtrain and has 128 LSTM units. To stack another LSTM layer on top of it, we set return_sequences to True. Next comes a second LSTM layer with 64 units, then a Dropout layer to reduce over-fitting, and finally a Dense layer, which is our output.
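The architecture described above can be sketched in Keras as follows. Several details are assumptions on my part because the text does not state them: that both LSTM layers are wrapped in Bidirectional, the dropout rate of 0.2, the MSE loss, the Adam optimizer, and the example input shape. The function name build_model is hypothetical.

```python
import tensorflow as tf


def build_model(time_steps: int, n_features: int, dropout_rate: float = 0.2):
    """Stacked bidirectional LSTM regressor: 128 units, 64 units, dropout, dense."""
    model = tf.keras.Sequential(
        [
            tf.keras.Input(shape=(time_steps, n_features)),
            # return_sequences=True passes the full sequence to the next LSTM.
            tf.keras.layers.Bidirectional(
                tf.keras.layers.LSTM(128, return_sequences=True)
            ),
            # The second LSTM returns only its final output.
            tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
            tf.keras.layers.Dropout(dropout_rate),
            tf.keras.layers.Dense(1),  # one predicted value per sample
        ]
    )
    model.compile(loss="mse", optimizer="adam")
    return model


# Example shape only: windows of 10 time steps with 5 features each.
model = build_model(time_steps=10, n_features=5)
model.summary()
```

Without return_sequences=True on the first layer, the second LSTM would receive a two-dimensional tensor and fail, which is why stacking requires it.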


Training it for 30 epochs with a batch size of 32.
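A training call with these settings might look like the sketch below. The 30 epochs and batch size of 32 come from the text. The validation_split of 0.1 and shuffle=False (to keep the windows in time order) are assumptions. A tiny stand-in model and random data are used here only so the example runs quickly on its own.

```python
import numpy as np
import tensorflow as tf


def train(model, X_train, y_train, epochs=30, batch_size=32):
    """Fit the model and return the Keras History object."""
    return model.fit(
        X_train,
        y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.1,  # assumption: hold out the last 10% for validation
        shuffle=False,         # assumption: keep chronological order
        verbose=0,
    )


# Random stand-in data: 64 windows of 10 time steps with 5 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 10, 5)).astype("float32")
y_demo = rng.normal(size=(64, 1)).astype("float32")

# Small stand-in model, not the architecture from the post.
demo_model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(10, 5)),
        tf.keras.layers.LSTM(8),
        tf.keras.layers.Dense(1),
    ]
)
demo_model.compile(loss="mse", optimizer="adam")

history = train(demo_model, X_demo, y_demo)
print(len(history.history["loss"]))
```

Plotting history.history["loss"] against history.history["val_loss"] is a quick way to check whether the model starts to over-fit before the last epoch.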

7. Let's predict on the test set and calculate the RMSE score against y_test.
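RMSE is the square root of the mean squared difference between the true and predicted values. The sketch below computes it with NumPy. It also assumes the target was scaled with its own RobustScaler and is mapped back to bike counts with inverse_transform before scoring; whether the notebook does this is not stated in the post. The hard-coded predictions stand in for the output of model.predict on the test windows.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two arrays of the same length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical scaler fitted on the training target only.
cnt_scaler = RobustScaler().fit(np.array([[100.0], [200.0], [300.0]]))

# Scaled ground truth and scaled predictions (stand-ins for real model output).
y_test_scaled = cnt_scaler.transform(np.array([[100.0], [300.0]]))
y_pred_scaled = cnt_scaler.transform(np.array([[110.0], [280.0]]))

# Map both back to the original units before scoring.
y_true = cnt_scaler.inverse_transform(y_test_scaled)
y_pred = cnt_scaler.inverse_transform(y_pred_scaled)

score = rmse(y_true, y_pred)
print(score)
```

Scoring in the original units makes the RMSE readable as "bike shares per time step" rather than as a number in scaled space.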



8. Some key takeaways

a. I have done only minimal feature engineering. It can be enhanced, and more new features can be built from variables like wind speed, humidity and temperature. There may be established mathematical formulas, if they exist, that can be used to build such features.

b. On the modelling side, other experiments are possible: tuning the hyper-parameters, adding more layers (which doesn't guarantee a better-performing model), changing the dropout values, and so on.

9. References:

Code on Kaggle - https://www.kaggle.com/halfbloodprince16/predictingdemand-v1
https://paperswithcode.com/method/bilstm
https://towardsdatascience.com/lstm-and-bidirectional-lstm-for-regression-4fddf910c655
https://stackoverflow.com/questions/40331510/how-to-stack-multiple-lstm-in-keras
https://towardsdatascience.com/demand-prediction-with-lstms-using-tensorflow-2-and-keras-in-python-1d1076fc89a0
         

Reader comment: Thank you for the useful post. How can multivariate time series data containing multiple observations at the same time point (for example, item-consumption values for many customers) be mapped to input training data and target data for LSTM model training?

Author: Vighnesh Tiwari