dbnlearn: An R package for Dynamic Bayesian Network Structure Learning, Parameter Learning and Forecasting

Introduction

In this article I present dbnlearn, my second R package (published on CRAN on 2020-07-30). It learns the structure of a Dynamic Bayesian Network from a univariate time series, learns its parameters, and produces forecasts. The package implements Dynamic Bayesian Networks with temporal windows, using collections of linear regressors for Gaussian nodes, based on the introductory texts of Korb and Nicholson (2010) <doi:10.1201/b10391> and Nagarajan, Scutari and Lèbre (2013) <doi:10.1007/978-1-4614-6446-4>.

Dynamic Bayesian Networks

A Dynamic Bayesian Network (DBN) is a Bayesian Network (BN) that relates variables to each other over adjacent time steps. It is often called a Two-Timeslice BN (2TBN) because, at any point in time T, the value of a variable can be calculated from the internal regressors and the immediately preceding value (time T-1). DBNs were developed by Paul Dagum in the early 1990s at Stanford University's Section on Medical Informatics (Paul Dagum, Adam Galper and Eric Horvitz, 1992, "Dynamic Network Models for Forecasting").

Dagum developed DBNs to unify and extend traditional linear state-space models such as Kalman filters, linear and normal forecasting models such as ARMA, and simple dependency models such as hidden Markov models into a general probabilistic representation and inference mechanism for arbitrary nonlinear and non-normal time-dependent domains (Paul Dagum, Adam Galper and Eric Horvitz, June 1991, "Temporal Probabilistic Reasoning: Dynamic Network Models for Forecasting").
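The two-timeslice idea described above can be written compactly. The block below is only a sketch in standard notation: the first line is the usual first-order factorization of a 2TBN, and the second is one way to express a Gaussian node with a temporal window of size n as a linear regression on its lagged parents. The symbols \beta_i, \sigma^2 and \mathrm{Pa}(X_t) are notation assumed here for illustration, not taken from the package documentation.

```latex
% Two-timeslice (first-order Markov) factorization of the joint distribution
P(X_{0:T}) = P(X_0) \prod_{t=1}^{T} P(X_t \mid X_{t-1})

% Gaussian node with a temporal window of size n (illustrative form):
% the parents of X_t are a subset of the lagged values chosen by structure learning
X_t \mid \mathrm{Pa}(X_t) \sim \mathcal{N}\Big(\beta_0 + \sum_{X_{t-i} \in \mathrm{Pa}(X_t)} \beta_i X_{t-i},\ \sigma^2\Big),
\qquad \mathrm{Pa}(X_t) \subseteq \{X_{t-1}, \dots, X_{t-n}\}
```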

[Figure: example of a Dynamic Bayesian Network, from the article "A Review on Video-Based Human Activity Recognition"]

[Figure: another example of a Dynamic Bayesian Network, from the article "Simultaneous Partitioned Sampling for Articulated Object Tracking"]


Example applied to Amazon stock prediction

Data Reading

The snippet below downloads the opening prices of Amazon's stock (AMZN) for the period from 2015 to 2020. The data come from Yahoo Finance, via the quantmod package.

library(quantmod)
library(dbnlearn)
library(bnviewer)
library(ggplot2)
library(MLmetrics)

amz <- getSymbols("AMZN", src = "yahoo", from = "2015-01-01", to = "2020-07-01", auto.assign = FALSE)


#Amazon Stock Time Series
ts <- amz$AMZN.Open

Pre-Processing and Data Separation

Next comes the pre-processing step, which transforms the series into a table of lagged values: each row holds the value at time t together with the values at times t-1 through t-n, where n is the size of the time window (the number of past steps). This example uses a time window of 30.

The dataset is then split chronologically: the first 70% for model training and the remaining 30% for testing.

#Time Series Preprocessing with time window = 30
X.ts = dbn.preprocessing(ts, window = 30)

#Define 70% Train and 30% Test Data Set
percent = 0.7
n = nrow(X.ts)

trainIndex <- seq_len(length.out = floor(x = percent * n))
X.ts.train <- X.ts[trainIndex,]
X.ts.test <- X.ts[-trainIndex,]
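To make the windowing step more concrete, here is a minimal base R sketch of how a series can be turned into a table of lagged columns. It is not the internal code of 'dbn.preprocessing'; the helper name make_window is hypothetical, and the column names simply mirror the X_t / X_t1 naming that appears in this article's code.

```r
# Illustrative only: build a lagged "time window" table with base R.
# make_window is a hypothetical helper, not part of dbnlearn.
make_window <- function(x, window) {
  # embed() returns one row per usable time step:
  # column 1 is x[t], column k + 1 is x[t - k]
  m <- embed(as.numeric(x), window + 1)
  colnames(m) <- c("X_t", paste0("X_t", seq_len(window)))
  as.data.frame(m)
}

# Toy series with six observations and a window of two past steps
toy <- c(10, 11, 13, 12, 15, 16)
w <- make_window(toy, window = 2)
print(w)
```

The first `window` observations have no complete history, so the resulting table has `length(x) - window` rows. With a window of 30, each row of the real dataset would carry the current opening price plus the 30 preceding ones.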

Learning the Structure of the Dynamic Bayesian Network and Visualization

The 'dbn.learn' function learns the network structure from the training samples, and the network is then visualized with the 'viewer' function of the bnviewer package.

#Dynamic Bayesian Network Structure Learning
ts.learning = dbn.learn(X.ts.train)

#Visualization
viewer(ts.learning,
       bayesianNetwork.width = "100%",
       bayesianNetwork.height = "100vh",
       bayesianNetwork.enabled.interactive.mode = TRUE,
       edges.smooth = FALSE,
       bayesianNetwork.layout = "layout_with_gem",
       node.colors = list(background = "#f4bafd",
                          border = "#2b7ce9",
                          highlight = list(background = "#97c2fc",
                                           border = "#2b7ce9")),
       
       clusters.legend.title = list(text = "Legend"),
       
       clusters.legend.options = list(
         
         list(label = "Target (t)",
              shape = "icon",
              icon = list(code = "f111", size = 50, color = "#e91e63")),
         
         list(label = "Time Window (t-n)",
              shape = "icon",
              icon = list(code = "f111", size = 50, color = "#f4bafd"))
       ),
       
       clusters = list(
         list(label = "Target",
              shape = "icon",
              icon = list(code = "f111", color = "#e91e63"),
              nodes = list("X_t")),
         list(label = "Time Window (t-n)",
              shape = "icon",
              icon = list(code = "f111", color = "#f4bafd"),
              nodes = list("X_t1"))
       )
       
)

Visualization of the Dynamic Bayesian Network.


Parameter Learning

With the network structure in place, the parameters are learned by maximum likelihood estimation.

#Dynamic Bayesian Network Fit
ts.fit = dbn.fit(ts.learning, X.ts.train)
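For a Gaussian node modeled as a linear regression on its parents, the maximum likelihood estimates of the coefficients coincide with ordinary least squares. The sketch below illustrates that fact on a synthetic series with two lagged parents; it is an assumption-laden illustration (simulated data, fixed parent set, hypothetical variable names), not a description of what 'dbn.fit' does internally.

```r
# Illustrative only: ML estimation for one linear Gaussian node.
set.seed(42)

# Synthetic stationary AR(1) series (assumed data, not the Amazon prices)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))

# Lagged table: X_t with parents X_t1 and X_t2
m <- embed(x, 3)
d <- data.frame(X_t = m[, 1], X_t1 = m[, 2], X_t2 = m[, 3])

# Least-squares fit, which is also the ML fit for the coefficients
fit <- lm(X_t ~ X_t1 + X_t2, data = d)

# Same coefficients from the normal equations (X'X) beta = X'y
X <- cbind(1, d$X_t1, d$X_t2)
beta <- solve(crossprod(X), crossprod(X, d$X_t))

# ML estimate of the residual variance divides by n (not n - p)
sigma2_mle <- mean(residuals(fit)^2)

print(coef(fit))
print(sigma2_mle)
```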

Prediction

Now we can generate predictions with the fitted network and validate them against the test data.

#Predict values
prediction = dbn.predict(ts.fit, X.ts.test)

#Plot Real vs Predict
real = X.ts.test[, "X_t"]

df.validation = data.frame(list(real = real, prediction = prediction))

ggplot(df.validation, aes(x = seq_len(nrow(df.validation)))) +
  geom_line(aes(y = real, colour="real")) +
  geom_line(aes(y = prediction, colour="prediction")) +
  scale_color_manual(values = c(
    'real' = 'deepskyblue',
    'prediction' = 'maroon1')) +
  labs(title = "Dynamic Bayesian Network",
       subtitle = "Amazon Stock Time Series",
       colour = "Legend",
       x = "Time Index",
       y = "Values") + theme_minimal()

Viewing Actual Data and Prediction

The plot shows that the predictions of the dynamic Bayesian network generated by the "dbnlearn" package track the test data very closely. Keep in mind that 'dbn.predict' receives the test set, so each prediction is made from the observed values in the preceding time window (a one-step-ahead prediction).


Evaluation Metrics

For evaluation, the MAPE metric (Mean Absolute Percentage Error) will be considered.

MAPE(prediction, real)


The output was a MAPE of 0.01578516, that is, an average absolute error of about 1.58%.
Excellent result!
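For readers who want to see what the metric computes, here is a small base R sketch of the MAPE formula. The function name mape and the toy vectors are made up for illustration; the argument order (predictions first, actual values second) mirrors the MAPE call used above.

```r
# Illustrative only: MAPE as the mean of |actual - predicted| / |actual|.
# mape is a hypothetical helper written for this example.
mape <- function(y_pred, y_true) {
  mean(abs((y_true - y_pred) / y_true))
}

# Toy values: relative errors are 10%, 5% and 0%
toy_real <- c(100, 200, 400)
toy_pred <- c(110, 190, 400)

result <- mape(toy_pred, toy_real)
print(result)  # 0.05, i.e. an average error of 5%
```

The value is a fraction rather than a percentage, so it has to be multiplied by 100 to be read as a percent.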


Until next time...

I hope this approach helps those who are starting out in Data Science, whether Statisticians, Mathematicians, Computer Scientists or students with an interest in the subject.



Is there no data leakage problem in the prediction step? I wonder why the test data are passed in. Also, the DBN looks almost too powerful in the plot. Does it really work for forecasting problems?
