dbnlearn: An R package for Dynamic Bayesian Network Structure Learning, Parameter Learning and Forecasting

Robson Fernandes

Published Jul 30, 2020

Introduction

In this article I present dbnlearn, my second R package (published on CRAN on 2020-07-30). It supports structure learning, parameter learning and forecasting for univariate time series. It implements a Dynamic Bayesian Network model with temporal windows, using collections of linear regressors for Gaussian nodes, based on the introductory texts of Korb and Nicholson (2010) <doi:10.1201/b10391> and Nagarajan, Scutari and Lèbre (2013) <doi:10.1007/978-1-4614-6446-4>.

Dynamic Bayesian Networks

A Dynamic Bayesian Network (DBN) is a Bayesian Network (BN) that relates variables to each other over adjacent time steps. It is often called a Two-Timeslice BN (2TBN) because, at any point in time T, the value of a variable can be calculated from the internal regressors and the immediate prior value (time T-1). DBNs were developed by Paul Dagum in the early 1990s at Stanford University's Section on Medical Informatics (Paul Dagum, Adam Galper and Eric Horvitz, 1992, "Dynamic Network Models for Forecasting").
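To make the two-timeslice idea concrete, here is a minimal sketch in base R that simulates a single variable whose value at time t depends only on its value at time t-1 through a linear Gaussian transition. The intercept, slope and noise level are made-up values for illustration; they do not come from dbnlearn or from any real data.

```r
# Toy two-timeslice process: each value depends only on the previous one.
# All coefficients below are illustrative assumptions.
set.seed(42)
n_steps   <- 200
intercept <- 1.0
slope     <- 0.8
noise_sd  <- 0.5

x <- numeric(n_steps)
x[1] <- intercept / (1 - slope)  # start at the long-run mean of the process

for (t in 2:n_steps) {
  # Transition model P(X_t | X_{t-1}): linear in the previous value plus Gaussian noise
  x[t] <- intercept + slope * x[t - 1] + rnorm(1, mean = 0, sd = noise_sd)
}

# The dependence between adjacent time slices shows up as a lag-1 correlation
lag1_cor <- cor(x[-1], x[-n_steps])
print(lag1_cor)
```

A temporal window extends the same idea by letting the value at time t depend on several previous values instead of only the immediate one.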

Dagum developed DBNs to unify and extend traditional linear state-space models such as Kalman filters, linear and normal forecasting models such as ARMA, and simple dependency models such as hidden Markov models into a general probabilistic representation and inference mechanism for arbitrary nonlinear and non-normal time-dependent domains (Paul Dagum, Adam Galper and Eric Horvitz, June 1991, "Temporal Probabilistic Reasoning: Dynamic Network Models for Forecasting").

Example of a Dynamic Bayesian Network from the article "A Review on Video-Based Human Activity Recognition"

Another example of a Dynamic Bayesian Network, from the article "Simultaneous Partitioned Sampling for Articulated Object Tracking"

Example applied to Amazon stock prediction

Data Reading

This code snippet retrieves the opening prices of Amazon's stock (AMZN) from 2015-01-01 to 2020-07-01. The data are obtained from Yahoo Finance through the quantmod package.

library(quantmod)
library(dbnlearn)
library(bnviewer)
library(ggplot2)
library(MLmetrics)

#Download AMZN daily prices from Yahoo Finance (requires internet access)
amz <- getSymbols("AMZN", src = "yahoo",
                  from = "2015-01-01", to = "2020-07-01",
                  auto.assign = FALSE)

#Amazon Stock Time Series (opening prices)
ts <- amz$AMZN.Open

Pre-Processing and Data Separation

Next, the pre-processing step transforms the data using a time window of lagged values (t-n), where n is the number of past steps. This example uses a time window of 30.

The dataset is then split in time order: the first 70% of the rows are used for model training and the remaining 30% for testing.

#Time Series Preprocessing with time window = 30
X.ts = dbn.preprocessing(ts, window = 30)

#Define 70% Train and 30% Test Data Set
percent = 0.7
n = nrow(X.ts)

trainIndex <- seq_len(length.out = floor(x = percent * n))
X.ts.train <- X.ts[trainIndex,]
X.ts.test <- X.ts[-trainIndex,]
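To give an intuition for what a time-window transformation produces, here is a minimal sketch in base R. The helper make_lag_window is hypothetical and is not the implementation of dbn.preprocessing; the column names X_t, X_t1, X_t2 are chosen to resemble the node names used later in this article, which is an assumption about the package's output.

```r
# Hypothetical helper (not part of dbnlearn): build a table where each row
# holds the current value and the `window` values that preceded it.
make_lag_window <- function(x, window) {
  x <- as.numeric(x)
  stopifnot(window >= 1, length(x) > window)
  # embed() puts x[t] in column 1, x[t-1] in column 2, and so on
  lagged <- embed(x, dimension = window + 1)
  colnames(lagged) <- c("X_t", paste0("X_t", seq_len(window)))
  as.data.frame(lagged)
}

toy <- c(10, 11, 13, 12, 15, 16, 18, 17)
lagged <- make_lag_window(toy, window = 2)
print(lagged)

# Chronological 70/30 split: earlier rows for training, later rows for testing
n_train <- floor(0.7 * nrow(lagged))
train <- lagged[seq_len(n_train), ]
test  <- lagged[-seq_len(n_train), ]
```

Splitting by position rather than at random keeps every training row earlier in time than every test row, which matters for time series.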

Learning the Structure of the Dynamic Bayesian Network and Visualization

The 'dbn.learn' function learns the network structure from the training samples. The network is then visualized with the 'viewer' function of the bnviewer package.

#Dynamic Bayesian Network Structure Learning
ts.learning = dbn.learn(X.ts.train)

#Visualization
viewer(ts.learning,
       bayesianNetwork.width = "100%",
       bayesianNetwork.height = "100vh",
       bayesianNetwork.enabled.interactive.mode = TRUE,
       edges.smooth = FALSE,
       bayesianNetwork.layout = "layout_with_gem",
       node.colors = list(background = "#f4bafd",
                          border = "#2b7ce9",
                          highlight = list(background = "#97c2fc",
                                           border = "#2b7ce9")),
       
       clusters.legend.title = list(text = "Legend"),
       
       clusters.legend.options = list(
         
         list(label = "Target (t)",
              shape = "icon",
              icon = list(code = "f111", size = 50, color = "#e91e63")),
         
         list(label = "Time Window (t-n)",
              shape = "icon",
              icon = list(code = "f111", size = 50, color = "#f4bafd"))
       ),
       
       clusters = list(
         list(label = "Target",
              shape = "icon",
              icon = list(code = "f111", color = "#e91e63"),
              nodes = list("X_t")),
         list(label = "Time Window (t-n)",
              shape = "icon",
              icon = list(code = "f111", color = "#f4bafd"),
              nodes = list("X_t1"))
       )
       
)

Visualization of the Dynamic Bayesian Network.

Parameter Learning

Once the network structure is available, the parameters are learned by maximum likelihood estimation.

#Dynamic Bayesian Network Fit
ts.fit = dbn.fit(ts.learning, X.ts.train)
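For a Gaussian node modelled as a linear regression on its parents, the maximum likelihood estimates of the coefficients coincide with ordinary least squares. The sketch below illustrates this on simulated data in base R. It is not the internals of dbn.fit: the series, the two-lag structure and the column names are assumptions made for the example.

```r
# Simulated series with two lags of dependence (illustrative data only)
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.2)), n = 300))

# Lag table: current value plus the two previous values
lags <- as.data.frame(embed(x, 3))
colnames(lags) <- c("X_t", "X_t1", "X_t2")

# Least-squares fit of the node X_t on its assumed parents X_t1 and X_t2
fit <- lm(X_t ~ X_t1 + X_t2, data = lags)
beta_lm <- coef(fit)

# Same coefficients from the normal equations (X'X) b = X'y
design <- cbind(1, lags$X_t1, lags$X_t2)
beta_ne <- solve(crossprod(design), crossprod(design, lags$X_t))

# The ML estimate of the residual variance divides by n,
# whereas summary(fit) reports the version that divides by n - p
sigma2_ml <- mean(residuals(fit)^2)

print(beta_lm)
print(sigma2_ml)
```

The intercept, the slopes and the residual variance together are the parameters of one linear Gaussian node; a fitted network holds one such set per node.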

Prediction

We can now generate predictions from the fitted network and validate them against the test data.

#Predict values
prediction = dbn.predict(ts.fit, X.ts.test)

#Plot Real vs Predicted
real = X.ts.test[, "X_t"]

df.validation = data.frame(real = real, prediction = prediction)

ggplot(df.validation, aes(x = seq_len(nrow(df.validation)))) +
  geom_line(aes(y = real, colour="real")) +
  geom_line(aes(y = prediction, colour="prediction")) +
  scale_color_manual(values = c(
    'real' = 'deepskyblue',
    'prediction' = 'maroon1')) +
  labs(title = "Dynamic Bayesian Network",
       subtitle = "Amazon Stock Time Series",
       colour = "Legend",
       x = "Time Index",
       y = "Values") + theme_minimal()
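When reading a plot like this, it helps to know which kind of prediction is being shown. If each test row contains the observed values of the previous steps, every prediction is one step ahead. A forecast several steps into the future has to feed its own predictions back in as inputs. The sketch below contrasts the two on simulated data using a plain lm model in base R; it is an illustration under those assumptions, not a description of how dbn.predict works internally.

```r
# Simulated series (illustrative data only): 200 training and 20 test points
set.seed(7)
x <- as.numeric(arima.sim(model = list(ar = 0.9), n = 220)) + 50
train_x <- x[1:200]
test_x  <- x[201:220]

# Fit X_t on X_t1 using the training part only
lags <- data.frame(X_t = train_x[-1], X_t1 = train_x[-length(train_x)])
fit <- lm(X_t ~ X_t1, data = lags)

# One-step-ahead: every prediction uses the observed previous value
prev_observed <- x[200:219]
one_step <- as.numeric(predict(fit, newdata = data.frame(X_t1 = prev_observed)))

# Recursive: only the last training value is known,
# so each prediction becomes the input of the next one
recursive <- numeric(length(test_x))
last_value <- train_x[length(train_x)]
for (h in seq_along(test_x)) {
  last_value <- as.numeric(predict(fit, newdata = data.frame(X_t1 = last_value)))
  recursive[h] <- last_value
}

# Mean absolute error of each approach on the test points
print(mean(abs(test_x - one_step)))
print(mean(abs(test_x - recursive)))
```

The two approaches agree on the first test point and diverge afterwards. Recursive forecasts typically drift further from the observed values as the horizon grows, so the two kinds of error should not be compared as if they measured the same thing.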

Viewing Actual Data and Prediction

The plot shows that the dynamic Bayesian network built with the "dbnlearn" package performs very well: the predicted values follow the test data very closely.

Evaluation Metrics

The model is evaluated with the MAPE metric (Mean Absolute Percentage Error).

MAPE(prediction, real)

The output is a MAPE of 0.01578516, that is, an average absolute error of about 1.58%.

Excellent result!
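To put a MAPE value in context, it can be compared with a simple baseline. The sketch below computes MAPE by hand and applies it to a persistence baseline, which predicts each value with the previous observed value. The function mape and the small vectors are illustrative assumptions, not results from the Amazon data.

```r
# MAPE as a fraction: mean of |true - predicted| / |true|
mape <- function(y_pred, y_true) {
  stopifnot(length(y_pred) == length(y_true), all(y_true != 0))
  mean(abs((y_true - y_pred) / y_true))
}

# Made-up example values: the series at time t and the observed value at t-1
real     <- c(100, 102, 101, 105, 107)
previous <- c( 99, 100, 102, 101, 105)

# Persistence baseline: predict X_t with X_{t-1}
baseline_mape <- mape(previous, real)
print(baseline_mape)
```

With the objects from this article, and assuming the test table has a column named X_t1 holding the previous observed value, the same baseline would be MAPE(X.ts.test[, "X_t1"], real). A model adds value to the extent that its MAPE is lower than this baseline.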

Until next time...

I hope this approach is useful to those who are starting out in Data Science, whether statisticians, mathematicians, computer scientists or students with an interest in the subject.

Namtaek Kwon 4y

Is there no data leakage problem in the prediction step? I am wondering why the test data are passed in. Also, the DBN looks very (perhaps too) powerful, as we can see in the plot. Does it really work for forecasting problems?
