Deep Probabilistic Programming
This week, one of our Data Science professors, Thomas Hamelryck, introduced us to Deep Probabilistic Programming (Deep PP). Until recently, I had not heard much about this topic, but it has caught my interest now. Deep PP aims to combine the advantages of Deep Learning and Probabilistic Programming (PP). So what is it about?
Bayesian and frequentist statistics
First, let us consider two different approaches to probability in statistics: frequentist and Bayesian. For frequentists, the probability of an event is its long-run proportion. For instance, when you throw a fair coin, the proportion of heads will approach 50% if you just throw it often enough. We use this approach when we compute confidence intervals and conduct hypothesis tests. However, we are often more interested in the probability of a particular event in a more concrete setting, and this is where Bayesian statistics becomes interesting. A Bayesian statistician begins with a prior distribution, that is, a probability distribution reflecting the state of knowledge before collecting any data. "In Bayesian statistics, probability expresses a degree of belief in an event, which can change as new information is gathered, rather than a fixed value based upon frequency or propensity" (Wikipedia). We often have such prior knowledge from previous experience. We then combine the prior and the likelihood of the data to obtain the posterior probability of the event.
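As a concrete (made-up) example of this prior-to-posterior updating, here is a minimal coin-flip sketch in Python. The numbers are invented for illustration: a Beta(2, 2) prior encodes a mild belief that the coin is roughly fair, and because the Beta distribution is conjugate to the binomial likelihood, the posterior follows in closed form by simply adding the observed counts.

```python
# Prior belief Beta(a, b) over the heads probability p.
a, b = 2, 2            # mild prior belief that the coin is roughly fair
heads, tails = 7, 3    # observed data: 7 heads in 10 flips

# Beta is conjugate to the binomial, so the posterior is again a Beta
# whose parameters just add the observed counts.
a_post, b_post = a + heads, b + tails      # posterior: Beta(9, 5)

prior_mean = a / (a + b)                   # 0.5
post_mean = a_post / (a_post + b_post)     # 9/14, about 0.643
print(prior_mean, post_mean)
```

Note how the data pull our belief about the heads probability away from the prior's 0.5 toward the observed frequency of 0.7, without ever discarding the prior entirely.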
Machine Learning & Probabilistic Programming
When it comes to Machine Learning algorithms, this view of probability becomes crucial. In traditional ML approaches, our objective is to find the model that best describes the given data: we feed the data to many models, learn the parameters, and then use the best model to make predictions. However, we do not include any domain knowledge and learn only from the available data.
PP is a tool for statistical modeling that can help with ML tasks because it includes domain knowledge and relies on Bayesian statistics. PP offers a mathematical way to input prior beliefs and assumptions about the data dynamics you are trying to model: assumptions are encoded as prior distributions over the variables of the model. "PP makes it easy for a developer to define probability models and then 'solve' these models automatically. Now, it is a matter of programming that enables a clean separation between modeling and inference" (Cronin, B. 2013), where inference means applying the model to unseen data to assess its performance. All this might sound complicated, but it actually makes many things easier: it can vastly reduce the time and effort associated with implementing new models and understanding data. "Just as high-level programming languages transformed developers' productivity by abstracting away the details of the processor and memory architecture, probabilistic languages promise to free the developer from the complexities of high-performance probabilistic inference" (Cronin, B. 2013). Such a high level of abstraction is a competitive advantage and hence very important in industry.
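To illustrate this separation between modeling and inference, here is a small self-contained sketch in Python. Everything in it is hypothetical: the scenario (estimating a website's average visits per minute from Poisson counts), the exponential prior, and the data are all invented. The point is the structure: the modeler writes down only a prior and a likelihood, while a generic `infer` routine, here a simple grid approximation rather than the sophisticated engines real PP systems use, produces the posterior.

```python
import math

# Model: lam = average visits per minute. Domain knowledge enters through
# the prior; the observations enter through the likelihood.
def prior(lam):
    return math.exp(-lam / 5) / 5        # Exponential(mean 5): rates are modest

def likelihood(lam, counts):
    # Product of Poisson probabilities for the observed counts.
    return math.prod(lam**k * math.exp(-lam) / math.factorial(k) for k in counts)

def infer(prior, likelihood, counts, hi=20.0, n=2000):
    # Generic inference routine: knows nothing about this particular model.
    grid = [(i + 0.5) * hi / n for i in range(n)]
    weights = [prior(lam) * likelihood(lam, counts) for lam in grid]
    z = sum(weights)                     # normalizing constant
    return grid, [w / z for w in weights]

counts = [3, 4, 2, 5, 3]                 # invented per-minute counts
grid, post = infer(prior, likelihood, counts)
post_mean = sum(lam * w for lam, w in zip(grid, post))
print(round(post_mean, 2))               # close to the exact answer 18/5.2 ≈ 3.46
```

Swapping in a different prior or likelihood requires no change to `infer`; that clean split is exactly what probabilistic programming languages industrialize.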
Deep Probabilistic Programming
However, PP based on Bayesian modeling has a major disadvantage compared with traditional ML approaches: it often ends up being (too) computationally intensive. When we apply the model to unseen data (inference), we need the posterior distribution, which we typically approximate by sampling. But sampling does not scale to massive data sets, and this is where Deep Learning becomes important. Researchers have developed automatic differentiation variational inference (ADVI). Using this method, the scientist only needs to provide a probabilistic model and a dataset, nothing else. "ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models" (Kucukelbir, A. et al. 2017). "Instead of drawing samples from the posterior, these algorithms instead fit a distribution (e.g. normal) to the posterior turning a sampling problem into an optimization problem" (PyData). The Python library Edward, for instance, implements ADVI and makes it easy for data scientists to use. The neural networks in deep learning are extremely good non-linear function approximators and representation learners; they help run the inference algorithm and produce meaningful results, even on big data.
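To make the "sampling problem into an optimization problem" idea concrete, here is a toy variational-inference sketch in plain Python. It is only an illustration of the principle, not how Edward or any real ADVI implementation works: the model and data are invented, and finite-difference gradients stand in for automatic differentiation. We fit a normal q(theta) = N(m, s^2) to the posterior by maximizing the evidence lower bound (ELBO), using the reparameterization theta = m + s * eps.

```python
import math, random

# Toy model: theta ~ N(0, 1) prior, observations y_i ~ N(theta, 1).
# The exact posterior is Gaussian, so we can check the result.
data = [1.2, 0.8, 1.5, 0.9, 1.1]

def log_joint(theta):
    # log prior + log likelihood, additive constants dropped
    return -0.5 * theta**2 + sum(-0.5 * (y - theta)**2 for y in data)

# Variational family q(theta) = N(m, s^2), reparameterized as m + s*eps.
random.seed(0)
eps = [random.gauss(0, 1) for _ in range(200)]   # fixed noise draws

def elbo(m, log_s):
    s = math.exp(log_s)
    expected_log_joint = sum(log_joint(m + s * e) for e in eps) / len(eps)
    entropy = log_s          # entropy of N(m, s^2), up to a constant
    return expected_log_joint + entropy

# Maximize the ELBO by gradient ascent; finite differences stand in for
# the automatic differentiation a real ADVI implementation would use.
m, log_s, lr, h = 0.0, 0.0, 0.05, 1e-4
for _ in range(500):
    gm = (elbo(m + h, log_s) - elbo(m - h, log_s)) / (2 * h)
    gs = (elbo(m, log_s + h) - elbo(m, log_s - h)) / (2 * h)
    m, log_s = m + lr * gm, log_s + lr * gs

# The exact posterior is N(sum(y)/(n+1), 1/(n+1)); m and exp(log_s)
# should land close to its mean and standard deviation.
print(m, math.exp(log_s))
```

No posterior samples are ever drawn; we just optimize two numbers, m and log_s, which is what lets variational methods scale where sampling cannot.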
I think bridging the gap between Probabilistic Programming and Deep Learning is a very exciting field of study and I cannot wait to learn more!
References
Cronin, B. (2013), "What is probabilistic programming?", Available at: https://www.oreilly.com/ideas/probabilistic-programming, Accessed on 23.02.2019
Hamelryck, T. (2019), "Probabilistic programming: A new paradigm in machine learning", lecture of "Introduction to Data Science" at University of Copenhagen
Kucukelbir, A. et al. (2017), "Automatic Differentiation Variational Inference", Journal of Machine Learning Research 18 (2017) 1-45, Available at: http://www.jmlr.org/papers/volume18/16-107/16-107.pdf, Accessed on 23.02.2019
PyData London (2017), Available at: https://pydata.org/london2017/schedule/presentation/15/, Accessed on 23.02.2019
Wikipedia, "Bayesian statistics", Available at: https://en.wikipedia.org/wiki/Bayesian_statistics, Accessed on 23.02.2019