The Generative Modeling Framework
Hello DataXplorers! 🌟
After a week's break, we're back with an exciting new edition of dataxplorations. In this edition, we're taking a step back to basics. We'll explore the fundamental concept of generative modeling, breaking it down into a simple real-world scenario and then expanding it into a framework that will serve as a foundation for more complex endeavors. 🔍
Let's Begin with "Hello World" 👋
Remember the thrill of writing your first "Hello World" program when learning a new programming language?
It symbolizes the start of a journey filled with endless possibilities. In a similar vein, we'll create a basic generative model and introduce essential concepts that will empower us to tackle complex architectures. 🚀
Generative Modeling on a 2D Plane (Not ✈️)
Our journey commences with a scenario on a 2-dimensional plane. Imagine this:
You have a rule, let's call it REAL, which generates a set of points in this 2D space. Now, your challenge is to select a different point, RANDOM = (x, y), within the same 2D space in such a way that it appears to be generated by the same REAL rule. 🤔
How would you go about this?
You might rely on your knowledge of existing data points to construct a mental model, which we'll call GENERATED. This model estimates where in the space new points are likely to appear. If we think about it closely, GENERATED is essentially an estimate of the data distribution. 📈
Let's imagine that GENERATED resembles a rectangular box where points might be located, with an area outside the box where points are highly unlikely. 📦
But how do you generate a new observation?
It's surprisingly simple! You randomly choose a point within the box, or more formally, you sample from the distribution model GENERATED. 🎯
Congratulations! You've just built your very first generative model. You used the training data (represented by the pink points) to construct a model (depicted as the orange region). This model allows you to effortlessly generate other points that seem like they belong to the training set. 🖼️
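To make the box idea concrete, here is a minimal Python sketch. It assumes GENERATED is simply the axis-aligned bounding box of the training points, and that sampling means picking a uniform point inside it. The training coordinates and the helper names `fit_box` and `sample_from_box` are made up for illustration; they are not taken from the figure.

```python
import random


def fit_box(points):
    """Return the bounding box (x_min, x_max, y_min, y_max) of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


def sample_from_box(box, rng):
    """Draw one point uniformly at random from inside the box."""
    x_min, x_max, y_min, y_max = box
    return rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)


# Hypothetical training data standing in for the pink points.
training_points = [(1.0, 2.0), (2.5, 3.5), (4.0, 2.5), (3.0, 1.0), (1.5, 3.0)]

rng = random.Random(0)  # seeded so the run is repeatable
box = fit_box(training_points)  # our stand-in for GENERATED
new_points = [sample_from_box(box, rng) for _ in range(5)]

print("box:", box)
print("generated points:", new_points)
```

A bounding box is only one possible way to build the mental model. It is the simplest one that matches the picture of "a box where points might be located".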
But our journey doesn't stop here. Now, let's formalize this concept into a framework to better understand the objectives of generative modeling. 🌐
The Generative Modeling Framework 📊
Now, let's delve into the formal framework for generative modeling.
Imagine we have a dataset of observations generated by some unknown distribution, which we'll refer to as REAL. Our goal is to build a generative model, which we'll call GENERATED, that mimics the distribution REAL.
The fundamental objectives of GENERATED are:
Accuracy
GENERATED should accurately represent the data distribution REAL. When GENERATED considers an observation likely, that observation should look as if it was drawn from REAL. Conversely, when GENERATED considers an observation unlikely, it should not look as if it was drawn from REAL. 🎯
Generation
Sampling a new observation from GENERATED should be a straightforward process. In other words, GENERATED should provide a mechanism to easily create new data points. 🔄
Representation
GENERATED should enable us to comprehend how various high-level features within the data are depicted. It should provide insights into how the model interprets and represents data. 📊
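Here is a rough sketch of how the three objectives could show up in code for the simple box model. It is an illustration under assumptions, not a standard API: the `BoxModel` class, its method names, and the box coordinates are all hypothetical. `density` is what we would inspect for accuracy, `sample` covers generation, and the four box edges are the (very small) representation the model has learned.

```python
import random
from dataclasses import dataclass


@dataclass
class BoxModel:
    """Hypothetical GENERATED: a uniform distribution over a rectangle."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def area(self):
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def density(self, point):
        """Accuracy: how likely the model considers a point (0 outside the box)."""
        x, y = point
        inside = self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max
        return 1.0 / self.area() if inside else 0.0

    def sample(self, rng):
        """Generation: draw a new point uniformly from inside the box."""
        return rng.uniform(self.x_min, self.x_max), rng.uniform(self.y_min, self.y_max)


# Representation: the whole model is described by four readable numbers.
model = BoxModel(x_min=0.0, x_max=4.0, y_min=0.0, y_max=2.0)
rng = random.Random(42)

inside_density = model.density((1.0, 1.0))   # point inside the box
outside_density = model.density((5.0, 1.0))  # point outside the box
generated = model.sample(rng)

print(inside_density, outside_density, generated)
```

Whether the density is actually accurate depends on how well the box matches REAL, which the code cannot check without knowing REAL.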
This framework serves as the backbone of generative modeling, guiding our journey into the realm of artificial intelligence. 🧠
The Hostage Situation 🦹
Now, let's apply this framework to a concrete scenario. Picture a hostage situation. To counter the threat, we have an automated sniper rifle stationed on a building opposite the hostage location. The data points in the figure represent possible firing points for the sniper rifle. The data-generating rule, REAL, follows a uniform distribution over the valid target areas in the picture, with no chance of hitting the hostages. 🎯
Our generative model, GENERATED, is a simplification of the actual distribution, REAL. To evaluate the model's performance, let's consider three key points, A, B, and C:
Point A: An observation that is generated by our model but does not appear to come from REAL, because it targets the hostages. 🚫
Point B: An observation that could never have been generated by REAL as it falls outside the target area.
Point C: An observation that could be generated by both REAL and GENERATED.
Despite its shortcomings, our model is easy to sample from, as it follows a uniform distribution over the orange box. 🔄
In essence, our model is a simplified representation of the complex underlying distribution. The true distribution is split into regions: the terrorists' body parts, where shots may land, and the hostages, where they must not. This example illustrates the fundamental concepts of generative modeling.
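The comparison between REAL and GENERATED can be sketched as a simple membership check. Everything below is assumed for illustration: the rectangles standing in for the target regions and the orange box, and the three test points, are invented coordinates, not values read from the figure. The first point behaves like Point A and the second like Point C, while the third shows a valid target that the box fails to cover.

```python
def in_rect(point, rect):
    """True if the point lies inside rect = (x_min, x_max, y_min, y_max)."""
    x, y = point
    x_min, x_max, y_min, y_max = rect
    return x_min <= x <= x_max and y_min <= y <= y_max


# Hypothetical layout: two valid target regions with hostages between them.
TARGET_REGIONS = [(0.0, 2.0, 0.0, 4.0), (6.0, 8.0, 0.0, 4.0)]  # support of REAL
MODEL_BOX = (0.0, 7.0, 0.0, 4.0)                               # support of GENERATED


def possible_under_real(point):
    return any(in_rect(point, region) for region in TARGET_REGIONS)


def possible_under_generated(point):
    return in_rect(point, MODEL_BOX)


points = {
    "like_A": (4.0, 2.0),       # inside the box, but between the target regions
    "like_C": (1.0, 2.0),       # inside the box and inside a target region
    "outside_box": (7.5, 2.0),  # inside a target region, but outside the box
}

# For each point: (could REAL produce it?, could GENERATED produce it?)
report = {
    name: (possible_under_real(p), possible_under_generated(p))
    for name, p in points.items()
}

for name, (real_ok, generated_ok) in report.items():
    print(f"{name}: REAL={real_ok}, GENERATED={generated_ok}")
```

A perfect model would report the same answer in both columns for every point. Each mismatch marks a place where the box is too generous or too narrow.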
While future challenges may be more intricate and high-dimensional, the foundational framework we've established remains constant. 🌟
Stay tuned for more exciting explorations into the world of generative modeling in the upcoming editions of dataxplorations! 🚀