Generative Learning Algorithms

Hey everyone! I’m Pratyush Singh. I’ve put together this article as a study exercise, both to solidify my understanding of Generative Learning Algorithms and to share what I learn along the way. This article briefly touches upon:

  • Comparison between Discriminative and Generative Learning Algorithms
  • Correlation and Covariance
  • The Multivariate Gaussian Distribution
  • Gaussian Discriminant Analysis (including Linear Discriminant Analysis and Quadratic Discriminant Analysis)
  • The Naïve Bayes Classifier

Algorithms that try to learn P(Y|X), the probability of an output Y given features X, are called discriminative learning algorithms. If you're unfamiliar with conditional probabilities, this short video explains them well: https://www.youtube.com/watch?v=_IgyaD7vOOA. In classification problems, discriminative algorithms aim to find a decision boundary between classes in a d-dimensional space (where d is the number of features). A new data point is classified based on which side of the boundary it falls on.

Generative learning algorithms, on the other hand, try to model P(X|Y) and P(Y). They learn the distribution of features for each class, then classify a new data point by comparing how well its features match the learned distribution of each class. After modeling P(X|Y) and P(Y) (called the prior probability), the algorithm uses Bayes' rule to derive the posterior probability of Y given X:

P(Y | X) = P(X | Y) · P(Y) / P(X)

If you're not familiar with Bayes' rule, this video provides a simple and intuitive explanation: https://www.youtube.com/watch?v=HZGCoVF3YvM&t=8s
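To make Bayes' rule concrete, here is a quick numeric sketch in Python; every probability below is invented purely for illustration:

```python
p_y1 = 0.3                       # prior P(Y = 1), made up
p_x_given_y1 = 0.8               # likelihood P(X | Y = 1), made up
p_x_given_y0 = 0.1               # likelihood P(X | Y = 0), made up

# Total probability: P(X) = P(X|Y=1) P(Y=1) + P(X|Y=0) P(Y=0)
p_x = p_x_given_y1 * p_y1 + p_x_given_y0 * (1 - p_y1)

# Bayes' rule: P(Y=1 | X) = P(X | Y=1) P(Y=1) / P(X)
print(p_x_given_y1 * p_y1 / p_x)  # ~0.774
```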

Gaussian Discriminant Analysis

We can use Gaussian Discriminant Analysis (GDA) when our features X are continuous and real-valued. GDA assumes the features X follow a multivariate normal distribution (this is explained at the end of the article; go through that section first if you're not familiar with the term). The model is:

Y ~ Bernoulli(ϕ)
X | Y = 0 ~ N(μ₀, Σ)
X | Y = 1 ~ N(μ₁, Σ)

Here:

  • ~ means "is distributed as."
  • ϕ (phi) is the probability parameter of the Bernoulli distribution, i.e. the class prior probability (e.g. P(Y = 1) in a binary classification problem).
  • μ₀, μ₁ are the mean vectors of the Gaussian distributions for each class.
  • Σ is the covariance matrix (or matrices, depending on the model type; I have explained the concept of covariance at the end of the article).
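To make these parameters concrete, here is a minimal NumPy sketch of the maximum-likelihood estimates for binary GDA with a shared covariance matrix (the function name fit_gda and the array shapes are my own choices for illustration):

```python
import numpy as np

def fit_gda(X, y):
    # X: (n, d) feature matrix; y: (n,) array of 0/1 labels.
    phi = y.mean()                        # class prior P(Y = 1)
    mu0 = X[y == 0].mean(axis=0)          # mean vector of class 0
    mu1 = X[y == 1].mean(axis=0)          # mean vector of class 1
    # Shared (pooled) covariance: center each sample by its own class mean.
    centered = np.where((y == 1)[:, None], X - mu1, X - mu0)
    sigma = centered.T @ centered / len(y)
    return phi, mu0, mu1, sigma
```

To classify a new point, you would evaluate the two Gaussian densities, weight them by ϕ and 1 − ϕ, and pick the class with the larger posterior.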

1. Linear Discriminant Analysis:

All output classes share the same covariance matrix.

Implication:

  • The decision boundary is linear in the feature space.
  • This simplifies the computation and reduces the number of parameters.
  • More stable but less flexible.

When to use it:

  • When you have limited data.
  • When you suspect the classes have similar spread and orientation in feature space.

2. Quadratic Discriminant Analysis:

Each output class has its own covariance matrix.

Implication:

  • The decision boundary is quadratic in the feature space, and the model requires more parameters to estimate than LDA.
  • More flexible and can capture curved boundaries, but with a higher risk of overfitting.

When to use it:

  • When you have enough data to estimate separate covariance matrices.
  • When classes have very different spreads or orientations in the feature space.
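For a hands-on comparison of the two variants, here is a short sketch using scikit-learn's LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis on a synthetic dataset (scikit-learn and the synthetic data are my additions, not part of the theory above):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)      # shared covariance
qda = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)   # per-class covariance
print("LDA accuracy:", lda.score(X_te, y_te))
print("QDA accuracy:", qda.score(X_te, y_te))
```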

GDA and Logistic Regression:

When would we prefer one model over another? GDA and logistic regression will, in general, give different decision boundaries when trained on the same dataset.

GDA assumes the distribution of P(X|Y) to be a multivariate Gaussian. When this assumption is correct, or approximately so, GDA tends to outperform logistic regression, often requiring less data to achieve good performance (i.e., it is more data-efficient). In contrast, logistic regression is more robust in practice because it does not assume any specific distribution for P(X|Y). Therefore, when the true underlying distributions are not Gaussian, logistic regression typically performs better.
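As a rough illustration of the data-efficiency claim, here is a small experiment sketch on synthetic data that deliberately satisfies the GDA assumption (Gaussian classes with a shared covariance), so GDA, here in its LDA form, should have the advantage; exact numbers will vary with the seed:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 40  # deliberately small training set
X0 = rng.multivariate_normal([0, 0], np.eye(2), n)
X1 = rng.multivariate_normal([2, 2], np.eye(2), n)
X = np.vstack([X0, X1]); y = np.repeat([0, 1], n)

# Large test set drawn from the same Gaussians.
X0t = rng.multivariate_normal([0, 0], np.eye(2), 5000)
X1t = rng.multivariate_normal([2, 2], np.eye(2), 5000)
Xt = np.vstack([X0t, X1t]); yt = np.repeat([0, 1], 5000)

print("GDA/LDA:", LinearDiscriminantAnalysis().fit(X, y).score(Xt, yt))
print("LogReg :", LogisticRegression().fit(X, y).score(Xt, yt))
```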

Naïve Bayes

When our features X consist of discrete values, we can assume the features are conditionally independent of each other given Y. This leads us to the Naïve Bayes classifier, and the assumption that the features are conditionally independent of each other given Y is called the Naïve Bayes assumption. Despite making strong assumptions (which are not necessarily true), this classifier performs well on many real-world problems.

We can even use this same algorithm on continuous features by discretizing them into bins.

Naïve Bayes is based on Bayes' Theorem:

P(Y | X) = P(X | Y) · P(Y) / P(X)

Under the Naïve Bayes assumption (feature conditional independence given the class label Y):

P(X₁, X₂, …, X_d | Y) = P(X₁ | Y) · P(X₂ | Y) · … · P(X_d | Y)

So the posterior becomes:

P(Y | X₁, …, X_d) = [P(Y) · P(X₁ | Y) · P(X₂ | Y) · … · P(X_d | Y)] / P(X₁, …, X_d)

This is the core formula used in the classifier. You typically pick the class Y having the maximum probability:

ŷ = argmax_y P(Y = y) · P(X₁ | Y = y) · … · P(X_d | Y = y)
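Since the denominator P(X₁, …, X_d) is the same for every class, it can be dropped when taking the argmax. Below is a minimal from-scratch sketch of the resulting count-based classifier for discrete features; the function names, Laplace smoothing, and log-space arithmetic are my own additions for numerical stability:

```python
import numpy as np

def fit_nb(X, y, alpha=1.0):
    # X: (n, d) array of non-negative integer features; y: (n,) labels.
    classes = np.unique(y)
    log_prior = {c: np.log(np.mean(y == c)) for c in classes}  # log P(Y=c)
    n_vals = X.max(axis=0) + 1  # number of distinct values per feature
    log_cond = {}
    for c in classes:
        Xc = X[y == c]
        # log P(X_j = v | Y = c), with Laplace smoothing alpha.
        log_cond[c] = [
            np.log((np.bincount(Xc[:, j], minlength=n_vals[j]) + alpha)
                   / (len(Xc) + alpha * n_vals[j]))
            for j in range(X.shape[1])
        ]
    return log_prior, log_cond

def predict_nb(x, log_prior, log_cond):
    # Pick the class maximizing log P(Y=c) + sum_j log P(x_j | Y=c).
    scores = {c: log_prior[c] + sum(log_cond[c][j][v] for j, v in enumerate(x))
              for c in log_prior}
    return max(scores, key=scores.get)

# Tiny usage example on made-up data.
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])
y = np.array([0, 0, 1, 1])
lp, lc = fit_nb(X, y)
print(predict_nb(np.array([0, 1]), lp, lc))  # -> 0
```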

Multivariate Gaussian Distribution

The multivariate Gaussian distribution is the generalization of the normal (Gaussian) distribution to multiple variables. Instead of a single variable, we now have a vector of variables (features, in the above case). The multivariate Gaussian describes the joint distribution of all these variables together, i.e. they form a normal distribution in d dimensions (where d equals the number of variables we have).

This distribution is characterized by two parameters:

I. Mean Vector: Represents the expected value (mean) of each variable. [d-dimensional]

II. Covariance Matrix: Describes how variables vary individually (variances on the diagonal) and how they relate to each other (covariances off-diagonal). The shape and size of the distribution are determined by this covariance matrix. [d × d-dimensional]

p(x; μ, Σ) = 1 / ((2π)^(d/2) |Σ|^(1/2)) · exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))
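As a small numerical sketch, SciPy's multivariate_normal lets you evaluate this density and draw samples; the mean and covariance values below are arbitrary, chosen just for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])  # variances on the diagonal, covariance off it
dist = multivariate_normal(mean=mu, cov=sigma)

print(dist.pdf([0.5, -0.5]))              # density at a point
print(dist.rvs(size=3, random_state=0))   # draw a few samples
```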

Correlation vs Covariance

Both correlation and covariance measure the linear relationship between two variables. Covariance measures only the direction of the linear relationship, whereas correlation measures both the strength and direction of the linear relationship.

A positive covariance means both variables move in the same direction (if one increases, so does the other), whereas a negative covariance means the variables move in opposite directions. Covariance lies between −∞ and ∞, whereas correlation lies between −1 and 1, where 1 means a perfect positive relationship, −1 means a perfect negative relationship, and 0 signifies no linear relationship. Correlation can also be thought of as a normalized version of covariance.

I. Covariance between two variables X and Y:

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

For each observation, we multiply the deviations of X and Y from their respective means, and then average over all such products.

II. Correlation between two variables X and Y:

Corr(X, Y) = Cov(X, Y) / (σ_X · σ_Y)

Correlation standardizes covariance by dividing it by the product of the standard deviations of X and Y.

III. Suppose we have d random variables (features); then the covariance matrix is:

Σ = [ Var(X₁)       Cov(X₁, X₂)   …   Cov(X₁, X_d)
      Cov(X₂, X₁)   Var(X₂)       …   Cov(X₂, X_d)
      ⋮             ⋮             ⋱   ⋮
      Cov(X_d, X₁)  Cov(X_d, X₂)  …   Var(X_d)    ]

The covariance matrix captures the variances of each variable along its diagonal and the covariances between every pair of variables in the off-diagonal entries, summarizing how all variables vary together.
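A quick NumPy sketch ties these ideas together; the data is synthetic, generated with an arbitrary positive relationship between the two variables:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)  # positively related to x

data = np.vstack([x, y])   # rows are variables, columns are observations
print(np.cov(data))        # 2x2 covariance matrix (variances on the diagonal)
print(np.corrcoef(data))   # 2x2 correlation matrix, entries in [-1, 1]
```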
