Image Compression Using PCA

One of the interesting applications of Principal Component Analysis (PCA) is image compression. The idea is to split an image into small windows and treat each window as a feature vector. Considering each feature vector as a sample of a random vector, we can use PCA to derive its principal components.
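The windowing step can be sketched as follows. This is a minimal illustration, not the code from the repo; `extract_windows` is a name I made up here, and the image is assumed to be a single-channel array (a color image would be handled per channel):

```python
import numpy as np

def extract_windows(img, w=8):
    """Split a grayscale image into non-overlapping w*w windows,
    each flattened into a (w*w)-element feature vector."""
    h_img, w_img = img.shape
    # Crop so both dimensions are divisible by the window size
    img = img[:h_img - h_img % w, :w_img - w_img % w]
    # Reorder axes so each window becomes one contiguous row
    blocks = img.reshape(img.shape[0] // w, w, img.shape[1] // w, w)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, w * w)

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
X = extract_windows(img)
print(X.shape)  # (64, 64): 64 windows, each a 64-element vector
```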

Let's split a sample image into 8×8 windows. This is one of my favorite images, taken with one of my best friends, Vahid, and my dear professor, Dr. Khosravi, about two years ago.

The original image to apply PCA compression on

With 8×8 windows, each window in this image is a 64-element vector. Obviously, the 64-dimensional feature space needs 64 basis vectors to span all of it. But PCA tells us that we can find a few basis vectors whose directions carry most of the information. There are two approaches to finding them.

First, we can do this with a tiny modification to a simple unsupervised Hebbian neural network. In this approach, after training, the variance of each network output equals one of the largest eigenvalues of the covariance matrix of the features, and the weight vector of each neuron becomes the corresponding eigenvector. Here are some implementation notes I found while coding this one; they really helped speed up the training process.

  • Since each color channel is an integer between 0 and 255, the learning rate has to be very small (on the order of 1e-10) for stability; otherwise the network will explode! Such a learning rate, however, makes training awfully slow. To avoid this, I simply normalized the values to the range [0, 1], and guess what? Training sped up dramatically, and I could even use a learning rate of 0.1!
  • Vectorizing the learning rule gives a much faster training time. With a plain for-loop, updating each weight individually, you can get stuck in a simple training run for more than an hour! But with vectorized rules and NumPy matrix operations, the whole training finishes in no more than five minutes for a color image!
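The notes above can be sketched with a vectorized implementation of the generalized Hebbian (Sanger's) rule, a standard way to modify a Hebbian network so its weight rows converge to the leading eigenvectors. This is a minimal sketch under those assumptions, not the exact code from the repo; `sanger_train` and its parameters are names invented here:

```python
import numpy as np

def sanger_train(X, n_components, lr=0.1, epochs=50, seed=0):
    """Generalized Hebbian (Sanger's) rule: rows of W converge to the
    leading eigenvectors of the covariance of X (inputs assumed
    normalized to a small range, e.g. [0, 1])."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)  # center the feature vectors
    W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            # Vectorized Sanger update: dW = lr * (y x^T - tril(y y^T) W),
            # one matrix expression instead of per-weight for-loops
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```

The `np.tril` term is what distinguishes Sanger's rule from plain Oja learning: each neuron is decorrelated only from the neurons before it, which forces the rows to converge to distinct eigenvectors in order of decreasing eigenvalue.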

The following is the result of keeping the 16 most important components of the feature space and using them as the basis, hence 1:4 compression (16 of 64 coefficients per window).

The image reconstructed from the 16 most important components

The image is still recognizable, and I can still find myself in it. For better quality, you can keep more than 16 components. It turns out that keeping about 32 components yields a nice image at half the size of the original.

The second approach comes from statistical pattern recognition: estimate the covariance matrix directly, then find its largest eigenvalues and their corresponding eigenvectors. The following is the result of implementing this approach with 32 principal components.
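The statistical route can be sketched as follows; `pca_compress` is a name made up here, and `X` is assumed to be the matrix of flattened windows, one per row:

```python
import numpy as np

def pca_compress(X, k):
    """Project feature vectors onto the top-k eigenvectors of their
    covariance matrix, then reconstruct (the statistical approach)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    V = vecs[:, -k:]                  # top-k eigenvectors as columns
    codes = Xc @ V                    # k coefficients per window
    return codes @ V.T + mu           # reconstruction in original space
```

Storing `codes` (plus the k basis vectors and the mean) instead of the raw windows is where the compression comes from: k = 16 of 64 coefficients gives the 1:4 ratio, k = 32 gives 1:2.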

The image reconstructed with the statistical approach and 32 principal components

You can see the result of this method for 1, 8, 16 and 32 principal components.

Results of the statistical approach for 1, 8, 16 and 32 principal components

Even in the 16-component case, you can see that the statistical approach gives a better result. That's because the neural network hadn't converged even after about a million iterations and about 30 minutes of training, while the statistical result was obtained in about 30 seconds!

Finally, we shouldn't forget that this application is entirely image-dependent. In one image you can keep most of the detail with just 5 principal components, while in another you may be tempted to keep almost all of them!
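One common way to make that per-image choice concrete is to pick the smallest number of components whose eigenvalues capture a target fraction of the total variance. This is a sketch of that idea, not something from the original repo; `components_for_variance` is a hypothetical helper:

```python
import numpy as np

def components_for_variance(X, target=0.95):
    """Smallest number of principal components whose eigenvalues
    capture at least `target` of the total variance of X."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    vals = np.linalg.eigh(cov)[0][::-1]  # eigenvalues, descending
    ratio = np.cumsum(vals) / vals.sum() # cumulative explained variance
    return int(np.searchsorted(ratio, target) + 1)
```

For an image dominated by smooth regions this returns a small k, while a highly textured image pushes it toward the full dimension, which matches the image-dependence noted above.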

For those who would like to run these things themselves, the code is on my GitHub:



More articles by Mohammad Hossein Amini
