Simplifying Data in Machine Learning
PCA and SVD Practical Applications in Facial Recognition

Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are dimensionality-reduction techniques with a wide variety of applications in machine learning.

For example, PCA is popular in computer vision, where it can be applied to image compression and facial recognition. It is also a general-purpose data analysis tool, so fields such as finance, bioinformatics, and psychology use it as well. SVD can be used for the same applications, but it is especially popular in Natural Language Processing.

Figure 1: Data with multiple features/dimensions.

These techniques are typically easier to implement and computationally less expensive than many other machine learning algorithms. By reducing the noise in data that contains a lot of features (high-dimensional data), they allow us to extract more meaningful information.

For example, in facial recognition, the data would have features that track eyes, noses, lips, ears, expressions, etc. The data could also contain information on colors. Figure 2 below demonstrates how PCA can simplify data. Suppose each data point tracks information about the eyes, nose, and lips. That gives 3 features (dimensions).

Figure 2: A visualization of PCA reducing the dimensions/features of data. PC1 and PC2 are two axes along which the data aligns the best.

The goal is to find axes (principal components) along which the data aligns the best, meaning the directions in which the data varies the most. Those new axes are used to plot the data points on a 2D plane (as opposed to the original 3D space) so that more meaningful insights can be extracted from the data. These axes represent abstract features, in contrast to the known features of the original data such as eyes, nose, and lips. In reality, a given image of a face could have over 100 known features that are reduced down to 2 abstract features. Using PCA/SVD makes the data easier to store, process, and analyze.
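To make the idea concrete, here is a minimal sketch of PCA computed through SVD with NumPy. It is only an illustration under stated assumptions: the data is synthetic (random numbers standing in for three features such as eyes, nose, and lips), and the helper name pca_project is hypothetical, not part of any library.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components using SVD.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each data point along the new axes (PC1, PC2, ...).
    projected = X_centered @ components.T
    # Share of the total variance captured by each kept component.
    explained = (S ** 2) / np.sum(S ** 2)
    return projected, components, explained[:n_components]


# Synthetic stand-in data (an assumption): 200 points with 3 features that
# mostly lie near a 2D plane, plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.8]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))

projected, components, explained = pca_project(X, n_components=2)
print(projected.shape)   # 200 points, now described by 2 abstract features
print(explained.sum())   # fraction of the variance kept by PC1 and PC2
```

Each row of projected is the same data point described by 2 abstract features instead of the 3 original ones. The explained values give a rough sense of how much information the reduction keeps; for data that really does sit close to a plane, as in this synthetic example, the two components should capture nearly all of the variance.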

In later articles, I plan to dive deeper into PCA and SVD.

About the Author: Niharika Kunapuli is a Computer Science graduate from CU Boulder who is passionate about data science and its practical applications to the world.
