Simplifying Data in Machine Learning

Niharika Kunapuli

Published Sep 7, 2022

Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are dimensionality-reduction techniques with a wide variety of applications in machine learning.

For example, PCA is popular in computer vision, where it can be applied to image compression and facial recognition. PCA is also used as a data mining technique, so fields such as finance, bioinformatics, and psychology rely on it as well. SVD can be used for the same applications, but it is particularly popular in Natural Language Processing.

Figure 1: Data with multiple features/dimensions.

These techniques are typically easier and computationally less expensive to implement than many other machine learning algorithms. By reducing the noise in data that contains a lot of features (high-dimensional data), they allow us to extract more meaningful information.

For example, in facial recognition, the data would have features that track eyes, noses, lips, ears, expressions, etc. The data could also contain information on colors. Figure 2 below demonstrates how PCA can simplify data. Suppose each data point tracks information about the eyes, nose, and lips. These are 3 features (dimensions).

Figure 2: A visualization of PCA reducing the dimensions/features of data. PC1 and PC2 are two axes along which the data aligns the best.

The goal is to find the axes (principal components) along which the data aligns the best, meaning the directions in which it varies the most. The data points are then plotted against those new axes on a 2D plane (as opposed to the original 3D space), so that more meaningful insights can be extracted from the data. These axes represent abstract features, in contrast to the known features of the original data, such as eyes, nose, and lips. In reality, a given image of a face could have over 100 known features, which can be reduced to as few as 2 abstract features. Using PCA/SVD makes the data easier to store, process, and analyze.
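To make the idea concrete, here is a minimal sketch of PCA computed through SVD with NumPy. It is an illustration under stated assumptions, not a production implementation: the data is synthetic (three made-up numeric features standing in for eyes, nose, and lips), and the helper name `pca_via_svd` is hypothetical.

```python
import numpy as np

def pca_via_svd(X, n_components):
    """Project the rows of X onto the top principal components (hypothetical helper)."""
    # Center each feature so the components describe variation around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    # Coordinates of every data point along the kept axes (PC1, PC2, ...).
    projected = X_centered @ components.T
    return projected, components, explained[:n_components]

# Synthetic stand-in data: 200 points with 3 features, where the third
# feature is almost a combination of the first two (plus a little noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 1],
    0.5 * latent[:, 0] + 0.5 * latent[:, 1] + 0.01 * rng.normal(size=200),
])

projected, components, explained = pca_via_svd(X, n_components=2)
print(projected.shape)   # each point now has 2 coordinates instead of 3
print(explained.sum())   # share of the variance kept by PC1 and PC2
```

Because the third feature in this toy data is nearly redundant, two principal components keep almost all of the variance. Real data is rarely this clean, so the share of variance retained is worth checking before settling on a number of components.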

In later articles, I plan to dive deeper into PCA and SVD.

About the Author: Niharika Kunapuli is a Computer Science graduate from CU Boulder who is passionate about data science and its practical applications to the world.
