ML Algorithm: Singular Value Decomposition
Singular Value Decomposition, or SVD, has a wide range of applications, including dimensionality reduction, image compression, and denoising data. It is also the algorithm behind PCA (Principal Component Analysis). Generally, the goal of SVD is to find the natural patterns behind the data and extract the important components that represent the variance within it, so that we can reduce dimensionality or exclude noise.
In this article, I will use an example to explain how to compute SVD in Python and what the algorithm does, visually and intuitively.
Generate Data
First, let's randomly generate 5 observations with 2 features and put them into a [5, 2] matrix. Below is what the matrix looks like:
[[  0.97873798  32.71079171]
 [  2.2408932  148.96263615]
 [  1.76405235  57.64711706]
 [  1.86755799 103.33818754]
 [  0.40015721  42.83273828]]
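The exact generation code is not shown in the article, so here is a minimal sketch of how a similar [5, 2] matrix with two correlated features could be produced (the seed, the coefficient, and the noise level are assumptions for illustration, not the original values):

import numpy as np

np.random.seed(0)
# feature 1: 5 standard normal draws
x1 = np.random.randn(5)
# feature 2: roughly proportional to feature 1 plus noise, so the two
# features are correlated (illustrative only, not the article's exact data)
x2 = 50 * x1 + 20 * np.random.randn(5)
data = np.column_stack([x1, x2])  # shape (5, 2)
print(data)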
Plot them in a 2-D space:
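A scatter plot like this can be reproduced with a few lines of matplotlib (a sketch, assuming the matrix is stored in a variable called data):

import matplotlib.pyplot as plt

plt.scatter(data[:, 0], data[:, 1], c='r', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.show()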
SVD In Python
The SVD algorithm needs the data to be standardized, so let's standardize the matrix first:
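The article does not show the standardization call itself; a minimal sketch that produces the matrix below, assuming sklearn's StandardScaler (z-scoring each feature) and the data variable from above, is:

from sklearn.preprocessing import StandardScaler

# subtract each column's mean and divide by its standard deviation
scaled = StandardScaler().fit_transform(data)
print(scaled)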
[[-0.70722819 -1.02504486]
 [ 1.18577858  1.65957016]
 [ 0.4706027  -0.4491879 ]
 [ 0.62584263  0.60596038]
 [-1.57499573 -0.79129778]]
Then call the SVD function:
U, s, V = np.linalg.svd(scaled)

[[-0.40589586  0.23780797  0.3030969   0.28014476 -0.77996609]
 [ 0.66670509 -0.35451701  0.58885854 -0.11087209 -0.26603572]
 [ 0.00501779  0.68823809  0.58977478 -0.04580537  0.41996515]
 [ 0.28862871  0.014877   -0.06546159  0.94042907  0.16667358]
 [-0.55445574 -0.58640606  0.45744768  0.15073593  0.34165344]]
[3.01777414 0.94500752]
[[ 0.70710678  0.70710678]
 [ 0.70710678 -0.70710678]]
Basically, SVD is trying to project vectors onto orthogonal axes. When a vector is decomposed this way, 3 pieces of information can be obtained:
- The direction of the projection
- The length of the projection
- The vectors of projection
Therefore, a matrix can be represented as the product of three other matrices. When we call the SVD function, 3 matrices are returned; we will use U, s, and V to refer to them. From the result above, we can see that U is an n*n matrix, s is a d*1 vector, and V is a d*d matrix. Among them, U represents the projection lengths converted to unit vectors, each value of s is the square root of the sum of squared projection lengths of all points, and the rows of V are the directions of the projection.
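We can check this decomposition numerically. Below is a short sketch (assuming the variables U, s, V, and scaled from above): because np.linalg.svd returns s as a 1-D array of singular values, we place it on the diagonal of an n*d matrix before multiplying the three pieces back together.

# rebuild an n*d "S" matrix with the singular values on its diagonal
S = np.zeros_like(scaled)
S[:len(s), :len(s)] = np.diag(s)

# U (n*n) . S (n*d) . V (d*d) should give back the standardized data
reconstructed = U.dot(S).dot(V)
print(np.allclose(reconstructed, scaled))  # True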
U matrix
The U matrix can be used to rebuild the projected points. If we only take the first component, the points projected onto V1 can be obtained with the following code:
S = np.zeros((scaled.shape[0], scaled.shape[1]))
S[:scaled.shape[1], :scaled.shape[1]] = np.diag(s)
n_component = 1
# Get reduced S
S_reduce = S[:, :n_component]
# Get reduced V
V_reduce = V[:n_component, :]
A = U.dot(S_reduce.dot(V_reduce))
print(A)

[[-0.86613652 -0.86613652]
 [ 1.42267437  1.42267437]
 [ 0.0107074   0.0107074 ]
 [ 0.61590151  0.61590151]
 [-1.18314675 -1.18314675]]
Let's plot the projected points and vector V1:
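The plot can be reproduced with a sketch like the one below (assumed plotting code, not the article's exact snippet, using the scaled data, the projected points A, and the first row of V):

# standardized observations
plt.scatter(scaled[:, 0], scaled[:, 1], c='r', marker='s', label='standardized data')
# points projected onto V1
plt.scatter(A[:, 0], A[:, 1], c='b', marker='o', label='projection onto V1')
# draw the V1 axis through the origin
plt.plot([-2 * V[0][0], 2 * V[0][0]], [-2 * V[0][1], 2 * V[0][1]], label='V1')
plt.gca().set_aspect('equal', adjustable='box')
plt.legend()
plt.show()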
s matrix
As discussed, each value of s is the square root of the sum of squared projection lengths of all points. If we project the 5 observations onto V1, the distance between each projected point and the centroid (0, 0) is its projection length. In the last step we rebuilt the projected points, so let's calculate the projection lengths in Python:
import math

for i in A:
    print(math.sqrt(((i[0] - 0) ** 2) + ((i[1] - 0) ** 2)))

1.224902020063899
2.0119653880453976
0.015142551230337033
0.871016263161619
1.6732221823734545
If we square all of these lengths, sum them up, and take the square root, the result is 3.01777414. Does this look familiar? Yes, it is exactly the first value of s.
np.sqrt(sum(A.dot(V_reduce.T)**2))
3.01777414
Actually, we decomposed the dataset by projecting it onto 2 vectors. The projection onto the first vector keeps 3.01777414 of the variance and the projection onto the second vector contains 0.94500752. Therefore, if we reduce the dimension and only use the first projection to represent the 5 observations, we keep 76.15% (3.02 / (3.02 + 0.945)) of the variance.
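That ratio can be computed directly from the s array returned above, following the article's s1 / (s1 + s2) convention:

print(s[0] / s.sum())  # ~0.7615, i.e. 76.15%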
V matrix
The V matrix holds the projection directions. We are trying to project each observation onto 2 orthogonal axes represented by the vectors [0.70710678, 0.70710678] and [0.70710678, -0.70710678]. Visually, we just draw 2 vector lines using the V matrix and project each of the 5 observations.
# V is a d*d matrix whose rows are the rotation vectors that maximize the variance
# This rotation also removes multicollinearity: if one feature is highly correlated
# with another, the first component will dominate the variance
# (assumption: x_scaled and y_scaled are the two columns of the standardized matrix)
x_scaled, y_scaled = scaled[:, 0], scaled[:, 1]
plt.scatter(x_scaled, y_scaled, c='r', marker='s')
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.gca().set_aspect('equal', adjustable='box')
plt.plot([0, V[0][0]], [0, V[0][1]])
plt.plot([0, V[1][0]], [0, V[1][1]])
plt.show()
Code
The full code is on my GitHub: https://github.com/a64543668/algorithms/blob/main/SVD%20Exploration.ipynb