ML Algorithm: Singular Value Decomposition
Singular Value Decomposition, or SVD, has a wide range of applications, including dimensionality reduction, image compression, and denoising data. It is also the algorithm behind PCA (Principal Component Analysis). Generally, the goal of SVD is to find the natural patterns behind the data and extract the important components that represent the variance within it, so that we can reduce dimensionality or exclude noise.
In this article, I will use an example to explain how to compute SVD in Python and what the algorithm does, visually and intuitively.
Generate Data
First, let's randomly generate 5 observations with 2 features and put them into a [5, 2] matrix. Below is what the matrix looks like:
[[  0.97873798  32.71079171]
 [  2.2408932  148.96263615]
 [  1.76405235  57.64711706]
 [  1.86755799 103.33818754]
 [  0.40015721  42.83273828]]
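The exact generation code is not shown in the article, so here is a minimal sketch of how a similar [5, 2] matrix with two correlated features could be produced (the seed, the coefficient, and the noise level are assumptions for illustration, not the original values):

import numpy as np

np.random.seed(0)
# feature 1: 5 standard normal draws
x1 = np.random.randn(5)
# feature 2: roughly proportional to feature 1 plus noise, so the two
# features are correlated (illustrative only, not the article's exact data)
x2 = 50 * x1 + 20 * np.random.randn(5)
data = np.column_stack([x1, x2])  # shape (5, 2)
print(data)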
Plot them in a 2-D space:
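A scatter plot like this can be reproduced with a few lines of matplotlib (a sketch, assuming the matrix is stored in a variable called data):

import matplotlib.pyplot as plt

plt.scatter(data[:, 0], data[:, 1], c='r', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.show()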
SVD In Python
The SVD algorithm needs the data to be standardized, so let's standardize the matrix first:
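The article does not show the standardization call itself; a minimal sketch that produces the matrix below, assuming sklearn's StandardScaler (z-scoring each feature) and the data variable from above, is:

from sklearn.preprocessing import StandardScaler

# subtract each column's mean and divide by its standard deviation
scaled = StandardScaler().fit_transform(data)
print(scaled)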
[[-0.70722819 -1.02504486]
 [ 1.18577858  1.65957016]
 [ 0.4706027  -0.4491879 ]
 [ 0.62584263  0.60596038]
 [-1.57499573 -0.79129778]]
Then call the SVD function:
U, s, V = np.linalg.svd(scaled)

[[-0.40589586  0.23780797  0.3030969   0.28014476 -0.77996609]
 [ 0.66670509 -0.35451701  0.58885854 -0.11087209 -0.26603572]
 [ 0.00501779  0.68823809  0.58977478 -0.04580537  0.41996515]
 [ 0.28862871  0.014877   -0.06546159  0.94042907  0.16667358]
 [-0.55445574 -0.58640606  0.45744768  0.15073593  0.34165344]]
[3.01777414 0.94500752]
[[ 0.70710678  0.70710678]
 [ 0.70710678 -0.70710678]]
Basically, SVD is trying to project vectors onto orthogonal axes. When a vector is decomposed this way, 3 pieces of information can be obtained:
- The direction of the projection
- The length of the projection
- The vectors of projection
Therefore, a matrix can be represented as the product of three other matrices. When we call the SVD function, 3 matrices are returned; we will use U, s, and V to refer to them. From the result above, we can see that U is an n*n matrix, s is a d*1 vector, and V is a d*d matrix. Among them, U represents the projection lengths converted to unit vectors, each value of s is the square root of the sum of squared projection lengths of all points, and the rows of V are the directions of the projection.
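We can check this decomposition numerically. Below is a short sketch (assuming the variables U, s, V, and scaled from above): because np.linalg.svd returns s as a 1-D array of singular values, we place it on the diagonal of an n*d matrix before multiplying the three pieces back together.

# rebuild an n*d "S" matrix with the singular values on its diagonal
S = np.zeros_like(scaled)
S[:len(s), :len(s)] = np.diag(s)

# U (n*n) . S (n*d) . V (d*d) should give back the standardized data
reconstructed = U.dot(S).dot(V)
print(np.allclose(reconstructed, scaled))  # True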
U matrix
The U matrix can be used to rebuild the projected points. If we only take the first component, the points projected onto V1 can be obtained with the following code:
S = np.zeros((scaled.shape[0], scaled.shape[1]))
S[:scaled.shape[1], :scaled.shape[1]] = np.diag(s)
n_component = 1
# Get reduced S
S_reduce = S[:, :n_component]
# Get reduced V
V_reduce = V[:n_component, :]
A = U.dot(S_reduce.dot(V_reduce))
print(A)

[[-0.86613652 -0.86613652]
 [ 1.42267437  1.42267437]
 [ 0.0107074   0.0107074 ]
 [ 0.61590151  0.61590151]
 [-1.18314675 -1.18314675]]
Let's plot the projected points and vector V1:
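The plot can be reproduced with a sketch like the one below (assumed plotting code, not the article's exact snippet, using the scaled data, the projected points A, and the first row of V):

# standardized observations
plt.scatter(scaled[:, 0], scaled[:, 1], c='r', marker='s', label='standardized data')
# points projected onto V1
plt.scatter(A[:, 0], A[:, 1], c='b', marker='o', label='projection onto V1')
# draw the V1 axis through the origin
plt.plot([-2 * V[0][0], 2 * V[0][0]], [-2 * V[0][1], 2 * V[0][1]], label='V1')
plt.gca().set_aspect('equal', adjustable='box')
plt.legend()
plt.show()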
s matrix
As discussed, each value of s is the square root of the sum of squared projection lengths of all points. If we project the 5 observations onto V1, the distance between each projected point and the centroid (0, 0) is its projection length. In the last step we rebuilt the projected points, so let's calculate the projection lengths in Python:
import math

for i in A:
    print(math.sqrt(((i[0] - 0) ** 2) + ((i[1] - 0) ** 2)))

1.224902020063899
2.0119653880453976
0.015142551230337033
0.871016263161619
1.6732221823734545
If we square all of these lengths, sum them up, and take the square root, the result is 3.01777414. Does this look familiar? Yes, it is exactly the first value of s.
np.sqrt(sum(A.dot(V_reduce.T)**2))
3.01777414
Actually, we decomposed the dataset by projecting it onto 2 vectors. The projection onto the first vector keeps 3.01777414 of the variance and the projection onto the second vector contains 0.94500752. Therefore, if we reduce the dimension and only use the first projection to represent the 5 observations, we keep 76.15% (3.02 / (3.02 + 0.945)) of the variance.
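That ratio can be computed directly from the s array returned above, following the article's s1 / (s1 + s2) convention:

print(s[0] / s.sum())  # ~0.7615, i.e. 76.15%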
V matrix
The V matrix holds the projection directions. We are trying to project each observation onto 2 orthogonal axes represented by the vectors [0.70710678, 0.70710678] and [0.70710678, -0.70710678]. Visually, we just draw 2 vector lines using the V matrix and project each of the 5 observations.
# V is a d*d matrix whose rows are the rotation vectors that maximize the variance
# This rotation also removes multicollinearity: if one feature is highly correlated
# with another, the first component will dominate the variance
# (assumption: x_scaled and y_scaled are the two columns of the standardized matrix)
x_scaled, y_scaled = scaled[:, 0], scaled[:, 1]
plt.scatter(x_scaled, y_scaled, c='r', marker='s')
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.gca().set_aspect('equal', adjustable='box')
plt.plot([0, V[0][0]], [0, V[0][1]])
plt.plot([0, V[1][0]], [0, V[1][1]])
plt.show()
Code
The full code is on my GitHub: https://github.com/a64543668/algorithms/blob/main/SVD%20Exploration.ipynb