14 Chapter 14: Singular Value Decomposition

Every Matrix Has a Geometry

Author

He Wang

14.1 The story: the spectral theorem grows up

In Chapter 13, symmetric and Hermitian matrices gave us a beautiful story:

\[ A = QDQ^T \qquad \text{or} \qquad A = UDU^*. \]

A symmetric matrix acts by choosing orthogonal directions, stretching each direction by an eigenvalue, and then recombining the result. This is one of the cleanest geometric pictures in linear algebra.

But most matrices in applications are not symmetric. Many are not even square. A data matrix may have rows representing people and columns representing features. An image is a rectangular grid of pixel intensities. A machine-learning design matrix may have thousands of observations and hundreds of variables.

The singular value decomposition, or SVD, says that every matrix still has a spectral-type geometry. The matrix may not have enough eigenvectors, but it always has singular vectors.

The central idea is simple:

To understand a general matrix $X$, study the symmetric positive semidefinite matrix $X^T X$.

This chapter develops SVD from this idea and then connects it to geometry, matrix norms, pseudoinverses, least squares, image compression, and principal component analysis.

14.2 Learning goals

After this chapter, you should be able to:

define singular values and singular vectors;
derive the singular value decomposition from the spectral theorem for $X^T X$;
compute SVD numerically in Python;
interpret SVD geometrically as rotation/reflection, scaling, and rotation/reflection;
use singular values to compute rank and the spectral norm;
use SVD to form low-rank approximations and pseudoinverses;
explain why SVD is important in least squares, image compression, and PCA.

14.3 14.1 The Gram matrix $X^T X$

Let $X\in \mathbb R^{N\times d}$. The columns of $X$ are vectors in $\mathbb R^N$. The matrix $X^T X$ records all dot products among these columns.

Definition 14.1: Gram matrix

Let $X\in \mathbb R^{N\times d}$. The matrix

\[ G=X^T X\in \mathbb R^{d\times d} \]

is called the Gram matrix of $X$.

The Gram matrix is always symmetric:

\[ (X^T X)^T = X^T X. \]

It is also positive semidefinite, because for every vector $v\in \mathbb R^d$,

\[ v^T X^T X v = (Xv)^T(Xv)=\|Xv\|_2^2\ge 0. \]

Proposition 14.2: Basic properties of $X^T X$

Let $X\in\mathbb R^{N\times d}$. Then:

$X^T X$ is symmetric.
$X^T X$ is positive semidefinite.
$\ker(X^T X)=\ker(X)$.
$\operatorname{rank}(X^T X)=\operatorname{rank}(X)$.

Proof

Symmetry follows from

\[ (X^T X)^T=X^T(X^T)^T=X^T X. \]

For positive semidefiniteness, compute

\[ v^T X^T Xv=\|Xv\|_2^2\ge 0. \]

If $Xv=0$, then clearly $X^T Xv=0$. Conversely, if $X^T Xv=0$, then

\[ 0=v^T X^T Xv=\|Xv\|_2^2, \]

so $Xv=0$. Thus the kernels are equal. Rank-nullity gives

\[ \operatorname{rank}(X^T X) =d-\dim\ker(X^T X) =d-\dim\ker(X) =\operatorname{rank}(X). \]

14.4 14.2 Singular values

Since $X^T X$ is symmetric positive semidefinite, the spectral theorem applies. Therefore $X^T X$ has an orthonormal eigenbasis and nonnegative eigenvalues.

Suppose

\[ X^T X v_i=\lambda_i v_i, \]

where

\[ \lambda_1\ge \lambda_2\ge\cdots\ge \lambda_p>0=\lambda_{p+1}=\cdots=\lambda_d. \]

The singular values of $X$ are the square roots of the eigenvalues of $X^T X$:

\[ \sigma_i=\sqrt{\lambda_i}. \]

Definition 14.3: Singular values

Let $X\in\mathbb R^{N\times d}$. The singular values of $X$ are

\[ \sigma_i=\sqrt{\lambda_i}, \]

where $\lambda_i$ are the eigenvalues of $X^T X$, listed in nonincreasing order. Thus

\[ \sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p>0. \]

The rank of $X$ is the number of nonzero singular values.

The eigenvectors $v_i$ of $X^T X$ are called right singular vectors. For each positive singular value, define

\[ u_i=\frac{Xv_i}{\|Xv_i\|_2}=\frac{Xv_i}{\sigma_i}. \]

These vectors are called left singular vectors.

Lemma 14.4: Orthogonality of left singular vectors

For $i,j\in\{1,\ldots,p\}$,

\[ \langle u_i,u_j\rangle=\delta_{ij}. \]

Proof

Using $u_i=Xv_i/\sigma_i$, we compute

\[ \langle u_i,u_j\rangle =\frac{1}{\sigma_i\sigma_j}(Xv_i)^T(Xv_j) =\frac{1}{\sigma_i\sigma_j}v_i^T X^T X v_j. \]

Since $X^T Xv_j=\sigma_j^2v_j$,

\[ \langle u_i,u_j\rangle =\frac{\sigma_j^2}{\sigma_i\sigma_j}v_i^T v_j =\frac{\sigma_j}{\sigma_i}\delta_{ij}. \]

If $i=j$, this equals $1$. If $i\ne j$, this equals $0$.

14.5 14.3 The singular value decomposition

Let

\[ U=[u_1\ \cdots\ u_N], \qquad V=[v_1\ \cdots\ v_d] \]

be orthogonal matrices obtained by completing the left and right singular vectors to orthonormal bases. Let $\Sigma$ be the rectangular diagonal matrix whose diagonal entries are the singular values.

Theorem 14.5: Singular value decomposition

For every matrix $X\in\mathbb R^{N\times d}$ of rank $p$, there exist orthogonal matrices

\[ U\in\mathbb R^{N\times N}, \qquad V\in\mathbb R^{d\times d}, \]

and a rectangular diagonal matrix $\Sigma\in\mathbb R^{N\times d}$ such that

\[ X=U\Sigma V^T. \]

Equivalently,

\[ X=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_pu_pv_p^T. \]

Proof

For $i=1,\ldots,p$, the definition of $u_i$ gives

\[ Xv_i=\sigma_i u_i. \]

For $i=p+1,\ldots,d$, the eigenvalue of $X^T X$ is $0$, so

\[ 0=v_i^T X^T Xv_i=\|Xv_i\|_2^2. \]

Therefore $Xv_i=0$. Hence

\[ XV=[Xv_1\ \cdots\ Xv_d] =[\sigma_1u_1\ \cdots\ \sigma_pu_p\ 0\ \cdots\ 0] =U\Sigma. \]

Multiplying by $V^T$ on the right gives

\[ X=U\Sigma V^T. \]

The rank-one expansion follows by multiplying out $U\Sigma V^T$.

Interpretation

The SVD says that every linear transformation can be decomposed as

\[ X=U\Sigma V^T. \]

Read this from right to left:

$V^T$ rotates or reflects the input space.
$\Sigma$ scales along coordinate directions.
$U$ rotates or reflects the output space.

14.6 14.4 Compact and truncated SVD

The full SVD may contain many zero singular values. In computation, we often keep only the nonzero part.

Definition 14.6: Compact SVD

Let $X\in\mathbb R^{N\times d}$ have rank $p$. The compact SVD is

\[ X=U_p\Sigma_pV_p^T, \]

where

\[ U_p=[u_1\ \cdots\ u_p], \qquad V_p=[v_1\ \cdots\ v_p], \qquad \Sigma_p=\operatorname{diag}(\sigma_1,\ldots,\sigma_p). \]

Definition 14.7: Rank-$k$ truncation

For $1\le k\le p$, the rank-$k$ truncation of $X$ is

\[ X_k=\sigma_1u_1v_1^T+\cdots+\sigma_ku_kv_k^T. \]

This keeps the first $k$ singular directions and discards the remaining ones.

14.7 14.5 Example: computing an SVD by hand

Consider

\[ M=\begin{bmatrix} 0&1&1\\ 1&1&0 \end{bmatrix}. \]

Then

\[ M^TM= \begin{bmatrix} 1&1&0\\ 1&2&1\\ 0&1&1 \end{bmatrix}. \]

The eigenvalues of $M^TM$ are $3,1,0$. Therefore the singular values are

\[ \sigma_1=\sqrt3, \qquad \sigma_2=1. \]

One possible orthonormal eigenbasis is

\[ v_1=\frac{1}{\sqrt6}\begin{bmatrix}1\\2\\1\end{bmatrix}, \qquad v_2=\frac{1}{\sqrt2}\begin{bmatrix}1\\0\\-1\end{bmatrix}, \qquad v_3=\frac{1}{\sqrt3}\begin{bmatrix}1\\-1\\1\end{bmatrix}. \]

Then

\[ u_1=\frac{Mv_1}{\sqrt3}=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}, \qquad u_2=Mv_2=\frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}. \]

Thus one SVD is

\[ M=U\Sigma V^T, \]

where

\[ U=\frac{1}{\sqrt2} \begin{bmatrix} 1&-1\\ 1&1 \end{bmatrix}, \qquad \Sigma= \begin{bmatrix} \sqrt3&0&0\\ 0&1&0 \end{bmatrix}, \]

and

\[ V= \begin{bmatrix} 1/\sqrt6&1/\sqrt2&1/\sqrt3\\ 2/\sqrt6&0&-1/\sqrt3\\ 1/\sqrt6&-1/\sqrt2&1/\sqrt3 \end{bmatrix}. \]

14.8 14.6 Python computation: SVD and reconstruction

Code

import numpy as np

M = np.array([[0, 1, 1],
              [1, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=True)

print("Singular values:", s)
print("U =")
print(U)
print("V^T =")
print(Vt)

Sigma = np.zeros_like(M, dtype=float)
Sigma[:len(s), :len(s)] = np.diag(s)

M_reconstructed = U @ Sigma @ Vt
print("Reconstructed M =")
print(np.round(M_reconstructed, 6))

Singular values: [1.73205081 1.        ]
U =
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
V^T =
[[-4.08248290e-01 -8.16496581e-01 -4.08248290e-01]
 [ 7.07106781e-01  1.65204549e-16 -7.07106781e-01]
 [ 5.77350269e-01 -5.77350269e-01  5.77350269e-01]]
Reconstructed M =
[[-0.  1.  1.]
 [ 1.  1. -0.]]

The signs of singular vectors may differ from a hand computation. This is not a contradiction. If $u_i$ and $v_i$ are both replaced by their negatives, the rank-one matrix $u_iv_i^T$ does not change.

14.9 14.7 Geometry: the unit circle becomes an ellipse

Theorem 14.8: Image of the unit circle

Let $A\in\mathbb R^{2\times2}$ be invertible. Then the image of the unit circle under $A$ is an ellipse. The lengths of the semi-major and semi-minor axes are the singular values of $A$.

Proof

Let $A=U\Sigma V^T$, where

\[ \Sigma=\begin{bmatrix}\sigma_1&0\\0&\sigma_2\end{bmatrix}, \qquad \sigma_1\ge\sigma_2>0. \]

The map $V^T$ preserves the unit circle because it is orthogonal. The diagonal map $\Sigma$ sends the unit circle to the ellipse

\[ \frac{y_1^2}{\sigma_1^2}+\frac{y_2^2}{\sigma_2^2}=1. \]

The map $U$ rotates or reflects this ellipse without changing its axis lengths. Therefore the semi-axis lengths are $\sigma_1$ and $\sigma_2$.

Code

import numpy as np
import matplotlib.pyplot as plt

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])

t = np.linspace(0, 2*np.pi, 400)
circle = np.vstack([np.cos(t), np.sin(t)])
ellipse = A @ circle

plt.figure(figsize=(5,5))
plt.plot(circle[0], circle[1], label="unit circle")
plt.plot(ellipse[0], ellipse[1], label="A applied to circle")
plt.axis("equal")
plt.grid(True)
plt.legend()
plt.title("A linear map sends the unit circle to an ellipse")
plt.show()

print("Singular values:", np.linalg.svd(A, compute_uv=False))

Singular values: [2.55833637 0.97719754]

14.10 14.8 Matrix norms and singular values

Definition 14.9: Induced matrix norm

For a vector norm $\|\cdot\|_p$, the induced matrix norm is

\[ \|A\|_p =\max_{x\ne0}\frac{\|Ax\|_p}{\|x\|_p} =\max_{\|x\|_p=1}\|Ax\|_p. \]

Theorem 14.10: Spectral norm

For every matrix $A$, the induced $2$-norm is the largest singular value:

\[ \|A\|_2=\sigma_1(A). \]

Proof

Let $A=U\Sigma V^T$. Since $U$ and $V$ are orthogonal, they preserve Euclidean norms. Thus

\[ \|Ax\|_2=\|U\Sigma V^Tx\|_2=\|\Sigma V^Tx\|_2. \]

Let $y=V^Tx$. Then $\|y\|_2=\|x\|_2$. If $\|y\|_2=1$, then

\[ \|\Sigma y\|_2^2 =\sum_i \sigma_i^2y_i^2 \le \sigma_1^2\sum_i y_i^2 =\sigma_1^2. \]

Therefore $\|A\|_2\le \sigma_1$. Equality is achieved when $y=e_1$, equivalently $x=v_1$. Hence $\|A\|_2=\sigma_1$.

14.11 14.9 The Moore–Penrose pseudoinverse

An invertible square matrix has an inverse. But rectangular or singular matrices usually do not. The SVD gives a replacement: the pseudoinverse.

Definition 14.11: Moore–Penrose pseudoinverse

Let $A\in\mathbb R^{m\times n}$. A matrix $A^+\in\mathbb R^{n\times m}$ is called the Moore–Penrose pseudoinverse of $A$ if it satisfies

\[ AA^+A=A, \qquad A^+AA^+=A^+, \]

\[ (AA^+)^T=AA^+, \qquad (A^+A)^T=A^+A. \]

Theorem 14.12: Pseudoinverse from SVD

Let

\[ A=U\Sigma V^T \]

be an SVD of $A$. Define $\Sigma^+$ by replacing each nonzero singular value $\sigma_i$ by $1/\sigma_i$ and transposing the rectangular diagonal shape. Then

\[ A^+=V\Sigma^+U^T. \]

Code

A = np.array([[1, 1],
              [1, 2],
              [1, 3]], dtype=float)
b = np.array([1, 2, 2], dtype=float)

x_pinv = np.linalg.pinv(A) @ b
x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

print("Pseudoinverse solution:", x_pinv)
print("lstsq solution:", x_lstsq)
print("Residual norm:", np.linalg.norm(A @ x_pinv - b))

Pseudoinverse solution: [0.66666667 0.5       ]
lstsq solution: [0.66666667 0.5       ]
Residual norm: 0.408248290463863

Theorem 14.13: Least-squares and minimum-norm solution

For $A\in\mathbb R^{m\times n}$ and $b\in\mathbb R^m$, the vector

\[ x^+=A^+b \]

is a least-squares solution of $Ax=b$. Among all least-squares solutions, it has minimum Euclidean norm.

Proof

Write $A=U\Sigma V^T$, $y=V^Tx$, and $c=U^Tb$. Orthogonal transformations preserve Euclidean lengths, so

\[ \|Ax-b\|_2=\|U\Sigma V^Tx-b\|_2=\|\Sigma y-c\|_2. \]

This is a diagonal least-squares problem. For each nonzero singular value, the best choice is

\[ y_i=\frac{c_i}{\sigma_i}. \]

For zero singular values, the residual does not depend on $y_i$, so the minimum-norm choice is $y_i=0$. Transforming back gives

\[ x=V\Sigma^+U^Tb=A^+b. \]

14.12 14.10 Low-rank approximation and compression

The SVD orders information by singular value size. Large singular values represent strong directions; small singular values represent weaker directions. This makes SVD ideal for compression.

Theorem 14.14: Eckart–Young theorem

Let

\[ A=\sigma_1u_1v_1^T+\cdots+\sigma_pu_pv_p^T \]

be the SVD of $A$, with $\sigma_1\ge\cdots\ge\sigma_p>0$. For $k<p$, define

\[ A_k=\sigma_1u_1v_1^T+\cdots+\sigma_ku_kv_k^T. \]

Then $A_k$ is a best rank-$k$ approximation to $A$ in the spectral norm:

\[ \min_{\operatorname{rank}(B)\le k}\|A-B\|_2 =\|A-A_k\|_2 =\sigma_{k+1}. \]

It is also best in the Frobenius norm:

\[ \min_{\operatorname{rank}(B)\le k}\|A-B\|_F =\left(\sum_{i=k+1}^p\sigma_i^2\right)^{1/2}. \]

Proof idea

Orthogonal transformations preserve spectral and Frobenius norms. Therefore the approximation problem can be transformed into the problem of approximating the diagonal matrix $\Sigma$. The best rank-$k$ approximation keeps the largest $k$ diagonal entries and discards the rest. Transforming back gives $A_k$.

Code

# A small synthetic image-like matrix
x = np.linspace(-3, 3, 80)
y = np.linspace(-3, 3, 80)
Xgrid, Ygrid = np.meshgrid(x, y)
Image = np.exp(-(Xgrid**2 + Ygrid**2)) + 0.5*np.exp(-((Xgrid-1.2)**2 + (Ygrid+0.7)**2)/0.4)

U, s, Vt = np.linalg.svd(Image, full_matrices=False)

def rank_k(k):
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in [1, 5, 10, 20]:
    Ak = rank_k(k)
    rel_error = np.linalg.norm(Image - Ak, 'fro') / np.linalg.norm(Image, 'fro')
    print(f"k={k:2d}, relative Frobenius error={rel_error:.4f}")

k= 1, relative Frobenius error=0.1805
k= 5, relative Frobenius error=0.0000
k=10, relative Frobenius error=0.0000
k=20, relative Frobenius error=0.0000

14.13 14.11 Principal component analysis

Suppose $X\in\mathbb R^{N\times d}$ is a data matrix whose rows are observations and whose columns are centered features. Then

\[ \frac{1}{N-1}X^TX \]

is the sample covariance matrix. The right singular vectors of $X$ are the principal component directions.

Proposition 14.15: PCA directions from SVD

The first principal component direction is the first right singular vector $v_1$ of $X$. More generally, the first $k$ principal directions are

\[ v_1,\ldots,v_k. \]

The corresponding variances are proportional to

\[ \sigma_1^2,\ldots,\sigma_k^2. \]

Proof

PCA seeks a unit vector $v$ maximizing the variance of projected data:

\[ \|Xv\|_2^2=v^TX^TXv, \qquad \|v\|_2=1. \]

By the Rayleigh quotient theorem, the maximum is the largest eigenvalue of $X^TX$, achieved at its corresponding unit eigenvector. This is exactly the first right singular vector $v_1$, and the maximum value is $\lambda_1=\sigma_1^2$.

Code

np.random.seed(0)
Z = np.random.randn(200, 2)
Xdata = Z @ np.array([[3, 1], [0, 0.5]])
Xcentered = Xdata - Xdata.mean(axis=0)

U, s, Vt = np.linalg.svd(Xcentered, full_matrices=False)
print("Principal directions as rows of V^T:")
print(Vt)
print("Explained variance proportions:")
print(s**2 / np.sum(s**2))

Principal directions as rows of V^T:
[[-0.94842385 -0.31700504]
 [ 0.31700504 -0.94842385]]
Explained variance proportions:
[0.97869706 0.02130294]

14.14 14.12 Complex SVD

Over complex vector spaces, transpose is replaced by conjugate transpose.

Theorem 14.16: Complex SVD

For every matrix $A\in\mathbb C^{m\times n}$, there exist unitary matrices

\[ U\in\mathbb C^{m\times m}, \qquad V\in\mathbb C^{n\times n}, \]

and a rectangular diagonal matrix $\Sigma\in\mathbb R^{m\times n}$ with nonnegative diagonal entries such that

\[ A=U\Sigma V^*. \]

The diagonal entries of $\Sigma$ are the square roots of the eigenvalues of the Hermitian positive semidefinite matrix $A^*A$.

Proof

The matrix $A^*A$ is Hermitian because

\[ (A^*A)^*=A^*A. \]

It is positive semidefinite because

\[ z^*A^*Az=\|Az\|_2^2\ge0. \]

By the complex spectral theorem, there is a unitary matrix $V$ such that

\[ A^*A=VDV^*. \]

Let $\sigma_i=\sqrt{\lambda_i}$. For positive $\sigma_i$, define

\[ u_i=\frac{Av_i}{\sigma_i}. \]

The same orthogonality calculation shows these $u_i$ are orthonormal. Complete them to a unitary basis. Then $AV=U\Sigma$, so $A=U\Sigma V^*$.

14.15 14.13 Challenge questions

14.16 Challenge 1

Why are singular values always nonnegative, even though eigenvalues can be negative?

Solution

Singular values are square roots of eigenvalues of $A^TA$ or $A^*A$. These Gram matrices are positive semidefinite, so their eigenvalues are nonnegative. Therefore the singular values are nonnegative.

14.17 Challenge 2

Explain why SVD exists for every matrix, while ordinary diagonalization does not.

Solution

Ordinary diagonalization requires a square matrix with enough linearly independent eigenvectors. Many square matrices are defective, and rectangular matrices do not have ordinary eigenvalue decompositions. SVD applies the spectral theorem to $A^TA$ or $A^*A$, which is always symmetric or Hermitian positive semidefinite. Therefore SVD exists for every matrix.

14.18 Challenge 3

Suppose the singular values of a matrix are $20,5,1,0.1$. What is the spectral norm error of the best rank-$2$ approximation?

Solution

By Eckart–Young, the best rank-$2$ approximation error in spectral norm is the next singular value:

\[ \|A-A_2\|_2=\sigma_3=1. \]

14.19 Challenge 4

Suppose $A\in\mathbb R^{2\times2}$ has singular values $8$ and $1$. Describe the image of the unit circle under $A$.

Solution

The image is an ellipse. Its semi-major axis has length $8$ and its semi-minor axis has length $1$. The directions of the axes are the left singular vectors.

14.20 14.14 Practice problems

Practice Problem 1

Let

\[ A=\begin{bmatrix}3&0\\0&2\end{bmatrix}. \]

Find the singular values and describe the image of the unit circle.

Solution

Since $A^TA=\operatorname{diag}(9,4)$, the singular values are $3$ and $2$. The unit circle is sent to the ellipse

\[ \frac{x^2}{9}+\frac{y^2}{4}=1. \]

Practice Problem 2

Let

\[ A=\begin{bmatrix}1&1\\1&1\end{bmatrix}. \]

Find the rank from the singular values.

Solution

Compute

\[ A^TA=\begin{bmatrix}2&2\\2&2\end{bmatrix}. \]

Its eigenvalues are $4$ and $0$. Therefore the singular values are $2$ and $0$. The rank is $1$.

Practice Problem 3

If a matrix has singular values $7,3,0,0$, what is its rank? What is its spectral norm?

Solution

The rank is the number of nonzero singular values, so the rank is $2$. The spectral norm is the largest singular value, so $\|A\|_2=7$.

Practice Problem 4

Use Python to compute the SVD of

\[ A=\begin{bmatrix}1&4\\2&2\\2&-4\end{bmatrix}. \]

Compare the singular values with the hand computation in the notes.

Solution

Code

A = np.array([[1, 4], [2, 2], [2, -4]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(s)

[6. 3.]

The singular values are $6$ and $3$.

14.21 14.15 AI companion activities

Use an AI assistant as a study partner, not as a replacement for your own computation.

Explain the story. Ask: “Explain why SVD is a spectral theorem for rectangular matrices in three levels: beginner, applied math student, and data scientist.”
Check a proof. Ask the AI to verify the proof that $\ker(X^TX)=\ker(X)$, then identify the single most important line.
Compare methods. Ask: “Compare eigenvalue decomposition, Schur decomposition, and singular value decomposition.”
Debug code. Paste your SVD code and ask the AI to check dimensions of $U$, $\Sigma$, and $V^T$.
Explore applications. Ask for three examples where SVD appears in machine learning, image processing, or recommendation systems.

14.22 14.16 Summary

The singular value decomposition is one of the central tools of applied linear algebra:

\[ A=U\Sigma V^T \qquad \text{or} \qquad A=U\Sigma V^* \text{ over } \mathbb C. \]

It generalizes the spectral theorem to all matrices. It explains geometry, rank, matrix norms, least squares, pseudoinverses, low-rank approximation, compression, and PCA. In modern applied mathematics, data science, and machine learning, SVD is not only a theorem; it is a computational language for extracting structure from data.

--- title: "Chapter 14: Singular Value Decomposition" subtitle: "Every Matrix Has a Geometry" author: "He Wang" format: html: toc: true number-sections: true code-fold: true code-tools: true theme: cosmo execute: echo: true warning: false message: false jupyter: python3 --- ## The story: the spectral theorem grows up In Chapter 13, symmetric and Hermitian matrices gave us a beautiful story: $$ A = QDQ^T \qquad \text{or} \qquad A = UDU^*. $$ A symmetric matrix acts by choosing orthogonal directions, stretching each direction by an eigenvalue, and then recombining the result. This is one of the cleanest geometric pictures in linear algebra. But most matrices in applications are not symmetric. Many are not even square. A data matrix may have rows representing people and columns representing features. An image is a rectangular grid of pixel intensities. A machine-learning design matrix may have thousands of observations and hundreds of variables. The **singular value decomposition**, or **SVD**, says that every matrix still has a spectral-type geometry. The matrix may not have enough eigenvectors, but it always has singular vectors. The central idea is simple: > To understand a general matrix $X$, study the symmetric positive semidefinite matrix $X^T X$. This chapter develops SVD from this idea and then connects it to geometry, matrix norms, pseudoinverses, least squares, image compression, and principal component analysis. ## Learning goals After this chapter, you should be able to: 1. define singular values and singular vectors; 2. derive the singular value decomposition from the spectral theorem for $X^T X$; 3. compute SVD numerically in Python; 4. interpret SVD geometrically as rotation/reflection, scaling, and rotation/reflection; 5. use singular values to compute rank and the spectral norm; 6. use SVD to form low-rank approximations and pseudoinverses; 7. explain why SVD is important in least squares, image compression, and PCA. ## 14.1 The Gram matrix $X^T X$ Let $X\in \mathbb R^{N\times d}$. The columns of $X$ are vectors in $\mathbb R^N$. The matrix $X^T X$ records all dot products among these columns. ::: {.callout-note} ## Definition 14.1: Gram matrix Let $X\in \mathbb R^{N\times d}$. The matrix $$ G=X^T X\in \mathbb R^{d\times d} $$ is called the **Gram matrix** of $X$. ::: The Gram matrix is always symmetric: $$ (X^T X)^T = X^T X. $$ It is also positive semidefinite, because for every vector $v\in \mathbb R^d$, $$ v^T X^T X v = (Xv)^T(Xv)=\|Xv\|_2^2\ge 0. $$ ::: {.callout-important} ## Proposition 14.2: Basic properties of $X^T X$ Let $X\in\mathbb R^{N\times d}$. Then: 1. $X^T X$ is symmetric. 2. $X^T X$ is positive semidefinite. 3. $\ker(X^T X)=\ker(X)$. 4. $\operatorname{rank}(X^T X)=\operatorname{rank}(X)$. ::: <details> <summary>Proof</summary> Symmetry follows from $$ (X^T X)^T=X^T(X^T)^T=X^T X. $$ For positive semidefiniteness, compute $$ v^T X^T Xv=\|Xv\|_2^2\ge 0. $$ If $Xv=0$, then clearly $X^T Xv=0$. Conversely, if $X^T Xv=0$, then $$ 0=v^T X^T Xv=\|Xv\|_2^2, $$ so $Xv=0$. Thus the kernels are equal. Rank-nullity gives $$ \operatorname{rank}(X^T X) =d-\dim\ker(X^T X) =d-\dim\ker(X) =\operatorname{rank}(X). $$ </details> ## 14.2 Singular values Since $X^T X$ is symmetric positive semidefinite, the spectral theorem applies. Therefore $X^T X$ has an orthonormal eigenbasis and nonnegative eigenvalues. Suppose $$ X^T X v_i=\lambda_i v_i, $$ where $$ \lambda_1\ge \lambda_2\ge\cdots\ge \lambda_p>0=\lambda_{p+1}=\cdots=\lambda_d. $$ The **singular values** of $X$ are the square roots of the eigenvalues of $X^T X$: $$ \sigma_i=\sqrt{\lambda_i}. $$ ::: {.callout-note} ## Definition 14.3: Singular values Let $X\in\mathbb R^{N\times d}$. The singular values of $X$ are $$ \sigma_i=\sqrt{\lambda_i}, $$ where $\lambda_i$ are the eigenvalues of $X^T X$, listed in nonincreasing order. Thus $$ \sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p>0. $$ The rank of $X$ is the number of nonzero singular values. ::: The eigenvectors $v_i$ of $X^T X$ are called **right singular vectors**. For each positive singular value, define $$ u_i=\frac{Xv_i}{\|Xv_i\|_2}=\frac{Xv_i}{\sigma_i}. $$ These vectors are called **left singular vectors**. ::: {.callout-important} ## Lemma 14.4: Orthogonality of left singular vectors For $i,j\in\{1,\ldots,p\}$, $$ \langle u_i,u_j\rangle=\delta_{ij}. $$ ::: <details> <summary>Proof</summary> Using $u_i=Xv_i/\sigma_i$, we compute $$ \langle u_i,u_j\rangle =\frac{1}{\sigma_i\sigma_j}(Xv_i)^T(Xv_j) =\frac{1}{\sigma_i\sigma_j}v_i^T X^T X v_j. $$ Since $X^T Xv_j=\sigma_j^2v_j$, $$ \langle u_i,u_j\rangle =\frac{\sigma_j^2}{\sigma_i\sigma_j}v_i^T v_j =\frac{\sigma_j}{\sigma_i}\delta_{ij}. $$ If $i=j$, this equals $1$. If $i\ne j$, this equals $0$. </details> ## 14.3 The singular value decomposition Let $$ U=[u_1\ \cdots\ u_N], \qquad V=[v_1\ \cdots\ v_d] $$ be orthogonal matrices obtained by completing the left and right singular vectors to orthonormal bases. Let $\Sigma$ be the rectangular diagonal matrix whose diagonal entries are the singular values. ::: {.callout-important} ## Theorem 14.5: Singular value decomposition For every matrix $X\in\mathbb R^{N\times d}$ of rank $p$, there exist orthogonal matrices $$ U\in\mathbb R^{N\times N}, \qquad V\in\mathbb R^{d\times d}, $$ and a rectangular diagonal matrix $\Sigma\in\mathbb R^{N\times d}$ such that $$ X=U\Sigma V^T. $$ Equivalently, $$ X=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_pu_pv_p^T. $$ ::: <details> <summary>Proof</summary> For $i=1,\ldots,p$, the definition of $u_i$ gives $$ Xv_i=\sigma_i u_i. $$ For $i=p+1,\ldots,d$, the eigenvalue of $X^T X$ is $0$, so $$ 0=v_i^T X^T Xv_i=\|Xv_i\|_2^2. $$ Therefore $Xv_i=0$. Hence $$ XV=[Xv_1\ \cdots\ Xv_d] =[\sigma_1u_1\ \cdots\ \sigma_pu_p\ 0\ \cdots\ 0] =U\Sigma. $$ Multiplying by $V^T$ on the right gives $$ X=U\Sigma V^T. $$ The rank-one expansion follows by multiplying out $U\Sigma V^T$. </details> ::: {.callout-tip} ## Interpretation The SVD says that every linear transformation can be decomposed as $$ X=U\Sigma V^T. $$ Read this from right to left: 1. $V^T$ rotates or reflects the input space. 2. $\Sigma$ scales along coordinate directions. 3. $U$ rotates or reflects the output space. ::: ## 14.4 Compact and truncated SVD The full SVD may contain many zero singular values. In computation, we often keep only the nonzero part. ::: {.callout-note} ## Definition 14.6: Compact SVD Let $X\in\mathbb R^{N\times d}$ have rank $p$. The compact SVD is $$ X=U_p\Sigma_pV_p^T, $$ where $$ U_p=[u_1\ \cdots\ u_p], \qquad V_p=[v_1\ \cdots\ v_p], \qquad \Sigma_p=\operatorname{diag}(\sigma_1,\ldots,\sigma_p). $$ ::: ::: {.callout-note} ## Definition 14.7: Rank-$k$ truncation For $1\le k\le p$, the rank-$k$ truncation of $X$ is $$ X_k=\sigma_1u_1v_1^T+\cdots+\sigma_ku_kv_k^T. $$ This keeps the first $k$ singular directions and discards the remaining ones. ::: ## 14.5 Example: computing an SVD by hand Consider $$ M=\begin{bmatrix} 0&1&1\\ 1&1&0 \end{bmatrix}. $$ Then $$ M^TM= \begin{bmatrix} 1&1&0\\ 1&2&1\\ 0&1&1 \end{bmatrix}. $$ The eigenvalues of $M^TM$ are $3,1,0$. Therefore the singular values are $$ \sigma_1=\sqrt3, \qquad \sigma_2=1. $$ One possible orthonormal eigenbasis is $$ v_1=\frac{1}{\sqrt6}\begin{bmatrix}1\\2\\1\end{bmatrix}, \qquad v_2=\frac{1}{\sqrt2}\begin{bmatrix}1\\0\\-1\end{bmatrix}, \qquad v_3=\frac{1}{\sqrt3}\begin{bmatrix}1\\-1\\1\end{bmatrix}. $$ Then $$ u_1=\frac{Mv_1}{\sqrt3}=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}, \qquad u_2=Mv_2=\frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}. $$ Thus one SVD is $$ M=U\Sigma V^T, $$ where $$ U=\frac{1}{\sqrt2} \begin{bmatrix} 1&-1\\ 1&1 \end{bmatrix}, \qquad \Sigma= \begin{bmatrix} \sqrt3&0&0\\ 0&1&0 \end{bmatrix}, $$ and $$ V= \begin{bmatrix} 1/\sqrt6&1/\sqrt2&1/\sqrt3\\ 2/\sqrt6&0&-1/\sqrt3\\ 1/\sqrt6&-1/\sqrt2&1/\sqrt3 \end{bmatrix}. $$ ## 14.6 Python computation: SVD and reconstruction ```{python} import numpy as np M = np.array([[0, 1, 1], [1, 1, 0]], dtype=float) U, s, Vt = np.linalg.svd(M, full_matrices=True) print("Singular values:", s) print("U =") print(U) print("V^T =") print(Vt) Sigma = np.zeros_like(M, dtype=float) Sigma[:len(s), :len(s)] = np.diag(s) M_reconstructed = U @ Sigma @ Vt print("Reconstructed M =") print(np.round(M_reconstructed, 6)) ``` The signs of singular vectors may differ from a hand computation. This is not a contradiction. If $u_i$ and $v_i$ are both replaced by their negatives, the rank-one matrix $u_iv_i^T$ does not change. ## 14.7 Geometry: the unit circle becomes an ellipse ::: {.callout-important} ## Theorem 14.8: Image of the unit circle Let $A\in\mathbb R^{2\times2}$ be invertible. Then the image of the unit circle under $A$ is an ellipse. The lengths of the semi-major and semi-minor axes are the singular values of $A$. ::: <details> <summary>Proof</summary> Let $A=U\Sigma V^T$, where $$ \Sigma=\begin{bmatrix}\sigma_1&0\\0&\sigma_2\end{bmatrix}, \qquad \sigma_1\ge\sigma_2>0. $$ The map $V^T$ preserves the unit circle because it is orthogonal. The diagonal map $\Sigma$ sends the unit circle to the ellipse $$ \frac{y_1^2}{\sigma_1^2}+\frac{y_2^2}{\sigma_2^2}=1. $$ The map $U$ rotates or reflects this ellipse without changing its axis lengths. Therefore the semi-axis lengths are $\sigma_1$ and $\sigma_2$. </details> ```{python} import numpy as np import matplotlib.pyplot as plt A = np.array([[2.0, 1.0], [0.5, 1.5]]) t = np.linspace(0, 2*np.pi, 400) circle = np.vstack([np.cos(t), np.sin(t)]) ellipse = A @ circle plt.figure(figsize=(5,5)) plt.plot(circle[0], circle[1], label="unit circle") plt.plot(ellipse[0], ellipse[1], label="A applied to circle") plt.axis("equal") plt.grid(True) plt.legend() plt.title("A linear map sends the unit circle to an ellipse") plt.show() print("Singular values:", np.linalg.svd(A, compute_uv=False)) ``` ## 14.8 Matrix norms and singular values ::: {.callout-note} ## Definition 14.9: Induced matrix norm For a vector norm $\|\cdot\|_p$, the induced matrix norm is $$ \|A\|_p =\max_{x\ne0}\frac{\|Ax\|_p}{\|x\|_p} =\max_{\|x\|_p=1}\|Ax\|_p. $$ ::: ::: {.callout-important} ## Theorem 14.10: Spectral norm For every matrix $A$, the induced $2$-norm is the largest singular value: $$ \|A\|_2=\sigma_1(A). $$ ::: <details> <summary>Proof</summary> Let $A=U\Sigma V^T$. Since $U$ and $V$ are orthogonal, they preserve Euclidean norms. Thus $$ \|Ax\|_2=\|U\Sigma V^Tx\|_2=\|\Sigma V^Tx\|_2. $$ Let $y=V^Tx$. Then $\|y\|_2=\|x\|_2$. If $\|y\|_2=1$, then $$ \|\Sigma y\|_2^2 =\sum_i \sigma_i^2y_i^2 \le \sigma_1^2\sum_i y_i^2 =\sigma_1^2. $$ Therefore $\|A\|_2\le \sigma_1$. Equality is achieved when $y=e_1$, equivalently $x=v_1$. Hence $\|A\|_2=\sigma_1$. </details> ## 14.9 The Moore--Penrose pseudoinverse An invertible square matrix has an inverse. But rectangular or singular matrices usually do not. The SVD gives a replacement: the pseudoinverse. ::: {.callout-note} ## Definition 14.11: Moore--Penrose pseudoinverse Let $A\in\mathbb R^{m\times n}$. A matrix $A^+\in\mathbb R^{n\times m}$ is called the **Moore--Penrose pseudoinverse** of $A$ if it satisfies $$ AA^+A=A, \qquad A^+AA^+=A^+, $$ $$ (AA^+)^T=AA^+, \qquad (A^+A)^T=A^+A. $$ ::: ::: {.callout-important} ## Theorem 14.12: Pseudoinverse from SVD Let $$ A=U\Sigma V^T $$ be an SVD of $A$. Define $\Sigma^+$ by replacing each nonzero singular value $\sigma_i$ by $1/\sigma_i$ and transposing the rectangular diagonal shape. Then $$ A^+=V\Sigma^+U^T. $$ ::: ```{python} A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float) b = np.array([1, 2, 2], dtype=float) x_pinv = np.linalg.pinv(A) @ b x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None) print("Pseudoinverse solution:", x_pinv) print("lstsq solution:", x_lstsq) print("Residual norm:", np.linalg.norm(A @ x_pinv - b)) ``` ::: {.callout-important} ## Theorem 14.13: Least-squares and minimum-norm solution For $A\in\mathbb R^{m\times n}$ and $b\in\mathbb R^m$, the vector $$ x^+=A^+b $$ is a least-squares solution of $Ax=b$. Among all least-squares solutions, it has minimum Euclidean norm. ::: <details> <summary>Proof</summary> Write $A=U\Sigma V^T$, $y=V^Tx$, and $c=U^Tb$. Orthogonal transformations preserve Euclidean lengths, so $$ \|Ax-b\|_2=\|U\Sigma V^Tx-b\|_2=\|\Sigma y-c\|_2. $$ This is a diagonal least-squares problem. For each nonzero singular value, the best choice is $$ y_i=\frac{c_i}{\sigma_i}. $$ For zero singular values, the residual does not depend on $y_i$, so the minimum-norm choice is $y_i=0$. Transforming back gives $$ x=V\Sigma^+U^Tb=A^+b. $$ </details> ## 14.10 Low-rank approximation and compression The SVD orders information by singular value size. Large singular values represent strong directions; small singular values represent weaker directions. This makes SVD ideal for compression. ::: {.callout-important} ## Theorem 14.14: Eckart--Young theorem Let $$ A=\sigma_1u_1v_1^T+\cdots+\sigma_pu_pv_p^T $$ be the SVD of $A$, with $\sigma_1\ge\cdots\ge\sigma_p>0$. For $k<p$, define $$ A_k=\sigma_1u_1v_1^T+\cdots+\sigma_ku_kv_k^T. $$ Then $A_k$ is a best rank-$k$ approximation to $A$ in the spectral norm: $$ \min_{\operatorname{rank}(B)\le k}\|A-B\|_2 =\|A-A_k\|_2 =\sigma_{k+1}. $$ It is also best in the Frobenius norm: $$ \min_{\operatorname{rank}(B)\le k}\|A-B\|_F =\left(\sum_{i=k+1}^p\sigma_i^2\right)^{1/2}. $$ ::: <details> <summary>Proof idea</summary> Orthogonal transformations preserve spectral and Frobenius norms. Therefore the approximation problem can be transformed into the problem of approximating the diagonal matrix $\Sigma$. The best rank-$k$ approximation keeps the largest $k$ diagonal entries and discards the rest. Transforming back gives $A_k$. </details> ```{python} # A small synthetic image-like matrix x = np.linspace(-3, 3, 80) y = np.linspace(-3, 3, 80) Xgrid, Ygrid = np.meshgrid(x, y) Image = np.exp(-(Xgrid**2 + Ygrid**2)) + 0.5*np.exp(-((Xgrid-1.2)**2 + (Ygrid+0.7)**2)/0.4) U, s, Vt = np.linalg.svd(Image, full_matrices=False) def rank_k(k): return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] for k in [1, 5, 10, 20]: Ak = rank_k(k) rel_error = np.linalg.norm(Image - Ak, 'fro') / np.linalg.norm(Image, 'fro') print(f"k={k:2d}, relative Frobenius error={rel_error:.4f}") ``` ## 14.11 Principal component analysis Suppose $X\in\mathbb R^{N\times d}$ is a data matrix whose rows are observations and whose columns are centered features. Then $$ \frac{1}{N-1}X^TX $$ is the sample covariance matrix. The right singular vectors of $X$ are the principal component directions. ::: {.callout-important} ## Proposition 14.15: PCA directions from SVD The first principal component direction is the first right singular vector $v_1$ of $X$. More generally, the first $k$ principal directions are $$ v_1,\ldots,v_k. $$ The corresponding variances are proportional to $$ \sigma_1^2,\ldots,\sigma_k^2. $$ ::: <details> <summary>Proof</summary> PCA seeks a unit vector $v$ maximizing the variance of projected data: $$ \|Xv\|_2^2=v^TX^TXv, \qquad \|v\|_2=1. $$ By the Rayleigh quotient theorem, the maximum is the largest eigenvalue of $X^TX$, achieved at its corresponding unit eigenvector. This is exactly the first right singular vector $v_1$, and the maximum value is $\lambda_1=\sigma_1^2$. </details> ```{python} np.random.seed(0) Z = np.random.randn(200, 2) Xdata = Z @ np.array([[3, 1], [0, 0.5]]) Xcentered = Xdata - Xdata.mean(axis=0) U, s, Vt = np.linalg.svd(Xcentered, full_matrices=False) print("Principal directions as rows of V^T:") print(Vt) print("Explained variance proportions:") print(s**2 / np.sum(s**2)) ``` ## 14.12 Complex SVD Over complex vector spaces, transpose is replaced by conjugate transpose. ::: {.callout-important} ## Theorem 14.16: Complex SVD For every matrix $A\in\mathbb C^{m\times n}$, there exist unitary matrices $$ U\in\mathbb C^{m\times m}, \qquad V\in\mathbb C^{n\times n}, $$ and a rectangular diagonal matrix $\Sigma\in\mathbb R^{m\times n}$ with nonnegative diagonal entries such that $$ A=U\Sigma V^*. $$ The diagonal entries of $\Sigma$ are the square roots of the eigenvalues of the Hermitian positive semidefinite matrix $A^*A$. ::: <details> <summary>Proof</summary> The matrix $A^*A$ is Hermitian because $$ (A^*A)^*=A^*A. $$ It is positive semidefinite because $$ z^*A^*Az=\|Az\|_2^2\ge0. $$ By the complex spectral theorem, there is a unitary matrix $V$ such that $$ A^*A=VDV^*. $$ Let $\sigma_i=\sqrt{\lambda_i}$. For positive $\sigma_i$, define $$ u_i=\frac{Av_i}{\sigma_i}. $$ The same orthogonality calculation shows these $u_i$ are orthonormal. Complete them to a unitary basis. Then $AV=U\Sigma$, so $A=U\Sigma V^*$. </details> ## 14.13 Challenge questions ::: {.callout-question} ## Challenge 1 Why are singular values always nonnegative, even though eigenvalues can be negative? ::: <details> <summary>Solution</summary> Singular values are square roots of eigenvalues of $A^TA$ or $A^*A$. These Gram matrices are positive semidefinite, so their eigenvalues are nonnegative. Therefore the singular values are nonnegative. </details> ::: {.callout-question} ## Challenge 2 Explain why SVD exists for every matrix, while ordinary diagonalization does not. ::: <details> <summary>Solution</summary> Ordinary diagonalization requires a square matrix with enough linearly independent eigenvectors. Many square matrices are defective, and rectangular matrices do not have ordinary eigenvalue decompositions. SVD applies the spectral theorem to $A^TA$ or $A^*A$, which is always symmetric or Hermitian positive semidefinite. Therefore SVD exists for every matrix. </details> ::: {.callout-question} ## Challenge 3 Suppose the singular values of a matrix are $20,5,1,0.1$. What is the spectral norm error of the best rank-$2$ approximation? ::: <details> <summary>Solution</summary> By Eckart--Young, the best rank-$2$ approximation error in spectral norm is the next singular value: $$ \|A-A_2\|_2=\sigma_3=1. $$ </details> ::: {.callout-question} ## Challenge 4 Suppose $A\in\mathbb R^{2\times2}$ has singular values $8$ and $1$. Describe the image of the unit circle under $A$. ::: <details> <summary>Solution</summary> The image is an ellipse. Its semi-major axis has length $8$ and its semi-minor axis has length $1$. The directions of the axes are the left singular vectors. </details> ## 14.14 Practice problems ::: {.callout-tip} ## Practice Problem 1 Let $$ A=\begin{bmatrix}3&0\\0&2\end{bmatrix}. $$ Find the singular values and describe the image of the unit circle. ::: <details> <summary>Solution</summary> Since $A^TA=\operatorname{diag}(9,4)$, the singular values are $3$ and $2$. The unit circle is sent to the ellipse $$ \frac{x^2}{9}+\frac{y^2}{4}=1. $$ </details> ::: {.callout-tip} ## Practice Problem 2 Let $$ A=\begin{bmatrix}1&1\\1&1\end{bmatrix}. $$ Find the rank from the singular values. ::: <details> <summary>Solution</summary> Compute $$ A^TA=\begin{bmatrix}2&2\\2&2\end{bmatrix}. $$ Its eigenvalues are $4$ and $0$. Therefore the singular values are $2$ and $0$. The rank is $1$. </details> ::: {.callout-tip} ## Practice Problem 3 If a matrix has singular values $7,3,0,0$, what is its rank? What is its spectral norm? ::: <details> <summary>Solution</summary> The rank is the number of nonzero singular values, so the rank is $2$. The spectral norm is the largest singular value, so $\|A\|_2=7$. </details> ::: {.callout-tip} ## Practice Problem 4 Use Python to compute the SVD of $$ A=\begin{bmatrix}1&4\\2&2\\2&-4\end{bmatrix}. $$ Compare the singular values with the hand computation in the notes. ::: <details> <summary>Solution</summary> ```{python} A = np.array([[1, 4], [2, 2], [2, -4]], dtype=float) U, s, Vt = np.linalg.svd(A, full_matrices=True) print(s) ``` The singular values are $6$ and $3$. </details> ## 14.15 AI companion activities Use an AI assistant as a study partner, not as a replacement for your own computation. 1. **Explain the story.** Ask: “Explain why SVD is a spectral theorem for rectangular matrices in three levels: beginner, applied math student, and data scientist.” 2. **Check a proof.** Ask the AI to verify the proof that $\ker(X^TX)=\ker(X)$, then identify the single most important line. 3. **Compare methods.** Ask: “Compare eigenvalue decomposition, Schur decomposition, and singular value decomposition.” 4. **Debug code.** Paste your SVD code and ask the AI to check dimensions of $U$, $\Sigma$, and $V^T$. 5. **Explore applications.** Ask for three examples where SVD appears in machine learning, image processing, or recommendation systems. ## 14.16 Summary The singular value decomposition is one of the central tools of applied linear algebra: $$ A=U\Sigma V^T \qquad \text{or} \qquad A=U\Sigma V^* \text{ over } \mathbb C. $$ It generalizes the spectral theorem to all matrices. It explains geometry, rank, matrix norms, least squares, pseudoinverses, low-rank approximation, compression, and PCA. In modern applied mathematics, data science, and machine learning, SVD is not only a theorem; it is a computational language for extracting structure from data.

14.1 The story: the spectral theorem grows up

14.2 Learning goals

14.3 14.1 The Gram matrix \(X^T X\)

14.4 14.2 Singular values

14.5 14.3 The singular value decomposition

14.6 14.4 Compact and truncated SVD

14.7 14.5 Example: computing an SVD by hand

14.8 14.6 Python computation: SVD and reconstruction

14.9 14.7 Geometry: the unit circle becomes an ellipse

14.10 14.8 Matrix norms and singular values

14.11 14.9 The Moore–Penrose pseudoinverse

14.12 14.10 Low-rank approximation and compression

14.13 14.11 Principal component analysis

14.14 14.12 Complex SVD

14.15 14.13 Challenge questions

14.16 Challenge 1

14.17 Challenge 2

14.18 Challenge 3

14.19 Challenge 4

14.20 14.14 Practice problems

14.21 14.15 AI companion activities

14.22 14.16 Summary