14  Chapter 14: Singular Value Decomposition

Every Matrix Has a Geometry

Author

He Wang

14.1 The story: the spectral theorem grows up

In Chapter 13, symmetric and Hermitian matrices gave us a beautiful story:

\[ A = QDQ^T \qquad \text{or} \qquad A = UDU^*. \]

A symmetric matrix acts by choosing orthogonal directions, stretching each direction by an eigenvalue, and then recombining the result. This is one of the cleanest geometric pictures in linear algebra.

But most matrices in applications are not symmetric. Many are not even square. A data matrix may have rows representing people and columns representing features. An image is a rectangular grid of pixel intensities. A machine-learning design matrix may have thousands of observations and hundreds of variables.

The singular value decomposition, or SVD, says that every matrix still has a spectral-type geometry. The matrix may not have enough eigenvectors, but it always has singular vectors.

The central idea is simple:

To understand a general matrix \(X\), study the symmetric positive semidefinite matrix \(X^T X\).

This chapter develops SVD from this idea and then connects it to geometry, matrix norms, pseudoinverses, least squares, image compression, and principal component analysis.

14.2 Learning goals

After this chapter, you should be able to:

  1. define singular values and singular vectors;
  2. derive the singular value decomposition from the spectral theorem for \(X^T X\);
  3. compute SVD numerically in Python;
  4. interpret SVD geometrically as rotation/reflection, scaling, and rotation/reflection;
  5. use singular values to compute rank and the spectral norm;
  6. use SVD to form low-rank approximations and pseudoinverses;
  7. explain why SVD is important in least squares, image compression, and PCA.

14.3 14.1 The Gram matrix \(X^T X\)

Let \(X\in \mathbb R^{N\times d}\). The columns of \(X\) are vectors in \(\mathbb R^N\). The matrix \(X^T X\) records all dot products among these columns.

NoteDefinition 14.1: Gram matrix

Let \(X\in \mathbb R^{N\times d}\). The matrix

\[ G=X^T X\in \mathbb R^{d\times d} \]

is called the Gram matrix of \(X\).

The Gram matrix is always symmetric:

\[ (X^T X)^T = X^T X. \]

It is also positive semidefinite, because for every vector \(v\in \mathbb R^d\),

\[ v^T X^T X v = (Xv)^T(Xv)=\|Xv\|_2^2\ge 0. \]

ImportantProposition 14.2: Basic properties of \(X^T X\)

Let \(X\in\mathbb R^{N\times d}\). Then:

  1. \(X^T X\) is symmetric.
  2. \(X^T X\) is positive semidefinite.
  3. \(\ker(X^T X)=\ker(X)\).
  4. \(\operatorname{rank}(X^T X)=\operatorname{rank}(X)\).
Proof

Symmetry follows from

\[ (X^T X)^T=X^T(X^T)^T=X^T X. \]

For positive semidefiniteness, compute

\[ v^T X^T Xv=\|Xv\|_2^2\ge 0. \]

If \(Xv=0\), then clearly \(X^T Xv=0\). Conversely, if \(X^T Xv=0\), then

\[ 0=v^T X^T Xv=\|Xv\|_2^2, \]

so \(Xv=0\). Thus the kernels are equal. Rank-nullity gives

\[ \operatorname{rank}(X^T X) =d-\dim\ker(X^T X) =d-\dim\ker(X) =\operatorname{rank}(X). \]

14.4 14.2 Singular values

Since \(X^T X\) is symmetric positive semidefinite, the spectral theorem applies. Therefore \(X^T X\) has an orthonormal eigenbasis and nonnegative eigenvalues.

Suppose

\[ X^T X v_i=\lambda_i v_i, \]

where

\[ \lambda_1\ge \lambda_2\ge\cdots\ge \lambda_p>0=\lambda_{p+1}=\cdots=\lambda_d. \]

The singular values of \(X\) are the square roots of the eigenvalues of \(X^T X\):

\[ \sigma_i=\sqrt{\lambda_i}. \]

NoteDefinition 14.3: Singular values

Let \(X\in\mathbb R^{N\times d}\). The singular values of \(X\) are

\[ \sigma_i=\sqrt{\lambda_i}, \]

where \(\lambda_i\) are the eigenvalues of \(X^T X\), listed in nonincreasing order. Thus

\[ \sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p>0. \]

The rank of \(X\) is the number of nonzero singular values.

The eigenvectors \(v_i\) of \(X^T X\) are called right singular vectors. For each positive singular value, define

\[ u_i=\frac{Xv_i}{\|Xv_i\|_2}=\frac{Xv_i}{\sigma_i}. \]

These vectors are called left singular vectors.

ImportantLemma 14.4: Orthogonality of left singular vectors

For \(i,j\in\{1,\ldots,p\}\),

\[ \langle u_i,u_j\rangle=\delta_{ij}. \]

Proof

Using \(u_i=Xv_i/\sigma_i\), we compute

\[ \langle u_i,u_j\rangle =\frac{1}{\sigma_i\sigma_j}(Xv_i)^T(Xv_j) =\frac{1}{\sigma_i\sigma_j}v_i^T X^T X v_j. \]

Since \(X^T Xv_j=\sigma_j^2v_j\),

\[ \langle u_i,u_j\rangle =\frac{\sigma_j^2}{\sigma_i\sigma_j}v_i^T v_j =\frac{\sigma_j}{\sigma_i}\delta_{ij}. \]

If \(i=j\), this equals \(1\). If \(i\ne j\), this equals \(0\).

14.5 14.3 The singular value decomposition

Let

\[ U=[u_1\ \cdots\ u_N], \qquad V=[v_1\ \cdots\ v_d] \]

be orthogonal matrices obtained by completing the left and right singular vectors to orthonormal bases. Let \(\Sigma\) be the rectangular diagonal matrix whose diagonal entries are the singular values.

ImportantTheorem 14.5: Singular value decomposition

For every matrix \(X\in\mathbb R^{N\times d}\) of rank \(p\), there exist orthogonal matrices

\[ U\in\mathbb R^{N\times N}, \qquad V\in\mathbb R^{d\times d}, \]

and a rectangular diagonal matrix \(\Sigma\in\mathbb R^{N\times d}\) such that

\[ X=U\Sigma V^T. \]

Equivalently,

\[ X=\sigma_1u_1v_1^T+\sigma_2u_2v_2^T+\cdots+\sigma_pu_pv_p^T. \]

Proof

For \(i=1,\ldots,p\), the definition of \(u_i\) gives

\[ Xv_i=\sigma_i u_i. \]

For \(i=p+1,\ldots,d\), the eigenvalue of \(X^T X\) is \(0\), so

\[ 0=v_i^T X^T Xv_i=\|Xv_i\|_2^2. \]

Therefore \(Xv_i=0\). Hence

\[ XV=[Xv_1\ \cdots\ Xv_d] =[\sigma_1u_1\ \cdots\ \sigma_pu_p\ 0\ \cdots\ 0] =U\Sigma. \]

Multiplying by \(V^T\) on the right gives

\[ X=U\Sigma V^T. \]

The rank-one expansion follows by multiplying out \(U\Sigma V^T\).

TipInterpretation

The SVD says that every linear transformation can be decomposed as

\[ X=U\Sigma V^T. \]

Read this from right to left:

  1. \(V^T\) rotates or reflects the input space.
  2. \(\Sigma\) scales along coordinate directions.
  3. \(U\) rotates or reflects the output space.

14.6 14.4 Compact and truncated SVD

The full SVD may contain many zero singular values. In computation, we often keep only the nonzero part.

NoteDefinition 14.6: Compact SVD

Let \(X\in\mathbb R^{N\times d}\) have rank \(p\). The compact SVD is

\[ X=U_p\Sigma_pV_p^T, \]

where

\[ U_p=[u_1\ \cdots\ u_p], \qquad V_p=[v_1\ \cdots\ v_p], \qquad \Sigma_p=\operatorname{diag}(\sigma_1,\ldots,\sigma_p). \]

NoteDefinition 14.7: Rank-\(k\) truncation

For \(1\le k\le p\), the rank-\(k\) truncation of \(X\) is

\[ X_k=\sigma_1u_1v_1^T+\cdots+\sigma_ku_kv_k^T. \]

This keeps the first \(k\) singular directions and discards the remaining ones.

14.7 14.5 Example: computing an SVD by hand

Consider

\[ M=\begin{bmatrix} 0&1&1\\ 1&1&0 \end{bmatrix}. \]

Then

\[ M^TM= \begin{bmatrix} 1&1&0\\ 1&2&1\\ 0&1&1 \end{bmatrix}. \]

The eigenvalues of \(M^TM\) are \(3,1,0\). Therefore the singular values are

\[ \sigma_1=\sqrt3, \qquad \sigma_2=1. \]

One possible orthonormal eigenbasis is

\[ v_1=\frac{1}{\sqrt6}\begin{bmatrix}1\\2\\1\end{bmatrix}, \qquad v_2=\frac{1}{\sqrt2}\begin{bmatrix}1\\0\\-1\end{bmatrix}, \qquad v_3=\frac{1}{\sqrt3}\begin{bmatrix}1\\-1\\1\end{bmatrix}. \]

Then

\[ u_1=\frac{Mv_1}{\sqrt3}=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}, \qquad u_2=Mv_2=\frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}. \]

Thus one SVD is

\[ M=U\Sigma V^T, \]

where

\[ U=\frac{1}{\sqrt2} \begin{bmatrix} 1&-1\\ 1&1 \end{bmatrix}, \qquad \Sigma= \begin{bmatrix} \sqrt3&0&0\\ 0&1&0 \end{bmatrix}, \]

and

\[ V= \begin{bmatrix} 1/\sqrt6&1/\sqrt2&1/\sqrt3\\ 2/\sqrt6&0&-1/\sqrt3\\ 1/\sqrt6&-1/\sqrt2&1/\sqrt3 \end{bmatrix}. \]

14.8 14.6 Python computation: SVD and reconstruction

Code
import numpy as np

M = np.array([[0, 1, 1],
              [1, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=True)

print("Singular values:", s)
print("U =")
print(U)
print("V^T =")
print(Vt)

Sigma = np.zeros_like(M, dtype=float)
Sigma[:len(s), :len(s)] = np.diag(s)

M_reconstructed = U @ Sigma @ Vt
print("Reconstructed M =")
print(np.round(M_reconstructed, 6))
Singular values: [1.73205081 1.        ]
U =
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
V^T =
[[-4.08248290e-01 -8.16496581e-01 -4.08248290e-01]
 [ 7.07106781e-01  1.65204549e-16 -7.07106781e-01]
 [ 5.77350269e-01 -5.77350269e-01  5.77350269e-01]]
Reconstructed M =
[[-0.  1.  1.]
 [ 1.  1. -0.]]

The signs of singular vectors may differ from a hand computation. This is not a contradiction. If \(u_i\) and \(v_i\) are both replaced by their negatives, the rank-one matrix \(u_iv_i^T\) does not change.

14.9 14.7 Geometry: the unit circle becomes an ellipse

ImportantTheorem 14.8: Image of the unit circle

Let \(A\in\mathbb R^{2\times2}\) be invertible. Then the image of the unit circle under \(A\) is an ellipse. The lengths of the semi-major and semi-minor axes are the singular values of \(A\).

Proof

Let \(A=U\Sigma V^T\), where

\[ \Sigma=\begin{bmatrix}\sigma_1&0\\0&\sigma_2\end{bmatrix}, \qquad \sigma_1\ge\sigma_2>0. \]

The map \(V^T\) preserves the unit circle because it is orthogonal. The diagonal map \(\Sigma\) sends the unit circle to the ellipse

\[ \frac{y_1^2}{\sigma_1^2}+\frac{y_2^2}{\sigma_2^2}=1. \]

The map \(U\) rotates or reflects this ellipse without changing its axis lengths. Therefore the semi-axis lengths are \(\sigma_1\) and \(\sigma_2\).

Code
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])

t = np.linspace(0, 2*np.pi, 400)
circle = np.vstack([np.cos(t), np.sin(t)])
ellipse = A @ circle

plt.figure(figsize=(5,5))
plt.plot(circle[0], circle[1], label="unit circle")
plt.plot(ellipse[0], ellipse[1], label="A applied to circle")
plt.axis("equal")
plt.grid(True)
plt.legend()
plt.title("A linear map sends the unit circle to an ellipse")
plt.show()

print("Singular values:", np.linalg.svd(A, compute_uv=False))

Singular values: [2.55833637 0.97719754]

14.10 14.8 Matrix norms and singular values

NoteDefinition 14.9: Induced matrix norm

For a vector norm \(\|\cdot\|_p\), the induced matrix norm is

\[ \|A\|_p =\max_{x\ne0}\frac{\|Ax\|_p}{\|x\|_p} =\max_{\|x\|_p=1}\|Ax\|_p. \]

ImportantTheorem 14.10: Spectral norm

For every matrix \(A\), the induced \(2\)-norm is the largest singular value:

\[ \|A\|_2=\sigma_1(A). \]

Proof

Let \(A=U\Sigma V^T\). Since \(U\) and \(V\) are orthogonal, they preserve Euclidean norms. Thus

\[ \|Ax\|_2=\|U\Sigma V^Tx\|_2=\|\Sigma V^Tx\|_2. \]

Let \(y=V^Tx\). Then \(\|y\|_2=\|x\|_2\). If \(\|y\|_2=1\), then

\[ \|\Sigma y\|_2^2 =\sum_i \sigma_i^2y_i^2 \le \sigma_1^2\sum_i y_i^2 =\sigma_1^2. \]

Therefore \(\|A\|_2\le \sigma_1\). Equality is achieved when \(y=e_1\), equivalently \(x=v_1\). Hence \(\|A\|_2=\sigma_1\).

14.11 14.9 The Moore–Penrose pseudoinverse

An invertible square matrix has an inverse. But rectangular or singular matrices usually do not. The SVD gives a replacement: the pseudoinverse.

NoteDefinition 14.11: Moore–Penrose pseudoinverse

Let \(A\in\mathbb R^{m\times n}\). A matrix \(A^+\in\mathbb R^{n\times m}\) is called the Moore–Penrose pseudoinverse of \(A\) if it satisfies

\[ AA^+A=A, \qquad A^+AA^+=A^+, \]

\[ (AA^+)^T=AA^+, \qquad (A^+A)^T=A^+A. \]

ImportantTheorem 14.12: Pseudoinverse from SVD

Let

\[ A=U\Sigma V^T \]

be an SVD of \(A\). Define \(\Sigma^+\) by replacing each nonzero singular value \(\sigma_i\) by \(1/\sigma_i\) and transposing the rectangular diagonal shape. Then

\[ A^+=V\Sigma^+U^T. \]

Code
A = np.array([[1, 1],
              [1, 2],
              [1, 3]], dtype=float)
b = np.array([1, 2, 2], dtype=float)

x_pinv = np.linalg.pinv(A) @ b
x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

print("Pseudoinverse solution:", x_pinv)
print("lstsq solution:", x_lstsq)
print("Residual norm:", np.linalg.norm(A @ x_pinv - b))
Pseudoinverse solution: [0.66666667 0.5       ]
lstsq solution: [0.66666667 0.5       ]
Residual norm: 0.408248290463863
ImportantTheorem 14.13: Least-squares and minimum-norm solution

For \(A\in\mathbb R^{m\times n}\) and \(b\in\mathbb R^m\), the vector

\[ x^+=A^+b \]

is a least-squares solution of \(Ax=b\). Among all least-squares solutions, it has minimum Euclidean norm.

Proof

Write \(A=U\Sigma V^T\), \(y=V^Tx\), and \(c=U^Tb\). Orthogonal transformations preserve Euclidean lengths, so

\[ \|Ax-b\|_2=\|U\Sigma V^Tx-b\|_2=\|\Sigma y-c\|_2. \]

This is a diagonal least-squares problem. For each nonzero singular value, the best choice is

\[ y_i=\frac{c_i}{\sigma_i}. \]

For zero singular values, the residual does not depend on \(y_i\), so the minimum-norm choice is \(y_i=0\). Transforming back gives

\[ x=V\Sigma^+U^Tb=A^+b. \]

14.12 14.10 Low-rank approximation and compression

The SVD orders information by singular value size. Large singular values represent strong directions; small singular values represent weaker directions. This makes SVD ideal for compression.

ImportantTheorem 14.14: Eckart–Young theorem

Let

\[ A=\sigma_1u_1v_1^T+\cdots+\sigma_pu_pv_p^T \]

be the SVD of \(A\), with \(\sigma_1\ge\cdots\ge\sigma_p>0\). For \(k<p\), define

\[ A_k=\sigma_1u_1v_1^T+\cdots+\sigma_ku_kv_k^T. \]

Then \(A_k\) is a best rank-\(k\) approximation to \(A\) in the spectral norm:

\[ \min_{\operatorname{rank}(B)\le k}\|A-B\|_2 =\|A-A_k\|_2 =\sigma_{k+1}. \]

It is also best in the Frobenius norm:

\[ \min_{\operatorname{rank}(B)\le k}\|A-B\|_F =\left(\sum_{i=k+1}^p\sigma_i^2\right)^{1/2}. \]

Proof idea

Orthogonal transformations preserve spectral and Frobenius norms. Therefore the approximation problem can be transformed into the problem of approximating the diagonal matrix \(\Sigma\). The best rank-\(k\) approximation keeps the largest \(k\) diagonal entries and discards the rest. Transforming back gives \(A_k\).

Code
# A small synthetic image-like matrix
x = np.linspace(-3, 3, 80)
y = np.linspace(-3, 3, 80)
Xgrid, Ygrid = np.meshgrid(x, y)
Image = np.exp(-(Xgrid**2 + Ygrid**2)) + 0.5*np.exp(-((Xgrid-1.2)**2 + (Ygrid+0.7)**2)/0.4)

U, s, Vt = np.linalg.svd(Image, full_matrices=False)

def rank_k(k):
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in [1, 5, 10, 20]:
    Ak = rank_k(k)
    rel_error = np.linalg.norm(Image - Ak, 'fro') / np.linalg.norm(Image, 'fro')
    print(f"k={k:2d}, relative Frobenius error={rel_error:.4f}")
k= 1, relative Frobenius error=0.1805
k= 5, relative Frobenius error=0.0000
k=10, relative Frobenius error=0.0000
k=20, relative Frobenius error=0.0000

14.13 14.11 Principal component analysis

Suppose \(X\in\mathbb R^{N\times d}\) is a data matrix whose rows are observations and whose columns are centered features. Then

\[ \frac{1}{N-1}X^TX \]

is the sample covariance matrix. The right singular vectors of \(X\) are the principal component directions.

ImportantProposition 14.15: PCA directions from SVD

The first principal component direction is the first right singular vector \(v_1\) of \(X\). More generally, the first \(k\) principal directions are

\[ v_1,\ldots,v_k. \]

The corresponding variances are proportional to

\[ \sigma_1^2,\ldots,\sigma_k^2. \]

Proof

PCA seeks a unit vector \(v\) maximizing the variance of projected data:

\[ \|Xv\|_2^2=v^TX^TXv, \qquad \|v\|_2=1. \]

By the Rayleigh quotient theorem, the maximum is the largest eigenvalue of \(X^TX\), achieved at its corresponding unit eigenvector. This is exactly the first right singular vector \(v_1\), and the maximum value is \(\lambda_1=\sigma_1^2\).

Code
np.random.seed(0)
Z = np.random.randn(200, 2)
Xdata = Z @ np.array([[3, 1], [0, 0.5]])
Xcentered = Xdata - Xdata.mean(axis=0)

U, s, Vt = np.linalg.svd(Xcentered, full_matrices=False)
print("Principal directions as rows of V^T:")
print(Vt)
print("Explained variance proportions:")
print(s**2 / np.sum(s**2))
Principal directions as rows of V^T:
[[-0.94842385 -0.31700504]
 [ 0.31700504 -0.94842385]]
Explained variance proportions:
[0.97869706 0.02130294]

14.14 14.12 Complex SVD

Over complex vector spaces, transpose is replaced by conjugate transpose.

ImportantTheorem 14.16: Complex SVD

For every matrix \(A\in\mathbb C^{m\times n}\), there exist unitary matrices

\[ U\in\mathbb C^{m\times m}, \qquad V\in\mathbb C^{n\times n}, \]

and a rectangular diagonal matrix \(\Sigma\in\mathbb R^{m\times n}\) with nonnegative diagonal entries such that

\[ A=U\Sigma V^*. \]

The diagonal entries of \(\Sigma\) are the square roots of the eigenvalues of the Hermitian positive semidefinite matrix \(A^*A\).

Proof

The matrix \(A^*A\) is Hermitian because

\[ (A^*A)^*=A^*A. \]

It is positive semidefinite because

\[ z^*A^*Az=\|Az\|_2^2\ge0. \]

By the complex spectral theorem, there is a unitary matrix \(V\) such that

\[ A^*A=VDV^*. \]

Let \(\sigma_i=\sqrt{\lambda_i}\). For positive \(\sigma_i\), define

\[ u_i=\frac{Av_i}{\sigma_i}. \]

The same orthogonality calculation shows these \(u_i\) are orthonormal. Complete them to a unitary basis. Then \(AV=U\Sigma\), so \(A=U\Sigma V^*\).

14.15 14.13 Challenge questions

14.16 Challenge 1

Why are singular values always nonnegative, even though eigenvalues can be negative?

Solution

Singular values are square roots of eigenvalues of \(A^TA\) or \(A^*A\). These Gram matrices are positive semidefinite, so their eigenvalues are nonnegative. Therefore the singular values are nonnegative.

14.17 Challenge 2

Explain why SVD exists for every matrix, while ordinary diagonalization does not.

Solution

Ordinary diagonalization requires a square matrix with enough linearly independent eigenvectors. Many square matrices are defective, and rectangular matrices do not have ordinary eigenvalue decompositions. SVD applies the spectral theorem to \(A^TA\) or \(A^*A\), which is always symmetric or Hermitian positive semidefinite. Therefore SVD exists for every matrix.

14.18 Challenge 3

Suppose the singular values of a matrix are \(20,5,1,0.1\). What is the spectral norm error of the best rank-\(2\) approximation?

Solution

By Eckart–Young, the best rank-\(2\) approximation error in spectral norm is the next singular value:

\[ \|A-A_2\|_2=\sigma_3=1. \]

14.19 Challenge 4

Suppose \(A\in\mathbb R^{2\times2}\) has singular values \(8\) and \(1\). Describe the image of the unit circle under \(A\).

Solution

The image is an ellipse. Its semi-major axis has length \(8\) and its semi-minor axis has length \(1\). The directions of the axes are the left singular vectors.

14.20 14.14 Practice problems

TipPractice Problem 1

Let

\[ A=\begin{bmatrix}3&0\\0&2\end{bmatrix}. \]

Find the singular values and describe the image of the unit circle.

Solution

Since \(A^TA=\operatorname{diag}(9,4)\), the singular values are \(3\) and \(2\). The unit circle is sent to the ellipse

\[ \frac{x^2}{9}+\frac{y^2}{4}=1. \]

TipPractice Problem 2

Let

\[ A=\begin{bmatrix}1&1\\1&1\end{bmatrix}. \]

Find the rank from the singular values.

Solution

Compute

\[ A^TA=\begin{bmatrix}2&2\\2&2\end{bmatrix}. \]

Its eigenvalues are \(4\) and \(0\). Therefore the singular values are \(2\) and \(0\). The rank is \(1\).

TipPractice Problem 3

If a matrix has singular values \(7,3,0,0\), what is its rank? What is its spectral norm?

Solution

The rank is the number of nonzero singular values, so the rank is \(2\). The spectral norm is the largest singular value, so \(\|A\|_2=7\).

TipPractice Problem 4

Use Python to compute the SVD of

\[ A=\begin{bmatrix}1&4\\2&2\\2&-4\end{bmatrix}. \]

Compare the singular values with the hand computation in the notes.

Solution
Code
A = np.array([[1, 4], [2, 2], [2, -4]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(s)
[6. 3.]

The singular values are \(6\) and \(3\).

14.21 14.15 AI companion activities

Use an AI assistant as a study partner, not as a replacement for your own computation.

  1. Explain the story. Ask: “Explain why SVD is a spectral theorem for rectangular matrices in three levels: beginner, applied math student, and data scientist.”
  2. Check a proof. Ask the AI to verify the proof that \(\ker(X^TX)=\ker(X)\), then identify the single most important line.
  3. Compare methods. Ask: “Compare eigenvalue decomposition, Schur decomposition, and singular value decomposition.”
  4. Debug code. Paste your SVD code and ask the AI to check dimensions of \(U\), \(\Sigma\), and \(V^T\).
  5. Explore applications. Ask for three examples where SVD appears in machine learning, image processing, or recommendation systems.

14.22 14.16 Summary

The singular value decomposition is one of the central tools of applied linear algebra:

\[ A=U\Sigma V^T \qquad \text{or} \qquad A=U\Sigma V^* \text{ over } \mathbb C. \]

It generalizes the spectral theorem to all matrices. It explains geometry, rank, matrix norms, least squares, pseudoinverses, low-rank approximation, compression, and PCA. In modern applied mathematics, data science, and machine learning, SVD is not only a theorem; it is a computational language for extracting structure from data.