17  Chapter 17: Grassmannians and Distances Between Subspaces

When data are not points, but linear worlds

Author

He Wang

18 The story: comparing directions, not only points

In earlier chapters, a vector was often the main object. A vector could be a point, a signal, an image, a list of coefficients, or a feature representation.

But in many modern applications, the important object is not one vector. It is a subspace.

A data set with columns

\[ A=[\vec a_1\ \vec a_2\ \cdots\ \vec a_k]\in \mathbb R^{n\times k} \]

carries the linear information

\[ \operatorname{im}(A)=\operatorname{span}\{\vec a_1,\ldots,\vec a_k\}\subseteq \mathbb R^n. \]

In PCA, the first few principal components span a low-dimensional subspace. In computer vision, several images of the same object may span a subspace. In signal processing, a family of observations may be represented by a subspace. In machine learning, a group of embeddings may be compared by comparing the subspaces they generate.

So the central question of this chapter is:

ImportantGuiding question

How do we measure the distance between two linear subspaces?

A first attempt fails immediately. If \(U,W\subseteq \mathbb R^n\) are linear subspaces, then both contain \(\vec 0\). Therefore

\[ \min_{\vec u\in U,\ \vec w\in W}\|\vec u-\vec w\|=0 \]

for every pair of subspaces. This quantity cannot distinguish two identical subspaces from two very different subspaces. To compare subspaces, we must compare directions.

19 Metric spaces and the problem of subspace distance

19.1 Definition 17.1: Metric

Let \(S\) be a set. A function

\[ d:S\times S\to \mathbb R \]

is called a metric if for all \(x,y,z\in S\):

  1. \(d(x,y)\ge 0\), and \(d(x,y)=0\) if and only if \(x=y\);
  2. \(d(x,y)=d(y,x)\);
  3. \(d(x,z)\le d(x,y)+d(y,z)\).

A set equipped with a metric is called a metric space.

The Euclidean metric on \(\mathbb R^n\) is

\[ d(\vec x,\vec y)=\|\vec x-\vec y\|_2. \]

But the objects in this chapter are not vectors. They are subspaces. Thus we need a metric on a space whose points are themselves subspaces.

20 The Grassmannian

20.1 Definition 17.2: Grassmannian

For integers \(0\le k\le n\), the Grassmannian \(\operatorname{Gr}(k,n)\) is the set of all \(k\)-dimensional linear subspaces of \(\mathbb R^n\):

\[ \operatorname{Gr}(k,n)=\{U\subseteq \mathbb R^n: U\text{ is a linear subspace and }\dim U=k\}. \]

A point of \(\operatorname{Gr}(k,n)\) is not a vector. It is an entire \(k\)-dimensional subspace.

20.2 Example 17.3: Lines through the origin

The Grassmannian \(\operatorname{Gr}(1,n)\) is the set of all lines through the origin in \(\mathbb R^n\). A nonzero vector \(\vec u\) determines the same line as every nonzero scalar multiple \(c\vec u\).

Thus \(\operatorname{Gr}(1,n)\) remembers direction, but forgets length and sign.

NoteStory interpretation

A nonzero vector is like a road sign pointing in a direction. The line it spans is the direction itself. If we multiply the sign by \(2\), \(-3\), or \(100\), the direction is still the same line.

21 Orthonormal frames and the Stiefel manifold

A subspace can be represented by a basis matrix. For numerical work, the best basis matrix is usually an orthonormal one.

21.1 Definition 17.4: Stiefel manifold

The real Stiefel manifold \(\operatorname{St}(k,n)\) is the set of all \(n\times k\) matrices with orthonormal columns:

\[ \operatorname{St}(k,n)=\{Q\in \mathbb R^{n\times k}: Q^TQ=I_k\}. \]

Each \(Q\in \operatorname{St}(k,n)\) determines a \(k\)-dimensional subspace \(\operatorname{im}(Q)\). But many different orthonormal bases represent the same subspace.

21.2 Proposition 17.5: Grassmannian as a quotient

The Grassmannian can be represented as

\[ \operatorname{Gr}(k,n)=\operatorname{St}(k,n)/O(k), \]

where \(O(k)\) is the group of \(k\times k\) orthogonal matrices.

Proof

Let \(Q\in \operatorname{St}(k,n)\). Its columns form an orthonormal basis for the subspace \(U=\operatorname{im}(Q)\).

If \(R\in O(k)\), then \(QR\) also has orthonormal columns because

\[ (QR)^T(QR)=R^TQ^TQR=R^TIR=I. \]

Also \(\operatorname{im}(QR)=\operatorname{im}(Q)\), because multiplying on the right by \(R\) only rotates the coordinate system inside the same subspace.

Conversely, suppose \(Q_1,Q_2\in \operatorname{St}(k,n)\) have the same column space. Then each column of \(Q_2\) is a linear combination of the columns of \(Q_1\), so \(Q_2=Q_1R\) for some \(k\times k\) matrix \(R\). Since both have orthonormal columns,

\[ I=Q_2^TQ_2=R^TQ_1^TQ_1R=R^TR. \]

Thus \(R\in O(k)\). Therefore quotienting by \(O(k)\) identifies exactly the different orthonormal bases of the same subspace.

22 Principal angles between subspaces

For unit vectors \(\vec u\) and \(\vec v\), the angle \(\theta\) between them is determined by

\[ \cos\theta=\vec u^T\vec v. \]

For two \(k\)-dimensional subspaces, one angle is usually not enough. We need a list of angles.

22.1 Definition 17.6: Principal angles

Let \(U,W\in \operatorname{Gr}(k,n)\). The principal angles

\[ 0\le \theta_1\le \theta_2\le \cdots\le \theta_k\le \frac{\pi}{2} \]

are defined recursively by

\[ \cos\theta_j= \max \left\{ \langle \vec u,\vec w\rangle: \begin{array}{l} \vec u\in U,\ \vec w\in W,\\ \|\vec u\|=\|\vec w\|=1,\\ \vec u\perp \vec u_1,\ldots,\vec u_{j-1},\\ \vec w\perp \vec w_1,\ldots,\vec w_{j-1} \end{array} \right\}. \]

The maximizing vectors are called principal vectors.

The first principal angle finds the closest pair of directions. The second angle finds the closest pair after removing the first pair, and so on.

23 Computing principal angles by QR and SVD

The recursive definition is conceptually clear, but it is not how we compute principal angles in practice. Numerically, we use QR and SVD.

23.1 Theorem 17.7: Principal angles from SVD

Let \(U,W\in \operatorname{Gr}(k,n)\). Let

\[ Q_U,Q_W\in \mathbb R^{n\times k} \]

have orthonormal columns spanning \(U\) and \(W\). Let

\[ Q_U^TQ_W=A_0\Sigma B_0^T \]

be a singular value decomposition, where

\[ \Sigma=\operatorname{diag}(\sigma_1,\ldots,\sigma_k), \qquad \sigma_1\ge\sigma_2\ge\cdots\ge\sigma_k\ge 0. \]

Then the principal angles satisfy

\[ \cos\theta_i=\sigma_i, \qquad i=1,\ldots,k. \]

Proof

Every unit vector in \(U\) can be written as \(Q_U\vec x\) with \(\|\vec x\|=1\). Every unit vector in \(W\) can be written as \(Q_W\vec y\) with \(\|\vec y\|=1\). Therefore

\[ \langle Q_U\vec x,Q_W\vec y\rangle =\vec x^TQ_U^TQ_W\vec y. \]

The first principal angle is determined by

\[ \max_{\|\vec x\|=\|\vec y\|=1} \vec x^T(Q_U^TQ_W)\vec y. \]

By the variational characterization of singular values, this maximum is the largest singular value \(\sigma_1\). After imposing orthogonality to the previously selected vectors, the same argument gives \(\sigma_2,\ldots,\sigma_k\). Hence \(\cos\theta_i=\sigma_i\).

TipComputational recipe

Given data matrices \(A\) and \(B\):

  1. Compute orthonormal bases \(Q_A\) and \(Q_B\) for their column spaces using QR.
  2. Compute the singular values of \(Q_A^TQ_B\).
  3. Clip the singular values into \([0,1]\) to avoid roundoff errors.
  4. Compute \(\theta_i=\cos^{-1}(\sigma_i)\).

24 Worked examples

24.1 Example 17.8: Two lines in the plane

Let

\[ U=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\}, \qquad W=\operatorname{span}\left\{\begin{bmatrix}\cos\alpha\\ \sin\alpha\end{bmatrix}\right\}, \]

where \(0\le \alpha\le \pi/2\). Then

\[ Q_U^TQ_W=egin{bmatrix}1&0\end{bmatrix} \begin{bmatrix}\cos\alpha\\ \sin\alpha\end{bmatrix}=\cos\alpha. \]

The only singular value is \(\sigma_1=\cos\alpha\), so \(\theta_1=\alpha\).

24.2 Example 17.9: Two planes in \(\mathbb R^3\)

Let

\[ U=\operatorname{span}\{\vec e_1,\vec e_2\}, \qquad W=\operatorname{span}\{\vec e_1,\cos\alpha\,\vec e_2+\sin\alpha\,\vec e_3\}. \]

The subspaces share the direction \(\vec e_1\), so \(\theta_1=0\). The second direction is rotated by angle \(\alpha\), so \(\theta_2=\alpha\).

Matrix computation confirms this:

\[ Q_U=\begin{bmatrix}1&0\\0&1\\0&0\end{bmatrix}, \qquad Q_W=\begin{bmatrix}1&0\\0&\cos\alpha\\0&\sin\alpha\end{bmatrix}. \]

Then

\[ Q_U^TQ_W=\begin{bmatrix}1&0\\0&\cos\alpha\end{bmatrix}, \]

whose singular values are \(1\) and \(\cos\alpha\).

25 Distances on the Grassmannian

Once the principal angles are known, several distances can be defined.

25.1 Definition 17.10: Grassmann geodesic distance

For \(U,W\in \operatorname{Gr}(k,n)\) with principal angles \(\theta_1,\ldots,\theta_k\), the Grassmann geodesic distance is

\[ d_{\operatorname{Gr}}(U,W)=\left(\sum_{i=1}^k \theta_i^2\right)^{1/2}. \]

25.2 Definition 17.11: Common Grassmannian distances

Let \(\theta_1,\ldots,\theta_k\) be the principal angles between \(U\) and \(W\).

\[ d_{\mathrm{Asimov}}(U,W)=\theta_k. \]

\[ d_{\mathrm{chordal}}(U,W)=\left(\sum_{i=1}^k \sin^2\theta_i\right)^{1/2}. \]

\[ d_{\mathrm{Fubini\text{-}Study}}(U,W)=\cos^{-1}\left(\prod_{i=1}^k\cos\theta_i\right). \]

\[ d_{\mathrm{projection}}(U,W)=\sin\theta_k. \]

\[ d_{\mathrm{Procrustes}}(U,W)=2\left(\sum_{i=1}^k \sin^2(\theta_i/2)\right)^{1/2}. \]

For nearby subspaces, \(\sin\theta_i\approx \theta_i\), so the chordal distance and geodesic distance are close.

26 Projection matrices and chordal distance

A subspace \(U\in \operatorname{Gr}(k,n)\) can be represented by its orthogonal projection matrix. If \(Q_U\) has orthonormal columns spanning \(U\), then

\[ P_U=Q_UQ_U^T. \]

26.1 Proposition 17.12: Projection representation is basis-independent

If \(Q_U\) and \(\widetilde Q_U\) are two orthonormal basis matrices for the same subspace \(U\), then

\[ Q_UQ_U^T=\widetilde Q_U\widetilde Q_U^T. \]

Proof

Since \(Q_U\) and \(\widetilde Q_U\) have the same column space, there exists \(R\in O(k)\) such that \(\widetilde Q_U=Q_UR\). Therefore

\[ \widetilde Q_U\widetilde Q_U^T =Q_URR^TQ_U^T =Q_UIQ_U^T =Q_UQ_U^T. \]

26.2 Theorem 17.13: Chordal distance from projection matrices

Let \(U,W\in \operatorname{Gr}(k,n)\), and let \(P_U,P_W\) be their orthogonal projection matrices. Then

\[ \frac{1}{\sqrt 2}\|P_U-P_W\|_F =\left(\sum_{i=1}^k\sin^2\theta_i\right)^{1/2}. \]

Thus

\[ d_{\mathrm{chordal}}(U,W)=\frac{1}{\sqrt 2}\|P_U-P_W\|_F. \]

Proof

Let \(P_U=Q_UQ_U^T\) and \(P_W=Q_WQ_W^T\). Since projection matrices are symmetric and idempotent,

\[ \|P_U-P_W\|_F^2 =\operatorname{tr}((P_U-P_W)^T(P_U-P_W)). \]

Thus

\[ \|P_U-P_W\|_F^2 =\operatorname{tr}(P_U)+\operatorname{tr}(P_W)-2\operatorname{tr}(P_UP_W). \]

Because \(P_U\) and \(P_W\) project onto \(k\)-dimensional subspaces,

\[ \operatorname{tr}(P_U)=\operatorname{tr}(P_W)=k. \]

Also,

\[ \operatorname{tr}(P_UP_W) =\operatorname{tr}(Q_UQ_U^TQ_WQ_W^T) =\operatorname{tr}(Q_U^TQ_WQ_W^TQ_U) =\|Q_U^TQ_W\|_F^2. \]

The singular values of \(Q_U^TQ_W\) are \(\cos\theta_1,\ldots,\cos\theta_k\). Hence

\[ \operatorname{tr}(P_UP_W)=\sum_{i=1}^k\cos^2\theta_i. \]

Therefore

\[ \|P_U-P_W\|_F^2 =2k-2\sum_{i=1}^k\cos^2\theta_i =2\sum_{i=1}^k\sin^2\theta_i. \]

Taking square roots gives the result.

27 Python computation

Code
import numpy as np


def orthonormal_basis(A, tol=1e-12):
    """Return an orthonormal basis for the column space of A."""
    Q, R = np.linalg.qr(A)
    diag = np.abs(np.diag(R))
    r = int(np.sum(diag > tol))
    return Q[:, :r]


def principal_angles(A, B):
    """Principal angles between column spaces of A and B."""
    QA = orthonormal_basis(A)
    QB = orthonormal_basis(B)
    if QA.shape[1] != QB.shape[1]:
        raise ValueError("This function assumes equal-dimensional subspaces.")
    s = np.linalg.svd(QA.T @ QB, compute_uv=False)
    s = np.clip(s, 0.0, 1.0)
    return np.arccos(s)


def distances_from_angles(theta):
    return {
        "geodesic": np.linalg.norm(theta),
        "chordal": np.linalg.norm(np.sin(theta)),
        "projection": np.sin(np.max(theta)),
        "Asimov": np.max(theta),
        "Procrustes": 2*np.linalg.norm(np.sin(theta/2)),
        "Fubini-Study": np.arccos(np.prod(np.cos(theta)))
    }

alpha = np.pi/6
A = np.array([[1,0],[0,1],[0,0]], dtype=float)
B = np.array([[1,0],[0,np.cos(alpha)],[0,np.sin(alpha)]], dtype=float)

theta = principal_angles(A, B)
print("Principal angles in radians:", theta)
print("Principal angles in degrees:", theta*180/np.pi)
print(distances_from_angles(theta))
Principal angles in radians: [0.         0.52359878]
Principal angles in degrees: [ 0. 30.]
{'geodesic': 0.5235987755982985, 'chordal': 0.49999999999999967, 'projection': 0.49999999999999967, 'Asimov': 0.5235987755982985, 'Procrustes': 0.5176380902050411, 'Fubini-Study': 0.5235987755982985}

27.1 Comparing PCA subspaces

Code
np.random.seed(7)

# Two synthetic data clouds in R^5 with dominant 2D structure.
n = 5
m = 200
U_true = orthonormal_basis(np.random.randn(n, 2))
noise = 0.15*np.random.randn(n, m)
X = U_true @ np.random.randn(2, m) + noise

# Rotate the true subspace slightly to create a second data set.
R = np.eye(n)
R[1,1] = np.cos(0.35)
R[1,2] = -np.sin(0.35)
R[2,1] = np.sin(0.35)
R[2,2] = np.cos(0.35)
V_true = R @ U_true
Y = V_true @ np.random.randn(2, m) + 0.15*np.random.randn(n, m)

# PCA by SVD of centered data.
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
Ux, sx, _ = np.linalg.svd(Xc, full_matrices=False)
Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)

PX = Ux[:, :2]
PY = Uy[:, :2]
theta_pca = principal_angles(PX, PY)
print("PCA subspace angles in degrees:", theta_pca*180/np.pi)
print("Grassmann distance:", np.linalg.norm(theta_pca))
PCA subspace angles in degrees: [4.31762243 9.17458385]
Grassmann distance: 0.17697230014103085

28 Classical challenge questions

28.1 Challenge 17.1: Why point-to-point distance fails

Let \(U,W\subseteq \mathbb R^n\) be two linear subspaces. Explain why

\[ \min_{\vec u\in U,\ \vec w\in W}\|\vec u-\vec w\| \]

is not a useful distance between linear subspaces.

Solution

Since \(U\) and \(W\) are linear subspaces, both contain the zero vector. Choosing \(\vec u=\vec 0\) and \(\vec w=\vec 0\) gives

\[ \|\vec u-\vec w\|=0. \]

Therefore the minimum is always \(0\), no matter how different the two subspaces are. A useful subspace distance must compare directions.

28.2 Challenge 17.2: Two planes sharing a line

Let

\[ U=\operatorname{span}\{\vec e_1,\vec e_2\}, \qquad W=\operatorname{span}\{\vec e_1,\cos\alpha\,\vec e_2+\sin\alpha\,\vec e_3\}. \]

Find the principal angles and the Grassmann geodesic distance.

Solution

The subspaces share the direction \(\vec e_1\), so \(\theta_1=0\). The remaining direction is rotated by angle \(\alpha\), so \(\theta_2=\alpha\). Therefore

\[ (\theta_1,\theta_2)=(0,\alpha) \]

and

\[ d_{\operatorname{Gr}}(U,W)=\sqrt{0^2+\alpha^2}=\alpha. \]

28.3 Challenge 17.3: Projection matrix representation

Let \(Q\in \mathbb R^{n\times k}\) have orthonormal columns. Show that \(P=QQ^T\) is the orthogonal projection matrix onto \(\operatorname{im}(Q)\).

Solution

First, \(P^T=(QQ^T)^T=QQ^T=P\), so \(P\) is symmetric. Second,

\[ P^2=QQ^TQQ^T=Q(Q^TQ)Q^T=QIQ^T=QQ^T=P. \]

For any \(\vec x\), \(P\vec x=QQ^T\vec x\in \operatorname{im}(Q)\). Also,

\[ Q^T(\vec x-P\vec x)=Q^T\vec x-Q^TQQ^T\vec x=0. \]

Thus the error \(\vec x-P\vec x\) is orthogonal to \(\operatorname{im}(Q)\). Hence \(P\vec x\) is the orthogonal projection of \(\vec x\) onto \(\operatorname{im}(Q)\).

28.4 Challenge 17.4: Chordal distance from projections

Prove that

\[ d_{\mathrm{chordal}}(U,W)=\frac{1}{\sqrt 2}\|P_U-P_W\|_F. \]

Solution

The proof follows from the trace computation:

\[ \|P_U-P_W\|_F^2=2k-2\operatorname{tr}(P_UP_W). \]

Using orthonormal bases \(Q_U,Q_W\),

\[ \operatorname{tr}(P_UP_W)=\|Q_U^TQ_W\|_F^2. \]

The singular values of \(Q_U^TQ_W\) are \(\cos\theta_1,\ldots,\cos\theta_k\), so

\[ \operatorname{tr}(P_UP_W)=\sum_{i=1}^k\cos^2\theta_i. \]

Therefore

\[ \|P_U-P_W\|_F^2=2\sum_{i=1}^k\sin^2\theta_i. \]

Taking square roots gives the result.

29 Practice problems

29.1 Problem 17.1

Let \(U=\operatorname{span}\{(1,0)\}\) and \(W=\operatorname{span}\{(1,1)\}\). Find the principal angle between \(U\) and \(W\).

Solution

Normalize the vector \((1,1)\):

\[ \vec w=\frac{1}{\sqrt2}(1,1). \]

Then

\[ \cos\theta=(1,0)\cdot \frac{1}{\sqrt2}(1,1)=\frac{1}{\sqrt2}. \]

Thus \(\theta=\pi/4\).

29.2 Problem 17.2

Let \(U=\operatorname{span}\{\vec e_1,\vec e_2\}\) and \(W=\operatorname{span}\{\vec e_2,\vec e_3\}\) in \(\mathbb R^3\). Find the principal angles.

Solution

The two planes share the direction \(\vec e_2\), so \(\theta_1=0\). The remaining directions are \(\vec e_1\) and \(\vec e_3\), which are orthogonal, so \(\theta_2=\pi/2\). Therefore

\[ (\theta_1,\theta_2)=(0,\pi/2). \]

29.3 Problem 17.3

If two two-dimensional subspaces have principal angles \(\theta_1=0.2\) and \(\theta_2=0.6\), compute the Grassmann geodesic distance and chordal distance.

Solution

The geodesic distance is

\[ d_{\operatorname{Gr}}=\sqrt{0.2^2+0.6^2}=\sqrt{0.40}\approx 0.6325. \]

The chordal distance is

\[ d_{\mathrm{chordal}}=\sqrt{\sin^2(0.2)+\sin^2(0.6)}\approx 0.5985. \]

30 AI companion activities

TipActivity 17.A: Explain the geometry

Ask an AI tool:

Explain why comparing two subspaces requires comparing directions rather than comparing closest points.

Then improve the response by requiring it to include the zero-vector argument.

TipActivity 17.B: Check a computation

Use Python to compute principal angles for two subspaces. Then ask an AI tool:

Here are my matrices \(Q_U\) and \(Q_W\). Explain why the singular values of \(Q_U^TQ_W\) give the cosines of the principal angles.

Check whether the explanation correctly uses orthonormal bases.

TipActivity 17.C: Applications search prompt

Ask:

Give three applications where Grassmannian distances are useful in data science, computer vision, or signal processing. For each one, explain what the subspaces represent.

31 Summary

  • The Grassmannian \(\operatorname{Gr}(k,n)\) is the set of all \(k\)-dimensional subspaces of \(\mathbb R^n\).
  • The Stiefel manifold \(\operatorname{St}(k,n)\) is the set of orthonormal \(k\)-frames.
  • A subspace can be represented by many orthonormal bases, so \(\operatorname{Gr}(k,n)=\operatorname{St}(k,n)/O(k)\).
  • Principal angles measure how two subspaces are positioned relative to each other.
  • Principal angles are computed from the singular values of \(Q_U^TQ_W\).
  • Grassmannian distances are built from principal angles.
  • Projection matrices give a basis-independent representation of subspaces.
  • Grassmannian geometry appears in PCA, computer vision, signal processing, subspace clustering, and machine learning.