---
title: "Chapter 18: Computational Complexity and Computation of Eigenvalues"
subtitle: "When the formula is correct, but the computation is too large"
author: "He Wang"
format:
html:
toc: true
number-sections: true
code-fold: true
code-tools: true
execute:
echo: true
warning: false
message: false
jupyter: python3
---
# The story: linear algebra at scale
In the first part of this course, many computations were small enough to do by hand. We solved systems, found bases, diagonalized matrices, computed least-squares solutions, and studied decompositions such as QR, Schur, spectral decomposition, SVD, Haar transforms, and FFT.
But applied linear algebra becomes truly powerful when the matrices are large. A matrix might represent a social network, a graph Laplacian, a covariance matrix, a discretized differential operator, an image, a signal, or a machine-learning data set.
At that scale, the question is no longer only:
::: {.callout-note}
## The old question
What is the correct formula?
:::
It becomes:
::: {.callout-important}
## The computational question
Can we compute the answer using a reasonable amount of time and memory?
:::
For example, the expressions
$$
(AB)\vec x
\qquad\text{and}\qquad
A(B\vec x)
$$
are mathematically equal, but they may have very different computational costs. The lesson of this chapter is that **linear algebra is not only about identities; it is also about algorithms**.
This chapter has two main goals:
1. understand computational complexity for basic matrix operations;
2. study numerical eigenvalue computation, especially the power method and Rayleigh quotient.
# Space complexity, time complexity, and Big-O notation
::: {.callout-definition}
## Definition 18.1: Space complexity
The **space complexity** of an algorithm is the amount of memory needed to store the input, intermediate quantities, and output.
:::
::: {.callout-definition}
## Definition 18.2: Time complexity
The **time complexity** of an algorithm is the number of elementary operations needed to complete the computation. In numerical linear algebra, the elementary operations are usually arithmetic operations such as addition, subtraction, multiplication, and division.
:::
::: {.callout-definition}
## Definition 18.3: Big-O notation
Let $f(n)$ and $g(n)$ be positive functions. We write
$$
f(n)=O(g(n))
$$
if there exist constants $C>0$ and $n_0$ such that
$$
f(n)\le Cg(n)\qquad\text{for all }n\ge n_0.
$$
Big-O notation records the dominant asymptotic growth rate and ignores fixed constants and lower-order terms.
:::
## Memory: dense versus sparse matrices
A dense matrix $A\in\mathbb R^{m\times n}$ stores $mn$ numbers. Its memory cost is therefore
$$
O(mn).
$$
If a matrix is sparse, most entries are zero. A sparse storage format stores only the nonzero entries and their positions. The memory cost is closer to
$$
O(\operatorname{nnz}(A)),
$$
where $\operatorname{nnz}(A)$ denotes the number of nonzero entries.
::: {.callout-example}
## Example 18.4: A tridiagonal matrix
A tridiagonal $n\times n$ matrix has nonzero entries only on the main diagonal, the diagonal above it, and the diagonal below it. It has approximately $3n$ nonzero entries, not $n^2$ entries. For large $n$, storing it as a sparse matrix saves enormous memory.
:::
```{python}
import numpy as np
from scipy import sparse
n = 10_000
main = 2*np.ones(n)
off = -1*np.ones(n-1)
A_sparse = sparse.diags([off, main, off], [-1, 0, 1], format="csr")
print("shape:", A_sparse.shape)
print("number of nonzeros:", A_sparse.nnz)
print("density:", A_sparse.nnz/(n*n))
```
# Basic time complexity in linear algebra
Let $\vec x,\vec y\in\mathbb R^n$, $A\in\mathbb R^{m\times n}$, $B\in\mathbb R^{n\times p}$, and $M\in\mathbb R^{n\times n}$.
| Operation | Typical dense cost |
|---|---:|
| Sum entries of $\vec x$ | $O(n)$ |
| Dot product $\vec x\cdot\vec y$ | $O(n)$ |
| Matrix-vector product $A\vec x$ | $O(mn)$ |
| Matrix-matrix product $AB$ | $O(mnp)$ |
| Gaussian elimination for $M\vec x=\vec b$ | $O(n^3)$ |
| Dense eigenvalue computation | about $O(n^3)$ |
::: {.callout-note}
## Interpretation
The difference between $O(n)$, $O(n^2)$, and $O(n^3)$ is small for homework-size matrices, but enormous for real data. If $n=10^6$, even storing a dense $n\times n$ matrix is usually impossible.
:::
```{python}
import pandas as pd
sizes = [10, 100, 1_000, 10_000, 1_000_000]
data = []
for n in sizes:
data.append({
"n": n,
"n": n,
"n log2 n": n*np.log2(n),
"n^2": n**2,
"n^3": n**3,
})
pd.DataFrame(data)
```
# Associativity can save work
A recurring rule in efficient matrix computation is:
::: {.callout-important}
## Do not form a large matrix if you only need its action on a vector.
:::
::: {.callout-theorem}
## Proposition 18.5: Associativity can save work
Let $A\in\mathbb R^{m\times n}$, $B\in\mathbb R^{n\times p}$, and $\vec x\in\mathbb R^p$. Mathematically,
$$
(AB)\vec x=A(B\vec x).
$$
However, the computational costs are different:
$$
(AB)\vec x:\quad O(mnp)+O(mp),
$$
while
$$
A(B\vec x):\quad O(np)+O(mn).
$$
:::
<details>
<summary><strong>Proof</strong></summary>
To compute $(AB)\vec x$, first form the full matrix $AB$. Since $AB$ has $mp$ entries and each entry is a dot product of length $n$, forming $AB$ costs $O(mnp)$. Then multiplying the resulting $m\times p$ matrix by $\vec x$ costs $O(mp)$.
For $A(B\vec x)$, first compute $\vec y=B\vec x$, costing $O(np)$. Then compute $A\vec y$, costing $O(mn)$.
The two results are equal because matrix multiplication is associative, but the second method avoids forming the large intermediate matrix $AB$.
</details>
::: {.callout-example}
## Example 18.6: One vector versus a full product
If $m=n=p=10^4$, then forming $AB$ costs about $10^{12}$ scalar operations. Computing $B\vec x$ and then $A(B\vec x)$ costs about $2\cdot 10^8$ scalar operations. The two expressions are mathematically equal, but the computational work differs by thousands of times.
:::
```{python}
def cost_form_then_multiply(m,n,p):
return m*n*p + m*p
def cost_multiply_vector_first(m,n,p):
return n*p + m*n
m=n=p=10_000
print("Form AB first:", cost_form_then_multiply(m,n,p))
print("Vector first:", cost_multiply_vector_first(m,n,p))
print("ratio:", cost_form_then_multiply(m,n,p)/cost_multiply_vector_first(m,n,p))
```
# Diagonal and sparse matrices
::: {.callout-definition}
## Definition 18.7: Diagonal matrix
Given $\vec a=(a_1,\ldots,a_n)^T$, the diagonal matrix with diagonal $\vec a$ is
$$
D=\operatorname{diag}(\vec a)=
\begin{bmatrix}
a_1& &0\\
&\ddots&\\
0&&a_n
\end{bmatrix}.
$$
:::
::: {.callout-theorem}
## Proposition 18.8: Diagonal matrices behave like vectors
Let $D=\operatorname{diag}(\vec a)\in\mathbb R^{n\times n}$ and $B\in\mathbb R^{n\times p}$. Then
$$
DB=
\begin{bmatrix}
a_1B_{1,:}\\
a_2B_{2,:}\\
\vdots\\
a_nB_{n,:}
\end{bmatrix}.
$$
Thus multiplying by a diagonal matrix rescales rows. It costs $O(np)$, not $O(n^2p)$.
:::
<details>
<summary><strong>Proof</strong></summary>
The $(i,j)$ entry of $DB$ is
$$
(DB)_{ij}=\sum_{k=1}^nD_{ik}B_{kj}.
$$
Since $D$ is diagonal, $D_{ik}=0$ unless $k=i$. Therefore
$$
(DB)_{ij}=D_{ii}B_{ij}=a_iB_{ij}.
$$
So each row of $B$ is scaled independently.
</details>
```{python}
B = np.array([[1,2,3],
[4,5,6],
[7,8,9]], dtype=float)
a = np.array([10, 100, 1000], dtype=float)
D = np.diag(a)
print(D @ B)
print(a[:,None] * B) # same result without forming D
```
# Solving linear systems: direct and iterative viewpoints
Solving a linear system
$$
A\vec x=\vec b
$$
is one of the central problems in numerical linear algebra. Gaussian elimination is a direct method. It gives an answer after a finite sequence of row operations. But for large sparse systems, iterative methods can be preferable.
::: {.callout-example}
## Example 18.9: Stable configuration
A physical equilibrium problem may lead to
$$
\begin{cases}
3x-y-z=1,\\
-x+2y-z=0,\\
-2x-y+4z=-1.
\end{cases}
$$
There are several computational viewpoints:
1. solve exactly by Gaussian elimination;
2. simulate the physical system until it stabilizes;
3. use an iterative method, possibly with preconditioning.
:::
```{python}
A = np.array([[3,-1,-1],[-1,2,-1],[-2,-1,4]], dtype=float)
b = np.array([1,0,-1], dtype=float)
np.linalg.solve(A,b)
```
::: {.callout-note}
## Preconditioning
A preconditioner changes a difficult system into an equivalent or approximately equivalent system that is easier for an iterative method to solve. In large-scale problems, the difference between cubic behavior and nearly linear behavior can be the difference between impossible and practical.
:::
# Eigenvalue computation: why the characteristic polynomial is not enough
Given $A\in\mathbb R^{n\times n}$, an eigenvalue $\lambda$ and eigenvector $\vec v\ne\vec 0$ satisfy
$$
A\vec v=\lambda\vec v.
$$
The algebraic definition comes from
$$
\det(A-\lambda I)=0.
$$
This is a polynomial equation of degree $n$.
::: {.callout-warning}
## Numerical warning
For large $n$, expanding and solving the characteristic polynomial is usually not a good numerical method. Polynomial coefficients can be highly sensitive to roundoff error, and high-degree polynomial root finding can be unstable. Modern eigenvalue algorithms use matrix iterations instead.
:::
::: {.callout-definition}
## Definition 18.10: Dominant eigenvalue
Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $A$. A **dominant eigenvalue** is an eigenvalue whose absolute value is strictly largest:
$$
|\lambda_1|>|\lambda_i|\qquad i=2,\ldots,n.
$$
An eigenvector associated with $\lambda_1$ is called a **dominant eigenvector**.
:::
# The power method
The power method is one of the simplest numerical algorithms for approximating a dominant eigenvector.
::: {.callout-definition}
## Definition 18.11: Power iteration
Choose an initial vector $\vec x_0\ne\vec 0$. Define
$$
\vec x_{k+1}=A\vec x_k.
$$
Equivalently,
$$
\vec x_k=A^k\vec x_0.
$$
In practical computation, one usually normalizes each step:
$$
\vec y_{k+1}=A\vec x_k,
\qquad
\vec x_{k+1}=\frac{\vec y_{k+1}}{\|\vec y_{k+1}\|}.
$$
:::
::: {.callout-theorem}
## Theorem 18.12: Power method in the diagonalizable case
Suppose $A\in\mathbb C^{n\times n}$ is diagonalizable with eigenpairs
$$
A\vec b_i=\lambda_i\vec b_i
$$
and
$$
|\lambda_1|>|\lambda_2|\ge\cdots\ge|\lambda_n|.
$$
If
$$
\vec x_0=c_1\vec b_1+c_2\vec b_2+\cdots+c_n\vec b_n
$$
with $c_1\ne0$, then the directions of $A^k\vec x_0$ converge to the dominant eigendirection $\operatorname{span}\{\vec b_1\}$.
:::
<details>
<summary><strong>Proof</strong></summary>
Compute
$$
A^k\vec x_0
=c_1\lambda_1^k\vec b_1+c_2\lambda_2^k\vec b_2+\cdots+c_n\lambda_n^k\vec b_n.
$$
Factor out $\lambda_1^k$:
$$
A^k\vec x_0
=\lambda_1^k\left(c_1\vec b_1+
\sum_{i=2}^n c_i\left(\frac{\lambda_i}{\lambda_1}\right)^k\vec b_i\right).
$$
Since $|\lambda_i/\lambda_1|<1$ for every $i\ge2$, the terms
$$
\left(\frac{\lambda_i}{\lambda_1}\right)^k
$$
converge to zero. After normalization, the direction approaches the direction of $\vec b_1$.
</details>
::: {.callout-note}
## Rate of convergence
The dominant error term is governed by
$$
\left|\frac{\lambda_2}{\lambda_1}\right|^k.
$$
The smaller $|\lambda_2/\lambda_1|$ is, the faster the method converges. If $|\lambda_2|$ is close to $|\lambda_1|$, convergence can be very slow.
:::
```{python}
def power_method(A, steps=25, seed=1):
rng = np.random.default_rng(seed)
x = rng.normal(size=A.shape[0])
x = x/np.linalg.norm(x)
history = []
for k in range(steps):
y = A @ x
x = y/np.linalg.norm(y)
lam = (x @ A @ x)/(x @ x)
history.append(lam)
return np.array(history), x
A = np.array([[3,1],[1,2]], dtype=float)
history, x = power_method(A, steps=20)
print("approx eigenvector:", x)
print("Rayleigh quotients:", history[-5:])
print("numpy eigenvalues:", np.linalg.eigvals(A))
```
# Rayleigh quotient
::: {.callout-definition}
## Definition 18.13: Rayleigh quotient
Let $A\in\mathbb R^{n\times n}$ be symmetric. The **Rayleigh quotient** is
$$
R_A(\vec x)=\frac{\vec x^TA\vec x}{\vec x^T\vec x},
\qquad \vec x\ne\vec 0.
$$
For Hermitian $A\in\mathbb C^{n\times n}$, the complex Rayleigh quotient is
$$
R_A(\vec z)=\frac{\vec z^*A\vec z}{\vec z^*\vec z},
\qquad \vec z\ne\vec 0.
$$
:::
::: {.callout-theorem}
## Proposition 18.14: Eigenvalue from eigenvector
If $A\vec v=\lambda\vec v$ and $\vec v\ne\vec 0$, then
$$
R_A(\vec v)=\lambda.
$$
:::
<details>
<summary><strong>Proof</strong></summary>
For the real case,
$$
R_A(\vec v)=\frac{\vec v^TA\vec v}{\vec v^T\vec v}
=\frac{\vec v^T(\lambda\vec v)}{\vec v^T\vec v}
=\lambda.
$$
The Hermitian case is identical with transpose replaced by conjugate transpose.
</details>
::: {.callout-theorem}
## Theorem 18.15: Rayleigh quotient and extreme eigenvalues
Let $A\in\mathbb R^{n\times n}$ be symmetric with eigenvalues
$$
\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n.
$$
Then
$$
\max_{\vec x\ne\vec 0}R_A(\vec x)=\lambda_1,
\qquad
\min_{\vec x\ne\vec 0}R_A(\vec x)=\lambda_n.
$$
The same statement holds for Hermitian matrices over $\mathbb C$.
:::
<details>
<summary><strong>Proof</strong></summary>
By the spectral theorem, choose an orthonormal eigenbasis $\vec q_1,\ldots,\vec q_n$. Write
$$
\vec x=c_1\vec q_1+\cdots+c_n\vec q_n.
$$
Then
$$
\vec x^TA\vec x=\sum_{i=1}^n\lambda_i c_i^2,
\qquad
\vec x^T\vec x=\sum_{i=1}^n c_i^2.
$$
Therefore
$$
R_A(\vec x)=\frac{\sum_{i=1}^n\lambda_i c_i^2}{\sum_{i=1}^n c_i^2}.
$$
This is a weighted average of the eigenvalues, with nonnegative weights
$$
\frac{c_i^2}{\sum_{j=1}^n c_j^2}.
$$
A weighted average of numbers between $\lambda_n$ and $\lambda_1$ also lies between $\lambda_n$ and $\lambda_1$. Equality is achieved at eigenvectors for the largest and smallest eigenvalues.
</details>
```{python}
A = np.array([[2,1,0],[1,3,1],[0,1,2]], dtype=float)
vals, vecs = np.linalg.eigh(A)
print("eigenvalues:", vals)
rng = np.random.default_rng(0)
for _ in range(5):
x = rng.normal(size=3)
R = x @ A @ x / (x @ x)
print(R)
```
# Restricted Rayleigh quotient and PCA intuition
::: {.callout-theorem}
## Theorem 18.16: Restricted Rayleigh quotient
Let $A$ be real symmetric with eigenvalues
$$
\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n
$$
and orthonormal eigenvectors $\vec q_1,\ldots,\vec q_n$. Then
$$
\max_{\vec x\ne\vec 0,\ \vec q_1^T\vec x=0}R_A(\vec x)=\lambda_2.
$$
More generally,
$$
\max_{\vec x\ne\vec 0,\ \vec q_1^T\vec x=\cdots=\vec q_{k-1}^T\vec x=0}R_A(\vec x)=\lambda_k.
$$
:::
<details>
<summary><strong>Proof</strong></summary>
The constraints
$$
\vec q_1^T\vec x=\cdots=\vec q_{k-1}^T\vec x=0
$$
mean that $\vec x$ lies in
$$
\operatorname{span}\{\vec q_k,\ldots,\vec q_n\}.
$$
Write
$$
\vec x=c_k\vec q_k+\cdots+c_n\vec q_n.
$$
Then
$$
R_A(\vec x)=\frac{\sum_{i=k}^n\lambda_i c_i^2}{\sum_{i=k}^n c_i^2}.
$$
This is a weighted average of $\lambda_k,\ldots,\lambda_n$, so it is at most $\lambda_k$. Equality occurs when $\vec x=\vec q_k$.
</details>
This theorem explains why PCA finds directions one after another: the first principal direction maximizes variance, the second maximizes variance subject to being orthogonal to the first, and so on.
# Deflation
After finding one eigenvalue, one may try to remove it and compute the next one. This idea is called **deflation**.
::: {.callout-definition}
## Definition 18.17: Deflation
Deflation is the process of modifying a problem after finding one solution so that the next computation finds a new solution.
:::
::: {.callout-theorem}
## Proposition 18.18: Simple deflation for a symmetric matrix
Let $A\in\mathbb R^{n\times n}$ be symmetric with orthonormal eigenvectors $\vec v_1,\ldots,\vec v_n$ and eigenvalues $\lambda_1,\ldots,\lambda_n$. Define
$$
B=A-\lambda_1\vec v_1\vec v_1^T.
$$
Then
$$
B\vec v_1=\vec 0,
\qquad
B\vec v_i=\lambda_i\vec v_i\quad(i=2,\ldots,n).
$$
Thus $B$ has eigenvalues $0,\lambda_2,\ldots,\lambda_n$.
:::
<details>
<summary><strong>Proof</strong></summary>
Since $\vec v_1^T\vec v_1=1$,
$$
B\vec v_1
=A\vec v_1-\lambda_1\vec v_1(\vec v_1^T\vec v_1)
=\lambda_1\vec v_1-\lambda_1\vec v_1
=\vec 0.
$$
If $i\ge2$, then $\vec v_1^T\vec v_i=0$, so
$$
B\vec v_i
=A\vec v_i-\lambda_1\vec v_1(\vec v_1^T\vec v_i)
=\lambda_i\vec v_i.
$$
</details>
::: {.callout-warning}
## Numerical caution
Deflation is conceptually useful, but repeated deflation can be numerically unstable. Small errors in earlier eigenvectors can contaminate later eigenvalue computations.
:::
```{python}
A = np.array([[3,1],[1,2]], dtype=float)
vals, vecs = np.linalg.eigh(A)
idx = np.argsort(vals)[::-1]
vals = vals[idx]
vecs = vecs[:,idx]
lam1 = vals[0]
v1 = vecs[:,0]
B = A - lam1*np.outer(v1,v1)
print("A eigenvalues:", vals)
print("deflated B:\n", B)
print("B eigenvalues:", np.linalg.eigvalsh(B))
```
# Generalized eigenvalues
Eigenvalue problems often appear in generalized form.
::: {.callout-definition}
## Definition 18.19: Generalized eigenvalue
Let $A,B\in\mathbb R^{n\times n}$. A scalar $\lambda$ is called a **generalized eigenvalue** of the pair $(A,B)$ if there exists a nonzero vector $\vec v$ such that
$$
A\vec v=\lambda B\vec v.
$$
The vector $\vec v$ is called a **generalized eigenvector**.
:::
If $B=I$, this reduces to the ordinary eigenvalue problem. If $B$ is invertible, then
$$
A\vec v=\lambda B\vec v
$$
is equivalent to
$$
B^{-1}A\vec v=\lambda\vec v.
$$
However, in numerical computation, one usually avoids explicitly forming $B^{-1}A$.
::: {.callout-example}
## Example 18.20: A generalized eigenvalue problem
Let
$$
A=\begin{bmatrix}1&2\\2&4\end{bmatrix},
\qquad
B=\begin{bmatrix}1&1\\1&1\end{bmatrix}.
$$
The generalized eigenvalues satisfy
$$
\det(A-\lambda B)=0.
$$
Compute
$$
\det\begin{bmatrix}1-\lambda&2-\lambda\\2-\lambda&4-\lambda\end{bmatrix}
=(1-\lambda)(4-\lambda)-(2-\lambda)^2=-\lambda.
$$
Thus $\lambda=0$ is a generalized eigenvalue. The generalized eigenvectors solve $A\vec v=\vec 0$, so
$$
\vec v=k\begin{bmatrix}-2\\1\end{bmatrix}.
$$
:::
# Applications
## PageRank and Markov chains
If a network is represented by a stochastic matrix $P$, then a stationary distribution satisfies
$$
P\vec x=\vec x.
$$
This is an eigenvalue problem with eigenvalue $1$. Power iteration is a natural way to approximate the stationary vector.
## Principal component analysis
PCA solves an eigenvalue problem for a covariance matrix $C$. The first principal component solves
$$
\max_{\|\vec x\|=1}\vec x^TC\vec x.
$$
By the Rayleigh quotient theorem, the solution is the eigenvector corresponding to the largest eigenvalue of $C$.
## Spectral graph methods
Graph Laplacians lead to eigenvalue and generalized eigenvalue problems. In spectral clustering, eigenvectors of a graph Laplacian are used as low-dimensional coordinates for clustering vertices.
# Classical challenge questions with solutions
## Challenge 1: When should we precompute $AB$?
Let $A\in\mathbb R^{m\times n}$, $B\in\mathbb R^{n\times p}$, and suppose we need to compute $A(B\vec x)$ for many different vectors $\vec x\in\mathbb R^p$.
1. When is it better to precompute $AB$?
2. When is it better to avoid forming $AB$?
<details>
<summary><strong>Solution</strong></summary>
For one vector $\vec x$, it is usually better to compute $A(B\vec x)$ with cost
$$
O(np+mn),
$$
rather than form $AB$, which costs $O(mnp)$.
If the same matrices $A$ and $B$ are used for $N$ different vectors, then the tradeoff is
$$
\text{precompute }AB:\quad O(mnp)+N\cdot O(mp),
$$
while
$$
\text{do not precompute}:\quad N\cdot O(np+mn).
$$
Precomputing may be useful only if $N$ is large enough and if storing $AB$ is feasible. If $A$ and $B$ are sparse but $AB$ is dense, precomputing can destroy sparsity.
</details>
## Challenge 2: When does the power method fail?
Give two reasons the power method may fail or converge slowly.
<details>
<summary><strong>Solution</strong></summary>
First, if the initial vector $\vec x_0$ has no component in the dominant eigendirection, then the dominant eigenvector never appears in the iteration. In the diagonalizable case, this means $c_1=0$ in
$$
\vec x_0=c_1\vec b_1+\cdots+c_n\vec b_n.
$$
Second, if
$$
\left|\frac{\lambda_2}{\lambda_1}\right|
$$
is close to $1$, then convergence is slow. Other difficulties include repeated dominant eigenvalues, complex dominant eigenvalues, and non-diagonalizable matrices.
</details>
## Challenge 3: Rayleigh quotient as a weighted average
Explain why the Rayleigh quotient of a symmetric matrix cannot exceed the largest eigenvalue.
<details>
<summary><strong>Solution</strong></summary>
Write $A=QDQ^T$ with eigenvalues $\lambda_1\ge\cdots\ge\lambda_n$. If $\vec x=Q\vec c$, then
$$
R_A(\vec x)=\frac{\sum_i\lambda_i c_i^2}{\sum_i c_i^2}.
$$
This is a weighted average of the eigenvalues. A weighted average cannot be larger than the largest value being averaged. Therefore
$$
R_A(\vec x)\le\lambda_1.
$$
</details>
## Challenge 4: Deflation and projection
Suppose $A$ is symmetric and $\vec v_1$ is a unit eigenvector with eigenvalue $\lambda_1$. Explain why
$$
B=A-\lambda_1\vec v_1\vec v_1^T
$$
removes the component of $A$ acting along $\vec v_1$.
<details>
<summary><strong>Solution</strong></summary>
The matrix $\vec v_1\vec v_1^T$ is the orthogonal projection onto $\operatorname{span}\{\vec v_1\}$. Since $A\vec v_1=\lambda_1\vec v_1$, the term
$$
\lambda_1\vec v_1\vec v_1^T
$$
is exactly the rank-one part of $A$ on that eigendirection. Subtracting it gives
$$
B\vec v_1=\vec 0.
$$
All orthogonal eigenvectors are unchanged because $\vec v_1^T\vec v_i=0$ for $i\ne1$.
</details>
# Practice problems
## Problem 1
Let $A\in\mathbb R^{5000\times 5000}$ and $\vec x\in\mathbb R^{5000}$. Compare the cost of computing $A^2\vec x$ by forming $A^2$ first versus computing $A(A\vec x)$.
<details>
<summary><strong>Solution</strong></summary>
Forming $A^2$ costs about $O(n^3)$ with $n=5000$, and then multiplying by $\vec x$ costs $O(n^2)$. Computing $A(A\vec x)$ uses two matrix-vector products, costing $2O(n^2)$. The second method avoids the expensive matrix-matrix product.
</details>
## Problem 2
Let
$$
A=\begin{bmatrix}4&0\\0&1\end{bmatrix},
\qquad
\vec x_0=\begin{bmatrix}1\\1\end{bmatrix}.
$$
Compute the direction of $A^k\vec x_0$ as $k\to\infty$.
<details>
<summary><strong>Solution</strong></summary>
We have
$$
A^k\vec x_0=
\begin{bmatrix}4^k&0\\0&1\end{bmatrix}
\begin{bmatrix}1\\1\end{bmatrix}
=
\begin{bmatrix}4^k\\1\end{bmatrix}.
$$
After normalization, the second component becomes negligible compared with the first. The direction converges to
$$
\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\}.
$$
</details>
## Problem 3
For
$$
A=\begin{bmatrix}2&1\\1&2\end{bmatrix},
$$
compute the Rayleigh quotient at $\vec x=(1,0)^T$ and at $\vec y=(1,1)^T$.
<details>
<summary><strong>Solution</strong></summary>
For $\vec x=(1,0)^T$,
$$
R_A(\vec x)=\frac{\vec x^TA\vec x}{\vec x^T\vec x}=2.
$$
For $\vec y=(1,1)^T$,
$$
A\vec y=(3,3)^T,
$$
so
$$
R_A(\vec y)=\frac{\vec y^TA\vec y}{\vec y^T\vec y}
=\frac{6}{2}=3.
$$
Indeed, $\vec y$ is an eigenvector with eigenvalue $3$.
</details>
## Problem 4
Suppose a symmetric matrix has eigenvalues $10,9,1$. What is the approximate convergence ratio of the power method?
<details>
<summary><strong>Solution</strong></summary>
The convergence ratio is approximately
$$
\left|\frac{\lambda_2}{\lambda_1}\right|=\frac{9}{10}=0.9.
$$
This is close to $1$, so convergence may be slow.
</details>
## Problem 5
Let $P$ be a column-stochastic matrix. Why is a stationary distribution an eigenvector problem?
<details>
<summary><strong>Solution</strong></summary>
A stationary distribution $\vec \pi$ satisfies
$$
P\vec \pi=\vec \pi.
$$
This is exactly the eigenvalue equation with eigenvalue $1$:
$$
P\vec \pi=1\cdot \vec \pi.
$$
</details>
# AI companion activities
Use an AI assistant as a study partner, but verify all mathematical claims by computation or proof.
::: {.callout-tip}
## Activity 1: Complexity explanation
Ask an AI assistant:
> Explain why computing $(AB)x$ by forming $AB$ first is usually inefficient. Give a numerical example with dimensions.
Then check whether the stated operation counts match the formulas in this chapter.
:::
::: {.callout-tip}
## Activity 2: Power method debugging
Ask an AI assistant to write a Python implementation of the power method. Test it on:
$$
A=\begin{bmatrix}3&1\\1&2\end{bmatrix}
$$
and compare the output with `numpy.linalg.eig`.
:::
::: {.callout-tip}
## Activity 3: Rayleigh quotient interpretation
Ask an AI assistant:
> Why is the Rayleigh quotient a weighted average of eigenvalues for symmetric matrices?
Then rewrite the explanation using the spectral decomposition $A=QDQ^T$.
:::
::: {.callout-tip}
## Activity 4: Find failure cases
Ask an AI assistant to produce examples where the power method fails or converges slowly. For each example, compute the eigenvalues and explain the behavior using $|\lambda_2/\lambda_1|$.
:::