Code
import numpy as np
from numpy.linalg import eig, matrix_power, solve, norm
np.set_printoptions(precision=4, suppress=True)When repeated matrix multiplication becomes prediction
In earlier chapters, a matrix acted like a machine: it transformed one vector into another vector. In this chapter the machine is used repeatedly. Today becomes tomorrow, tomorrow becomes the next day, and a whole future is generated by powers of one matrix.
A bike-sharing system moves bicycles among stations. A population model moves individuals among age classes. A web-ranking model moves probability among pages. An economy moves production through sectors. In each case the central question is the same:
If the state evolves by \(x_{k+1}=Ax_k\), what happens after many steps?
This chapter develops the linear-algebra language for answering that question: eigenvalues, eigenvectors, stochastic matrices, Markov chains, and the Perron–Frobenius theorem.
After this chapter, you should be able to:
import numpy as np
from numpy.linalg import eig, matrix_power, solve, norm
np.set_printoptions(precision=4, suppress=True)Let \(A\in \mathbb R^{n\times n}\). A discrete linear dynamical system is a recurrence \[ x_{k+1}=Ax_k,\qquad k=0,1,2,\ldots, \] where \(x_k\in\mathbb R^n\) is the state at time \(k\). The solution is \[ x_k=A^k x_0. \]
The matrix \(A\) is not just one transformation. It is a rule for time. Powers of \(A\) encode the future.
Suppose bikes move among Downtown, Cambridge, and Suburbs. Use a column vector \(x_k\), where the entries are proportions of bikes at the three locations. Let \[ A=\begin{bmatrix} 0.6&0.3&0.4\\ 0.1&0.4&0.3\\ 0.3&0.3&0.3 \end{bmatrix}. \] The entry \(a_{ij}\) is the fraction moving from location \(j\) to location \(i\). Each column sums to \(1\), so total bikes are conserved.
A = np.array([[0.6, 0.3, 0.4],
[0.1, 0.4, 0.3],
[0.3, 0.3, 0.3]])
x0 = np.array([1.0, 0.0, 0.0])
for k in range(6):
print(k, x0 if k==0 else matrix_power(A,k) @ x0)0 [1. 0. 0.]
1 [0.6 0.1 0.3]
2 [0.51 0.19 0.3 ]
3 [0.483 0.217 0.3 ]
4 [0.4749 0.2251 0.3 ]
5 [0.4725 0.2275 0.3 ]
The distribution starts concentrated Downtown and gradually moves toward a stable distribution.
If \(v\) is an eigenvector of \(A\) with eigenvalue \(\lambda\), then \[ Av=\lambda v. \] Applying \(A\) repeatedly gives \[ A^k v=\lambda^k v. \] So eigenvectors are directions whose behavior is simple under iteration.
Suppose \(A\in\mathbb R^{n\times n}\) is diagonalizable: \[ A=PDP^{-1}, \] where \(D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)\). Then \[ A^k=PD^kP^{-1}, \] and if \(x_0=c_1v_1+\cdots+c_nv_n\), then \[ x_k=A^k x_0=c_1\lambda_1^k v_1+\cdots+c_n\lambda_n^k v_n. \]
Since \(A=PDP^{-1}\), \[ A^2=(PDP^{-1})(PDP^{-1})=PD^2P^{-1}. \] By induction, \(A^k=PD^kP^{-1}\). If \(v_i\) is the \(i\)-th eigenvector, then \(Av_i=\lambda_i v_i\), so \(A^k v_i=\lambda_i^k v_i\). By linearity, the formula for \(x_k\) follows.
B = np.array([[1.2, 0.0],
[0.0, 0.7]])
x = np.array([1.0, 1.0])
for k in range(8):
print(k, matrix_power(B,k) @ x)0 [1. 1.]
1 [1.2 0.7]
2 [1.44 0.49]
3 [1.728 0.343]
4 [2.0736 0.2401]
5 [2.4883 0.1681]
6 [2.986 0.1176]
7 [3.5832 0.0824]
The first component grows because \(1.2^k\) grows. The second component decays because \(0.7^k\to0\).
Many applied systems describe probabilities. In that setting, the state vector should remain a probability vector.
A vector \(x\in\mathbb R^n\) is a probability vector if \[ x_i\ge 0\quad\text{for all }i, \qquad \sum_{i=1}^n x_i=1. \]
A matrix \(A\in\mathbb R^{n\times n}\) is column stochastic if \[ a_{ij}\ge0\quad\text{for all }i,j, \qquad \sum_{i=1}^n a_{ij}=1\quad\text{for each column }j. \]
If \(A\) is column stochastic and \(x\) is a probability vector, then \(Ax\) is again a probability vector.
A finite Markov chain is a discrete system \[ x_{k+1}=Ax_k \] where \(A\) is stochastic and \(x_k\) is a probability distribution over finitely many states.
Two companies compete for customers. Each month, 90% of customers of Company 1 stay with Company 1, while 10% move to Company 2. Also, 20% of customers of Company 2 move to Company 1, while 80% stay.
Using column-stochastic convention, \[ A=\begin{bmatrix}0.9&0.2\\0.1&0.8\end{bmatrix}. \]
A = np.array([[0.9, 0.2],
[0.1, 0.8]])
x = np.array([0.5, 0.5])
for k in range(10):
print(k, x)
x = A @ x0 [0.5 0.5]
1 [0.55 0.45]
2 [0.585 0.415]
3 [0.6095 0.3905]
4 [0.6267 0.3734]
5 [0.6387 0.3613]
6 [0.6471 0.3529]
7 [0.6529 0.3471]
8 [0.6571 0.3429]
9 [0.6599 0.3401]
The process approaches a stationary vector.
A probability vector \(\pi\) is a stationary distribution for a column-stochastic matrix \(A\) if \[ A\pi=\pi. \] Equivalently, \(\pi\) is an eigenvector of \(A\) with eigenvalue \(1\), normalized so its entries sum to \(1\).
To find \(\pi\), solve \[ (A-I)\pi=0, \qquad \mathbf 1^T\pi=1. \]
A = np.array([[0.9, 0.2],
[0.1, 0.8]])
M = A - np.eye(2)
# Replace one equation with the normalization equation.
C = np.vstack([M[0], np.ones(2)])
d = np.array([0.0, 1.0])
pi = solve(C, d)
piarray([0.6667, 0.3333])
The stationary distribution is approximately \((2/3,1/3)^T\).
The equation \((A-I)\pi=0\) describes an eigenspace. If \(\pi\) is a solution, then any nonzero scalar multiple is also a solution. A probability vector must satisfy \(\sum_i \pi_i=1\), so the normalization equation selects the correct scale.
A stochastic matrix \(A\) is regular if some positive power \(A^m\) has all entries strictly positive.
If \(A\) is a regular stochastic matrix, then there is a unique stationary distribution \(\pi\), and for every probability vector \(x_0\), \[ A^k x_0\to \pi \quad\text{as }k\to\infty. \] Equivalently, \(A^k\) converges to a matrix whose columns are all \(\pi\).
The theorem is a special case of Perron–Frobenius theory. A regular stochastic matrix has a dominant eigenvalue \(1\). All other eigenvalues have modulus strictly less than \(1\). Therefore all non-stationary eigen-directions decay under repeated multiplication.
A common way to make a web transition matrix regular is to add damping: \[ G=\alpha P+(1-\alpha)\frac{1}{n}\mathbf 1\mathbf 1^T, \qquad 0<\alpha<1. \] This says: with probability \(\alpha\), follow a link; with probability \(1-\alpha\), jump uniformly to any page.
P = np.array([[0, 1/2, 1],
[1/2, 0, 0],
[1/2, 1/2, 0]])
alpha = 0.85
n = 3
G = alpha*P + (1-alpha)/n*np.ones((n,n))
print(G)
print("column sums:", G.sum(axis=0))[[0.05 0.475 0.9 ]
[0.475 0.05 0.05 ]
[0.475 0.475 0.05 ]]
column sums: [1. 1. 1.]
x = np.ones(n)/n
for k in range(20):
x = G @ x
xarray([0.4327, 0.2339, 0.3333])
The limiting distribution gives a ranking of the pages.
Markov chains are only one part of a larger theory. The same ideas apply to population growth, input–output economics, network centrality, and recommendation systems.
A matrix \(A\) is nonnegative, written \(A\ge0\), if every entry satisfies \(a_{ij}\ge0\). It is positive, written \(A>0\), if every entry satisfies \(a_{ij}>0\).
A nonnegative matrix \(A\) is primitive if there exists a positive integer \(m\) such that \[ A^m>0. \]
Let \(A\ge0\) be primitive. Then:
The full proof is deeper than ordinary diagonalization. The core idea is that positivity prevents cancellation. A positive matrix maps the positive cone into its interior. Repeated application forces many initial vectors to align with a unique attracting positive direction. This direction is the Perron eigenvector.
In a network, a node is important if it is connected to important nodes. This recursive idea leads to \[ Ax=\lambda x, \] where \(A\) is an adjacency matrix and \(x\) is a vector of centrality scores.
A = np.array([[0,1,1,0],
[1,0,1,1],
[1,1,0,1],
[0,1,1,0]], dtype=float)
w, V = eig(A)
idx = np.argmax(w.real)
centrality = V[:,idx].real
centrality = np.abs(centrality)
centrality = centrality/centrality.sum()
centralityarray([0.2192, 0.2808, 0.2808, 0.2192])
A nonnegative matrix can be interpreted as a weighted directed graph. There is an edge \(j\to i\) if \(a_{ij}>0\). This convention matches the column-vector update \(x_{k+1}=Ax_k\).
For a nonnegative matrix \(A\in\mathbb R^{n\times n}\), the associated directed graph has vertices \(1,\ldots,n\), and an edge \[ j\to i \] whenever \(a_{ij}>0\).
A nonnegative matrix \(A\) is irreducible if its associated directed graph is strongly connected: for any vertices \(i\) and \(j\), there is a directed path from \(i\) to \(j\).
Irreducible means the system communicates. Primitive means the system communicates without being trapped in a rigid period.
\[ A=\begin{bmatrix}0&1\\1&0\end{bmatrix}. \] This graph has two states that alternate. It is irreducible, but it is not primitive because powers keep alternating between two patterns.
A = np.array([[0,1],[1,0]])
for k in range(1,6):
print("A^", k, "=\n", matrix_power(A,k))A^ 1 =
[[0 1]
[1 0]]
A^ 2 =
[[1 0]
[0 1]]
A^ 3 =
[[0 1]
[1 0]]
A^ 4 =
[[1 0]
[0 1]]
A^ 5 =
[[0 1]
[1 0]]
A Leslie matrix models age-structured population growth. A simple two-stage model has the form \[ L=\begin{bmatrix}f_1&f_2\\s_1&0\end{bmatrix}, \] where \(f_i\) are fertility rates and \(s_1\) is the survival rate from stage 1 to stage 2.
L = np.array([[0.2, 1.4],
[0.6, 0.0]])
x = np.array([100.0, 40.0])
for k in range(8):
total = x.sum()
print(k, x, "total=", round(total,2), "normalized=", x/total)
x = L @ x0 [100. 40.] total= 140.0 normalized= [0.7143 0.2857]
1 [76. 60.] total= 136.0 normalized= [0.5588 0.4412]
2 [99.2 45.6] total= 144.8 normalized= [0.6851 0.3149]
3 [83.68 59.52] total= 143.2 normalized= [0.5844 0.4156]
4 [100.064 50.208] total= 150.27 normalized= [0.6659 0.3341]
5 [90.304 60.0384] total= 150.34 normalized= [0.6007 0.3993]
6 [102.1146 54.1824] total= 156.3 normalized= [0.6533 0.3467]
7 [96.2783 61.2687] total= 157.55 normalized= [0.6111 0.3889]
The total population may grow or decay, while the normalized age distribution may approach a stable structure.
In a simplified economy, sector outputs depend on each other. A nonnegative matrix may represent how much each sector feeds into others. Iteration can model growth or propagation of demand.
E = np.array([[0.7, 0.2],
[0.4, 0.6]])
w, V = eig(E)
idx = np.argmax(w.real)
print("dominant eigenvalue:", w[idx].real)
stable = V[:,idx].real
stable = stable/stable.sum()
print("stable sector proportions:", stable)dominant eigenvalue: 0.9372281323269014
stable sector proportions: [0.4574 0.5426]
Dominant eigenvectors and singular vectors both identify important directions, but they answer different questions.
Power iteration is the computational heart of many large-scale ranking problems.
def power_iteration(A, x0=None, steps=20):
n = A.shape[0]
x = np.ones(n)/n if x0 is None else x0.astype(float)
history = []
for k in range(steps):
x = A @ x
x = x / x.sum() if x.sum()!=0 else x
history.append(x.copy())
return np.array(history)
A = np.array([[0.9, 0.2],
[0.1, 0.8]])
history = power_iteration(A, steps=10)
historyarray([[0.55 , 0.45 ],
[0.585 , 0.415 ],
[0.6095, 0.3905],
[0.6267, 0.3734],
[0.6387, 0.3613],
[0.6471, 0.3529],
[0.6529, 0.3471],
[0.6571, 0.3429],
[0.6599, 0.3401],
[0.662 , 0.338 ]])
Let \(A\) be column stochastic. Prove that \(1\) is an eigenvalue of \(A\) or of \(A^T\). Which one is immediate from the column-sum condition?
If \(A\) is column stochastic, then each column sums to \(1\). Therefore \[ \mathbf 1^T A=\mathbf 1^T. \] This means \(\mathbf 1\) is a right eigenvector of \(A^T\) with eigenvalue \(1\), since \[ A^T\mathbf 1=\mathbf 1. \] Thus \(1\) is an eigenvalue of \(A^T\), and since \(A\) and \(A^T\) have the same characteristic polynomial, \(1\) is also an eigenvalue of \(A\).
For \[ A=\begin{bmatrix}0.9&0.2\\0.1&0.8\end{bmatrix}, \] find the stationary distribution.
Solve \(A\pi=\pi\). This gives \[ 0.9\pi_1+0.2\pi_2=\pi_1, \qquad 0.1\pi_1+0.8\pi_2=\pi_2. \] The first equation gives \(0.2\pi_2=0.1\pi_1\), so \(\pi_1=2\pi_2\). With \(\pi_1+\pi_2=1\), we get \[ \pi=\begin{bmatrix}2/3\\1/3\end{bmatrix}. \]
Give an example of an irreducible stochastic matrix that is not regular.
One example is \[ A=\begin{bmatrix}0&1\\1&0\end{bmatrix}. \] The two states communicate, so the matrix is irreducible. But \[ A^{2m}=I, \qquad A^{2m+1}=A, \] so no power has all entries positive. Therefore it is not regular.
Explain why adding a small positive teleportation term to a web transition matrix helps guarantee convergence.
The damped matrix \[ G=\alpha P+(1-\alpha)\frac{1}{n}\mathbf 1\mathbf 1^T \] has all entries positive when \(0<\alpha<1\). Therefore \(G\) is positive, hence primitive. Perron–Frobenius theory implies a unique attracting positive stationary distribution.
Let \[ A=\begin{bmatrix}0.8&0.3\\0.2&0.7\end{bmatrix}. \] Verify that \(A\) is column stochastic and find its stationary distribution.
Both columns sum to \(1\). Solve \(A\pi=\pi\): \[ 0.8\pi_1+0.3\pi_2=\pi_1. \] Thus \(0.3\pi_2=0.2\pi_1\), so \(\pi_1=1.5\pi_2\). With \(\pi_1+\pi_2=1\), we get \[ \pi=\begin{bmatrix}0.6\\0.4\end{bmatrix}. \]
For \[ B=\begin{bmatrix}1.1&0\\0&0.6\end{bmatrix}, \] compute \(B^k(1,1)^T\) and describe the long-term behavior.
\[ B^k=\begin{bmatrix}1.1^k&0\\0&0.6^k\end{bmatrix}, \] so \[ B^k\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}1.1^k\\0.6^k\end{bmatrix}. \] The first component grows and the second component decays.
Let \[ A=\begin{bmatrix}0&1\\1&0\end{bmatrix}. \] Does \(A^k x_0\) converge for every probability vector \(x_0\)?
No. If \(x_0=(1,0)^T\), then \[ Ax_0=(0,1)^T, \qquad A^2x_0=(1,0)^T. \] The sequence alternates forever and does not converge.
Use Python to approximate the stationary distribution of the damped PageRank matrix in Example 9.4.
Use repeated multiplication by \(G\) starting from the uniform vector. Since \(G\) is positive and stochastic, the iterates converge to the unique stationary distribution.
P = np.array([[0, 1/2, 1],
[1/2, 0, 0],
[1/2, 1/2, 0]])
G = 0.85*P + 0.15/3*np.ones((3,3))
x = np.ones(3)/3
for k in range(100):
x = G @ x
xarray([0.4327, 0.2339, 0.3333])
For the Leslie matrix \[ L=\begin{bmatrix}0.2&1.4\\0.6&0\end{bmatrix}, \] find the dominant eigenvalue and interpret it.
L = np.array([[0.2, 1.4], [0.6, 0.0]])
w, V = eig(L)
warray([ 1.022, -0.822])
The dominant eigenvalue gives the asymptotic growth factor per time step. If it is greater than \(1\), the population grows asymptotically; if less than \(1\), it decays.
Use an AI assistant as a study partner, but verify all computations with Python.
Then write a short reflection: Which explanation helped you most, and which computation did you need to check independently?