9  Chapter 9: Dynamical Systems, Markov Chains, and Perron–Frobenius Theory

When repeated matrix multiplication becomes prediction

Author

He Wang

9.1 The story: a matrix that tells tomorrow from today

In earlier chapters, a matrix acted like a machine: it transformed one vector into another vector. In this chapter the machine is used repeatedly. Today becomes tomorrow, tomorrow becomes the next day, and a whole future is generated by powers of one matrix.

A bike-sharing system moves bicycles among stations. A population model moves individuals among age classes. A web-ranking model moves probability among pages. An economy moves production through sectors. In each case the central question is the same:

If the state evolves by \(x_{k+1}=Ax_k\), what happens after many steps?

This chapter develops the linear-algebra language for answering that question: eigenvalues, eigenvectors, stochastic matrices, Markov chains, and the Perron–Frobenius theorem.

9.2 Learning goals

After this chapter, you should be able to:

  1. model a discrete-time system by \(x_{k+1}=Ax_k\);
  2. compute and interpret \(A^k x_0\);
  3. use eigenvectors to understand long-term behavior;
  4. recognize stochastic, regular, irreducible, and primitive matrices;
  5. find stationary distributions of Markov chains;
  6. explain the Perron–Frobenius theorem for positive and primitive matrices;
  7. connect directed graphs with nonnegative matrices;
  8. use Python to simulate Markov chains, PageRank-style models, and population models.
Code
import numpy as np
from numpy.linalg import eig, matrix_power, solve, norm
np.set_printoptions(precision=4, suppress=True)

9.3 Discrete linear dynamical systems

NoteDefinition 9.1: Discrete linear dynamical system

Let \(A\in \mathbb R^{n\times n}\). A discrete linear dynamical system is a recurrence \[ x_{k+1}=Ax_k,\qquad k=0,1,2,\ldots, \] where \(x_k\in\mathbb R^n\) is the state at time \(k\). The solution is \[ x_k=A^k x_0. \]

The matrix \(A\) is not just one transformation. It is a rule for time. Powers of \(A\) encode the future.

9.3.1 Example 9.1: A three-location bike-sharing model

Suppose bikes move among Downtown, Cambridge, and Suburbs. Use a column vector \(x_k\), where the entries are proportions of bikes at the three locations. Let \[ A=\begin{bmatrix} 0.6&0.3&0.4\\ 0.1&0.4&0.3\\ 0.3&0.3&0.3 \end{bmatrix}. \] The entry \(a_{ij}\) is the fraction moving from location \(j\) to location \(i\). Each column sums to \(1\), so total bikes are conserved.

Code
A = np.array([[0.6, 0.3, 0.4],
              [0.1, 0.4, 0.3],
              [0.3, 0.3, 0.3]])
x0 = np.array([1.0, 0.0, 0.0])
for k in range(6):
    print(k, x0 if k==0 else matrix_power(A,k) @ x0)
0 [1. 0. 0.]
1 [0.6 0.1 0.3]
2 [0.51 0.19 0.3 ]
3 [0.483 0.217 0.3  ]
4 [0.4749 0.2251 0.3   ]
5 [0.4725 0.2275 0.3   ]

The distribution starts concentrated Downtown and gradually moves toward a stable distribution.

9.4 Eigenvectors as steady directions

If \(v\) is an eigenvector of \(A\) with eigenvalue \(\lambda\), then \[ Av=\lambda v. \] Applying \(A\) repeatedly gives \[ A^k v=\lambda^k v. \] So eigenvectors are directions whose behavior is simple under iteration.

NoteTheorem 9.2: Diagonalizable dynamics

Suppose \(A\in\mathbb R^{n\times n}\) is diagonalizable: \[ A=PDP^{-1}, \] where \(D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n)\). Then \[ A^k=PD^kP^{-1}, \] and if \(x_0=c_1v_1+\cdots+c_nv_n\), then \[ x_k=A^k x_0=c_1\lambda_1^k v_1+\cdots+c_n\lambda_n^k v_n. \]

Proof

Since \(A=PDP^{-1}\), \[ A^2=(PDP^{-1})(PDP^{-1})=PD^2P^{-1}. \] By induction, \(A^k=PD^kP^{-1}\). If \(v_i\) is the \(i\)-th eigenvector, then \(Av_i=\lambda_i v_i\), so \(A^k v_i=\lambda_i^k v_i\). By linearity, the formula for \(x_k\) follows.

9.4.1 Example 9.2: Stable and unstable directions

Code
B = np.array([[1.2, 0.0],
              [0.0, 0.7]])
x = np.array([1.0, 1.0])
for k in range(8):
    print(k, matrix_power(B,k) @ x)
0 [1. 1.]
1 [1.2 0.7]
2 [1.44 0.49]
3 [1.728 0.343]
4 [2.0736 0.2401]
5 [2.4883 0.1681]
6 [2.986  0.1176]
7 [3.5832 0.0824]

The first component grows because \(1.2^k\) grows. The second component decays because \(0.7^k\to0\).

9.5 Markov chains

Many applied systems describe probabilities. In that setting, the state vector should remain a probability vector.

NoteDefinition 9.3: Probability vector

A vector \(x\in\mathbb R^n\) is a probability vector if \[ x_i\ge 0\quad\text{for all }i, \qquad \sum_{i=1}^n x_i=1. \]

NoteDefinition 9.4: Column stochastic matrix

A matrix \(A\in\mathbb R^{n\times n}\) is column stochastic if \[ a_{ij}\ge0\quad\text{for all }i,j, \qquad \sum_{i=1}^n a_{ij}=1\quad\text{for each column }j. \]

If \(A\) is column stochastic and \(x\) is a probability vector, then \(Ax\) is again a probability vector.

NoteDefinition 9.5: Markov chain

A finite Markov chain is a discrete system \[ x_{k+1}=Ax_k \] where \(A\) is stochastic and \(x_k\) is a probability distribution over finitely many states.

9.5.1 Example 9.3: Customer migration

Two companies compete for customers. Each month, 90% of customers of Company 1 stay with Company 1, while 10% move to Company 2. Also, 20% of customers of Company 2 move to Company 1, while 80% stay.

Using column-stochastic convention, \[ A=\begin{bmatrix}0.9&0.2\\0.1&0.8\end{bmatrix}. \]

Code
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
x = np.array([0.5, 0.5])
for k in range(10):
    print(k, x)
    x = A @ x
0 [0.5 0.5]
1 [0.55 0.45]
2 [0.585 0.415]
3 [0.6095 0.3905]
4 [0.6267 0.3734]
5 [0.6387 0.3613]
6 [0.6471 0.3529]
7 [0.6529 0.3471]
8 [0.6571 0.3429]
9 [0.6599 0.3401]

The process approaches a stationary vector.

9.6 Stationary distributions

NoteDefinition 9.6: Stationary distribution

A probability vector \(\pi\) is a stationary distribution for a column-stochastic matrix \(A\) if \[ A\pi=\pi. \] Equivalently, \(\pi\) is an eigenvector of \(A\) with eigenvalue \(1\), normalized so its entries sum to \(1\).

To find \(\pi\), solve \[ (A-I)\pi=0, \qquad \mathbf 1^T\pi=1. \]

Code
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
M = A - np.eye(2)
# Replace one equation with the normalization equation.
C = np.vstack([M[0], np.ones(2)])
d = np.array([0.0, 1.0])
pi = solve(C, d)
pi
array([0.6667, 0.3333])

The stationary distribution is approximately \((2/3,1/3)^T\).

Why the normalization equation is needed

The equation \((A-I)\pi=0\) describes an eigenspace. If \(\pi\) is a solution, then any nonzero scalar multiple is also a solution. A probability vector must satisfy \(\sum_i \pi_i=1\), so the normalization equation selects the correct scale.

9.7 Regular Markov chains

NoteDefinition 9.7: Regular stochastic matrix

A stochastic matrix \(A\) is regular if some positive power \(A^m\) has all entries strictly positive.

NoteTheorem 9.8: Long-term behavior of regular Markov chains

If \(A\) is a regular stochastic matrix, then there is a unique stationary distribution \(\pi\), and for every probability vector \(x_0\), \[ A^k x_0\to \pi \quad\text{as }k\to\infty. \] Equivalently, \(A^k\) converges to a matrix whose columns are all \(\pi\).

Proof idea

The theorem is a special case of Perron–Frobenius theory. A regular stochastic matrix has a dominant eigenvalue \(1\). All other eigenvalues have modulus strictly less than \(1\). Therefore all non-stationary eigen-directions decay under repeated multiplication.

9.7.1 Example 9.4: A PageRank-style model with damping

A common way to make a web transition matrix regular is to add damping: \[ G=\alpha P+(1-\alpha)\frac{1}{n}\mathbf 1\mathbf 1^T, \qquad 0<\alpha<1. \] This says: with probability \(\alpha\), follow a link; with probability \(1-\alpha\), jump uniformly to any page.

Code
P = np.array([[0,   1/2, 1],
              [1/2, 0,   0],
              [1/2, 1/2, 0]])
alpha = 0.85
n = 3
G = alpha*P + (1-alpha)/n*np.ones((n,n))
print(G)
print("column sums:", G.sum(axis=0))
[[0.05  0.475 0.9  ]
 [0.475 0.05  0.05 ]
 [0.475 0.475 0.05 ]]
column sums: [1. 1. 1.]
Code
x = np.ones(n)/n
for k in range(20):
    x = G @ x
x
array([0.4327, 0.2339, 0.3333])

The limiting distribution gives a ranking of the pages.

9.8 Nonnegative matrices and Perron–Frobenius theory

Markov chains are only one part of a larger theory. The same ideas apply to population growth, input–output economics, network centrality, and recommendation systems.

NoteDefinition 9.9: Nonnegative and positive matrices

A matrix \(A\) is nonnegative, written \(A\ge0\), if every entry satisfies \(a_{ij}\ge0\). It is positive, written \(A>0\), if every entry satisfies \(a_{ij}>0\).

NoteDefinition 9.10: Primitive matrix

A nonnegative matrix \(A\) is primitive if there exists a positive integer \(m\) such that \[ A^m>0. \]

NoteTheorem 9.11: Perron–Frobenius theorem, primitive case

Let \(A\ge0\) be primitive. Then:

  1. \(A\) has a positive real eigenvalue \(\rho(A)>0\), called the Perron eigenvalue or spectral radius.
  2. There is a positive eigenvector \(v>0\) such that \[ Av=\rho(A)v. \]
  3. The eigenvalue \(\rho(A)\) is simple and strictly dominates all other eigenvalues in modulus.
  4. For many initial vectors \(x_0\ge0\), the normalized iterates \(A^k x_0/\|A^k x_0\|_1\) converge to the normalized Perron eigenvector.
Proof idea

The full proof is deeper than ordinary diagonalization. The core idea is that positivity prevents cancellation. A positive matrix maps the positive cone into its interior. Repeated application forces many initial vectors to align with a unique attracting positive direction. This direction is the Perron eigenvector.

9.8.1 Example 9.5: Eigenvector centrality

In a network, a node is important if it is connected to important nodes. This recursive idea leads to \[ Ax=\lambda x, \] where \(A\) is an adjacency matrix and \(x\) is a vector of centrality scores.

Code
A = np.array([[0,1,1,0],
              [1,0,1,1],
              [1,1,0,1],
              [0,1,1,0]], dtype=float)
w, V = eig(A)
idx = np.argmax(w.real)
centrality = V[:,idx].real
centrality = np.abs(centrality)
centrality = centrality/centrality.sum()
centrality
array([0.2192, 0.2808, 0.2808, 0.2192])

9.9 Graphs and nonnegative matrices

A nonnegative matrix can be interpreted as a weighted directed graph. There is an edge \(j\to i\) if \(a_{ij}>0\). This convention matches the column-vector update \(x_{k+1}=Ax_k\).

NoteDefinition 9.12: Directed graph of a matrix

For a nonnegative matrix \(A\in\mathbb R^{n\times n}\), the associated directed graph has vertices \(1,\ldots,n\), and an edge \[ j\to i \] whenever \(a_{ij}>0\).

NoteDefinition 9.13: Irreducible matrix

A nonnegative matrix \(A\) is irreducible if its associated directed graph is strongly connected: for any vertices \(i\) and \(j\), there is a directed path from \(i\) to \(j\).

Irreducible means the system communicates. Primitive means the system communicates without being trapped in a rigid period.

9.9.1 Example 9.6: Irreducible but not primitive

\[ A=\begin{bmatrix}0&1\\1&0\end{bmatrix}. \] This graph has two states that alternate. It is irreducible, but it is not primitive because powers keep alternating between two patterns.

Code
A = np.array([[0,1],[1,0]])
for k in range(1,6):
    print("A^", k, "=\n", matrix_power(A,k))
A^ 1 =
 [[0 1]
 [1 0]]
A^ 2 =
 [[1 0]
 [0 1]]
A^ 3 =
 [[0 1]
 [1 0]]
A^ 4 =
 [[1 0]
 [0 1]]
A^ 5 =
 [[0 1]
 [1 0]]

9.10 Population models: Leslie matrices

A Leslie matrix models age-structured population growth. A simple two-stage model has the form \[ L=\begin{bmatrix}f_1&f_2\\s_1&0\end{bmatrix}, \] where \(f_i\) are fertility rates and \(s_1\) is the survival rate from stage 1 to stage 2.

Code
L = np.array([[0.2, 1.4],
              [0.6, 0.0]])
x = np.array([100.0, 40.0])
for k in range(8):
    total = x.sum()
    print(k, x, "total=", round(total,2), "normalized=", x/total)
    x = L @ x
0 [100.  40.] total= 140.0 normalized= [0.7143 0.2857]
1 [76. 60.] total= 136.0 normalized= [0.5588 0.4412]
2 [99.2 45.6] total= 144.8 normalized= [0.6851 0.3149]
3 [83.68 59.52] total= 143.2 normalized= [0.5844 0.4156]
4 [100.064  50.208] total= 150.27 normalized= [0.6659 0.3341]
5 [90.304  60.0384] total= 150.34 normalized= [0.6007 0.3993]
6 [102.1146  54.1824] total= 156.3 normalized= [0.6533 0.3467]
7 [96.2783 61.2687] total= 157.55 normalized= [0.6111 0.3889]

The total population may grow or decay, while the normalized age distribution may approach a stable structure.

9.11 Economic input–output model

In a simplified economy, sector outputs depend on each other. A nonnegative matrix may represent how much each sector feeds into others. Iteration can model growth or propagation of demand.

Code
E = np.array([[0.7, 0.2],
              [0.4, 0.6]])
w, V = eig(E)
idx = np.argmax(w.real)
print("dominant eigenvalue:", w[idx].real)
stable = V[:,idx].real
stable = stable/stable.sum()
print("stable sector proportions:", stable)
dominant eigenvalue: 0.9372281323269014
stable sector proportions: [0.4574 0.5426]

9.12 Connection with SVD and data analysis

Dominant eigenvectors and singular vectors both identify important directions, but they answer different questions.

  • A dominant eigenvector of \(A\) describes a direction that is preserved by the transformation \(A\).
  • A singular vector of \(X\) describes a direction of largest stretching for the data matrix \(X\).
  • PageRank and eigenvector centrality use eigenvectors of transition or adjacency matrices.
  • PCA uses eigenvectors of covariance matrices or singular vectors of centered data matrices.

9.13 Python computation: power iteration

Power iteration is the computational heart of many large-scale ranking problems.

Code
def power_iteration(A, x0=None, steps=20):
    n = A.shape[0]
    x = np.ones(n)/n if x0 is None else x0.astype(float)
    history = []
    for k in range(steps):
        x = A @ x
        x = x / x.sum() if x.sum()!=0 else x
        history.append(x.copy())
    return np.array(history)

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
history = power_iteration(A, steps=10)
history
array([[0.55  , 0.45  ],
       [0.585 , 0.415 ],
       [0.6095, 0.3905],
       [0.6267, 0.3734],
       [0.6387, 0.3613],
       [0.6471, 0.3529],
       [0.6529, 0.3471],
       [0.6571, 0.3429],
       [0.6599, 0.3401],
       [0.662 , 0.338 ]])

9.14 Challenge questions

9.14.1 Challenge 9.1

Let \(A\) be column stochastic. Prove that \(1\) is an eigenvalue of \(A\) or of \(A^T\). Which one is immediate from the column-sum condition?

Solution

If \(A\) is column stochastic, then each column sums to \(1\). Therefore \[ \mathbf 1^T A=\mathbf 1^T. \] This means \(\mathbf 1\) is a right eigenvector of \(A^T\) with eigenvalue \(1\), since \[ A^T\mathbf 1=\mathbf 1. \] Thus \(1\) is an eigenvalue of \(A^T\), and since \(A\) and \(A^T\) have the same characteristic polynomial, \(1\) is also an eigenvalue of \(A\).

9.14.2 Challenge 9.2

For \[ A=\begin{bmatrix}0.9&0.2\\0.1&0.8\end{bmatrix}, \] find the stationary distribution.

Solution

Solve \(A\pi=\pi\). This gives \[ 0.9\pi_1+0.2\pi_2=\pi_1, \qquad 0.1\pi_1+0.8\pi_2=\pi_2. \] The first equation gives \(0.2\pi_2=0.1\pi_1\), so \(\pi_1=2\pi_2\). With \(\pi_1+\pi_2=1\), we get \[ \pi=\begin{bmatrix}2/3\\1/3\end{bmatrix}. \]

9.14.3 Challenge 9.3

Give an example of an irreducible stochastic matrix that is not regular.

Solution

One example is \[ A=\begin{bmatrix}0&1\\1&0\end{bmatrix}. \] The two states communicate, so the matrix is irreducible. But \[ A^{2m}=I, \qquad A^{2m+1}=A, \] so no power has all entries positive. Therefore it is not regular.

9.14.4 Challenge 9.4

Explain why adding a small positive teleportation term to a web transition matrix helps guarantee convergence.

Solution

The damped matrix \[ G=\alpha P+(1-\alpha)\frac{1}{n}\mathbf 1\mathbf 1^T \] has all entries positive when \(0<\alpha<1\). Therefore \(G\) is positive, hence primitive. Perron–Frobenius theory implies a unique attracting positive stationary distribution.

9.15 Practice problems

9.15.1 Problem 9.1

Let \[ A=\begin{bmatrix}0.8&0.3\\0.2&0.7\end{bmatrix}. \] Verify that \(A\) is column stochastic and find its stationary distribution.

Solution

Both columns sum to \(1\). Solve \(A\pi=\pi\): \[ 0.8\pi_1+0.3\pi_2=\pi_1. \] Thus \(0.3\pi_2=0.2\pi_1\), so \(\pi_1=1.5\pi_2\). With \(\pi_1+\pi_2=1\), we get \[ \pi=\begin{bmatrix}0.6\\0.4\end{bmatrix}. \]

9.15.2 Problem 9.2

For \[ B=\begin{bmatrix}1.1&0\\0&0.6\end{bmatrix}, \] compute \(B^k(1,1)^T\) and describe the long-term behavior.

Solution

\[ B^k=\begin{bmatrix}1.1^k&0\\0&0.6^k\end{bmatrix}, \] so \[ B^k\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}1.1^k\\0.6^k\end{bmatrix}. \] The first component grows and the second component decays.

9.15.3 Problem 9.3

Let \[ A=\begin{bmatrix}0&1\\1&0\end{bmatrix}. \] Does \(A^k x_0\) converge for every probability vector \(x_0\)?

Solution

No. If \(x_0=(1,0)^T\), then \[ Ax_0=(0,1)^T, \qquad A^2x_0=(1,0)^T. \] The sequence alternates forever and does not converge.

9.15.4 Problem 9.4

Use Python to approximate the stationary distribution of the damped PageRank matrix in Example 9.4.

Solution

Use repeated multiplication by \(G\) starting from the uniform vector. Since \(G\) is positive and stochastic, the iterates converge to the unique stationary distribution.

Code
P = np.array([[0, 1/2, 1],
              [1/2, 0, 0],
              [1/2, 1/2, 0]])
G = 0.85*P + 0.15/3*np.ones((3,3))
x = np.ones(3)/3
for k in range(100):
    x = G @ x
x
array([0.4327, 0.2339, 0.3333])

9.15.5 Problem 9.5

For the Leslie matrix \[ L=\begin{bmatrix}0.2&1.4\\0.6&0\end{bmatrix}, \] find the dominant eigenvalue and interpret it.

Solution
Code
L = np.array([[0.2, 1.4], [0.6, 0.0]])
w, V = eig(L)
w
array([ 1.022, -0.822])

The dominant eigenvalue gives the asymptotic growth factor per time step. If it is greater than \(1\), the population grows asymptotically; if less than \(1\), it decays.

9.16 AI companion activities

Use an AI assistant as a study partner, but verify all computations with Python.

  1. Ask: “Explain why a stationary distribution is an eigenvector with eigenvalue 1.”
  2. Ask: “Give me a real-world example of a Markov chain with three states.”
  3. Ask: “What is the difference between irreducible and primitive for a nonnegative matrix?”
  4. Ask: “Create a PageRank example with four pages and compute the stationary distribution.”
  5. Ask: “Explain Perron–Frobenius theory using population growth as an analogy.”

Then write a short reflection: Which explanation helped you most, and which computation did you need to check independently?