Code
import numpy as np
A = np.array([[1, -2, 0],
[3, 4, 1]])
B = np.array([[ 2, 1, 5],
[-1, 0, 3]])
2*A - 3*Barray([[ -4, -7, -15],
[ 9, 8, -7]])
From matrix arithmetic to reusable computation
Guiding question.
Once a system has been written as \(A\mathbf{x}=\mathbf b\), how do matrices behave as algebraic objects, geometric machines, and computational tools?
In Chapter 1 we learned that a linear system is not only a list of equations. It is a matrix equation, a geometric intersection problem, and a computational task.
Now we study the object that carries all the information:
\[ A. \]
A matrix can be viewed in several ways:
This chapter develops the basic algebra of matrices: addition, scalar multiplication, matrix-vector products, matrix multiplication, elementary matrices, inverses, transposes, rank, and LU factorization.
The purpose is not only to learn rules. The purpose is to understand why the rules matter.
Matrix algebra has two levels.
At the entry level, we compute with entries \(a_{ij}\).
At the structural level, we ask what the matrix does to vectors, spaces, data, and algorithms.
Both levels are important. Computation without structure is mechanical. Structure without computation is incomplete.
Use Python or AI tools to check matrix products, inverses, ranks, and LU factorizations. But always ask:
Matrices first behave like vectors. We can add matrices of the same size, and we can multiply a matrix by a scalar.
Let \(A=[a_{ij}]\) and \(B=[b_{ij}]\) be \(m\times n\) matrices over a field \(\mathbb F\), and let \(c\in\mathbb F\).
The sum \(A+B\) is the \(m\times n\) matrix
\[ A+B=[a_{ij}+b_{ij}]. \]
The scalar multiple \(cA\) is the \(m\times n\) matrix
\[ cA=[ca_{ij}]. \]
These operations are defined only when the matrix sizes are compatible.
For example,
\[ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}. \]
Also,
\[ c \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} ca_{11} & ca_{12}\\ ca_{21} & ca_{22} \end{bmatrix}. \]
Let \(A,B,C\in M_{m\times n}(\mathbb F)\), and let \(r,s\in\mathbb F\). Then:
Therefore \(M_{m\times n}(\mathbb F)\), the set of all \(m\times n\) matrices over \(\mathbb F\), is a vector space.
Each identity follows from the corresponding identity in the field \(\mathbb F\), applied entry by entry. For example, the \((i,j)\)-entry of \(r(A+B)\) is
\[ r(a_{ij}+b_{ij})=ra_{ij}+rb_{ij}, \]
which is the \((i,j)\)-entry of \(rA+rB\).
Let
\[ A= \begin{bmatrix} 1 & -2 & 0\\ 3 & 4 & 1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & 1 & 5\\ -1 & 0 & 3 \end{bmatrix}. \]
Compute \(2A-3B\).
We have
\[ 2A= \begin{bmatrix} 2 & -4 & 0\\ 6 & 8 & 2 \end{bmatrix}, \qquad 3B= \begin{bmatrix} 6 & 3 & 15\\ -3 & 0 & 9 \end{bmatrix}. \]
Therefore
\[ 2A-3B= \begin{bmatrix} -4 & -7 & -15\\ 9 & 8 & -7 \end{bmatrix}. \]
import numpy as np
A = np.array([[1, -2, 0],
[3, 4, 1]])
B = np.array([[ 2, 1, 5],
[-1, 0, 3]])
2*A - 3*Barray([[ -4, -7, -15],
[ 9, 8, -7]])
Suppose \(A_1,A_2,A_3\in M_{m\times n}(\mathbb R)\) represent three model outputs, and let
\[ W=0.2A_1+0.3A_2+0.5A_3. \]
Because matrix addition and scalar multiplication are entrywise,
\[ w_{ij}=0.2(a_1)_{ij}+0.3(a_2)_{ij}+0.5(a_3)_{ij}. \]
Thus \(W\) is an entrywise weighted average of the three model outputs. This is the matrix form of ensemble averaging.
The product \(A\mathbf x\) is the bridge between matrix algebra and linear systems. It has two equally important interpretations:
For vectors
\[ \mathbf u= \begin{bmatrix} u_1\\u_2\\ \vdots\\u_n \end{bmatrix}, \qquad \mathbf v= \begin{bmatrix} v_1\\v_2\\ \vdots\\v_n \end{bmatrix} \in \mathbb F^n, \]
the dot product is
\[ \mathbf u\cdot \mathbf v = u_1v_1+u_2v_2+\cdots+u_nv_n. \]
For real vectors this is the standard Euclidean dot product. For complex vectors, the standard Hermitian inner product uses conjugation and will be discussed later.
Let
\[ \mathbf w=(0.2,0.5,0.3)^T, \qquad \mathbf x=(80,90,70)^T. \]
Then
\[ \mathbf w\cdot \mathbf x = 0.2(80)+0.5(90)+0.3(70) = 82. \]
The dot product gives a weighted score.
w = np.array([0.2, 0.5, 0.3])
x = np.array([80, 90, 70])
w @ x82.0
Let \(A\in M_{m\times n}(\mathbb F)\) have columns
\[ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n], \]
and let
\[ \mathbf x= \begin{bmatrix} x_1\\x_2\\ \vdots\\x_n \end{bmatrix} \in\mathbb F^n. \]
The product \(A\mathbf x\in\mathbb F^m\) is defined by
\[ A\mathbf x = x_1\mathbf a_1+x_2\mathbf a_2+\cdots+x_n\mathbf a_n. \]
Equivalently, if \(R_i\) is the \(i\)-th row of \(A\), then
\[ A\mathbf x= \begin{bmatrix} R_1\cdot \mathbf x\\ R_2\cdot \mathbf x\\ \vdots\\ R_m\cdot \mathbf x \end{bmatrix}. \]
A vector \(\mathbf b\in\mathbb F^m\) is a linear combination of vectors \(\mathbf v_1,\ldots,\mathbf v_n\in\mathbb F^m\) if there exist scalars \(x_1,\ldots,x_n\in\mathbb F\) such that
\[ \mathbf b=x_1\mathbf v_1+\cdots+x_n\mathbf v_n. \]
Let
\[ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n]\in M_{m\times n}(\mathbb F), \qquad \mathbf b\in\mathbb F^m. \]
The following have the same solution set:
Let \(A\in M_{m\times n}(\mathbb F)\), let \(\mathbf u,\mathbf v\in\mathbb F^n\), and let \(c\in\mathbb F\). Then
\[ A(\mathbf u+\mathbf v)=A\mathbf u+A\mathbf v, \qquad A(c\mathbf u)=cA\mathbf u. \]
Thus the map
\[ T_A:\mathbb F^n\to\mathbb F^m,\qquad T_A(\mathbf x)=A\mathbf x, \]
is linear.
Let
\[ A= \begin{bmatrix} 1 & 2 & -1\\ 0 & 3 & 4\\ 2 & -1 & 1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\3 \end{bmatrix}. \]
Using columns,
\[ A\mathbf x = 2 \begin{bmatrix} 1\\0\\2 \end{bmatrix} - \begin{bmatrix} 2\\3\\-1 \end{bmatrix} + 3 \begin{bmatrix} -1\\4\\1 \end{bmatrix} = \begin{bmatrix} -3\\9\\8 \end{bmatrix}. \]
A = np.array([[1, 2, -1],
[0, 3, 4],
[2,-1, 1]])
x = np.array([2, -1, 3])
A @ xarray([-3, 9, 8])
Suppose a system state \(\mathbf x_t\in\mathbb R^3\) evolves by
\[ \mathbf x_{t+1}=A\mathbf x_t, \]
where
\[ A= \begin{bmatrix} 0.8&0.1&0\\ 0.2&0.7&0.1\\ 0&0.2&0.9 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 100\\50\\20 \end{bmatrix}. \]
Then
\[ \mathbf x_1=A\mathbf x_0= \begin{bmatrix} 85\\57\\28 \end{bmatrix}. \]
A = np.array([[0.8, 0.1, 0.0],
[0.2, 0.7, 0.1],
[0.0, 0.2, 0.9]])
x0 = np.array([100, 50, 20])
x1 = A @ x0
x1array([85., 57., 28.])
A matrix-vector product is not just a formula. It is one step of a process.
Matrix multiplication is composition of matrix machines.
If \(B\) sends an input vector to an intermediate vector, and \(A\) acts next, then the combined transformation is \(AB\).
\[ \mathbf x \overset{B}{\longmapsto} B\mathbf x \overset{A}{\longmapsto} A(B\mathbf x). \]
The matrix for the combined machine is \(AB\).
Let \(A\in M_{m\times n}(\mathbb F)\), and let \(B\in M_{n\times p}(\mathbb F)\) have columns
\[ B=[\mathbf b_1\ \mathbf b_2\ \cdots\ \mathbf b_p]. \]
The product \(AB\) is the \(m\times p\) matrix
\[ AB=[A\mathbf b_1\ A\mathbf b_2\ \cdots\ A\mathbf b_p]. \]
If the number of columns of \(A\) does not equal the number of rows of \(B\), then \(AB\) is not defined.
If \(A=[a_{ij}]\) is \(m\times n\) and \(B=[b_{ij}]\) is \(n\times p\), then the \((i,j)\)-entry of \(AB\) is
\[ (AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}. \]
This is the dot product of the \(i\)-th row of \(A\) and the \(j\)-th column of \(B\).
Let
\[ A= \begin{bmatrix} -3 & 5\\ 4 & 2\\ 1 & -5 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & -4\\ -4 & 1 \end{bmatrix}. \]
Then
\[ AB= \begin{bmatrix} -26 & 17\\ 0 & -14\\ 22 & -9 \end{bmatrix}. \]
A = np.array([[-3, 5],
[ 4, 2],
[ 1, -5]])
B = np.array([[ 2, -4],
[-4, 1]])
A @ Barray([[-26, 17],
[ 0, -14],
[ 22, -9]])
Let \(A,B,C\) be matrices over \(\mathbb F\), with sizes such that the indicated operations are defined, and let \(r\in\mathbb F\). Then:
The following familiar properties of real numbers do not generally hold for matrices:
Let
\[ A= \begin{bmatrix} 1&2\\ 3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. \]
Then
\[ AB= \begin{bmatrix} 1&3\\ 3&7 \end{bmatrix}, \qquad BA= \begin{bmatrix} 4&6\\ 3&4 \end{bmatrix}. \]
Thus \(AB\ne BA\).
A = np.array([[1,2],
[3,4]])
B = np.array([[1,1],
[0,1]])
A @ B, B @ A(array([[1, 3],
[3, 7]]),
array([[4, 6],
[3, 4]]))
If matrices are machines, then order should matter.
Rotating an object and then projecting it is usually not the same as projecting it and then rotating it. In data science, normalizing data and then applying a model may produce a different result from applying a model and then normalizing.
Find all \(2\times 2\) real matrices \(X\) such that
\[ XA=AX, \qquad A= \begin{bmatrix} 2&3\\ -3&2 \end{bmatrix}. \]
Write
\[ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix}. \]
Then
\[ XA= \begin{bmatrix} 2a-3b & 3a+2b\\ 2c-3d & 3c+2d \end{bmatrix}, \]
and
\[ AX= \begin{bmatrix} 2a+3c & 2b+3d\\ -3a+2c & -3b+2d \end{bmatrix}. \]
Equating entries gives
\[ c=-b,\qquad d=a. \]
Therefore
\[ X= \begin{bmatrix} a&b\\ -b&a \end{bmatrix} = aI+ b \begin{bmatrix} 0&1\\ -1&0 \end{bmatrix}. \]
These are exactly the real matrices representing multiplication by a complex number \(a-bi\).
import sympy as sp
a,b,c,d = sp.symbols("a b c d")
X = sp.Matrix([[a,b],[c,d]])
A = sp.Matrix([[2,3],[-3,2]])
eq_matrix = X*A - A*X
sp.solve(list(eq_matrix), [a,b,c,d], dict=True)[{a: d, b: -c}]
Powers of a square matrix describe repeated application of the same transformation.
If \(A\in M_{n\times n}(\mathbb F)\) and \(k\ge 1\), then
\[ A^k=\underbrace{A\cdot A\cdots A}_{k\text{ factors}}. \]
We also define
\[ A^0=I_n. \]
Idempotent matrix.
\[ P= \begin{bmatrix} 1/2&1/2\\ 1/2&1/2 \end{bmatrix} \]
satisfies \(P^2=P\), so \(P^k=P\) for all \(k\ge 1\).
Shear matrix.
\[ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} \]
satisfies
\[ N^k= \begin{bmatrix} 1&k\\ 0&1 \end{bmatrix}. \]
Nilpotent matrix.
\[ T= \begin{bmatrix} 0&1&2\\ 0&0&3\\ 0&0&0 \end{bmatrix} \]
satisfies \(T^3=0\).
Diagonal matrix.
\[ D= \begin{bmatrix} 1&0&0\\ 0&2&0\\ 0&0&3 \end{bmatrix} \]
satisfies
\[ D^k= \begin{bmatrix} 1&0&0\\ 0&2^k&0\\ 0&0&3^k \end{bmatrix}. \]
P = np.array([[0.5,0.5],
[0.5,0.5]])
N = np.array([[1,1],
[0,1]])
T = np.array([[0,1,2],
[0,0,3],
[0,0,0]])
D = np.diag([1,2,3])
print("P^2 =\n", P @ P)
print("N^5 =\n", np.linalg.matrix_power(N, 5))
print("T^3 =\n", np.linalg.matrix_power(T, 3))
print("D^4 =\n", np.linalg.matrix_power(D, 4))P^2 =
[[0.5 0.5]
[0.5 0.5]]
N^5 =
[[1 5]
[0 1]]
T^3 =
[[0 0 0]
[0 0 0]
[0 0 0]]
D^4 =
[[ 1 0 0]
[ 0 16 0]
[ 0 0 81]]
Let
\[ P= \begin{bmatrix} 0.9&0.2\\ 0.1&0.8 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 0.6\\0.4 \end{bmatrix}. \]
Then the state after two steps is
\[ \mathbf x_2=P^2\mathbf x_0. \]
First,
\[ P^2= \begin{bmatrix} 0.83&0.34\\ 0.17&0.66 \end{bmatrix}. \]
Therefore
\[ \mathbf x_2= \begin{bmatrix} 0.634\\0.366 \end{bmatrix}. \]
P = np.array([[0.9,0.2],
[0.1,0.8]])
x0 = np.array([0.6,0.4])
x2 = np.linalg.matrix_power(P, 2) @ x0
x2array([0.634, 0.366])
Gaussian elimination can be written as matrix multiplication. This is a major conceptual step: row operations are themselves matrices.
An elementary matrix is a matrix obtained from the identity matrix by performing exactly one elementary row operation.
The three types are:
For \(3\times 3\) matrices,
\[ E_{12}= \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \qquad E_2(k)= \begin{bmatrix} 1&0&0\\ 0&k&0\\ 0&0&1 \end{bmatrix}, \qquad E_{21}(k)= \begin{bmatrix} 1&0&0\\ k&1&0\\ 0&0&1 \end{bmatrix}. \]
Multiplying a matrix \(A\) on the left by an elementary matrix performs the corresponding elementary row operation on \(A\).
Let
\[ A= \begin{bmatrix} 1&2\\ 3&4\\ 5&6 \end{bmatrix}, \qquad E=E_{31}(-5)= \begin{bmatrix} 1&0&0\\ 0&1&0\\ -5&0&1 \end{bmatrix}. \]
Then
\[ EA= \begin{bmatrix} 1&2\\ 3&4\\ 0&-4 \end{bmatrix}, \]
which is the result of the row operation
\[ R_3\leftarrow R_3-5R_1. \]
A = np.array([[1,2],
[3,4],
[5,6]])
E = np.array([[ 1,0,0],
[ 0,1,0],
[-5,0,1]])
E @ Aarray([[ 1, 2],
[ 3, 4],
[ 0, -4]])
Block multiplication lets us multiply large matrices by treating compatible submatrices as entries.
Suppose
\[ A= \begin{bmatrix} A_1&A_2\\ A_3&A_4 \end{bmatrix}, \qquad B= \begin{bmatrix} B_1&B_2\\ B_3&B_4 \end{bmatrix}, \]
and all products below are defined. Then
\[ AB= \begin{bmatrix} A_1B_1+A_2B_3 & A_1B_2+A_2B_4\\ A_3B_1+A_4B_3 & A_3B_2+A_4B_4 \end{bmatrix}. \]
Let
\[ A= \begin{bmatrix} I&X\\ 0&I \end{bmatrix}, \qquad B= \begin{bmatrix} Y&0\\ Z&I \end{bmatrix}. \]
Then
\[ AB= \begin{bmatrix} Y+XZ&X\\ Z&I \end{bmatrix}. \]
This calculation appears in optimization, statistics, numerical linear algebra, and systems theory when variables are separated into blocks.
An inverse matrix reverses a matrix machine.
If
\[ A\mathbf x=\mathbf y, \]
then \(A^{-1}\) recovers the input:
\[ \mathbf x=A^{-1}\mathbf y. \]
Invertibility means no information is lost.
An \(n\times n\) matrix \(A\) is invertible if there exists an \(n\times n\) matrix \(B\) such that
\[ AB=BA=I_n. \]
In this case, \(B\) is called the inverse of \(A\) and is denoted by \(A^{-1}\).
If a matrix \(A\) is invertible, then its inverse is unique.
Proof. If \(B\) and \(C\) are both inverses of \(A\), then
\[ B=BI=B(AC)=(BA)C=IC=C. \]
Thus \(B=C\). \(\square\)
Let \(A\) and \(B\) be \(n\times n\) matrices.
Suppose \(A\) and \(B\) are invertible \(n\times n\) matrices. Find the inverse of
\[ M=3A^2B^{-1}. \]
Using \((XYZ)^{-1}=Z^{-1}Y^{-1}X^{-1}\),
\[ M^{-1} = (B^{-1})^{-1}(A^2)^{-1}(3I)^{-1} = B(A^{-1})^2\frac{1}{3}I = \frac13 B(A^{-1})^2. \]
Even if \(A\) and \(B\) are invertible, \(A+B\) need not be invertible.
Take
\[ A=I_2,\qquad B=-I_2. \]
Then both \(A\) and \(B\) are invertible, but
\[ A+B=0, \]
which is not invertible.
Let \(A\in M_{n\times n}(\mathbb F)\). The following statements are equivalent:
An invertible matrix is a reversible machine. No two different inputs lead to the same output, and every possible output is reached by exactly one input.
Consider
\[ A= \begin{bmatrix} 1&2&3\\ 2&4&6\\ 0&1&1 \end{bmatrix}. \]
The second row is twice the first row. Therefore the rows are linearly dependent, so
\[ \operatorname{rank}(A)<3. \]
By the inverse matrix theorem, \(A\) is not invertible.
A = sp.Matrix([[1,2,3],
[2,4,6],
[0,1,1]])
A.rank()2
Suppose \(A,B,C\in M_{n\times n}(\mathbb F)\) and
\[ ABC=I_n. \]
Then \(A(BC)=I_n\), so \(A\) has a right inverse. By the inverse matrix theorem, \(A\) is invertible.
Also \((AB)C=I_n\), so \(C\) has a left inverse. Hence \(C\) is invertible.
Multiplying \(ABC=I_n\) on the left by \(A^{-1}\) gives
\[ BC=A^{-1}. \]
Thus
\[ B=A^{-1}C^{-1} \]
is invertible. Therefore all three matrices are invertible, with
\[ A^{-1}=BC,\qquad C^{-1}=AB,\qquad B^{-1}=CA. \]
Let \(A\in M_{n\times n}(\mathbb F)\). Form the augmented matrix
\[ [A\mid I_n]. \]
Use elementary row operations to compute its reduced row echelon form.
Let
\[ A= \begin{bmatrix} 1&1&1\\ 2&3&2\\ 3&8&2 \end{bmatrix}. \]
Row reducing \([A\mid I_3]\) gives
\[ \begin{bmatrix} 1&0&0&10&-6&1\\ 0&1&0&-2&1&0\\ 0&0&1&-7&5&-1 \end{bmatrix}. \]
Hence
\[ A^{-1}= \begin{bmatrix} 10&-6&1\\ -2&1&0\\ -7&5&-1 \end{bmatrix}. \]
A = sp.Matrix([[1,1,1],
[2,3,2],
[3,8,2]])
A.inv()\(\displaystyle \left[\begin{matrix}10 & -6 & 1\\-2 & 1 & 0\\-7 & 5 & -1\end{matrix}\right]\)
The matrix
\[ A= \begin{bmatrix} a&b\\ c&d \end{bmatrix} \]
is invertible if and only if
\[ ad-bc\ne 0. \]
In that case,
\[ A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d&-b\\ -c&a \end{bmatrix}. \]
The scalar \(ad-bc\) is the determinant of \(A\).
Let
\[ A= \begin{bmatrix} 4&7\\ 2&6 \end{bmatrix}. \]
Then
\[ \det(A)=4(6)-7(2)=10\ne 0. \]
So
\[ A^{-1} = \frac{1}{10} \begin{bmatrix} 6&-7\\ -2&4 \end{bmatrix}. \]
The transpose exchanges rows and columns. It also reveals a deep identity about dot products.
Given \(A=[a_{ij}]\in M_{m\times n}(\mathbb F)\), the transpose \(A^T\) is the \(n\times m\) matrix whose \((i,j)\)-entry is \(a_{ji}\).
Let \(A\) and \(B\) be matrices for which the indicated operations are defined, and let \(r\in\mathbb F\). Then:
Let
\[ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&1\\ 0&-1\\ 4&2 \end{bmatrix}. \]
Instead of computing \((AB)^T\) directly, use
\[ (AB)^T=B^TA^T. \]
A = np.array([[ 1,0,2],
[-1,3,1]])
B = np.array([[2, 1],
[0,-1],
[4, 2]])
np.allclose((A @ B).T, B.T @ A.T)True
For real matrices, the transpose satisfies
\[ (A\mathbf x)\cdot \mathbf y = \mathbf x\cdot (A^T\mathbf y). \]
This identity is central in least squares, optimization, sensitivity analysis, and machine learning. It explains why transposes appear when computing gradients.
Rank measures the number of independent directions captured by a matrix.
If \(AB\) is defined, then
\[ \operatorname{rank}(AB)\le \operatorname{rank}(A). \]
For every matrix \(A\),
\[ \operatorname{rank}(A)=\operatorname{rank}(A^T). \]
If \(AB\) is defined, then
\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\}. \]
The columns of \(AB\) are the vectors \(A\mathbf b_j\), where \(\mathbf b_j\) are the columns of \(B\). Therefore all columns of \(AB\) lie in the column space of \(A\). Hence
\[ \operatorname{rank}(AB)\le \operatorname{rank}(A). \]
Using \((AB)^T=B^TA^T\) and \(\operatorname{rank}(M)=\operatorname{rank}(M^T)\), one also obtains
\[ \operatorname{rank}(AB)\le \operatorname{rank}(B). \]
Let \(A\) be a \(100\times 5\) matrix and \(B\) be a \(5\times 100\) matrix. Then
\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 5. \]
Thus the \(100\times 100\) matrix \(AB\) has rank at most \(5\), so it is not invertible.
A large matrix produced through a low-dimensional bottleneck is low rank. This is one reason low-rank factorizations appear in compression, recommendation systems, and neural network model reduction.
An \(n\times n\) matrix \(A\) is symmetric if
\[ A^T=A. \]
Equivalently, if \(A=[a_{ij}]\), then
\[ a_{ij}=a_{ji} \]
for all \(i,j\).
For any real matrix \(X\), the matrix \(X^TX\) is symmetric because
\[ (X^TX)^T = X^T(X^T)^T = X^TX. \]
This observation is fundamental in least squares, covariance matrices, principal component analysis, and quadratic optimization.
X = np.array([[1,2],
[3,4],
[5,6]])
G = X.T @ X
Garray([[35, 44],
[44, 56]])
np.allclose(G, G.T)True
Gaussian elimination can be saved and reused. LU factorization records elimination as a product
\[ A=LU. \]
Here \(L\) records the row-operation multipliers, and \(U\) is the echelon-form matrix.
An \(m\times m\) matrix \(L=[\ell_{ij}]\) is lower triangular if
\[ \ell_{ij}=0 \qquad \text{whenever } j>i. \]
It is unit lower triangular if, in addition,
\[ \ell_{ii}=1 \qquad \text{for every } i. \]
Products and inverses of unit lower triangular matrices are unit lower triangular. Elementary matrices of the form \(E_{ij}(d)\) with \(i>j\) are unit lower triangular.
Let \(A\in M_{m\times n}(\mathbb F)\). An LU factorization of \(A\) is a factorization
\[ A=LU, \]
where \(L\) is a unit lower triangular \(m\times m\) matrix and \(U\) is an \(m\times n\) matrix in echelon form.
Suppose \(A\) can be transformed into echelon form using only row-replacement operations and no row swaps. Then an LU factorization may be found as follows:
Then
\[ A=LU, \]
where \(L\) is unit lower triangular.
Let
\[ A= \begin{bmatrix} 2&1&1\\ 4&-6&0\\ -2&7&2 \end{bmatrix}. \]
Eliminate entries below the first pivot:
\[ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3+R_1. \]
Then eliminate below the second pivot:
\[ R_3\leftarrow R_3+R_2. \]
This gives
\[ U= \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix}. \]
The multipliers are
\[ m_{21}=2,\qquad m_{31}=-1,\qquad m_{32}=-1. \]
Thus
\[ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix}. \]
Indeed,
\[ A=LU. \]
L = np.array([[ 1, 0, 0],
[ 2, 1, 0],
[-1,-1, 1]])
U = np.array([[2, 1, 1],
[0,-8,-2],
[0, 0, 1]])
L @ Uarray([[ 2, 1, 1],
[ 4, -6, 0],
[-2, 7, 2]])
If \(A=LU\), then
\[ A\mathbf x=\mathbf b \quad\Longleftrightarrow\quad L(U\mathbf x)=\mathbf b. \]
Set
\[ \mathbf y=U\mathbf x. \]
Then solve two triangular systems:
\[ L\mathbf y=\mathbf b \]
by forward substitution, and then
\[ U\mathbf x=\mathbf y \]
by back substitution.
Use the LU factorization from the previous example to solve
\[ A\mathbf x= \begin{bmatrix} 1\\2\\3 \end{bmatrix}. \]
First solve
\[ \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix} \begin{bmatrix} y_1\\y_2\\y_3 \end{bmatrix} = \begin{bmatrix} 1\\2\\3 \end{bmatrix}. \]
This gives
\[ y_1=1,\qquad 2y_1+y_2=2,\qquad -y_1-y_2+y_3=3. \]
So
\[ \mathbf y= \begin{bmatrix} 1\\0\\4 \end{bmatrix}. \]
Next solve
\[ \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix} \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} 1\\0\\4 \end{bmatrix}. \]
Back substitution gives
\[ x_3=4,\qquad -8x_2-8=0,\qquad 2x_1-1+4=1. \]
Therefore
\[ x_2=-1,\qquad x_1=-1. \]
Thus
\[ \mathbf x= \begin{bmatrix} -1\\-1\\4 \end{bmatrix}. \]
b = np.array([1,2,3])
# Direct check:
A = L @ U
np.linalg.solve(A, b)array([-1., -1., 4.])
Not every matrix has an LU factorization without row swaps. For example,
\[ \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix} \]
requires a row interchange before elimination.
In practice, one often uses
\[ PA=LU, \]
where \(P\) is a permutation matrix. Pivoting is not merely a technical detail; it is essential for numerical reliability.
This section shows how to use Python to check the main computations of the chapter.
A = np.array([[1,2],
[3,4]])
B = np.array([[1,1],
[0,1]])
print("AB =")
print(A @ B)
print("BA =")
print(B @ A)
print("A^3 =")
print(np.linalg.matrix_power(A, 3))AB =
[[1 3]
[3 7]]
BA =
[[4 6]
[3 4]]
A^3 =
[[ 37 54]
[ 81 118]]
A = np.array([[1,2,3],
[2,4,6],
[0,1,1]], dtype=float)
print("rank(A) =", np.linalg.matrix_rank(A))rank(A) = 2
A = np.array([[4,7],
[2,6]], dtype=float)
print("det(A) =", np.linalg.det(A))
print("A inverse =")
print(np.linalg.inv(A))det(A) = 10.000000000000002
A inverse =
[[ 0.6 -0.7]
[-0.2 0.4]]
A = sp.Matrix([[1,1,1],
[2,3,2],
[3,8,2]])
A_inv = A.inv()
A_inv\(\displaystyle \left[\begin{matrix}10 & -6 & 1\\-2 & 1 & 0\\-7 & 5 & -1\end{matrix}\right]\)
A * A_inv\(\displaystyle \left[\begin{matrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{matrix}\right]\)
A = sp.Matrix([[2,1,1],
[4,-6,0],
[-2,7,2]])
L, U, perm = A.LUdecomposition()
L, U, perm(Matrix([
[ 1, 0, 0],
[ 2, 1, 0],
[-1, -1, 1]]),
Matrix([
[2, 1, 1],
[0, -8, -2],
[0, 0, 1]]),
[])
NumPy is excellent for fast numerical computation. SymPy is useful for exact symbolic computation. In applied linear algebra, both are useful, but they answer slightly different questions.
These questions are designed for graduate-level discussion. They are not mainly long computations. They ask you to connect matrix algebra with structure, algorithms, modeling, and numerical reasoning.
Matrix multiplication is associative but usually not commutative. Explain why this is natural if matrices are viewed as linear transformations. Give one practical situation where the order of two operations should matter.
A matrix maps input vectors to output vectors. Interpret invertibility as “no information is lost.” How does this interpretation relate to pivots, rank, null space, and solving \(A\mathbf x=\mathbf b\)?
In data science, the columns of a matrix may represent features or signals. Explain why rank can be interpreted as the number of independent directions contained in the data. What does it mean if a large data matrix has low rank?
For a data matrix \(A\), the matrix \(A^TA\) records pairwise dot products among the columns of \(A\). What geometric information is contained in \(A^TA\)? What information about \(A\) is lost when replacing \(A\) by \(A^TA\)?
The transpose \(A^T\) can be interpreted as the matrix that moves a dot product from one side to the other:
\[ (A\mathbf x)\cdot\mathbf y=\mathbf x\cdot(A^T\mathbf y). \]
Explain why this identity is important in least squares, optimization, and sensitivity analysis.
LU factorization separates Gaussian elimination into two triangular solves. Why is this valuable when the same matrix \(A\) appears in many systems
\[ A\mathbf x_1=\mathbf b_1,\quad A\mathbf x_2=\mathbf b_2,\quad \ldots \]
with different right-hand sides?
The identity \(A=LU\) may fail without row exchanges, and even when it exists it may be numerically unstable. Explain why row pivoting is not just a technical detail but an important part of reliable computation.
Large matrices are often divided into blocks. Explain how block multiplication helps organize problems in networks, multivariable systems, machine learning, or scientific computing. What is gained by thinking at the block level instead of only at the entry level?
Two formulas can be mathematically identical but computationally very different. For example,
\[ (AB)\mathbf x \qquad\text{and}\qquad A(B\mathbf x) \]
give the same result, but may require very different work. Explain why computational cost should influence how we write and implement matrix expressions.
Let
\[ A= \begin{bmatrix} 1&2&0\\ -1&3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&-1&5\\ 0&1&-2 \end{bmatrix}. \]
Compute \(3A-2B\).
Let
\[ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\4 \end{bmatrix}. \]
Compute \(A\mathbf x\) using both the row interpretation and the column interpretation.
Let
\[ A= \begin{bmatrix} 1&2\\ 0&1\\ 3&-1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&0&1\\ -1&4&3 \end{bmatrix}. \]
Compute \(AB\), and determine whether \(BA\) is defined.
Find all matrices
\[ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix} \]
such that
\[ X \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} = \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} X. \]
Let
\[ A= \begin{bmatrix} 1&2&3\\ 0&1&4\\ 0&0&2 \end{bmatrix}. \]
Use the inverse matrix theorem to decide whether \(A\) is invertible.
Compute the inverse of
\[ A= \begin{bmatrix} 3&5\\ 1&2 \end{bmatrix}. \]
Let \(A\) be a \(50\times 8\) matrix and \(B\) be an \(8\times 50\) matrix. Use rank inequalities to explain why \(AB\) cannot be invertible.
Let
\[ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. \]
Find an LU factorization of \(A\) without row swaps.
\[ 3A= \begin{bmatrix} 3&6&0\\ -3&9&12 \end{bmatrix}, \qquad 2B= \begin{bmatrix} 4&-2&10\\ 0&2&-4 \end{bmatrix}. \]
Therefore
\[ 3A-2B= \begin{bmatrix} -1&8&-10\\ -3&7&16 \end{bmatrix}. \]
Using rows,
\[ A\mathbf x= \begin{bmatrix} 1(2)+0(-1)+2(4)\\ -1(2)+3(-1)+1(4) \end{bmatrix} = \begin{bmatrix} 10\\ -1 \end{bmatrix}. \]
Using columns,
\[ A\mathbf x= 2 \begin{bmatrix} 1\\-1 \end{bmatrix} - \begin{bmatrix} 0\\3 \end{bmatrix} + 4 \begin{bmatrix} 2\\1 \end{bmatrix} = \begin{bmatrix} 10\\-1 \end{bmatrix}. \]
\[ AB= \begin{bmatrix} 0&8&7\\ -1&4&3\\ 7&-4&0 \end{bmatrix}. \]
Since \(B\) is \(2\times 3\) and \(A\) is \(3\times 2\), the product \(BA\) is also defined and has size \(2\times 2\).
Let
\[ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. \]
Then
\[ XN= \begin{bmatrix} a&a+b\\ c&c+d \end{bmatrix}, \qquad NX= \begin{bmatrix} a+c&b+d\\ c&d \end{bmatrix}. \]
Equating entries gives
\[ c=0,\qquad a=d. \]
Thus
\[ X= \begin{bmatrix} a&b\\ 0&a \end{bmatrix}. \]
The matrix \(A\) is upper triangular with diagonal entries \(1,1,2\). Since all diagonal entries are nonzero, it has a pivot in every column. Thus
\[ \operatorname{rank}(A)=3. \]
By the inverse matrix theorem, \(A\) is invertible.
\[ \det(A)=3(2)-5(1)=1. \]
Therefore
\[ A^{-1} = \begin{bmatrix} 2&-5\\ -1&3 \end{bmatrix}. \]
By the rank inequality,
\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 8. \]
But \(AB\) is \(50\times 50\). An invertible \(50\times 50\) matrix must have rank \(50\). Since
\[ \operatorname{rank}(AB)\le 8<50, \]
the matrix \(AB\) cannot be invertible.
Use elimination:
\[ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. \]
First,
\[ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3-R_1, \]
which gives
\[ \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&-2&7 \end{bmatrix}. \]
Then
\[ R_3\leftarrow R_3+2R_2, \]
which gives
\[ U= \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. \]
The multipliers are
\[ m_{21}=2,\qquad m_{31}=1,\qquad m_{32}=-2. \]
Hence
\[ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix}. \]
Therefore
\[ A=LU = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix} \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. \]
Ask an AI tool:
Compute \(AB\) for \[ A=\begin{bmatrix}1&2\\3&4\end{bmatrix}, \qquad B=\begin{bmatrix}1&1\\0&1\end{bmatrix}. \] Explain why \(AB\ne BA\).
Then verify the answer by hand and by Python.
Ask an AI tool:
Explain the inverse matrix theorem in simple language.
Then critique the explanation. Does it mention rank? pivots? the null space? uniqueness of solutions? row reduction?
Ask an AI tool:
Why is \(A(B\mathbf x)\) often faster than \((AB)\mathbf x\)?
Then test a large example in Python.
# Optional timing experiment
import time
m, n, p = 800, 600, 400
A = np.random.randn(m, n)
B = np.random.randn(n, p)
x = np.random.randn(p)
t0 = time.time()
y1 = (A @ B) @ x
t1 = time.time()
t2 = time.time()
y2 = A @ (B @ x)
t3 = time.time()
print("Time for (A @ B) @ x:", t1 - t0)
print("Time for A @ (B @ x):", t3 - t2)
print("Same answer?", np.allclose(y1, y2))Time for (A @ B) @ x: 0.04521799087524414
Time for A @ (B @ x): 0.013590097427368164
Same answer? True
Matrix algebra is not just a collection of rules.
It is the language of:
The rest of the book will deepen this story through subspaces, orthogonality, projections, eigenvalues, SVD, Schur decomposition, Fourier and Haar bases, and modern data science.