2  Chapter 2. Matrices as Machines: Algebra, Inverses, Rank, and LU

From matrix arithmetic to reusable computation

Guiding question.
Once a system has been written as \(A\mathbf{x}=\mathbf b\), how do matrices behave as algebraic objects, geometric machines, and computational tools?

In Chapter 1 we learned that a linear system is not only a list of equations. It is a matrix equation, a geometric intersection problem, and a computational task.

Now we study the object that carries all the information:

\[ A. \]

A matrix can be viewed in several ways:

This chapter develops the basic algebra of matrices: addition, scalar multiplication, matrix-vector products, matrix multiplication, elementary matrices, inverses, transposes, rank, and LU factorization.

The purpose is not only to learn rules. The purpose is to understand why the rules matter.

NoteHow to read this chapter

Matrix algebra has two levels.

At the entry level, we compute with entries \(a_{ij}\).

At the structural level, we ask what the matrix does to vectors, spaces, data, and algorithms.

Both levels are important. Computation without structure is mechanical. Structure without computation is incomplete.

TipAI and coding companion

Use Python or AI tools to check matrix products, inverses, ranks, and LU factorizations. But always ask:

  1. Are the matrix sizes compatible?
  2. What does the computation mean geometrically?
  3. Is the formula numerically reliable?
  4. Can the same expression be written in a faster way?

2.1 2.1 Matrix addition and scalar multiplication

Matrices first behave like vectors. We can add matrices of the same size, and we can multiply a matrix by a scalar.

NoteDefinition 2.1: Matrix addition and scalar multiplication

Let \(A=[a_{ij}]\) and \(B=[b_{ij}]\) be \(m\times n\) matrices over a field \(\mathbb F\), and let \(c\in\mathbb F\).

The sum \(A+B\) is the \(m\times n\) matrix

\[ A+B=[a_{ij}+b_{ij}]. \]

The scalar multiple \(cA\) is the \(m\times n\) matrix

\[ cA=[ca_{ij}]. \]

These operations are defined only when the matrix sizes are compatible.

For example,

\[ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}. \]

Also,

\[ c \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} ca_{11} & ca_{12}\\ ca_{21} & ca_{22} \end{bmatrix}. \]

ImportantTheorem 2.2: Algebraic rules for matrix addition

Let \(A,B,C\in M_{m\times n}(\mathbb F)\), and let \(r,s\in\mathbb F\). Then:

  1. \(A+B=B+A\).
  2. \((A+B)+C=A+(B+C)\).
  3. \(A+0=A\).
  4. \(A+(-A)=0\).
  5. \(r(A+B)=rA+rB\).
  6. \((r+s)A=rA+sA\).
  7. \(r(sA)=(rs)A\).
  8. \(1A=A\).

Therefore \(M_{m\times n}(\mathbb F)\), the set of all \(m\times n\) matrices over \(\mathbb F\), is a vector space.

NoteProof idea

Each identity follows from the corresponding identity in the field \(\mathbb F\), applied entry by entry. For example, the \((i,j)\)-entry of \(r(A+B)\) is

\[ r(a_{ij}+b_{ij})=ra_{ij}+rb_{ij}, \]

which is the \((i,j)\)-entry of \(rA+rB\).

2.1.1 Example 2.3: Combining data arrays

Let

\[ A= \begin{bmatrix} 1 & -2 & 0\\ 3 & 4 & 1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & 1 & 5\\ -1 & 0 & 3 \end{bmatrix}. \]

Compute \(2A-3B\).

We have

\[ 2A= \begin{bmatrix} 2 & -4 & 0\\ 6 & 8 & 2 \end{bmatrix}, \qquad 3B= \begin{bmatrix} 6 & 3 & 15\\ -3 & 0 & 9 \end{bmatrix}. \]

Therefore

\[ 2A-3B= \begin{bmatrix} -4 & -7 & -15\\ 9 & 8 & -7 \end{bmatrix}. \]

Code
import numpy as np

A = np.array([[1, -2, 0],
              [3,  4, 1]])

B = np.array([[ 2, 1, 5],
              [-1, 0, 3]])

2*A - 3*B
array([[ -4,  -7, -15],
       [  9,   8,  -7]])
TipGraduate viewpoint: affine combinations

Suppose \(A_1,A_2,A_3\in M_{m\times n}(\mathbb R)\) represent three model outputs, and let

\[ W=0.2A_1+0.3A_2+0.5A_3. \]

Because matrix addition and scalar multiplication are entrywise,

\[ w_{ij}=0.2(a_1)_{ij}+0.3(a_2)_{ij}+0.5(a_3)_{ij}. \]

Thus \(W\) is an entrywise weighted average of the three model outputs. This is the matrix form of ensemble averaging.

2.2 2.2 Vectors, dot products, and matrix-vector products

The product \(A\mathbf x\) is the bridge between matrix algebra and linear systems. It has two equally important interpretations:

  1. a linear combination of columns;
  2. a list of dot products with rows.

2.2.1 Dot products

NoteDefinition 2.4: Dot product

For vectors

\[ \mathbf u= \begin{bmatrix} u_1\\u_2\\ \vdots\\u_n \end{bmatrix}, \qquad \mathbf v= \begin{bmatrix} v_1\\v_2\\ \vdots\\v_n \end{bmatrix} \in \mathbb F^n, \]

the dot product is

\[ \mathbf u\cdot \mathbf v = u_1v_1+u_2v_2+\cdots+u_nv_n. \]

For real vectors this is the standard Euclidean dot product. For complex vectors, the standard Hermitian inner product uses conjugation and will be discussed later.

2.2.2 Example 2.5: Dot product as a score

Let

\[ \mathbf w=(0.2,0.5,0.3)^T, \qquad \mathbf x=(80,90,70)^T. \]

Then

\[ \mathbf w\cdot \mathbf x = 0.2(80)+0.5(90)+0.3(70) = 82. \]

The dot product gives a weighted score.

Code
w = np.array([0.2, 0.5, 0.3])
x = np.array([80, 90, 70])
w @ x
82.0

2.2.3 Matrix-vector products

NoteDefinition 2.6: Matrix-vector product

Let \(A\in M_{m\times n}(\mathbb F)\) have columns

\[ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n], \]

and let

\[ \mathbf x= \begin{bmatrix} x_1\\x_2\\ \vdots\\x_n \end{bmatrix} \in\mathbb F^n. \]

The product \(A\mathbf x\in\mathbb F^m\) is defined by

\[ A\mathbf x = x_1\mathbf a_1+x_2\mathbf a_2+\cdots+x_n\mathbf a_n. \]

Equivalently, if \(R_i\) is the \(i\)-th row of \(A\), then

\[ A\mathbf x= \begin{bmatrix} R_1\cdot \mathbf x\\ R_2\cdot \mathbf x\\ \vdots\\ R_m\cdot \mathbf x \end{bmatrix}. \]

NoteDefinition 2.7: Linear combination

A vector \(\mathbf b\in\mathbb F^m\) is a linear combination of vectors \(\mathbf v_1,\ldots,\mathbf v_n\in\mathbb F^m\) if there exist scalars \(x_1,\ldots,x_n\in\mathbb F\) such that

\[ \mathbf b=x_1\mathbf v_1+\cdots+x_n\mathbf v_n. \]

ImportantTheorem 2.8: Equivalent forms of a linear system

Let

\[ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n]\in M_{m\times n}(\mathbb F), \qquad \mathbf b\in\mathbb F^m. \]

The following have the same solution set:

  1. the matrix equation \(A\mathbf x=\mathbf b\);
  2. the vector equation \[ x_1\mathbf a_1+\cdots+x_n\mathbf a_n=\mathbf b; \]
  3. the linear system with augmented matrix \([A\mid \mathbf b]\).
ImportantTheorem 2.9: Linearity of \(A\mathbf x\)

Let \(A\in M_{m\times n}(\mathbb F)\), let \(\mathbf u,\mathbf v\in\mathbb F^n\), and let \(c\in\mathbb F\). Then

\[ A(\mathbf u+\mathbf v)=A\mathbf u+A\mathbf v, \qquad A(c\mathbf u)=cA\mathbf u. \]

Thus the map

\[ T_A:\mathbb F^n\to\mathbb F^m,\qquad T_A(\mathbf x)=A\mathbf x, \]

is linear.

2.2.4 Example 2.10: The column interpretation

Let

\[ A= \begin{bmatrix} 1 & 2 & -1\\ 0 & 3 & 4\\ 2 & -1 & 1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\3 \end{bmatrix}. \]

Using columns,

\[ A\mathbf x = 2 \begin{bmatrix} 1\\0\\2 \end{bmatrix} - \begin{bmatrix} 2\\3\\-1 \end{bmatrix} + 3 \begin{bmatrix} -1\\4\\1 \end{bmatrix} = \begin{bmatrix} -3\\9\\8 \end{bmatrix}. \]

Code
A = np.array([[1, 2, -1],
              [0, 3,  4],
              [2,-1,  1]])

x = np.array([2, -1, 3])

A @ x
array([-3,  9,  8])

2.2.5 Example 2.11: One step of a linear dynamical system

Suppose a system state \(\mathbf x_t\in\mathbb R^3\) evolves by

\[ \mathbf x_{t+1}=A\mathbf x_t, \]

where

\[ A= \begin{bmatrix} 0.8&0.1&0\\ 0.2&0.7&0.1\\ 0&0.2&0.9 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 100\\50\\20 \end{bmatrix}. \]

Then

\[ \mathbf x_1=A\mathbf x_0= \begin{bmatrix} 85\\57\\28 \end{bmatrix}. \]

Code
A = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.2, 0.9]])

x0 = np.array([100, 50, 20])
x1 = A @ x0
x1
array([85., 57., 28.])
TipStory interpretation

A matrix-vector product is not just a formula. It is one step of a process.

  • A population evolves.
  • A signal is filtered.
  • A data point is embedded.
  • A neural network layer transforms features.
  • A Markov chain updates probabilities.

2.3 2.3 Matrix multiplication

Matrix multiplication is composition of matrix machines.

If \(B\) sends an input vector to an intermediate vector, and \(A\) acts next, then the combined transformation is \(AB\).

\[ \mathbf x \overset{B}{\longmapsto} B\mathbf x \overset{A}{\longmapsto} A(B\mathbf x). \]

The matrix for the combined machine is \(AB\).

2.3.1 Definition and row-column rule

NoteDefinition 2.12: Matrix product

Let \(A\in M_{m\times n}(\mathbb F)\), and let \(B\in M_{n\times p}(\mathbb F)\) have columns

\[ B=[\mathbf b_1\ \mathbf b_2\ \cdots\ \mathbf b_p]. \]

The product \(AB\) is the \(m\times p\) matrix

\[ AB=[A\mathbf b_1\ A\mathbf b_2\ \cdots\ A\mathbf b_p]. \]

If the number of columns of \(A\) does not equal the number of rows of \(B\), then \(AB\) is not defined.

If \(A=[a_{ij}]\) is \(m\times n\) and \(B=[b_{ij}]\) is \(n\times p\), then the \((i,j)\)-entry of \(AB\) is

\[ (AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}. \]

This is the dot product of the \(i\)-th row of \(A\) and the \(j\)-th column of \(B\).

2.3.2 Example 2.13: Computing a matrix product

Let

\[ A= \begin{bmatrix} -3 & 5\\ 4 & 2\\ 1 & -5 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & -4\\ -4 & 1 \end{bmatrix}. \]

Then

\[ AB= \begin{bmatrix} -26 & 17\\ 0 & -14\\ 22 & -9 \end{bmatrix}. \]

Code
A = np.array([[-3,  5],
              [ 4,  2],
              [ 1, -5]])

B = np.array([[ 2, -4],
              [-4,  1]])

A @ B
array([[-26,  17],
       [  0, -14],
       [ 22,  -9]])

2.3.3 Algebraic rules and non-rules

ImportantTheorem 2.14: Properties of matrix multiplication

Let \(A,B,C\) be matrices over \(\mathbb F\), with sizes such that the indicated operations are defined, and let \(r\in\mathbb F\). Then:

  1. \(A(BC)=(AB)C\).
  2. \(A(B+C)=AB+AC\).
  3. \((A+B)C=AC+BC\).
  4. \(r(AB)=(rA)B=A(rB)\).
  5. \(I_mA=A=AI_n\) when \(A\) is \(m\times n\).
WarningNon-properties

The following familiar properties of real numbers do not generally hold for matrices:

  1. Even if \(AB\) and \(BA\) are both defined, usually \(AB\ne BA\).
  2. If \(AB=AC\), it does not necessarily follow that \(B=C\).
  3. If \(AB=0\), it does not necessarily follow that \(A=0\) or \(B=0\).

2.3.4 Example 2.15: Noncommutativity

Let

\[ A= \begin{bmatrix} 1&2\\ 3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. \]

Then

\[ AB= \begin{bmatrix} 1&3\\ 3&7 \end{bmatrix}, \qquad BA= \begin{bmatrix} 4&6\\ 3&4 \end{bmatrix}. \]

Thus \(AB\ne BA\).

Code
A = np.array([[1,2],
              [3,4]])

B = np.array([[1,1],
              [0,1]])

A @ B, B @ A
(array([[1, 3],
        [3, 7]]),
 array([[4, 6],
        [3, 4]]))
TipWhy order matters

If matrices are machines, then order should matter.

Rotating an object and then projecting it is usually not the same as projecting it and then rotating it. In data science, normalizing data and then applying a model may produce a different result from applying a model and then normalizing.

2.3.5 Example 2.16: Solving a commutation equation

Find all \(2\times 2\) real matrices \(X\) such that

\[ XA=AX, \qquad A= \begin{bmatrix} 2&3\\ -3&2 \end{bmatrix}. \]

Write

\[ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix}. \]

Then

\[ XA= \begin{bmatrix} 2a-3b & 3a+2b\\ 2c-3d & 3c+2d \end{bmatrix}, \]

and

\[ AX= \begin{bmatrix} 2a+3c & 2b+3d\\ -3a+2c & -3b+2d \end{bmatrix}. \]

Equating entries gives

\[ c=-b,\qquad d=a. \]

Therefore

\[ X= \begin{bmatrix} a&b\\ -b&a \end{bmatrix} = aI+ b \begin{bmatrix} 0&1\\ -1&0 \end{bmatrix}. \]

These are exactly the real matrices representing multiplication by a complex number \(a-bi\).

Code
import sympy as sp

a,b,c,d = sp.symbols("a b c d")
X = sp.Matrix([[a,b],[c,d]])
A = sp.Matrix([[2,3],[-3,2]])

eq_matrix = X*A - A*X
sp.solve(list(eq_matrix), [a,b,c,d], dict=True)
[{a: d, b: -c}]

2.4 2.4 Powers of matrices

Powers of a square matrix describe repeated application of the same transformation.

NoteDefinition 2.17: Matrix powers

If \(A\in M_{n\times n}(\mathbb F)\) and \(k\ge 1\), then

\[ A^k=\underbrace{A\cdot A\cdots A}_{k\text{ factors}}. \]

We also define

\[ A^0=I_n. \]

2.4.1 Example 2.18: Four common power patterns

Idempotent matrix.

\[ P= \begin{bmatrix} 1/2&1/2\\ 1/2&1/2 \end{bmatrix} \]

satisfies \(P^2=P\), so \(P^k=P\) for all \(k\ge 1\).

Shear matrix.

\[ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} \]

satisfies

\[ N^k= \begin{bmatrix} 1&k\\ 0&1 \end{bmatrix}. \]

Nilpotent matrix.

\[ T= \begin{bmatrix} 0&1&2\\ 0&0&3\\ 0&0&0 \end{bmatrix} \]

satisfies \(T^3=0\).

Diagonal matrix.

\[ D= \begin{bmatrix} 1&0&0\\ 0&2&0\\ 0&0&3 \end{bmatrix} \]

satisfies

\[ D^k= \begin{bmatrix} 1&0&0\\ 0&2^k&0\\ 0&0&3^k \end{bmatrix}. \]

Code
P = np.array([[0.5,0.5],
              [0.5,0.5]])

N = np.array([[1,1],
              [0,1]])

T = np.array([[0,1,2],
              [0,0,3],
              [0,0,0]])

D = np.diag([1,2,3])

print("P^2 =\n", P @ P)
print("N^5 =\n", np.linalg.matrix_power(N, 5))
print("T^3 =\n", np.linalg.matrix_power(T, 3))
print("D^4 =\n", np.linalg.matrix_power(D, 4))
P^2 =
 [[0.5 0.5]
 [0.5 0.5]]
N^5 =
 [[1 5]
 [0 1]]
T^3 =
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
D^4 =
 [[ 1  0  0]
 [ 0 16  0]
 [ 0  0 81]]

2.4.2 Example 2.19: Use in Markov chains

Let

\[ P= \begin{bmatrix} 0.9&0.2\\ 0.1&0.8 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 0.6\\0.4 \end{bmatrix}. \]

Then the state after two steps is

\[ \mathbf x_2=P^2\mathbf x_0. \]

First,

\[ P^2= \begin{bmatrix} 0.83&0.34\\ 0.17&0.66 \end{bmatrix}. \]

Therefore

\[ \mathbf x_2= \begin{bmatrix} 0.634\\0.366 \end{bmatrix}. \]

Code
P = np.array([[0.9,0.2],
              [0.1,0.8]])

x0 = np.array([0.6,0.4])
x2 = np.linalg.matrix_power(P, 2) @ x0
x2
array([0.634, 0.366])

2.5 2.5 Elementary matrices and block multiplication

Gaussian elimination can be written as matrix multiplication. This is a major conceptual step: row operations are themselves matrices.

2.5.1 Elementary matrices

NoteDefinition 2.20: Elementary matrices

An elementary matrix is a matrix obtained from the identity matrix by performing exactly one elementary row operation.

The three types are:

  1. \(E_{ij}\): switch rows \(i\) and \(j\);
  2. \(E_i(c)\): multiply row \(i\) by a nonzero scalar \(c\);
  3. \(E_{ij}(d)\): add \(d\) times row \(j\) to row \(i\).

For \(3\times 3\) matrices,

\[ E_{12}= \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \qquad E_2(k)= \begin{bmatrix} 1&0&0\\ 0&k&0\\ 0&0&1 \end{bmatrix}, \qquad E_{21}(k)= \begin{bmatrix} 1&0&0\\ k&1&0\\ 0&0&1 \end{bmatrix}. \]

ImportantProposition 2.21: Elementary matrices act by row operations

Multiplying a matrix \(A\) on the left by an elementary matrix performs the corresponding elementary row operation on \(A\).

2.5.2 Example 2.22: Using elementary matrices

Let

\[ A= \begin{bmatrix} 1&2\\ 3&4\\ 5&6 \end{bmatrix}, \qquad E=E_{31}(-5)= \begin{bmatrix} 1&0&0\\ 0&1&0\\ -5&0&1 \end{bmatrix}. \]

Then

\[ EA= \begin{bmatrix} 1&2\\ 3&4\\ 0&-4 \end{bmatrix}, \]

which is the result of the row operation

\[ R_3\leftarrow R_3-5R_1. \]

Code
A = np.array([[1,2],
              [3,4],
              [5,6]])

E = np.array([[ 1,0,0],
              [ 0,1,0],
              [-5,0,1]])

E @ A
array([[ 1,  2],
       [ 3,  4],
       [ 0, -4]])

2.5.3 Block multiplication

Block multiplication lets us multiply large matrices by treating compatible submatrices as entries.

Suppose

\[ A= \begin{bmatrix} A_1&A_2\\ A_3&A_4 \end{bmatrix}, \qquad B= \begin{bmatrix} B_1&B_2\\ B_3&B_4 \end{bmatrix}, \]

and all products below are defined. Then

\[ AB= \begin{bmatrix} A_1B_1+A_2B_3 & A_1B_2+A_2B_4\\ A_3B_1+A_4B_3 & A_3B_2+A_4B_4 \end{bmatrix}. \]

2.5.4 Example 2.23: Block multiplication for a linear model

Let

\[ A= \begin{bmatrix} I&X\\ 0&I \end{bmatrix}, \qquad B= \begin{bmatrix} Y&0\\ Z&I \end{bmatrix}. \]

Then

\[ AB= \begin{bmatrix} Y+XZ&X\\ Z&I \end{bmatrix}. \]

This calculation appears in optimization, statistics, numerical linear algebra, and systems theory when variables are separated into blocks.

2.6 2.6 Inverse matrices

An inverse matrix reverses a matrix machine.

If

\[ A\mathbf x=\mathbf y, \]

then \(A^{-1}\) recovers the input:

\[ \mathbf x=A^{-1}\mathbf y. \]

Invertibility means no information is lost.

2.6.1 Definition and basic properties

NoteDefinition 2.24: Invertible matrix

An \(n\times n\) matrix \(A\) is invertible if there exists an \(n\times n\) matrix \(B\) such that

\[ AB=BA=I_n. \]

In this case, \(B\) is called the inverse of \(A\) and is denoted by \(A^{-1}\).

ImportantProposition 2.25: Uniqueness of inverse

If a matrix \(A\) is invertible, then its inverse is unique.

Proof. If \(B\) and \(C\) are both inverses of \(A\), then

\[ B=BI=B(AC)=(BA)C=IC=C. \]

Thus \(B=C\). \(\square\)

ImportantTheorem 2.26: Basic inverse rules

Let \(A\) and \(B\) be \(n\times n\) matrices.

  1. If \(A\) is invertible, then \(A^{-1}\) is invertible and \[ (A^{-1})^{-1}=A. \]
  2. If \(A\) is invertible, then \(A^T\) is invertible and \[ (A^T)^{-1}=(A^{-1})^T. \]
  3. If \(A\) and \(B\) are invertible, then \(AB\) is invertible and \[ (AB)^{-1}=B^{-1}A^{-1}. \]
  4. If \(A\) is invertible and \(c\ne 0\), then \(cA\) is invertible and \[ (cA)^{-1}=c^{-1}A^{-1}. \]
  5. If \(A\) is invertible and \(AB=AC\), then \(B=C\). If \(BA=CA\), then \(B=C\).
  6. If \(A\) is invertible, then \(A^m\) is invertible and \[ (A^m)^{-1}=(A^{-1})^m. \]

2.6.2 Example 2.27: Using inverse rules without recomputing

Suppose \(A\) and \(B\) are invertible \(n\times n\) matrices. Find the inverse of

\[ M=3A^2B^{-1}. \]

Using \((XYZ)^{-1}=Z^{-1}Y^{-1}X^{-1}\),

\[ M^{-1} = (B^{-1})^{-1}(A^2)^{-1}(3I)^{-1} = B(A^{-1})^2\frac{1}{3}I = \frac13 B(A^{-1})^2. \]

2.6.3 Example 2.28: \(A+B\) may fail to be invertible

Even if \(A\) and \(B\) are invertible, \(A+B\) need not be invertible.

Take

\[ A=I_2,\qquad B=-I_2. \]

Then both \(A\) and \(B\) are invertible, but

\[ A+B=0, \]

which is not invertible.

2.6.4 The inverse matrix theorem

ImportantTheorem 2.29: Inverse matrix theorem

Let \(A\in M_{n\times n}(\mathbb F)\). The following statements are equivalent:

  1. \(A\) is invertible.
  2. There exists a square matrix \(B\) such that \(BA=I_n\).
  3. The homogeneous system \(A\mathbf x=\mathbf 0\) has only the trivial solution.
  4. \(\operatorname{rank}(A)=n\).
  5. The reduced row echelon form of \(A\) is \(I_n\).
  6. \(A\) is a product of elementary matrices.
  7. There exists a square matrix \(C\) such that \(AC=I_n\).
  8. For every \(\mathbf b\in\mathbb F^n\), the system \(A\mathbf x=\mathbf b\) has a unique solution.
TipStory interpretation

An invertible matrix is a reversible machine. No two different inputs lead to the same output, and every possible output is reached by exactly one input.

2.6.5 Example 2.30: Testing invertibility

Consider

\[ A= \begin{bmatrix} 1&2&3\\ 2&4&6\\ 0&1&1 \end{bmatrix}. \]

The second row is twice the first row. Therefore the rows are linearly dependent, so

\[ \operatorname{rank}(A)<3. \]

By the inverse matrix theorem, \(A\) is not invertible.

Code
A = sp.Matrix([[1,2,3],
               [2,4,6],
               [0,1,1]])
A.rank()
2

2.6.6 Example 2.31: Invertibility from a product

Suppose \(A,B,C\in M_{n\times n}(\mathbb F)\) and

\[ ABC=I_n. \]

Then \(A(BC)=I_n\), so \(A\) has a right inverse. By the inverse matrix theorem, \(A\) is invertible.

Also \((AB)C=I_n\), so \(C\) has a left inverse. Hence \(C\) is invertible.

Multiplying \(ABC=I_n\) on the left by \(A^{-1}\) gives

\[ BC=A^{-1}. \]

Thus

\[ B=A^{-1}C^{-1} \]

is invertible. Therefore all three matrices are invertible, with

\[ A^{-1}=BC,\qquad C^{-1}=AB,\qquad B^{-1}=CA. \]

2.6.7 Computing inverses

ImportantTheorem 2.32: Algorithm for computing \(A^{-1}\)

Let \(A\in M_{n\times n}(\mathbb F)\). Form the augmented matrix

\[ [A\mid I_n]. \]

Use elementary row operations to compute its reduced row echelon form.

  1. If \([A\mid I_n]\) row reduces to \([I_n\mid C]\), then \[ C=A^{-1}. \]
  2. If the left block cannot be row reduced to \(I_n\), then \(A\) is not invertible.

2.6.8 Example 2.33: Finding an inverse

Let

\[ A= \begin{bmatrix} 1&1&1\\ 2&3&2\\ 3&8&2 \end{bmatrix}. \]

Row reducing \([A\mid I_3]\) gives

\[ \begin{bmatrix} 1&0&0&10&-6&1\\ 0&1&0&-2&1&0\\ 0&0&1&-7&5&-1 \end{bmatrix}. \]

Hence

\[ A^{-1}= \begin{bmatrix} 10&-6&1\\ -2&1&0\\ -7&5&-1 \end{bmatrix}. \]

Code
A = sp.Matrix([[1,1,1],
               [2,3,2],
               [3,8,2]])

A.inv()

\(\displaystyle \left[\begin{matrix}10 & -6 & 1\\-2 & 1 & 0\\-7 & 5 & -1\end{matrix}\right]\)

ImportantTheorem 2.34: Inverse of a \(2\times 2\) matrix

The matrix

\[ A= \begin{bmatrix} a&b\\ c&d \end{bmatrix} \]

is invertible if and only if

\[ ad-bc\ne 0. \]

In that case,

\[ A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d&-b\\ -c&a \end{bmatrix}. \]

The scalar \(ad-bc\) is the determinant of \(A\).

2.6.9 Example 2.35: Fast inverse for a \(2\times 2\) matrix

Let

\[ A= \begin{bmatrix} 4&7\\ 2&6 \end{bmatrix}. \]

Then

\[ \det(A)=4(6)-7(2)=10\ne 0. \]

So

\[ A^{-1} = \frac{1}{10} \begin{bmatrix} 6&-7\\ -2&4 \end{bmatrix}. \]

2.7 2.7 Transpose, symmetry, and rank

The transpose exchanges rows and columns. It also reveals a deep identity about dot products.

2.7.1 Transpose

NoteDefinition 2.36: Transpose

Given \(A=[a_{ij}]\in M_{m\times n}(\mathbb F)\), the transpose \(A^T\) is the \(n\times m\) matrix whose \((i,j)\)-entry is \(a_{ji}\).

ImportantTheorem 2.37: Properties of transposition

Let \(A\) and \(B\) be matrices for which the indicated operations are defined, and let \(r\in\mathbb F\). Then:

  1. \((A^T)^T=A\).
  2. \((A+B)^T=A^T+B^T\).
  3. \((rA)^T=rA^T\).
  4. \((AB)^T=B^TA^T\).

2.7.2 Example 2.38: Transpose of a product

Let

\[ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&1\\ 0&-1\\ 4&2 \end{bmatrix}. \]

Instead of computing \((AB)^T\) directly, use

\[ (AB)^T=B^TA^T. \]

Code
A = np.array([[ 1,0,2],
              [-1,3,1]])

B = np.array([[2, 1],
              [0,-1],
              [4, 2]])

np.allclose((A @ B).T, B.T @ A.T)
True
TipAdjoint viewpoint

For real matrices, the transpose satisfies

\[ (A\mathbf x)\cdot \mathbf y = \mathbf x\cdot (A^T\mathbf y). \]

This identity is central in least squares, optimization, sensitivity analysis, and machine learning. It explains why transposes appear when computing gradients.

2.7.3 Rank inequalities

Rank measures the number of independent directions captured by a matrix.

ImportantTheorem 2.39: Rank of a product

If \(AB\) is defined, then

\[ \operatorname{rank}(AB)\le \operatorname{rank}(A). \]

ImportantTheorem 2.40: Rank and transpose

For every matrix \(A\),

\[ \operatorname{rank}(A)=\operatorname{rank}(A^T). \]

ImportantTheorem 2.41: Rank bound for matrix products

If \(AB\) is defined, then

\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\}. \]

NoteProof idea

The columns of \(AB\) are the vectors \(A\mathbf b_j\), where \(\mathbf b_j\) are the columns of \(B\). Therefore all columns of \(AB\) lie in the column space of \(A\). Hence

\[ \operatorname{rank}(AB)\le \operatorname{rank}(A). \]

Using \((AB)^T=B^TA^T\) and \(\operatorname{rank}(M)=\operatorname{rank}(M^T)\), one also obtains

\[ \operatorname{rank}(AB)\le \operatorname{rank}(B). \]

2.7.4 Example 2.42: Rank bottleneck

Let \(A\) be a \(100\times 5\) matrix and \(B\) be a \(5\times 100\) matrix. Then

\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 5. \]

Thus the \(100\times 100\) matrix \(AB\) has rank at most \(5\), so it is not invertible.

TipData science interpretation

A large matrix produced through a low-dimensional bottleneck is low rank. This is one reason low-rank factorizations appear in compression, recommendation systems, and neural network model reduction.

2.7.5 Symmetric matrices

NoteDefinition 2.43: Symmetric matrix

An \(n\times n\) matrix \(A\) is symmetric if

\[ A^T=A. \]

Equivalently, if \(A=[a_{ij}]\), then

\[ a_{ij}=a_{ji} \]

for all \(i,j\).

2.7.6 Example 2.44: Gram matrices are symmetric

For any real matrix \(X\), the matrix \(X^TX\) is symmetric because

\[ (X^TX)^T = X^T(X^T)^T = X^TX. \]

This observation is fundamental in least squares, covariance matrices, principal component analysis, and quadratic optimization.

Code
X = np.array([[1,2],
              [3,4],
              [5,6]])

G = X.T @ X
G
array([[35, 44],
       [44, 56]])
Code
np.allclose(G, G.T)
True

2.8 2.8 LU factorization and Gaussian elimination

Gaussian elimination can be saved and reused. LU factorization records elimination as a product

\[ A=LU. \]

Here \(L\) records the row-operation multipliers, and \(U\) is the echelon-form matrix.

2.8.1 Lower triangular matrices

NoteDefinition 2.45: Lower triangular matrix

An \(m\times m\) matrix \(L=[\ell_{ij}]\) is lower triangular if

\[ \ell_{ij}=0 \qquad \text{whenever } j>i. \]

It is unit lower triangular if, in addition,

\[ \ell_{ii}=1 \qquad \text{for every } i. \]

NoteRemark

Products and inverses of unit lower triangular matrices are unit lower triangular. Elementary matrices of the form \(E_{ij}(d)\) with \(i>j\) are unit lower triangular.

2.8.2 LU factorization

NoteDefinition 2.46: LU factorization

Let \(A\in M_{m\times n}(\mathbb F)\). An LU factorization of \(A\) is a factorization

\[ A=LU, \]

where \(L\) is a unit lower triangular \(m\times m\) matrix and \(U\) is an \(m\times n\) matrix in echelon form.

ImportantTheorem 2.47: Algorithm for finding an LU factorization

Suppose \(A\) can be transformed into echelon form using only row-replacement operations and no row swaps. Then an LU factorization may be found as follows:

  1. Reduce \(A\) to echelon form \(U\) using only row-replacement operations.
  2. Record the multipliers used in elimination in the corresponding entries of \(L\).

Then

\[ A=LU, \]

where \(L\) is unit lower triangular.

2.8.3 Example 2.48: Computing an LU factorization

Let

\[ A= \begin{bmatrix} 2&1&1\\ 4&-6&0\\ -2&7&2 \end{bmatrix}. \]

Eliminate entries below the first pivot:

\[ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3+R_1. \]

Then eliminate below the second pivot:

\[ R_3\leftarrow R_3+R_2. \]

This gives

\[ U= \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix}. \]

The multipliers are

\[ m_{21}=2,\qquad m_{31}=-1,\qquad m_{32}=-1. \]

Thus

\[ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix}. \]

Indeed,

\[ A=LU. \]

Code
L = np.array([[ 1, 0, 0],
              [ 2, 1, 0],
              [-1,-1, 1]])

U = np.array([[2, 1, 1],
              [0,-8,-2],
              [0, 0, 1]])

L @ U
array([[ 2,  1,  1],
       [ 4, -6,  0],
       [-2,  7,  2]])

2.8.4 Solving systems with LU

If \(A=LU\), then

\[ A\mathbf x=\mathbf b \quad\Longleftrightarrow\quad L(U\mathbf x)=\mathbf b. \]

Set

\[ \mathbf y=U\mathbf x. \]

Then solve two triangular systems:

\[ L\mathbf y=\mathbf b \]

by forward substitution, and then

\[ U\mathbf x=\mathbf y \]

by back substitution.

2.8.5 Example 2.49: Solving with LU

Use the LU factorization from the previous example to solve

\[ A\mathbf x= \begin{bmatrix} 1\\2\\3 \end{bmatrix}. \]

First solve

\[ \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix} \begin{bmatrix} y_1\\y_2\\y_3 \end{bmatrix} = \begin{bmatrix} 1\\2\\3 \end{bmatrix}. \]

This gives

\[ y_1=1,\qquad 2y_1+y_2=2,\qquad -y_1-y_2+y_3=3. \]

So

\[ \mathbf y= \begin{bmatrix} 1\\0\\4 \end{bmatrix}. \]

Next solve

\[ \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix} \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} 1\\0\\4 \end{bmatrix}. \]

Back substitution gives

\[ x_3=4,\qquad -8x_2-8=0,\qquad 2x_1-1+4=1. \]

Therefore

\[ x_2=-1,\qquad x_1=-1. \]

Thus

\[ \mathbf x= \begin{bmatrix} -1\\-1\\4 \end{bmatrix}. \]

Code
b = np.array([1,2,3])

# Direct check:
A = L @ U
np.linalg.solve(A, b)
array([-1., -1.,  4.])
WarningPivoting

Not every matrix has an LU factorization without row swaps. For example,

\[ \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix} \]

requires a row interchange before elimination.

In practice, one often uses

\[ PA=LU, \]

where \(P\) is a permutation matrix. Pivoting is not merely a technical detail; it is essential for numerical reliability.

2.9 2.9 Python computation: matrix algebra checklist

This section shows how to use Python to check the main computations of the chapter.

2.9.1 Matrix products and powers

Code
A = np.array([[1,2],
              [3,4]])

B = np.array([[1,1],
              [0,1]])

print("AB =")
print(A @ B)

print("BA =")
print(B @ A)

print("A^3 =")
print(np.linalg.matrix_power(A, 3))
AB =
[[1 3]
 [3 7]]
BA =
[[4 6]
 [3 4]]
A^3 =
[[ 37  54]
 [ 81 118]]

2.9.2 Rank and inverse

Code
A = np.array([[1,2,3],
              [2,4,6],
              [0,1,1]], dtype=float)

print("rank(A) =", np.linalg.matrix_rank(A))
rank(A) = 2
Code
A = np.array([[4,7],
              [2,6]], dtype=float)

print("det(A) =", np.linalg.det(A))
print("A inverse =")
print(np.linalg.inv(A))
det(A) = 10.000000000000002
A inverse =
[[ 0.6 -0.7]
 [-0.2  0.4]]

2.9.3 Exact symbolic computation with SymPy

Code
A = sp.Matrix([[1,1,1],
               [2,3,2],
               [3,8,2]])

A_inv = A.inv()
A_inv

\(\displaystyle \left[\begin{matrix}10 & -6 & 1\\-2 & 1 & 0\\-7 & 5 & -1\end{matrix}\right]\)

Code
A * A_inv

\(\displaystyle \left[\begin{matrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{matrix}\right]\)

2.9.4 LU factorization

Code
A = sp.Matrix([[2,1,1],
               [4,-6,0],
               [-2,7,2]])

L, U, perm = A.LUdecomposition()
L, U, perm
(Matrix([
 [ 1,  0, 0],
 [ 2,  1, 0],
 [-1, -1, 1]]),
 Matrix([
 [2,  1,  1],
 [0, -8, -2],
 [0,  0,  1]]),
 [])
NoteNumerical note

NumPy is excellent for fast numerical computation. SymPy is useful for exact symbolic computation. In applied linear algebra, both are useful, but they answer slightly different questions.

  • NumPy: fast floating-point computation.
  • SymPy: exact algebraic computation.

2.10 2.10 Classical challenge questions

These questions are designed for graduate-level discussion. They are not mainly long computations. They ask you to connect matrix algebra with structure, algorithms, modeling, and numerical reasoning.

TipChallenge 2.1: Matrix multiplication as composition

Matrix multiplication is associative but usually not commutative. Explain why this is natural if matrices are viewed as linear transformations. Give one practical situation where the order of two operations should matter.

TipChallenge 2.2: Information loss and invertibility

A matrix maps input vectors to output vectors. Interpret invertibility as “no information is lost.” How does this interpretation relate to pivots, rank, null space, and solving \(A\mathbf x=\mathbf b\)?

TipChallenge 2.3: Rank as effective dimension

In data science, the columns of a matrix may represent features or signals. Explain why rank can be interpreted as the number of independent directions contained in the data. What does it mean if a large data matrix has low rank?

TipChallenge 2.4: Gram matrices

For a data matrix \(A\), the matrix \(A^TA\) records pairwise dot products among the columns of \(A\). What geometric information is contained in \(A^TA\)? What information about \(A\) is lost when replacing \(A\) by \(A^TA\)?

TipChallenge 2.5: Transpose and adjoint viewpoint

The transpose \(A^T\) can be interpreted as the matrix that moves a dot product from one side to the other:

\[ (A\mathbf x)\cdot\mathbf y=\mathbf x\cdot(A^T\mathbf y). \]

Explain why this identity is important in least squares, optimization, and sensitivity analysis.

TipChallenge 2.6: LU factorization and reusable computation

LU factorization separates Gaussian elimination into two triangular solves. Why is this valuable when the same matrix \(A\) appears in many systems

\[ A\mathbf x_1=\mathbf b_1,\quad A\mathbf x_2=\mathbf b_2,\quad \ldots \]

with different right-hand sides?

TipChallenge 2.7: Pivoting and numerical reliability

The identity \(A=LU\) may fail without row exchanges, and even when it exists it may be numerically unstable. Explain why row pivoting is not just a technical detail but an important part of reliable computation.

TipChallenge 2.8: Block matrix thinking

Large matrices are often divided into blocks. Explain how block multiplication helps organize problems in networks, multivariable systems, machine learning, or scientific computing. What is gained by thinking at the block level instead of only at the entry level?

TipChallenge 2.9: From algebra to algorithms

Two formulas can be mathematically identical but computationally very different. For example,

\[ (AB)\mathbf x \qquad\text{and}\qquad A(B\mathbf x) \]

give the same result, but may require very different work. Explain why computational cost should influence how we write and implement matrix expressions.

2.11 2.11 Practice problems

2.11.1 Problem 2.1

Let

\[ A= \begin{bmatrix} 1&2&0\\ -1&3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&-1&5\\ 0&1&-2 \end{bmatrix}. \]

Compute \(3A-2B\).

2.11.2 Problem 2.2

Let

\[ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\4 \end{bmatrix}. \]

Compute \(A\mathbf x\) using both the row interpretation and the column interpretation.

2.11.3 Problem 2.3

Let

\[ A= \begin{bmatrix} 1&2\\ 0&1\\ 3&-1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&0&1\\ -1&4&3 \end{bmatrix}. \]

Compute \(AB\), and determine whether \(BA\) is defined.

2.11.4 Problem 2.4

Find all matrices

\[ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix} \]

such that

\[ X \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} = \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} X. \]

2.11.5 Problem 2.5

Let

\[ A= \begin{bmatrix} 1&2&3\\ 0&1&4\\ 0&0&2 \end{bmatrix}. \]

Use the inverse matrix theorem to decide whether \(A\) is invertible.

2.11.6 Problem 2.6

Compute the inverse of

\[ A= \begin{bmatrix} 3&5\\ 1&2 \end{bmatrix}. \]

2.11.7 Problem 2.7

Let \(A\) be a \(50\times 8\) matrix and \(B\) be an \(8\times 50\) matrix. Use rank inequalities to explain why \(AB\) cannot be invertible.

2.11.8 Problem 2.8

Let

\[ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. \]

Find an LU factorization of \(A\) without row swaps.

2.12 2.12 Solutions to practice problems

2.12.1 Solution to Problem 2.1

\[ 3A= \begin{bmatrix} 3&6&0\\ -3&9&12 \end{bmatrix}, \qquad 2B= \begin{bmatrix} 4&-2&10\\ 0&2&-4 \end{bmatrix}. \]

Therefore

\[ 3A-2B= \begin{bmatrix} -1&8&-10\\ -3&7&16 \end{bmatrix}. \]

2.12.2 Solution to Problem 2.2

Using rows,

\[ A\mathbf x= \begin{bmatrix} 1(2)+0(-1)+2(4)\\ -1(2)+3(-1)+1(4) \end{bmatrix} = \begin{bmatrix} 10\\ -1 \end{bmatrix}. \]

Using columns,

\[ A\mathbf x= 2 \begin{bmatrix} 1\\-1 \end{bmatrix} - \begin{bmatrix} 0\\3 \end{bmatrix} + 4 \begin{bmatrix} 2\\1 \end{bmatrix} = \begin{bmatrix} 10\\-1 \end{bmatrix}. \]

2.12.3 Solution to Problem 2.3

\[ AB= \begin{bmatrix} 0&8&7\\ -1&4&3\\ 7&-4&0 \end{bmatrix}. \]

Since \(B\) is \(2\times 3\) and \(A\) is \(3\times 2\), the product \(BA\) is also defined and has size \(2\times 2\).

2.12.4 Solution to Problem 2.4

Let

\[ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. \]

Then

\[ XN= \begin{bmatrix} a&a+b\\ c&c+d \end{bmatrix}, \qquad NX= \begin{bmatrix} a+c&b+d\\ c&d \end{bmatrix}. \]

Equating entries gives

\[ c=0,\qquad a=d. \]

Thus

\[ X= \begin{bmatrix} a&b\\ 0&a \end{bmatrix}. \]

2.12.5 Solution to Problem 2.5

The matrix \(A\) is upper triangular with diagonal entries \(1,1,2\). Since all diagonal entries are nonzero, it has a pivot in every column. Thus

\[ \operatorname{rank}(A)=3. \]

By the inverse matrix theorem, \(A\) is invertible.

2.12.6 Solution to Problem 2.6

\[ \det(A)=3(2)-5(1)=1. \]

Therefore

\[ A^{-1} = \begin{bmatrix} 2&-5\\ -1&3 \end{bmatrix}. \]

2.12.7 Solution to Problem 2.7

By the rank inequality,

\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 8. \]

But \(AB\) is \(50\times 50\). An invertible \(50\times 50\) matrix must have rank \(50\). Since

\[ \operatorname{rank}(AB)\le 8<50, \]

the matrix \(AB\) cannot be invertible.

2.12.8 Solution to Problem 2.8

Use elimination:

\[ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. \]

First,

\[ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3-R_1, \]

which gives

\[ \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&-2&7 \end{bmatrix}. \]

Then

\[ R_3\leftarrow R_3+2R_2, \]

which gives

\[ U= \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. \]

The multipliers are

\[ m_{21}=2,\qquad m_{31}=1,\qquad m_{32}=-2. \]

Hence

\[ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix}. \]

Therefore

\[ A=LU = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix} \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. \]

2.13 2.13 AI companion activities

2.13.1 Activity 2.1: Check a matrix product

Ask an AI tool:

Compute \(AB\) for \[ A=\begin{bmatrix}1&2\\3&4\end{bmatrix}, \qquad B=\begin{bmatrix}1&1\\0&1\end{bmatrix}. \] Explain why \(AB\ne BA\).

Then verify the answer by hand and by Python.

2.13.2 Activity 2.2: Critique an explanation of invertibility

Ask an AI tool:

Explain the inverse matrix theorem in simple language.

Then critique the explanation. Does it mention rank? pivots? the null space? uniqueness of solutions? row reduction?

2.13.3 Activity 2.3: Compare two computational expressions

Ask an AI tool:

Why is \(A(B\mathbf x)\) often faster than \((AB)\mathbf x\)?

Then test a large example in Python.

Code
# Optional timing experiment
import time

m, n, p = 800, 600, 400
A = np.random.randn(m, n)
B = np.random.randn(n, p)
x = np.random.randn(p)

t0 = time.time()
y1 = (A @ B) @ x
t1 = time.time()

t2 = time.time()
y2 = A @ (B @ x)
t3 = time.time()

print("Time for (A @ B) @ x:", t1 - t0)
print("Time for A @ (B @ x):", t3 - t2)
print("Same answer?", np.allclose(y1, y2))
Time for (A @ B) @ x: 0.04521799087524414
Time for A @ (B @ x): 0.013590097427368164
Same answer? True
ImportantMain message of Chapter 2

Matrix algebra is not just a collection of rules.

It is the language of:

  • transforming vectors;
  • composing processes;
  • detecting information loss;
  • measuring effective dimension;
  • reusing computation;
  • organizing large-scale algorithms.

The rest of the book will deepen this story through subspaces, orthogonality, projections, eigenvalues, SVD, Schur decomposition, Fourier and Haar bases, and modern data science.