2 Chapter 2. Matrices as Machines: Algebra, Inverses, Rank, and LU

From matrix arithmetic to reusable computation

Guiding question.
Once a system has been written as $A\mathbf{x}=\mathbf b$, how do matrices behave as algebraic objects, geometric machines, and computational tools?

In Chapter 1 we learned that a linear system is not only a list of equations. It is a matrix equation, a geometric intersection problem, and a computational task.

Now we study the object that carries all the information:

\[ A. \]

A matrix can be viewed in several ways:

as a rectangular table of numbers;
as a data array;
as a collection of column vectors;
as a machine acting on vectors;
as a linear transformation;
as a compressed record of row operations;
as a computational object whose structure affects speed and stability.

This chapter develops the basic algebra of matrices: addition, scalar multiplication, matrix-vector products, matrix multiplication, elementary matrices, inverses, transposes, rank, and LU factorization.

The purpose is not only to learn rules. The purpose is to understand why the rules matter.

How to read this chapter

Matrix algebra has two levels.

At the entry level, we compute with entries $a_{ij}$.

At the structural level, we ask what the matrix does to vectors, spaces, data, and algorithms.

Both levels are important. Computation without structure is mechanical. Structure without computation is incomplete.

AI and coding companion

Use Python or AI tools to check matrix products, inverses, ranks, and LU factorizations. But always ask:

Are the matrix sizes compatible?
What does the computation mean geometrically?
Is the formula numerically reliable?
Can the same expression be written in a faster way?

2.1 2.1 Matrix addition and scalar multiplication

Matrices first behave like vectors. We can add matrices of the same size, and we can multiply a matrix by a scalar.

Definition 2.1: Matrix addition and scalar multiplication

Let $A=[a_{ij}]$ and $B=[b_{ij}]$ be $m\times n$ matrices over a field $\mathbb F$, and let $c\in\mathbb F$.

The sum $A+B$ is the $m\times n$ matrix

\[ A+B=[a_{ij}+b_{ij}]. \]

The scalar multiple $cA$ is the $m\times n$ matrix

\[ cA=[ca_{ij}]. \]

These operations are defined only when the matrix sizes are compatible.

For example,

\[ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}. \]

Also,

\[ c \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} ca_{11} & ca_{12}\\ ca_{21} & ca_{22} \end{bmatrix}. \]

Theorem 2.2: Algebraic rules for matrix addition

Let $A,B,C\in M_{m\times n}(\mathbb F)$, and let $r,s\in\mathbb F$. Then:

$A+B=B+A$.
$(A+B)+C=A+(B+C)$.
$A+0=A$.
$A+(-A)=0$.
$r(A+B)=rA+rB$.
$(r+s)A=rA+sA$.
$r(sA)=(rs)A$.
$1A=A$.

Therefore $M_{m\times n}(\mathbb F)$, the set of all $m\times n$ matrices over $\mathbb F$, is a vector space.

Proof idea

Each identity follows from the corresponding identity in the field $\mathbb F$, applied entry by entry. For example, the $(i,j)$-entry of $r(A+B)$ is

\[ r(a_{ij}+b_{ij})=ra_{ij}+rb_{ij}, \]

which is the $(i,j)$-entry of $rA+rB$.

2.1.1 Example 2.3: Combining data arrays

Let

\[ A= \begin{bmatrix} 1 & -2 & 0\\ 3 & 4 & 1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & 1 & 5\\ -1 & 0 & 3 \end{bmatrix}. \]

Compute $2A-3B$.

We have

\[ 2A= \begin{bmatrix} 2 & -4 & 0\\ 6 & 8 & 2 \end{bmatrix}, \qquad 3B= \begin{bmatrix} 6 & 3 & 15\\ -3 & 0 & 9 \end{bmatrix}. \]

Therefore

\[ 2A-3B= \begin{bmatrix} -4 & -7 & -15\\ 9 & 8 & -7 \end{bmatrix}. \]

Code

import numpy as np

A = np.array([[1, -2, 0],
              [3,  4, 1]])

B = np.array([[ 2, 1, 5],
              [-1, 0, 3]])

2*A - 3*B

array([[ -4,  -7, -15],
       [  9,   8,  -7]])

Graduate viewpoint: affine combinations

Suppose $A_1,A_2,A_3\in M_{m\times n}(\mathbb R)$ represent three model outputs, and let

\[ W=0.2A_1+0.3A_2+0.5A_3. \]

Because matrix addition and scalar multiplication are entrywise,

\[ w_{ij}=0.2(a_1)_{ij}+0.3(a_2)_{ij}+0.5(a_3)_{ij}. \]

Thus $W$ is an entrywise weighted average of the three model outputs. This is the matrix form of ensemble averaging.

2.2 2.2 Vectors, dot products, and matrix-vector products

The product $A\mathbf x$ is the bridge between matrix algebra and linear systems. It has two equally important interpretations:

a linear combination of columns;
a list of dot products with rows.

2.2.1 Dot products

Definition 2.4: Dot product

For vectors

\[ \mathbf u= \begin{bmatrix} u_1\\u_2\\ \vdots\\u_n \end{bmatrix}, \qquad \mathbf v= \begin{bmatrix} v_1\\v_2\\ \vdots\\v_n \end{bmatrix} \in \mathbb F^n, \]

the dot product is

\[ \mathbf u\cdot \mathbf v = u_1v_1+u_2v_2+\cdots+u_nv_n. \]

For real vectors this is the standard Euclidean dot product. For complex vectors, the standard Hermitian inner product uses conjugation and will be discussed later.

2.2.2 Example 2.5: Dot product as a score

Let

\[ \mathbf w=(0.2,0.5,0.3)^T, \qquad \mathbf x=(80,90,70)^T. \]

Then

\[ \mathbf w\cdot \mathbf x = 0.2(80)+0.5(90)+0.3(70) = 82. \]

The dot product gives a weighted score.

Code

w = np.array([0.2, 0.5, 0.3])
x = np.array([80, 90, 70])
w @ x

82.0

2.2.3 Matrix-vector products

Definition 2.6: Matrix-vector product

Let $A\in M_{m\times n}(\mathbb F)$ have columns

\[ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n], \]

and let

\[ \mathbf x= \begin{bmatrix} x_1\\x_2\\ \vdots\\x_n \end{bmatrix} \in\mathbb F^n. \]

The product $A\mathbf x\in\mathbb F^m$ is defined by

\[ A\mathbf x = x_1\mathbf a_1+x_2\mathbf a_2+\cdots+x_n\mathbf a_n. \]

Equivalently, if $R_i$ is the $i$-th row of $A$, then

\[ A\mathbf x= \begin{bmatrix} R_1\cdot \mathbf x\\ R_2\cdot \mathbf x\\ \vdots\\ R_m\cdot \mathbf x \end{bmatrix}. \]

Definition 2.7: Linear combination

A vector $\mathbf b\in\mathbb F^m$ is a linear combination of vectors $\mathbf v_1,\ldots,\mathbf v_n\in\mathbb F^m$ if there exist scalars $x_1,\ldots,x_n\in\mathbb F$ such that

\[ \mathbf b=x_1\mathbf v_1+\cdots+x_n\mathbf v_n. \]

Theorem 2.8: Equivalent forms of a linear system

Let

\[ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n]\in M_{m\times n}(\mathbb F), \qquad \mathbf b\in\mathbb F^m. \]

The following have the same solution set:

the matrix equation $A\mathbf x=\mathbf b$;
the vector equation \[ x_1\mathbf a_1+\cdots+x_n\mathbf a_n=\mathbf b; \]
the linear system with augmented matrix $[A\mid \mathbf b]$.

Theorem 2.9: Linearity of $A\mathbf x$

Let $A\in M_{m\times n}(\mathbb F)$, let $\mathbf u,\mathbf v\in\mathbb F^n$, and let $c\in\mathbb F$. Then

\[ A(\mathbf u+\mathbf v)=A\mathbf u+A\mathbf v, \qquad A(c\mathbf u)=cA\mathbf u. \]

Thus the map

\[ T_A:\mathbb F^n\to\mathbb F^m,\qquad T_A(\mathbf x)=A\mathbf x, \]

is linear.

2.2.4 Example 2.10: The column interpretation

Let

\[ A= \begin{bmatrix} 1 & 2 & -1\\ 0 & 3 & 4\\ 2 & -1 & 1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\3 \end{bmatrix}. \]

Using columns,

\[ A\mathbf x = 2 \begin{bmatrix} 1\\0\\2 \end{bmatrix} - \begin{bmatrix} 2\\3\\-1 \end{bmatrix} + 3 \begin{bmatrix} -1\\4\\1 \end{bmatrix} = \begin{bmatrix} -3\\9\\8 \end{bmatrix}. \]

Code

A = np.array([[1, 2, -1],
              [0, 3,  4],
              [2,-1,  1]])

x = np.array([2, -1, 3])

A @ x

array([-3,  9,  8])

2.2.5 Example 2.11: One step of a linear dynamical system

Suppose a system state $\mathbf x_t\in\mathbb R^3$ evolves by

\[ \mathbf x_{t+1}=A\mathbf x_t, \]

where

\[ A= \begin{bmatrix} 0.8&0.1&0\\ 0.2&0.7&0.1\\ 0&0.2&0.9 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 100\\50\\20 \end{bmatrix}. \]

Then

\[ \mathbf x_1=A\mathbf x_0= \begin{bmatrix} 85\\57\\28 \end{bmatrix}. \]

Code

A = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.2, 0.9]])

x0 = np.array([100, 50, 20])
x1 = A @ x0
x1

array([85., 57., 28.])

Story interpretation

A matrix-vector product is not just a formula. It is one step of a process.

A population evolves.
A signal is filtered.
A data point is embedded.
A neural network layer transforms features.
A Markov chain updates probabilities.

2.3 2.3 Matrix multiplication

Matrix multiplication is composition of matrix machines.

If $B$ sends an input vector to an intermediate vector, and $A$ acts next, then the combined transformation is $AB$.

\[ \mathbf x \overset{B}{\longmapsto} B\mathbf x \overset{A}{\longmapsto} A(B\mathbf x). \]

The matrix for the combined machine is $AB$.

2.3.1 Definition and row-column rule

Definition 2.12: Matrix product

Let $A\in M_{m\times n}(\mathbb F)$, and let $B\in M_{n\times p}(\mathbb F)$ have columns

\[ B=[\mathbf b_1\ \mathbf b_2\ \cdots\ \mathbf b_p]. \]

The product $AB$ is the $m\times p$ matrix

\[ AB=[A\mathbf b_1\ A\mathbf b_2\ \cdots\ A\mathbf b_p]. \]

If the number of columns of $A$ does not equal the number of rows of $B$, then $AB$ is not defined.

If $A=[a_{ij}]$ is $m\times n$ and $B=[b_{ij}]$ is $n\times p$, then the $(i,j)$-entry of $AB$ is

\[ (AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}. \]

This is the dot product of the $i$-th row of $A$ and the $j$-th column of $B$.

2.3.2 Example 2.13: Computing a matrix product

Let

\[ A= \begin{bmatrix} -3 & 5\\ 4 & 2\\ 1 & -5 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & -4\\ -4 & 1 \end{bmatrix}. \]

Then

\[ AB= \begin{bmatrix} -26 & 17\\ 0 & -14\\ 22 & -9 \end{bmatrix}. \]

Code

A = np.array([[-3,  5],
              [ 4,  2],
              [ 1, -5]])

B = np.array([[ 2, -4],
              [-4,  1]])

A @ B

array([[-26,  17],
       [  0, -14],
       [ 22,  -9]])

2.3.3 Algebraic rules and non-rules

Theorem 2.14: Properties of matrix multiplication

Let $A,B,C$ be matrices over $\mathbb F$, with sizes such that the indicated operations are defined, and let $r\in\mathbb F$. Then:

$A(BC)=(AB)C$.
$A(B+C)=AB+AC$.
$(A+B)C=AC+BC$.
$r(AB)=(rA)B=A(rB)$.
$I_mA=A=AI_n$ when $A$ is $m\times n$.

Non-properties

The following familiar properties of real numbers do not generally hold for matrices:

Even if $AB$ and $BA$ are both defined, usually $AB\ne BA$.
If $AB=AC$, it does not necessarily follow that $B=C$.
If $AB=0$, it does not necessarily follow that $A=0$ or $B=0$.

2.3.4 Example 2.15: Noncommutativity

Let

\[ A= \begin{bmatrix} 1&2\\ 3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. \]

Then

\[ AB= \begin{bmatrix} 1&3\\ 3&7 \end{bmatrix}, \qquad BA= \begin{bmatrix} 4&6\\ 3&4 \end{bmatrix}. \]

Thus $AB\ne BA$.

Code

A = np.array([[1,2],
              [3,4]])

B = np.array([[1,1],
              [0,1]])

A @ B, B @ A

(array([[1, 3],
        [3, 7]]),
 array([[4, 6],
        [3, 4]]))

Why order matters

If matrices are machines, then order should matter.

Rotating an object and then projecting it is usually not the same as projecting it and then rotating it. In data science, normalizing data and then applying a model may produce a different result from applying a model and then normalizing.

2.3.5 Example 2.16: Solving a commutation equation

Find all $2\times 2$ real matrices $X$ such that

\[ XA=AX, \qquad A= \begin{bmatrix} 2&3\\ -3&2 \end{bmatrix}. \]

Write

\[ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix}. \]

Then

\[ XA= \begin{bmatrix} 2a-3b & 3a+2b\\ 2c-3d & 3c+2d \end{bmatrix}, \]

and

\[ AX= \begin{bmatrix} 2a+3c & 2b+3d\\ -3a+2c & -3b+2d \end{bmatrix}. \]

Equating entries gives

\[ c=-b,\qquad d=a. \]

Therefore

\[ X= \begin{bmatrix} a&b\\ -b&a \end{bmatrix} = aI+ b \begin{bmatrix} 0&1\\ -1&0 \end{bmatrix}. \]

These are exactly the real matrices representing multiplication by a complex number $a-bi$.

Code

import sympy as sp

a,b,c,d = sp.symbols("a b c d")
X = sp.Matrix([[a,b],[c,d]])
A = sp.Matrix([[2,3],[-3,2]])

eq_matrix = X*A - A*X
sp.solve(list(eq_matrix), [a,b,c,d], dict=True)

[{a: d, b: -c}]

2.4 2.4 Powers of matrices

Powers of a square matrix describe repeated application of the same transformation.

Definition 2.17: Matrix powers

If $A\in M_{n\times n}(\mathbb F)$ and $k\ge 1$, then

\[ A^k=\underbrace{A\cdot A\cdots A}_{k\text{ factors}}. \]

We also define

\[ A^0=I_n. \]

2.4.1 Example 2.18: Four common power patterns

Idempotent matrix.

\[ P= \begin{bmatrix} 1/2&1/2\\ 1/2&1/2 \end{bmatrix} \]

satisfies $P^2=P$, so $P^k=P$ for all $k\ge 1$.

Shear matrix.

\[ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} \]

satisfies

\[ N^k= \begin{bmatrix} 1&k\\ 0&1 \end{bmatrix}. \]

Nilpotent matrix.

\[ T= \begin{bmatrix} 0&1&2\\ 0&0&3\\ 0&0&0 \end{bmatrix} \]

satisfies $T^3=0$.

Diagonal matrix.

\[ D= \begin{bmatrix} 1&0&0\\ 0&2&0\\ 0&0&3 \end{bmatrix} \]

satisfies

\[ D^k= \begin{bmatrix} 1&0&0\\ 0&2^k&0\\ 0&0&3^k \end{bmatrix}. \]

Code

P = np.array([[0.5,0.5],
              [0.5,0.5]])

N = np.array([[1,1],
              [0,1]])

T = np.array([[0,1,2],
              [0,0,3],
              [0,0,0]])

D = np.diag([1,2,3])

print("P^2 =\n", P @ P)
print("N^5 =\n", np.linalg.matrix_power(N, 5))
print("T^3 =\n", np.linalg.matrix_power(T, 3))
print("D^4 =\n", np.linalg.matrix_power(D, 4))

P^2 =
 [[0.5 0.5]
 [0.5 0.5]]
N^5 =
 [[1 5]
 [0 1]]
T^3 =
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
D^4 =
 [[ 1  0  0]
 [ 0 16  0]
 [ 0  0 81]]

2.4.2 Example 2.19: Use in Markov chains

Let

\[ P= \begin{bmatrix} 0.9&0.2\\ 0.1&0.8 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 0.6\\0.4 \end{bmatrix}. \]

Then the state after two steps is

\[ \mathbf x_2=P^2\mathbf x_0. \]

First,

\[ P^2= \begin{bmatrix} 0.83&0.34\\ 0.17&0.66 \end{bmatrix}. \]

Therefore

\[ \mathbf x_2= \begin{bmatrix} 0.634\\0.366 \end{bmatrix}. \]

Code

P = np.array([[0.9,0.2],
              [0.1,0.8]])

x0 = np.array([0.6,0.4])
x2 = np.linalg.matrix_power(P, 2) @ x0
x2

array([0.634, 0.366])

2.5 2.5 Elementary matrices and block multiplication

Gaussian elimination can be written as matrix multiplication. This is a major conceptual step: row operations are themselves matrices.

2.5.1 Elementary matrices

Definition 2.20: Elementary matrices

An elementary matrix is a matrix obtained from the identity matrix by performing exactly one elementary row operation.

The three types are:

$E_{ij}$: switch rows $i$ and $j$;
$E_i(c)$: multiply row $i$ by a nonzero scalar $c$;
$E_{ij}(d)$: add $d$ times row $j$ to row $i$.

For $3\times 3$ matrices,

\[ E_{12}= \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \qquad E_2(k)= \begin{bmatrix} 1&0&0\\ 0&k&0\\ 0&0&1 \end{bmatrix}, \qquad E_{21}(k)= \begin{bmatrix} 1&0&0\\ k&1&0\\ 0&0&1 \end{bmatrix}. \]

Proposition 2.21: Elementary matrices act by row operations

Multiplying a matrix $A$ on the left by an elementary matrix performs the corresponding elementary row operation on $A$.

2.5.2 Example 2.22: Using elementary matrices

Let

\[ A= \begin{bmatrix} 1&2\\ 3&4\\ 5&6 \end{bmatrix}, \qquad E=E_{31}(-5)= \begin{bmatrix} 1&0&0\\ 0&1&0\\ -5&0&1 \end{bmatrix}. \]

Then

\[ EA= \begin{bmatrix} 1&2\\ 3&4\\ 0&-4 \end{bmatrix}, \]

which is the result of the row operation

\[ R_3\leftarrow R_3-5R_1. \]

Code

A = np.array([[1,2],
              [3,4],
              [5,6]])

E = np.array([[ 1,0,0],
              [ 0,1,0],
              [-5,0,1]])

E @ A

array([[ 1,  2],
       [ 3,  4],
       [ 0, -4]])

2.5.3 Block multiplication

Block multiplication lets us multiply large matrices by treating compatible submatrices as entries.

Suppose

\[ A= \begin{bmatrix} A_1&A_2\\ A_3&A_4 \end{bmatrix}, \qquad B= \begin{bmatrix} B_1&B_2\\ B_3&B_4 \end{bmatrix}, \]

and all products below are defined. Then

\[ AB= \begin{bmatrix} A_1B_1+A_2B_3 & A_1B_2+A_2B_4\\ A_3B_1+A_4B_3 & A_3B_2+A_4B_4 \end{bmatrix}. \]

2.5.4 Example 2.23: Block multiplication for a linear model

Let

\[ A= \begin{bmatrix} I&X\\ 0&I \end{bmatrix}, \qquad B= \begin{bmatrix} Y&0\\ Z&I \end{bmatrix}. \]

Then

\[ AB= \begin{bmatrix} Y+XZ&X\\ Z&I \end{bmatrix}. \]

This calculation appears in optimization, statistics, numerical linear algebra, and systems theory when variables are separated into blocks.

2.6 2.6 Inverse matrices

An inverse matrix reverses a matrix machine.

\[ A\mathbf x=\mathbf y, \]

then $A^{-1}$ recovers the input:

\[ \mathbf x=A^{-1}\mathbf y. \]

Invertibility means no information is lost.

2.6.1 Definition and basic properties

Definition 2.24: Invertible matrix

An $n\times n$ matrix $A$ is invertible if there exists an $n\times n$ matrix $B$ such that

\[ AB=BA=I_n. \]

In this case, $B$ is called the inverse of $A$ and is denoted by $A^{-1}$.

Proposition 2.25: Uniqueness of inverse

If a matrix $A$ is invertible, then its inverse is unique.

Proof. If $B$ and $C$ are both inverses of $A$, then

\[ B=BI=B(AC)=(BA)C=IC=C. \]

Thus $B=C$. $\square$

Theorem 2.26: Basic inverse rules

Let $A$ and $B$ be $n\times n$ matrices.

If $A$ is invertible, then $A^{-1}$ is invertible and \[ (A^{-1})^{-1}=A. \]
If $A$ is invertible, then $A^T$ is invertible and \[ (A^T)^{-1}=(A^{-1})^T. \]
If $A$ and $B$ are invertible, then $AB$ is invertible and \[ (AB)^{-1}=B^{-1}A^{-1}. \]
If $A$ is invertible and $c\ne 0$, then $cA$ is invertible and \[ (cA)^{-1}=c^{-1}A^{-1}. \]
If $A$ is invertible and $AB=AC$, then $B=C$. If $BA=CA$, then $B=C$.
If $A$ is invertible, then $A^m$ is invertible and \[ (A^m)^{-1}=(A^{-1})^m. \]

2.6.2 Example 2.27: Using inverse rules without recomputing

Suppose $A$ and $B$ are invertible $n\times n$ matrices. Find the inverse of

\[ M=3A^2B^{-1}. \]

Using $(XYZ)^{-1}=Z^{-1}Y^{-1}X^{-1}$,

\[ M^{-1} = (B^{-1})^{-1}(A^2)^{-1}(3I)^{-1} = B(A^{-1})^2\frac{1}{3}I = \frac13 B(A^{-1})^2. \]

2.6.3 Example 2.28: $A+B$ may fail to be invertible

Even if $A$ and $B$ are invertible, $A+B$ need not be invertible.

Take

\[ A=I_2,\qquad B=-I_2. \]

Then both $A$ and $B$ are invertible, but

\[ A+B=0, \]

which is not invertible.

2.6.4 The inverse matrix theorem

Theorem 2.29: Inverse matrix theorem

Let $A\in M_{n\times n}(\mathbb F)$. The following statements are equivalent:

$A$ is invertible.
There exists a square matrix $B$ such that $BA=I_n$.
The homogeneous system $A\mathbf x=\mathbf 0$ has only the trivial solution.
$\operatorname{rank}(A)=n$.
The reduced row echelon form of $A$ is $I_n$.
$A$ is a product of elementary matrices.
There exists a square matrix $C$ such that $AC=I_n$.
For every $\mathbf b\in\mathbb F^n$, the system $A\mathbf x=\mathbf b$ has a unique solution.

Story interpretation

An invertible matrix is a reversible machine. No two different inputs lead to the same output, and every possible output is reached by exactly one input.

2.6.5 Example 2.30: Testing invertibility

Consider

\[ A= \begin{bmatrix} 1&2&3\\ 2&4&6\\ 0&1&1 \end{bmatrix}. \]

The second row is twice the first row. Therefore the rows are linearly dependent, so

\[ \operatorname{rank}(A)<3. \]

By the inverse matrix theorem, $A$ is not invertible.

Code

A = sp.Matrix([[1,2,3],
               [2,4,6],
               [0,1,1]])
A.rank()

2.6.6 Example 2.31: Invertibility from a product

Suppose $A,B,C\in M_{n\times n}(\mathbb F)$ and

\[ ABC=I_n. \]

Then $A(BC)=I_n$, so $A$ has a right inverse. By the inverse matrix theorem, $A$ is invertible.

Also $(AB)C=I_n$, so $C$ has a left inverse. Hence $C$ is invertible.

Multiplying $ABC=I_n$ on the left by $A^{-1}$ gives

\[ BC=A^{-1}. \]

Thus

\[ B=A^{-1}C^{-1} \]

is invertible. Therefore all three matrices are invertible, with

\[ A^{-1}=BC,\qquad C^{-1}=AB,\qquad B^{-1}=CA. \]

2.6.7 Computing inverses

Theorem 2.32: Algorithm for computing $A^{-1}$

Let $A\in M_{n\times n}(\mathbb F)$. Form the augmented matrix

\[ [A\mid I_n]. \]

Use elementary row operations to compute its reduced row echelon form.

If $[A\mid I_n]$ row reduces to $[I_n\mid C]$, then \[ C=A^{-1}. \]
If the left block cannot be row reduced to $I_n$, then $A$ is not invertible.

2.6.8 Example 2.33: Finding an inverse

Let

\[ A= \begin{bmatrix} 1&1&1\\ 2&3&2\\ 3&8&2 \end{bmatrix}. \]

Row reducing $[A\mid I_3]$ gives

\[ \begin{bmatrix} 1&0&0&10&-6&1\\ 0&1&0&-2&1&0\\ 0&0&1&-7&5&-1 \end{bmatrix}. \]

Hence

\[ A^{-1}= \begin{bmatrix} 10&-6&1\\ -2&1&0\\ -7&5&-1 \end{bmatrix}. \]

Code

A = sp.Matrix([[1,1,1],
               [2,3,2],
               [3,8,2]])

A.inv()

$\displaystyle \left[\begin{matrix}10 & -6 & 1\\-2 & 1 & 0\\-7 & 5 & -1\end{matrix}\right]$

Theorem 2.34: Inverse of a $2\times 2$ matrix

The matrix

\[ A= \begin{bmatrix} a&b\\ c&d \end{bmatrix} \]

is invertible if and only if

\[ ad-bc\ne 0. \]

In that case,

\[ A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d&-b\\ -c&a \end{bmatrix}. \]

The scalar $ad-bc$ is the determinant of $A$.

2.6.9 Example 2.35: Fast inverse for a $2\times 2$ matrix

Let

\[ A= \begin{bmatrix} 4&7\\ 2&6 \end{bmatrix}. \]

Then

\[ \det(A)=4(6)-7(2)=10\ne 0. \]

\[ A^{-1} = \frac{1}{10} \begin{bmatrix} 6&-7\\ -2&4 \end{bmatrix}. \]

2.7 2.7 Transpose, symmetry, and rank

The transpose exchanges rows and columns. It also reveals a deep identity about dot products.

2.7.1 Transpose

Definition 2.36: Transpose

Given $A=[a_{ij}]\in M_{m\times n}(\mathbb F)$, the transpose $A^T$ is the $n\times m$ matrix whose $(i,j)$-entry is $a_{ji}$.

Theorem 2.37: Properties of transposition

Let $A$ and $B$ be matrices for which the indicated operations are defined, and let $r\in\mathbb F$. Then:

$(A^T)^T=A$.
$(A+B)^T=A^T+B^T$.
$(rA)^T=rA^T$.
$(AB)^T=B^TA^T$.

2.7.2 Example 2.38: Transpose of a product

Let

\[ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&1\\ 0&-1\\ 4&2 \end{bmatrix}. \]

Instead of computing $(AB)^T$ directly, use

\[ (AB)^T=B^TA^T. \]

Code

A = np.array([[ 1,0,2],
              [-1,3,1]])

B = np.array([[2, 1],
              [0,-1],
              [4, 2]])

np.allclose((A @ B).T, B.T @ A.T)

True

Adjoint viewpoint

For real matrices, the transpose satisfies

\[ (A\mathbf x)\cdot \mathbf y = \mathbf x\cdot (A^T\mathbf y). \]

This identity is central in least squares, optimization, sensitivity analysis, and machine learning. It explains why transposes appear when computing gradients.

2.7.3 Rank inequalities

Rank measures the number of independent directions captured by a matrix.

Theorem 2.39: Rank of a product

If $AB$ is defined, then

\[ \operatorname{rank}(AB)\le \operatorname{rank}(A). \]

Theorem 2.40: Rank and transpose

For every matrix $A$,

\[ \operatorname{rank}(A)=\operatorname{rank}(A^T). \]

Theorem 2.41: Rank bound for matrix products

If $AB$ is defined, then

\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\}. \]

Proof idea

The columns of $AB$ are the vectors $A\mathbf b_j$, where $\mathbf b_j$ are the columns of $B$. Therefore all columns of $AB$ lie in the column space of $A$. Hence

\[ \operatorname{rank}(AB)\le \operatorname{rank}(A). \]

Using $(AB)^T=B^TA^T$ and $\operatorname{rank}(M)=\operatorname{rank}(M^T)$, one also obtains

\[ \operatorname{rank}(AB)\le \operatorname{rank}(B). \]

2.7.4 Example 2.42: Rank bottleneck

Let $A$ be a $100\times 5$ matrix and $B$ be a $5\times 100$ matrix. Then

\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 5. \]

Thus the $100\times 100$ matrix $AB$ has rank at most $5$, so it is not invertible.

Data science interpretation

A large matrix produced through a low-dimensional bottleneck is low rank. This is one reason low-rank factorizations appear in compression, recommendation systems, and neural network model reduction.

2.7.5 Symmetric matrices

Definition 2.43: Symmetric matrix

An $n\times n$ matrix $A$ is symmetric if

\[ A^T=A. \]

Equivalently, if $A=[a_{ij}]$, then

\[ a_{ij}=a_{ji} \]

for all $i,j$.

2.7.6 Example 2.44: Gram matrices are symmetric

For any real matrix $X$, the matrix $X^TX$ is symmetric because

\[ (X^TX)^T = X^T(X^T)^T = X^TX. \]

This observation is fundamental in least squares, covariance matrices, principal component analysis, and quadratic optimization.

Code

X = np.array([[1,2],
              [3,4],
              [5,6]])

G = X.T @ X
G

array([[35, 44],
       [44, 56]])

Code

np.allclose(G, G.T)

True

2.8 2.8 LU factorization and Gaussian elimination

Gaussian elimination can be saved and reused. LU factorization records elimination as a product

\[ A=LU. \]

Here $L$ records the row-operation multipliers, and $U$ is the echelon-form matrix.

2.8.1 Lower triangular matrices

Definition 2.45: Lower triangular matrix

An $m\times m$ matrix $L=[\ell_{ij}]$ is lower triangular if

\[ \ell_{ij}=0 \qquad \text{whenever } j>i. \]

It is unit lower triangular if, in addition,

\[ \ell_{ii}=1 \qquad \text{for every } i. \]

Remark

Products and inverses of unit lower triangular matrices are unit lower triangular. Elementary matrices of the form $E_{ij}(d)$ with $i>j$ are unit lower triangular.

2.8.2 LU factorization

Definition 2.46: LU factorization

Let $A\in M_{m\times n}(\mathbb F)$. An LU factorization of $A$ is a factorization

\[ A=LU, \]

where $L$ is a unit lower triangular $m\times m$ matrix and $U$ is an $m\times n$ matrix in echelon form.

Theorem 2.47: Algorithm for finding an LU factorization

Suppose $A$ can be transformed into echelon form using only row-replacement operations and no row swaps. Then an LU factorization may be found as follows:

Reduce $A$ to echelon form $U$ using only row-replacement operations.
Record the multipliers used in elimination in the corresponding entries of $L$.

Then

\[ A=LU, \]

where $L$ is unit lower triangular.

2.8.3 Example 2.48: Computing an LU factorization

Let

\[ A= \begin{bmatrix} 2&1&1\\ 4&-6&0\\ -2&7&2 \end{bmatrix}. \]

Eliminate entries below the first pivot:

\[ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3+R_1. \]

Then eliminate below the second pivot:

\[ R_3\leftarrow R_3+R_2. \]

This gives

\[ U= \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix}. \]

The multipliers are

\[ m_{21}=2,\qquad m_{31}=-1,\qquad m_{32}=-1. \]

Thus

\[ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix}. \]

Indeed,

\[ A=LU. \]

Code

L = np.array([[ 1, 0, 0],
              [ 2, 1, 0],
              [-1,-1, 1]])

U = np.array([[2, 1, 1],
              [0,-8,-2],
              [0, 0, 1]])

L @ U

array([[ 2,  1,  1],
       [ 4, -6,  0],
       [-2,  7,  2]])

2.8.4 Solving systems with LU

If $A=LU$, then

\[ A\mathbf x=\mathbf b \quad\Longleftrightarrow\quad L(U\mathbf x)=\mathbf b. \]

Set

\[ \mathbf y=U\mathbf x. \]

Then solve two triangular systems:

\[ L\mathbf y=\mathbf b \]

by forward substitution, and then

\[ U\mathbf x=\mathbf y \]

by back substitution.

2.8.5 Example 2.49: Solving with LU

Use the LU factorization from the previous example to solve

\[ A\mathbf x= \begin{bmatrix} 1\\2\\3 \end{bmatrix}. \]

First solve

\[ \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix} \begin{bmatrix} y_1\\y_2\\y_3 \end{bmatrix} = \begin{bmatrix} 1\\2\\3 \end{bmatrix}. \]

This gives

\[ y_1=1,\qquad 2y_1+y_2=2,\qquad -y_1-y_2+y_3=3. \]

\[ \mathbf y= \begin{bmatrix} 1\\0\\4 \end{bmatrix}. \]

Next solve

\[ \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix} \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} 1\\0\\4 \end{bmatrix}. \]

Back substitution gives

\[ x_3=4,\qquad -8x_2-8=0,\qquad 2x_1-1+4=1. \]

Therefore

\[ x_2=-1,\qquad x_1=-1. \]

Thus

\[ \mathbf x= \begin{bmatrix} -1\\-1\\4 \end{bmatrix}. \]

Code

b = np.array([1,2,3])

# Direct check:
A = L @ U
np.linalg.solve(A, b)

array([-1., -1.,  4.])

Pivoting

Not every matrix has an LU factorization without row swaps. For example,

\[ \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix} \]

requires a row interchange before elimination.

In practice, one often uses

\[ PA=LU, \]

where $P$ is a permutation matrix. Pivoting is not merely a technical detail; it is essential for numerical reliability.

2.9 2.9 Python computation: matrix algebra checklist

This section shows how to use Python to check the main computations of the chapter.

2.9.1 Matrix products and powers

Code

A = np.array([[1,2],
              [3,4]])

B = np.array([[1,1],
              [0,1]])

print("AB =")
print(A @ B)

print("BA =")
print(B @ A)

print("A^3 =")
print(np.linalg.matrix_power(A, 3))

AB =
[[1 3]
 [3 7]]
BA =
[[4 6]
 [3 4]]
A^3 =
[[ 37  54]
 [ 81 118]]

2.9.2 Rank and inverse

Code

A = np.array([[1,2,3],
              [2,4,6],
              [0,1,1]], dtype=float)

print("rank(A) =", np.linalg.matrix_rank(A))

rank(A) = 2

Code

A = np.array([[4,7],
              [2,6]], dtype=float)

print("det(A) =", np.linalg.det(A))
print("A inverse =")
print(np.linalg.inv(A))

det(A) = 10.000000000000002
A inverse =
[[ 0.6 -0.7]
 [-0.2  0.4]]

2.9.3 Exact symbolic computation with SymPy

Code

A = sp.Matrix([[1,1,1],
               [2,3,2],
               [3,8,2]])

A_inv = A.inv()
A_inv

$\displaystyle \left[\begin{matrix}10 & -6 & 1\\-2 & 1 & 0\\-7 & 5 & -1\end{matrix}\right]$

Code

A * A_inv

$\displaystyle \left[\begin{matrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{matrix}\right]$

2.9.4 LU factorization

Code

A = sp.Matrix([[2,1,1],
               [4,-6,0],
               [-2,7,2]])

L, U, perm = A.LUdecomposition()
L, U, perm

(Matrix([
 [ 1,  0, 0],
 [ 2,  1, 0],
 [-1, -1, 1]]),
 Matrix([
 [2,  1,  1],
 [0, -8, -2],
 [0,  0,  1]]),
 [])

Numerical note

NumPy is excellent for fast numerical computation. SymPy is useful for exact symbolic computation. In applied linear algebra, both are useful, but they answer slightly different questions.

NumPy: fast floating-point computation.
SymPy: exact algebraic computation.

2.10 2.10 Classical challenge questions

These questions are designed for graduate-level discussion. They are not mainly long computations. They ask you to connect matrix algebra with structure, algorithms, modeling, and numerical reasoning.

Challenge 2.1: Matrix multiplication as composition

Matrix multiplication is associative but usually not commutative. Explain why this is natural if matrices are viewed as linear transformations. Give one practical situation where the order of two operations should matter.

Challenge 2.2: Information loss and invertibility

A matrix maps input vectors to output vectors. Interpret invertibility as “no information is lost.” How does this interpretation relate to pivots, rank, null space, and solving $A\mathbf x=\mathbf b$?

Challenge 2.3: Rank as effective dimension

In data science, the columns of a matrix may represent features or signals. Explain why rank can be interpreted as the number of independent directions contained in the data. What does it mean if a large data matrix has low rank?

Challenge 2.4: Gram matrices

For a data matrix $A$, the matrix $A^TA$ records pairwise dot products among the columns of $A$. What geometric information is contained in $A^TA$? What information about $A$ is lost when replacing $A$ by $A^TA$?

Challenge 2.5: Transpose and adjoint viewpoint

The transpose $A^T$ can be interpreted as the matrix that moves a dot product from one side to the other:

\[ (A\mathbf x)\cdot\mathbf y=\mathbf x\cdot(A^T\mathbf y). \]

Explain why this identity is important in least squares, optimization, and sensitivity analysis.

Challenge 2.6: LU factorization and reusable computation

LU factorization separates Gaussian elimination into two triangular solves. Why is this valuable when the same matrix $A$ appears in many systems

\[ A\mathbf x_1=\mathbf b_1,\quad A\mathbf x_2=\mathbf b_2,\quad \ldots \]

with different right-hand sides?

Challenge 2.7: Pivoting and numerical reliability

The identity $A=LU$ may fail without row exchanges, and even when it exists it may be numerically unstable. Explain why row pivoting is not just a technical detail but an important part of reliable computation.

Challenge 2.8: Block matrix thinking

Large matrices are often divided into blocks. Explain how block multiplication helps organize problems in networks, multivariable systems, machine learning, or scientific computing. What is gained by thinking at the block level instead of only at the entry level?

Challenge 2.9: From algebra to algorithms

Two formulas can be mathematically identical but computationally very different. For example,

\[ (AB)\mathbf x \qquad\text{and}\qquad A(B\mathbf x) \]

give the same result, but may require very different work. Explain why computational cost should influence how we write and implement matrix expressions.

2.11 2.11 Practice problems

2.11.1 Problem 2.1

Let

\[ A= \begin{bmatrix} 1&2&0\\ -1&3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&-1&5\\ 0&1&-2 \end{bmatrix}. \]

Compute $3A-2B$.

2.11.2 Problem 2.2

Let

\[ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\4 \end{bmatrix}. \]

Compute $A\mathbf x$ using both the row interpretation and the column interpretation.

2.11.3 Problem 2.3

Let

\[ A= \begin{bmatrix} 1&2\\ 0&1\\ 3&-1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&0&1\\ -1&4&3 \end{bmatrix}. \]

Compute $AB$, and determine whether $BA$ is defined.

2.11.4 Problem 2.4

Find all matrices

\[ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix} \]

such that

\[ X \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} = \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} X. \]

2.11.5 Problem 2.5

Let

\[ A= \begin{bmatrix} 1&2&3\\ 0&1&4\\ 0&0&2 \end{bmatrix}. \]

Use the inverse matrix theorem to decide whether $A$ is invertible.

2.11.6 Problem 2.6

Compute the inverse of

\[ A= \begin{bmatrix} 3&5\\ 1&2 \end{bmatrix}. \]

2.11.7 Problem 2.7

Let $A$ be a $50\times 8$ matrix and $B$ be an $8\times 50$ matrix. Use rank inequalities to explain why $AB$ cannot be invertible.

2.11.8 Problem 2.8

Let

\[ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. \]

Find an LU factorization of $A$ without row swaps.

2.12 2.12 Solutions to practice problems

2.12.1 Solution to Problem 2.1

\[ 3A= \begin{bmatrix} 3&6&0\\ -3&9&12 \end{bmatrix}, \qquad 2B= \begin{bmatrix} 4&-2&10\\ 0&2&-4 \end{bmatrix}. \]

Therefore

\[ 3A-2B= \begin{bmatrix} -1&8&-10\\ -3&7&16 \end{bmatrix}. \]

2.12.2 Solution to Problem 2.2

Using rows,

\[ A\mathbf x= \begin{bmatrix} 1(2)+0(-1)+2(4)\\ -1(2)+3(-1)+1(4) \end{bmatrix} = \begin{bmatrix} 10\\ -1 \end{bmatrix}. \]

Using columns,

\[ A\mathbf x= 2 \begin{bmatrix} 1\\-1 \end{bmatrix} - \begin{bmatrix} 0\\3 \end{bmatrix} + 4 \begin{bmatrix} 2\\1 \end{bmatrix} = \begin{bmatrix} 10\\-1 \end{bmatrix}. \]

2.12.3 Solution to Problem 2.3

\[ AB= \begin{bmatrix} 0&8&7\\ -1&4&3\\ 7&-4&0 \end{bmatrix}. \]

Since $B$ is $2\times 3$ and $A$ is $3\times 2$, the product $BA$ is also defined and has size $2\times 2$.

2.12.4 Solution to Problem 2.4

Let

\[ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. \]

Then

\[ XN= \begin{bmatrix} a&a+b\\ c&c+d \end{bmatrix}, \qquad NX= \begin{bmatrix} a+c&b+d\\ c&d \end{bmatrix}. \]

Equating entries gives

\[ c=0,\qquad a=d. \]

Thus

\[ X= \begin{bmatrix} a&b\\ 0&a \end{bmatrix}. \]

2.12.5 Solution to Problem 2.5

The matrix $A$ is upper triangular with diagonal entries $1,1,2$. Since all diagonal entries are nonzero, it has a pivot in every column. Thus

\[ \operatorname{rank}(A)=3. \]

By the inverse matrix theorem, $A$ is invertible.

2.12.6 Solution to Problem 2.6

\[ \det(A)=3(2)-5(1)=1. \]

Therefore

\[ A^{-1} = \begin{bmatrix} 2&-5\\ -1&3 \end{bmatrix}. \]

2.12.7 Solution to Problem 2.7

By the rank inequality,

\[ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 8. \]

But $AB$ is $50\times 50$. An invertible $50\times 50$ matrix must have rank $50$. Since

\[ \operatorname{rank}(AB)\le 8<50, \]

the matrix $AB$ cannot be invertible.

2.12.8 Solution to Problem 2.8

Use elimination:

\[ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. \]

First,

\[ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3-R_1, \]

which gives

\[ \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&-2&7 \end{bmatrix}. \]

Then

\[ R_3\leftarrow R_3+2R_2, \]

which gives

\[ U= \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. \]

The multipliers are

\[ m_{21}=2,\qquad m_{31}=1,\qquad m_{32}=-2. \]

Hence

\[ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix}. \]

Therefore

\[ A=LU = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix} \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. \]

2.13 2.13 AI companion activities

2.13.1 Activity 2.1: Check a matrix product

Ask an AI tool:

Compute $AB$ for \[ A=\begin{bmatrix}1&2\\3&4\end{bmatrix}, \qquad B=\begin{bmatrix}1&1\\0&1\end{bmatrix}. \] Explain why $AB\ne BA$.

Then verify the answer by hand and by Python.

2.13.2 Activity 2.2: Critique an explanation of invertibility

Ask an AI tool:

Explain the inverse matrix theorem in simple language.

Then critique the explanation. Does it mention rank? pivots? the null space? uniqueness of solutions? row reduction?

2.13.3 Activity 2.3: Compare two computational expressions

Ask an AI tool:

Why is $A(B\mathbf x)$ often faster than $(AB)\mathbf x$?

Then test a large example in Python.

Code

# Optional timing experiment
import time

m, n, p = 800, 600, 400
A = np.random.randn(m, n)
B = np.random.randn(n, p)
x = np.random.randn(p)

t0 = time.time()
y1 = (A @ B) @ x
t1 = time.time()

t2 = time.time()
y2 = A @ (B @ x)
t3 = time.time()

print("Time for (A @ B) @ x:", t1 - t0)
print("Time for A @ (B @ x):", t3 - t2)
print("Same answer?", np.allclose(y1, y2))

Time for (A @ B) @ x: 0.04521799087524414
Time for A @ (B @ x): 0.013590097427368164
Same answer? True

Main message of Chapter 2

Matrix algebra is not just a collection of rules.

It is the language of:

transforming vectors;
composing processes;
detecting information loss;
measuring effective dimension;
reusing computation;
organizing large-scale algorithms.

The rest of the book will deepen this story through subspaces, orthogonality, projections, eigenvalues, SVD, Schur decomposition, Fourier and Haar bases, and modern data science.

--- title: "Chapter 2. Matrices as Machines: Algebra, Inverses, Rank, and LU" subtitle: "From matrix arithmetic to reusable computation" format: html: toc: true toc-depth: 3 number-sections: true code-fold: show code-tools: true html-math-method: mathjax pdf: toc: true number-sections: true jupyter: python3 --- > **Guiding question.** > Once a system has been written as $A\mathbf{x}=\mathbf b$, how do matrices behave as algebraic objects, geometric machines, and computational tools? In Chapter 1 we learned that a linear system is not only a list of equations. It is a matrix equation, a geometric intersection problem, and a computational task. Now we study the object that carries all the information: $$ A. $$ A matrix can be viewed in several ways: - as a rectangular table of numbers; - as a data array; - as a collection of column vectors; - as a machine acting on vectors; - as a linear transformation; - as a compressed record of row operations; - as a computational object whose structure affects speed and stability. This chapter develops the basic algebra of matrices: addition, scalar multiplication, matrix-vector products, matrix multiplication, elementary matrices, inverses, transposes, rank, and LU factorization. The purpose is not only to learn rules. The purpose is to understand why the rules matter. ::: {.callout-note title="How to read this chapter"} Matrix algebra has two levels. At the **entry level**, we compute with entries $a_{ij}$. At the **structural level**, we ask what the matrix does to vectors, spaces, data, and algorithms. Both levels are important. Computation without structure is mechanical. Structure without computation is incomplete. ::: ::: {.callout-tip title="AI and coding companion"} Use Python or AI tools to check matrix products, inverses, ranks, and LU factorizations. But always ask: 1. Are the matrix sizes compatible? 2. What does the computation mean geometrically? 3. Is the formula numerically reliable? 4. Can the same expression be written in a faster way? ::: ## 2.1 Matrix addition and scalar multiplication Matrices first behave like vectors. We can add matrices of the same size, and we can multiply a matrix by a scalar. ::: {.callout-note title="Definition 2.1: Matrix addition and scalar multiplication"} Let $A=[a_{ij}]$ and $B=[b_{ij}]$ be $m\times n$ matrices over a field $\mathbb F$, and let $c\in\mathbb F$. The **sum** $A+B$ is the $m\times n$ matrix $$ A+B=[a_{ij}+b_{ij}]. $$ The **scalar multiple** $cA$ is the $m\times n$ matrix $$ cA=[ca_{ij}]. $$ These operations are defined only when the matrix sizes are compatible. ::: For example, $$ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}. $$ Also, $$ c \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} ca_{11} & ca_{12}\\ ca_{21} & ca_{22} \end{bmatrix}. $$ ::: {.callout-important title="Theorem 2.2: Algebraic rules for matrix addition"} Let $A,B,C\in M_{m\times n}(\mathbb F)$, and let $r,s\in\mathbb F$. Then: 1. $A+B=B+A$. 2. $(A+B)+C=A+(B+C)$. 3. $A+0=A$. 4. $A+(-A)=0$. 5. $r(A+B)=rA+rB$. 6. $(r+s)A=rA+sA$. 7. $r(sA)=(rs)A$. 8. $1A=A$. Therefore $M_{m\times n}(\mathbb F)$, the set of all $m\times n$ matrices over $\mathbb F$, is a vector space. ::: ::: {.callout-note title="Proof idea"} Each identity follows from the corresponding identity in the field $\mathbb F$, applied entry by entry. For example, the $(i,j)$-entry of $r(A+B)$ is $$ r(a_{ij}+b_{ij})=ra_{ij}+rb_{ij}, $$ which is the $(i,j)$-entry of $rA+rB$. ::: ### Example 2.3: Combining data arrays Let $$ A= \begin{bmatrix} 1 & -2 & 0\\ 3 & 4 & 1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & 1 & 5\\ -1 & 0 & 3 \end{bmatrix}. $$ Compute $2A-3B$. We have $$ 2A= \begin{bmatrix} 2 & -4 & 0\\ 6 & 8 & 2 \end{bmatrix}, \qquad 3B= \begin{bmatrix} 6 & 3 & 15\\ -3 & 0 & 9 \end{bmatrix}. $$ Therefore $$ 2A-3B= \begin{bmatrix} -4 & -7 & -15\\ 9 & 8 & -7 \end{bmatrix}. $$ ```{python} import numpy as np A = np.array([[1, -2, 0], [3, 4, 1]]) B = np.array([[ 2, 1, 5], [-1, 0, 3]]) 2*A - 3*B ``` ::: {.callout-tip title="Graduate viewpoint: affine combinations"} Suppose $A_1,A_2,A_3\in M_{m\times n}(\mathbb R)$ represent three model outputs, and let $$ W=0.2A_1+0.3A_2+0.5A_3. $$ Because matrix addition and scalar multiplication are entrywise, $$ w_{ij}=0.2(a_1)_{ij}+0.3(a_2)_{ij}+0.5(a_3)_{ij}. $$ Thus $W$ is an entrywise weighted average of the three model outputs. This is the matrix form of ensemble averaging. ::: ## 2.2 Vectors, dot products, and matrix-vector products The product $A\mathbf x$ is the bridge between matrix algebra and linear systems. It has two equally important interpretations: 1. a linear combination of columns; 2. a list of dot products with rows. ### Dot products ::: {.callout-note title="Definition 2.4: Dot product"} For vectors $$ \mathbf u= \begin{bmatrix} u_1\\u_2\\ \vdots\\u_n \end{bmatrix}, \qquad \mathbf v= \begin{bmatrix} v_1\\v_2\\ \vdots\\v_n \end{bmatrix} \in \mathbb F^n, $$ the **dot product** is $$ \mathbf u\cdot \mathbf v = u_1v_1+u_2v_2+\cdots+u_nv_n. $$ For real vectors this is the standard Euclidean dot product. For complex vectors, the standard Hermitian inner product uses conjugation and will be discussed later. ::: ### Example 2.5: Dot product as a score Let $$ \mathbf w=(0.2,0.5,0.3)^T, \qquad \mathbf x=(80,90,70)^T. $$ Then $$ \mathbf w\cdot \mathbf x = 0.2(80)+0.5(90)+0.3(70) = 82. $$ The dot product gives a weighted score. ```{python} w = np.array([0.2, 0.5, 0.3]) x = np.array([80, 90, 70]) w @ x ``` ### Matrix-vector products ::: {.callout-note title="Definition 2.6: Matrix-vector product"} Let $A\in M_{m\times n}(\mathbb F)$ have columns $$ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n], $$ and let $$ \mathbf x= \begin{bmatrix} x_1\\x_2\\ \vdots\\x_n \end{bmatrix} \in\mathbb F^n. $$ The product $A\mathbf x\in\mathbb F^m$ is defined by $$ A\mathbf x = x_1\mathbf a_1+x_2\mathbf a_2+\cdots+x_n\mathbf a_n. $$ Equivalently, if $R_i$ is the $i$-th row of $A$, then $$ A\mathbf x= \begin{bmatrix} R_1\cdot \mathbf x\\ R_2\cdot \mathbf x\\ \vdots\\ R_m\cdot \mathbf x \end{bmatrix}. $$ ::: ::: {.callout-note title="Definition 2.7: Linear combination"} A vector $\mathbf b\in\mathbb F^m$ is a **linear combination** of vectors $\mathbf v_1,\ldots,\mathbf v_n\in\mathbb F^m$ if there exist scalars $x_1,\ldots,x_n\in\mathbb F$ such that $$ \mathbf b=x_1\mathbf v_1+\cdots+x_n\mathbf v_n. $$ ::: ::: {.callout-important title="Theorem 2.8: Equivalent forms of a linear system"} Let $$ A=[\mathbf a_1\ \mathbf a_2\ \cdots\ \mathbf a_n]\in M_{m\times n}(\mathbb F), \qquad \mathbf b\in\mathbb F^m. $$ The following have the same solution set: 1. the matrix equation $A\mathbf x=\mathbf b$; 2. the vector equation $$ x_1\mathbf a_1+\cdots+x_n\mathbf a_n=\mathbf b; $$ 3. the linear system with augmented matrix $[A\mid \mathbf b]$. ::: ::: {.callout-important title="Theorem 2.9: Linearity of $A\mathbf x$"} Let $A\in M_{m\times n}(\mathbb F)$, let $\mathbf u,\mathbf v\in\mathbb F^n$, and let $c\in\mathbb F$. Then $$ A(\mathbf u+\mathbf v)=A\mathbf u+A\mathbf v, \qquad A(c\mathbf u)=cA\mathbf u. $$ Thus the map $$ T_A:\mathbb F^n\to\mathbb F^m,\qquad T_A(\mathbf x)=A\mathbf x, $$ is linear. ::: ### Example 2.10: The column interpretation Let $$ A= \begin{bmatrix} 1 & 2 & -1\\ 0 & 3 & 4\\ 2 & -1 & 1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\3 \end{bmatrix}. $$ Using columns, $$ A\mathbf x = 2 \begin{bmatrix} 1\\0\\2 \end{bmatrix} - \begin{bmatrix} 2\\3\\-1 \end{bmatrix} + 3 \begin{bmatrix} -1\\4\\1 \end{bmatrix} = \begin{bmatrix} -3\\9\\8 \end{bmatrix}. $$ ```{python} A = np.array([[1, 2, -1], [0, 3, 4], [2,-1, 1]]) x = np.array([2, -1, 3]) A @ x ``` ### Example 2.11: One step of a linear dynamical system Suppose a system state $\mathbf x_t\in\mathbb R^3$ evolves by $$ \mathbf x_{t+1}=A\mathbf x_t, $$ where $$ A= \begin{bmatrix} 0.8&0.1&0\\ 0.2&0.7&0.1\\ 0&0.2&0.9 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 100\\50\\20 \end{bmatrix}. $$ Then $$ \mathbf x_1=A\mathbf x_0= \begin{bmatrix} 85\\57\\28 \end{bmatrix}. $$ ```{python} A = np.array([[0.8, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.2, 0.9]]) x0 = np.array([100, 50, 20]) x1 = A @ x0 x1 ``` ::: {.callout-tip title="Story interpretation"} A matrix-vector product is not just a formula. It is one step of a process. - A population evolves. - A signal is filtered. - A data point is embedded. - A neural network layer transforms features. - A Markov chain updates probabilities. ::: ## 2.3 Matrix multiplication Matrix multiplication is composition of matrix machines. If $B$ sends an input vector to an intermediate vector, and $A$ acts next, then the combined transformation is $AB$. $$ \mathbf x \overset{B}{\longmapsto} B\mathbf x \overset{A}{\longmapsto} A(B\mathbf x). $$ The matrix for the combined machine is $AB$. ### Definition and row-column rule ::: {.callout-note title="Definition 2.12: Matrix product"} Let $A\in M_{m\times n}(\mathbb F)$, and let $B\in M_{n\times p}(\mathbb F)$ have columns $$ B=[\mathbf b_1\ \mathbf b_2\ \cdots\ \mathbf b_p]. $$ The product $AB$ is the $m\times p$ matrix $$ AB=[A\mathbf b_1\ A\mathbf b_2\ \cdots\ A\mathbf b_p]. $$ If the number of columns of $A$ does not equal the number of rows of $B$, then $AB$ is not defined. ::: If $A=[a_{ij}]$ is $m\times n$ and $B=[b_{ij}]$ is $n\times p$, then the $(i,j)$-entry of $AB$ is $$ (AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}. $$ This is the dot product of the $i$-th row of $A$ and the $j$-th column of $B$. ### Example 2.13: Computing a matrix product Let $$ A= \begin{bmatrix} -3 & 5\\ 4 & 2\\ 1 & -5 \end{bmatrix}, \qquad B= \begin{bmatrix} 2 & -4\\ -4 & 1 \end{bmatrix}. $$ Then $$ AB= \begin{bmatrix} -26 & 17\\ 0 & -14\\ 22 & -9 \end{bmatrix}. $$ ```{python} A = np.array([[-3, 5], [ 4, 2], [ 1, -5]]) B = np.array([[ 2, -4], [-4, 1]]) A @ B ``` ### Algebraic rules and non-rules ::: {.callout-important title="Theorem 2.14: Properties of matrix multiplication"} Let $A,B,C$ be matrices over $\mathbb F$, with sizes such that the indicated operations are defined, and let $r\in\mathbb F$. Then: 1. $A(BC)=(AB)C$. 2. $A(B+C)=AB+AC$. 3. $(A+B)C=AC+BC$. 4. $r(AB)=(rA)B=A(rB)$. 5. $I_mA=A=AI_n$ when $A$ is $m\times n$. ::: ::: {.callout-warning title="Non-properties"} The following familiar properties of real numbers do not generally hold for matrices: 1. Even if $AB$ and $BA$ are both defined, usually $AB\ne BA$. 2. If $AB=AC$, it does not necessarily follow that $B=C$. 3. If $AB=0$, it does not necessarily follow that $A=0$ or $B=0$. ::: ### Example 2.15: Noncommutativity Let $$ A= \begin{bmatrix} 1&2\\ 3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. $$ Then $$ AB= \begin{bmatrix} 1&3\\ 3&7 \end{bmatrix}, \qquad BA= \begin{bmatrix} 4&6\\ 3&4 \end{bmatrix}. $$ Thus $AB\ne BA$. ```{python} A = np.array([[1,2], [3,4]]) B = np.array([[1,1], [0,1]]) A @ B, B @ A ``` ::: {.callout-tip title="Why order matters"} If matrices are machines, then order should matter. Rotating an object and then projecting it is usually not the same as projecting it and then rotating it. In data science, normalizing data and then applying a model may produce a different result from applying a model and then normalizing. ::: ### Example 2.16: Solving a commutation equation Find all $2\times 2$ real matrices $X$ such that $$ XA=AX, \qquad A= \begin{bmatrix} 2&3\\ -3&2 \end{bmatrix}. $$ Write $$ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix}. $$ Then $$ XA= \begin{bmatrix} 2a-3b & 3a+2b\\ 2c-3d & 3c+2d \end{bmatrix}, $$ and $$ AX= \begin{bmatrix} 2a+3c & 2b+3d\\ -3a+2c & -3b+2d \end{bmatrix}. $$ Equating entries gives $$ c=-b,\qquad d=a. $$ Therefore $$ X= \begin{bmatrix} a&b\\ -b&a \end{bmatrix} = aI+ b \begin{bmatrix} 0&1\\ -1&0 \end{bmatrix}. $$ These are exactly the real matrices representing multiplication by a complex number $a-bi$. ```{python} import sympy as sp a,b,c,d = sp.symbols("a b c d") X = sp.Matrix([[a,b],[c,d]]) A = sp.Matrix([[2,3],[-3,2]]) eq_matrix = X*A - A*X sp.solve(list(eq_matrix), [a,b,c,d], dict=True) ``` ## 2.4 Powers of matrices Powers of a square matrix describe repeated application of the same transformation. ::: {.callout-note title="Definition 2.17: Matrix powers"} If $A\in M_{n\times n}(\mathbb F)$ and $k\ge 1$, then $$ A^k=\underbrace{A\cdot A\cdots A}_{k\text{ factors}}. $$ We also define $$ A^0=I_n. $$ ::: ### Example 2.18: Four common power patterns **Idempotent matrix.** $$ P= \begin{bmatrix} 1/2&1/2\\ 1/2&1/2 \end{bmatrix} $$ satisfies $P^2=P$, so $P^k=P$ for all $k\ge 1$. **Shear matrix.** $$ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} $$ satisfies $$ N^k= \begin{bmatrix} 1&k\\ 0&1 \end{bmatrix}. $$ **Nilpotent matrix.** $$ T= \begin{bmatrix} 0&1&2\\ 0&0&3\\ 0&0&0 \end{bmatrix} $$ satisfies $T^3=0$. **Diagonal matrix.** $$ D= \begin{bmatrix} 1&0&0\\ 0&2&0\\ 0&0&3 \end{bmatrix} $$ satisfies $$ D^k= \begin{bmatrix} 1&0&0\\ 0&2^k&0\\ 0&0&3^k \end{bmatrix}. $$ ```{python} P = np.array([[0.5,0.5], [0.5,0.5]]) N = np.array([[1,1], [0,1]]) T = np.array([[0,1,2], [0,0,3], [0,0,0]]) D = np.diag([1,2,3]) print("P^2 =\n", P @ P) print("N^5 =\n", np.linalg.matrix_power(N, 5)) print("T^3 =\n", np.linalg.matrix_power(T, 3)) print("D^4 =\n", np.linalg.matrix_power(D, 4)) ``` ### Example 2.19: Use in Markov chains Let $$ P= \begin{bmatrix} 0.9&0.2\\ 0.1&0.8 \end{bmatrix}, \qquad \mathbf x_0= \begin{bmatrix} 0.6\\0.4 \end{bmatrix}. $$ Then the state after two steps is $$ \mathbf x_2=P^2\mathbf x_0. $$ First, $$ P^2= \begin{bmatrix} 0.83&0.34\\ 0.17&0.66 \end{bmatrix}. $$ Therefore $$ \mathbf x_2= \begin{bmatrix} 0.634\\0.366 \end{bmatrix}. $$ ```{python} P = np.array([[0.9,0.2], [0.1,0.8]]) x0 = np.array([0.6,0.4]) x2 = np.linalg.matrix_power(P, 2) @ x0 x2 ``` ## 2.5 Elementary matrices and block multiplication Gaussian elimination can be written as matrix multiplication. This is a major conceptual step: row operations are themselves matrices. ### Elementary matrices ::: {.callout-note title="Definition 2.20: Elementary matrices"} An **elementary matrix** is a matrix obtained from the identity matrix by performing exactly one elementary row operation. The three types are: 1. $E_{ij}$: switch rows $i$ and $j$; 2. $E_i(c)$: multiply row $i$ by a nonzero scalar $c$; 3. $E_{ij}(d)$: add $d$ times row $j$ to row $i$. ::: For $3\times 3$ matrices, $$ E_{12}= \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \qquad E_2(k)= \begin{bmatrix} 1&0&0\\ 0&k&0\\ 0&0&1 \end{bmatrix}, \qquad E_{21}(k)= \begin{bmatrix} 1&0&0\\ k&1&0\\ 0&0&1 \end{bmatrix}. $$ ::: {.callout-important title="Proposition 2.21: Elementary matrices act by row operations"} Multiplying a matrix $A$ on the left by an elementary matrix performs the corresponding elementary row operation on $A$. ::: ### Example 2.22: Using elementary matrices Let $$ A= \begin{bmatrix} 1&2\\ 3&4\\ 5&6 \end{bmatrix}, \qquad E=E_{31}(-5)= \begin{bmatrix} 1&0&0\\ 0&1&0\\ -5&0&1 \end{bmatrix}. $$ Then $$ EA= \begin{bmatrix} 1&2\\ 3&4\\ 0&-4 \end{bmatrix}, $$ which is the result of the row operation $$ R_3\leftarrow R_3-5R_1. $$ ```{python} A = np.array([[1,2], [3,4], [5,6]]) E = np.array([[ 1,0,0], [ 0,1,0], [-5,0,1]]) E @ A ``` ### Block multiplication Block multiplication lets us multiply large matrices by treating compatible submatrices as entries. Suppose $$ A= \begin{bmatrix} A_1&A_2\\ A_3&A_4 \end{bmatrix}, \qquad B= \begin{bmatrix} B_1&B_2\\ B_3&B_4 \end{bmatrix}, $$ and all products below are defined. Then $$ AB= \begin{bmatrix} A_1B_1+A_2B_3 & A_1B_2+A_2B_4\\ A_3B_1+A_4B_3 & A_3B_2+A_4B_4 \end{bmatrix}. $$ ### Example 2.23: Block multiplication for a linear model Let $$ A= \begin{bmatrix} I&X\\ 0&I \end{bmatrix}, \qquad B= \begin{bmatrix} Y&0\\ Z&I \end{bmatrix}. $$ Then $$ AB= \begin{bmatrix} Y+XZ&X\\ Z&I \end{bmatrix}. $$ This calculation appears in optimization, statistics, numerical linear algebra, and systems theory when variables are separated into blocks. ## 2.6 Inverse matrices An inverse matrix reverses a matrix machine. If $$ A\mathbf x=\mathbf y, $$ then $A^{-1}$ recovers the input: $$ \mathbf x=A^{-1}\mathbf y. $$ Invertibility means no information is lost. ### Definition and basic properties ::: {.callout-note title="Definition 2.24: Invertible matrix"} An $n\times n$ matrix $A$ is **invertible** if there exists an $n\times n$ matrix $B$ such that $$ AB=BA=I_n. $$ In this case, $B$ is called the **inverse** of $A$ and is denoted by $A^{-1}$. ::: ::: {.callout-important title="Proposition 2.25: Uniqueness of inverse"} If a matrix $A$ is invertible, then its inverse is unique. ::: **Proof.** If $B$ and $C$ are both inverses of $A$, then $$ B=BI=B(AC)=(BA)C=IC=C. $$ Thus $B=C$. $\square$ ::: {.callout-important title="Theorem 2.26: Basic inverse rules"} Let $A$ and $B$ be $n\times n$ matrices. 1. If $A$ is invertible, then $A^{-1}$ is invertible and $$ (A^{-1})^{-1}=A. $$ 2. If $A$ is invertible, then $A^T$ is invertible and $$ (A^T)^{-1}=(A^{-1})^T. $$ 3. If $A$ and $B$ are invertible, then $AB$ is invertible and $$ (AB)^{-1}=B^{-1}A^{-1}. $$ 4. If $A$ is invertible and $c\ne 0$, then $cA$ is invertible and $$ (cA)^{-1}=c^{-1}A^{-1}. $$ 5. If $A$ is invertible and $AB=AC$, then $B=C$. If $BA=CA$, then $B=C$. 6. If $A$ is invertible, then $A^m$ is invertible and $$ (A^m)^{-1}=(A^{-1})^m. $$ ::: ### Example 2.27: Using inverse rules without recomputing Suppose $A$ and $B$ are invertible $n\times n$ matrices. Find the inverse of $$ M=3A^2B^{-1}. $$ Using $(XYZ)^{-1}=Z^{-1}Y^{-1}X^{-1}$, $$ M^{-1} = (B^{-1})^{-1}(A^2)^{-1}(3I)^{-1} = B(A^{-1})^2\frac{1}{3}I = \frac13 B(A^{-1})^2. $$ ### Example 2.28: $A+B$ may fail to be invertible Even if $A$ and $B$ are invertible, $A+B$ need not be invertible. Take $$ A=I_2,\qquad B=-I_2. $$ Then both $A$ and $B$ are invertible, but $$ A+B=0, $$ which is not invertible. ### The inverse matrix theorem ::: {.callout-important title="Theorem 2.29: Inverse matrix theorem"} Let $A\in M_{n\times n}(\mathbb F)$. The following statements are equivalent: 1. $A$ is invertible. 2. There exists a square matrix $B$ such that $BA=I_n$. 3. The homogeneous system $A\mathbf x=\mathbf 0$ has only the trivial solution. 4. $\operatorname{rank}(A)=n$. 5. The reduced row echelon form of $A$ is $I_n$. 6. $A$ is a product of elementary matrices. 7. There exists a square matrix $C$ such that $AC=I_n$. 8. For every $\mathbf b\in\mathbb F^n$, the system $A\mathbf x=\mathbf b$ has a unique solution. ::: ::: {.callout-tip title="Story interpretation"} An invertible matrix is a reversible machine. No two different inputs lead to the same output, and every possible output is reached by exactly one input. ::: ### Example 2.30: Testing invertibility Consider $$ A= \begin{bmatrix} 1&2&3\\ 2&4&6\\ 0&1&1 \end{bmatrix}. $$ The second row is twice the first row. Therefore the rows are linearly dependent, so $$ \operatorname{rank}(A)<3. $$ By the inverse matrix theorem, $A$ is not invertible. ```{python} A = sp.Matrix([[1,2,3], [2,4,6], [0,1,1]]) A.rank() ``` ### Example 2.31: Invertibility from a product Suppose $A,B,C\in M_{n\times n}(\mathbb F)$ and $$ ABC=I_n. $$ Then $A(BC)=I_n$, so $A$ has a right inverse. By the inverse matrix theorem, $A$ is invertible. Also $(AB)C=I_n$, so $C$ has a left inverse. Hence $C$ is invertible. Multiplying $ABC=I_n$ on the left by $A^{-1}$ gives $$ BC=A^{-1}. $$ Thus $$ B=A^{-1}C^{-1} $$ is invertible. Therefore all three matrices are invertible, with $$ A^{-1}=BC,\qquad C^{-1}=AB,\qquad B^{-1}=CA. $$ ### Computing inverses ::: {.callout-important title="Theorem 2.32: Algorithm for computing $A^{-1}$"} Let $A\in M_{n\times n}(\mathbb F)$. Form the augmented matrix $$ [A\mid I_n]. $$ Use elementary row operations to compute its reduced row echelon form. 1. If $[A\mid I_n]$ row reduces to $[I_n\mid C]$, then $$ C=A^{-1}. $$ 2. If the left block cannot be row reduced to $I_n$, then $A$ is not invertible. ::: ### Example 2.33: Finding an inverse Let $$ A= \begin{bmatrix} 1&1&1\\ 2&3&2\\ 3&8&2 \end{bmatrix}. $$ Row reducing $[A\mid I_3]$ gives $$ \begin{bmatrix} 1&0&0&10&-6&1\\ 0&1&0&-2&1&0\\ 0&0&1&-7&5&-1 \end{bmatrix}. $$ Hence $$ A^{-1}= \begin{bmatrix} 10&-6&1\\ -2&1&0\\ -7&5&-1 \end{bmatrix}. $$ ```{python} A = sp.Matrix([[1,1,1], [2,3,2], [3,8,2]]) A.inv() ``` ::: {.callout-important title="Theorem 2.34: Inverse of a $2\times 2$ matrix"} The matrix $$ A= \begin{bmatrix} a&b\\ c&d \end{bmatrix} $$ is invertible if and only if $$ ad-bc\ne 0. $$ In that case, $$ A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d&-b\\ -c&a \end{bmatrix}. $$ The scalar $ad-bc$ is the determinant of $A$. ::: ### Example 2.35: Fast inverse for a $2\times 2$ matrix Let $$ A= \begin{bmatrix} 4&7\\ 2&6 \end{bmatrix}. $$ Then $$ \det(A)=4(6)-7(2)=10\ne 0. $$ So $$ A^{-1} = \frac{1}{10} \begin{bmatrix} 6&-7\\ -2&4 \end{bmatrix}. $$ ## 2.7 Transpose, symmetry, and rank The transpose exchanges rows and columns. It also reveals a deep identity about dot products. ### Transpose ::: {.callout-note title="Definition 2.36: Transpose"} Given $A=[a_{ij}]\in M_{m\times n}(\mathbb F)$, the **transpose** $A^T$ is the $n\times m$ matrix whose $(i,j)$-entry is $a_{ji}$. ::: ::: {.callout-important title="Theorem 2.37: Properties of transposition"} Let $A$ and $B$ be matrices for which the indicated operations are defined, and let $r\in\mathbb F$. Then: 1. $(A^T)^T=A$. 2. $(A+B)^T=A^T+B^T$. 3. $(rA)^T=rA^T$. 4. $(AB)^T=B^TA^T$. ::: ### Example 2.38: Transpose of a product Let $$ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&1\\ 0&-1\\ 4&2 \end{bmatrix}. $$ Instead of computing $(AB)^T$ directly, use $$ (AB)^T=B^TA^T. $$ ```{python} A = np.array([[ 1,0,2], [-1,3,1]]) B = np.array([[2, 1], [0,-1], [4, 2]]) np.allclose((A @ B).T, B.T @ A.T) ``` ::: {.callout-tip title="Adjoint viewpoint"} For real matrices, the transpose satisfies $$ (A\mathbf x)\cdot \mathbf y = \mathbf x\cdot (A^T\mathbf y). $$ This identity is central in least squares, optimization, sensitivity analysis, and machine learning. It explains why transposes appear when computing gradients. ::: ### Rank inequalities Rank measures the number of independent directions captured by a matrix. ::: {.callout-important title="Theorem 2.39: Rank of a product"} If $AB$ is defined, then $$ \operatorname{rank}(AB)\le \operatorname{rank}(A). $$ ::: ::: {.callout-important title="Theorem 2.40: Rank and transpose"} For every matrix $A$, $$ \operatorname{rank}(A)=\operatorname{rank}(A^T). $$ ::: ::: {.callout-important title="Theorem 2.41: Rank bound for matrix products"} If $AB$ is defined, then $$ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\}. $$ ::: ::: {.callout-note title="Proof idea"} The columns of $AB$ are the vectors $A\mathbf b_j$, where $\mathbf b_j$ are the columns of $B$. Therefore all columns of $AB$ lie in the column space of $A$. Hence $$ \operatorname{rank}(AB)\le \operatorname{rank}(A). $$ Using $(AB)^T=B^TA^T$ and $\operatorname{rank}(M)=\operatorname{rank}(M^T)$, one also obtains $$ \operatorname{rank}(AB)\le \operatorname{rank}(B). $$ ::: ### Example 2.42: Rank bottleneck Let $A$ be a $100\times 5$ matrix and $B$ be a $5\times 100$ matrix. Then $$ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 5. $$ Thus the $100\times 100$ matrix $AB$ has rank at most $5$, so it is not invertible. ::: {.callout-tip title="Data science interpretation"} A large matrix produced through a low-dimensional bottleneck is low rank. This is one reason low-rank factorizations appear in compression, recommendation systems, and neural network model reduction. ::: ### Symmetric matrices ::: {.callout-note title="Definition 2.43: Symmetric matrix"} An $n\times n$ matrix $A$ is **symmetric** if $$ A^T=A. $$ Equivalently, if $A=[a_{ij}]$, then $$ a_{ij}=a_{ji} $$ for all $i,j$. ::: ### Example 2.44: Gram matrices are symmetric For any real matrix $X$, the matrix $X^TX$ is symmetric because $$ (X^TX)^T = X^T(X^T)^T = X^TX. $$ This observation is fundamental in least squares, covariance matrices, principal component analysis, and quadratic optimization. ```{python} X = np.array([[1,2], [3,4], [5,6]]) G = X.T @ X G ``` ```{python} np.allclose(G, G.T) ``` ## 2.8 LU factorization and Gaussian elimination Gaussian elimination can be saved and reused. LU factorization records elimination as a product $$ A=LU. $$ Here $L$ records the row-operation multipliers, and $U$ is the echelon-form matrix. ### Lower triangular matrices ::: {.callout-note title="Definition 2.45: Lower triangular matrix"} An $m\times m$ matrix $L=[\ell_{ij}]$ is **lower triangular** if $$ \ell_{ij}=0 \qquad \text{whenever } j>i. $$ It is **unit lower triangular** if, in addition, $$ \ell_{ii}=1 \qquad \text{for every } i. $$ ::: ::: {.callout-note title="Remark"} Products and inverses of unit lower triangular matrices are unit lower triangular. Elementary matrices of the form $E_{ij}(d)$ with $i>j$ are unit lower triangular. ::: ### LU factorization ::: {.callout-note title="Definition 2.46: LU factorization"} Let $A\in M_{m\times n}(\mathbb F)$. An **LU factorization** of $A$ is a factorization $$ A=LU, $$ where $L$ is a unit lower triangular $m\times m$ matrix and $U$ is an $m\times n$ matrix in echelon form. ::: ::: {.callout-important title="Theorem 2.47: Algorithm for finding an LU factorization"} Suppose $A$ can be transformed into echelon form using only row-replacement operations and no row swaps. Then an LU factorization may be found as follows: 1. Reduce $A$ to echelon form $U$ using only row-replacement operations. 2. Record the multipliers used in elimination in the corresponding entries of $L$. Then $$ A=LU, $$ where $L$ is unit lower triangular. ::: ### Example 2.48: Computing an LU factorization Let $$ A= \begin{bmatrix} 2&1&1\\ 4&-6&0\\ -2&7&2 \end{bmatrix}. $$ Eliminate entries below the first pivot: $$ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3+R_1. $$ Then eliminate below the second pivot: $$ R_3\leftarrow R_3+R_2. $$ This gives $$ U= \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix}. $$ The multipliers are $$ m_{21}=2,\qquad m_{31}=-1,\qquad m_{32}=-1. $$ Thus $$ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix}. $$ Indeed, $$ A=LU. $$ ```{python} L = np.array([[ 1, 0, 0], [ 2, 1, 0], [-1,-1, 1]]) U = np.array([[2, 1, 1], [0,-8,-2], [0, 0, 1]]) L @ U ``` ### Solving systems with LU If $A=LU$, then $$ A\mathbf x=\mathbf b \quad\Longleftrightarrow\quad L(U\mathbf x)=\mathbf b. $$ Set $$ \mathbf y=U\mathbf x. $$ Then solve two triangular systems: $$ L\mathbf y=\mathbf b $$ by forward substitution, and then $$ U\mathbf x=\mathbf y $$ by back substitution. ### Example 2.49: Solving with LU Use the LU factorization from the previous example to solve $$ A\mathbf x= \begin{bmatrix} 1\\2\\3 \end{bmatrix}. $$ First solve $$ \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&-1&1 \end{bmatrix} \begin{bmatrix} y_1\\y_2\\y_3 \end{bmatrix} = \begin{bmatrix} 1\\2\\3 \end{bmatrix}. $$ This gives $$ y_1=1,\qquad 2y_1+y_2=2,\qquad -y_1-y_2+y_3=3. $$ So $$ \mathbf y= \begin{bmatrix} 1\\0\\4 \end{bmatrix}. $$ Next solve $$ \begin{bmatrix} 2&1&1\\ 0&-8&-2\\ 0&0&1 \end{bmatrix} \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} 1\\0\\4 \end{bmatrix}. $$ Back substitution gives $$ x_3=4,\qquad -8x_2-8=0,\qquad 2x_1-1+4=1. $$ Therefore $$ x_2=-1,\qquad x_1=-1. $$ Thus $$ \mathbf x= \begin{bmatrix} -1\\-1\\4 \end{bmatrix}. $$ ```{python} b = np.array([1,2,3]) # Direct check: A = L @ U np.linalg.solve(A, b) ``` ::: {.callout-warning title="Pivoting"} Not every matrix has an LU factorization without row swaps. For example, $$ \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix} $$ requires a row interchange before elimination. In practice, one often uses $$ PA=LU, $$ where $P$ is a permutation matrix. Pivoting is not merely a technical detail; it is essential for numerical reliability. ::: ## 2.9 Python computation: matrix algebra checklist This section shows how to use Python to check the main computations of the chapter. ### Matrix products and powers ```{python} A = np.array([[1,2], [3,4]]) B = np.array([[1,1], [0,1]]) print("AB =") print(A @ B) print("BA =") print(B @ A) print("A^3 =") print(np.linalg.matrix_power(A, 3)) ``` ### Rank and inverse ```{python} A = np.array([[1,2,3], [2,4,6], [0,1,1]], dtype=float) print("rank(A) =", np.linalg.matrix_rank(A)) ``` ```{python} A = np.array([[4,7], [2,6]], dtype=float) print("det(A) =", np.linalg.det(A)) print("A inverse =") print(np.linalg.inv(A)) ``` ### Exact symbolic computation with SymPy ```{python} A = sp.Matrix([[1,1,1], [2,3,2], [3,8,2]]) A_inv = A.inv() A_inv ``` ```{python} A * A_inv ``` ### LU factorization ```{python} A = sp.Matrix([[2,1,1], [4,-6,0], [-2,7,2]]) L, U, perm = A.LUdecomposition() L, U, perm ``` ::: {.callout-note title="Numerical note"} NumPy is excellent for fast numerical computation. SymPy is useful for exact symbolic computation. In applied linear algebra, both are useful, but they answer slightly different questions. - NumPy: fast floating-point computation. - SymPy: exact algebraic computation. ::: ## 2.10 Classical challenge questions These questions are designed for graduate-level discussion. They are not mainly long computations. They ask you to connect matrix algebra with structure, algorithms, modeling, and numerical reasoning. ::: {.callout-tip title="Challenge 2.1: Matrix multiplication as composition"} Matrix multiplication is associative but usually not commutative. Explain why this is natural if matrices are viewed as linear transformations. Give one practical situation where the order of two operations should matter. ::: ::: {.callout-tip title="Challenge 2.2: Information loss and invertibility"} A matrix maps input vectors to output vectors. Interpret invertibility as “no information is lost.” How does this interpretation relate to pivots, rank, null space, and solving $A\mathbf x=\mathbf b$? ::: ::: {.callout-tip title="Challenge 2.3: Rank as effective dimension"} In data science, the columns of a matrix may represent features or signals. Explain why rank can be interpreted as the number of independent directions contained in the data. What does it mean if a large data matrix has low rank? ::: ::: {.callout-tip title="Challenge 2.4: Gram matrices"} For a data matrix $A$, the matrix $A^TA$ records pairwise dot products among the columns of $A$. What geometric information is contained in $A^TA$? What information about $A$ is lost when replacing $A$ by $A^TA$? ::: ::: {.callout-tip title="Challenge 2.5: Transpose and adjoint viewpoint"} The transpose $A^T$ can be interpreted as the matrix that moves a dot product from one side to the other: $$ (A\mathbf x)\cdot\mathbf y=\mathbf x\cdot(A^T\mathbf y). $$ Explain why this identity is important in least squares, optimization, and sensitivity analysis. ::: ::: {.callout-tip title="Challenge 2.6: LU factorization and reusable computation"} LU factorization separates Gaussian elimination into two triangular solves. Why is this valuable when the same matrix $A$ appears in many systems $$ A\mathbf x_1=\mathbf b_1,\quad A\mathbf x_2=\mathbf b_2,\quad \ldots $$ with different right-hand sides? ::: ::: {.callout-tip title="Challenge 2.7: Pivoting and numerical reliability"} The identity $A=LU$ may fail without row exchanges, and even when it exists it may be numerically unstable. Explain why row pivoting is not just a technical detail but an important part of reliable computation. ::: ::: {.callout-tip title="Challenge 2.8: Block matrix thinking"} Large matrices are often divided into blocks. Explain how block multiplication helps organize problems in networks, multivariable systems, machine learning, or scientific computing. What is gained by thinking at the block level instead of only at the entry level? ::: ::: {.callout-tip title="Challenge 2.9: From algebra to algorithms"} Two formulas can be mathematically identical but computationally very different. For example, $$ (AB)\mathbf x \qquad\text{and}\qquad A(B\mathbf x) $$ give the same result, but may require very different work. Explain why computational cost should influence how we write and implement matrix expressions. ::: ## 2.11 Practice problems ### Problem 2.1 Let $$ A= \begin{bmatrix} 1&2&0\\ -1&3&4 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&-1&5\\ 0&1&-2 \end{bmatrix}. $$ Compute $3A-2B$. ### Problem 2.2 Let $$ A= \begin{bmatrix} 1&0&2\\ -1&3&1 \end{bmatrix}, \qquad \mathbf x= \begin{bmatrix} 2\\-1\\4 \end{bmatrix}. $$ Compute $A\mathbf x$ using both the row interpretation and the column interpretation. ### Problem 2.3 Let $$ A= \begin{bmatrix} 1&2\\ 0&1\\ 3&-1 \end{bmatrix}, \qquad B= \begin{bmatrix} 2&0&1\\ -1&4&3 \end{bmatrix}. $$ Compute $AB$, and determine whether $BA$ is defined. ### Problem 2.4 Find all matrices $$ X= \begin{bmatrix} a&b\\ c&d \end{bmatrix} $$ such that $$ X \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} = \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix} X. $$ ### Problem 2.5 Let $$ A= \begin{bmatrix} 1&2&3\\ 0&1&4\\ 0&0&2 \end{bmatrix}. $$ Use the inverse matrix theorem to decide whether $A$ is invertible. ### Problem 2.6 Compute the inverse of $$ A= \begin{bmatrix} 3&5\\ 1&2 \end{bmatrix}. $$ ### Problem 2.7 Let $A$ be a $50\times 8$ matrix and $B$ be an $8\times 50$ matrix. Use rank inequalities to explain why $AB$ cannot be invertible. ### Problem 2.8 Let $$ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. $$ Find an LU factorization of $A$ without row swaps. ## 2.12 Solutions to practice problems ### Solution to Problem 2.1 $$ 3A= \begin{bmatrix} 3&6&0\\ -3&9&12 \end{bmatrix}, \qquad 2B= \begin{bmatrix} 4&-2&10\\ 0&2&-4 \end{bmatrix}. $$ Therefore $$ 3A-2B= \begin{bmatrix} -1&8&-10\\ -3&7&16 \end{bmatrix}. $$ ### Solution to Problem 2.2 Using rows, $$ A\mathbf x= \begin{bmatrix} 1(2)+0(-1)+2(4)\\ -1(2)+3(-1)+1(4) \end{bmatrix} = \begin{bmatrix} 10\\ -1 \end{bmatrix}. $$ Using columns, $$ A\mathbf x= 2 \begin{bmatrix} 1\\-1 \end{bmatrix} - \begin{bmatrix} 0\\3 \end{bmatrix} + 4 \begin{bmatrix} 2\\1 \end{bmatrix} = \begin{bmatrix} 10\\-1 \end{bmatrix}. $$ ### Solution to Problem 2.3 $$ AB= \begin{bmatrix} 0&8&7\\ -1&4&3\\ 7&-4&0 \end{bmatrix}. $$ Since $B$ is $2\times 3$ and $A$ is $3\times 2$, the product $BA$ is also defined and has size $2\times 2$. ### Solution to Problem 2.4 Let $$ N= \begin{bmatrix} 1&1\\ 0&1 \end{bmatrix}. $$ Then $$ XN= \begin{bmatrix} a&a+b\\ c&c+d \end{bmatrix}, \qquad NX= \begin{bmatrix} a+c&b+d\\ c&d \end{bmatrix}. $$ Equating entries gives $$ c=0,\qquad a=d. $$ Thus $$ X= \begin{bmatrix} a&b\\ 0&a \end{bmatrix}. $$ ### Solution to Problem 2.5 The matrix $A$ is upper triangular with diagonal entries $1,1,2$. Since all diagonal entries are nonzero, it has a pivot in every column. Thus $$ \operatorname{rank}(A)=3. $$ By the inverse matrix theorem, $A$ is invertible. ### Solution to Problem 2.6 $$ \det(A)=3(2)-5(1)=1. $$ Therefore $$ A^{-1} = \begin{bmatrix} 2&-5\\ -1&3 \end{bmatrix}. $$ ### Solution to Problem 2.7 By the rank inequality, $$ \operatorname{rank}(AB) \le \min\{\operatorname{rank}(A),\operatorname{rank}(B)\} \le 8. $$ But $AB$ is $50\times 50$. An invertible $50\times 50$ matrix must have rank $50$. Since $$ \operatorname{rank}(AB)\le 8<50, $$ the matrix $AB$ cannot be invertible. ### Solution to Problem 2.8 Use elimination: $$ A= \begin{bmatrix} 1&2&1\\ 2&5&3\\ 1&0&8 \end{bmatrix}. $$ First, $$ R_2\leftarrow R_2-2R_1, \qquad R_3\leftarrow R_3-R_1, $$ which gives $$ \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&-2&7 \end{bmatrix}. $$ Then $$ R_3\leftarrow R_3+2R_2, $$ which gives $$ U= \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. $$ The multipliers are $$ m_{21}=2,\qquad m_{31}=1,\qquad m_{32}=-2. $$ Hence $$ L= \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix}. $$ Therefore $$ A=LU = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 1&-2&1 \end{bmatrix} \begin{bmatrix} 1&2&1\\ 0&1&1\\ 0&0&9 \end{bmatrix}. $$ ## 2.13 AI companion activities ### Activity 2.1: Check a matrix product Ask an AI tool: > Compute $AB$ for > $$ > A=\begin{bmatrix}1&2\\3&4\end{bmatrix}, > \qquad > B=\begin{bmatrix}1&1\\0&1\end{bmatrix}. > $$ > Explain why $AB\ne BA$. Then verify the answer by hand and by Python. ### Activity 2.2: Critique an explanation of invertibility Ask an AI tool: > Explain the inverse matrix theorem in simple language. Then critique the explanation. Does it mention rank? pivots? the null space? uniqueness of solutions? row reduction? ### Activity 2.3: Compare two computational expressions Ask an AI tool: > Why is $A(B\mathbf x)$ often faster than $(AB)\mathbf x$? Then test a large example in Python. ```{python} # Optional timing experiment import time m, n, p = 800, 600, 400 A = np.random.randn(m, n) B = np.random.randn(n, p) x = np.random.randn(p) t0 = time.time() y1 = (A @ B) @ x t1 = time.time() t2 = time.time() y2 = A @ (B @ x) t3 = time.time() print("Time for (A @ B) @ x:", t1 - t0) print("Time for A @ (B @ x):", t3 - t2) print("Same answer?", np.allclose(y1, y2)) ``` ::: {.callout-important title="Main message of Chapter 2"} Matrix algebra is not just a collection of rules. It is the language of: - transforming vectors; - composing processes; - detecting information loss; - measuring effective dimension; - reusing computation; - organizing large-scale algorithms. The rest of the book will deepen this story through subspaces, orthogonality, projections, eigenvalues, SVD, Schur decomposition, Fourier and Haar bases, and modern data science. :::

2.1 2.1 Matrix addition and scalar multiplication

2.1.1 Example 2.3: Combining data arrays

2.2 2.2 Vectors, dot products, and matrix-vector products

2.2.1 Dot products

2.2.2 Example 2.5: Dot product as a score

2.2.3 Matrix-vector products

2.2.4 Example 2.10: The column interpretation

2.2.5 Example 2.11: One step of a linear dynamical system

2.3 2.3 Matrix multiplication

2.3.1 Definition and row-column rule

2.3.2 Example 2.13: Computing a matrix product

2.3.3 Algebraic rules and non-rules

2.3.4 Example 2.15: Noncommutativity

2.3.5 Example 2.16: Solving a commutation equation

2.4 2.4 Powers of matrices

2.4.1 Example 2.18: Four common power patterns

2.4.2 Example 2.19: Use in Markov chains

2.5 2.5 Elementary matrices and block multiplication

2.5.1 Elementary matrices

2.5.2 Example 2.22: Using elementary matrices

2.5.3 Block multiplication

2.5.4 Example 2.23: Block multiplication for a linear model

2.6 2.6 Inverse matrices

2.6.1 Definition and basic properties

2.6.2 Example 2.27: Using inverse rules without recomputing

2.6.3 Example 2.28: \(A+B\) may fail to be invertible

2.6.4 The inverse matrix theorem

2.6.5 Example 2.30: Testing invertibility

2.6.6 Example 2.31: Invertibility from a product

2.6.7 Computing inverses

2.6.8 Example 2.33: Finding an inverse

2.6.9 Example 2.35: Fast inverse for a \(2\times 2\) matrix

2.7 2.7 Transpose, symmetry, and rank

2.7.1 Transpose

2.7.2 Example 2.38: Transpose of a product

2.7.3 Rank inequalities

2.7.4 Example 2.42: Rank bottleneck

2.7.5 Symmetric matrices

2.7.6 Example 2.44: Gram matrices are symmetric

2.8 2.8 LU factorization and Gaussian elimination

2.8.1 Lower triangular matrices

2.8.2 LU factorization

2.8.3 Example 2.48: Computing an LU factorization

2.8.4 Solving systems with LU

2.8.5 Example 2.49: Solving with LU

2.9 2.9 Python computation: matrix algebra checklist

2.9.1 Matrix products and powers

2.9.2 Rank and inverse

2.9.3 Exact symbolic computation with SymPy

2.9.4 LU factorization

2.10 2.10 Classical challenge questions

2.11 2.11 Practice problems

2.11.1 Problem 2.1

2.11.2 Problem 2.2

2.11.3 Problem 2.3

2.11.4 Problem 2.4

2.11.5 Problem 2.5

2.11.6 Problem 2.6

2.11.7 Problem 2.7

2.11.8 Problem 2.8

2.12 2.12 Solutions to practice problems

2.12.1 Solution to Problem 2.1

2.12.2 Solution to Problem 2.2

2.12.3 Solution to Problem 2.3

2.12.4 Solution to Problem 2.4

2.12.5 Solution to Problem 2.5

2.12.6 Solution to Problem 2.6

2.12.7 Solution to Problem 2.7

2.12.8 Solution to Problem 2.8

2.13 2.13 AI companion activities

2.13.1 Activity 2.1: Check a matrix product

2.13.2 Activity 2.2: Critique an explanation of invertibility

2.13.3 Activity 2.3: Compare two computational expressions