Code
import numpy as np
u = np.array([1, 2, 3])
v = np.array([4, 5])
outer = np.outer(u, v)
outerarray([[ 4, 5],
[ 8, 10],
[12, 15]])
From areas and volumes to tensors, images, and quantum states
Linear algebra begins with one vector at a time. A matrix takes one input vector and returns one output vector:
\[ T:V\to W. \]
But many important quantities depend on several vectors at once. The area of a parallelogram depends on two vectors. The volume of a parallelepiped depends on three vectors. A determinant depends on all columns of a matrix. A color image has row, column, and color-channel directions. A video has row, column, color, and time directions. A quantum system made of two parts has a state space built from both parts at once.
The language for these situations is multilinear algebra.
Multilinear algebra studies functions that are linear in each input separately, and it builds new vector spaces such as
\[ V\otimes W, \qquad \bigwedge^k V, \qquad V_1\otimes\cdots\otimes V_k. \]
The tensor product turns multi-input linear behavior into ordinary linear algebra on a larger vector space.
This chapter is an application chapter. We will connect abstract definitions with concrete computations in matrices, determinants, geometry, tensor data, image models, kernel methods, and quantum information.
Let \(V_1,\ldots,V_k,W\) be vector spaces over a field \(\mathbb F\). A map
\[ T:V_1\times\cdots\times V_k\to W \]
is called multilinear if it is linear in each input separately. That is, for each fixed index \(i\), after fixing all variables except \(v_i\), the map
\[ v_i\mapsto T(v_1,\ldots,v_i,\ldots,v_k) \]
is a linear map from \(V_i\) to \(W\).
For \(k=2\), the map is called bilinear. For \(k=3\), it is called trilinear.
\[ (u,v)\mapsto u^Tv. \]
\[ (A,B)\mapsto AB. \]
\[ (v_1,\ldots,v_n)\mapsto \det[v_1\ \cdots\ v_n]. \]
\[ (u,v,w)\mapsto u\cdot(v\times w) \]
is trilinear and measures signed volume in \(\mathbb R^3\).
The map
\[ B(x,y)=xy \]
from \(\mathbb R\times\mathbb R\) to \(\mathbb R\) is bilinear because it is linear in \(x\) when \(y\) is fixed and linear in \(y\) when \(x\) is fixed. But it is not linear as a map from \(\mathbb R^2\) to \(\mathbb R\), because
\[ B((x_1,y_1)+(x_2,y_2))=(x_1+x_2)(y_1+y_2) \]
has cross terms.
The tensor product is the central construction of multilinear algebra. It lets us replace bilinear maps by ordinary linear maps.
Let \(V\) and \(W\) be vector spaces over a field \(\mathbb F\). The tensor product \(V\otimes W\) is a vector space together with a bilinear map
\[ \phi:V\times W\to V\otimes W,\qquad \phi(v,w)=v\otimes w, \]
such that every bilinear map
\[ h:V\times W\to Z \]
factors uniquely through a linear map
\[ \widetilde h:V\otimes W\to Z \]
with
\[ \widetilde h(v\otimes w)=h(v,w). \]
This property is called the universal property of the tensor product.
The symbol \(v\otimes w\) remembers the pair \((v,w)\) in a way that is compatible with linearity in each variable. The tensor product is designed so that bilinear maps out of \(V\times W\) become linear maps out of \(V\otimes W\).
Let
\[ \mathcal A=\{a_1,\ldots,a_m\} \]
be a basis of \(V\), and let
\[ \mathcal B=\{b_1,\ldots,b_n\} \]
be a basis of \(W\). Then
\[ \{a_i\otimes b_j:1\le i\le m,\ 1\le j\le n\} \]
is a basis of \(V\otimes W\). In particular,
\[ \dim(V\otimes W)=\dim(V)\dim(W)=mn. \]
Every vector \(v\in V\) and \(w\in W\) can be written as
\[ v=\sum_{i=1}^m \alpha_i a_i, \qquad w=\sum_{j=1}^n \beta_j b_j. \]
By bilinearity,
\[ v\otimes w =\sum_{i=1}^m\sum_{j=1}^n \alpha_i\beta_j(a_i\otimes b_j). \]
Thus the displayed tensors span all simple tensors and hence span \(V\otimes W\). Linear independence can be checked by applying bilinear coordinate functionals that extract the \((i,j)\) coefficient. Therefore the displayed family is a basis.
For \(v\in\mathbb R^m\) and \(w\in\mathbb R^n\),
\[ v\otimes w \quad\longleftrightarrow\quad vw^T. \]
Thus
\[ v\otimes w \quad\text{corresponds to the rank-one matrix}\quad \begin{bmatrix} v_1w_1&\cdots&v_1w_n\\ \vdots&\ddots&\vdots\\ v_mw_1&\cdots&v_mw_n \end{bmatrix}. \]
A tensor of the form \(v\otimes w\) is called a simple tensor or rank-one tensor. Not every tensor is simple.
For example,
\[ I_2=e_1\otimes e_1+e_2\otimes e_2 \]
cannot be written as one single outer product \(uv^T\). It has matrix rank \(2\).
import numpy as np
u = np.array([1, 2, 3])
v = np.array([4, 5])
outer = np.outer(u, v)
outerarray([[ 4, 5],
[ 8, 10],
[12, 15]])
The Kronecker product is the computational matrix version of the tensor product.
Let \(A=[a_{ij}]\in\mathbb R^{m\times n}\) and \(B\in\mathbb R^{p\times q}\). The Kronecker product \(A\otimes B\) is the \(mp\times nq\) block matrix
\[ A\otimes B= \begin{bmatrix} a_{11}B&a_{12}B&\cdots&a_{1n}B\\ a_{21}B&a_{22}B&\cdots&a_{2n}B\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}B&a_{m2}B&\cdots&a_{mn}B \end{bmatrix}. \]
Whenever the products are defined,
\[ (A\otimes B)(C\otimes D)=(AC)\otimes(BD), \]
\[ (A\otimes B)^T=A^T\otimes B^T, \]
and if \(A\) and \(B\) are invertible, then
\[ (A\otimes B)^{-1}=A^{-1}\otimes B^{-1}. \]
The first identity follows from block multiplication. The \((i,j)\) block of \((A\otimes B)(C\otimes D)\) is
\[ \sum_k (a_{ik}B)(c_{kj}D) =\left(\sum_k a_{ik}c_{kj}\right)BD =(AC)_{ij}BD. \]
This is exactly the \((i,j)\) block of \((AC)\otimes(BD)\). The transpose and inverse identities follow directly from the definition and the first identity.
If
\[ X=[x_1\ x_2\ \cdots\ x_n]\in\mathbb R^{m\times n} \]
has columns \(x_j\in\mathbb R^m\), define
\[ \operatorname{vec}(X)= \begin{bmatrix} x_1\\x_2\\ \vdots\\ x_n \end{bmatrix} \in\mathbb R^{mn}. \]
This stacks the columns of \(X\) into one long vector.
Let \(A\in\mathbb R^{p\times m}\), \(X\in\mathbb R^{m\times n}\), and \(B\in\mathbb R^{n\times q}\). Then
\[ \operatorname{vec}(AXB)=(B^T\otimes A)\operatorname{vec}(X). \]
It is enough to check the formula on the basis matrices \(E_{ij}=e_i e_j^T\). For such a matrix,
\[ AE_{ij}B=(Ae_i)(e_j^TB). \]
The vectorization of this outer product agrees with multiplication by \(B^T\otimes A\). Since both sides are linear in \(X\), the identity holds for every \(X\).
A = np.array([[1, 2], [0, 1]])
X = np.array([[1, 3], [2, 4]])
B = np.array([[2, 0], [1, 1]])
left = (A @ X @ B).reshape(-1, order="F")
right = np.kron(B.T, A) @ X.reshape(-1, order="F")
print(left)
print(right)
print(np.allclose(left, right))[21 8 11 4]
[21 8 11 4]
True
The matrix equation
\[ AXB=C \]
can be rewritten as
\[ (B^T\otimes A)\operatorname{vec}(X)=\operatorname{vec}(C). \]
Thus a matrix equation can be solved as one ordinary linear system.
The tensor product keeps all multilinear information. The exterior product keeps the alternating part. This is the part responsible for signed area, signed volume, determinants, and orientation.
Let \(V\) be a vector space. The second exterior power \(\bigwedge^2 V\) is generated by symbols
\[ v\wedge w \]
subject to bilinearity and the alternating rule
\[ v\wedge v=0. \]
Equivalently,
\[ v\wedge w=-w\wedge v. \]
More generally, \(\bigwedge^k V\) is generated by wedge products
\[ v_1\wedge\cdots\wedge v_k \]
that are multilinear and alternating.
If \(V\) has basis \(\{e_1,\ldots,e_n\}\), then a basis for \(\bigwedge^k V\) is
\[ \{e_{i_1}\wedge\cdots\wedge e_{i_k}:1\le i_1<\cdots<i_k\le n\}. \]
Therefore
\[ \dim\bigwedge^k V={n\choose k}. \]
By multilinearity, every wedge product can be expanded into wedge products of basis vectors. If a basis vector appears twice, the wedge is zero. If all indices are distinct, alternating signs allow the indices to be rearranged in increasing order. Thus the displayed set spans. Linear independence follows from alternating coordinate functionals, or equivalently from the standard construction of exterior powers as alternating tensors.
Let
\[ u=\begin{bmatrix}a\\b\end{bmatrix}, \qquad v=\begin{bmatrix}c\\d\end{bmatrix}. \]
Then
\[ u\wedge v=(ad-bc)(e_1\wedge e_2). \]
The coefficient \(ad-bc\) is the signed area of the parallelogram spanned by \(u\) and \(v\).
For \(u,v,w\in\mathbb R^3\),
\[ u\wedge v\wedge w =\det[u\ v\ w]\ e_1\wedge e_2\wedge e_3. \]
The coefficient is the signed volume of the parallelepiped spanned by the three vectors.
u = np.array([2, 1])
v = np.array([1, 3])
signed_area = np.linalg.det(np.column_stack([u, v]))
signed_area5.000000000000001
There is a unique function
\[ D:(\mathbb R^n)^n\to\mathbb R \]
such that:
This function is
\[ D(v_1,\ldots,v_n)=\det[v_1\ \cdots\ v_n]. \]
Write each vector as
\[ v_j=\sum_i a_{ij}e_i. \]
By multilinearity,
\[ D(v_1,\ldots,v_n) =\sum_{i_1,\ldots,i_n}a_{i_1,1}\cdots a_{i_n,n}D(e_{i_1},\ldots,e_{i_n}). \]
If two indices are equal, the alternating property gives zero. Thus only permutations remain. For a permutation \(\sigma\),
\[ D(e_{\sigma(1)},\ldots,e_{\sigma(n)})=\operatorname{sgn}(\sigma). \]
Therefore
\[ D(v_1,\ldots,v_n) =\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma) a_{\sigma(1),1}\cdots a_{\sigma(n),n}, \]
which is the determinant formula.
The determinant is the scalar by which a linear map \(A:\mathbb R^n\to\mathbb R^n\) acts on the one-dimensional space \(\bigwedge^n\mathbb R^n\):
\[ \bigwedge^n A(e_1\wedge\cdots\wedge e_n)=\det(A)(e_1\wedge\cdots\wedge e_n). \]
A bilinear form on a real vector space \(V\) is a bilinear map
\[ B:V\times V\to\mathbb R. \]
If \(V=\mathbb R^n\), then every bilinear form has the form
\[ B(x,y)=x^TAy \]
for a unique matrix \(A\in\mathbb R^{n\times n}\).
If \(A\) is symmetric positive definite, then
\[ \langle x,y\rangle_A=x^TAy \]
is an inner product. The associated quadratic form
\[ E(x)=x^TAx \]
is an energy. Such expressions occur in least squares, optimization, mechanics, Gaussian models, and graph Laplacians.
A kernel function
\[ k:X\times X\to\mathbb R \]
can often be written as
\[ k(x,y)=\langle \Phi(x),\Phi(y)\rangle \]
for a feature map \(\Phi\). For data points \(x_1,\ldots,x_N\), the kernel matrix
\[ K=[k(x_i,x_j)] \]
is a Gram matrix and is symmetric positive semidefinite.
An order-\(d\) tensor is an element of
\[ V_1\otimes V_2\otimes\cdots\otimes V_d. \]
If \(V_i=\mathbb R^{n_i}\), then an order-\(d\) tensor can be represented as a multi-index array
\[ \mathcal T=(t_{i_1i_2\cdots i_d}), \qquad 1\le i_j\le n_j. \]
\[ \mathbb R^{\text{users}}\otimes\mathbb R^{\text{items}}\otimes\mathbb R^{\text{time}}. \]
A tensor \(\mathcal T\in V_1\otimes\cdots\otimes V_d\) is rank one if it can be written as
\[ \mathcal T=v_1\otimes\cdots\otimes v_d. \]
The CP rank of \(\mathcal T\) is the smallest \(r\) such that
\[ \mathcal T=\sum_{s=1}^r \lambda_s a_s^{(1)}\otimes a_s^{(2)}\otimes\cdots\otimes a_s^{(d)}. \]
For matrices, rank can be computed by row reduction or SVD. Higher-order tensor rank is much more subtle. Best low-rank tensor approximation may fail to exist in some settings, and computing tensor rank is much harder than computing matrix rank.
rng = np.random.default_rng(0)
a = np.array([1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7])
T = np.einsum('i,j,k->ijk', a, b, c)
print(T.shape)
print(T[:, :, 0])(2, 3, 2)
[[18 24 30]
[36 48 60]]
The singular value decomposition writes a matrix as a sum of rank-one matrices:
\[ X=\sum_{i=1}^r \sigma_i u_i v_i^T. \]
Multilinear algebra asks for analogous decompositions of higher-order tensors.
Let \(\mathcal T\in\mathbb R^{n_1\times\cdots\times n_d}\) and let \(A\in\mathbb R^{m\times n_k}\). The mode-\(k\) product \(\mathcal T\times_k A\) is the tensor obtained by multiplying every mode-\(k\) fiber by \(A\).
A Tucker decomposition of a tensor \(\mathcal T\) has the form
\[ \mathcal T=\mathcal G\times_1 U_1\times_2 U_2\cdots\times_d U_d, \]
where \(\mathcal G\) is a core tensor and \(U_i\) are factor matrices.
Suppose \(\mathcal T_{ijk}\) stores the expression level of gene \(i\) for patient \(j\) at time \(k\). A rank-\(r\) CP approximation
\[ \mathcal T\approx \sum_{s=1}^r \lambda_s a_s\otimes b_s\otimes c_s \]
tries to identify latent patterns, each consisting of a gene pattern \(a_s\), a patient pattern \(b_s\), and a time pattern \(c_s\).
rng = np.random.default_rng(1)
a1, a2 = rng.normal(size=4), rng.normal(size=4)
b1, b2 = rng.normal(size=5), rng.normal(size=5)
c1, c2 = rng.normal(size=3), rng.normal(size=3)
T = 2*np.einsum('i,j,k->ijk', a1,b1,c1) - np.einsum('i,j,k->ijk', a2,b2,c2)
T1 = T.reshape(4, 15)
U, s, Vt = np.linalg.svd(T1, full_matrices=False)
sarray([2.74943855e+00, 1.76763962e+00, 2.09572234e-16, 1.27605086e-16])
Tensor products are essential in quantum mechanics because the state space of a combined system is the tensor product of the state spaces of the parts.
A single qubit has state space \(\mathbb C^2\) with basis
\[ |0\rangle=\begin{bmatrix}1\\0\end{bmatrix}, \qquad |1\rangle=\begin{bmatrix}0\\1\end{bmatrix}. \]
Two qubits have state space
\[ \mathbb C^2\otimes\mathbb C^2\cong\mathbb C^4 \]
with basis
\[ |00\rangle,\ |01\rangle,\ |10\rangle,\ |11\rangle. \]
A two-qubit state is separable if it can be written as
\[ \psi=u\otimes v. \]
Otherwise it is entangled.
The Bell state
\[ \psi=\frac{1}{\sqrt2}(|00\rangle+|11\rangle) \]
corresponds to the matrix
\[ \frac{1}{\sqrt2}\begin{bmatrix}1&0\\0&1\end{bmatrix}. \]
This matrix has rank \(2\), so the state is not separable.
A \(k\)-form on a vector space \(V\) is an alternating multilinear function
\[ \omega:V^k\to\mathbb R. \]
The space of \(k\)-forms is naturally identified with
\[ \bigwedge^k V^*. \]
The wedge product builds higher-dimensional measurements from lower-dimensional measurements.
A color image may be stored as
\[ \mathcal I\in\mathbb R^{m\times n\times 3}. \]
A Tucker-style model
\[ \mathcal I\approx \mathcal G\times_1 U\times_2 V\times_3 C \]
separates row patterns, column patterns, and color-channel patterns.
A tensor model can store user-item-time ratings:
\[ \mathcal R_{u,i,t} =\text{rating of user }u\text{ for item }i\text{ at time }t. \]
A low-rank tensor approximation can reveal latent user groups, item categories, and temporal patterns simultaneously.
For matrix input \(X\in\mathbb R^{m\times n}\) and scalar output \(y\), a linear model has the form
\[ y\approx \langle W,X\rangle=\operatorname{tr}(W^TX). \]
A low-rank model assumes
\[ W\approx \sum_{s=1}^r a_s b_s^T. \]
This reduces parameters from \(mn\) to approximately \(r(m+n)\).
Let
\[ T=e_1\otimes e_1+e_2\otimes e_2\in\mathbb R^2\otimes\mathbb R^2. \]
Show that \(T\) is not a simple tensor.
Identify \(\mathbb R^2\otimes\mathbb R^2\) with \(2\times2\) matrices. Then
\[ T\leftrightarrow I_2. \]
A simple tensor corresponds to an outer product \(uv^T\), which has rank at most \(1\). But \(I_2\) has rank \(2\). Therefore \(T\) is not simple.
Let
\[ u=\begin{bmatrix}2\\1\end{bmatrix}, \qquad v=\begin{bmatrix}1\\3\end{bmatrix}. \]
Compute \(u\wedge v\) and interpret the result.
\[ u\wedge v=(2\cdot 3-1\cdot1)(e_1\wedge e_2)=5(e_1\wedge e_2). \]
The signed area is \(5\), so the ordinary area is \(5\).
Consider
\[ \psi=\alpha|00\rangle+\beta|01\rangle+\gamma|10\rangle+\delta|11\rangle. \]
Show that \(\psi\) is separable if and only if
\[ \alpha\delta-\beta\gamma=0. \]
Identify \(\psi\) with the matrix
\[ M=\begin{bmatrix}\alpha&\beta\\ \gamma&\delta\end{bmatrix}. \]
The state is separable if and only if this matrix has rank one or less. A \(2\times2\) matrix has rank one or less exactly when its determinant is zero. Thus
\[ \alpha\delta-\beta\gamma=0. \]
Let \(u=(1,2,3)^T\) and \(v=(4,5)^T\). Compute the matrix corresponding to \(u\otimes v\).
\[ uv^T=\begin{bmatrix}1\\2\\3\end{bmatrix} \begin{bmatrix}4&5\end{bmatrix} = \begin{bmatrix} 4&5\\ 8&10\\ 12&15 \end{bmatrix}. \]
Let
\[ A=\begin{bmatrix}1&2\\3&4\end{bmatrix}, \qquad B=\begin{bmatrix}0&5\\6&7\end{bmatrix}. \]
Compute \(A\otimes B\).
\[ A\otimes B= \begin{bmatrix} 1B&2B\\ 3B&4B \end{bmatrix} = \begin{bmatrix} 0&5&0&10\\ 6&7&12&14\\ 0&15&0&20\\ 18&21&24&28 \end{bmatrix}. \]
Show that \(u\wedge v=0\) if \(u\) and \(v\) are linearly dependent.
If \(v=cu\), then
\[ u\wedge v=u\wedge(cu)=c(u\wedge u)=0. \]
If \(\dim V=5\), compute \(\dim\bigwedge^2 V\) and \(\dim\bigwedge^3 V\).
\[ \dim\bigwedge^2 V={5\choose2}=10, \qquad \dim\bigwedge^3 V={5\choose3}=10. \]
Let \(A\) be a \(3\times3\) matrix. Explain why \(\bigwedge^3 A\) is multiplication by \(\det(A)\).
The space \(\bigwedge^3\mathbb R^3\) is one-dimensional with basis \(e_1\wedge e_2\wedge e_3\). The induced map sends this basis vector to
\[ Ae_1\wedge Ae_2\wedge Ae_3 =\det(A)(e_1\wedge e_2\wedge e_3). \]
Thus \(\bigwedge^3 A\) is multiplication by \(\det(A)\).
Use an AI assistant as a study partner, not as a replacement for your reasoning.
Ask:
Explain the universal property of the tensor product using a concrete example with \(\mathbb R^2\otimes\mathbb R^3\).
Then check whether the explanation correctly distinguishes bilinear maps from linear maps.
Ask:
Give three real-world data sets that are naturally third-order tensors. For each one, identify the three modes.
Then decide which examples are truly tensor data and which are just matrices with labels.
Ask:
Prove and numerically verify the identity \(\operatorname{vec}(AXB)=(B^T\otimes A)\operatorname{vec}(X)\).
Then test the identity using your own small matrices in Python.
Multilinear algebra extends linear algebra from one input vector to many input vectors. Tensor products linearize multilinear maps and model multi-way data. Kronecker products provide concrete computational tools. Exterior products capture oriented area, volume, determinant, and differential forms. These ideas connect linear algebra to modern applications in images, machine learning, geometry, physics, and quantum information.