MATH 5010 · Section 11

Sufficient Statistics

Compress a random sample without losing information about the parameter. Use factorization, conditional distributions, and likelihood ratios to see what information matters.

1. What is sufficiency?

A statistic that keeps all parameter information

A statistic $T(X_1,\dots,X_n)$ is sufficient for a parameter $\theta$ if, after we know $T$, the remaining details of the sample no longer tell us anything about $\theta$.

$$T(X)\text{ is sufficient for }\theta \quad\Longleftrightarrow\quad \mathcal L(X\mid T(X)=t)\text{ does not depend on }\theta.$$

Teaching message: sufficiency is data compression for inference. For many models, the whole sample can be replaced by a simple summary such as $\sum X_i$, $\bar X$, or $(\sum X_i,\sum X_i^2)$.

Bernoulli: $\sum X_i$; Poisson: $\sum X_i$; Normal mean, known variance: $\bar X$; Normal mean and variance: $(\bar X,S^2)$; Exponential scale: $\sum X_i$.

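
The definition can be checked by brute force in a small case. The sketch below enumerates every Bernoulli sample of length 4 with exactly 2 successes and computes the conditional distribution given the sum, using exact fractions. The function name `conditional_given_sum` and the values $n=4$, $t=2$, $p=1/5$, $p=7/10$ are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_given_sum(n, t, p):
    """Exact P(X = x | sum(X) = t) for every binary x of length n with sum t."""
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))  # joint PMF of the sample x
        for x in product((0, 1), repeat=n)
        if sum(x) == t
    }
    total = sum(joint.values())  # this is P(T = t)
    return {x: prob / total for x, prob in joint.items()}


# Illustrative values (assumptions, chosen only for the demonstration).
n, t = 4, 2
cond_a = conditional_given_sum(n, t, Fraction(1, 5))
cond_b = conditional_given_sum(n, t, Fraction(7, 10))

# The two conditional distributions agree even though p differs,
# and each arrangement has probability 1 / C(n, t).
print(cond_a == cond_b)
print(set(cond_a.values()) == {Fraction(1, comb(n, t))})
```

Both comparisons should hold: once the number of successes is fixed, every arrangement is equally likely regardless of $p$, which is exactly what the definition requires.
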
2. Neyman–Fisher factorization

The main theorem

For a random sample with joint density or mass function $f_\theta(x_1,\dots,x_n)$, a statistic $T(X)$ is sufficient for $\theta$ if and only if the joint model can be factored, for some nonnegative functions $g_\theta$ and $h$, as

$$f_\theta(x_1,\dots,x_n)=g_\theta(T(x_1,\dots,x_n))\,h(x_1,\dots,x_n).$$

The parameter $\theta$ may appear in $g_\theta$, but the leftover factor $h$ cannot involve $\theta$.
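
As a worked illustration, consider the exponential scale model listed in Section 1. The notes do not write its density out, so the parameterization below (scale $\theta>0$, density $\theta^{-1}e^{-x/\theta}$ on $x\ge 0$) is an assumption.

$$f_\theta(x_1,\dots,x_n)=\prod_{i=1}^n\frac{1}{\theta}e^{-x_i/\theta}\,\mathbf 1\{x_i\ge 0\}=\underbrace{\theta^{-n}\exp\Bigl(-\frac{1}{\theta}\sum_{i=1}^n x_i\Bigr)}_{g_\theta(T(x))}\;\underbrace{\prod_{i=1}^n\mathbf 1\{x_i\ge 0\}}_{h(x)},\qquad T(x)=\sum_{i=1}^n x_i.$$

The indicator factor does not involve $\theta$, so it belongs to $h$, and the factorization gives $T=\sum X_i$ as a sufficient statistic.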

Classroom proof sketch

After conditioning on $T=t$, the factor $g_\theta(t)$ is constant over all samples with the same statistic value. It cancels from the conditional distribution, leaving a distribution that depends only on $h$ and the sample space, not on $\theta$.
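
In the discrete case the cancellation can be written out explicitly. This is a sketch that assumes $T(x)=t$ and $P_\theta(T=t)>0$; the continuous case needs more care with conditional densities.

$$P_\theta(X=x\mid T=t)=\frac{f_\theta(x)}{\sum_{y:\,T(y)=t}f_\theta(y)}=\frac{g_\theta(t)\,h(x)}{g_\theta(t)\sum_{y:\,T(y)=t}h(y)}=\frac{h(x)}{\sum_{y:\,T(y)=t}h(y)}.$$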

3. Bernoulli model

Only the number of successes matters

Let $X_1,\dots,X_n\overset{iid}{\sim}\mathrm{Bernoulli}(p)$. The joint PMF is

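$$\prod_{i=1}^n p^{x_i}(1-p)^{1-x_i}=p^{\sum x_i}(1-p)^{n-\sum x_i}.$$

This has the factorization form with $g_p(t)=p^t(1-p)^{n-t}$ and $h(x)=1$, so $T=\sum_{i=1}^n X_i$ is sufficient for $p$.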

Same statistic, same likelihood shape

For Bernoulli data, the order of 0s and 1s does not affect the likelihood. Only $T=\sum X_i$ matters.
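
A small numerical check of this claim: the sketch below multiplies out the joint PMF term by term for three samples and compares the results over a grid of $p$ values. The samples, the grid, and the function name `bernoulli_likelihood` are illustrative assumptions.

```python
from math import isclose


def bernoulli_likelihood(sample, p):
    """Joint PMF of an iid Bernoulli(p) sample, multiplied term by term."""
    value = 1.0
    for x in sample:
        value *= p if x == 1 else 1 - p
    return value


# Illustrative samples (assumptions, not data from the notes).
sample_a = [1, 1, 0, 0, 1, 0]  # T = 3
sample_b = [0, 1, 0, 1, 0, 1]  # T = 3, different order
sample_c = [1, 1, 1, 1, 0, 0]  # T = 4

grid = [k / 10 for k in range(1, 10)]  # p = 0.1, ..., 0.9

# Same T: the likelihoods agree at every p (up to floating-point rounding).
same_t = all(
    isclose(bernoulli_likelihood(sample_a, p), bernoulli_likelihood(sample_b, p), rel_tol=1e-9)
    for p in grid
)

# Different T: the likelihoods differ for at least one p on the grid.
different_t = any(
    not isclose(bernoulli_likelihood(sample_a, p), bernoulli_likelihood(sample_c, p), rel_tol=1e-9)
    for p in grid
)

print(same_t, different_t)
```

The likelihood is computed without ever forming $T$, so the agreement between `sample_a` and `sample_b` is a consequence of the model rather than of how the function is written.
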

4. Poisson model

The total count is sufficient

Let $X_1,\dots,X_n\overset{iid}{\sim}\mathrm{Poisson}(\lambda)$. Then

$$\prod_{i=1}^n e^{-\lambda}\frac{\lambda^{x_i}}{x_i!}=e^{-n\lambda}\lambda^{\sum x_i}\prod_{i=1}^n\frac1{x_i!}.$$

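Thus, taking $g_\lambda(t)=e^{-n\lambda}\lambda^{t}$ and $h(x)=\prod_{i=1}^n 1/x_i!$ in the factorization, $T=\sum X_i$ is sufficient for $\lambda$.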

Likelihood as a function of $\lambda$

As a function of $\lambda$, the likelihood depends on the sample only through the total count $T$. The MLE is $\hat\lambda=T/n=\bar X$.
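
The MLE follows from a short calculus step on the log-likelihood $\ell(\lambda)$. This sketch assumes $T>0$; when $T=0$ the likelihood $e^{-n\lambda}$ is decreasing in $\lambda$ and the maximizer is the boundary value $\hat\lambda=0$, which still equals $T/n$.

$$\ell(\lambda)=-n\lambda+T\log\lambda-\sum_{i=1}^n\log(x_i!),\qquad \ell'(\lambda)=-n+\frac{T}{\lambda}=0\iff\lambda=\frac{T}{n},\qquad \ell''(\lambda)=-\frac{T}{\lambda^2}<0.$$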

5. Normal model

Which normal statistic is sufficient?

For $X_i\sim N(\mu,\sigma^2)$, the answer depends on which parameters are unknown.

| Unknown parameter(s) | Sufficient statistic | Reason |
|---|---|---|
| $\mu$ only, $\sigma^2$ known | $\sum X_i$ or $\bar X$ | Likelihood depends on data through $\sum X_i$ |
| $\sigma^2$ only, $\mu$ known | $\sum (X_i-\mu)^2$ | Likelihood depends on squared deviations |
| Both $\mu,\sigma^2$ unknown | $(\sum X_i,\sum X_i^2)$ | Equivalent to $(\bar X,S^2)$ |
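
The reasons given for each case can be read off from one expansion of the joint density, obtained by writing $\sum (x_i-\mu)^2=\sum x_i^2-2\mu\sum x_i+n\mu^2$:

$$\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}\exp\Bigl(-\frac{(x_i-\mu)^2}{2\sigma^2}\Bigr)=(2\pi\sigma^2)^{-n/2}\exp\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2+\frac{\mu}{\sigma^2}\sum_{i=1}^n x_i-\frac{n\mu^2}{2\sigma^2}\Bigr).$$

With both parameters unknown, the data enter only through $(\sum x_i,\sum x_i^2)$. With $\sigma^2$ known, the factor $\exp\bigl(-\sum x_i^2/(2\sigma^2)\bigr)$ is free of $\mu$ and can be moved into $h$, leaving $\sum x_i$ as the sufficient statistic.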

Data cloud and summaries

For unknown $\mu$ and $\sigma^2$, two summaries are needed: location and spread.
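
The equivalence between $(\sum X_i,\sum X_i^2)$ and $(\bar X,S^2)$ can be sketched as a conversion in code. The example assumes $S^2$ denotes the sample variance with divisor $n-1$, which the notes do not state explicitly; the data values and the function name `mean_and_variance_from_sums` are illustrative.

```python
import statistics


def mean_and_variance_from_sums(n, s1, s2):
    """Recover (xbar, S^2) from n, s1 = sum(x_i), and s2 = sum(x_i^2).

    Assumes S^2 uses the n - 1 divisor and that n >= 2.
    """
    xbar = s1 / n
    s_squared = (s2 - n * xbar ** 2) / (n - 1)
    return xbar, s_squared


# Illustrative data (an assumption, not taken from the notes).
data = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(data)
s1 = sum(data)                 # sum of observations
s2 = sum(x * x for x in data)  # sum of squared observations

xbar, s_squared = mean_and_variance_from_sums(n, s1, s2)

# Compare with the values computed directly from the full sample.
print(xbar, s_squared)
print(statistics.mean(data), statistics.variance(data))
```

The two printed pairs should agree up to floating-point rounding. The one-pass formula is used here only to show the equivalence; it can lose precision when the mean is large relative to the spread.
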

6. Minimal sufficient statistics

The smallest useful compression

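A sufficient statistic need not be the smallest possible summary; the full sample itself is always sufficient. A statistic $T$ is minimal sufficient if it is sufficient and is a function of every other sufficient statistic, so that no sufficient statistic compresses the data further.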

$$\frac{f_\theta(x)}{f_\theta(y)}\text{ is independent of }\theta\quad\Longleftrightarrow\quad T(x)=T(y).$$

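If this equivalence holds for every pair of sample points $x$ and $y$, then $T$ is minimal sufficient. This likelihood-ratio criterion is often the easiest way to identify minimal sufficiency.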

Example: for Bernoulli and Poisson samples, $T=\sum X_i$ is not only sufficient but minimal sufficient.
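
For the Poisson case, the criterion can be verified directly from the joint PMF in Section 4. For two samples $x$ and $y$ of the same size $n$:

$$\frac{f_\lambda(x)}{f_\lambda(y)}=\frac{e^{-n\lambda}\lambda^{\sum x_i}\prod_{i=1}^n 1/x_i!}{e^{-n\lambda}\lambda^{\sum y_i}\prod_{i=1}^n 1/y_i!}=\lambda^{\sum x_i-\sum y_i}\prod_{i=1}^n\frac{y_i!}{x_i!}.$$

The right-hand side is free of $\lambda$ exactly when $\sum x_i=\sum y_i$, that is, when $T(x)=T(y)$.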

Likelihood-ratio test for two samples

For two samples from the same model, the likelihood ratio is free of the parameter exactly when the minimal sufficient statistic takes the same value on both samples.
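
The comparison can be sketched numerically for the Poisson model. The code below evaluates the likelihood ratio of two samples over a small grid of $\lambda$ values and reports whether it stays constant. The samples, the grid, and the function names are illustrative assumptions, and a finite grid can only suggest, not prove, that a ratio is parameter-free.

```python
from math import exp, factorial, isclose


def poisson_likelihood(sample, lam):
    """Joint PMF of an iid Poisson(lam) sample, multiplied term by term."""
    value = 1.0
    for x in sample:
        value *= exp(-lam) * lam ** x / factorial(x)
    return value


def ratio_is_parameter_free(x, y, grid):
    """True if f(x) / f(y) takes (numerically) the same value at every grid point."""
    ratios = [poisson_likelihood(x, lam) / poisson_likelihood(y, lam) for lam in grid]
    return all(isclose(r, ratios[0], rel_tol=1e-9) for r in ratios)


# Illustrative inputs (assumptions, not data from the notes).
grid = [0.5, 1.0, 2.0, 3.5]
x = [3, 0, 2]  # sum = 5
y = [1, 4, 0]  # sum = 5
z = [1, 1, 1]  # sum = 3

print(ratio_is_parameter_free(x, y, grid))  # equal sums
print(ratio_is_parameter_free(x, z, grid))  # unequal sums
```

The first check should report a constant ratio and the second should not, matching the criterion with $T=\sum X_i$.
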

7. Quick checks

Self-check questions

Q1

For $X_i\sim\mathrm{Poisson}(\lambda)$, is $\bar X$ sufficient?

Q2

For $X_i\sim N(\mu,\sigma^2)$ with both parameters unknown, is $\bar X$ alone sufficient?

Q3

What does factorization mean intuitively?