MATH 5010 Section 11 — Sufficient Statistics

1. What is sufficiency?

A statistic that keeps all parameter information

A statistic $T(X_1,\dots,X_n)$ is sufficient for a parameter $\theta$ if, after we know $T$, the remaining details of the sample no longer tell us anything about $\theta$.

$$T(X)\text{ is sufficient for }\theta \quad\Longleftrightarrow\quad \mathcal L(X\mid T(X)=t)\text{ does not depend on }\theta.$$
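The definition can be checked by brute force in a small case. The sketch below is illustrative only: the function name `conditional_given_total` and the values of `n`, `t`, and `p` are assumptions, not part of the course material. It enumerates every Bernoulli sequence with a given total and computes the conditional distribution of the sample given $T=\sum X_i=t$ for several values of $p$.

```python
from itertools import product
from math import comb

def conditional_given_total(n, t, p):
    """P(X = x | sum(X) = t) for iid Bernoulli(p), by direct enumeration."""
    # Joint probability of every 0/1 sequence of length n whose total is t.
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
        if sum(x) == t
    }
    total = sum(joint.values())  # this is P(T = t)
    return {x: prob / total for x, prob in joint.items()}

n, t = 4, 2  # hypothetical small example
for p in (0.2, 0.5, 0.9):
    cond = conditional_given_total(n, t, p)
    # Every sequence with total t gets the same conditional probability,
    # 1 / comb(n, t), whatever the value of p.
    print(p, sorted(round(v, 6) for v in cond.values()), round(1 / comb(n, t), 6))
```

The printed conditional probabilities should be identical across the three values of $p$, which is what "does not depend on $\theta$" means here.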

Teaching message: sufficiency is data compression for inference. For many models, the whole sample can be replaced by a simple summary such as $\sum X_i$, $\bar X$, or $(\sum X_i,\sum X_i^2)$.

Bernoulli: $\sum X_i$
Poisson: $\sum X_i$
Normal mean, known variance: $\bar X$
Normal mean and variance: $(\bar X,S^2)$
Exponential scale: $\sum X_i$

2. Neyman–Fisher factorization

The main theorem

For a random sample with joint density or mass function $f_\theta(x_1,\dots,x_n)$, a statistic $T(X)$ is sufficient for $\theta$ if and only if the joint model can be factored, for nonnegative functions $g_\theta$ and $h$, as

$$f_\theta(x_1,\dots,x_n)=g_\theta(T(x_1,\dots,x_n))\,h(x_1,\dots,x_n).$$

The parameter $\theta$ may appear in $g_\theta$, but the leftover factor $h$ cannot involve $\theta$.

Classroom proof sketch

After conditioning on $T=t$, the factor $g_\theta(t)$ is constant over all samples with the same statistic value. It cancels from the conditional distribution, leaving a distribution that depends only on $h$ and the sample space, not on $\theta$.
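In the discrete case the cancellation can be written out explicitly. The lines below are a sketch under the assumption that $g_\theta(t)>0$, for a sample point $x$ with $T(x)=t$:

$$P_\theta(X=x\mid T=t)=\frac{f_\theta(x)}{\sum_{y:\,T(y)=t}f_\theta(y)}=\frac{g_\theta(t)\,h(x)}{g_\theta(t)\sum_{y:\,T(y)=t}h(y)}=\frac{h(x)}{\sum_{y:\,T(y)=t}h(y)}.$$

The right-hand side contains no $\theta$, so the conditional distribution of the sample given $T=t$ is the same for every parameter value.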

3. Bernoulli model

Only the number of successes matters

Let $X_1,\dots,X_n\overset{iid}{\sim}\mathrm{Bernoulli}(p)$. The joint PMF is

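$$\prod_{i=1}^n p^{x_i}(1-p)^{1-x_i}=p^{\sum x_i}(1-p)^{n-\sum x_i}.$$

This is a factorization with $g_p(t)=p^{t}(1-p)^{n-t}$ and $h\equiv 1$, so $T=\sum_{i=1}^n X_i$ is sufficient for $p$.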

Sample size $n$: 12; success probability $p$: 0.40

Same statistic, same likelihood shape

For Bernoulli data, the order of 0s and 1s does not affect the likelihood. Only $T=\sum X_i$ matters.
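A quick numerical check of this claim, as a minimal sketch: the two samples `a` and `b` below are made-up sequences of length 12 with the same number of successes, and `bernoulli_likelihood` is a helper name introduced here for illustration.

```python
from math import isclose, prod

def bernoulli_likelihood(xs, p):
    """Joint PMF of an iid Bernoulli(p) sample, viewed as a function of p."""
    return prod(p ** x * (1 - p) ** (1 - x) for x in xs)

# Two hypothetical samples with n = 12 and T = 5, in different orders.
a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

grid = [k / 20 for k in range(1, 20)]  # p = 0.05, 0.10, ..., 0.95
same_shape = all(
    isclose(bernoulli_likelihood(a, p), bernoulli_likelihood(b, p), rel_tol=1e-12)
    for p in grid
)
print(sum(a), sum(b), same_shape)
```

Because both samples share $T=5$, the two likelihood functions agree at every grid point; changing one entry of `b` so that the totals differ should break the agreement.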

4. Poisson model

The total count is sufficient

Let $X_1,\dots,X_n\overset{iid}{\sim}\mathrm{Poisson}(\lambda)$. Then

$$\prod_{i=1}^n e^{-\lambda}\frac{\lambda^{x_i}}{x_i!}=e^{-n\lambda}\lambda^{\sum x_i}\prod_{i=1}^n\frac1{x_i!}.$$

Here $g_\lambda(t)=e^{-n\lambda}\lambda^{t}$ and $h(x)=\prod_{i=1}^n 1/x_i!$, so $T=\sum X_i$ is sufficient for $\lambda$.

Sample size $n$: 10; rate $\lambda$: 3.00

Likelihood as a function of $\lambda$

The likelihood depends on the sample only through the total count $T$. The MLE is $\hat\lambda=T/n=\bar X$.
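The sketch below illustrates both statements numerically. The data in `xs` are invented for the example, and the grid search is a deliberately crude stand-in for calculus, not a recommended way to compute an MLE. It compares the full Poisson log-likelihood with a reduced version that sees only $n$ and $T$, then maximizes the reduced version.

```python
from math import lgamma, log

def poisson_loglik(xs, lam):
    """Full log-likelihood of an iid Poisson(lam) sample."""
    return sum(-lam + x * log(lam) - lgamma(x + 1) for x in xs)

def poisson_loglik_from_total(n, total, lam):
    """Log-likelihood using only (n, T); drops the term -sum(log(x_i!))."""
    return -n * lam + total * log(lam)

xs = [2, 4, 3, 1, 5, 3, 2, 4, 3, 3]  # hypothetical counts, n = 10
n, total = len(xs), sum(xs)

# The two versions differ by a constant that does not involve lam.
offsets = [
    poisson_loglik(xs, lam) - poisson_loglik_from_total(n, total, lam)
    for lam in (0.5, 1.5, 3.0, 6.0)
]
print(offsets)

# Crude grid search over lam = 0.01, 0.02, ..., 10.00.
grid = [0.01 * k for k in range(1, 1001)]
best = max(grid, key=lambda lam: poisson_loglik_from_total(n, total, lam))
print(best, total / n)
```

The offsets should all coincide, and the grid maximizer should land at the grid point nearest $T/n$.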

5. Normal model

Which normal statistic is sufficient?

For $X_1,\dots,X_n\overset{iid}{\sim}N(\mu,\sigma^2)$, the sufficient statistic depends on which parameters are unknown.

Unknown parameter(s)	Sufficient statistic	Reason
$\mu$ only, $\sigma^2$ known	$\sum X_i$ or $\bar X$	The $\mu$-dependent factor involves the data only through $\sum X_i$
$\sigma^2$ only, $\mu$ known	$\sum (X_i-\mu)^2$	Likelihood depends on the data only through the squared deviations from $\mu$
Both $\mu,\sigma^2$ unknown	$(\sum X_i,\sum X_i^2)$	Equivalent to $(\bar X,S^2)$
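The reasons in the table can be read off from one expansion of the joint density, using $\sum (x_i-\mu)^2=\sum x_i^2-2\mu\sum x_i+n\mu^2$:

$$\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)=(2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2+\frac{\mu}{\sigma^2}\sum_{i=1}^n x_i-\frac{n\mu^2}{2\sigma^2}\right).$$

If $\sigma^2$ is known, the factor $\exp\!\left(-\sum x_i^2/(2\sigma^2)\right)$ is free of $\mu$ and can be absorbed into $h$, leaving a $g_\mu$ that depends on the data only through $\sum x_i$. If both parameters are unknown, the whole right-hand side plays the role of $g_\theta$ with $h\equiv 1$, and it depends on the data through the pair $(\sum x_i,\sum x_i^2)$.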

Sample size $n$: 20; mean $\mu$: 1.00; standard deviation $\sigma$: 1.50

Data cloud and summaries

For unknown $\mu$ and $\sigma^2$, two summaries are needed: location and spread.
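As a hedged illustration, the sketch below uses made-up data and introduces the helper names `normal_loglik` and `normal_loglik_from_sums` for this example only. It shows that the two sums $(\sum x_i,\sum x_i^2)$ are enough to recover $\bar X$ and $S^2$ and to evaluate the normal log-likelihood at any $(\mu,\sigma^2)$.

```python
from math import isclose, log, pi
from statistics import mean, variance

def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of an iid N(mu, sigma2) sample, from the raw data."""
    n = len(xs)
    return -0.5 * n * log(2 * pi * sigma2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)

def normal_loglik_from_sums(n, s1, s2, mu, sigma2):
    """Same log-likelihood, using only n, s1 = sum(x) and s2 = sum(x^2)."""
    return -0.5 * n * log(2 * pi * sigma2) - (s2 - 2 * mu * s1 + n * mu ** 2) / (2 * sigma2)

xs = [0.3, 2.1, 1.4, -0.8, 3.0, 1.1, 0.6, 2.5]  # hypothetical sample
n, s1, s2 = len(xs), sum(xs), sum(x * x for x in xs)

# Location and spread recovered from the two sums alone.
xbar = s1 / n
s_sq = (s2 - n * xbar ** 2) / (n - 1)
print(isclose(xbar, mean(xs)), isclose(s_sq, variance(xs)))

# The raw data add nothing once the two sums are known.
for mu, sigma2 in [(0.0, 1.0), (1.0, 2.25), (2.0, 0.5)]:
    full = normal_loglik(xs, mu, sigma2)
    reduced = normal_loglik_from_sums(n, s1, s2, mu, sigma2)
    print(mu, sigma2, isclose(full, reduced))
```

Note that computing $S^2$ as $(\sum x_i^2-n\bar x^2)/(n-1)$ can lose precision for data with a large mean and small spread; it is used here only to make the equivalence with $(\bar X,S^2)$ visible.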

6. Minimal sufficient statistics

The smallest useful compression

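A sufficient statistic may not be the smallest possible: the full sample is always sufficient but compresses nothing. A sufficient statistic $T$ is minimal sufficient if it is a function of every other sufficient statistic. A convenient criterion: $T$ is minimal sufficient if, for every pair of sample points $x$ and $y$,

$$\frac{f_\theta(x)}{f_\theta(y)}\text{ is independent of }\theta\quad\Longleftrightarrow\quad T(x)=T(y).$$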

This likelihood-ratio criterion is often the easiest way to identify minimal sufficiency.

Example: for Bernoulli and Poisson samples, $T=\sum X_i$ is not only sufficient but minimal sufficient.

Likelihood-ratio check for two samples

For two samples $x$ and $y$ from the same model, the likelihood ratio $f_\theta(x)/f_\theta(y)$ is free of the parameter exactly when the minimal sufficient statistic takes the same value on both samples.
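The comparison can be sketched for the Poisson model as follows. The samples and the helper names `log_ratio` and `is_constant` are illustrative assumptions. Working on the log scale, the ratio is parameter-free when the log-ratio takes the same value at every $\lambda$.

```python
from math import lgamma, log

def poisson_loglik(xs, lam):
    """Log-likelihood of an iid Poisson(lam) sample."""
    return sum(-lam + x * log(lam) - lgamma(x + 1) for x in xs)

def log_ratio(xs, ys, lam):
    """log of f_lam(xs) / f_lam(ys) for two samples of the same size."""
    return poisson_loglik(xs, lam) - poisson_loglik(ys, lam)

def is_constant(values, tol=1e-9):
    """True when all values agree up to a small numerical tolerance."""
    return max(values) - min(values) < tol

lams = [0.5, 1.0, 2.0, 4.0]
sample_a = [1, 2, 3]        # total 6
same_total = [0, 0, 6]      # total 6: ratio should not depend on lam
other_total = [1, 1, 1]     # total 3: ratio should vary with lam

print(is_constant([log_ratio(sample_a, same_total, lam) for lam in lams]))
print(is_constant([log_ratio(sample_a, other_total, lam) for lam in lams]))
```

With equal totals the $\lambda$ terms cancel and only the factorial terms remain; with unequal totals a factor $\lambda^{\sum x_i-\sum y_i}$ survives, so the ratio changes with $\lambda$.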

7. Quick checks

Self-check questions

Q1

For $X_i\sim\mathrm{Poisson}(\lambda)$, is $\bar X$ sufficient?

Q2

For $X_i\sim N(\mu,\sigma^2)$ with both parameters unknown, is $\bar X$ alone sufficient?

Q3

What does factorization mean intuitively?