MATH 5010 Section 7 — Probability Inequalities

Concept map

Why probability inequalities?

Probability inequalities give useful upper bounds when an exact probability is hard to compute. They turn limited information such as a mean, variance, or moment-generating function into a statement about tail probabilities.

Markov
Uses only $E[X]$ for nonnegative $X$.

Chebyshev
Uses $E[X]$ and $\operatorname{Var}(X)$.

Chernoff
Uses $E[e^{tX}]$ and optimizes over $t$.

Jensen
Compares $\varphi(E[X])$ and $E[\varphi(X)]$.

$$P(X\ge a)\le \frac{E[X]}{a}\quad (X\ge 0),$$ $$P(|X-\mu|\ge k)\le \frac{\sigma^2}{k^2},$$ $$P(X\ge a)\le \inf_{t>0} e^{-ta}E[e^{tX}],$$ $$\varphi(E[X])\le E[\varphi(X)]\quad \text{for convex }\varphi.$$

Interactive 1

Markov inequality: one mean controls one tail

For a nonnegative random variable, large values cannot happen too often unless the mean is large. Try an exponential random variable $X\sim \operatorname{Exp}(\lambda)$, whose true tail is $P(X\ge a)=e^{-\lambda a}$.

Rate $\lambda$: 1.00 Threshold $a$: 2.00

Derivation

Since $X\ge 0$, $E[X]\ge aP(X\ge a)$. Therefore $P(X\ge a)\le E[X]/a=1/(\lambda a)$. The bound is useful only after truncating at $1$, because probabilities cannot exceed $1$.

The shaded tail is the true exponential probability. The horizontal guide shows the Markov upper bound.

Interactive 2

Chebyshev inequality: variance controls spread

Chebyshev's inequality works for any distribution with finite variance. It says that the probability of being far from the mean is small if the variance is small.

Mean $\mu$: 0.0 Standard deviation $\sigma$: 1.0 Distance $k$: 2.0

The curve is a normal example. Chebyshev's bound is distribution-free, so it is often conservative.

$$P(|X-\mu|\ge k)\le \frac{\sigma^2}{k^2}=\frac{1}{(k/\sigma)^2}.$$

Interactive 3

Chernoff bound for an exponential tail

Chernoff uses exponential Markov inequality. For $X\sim\operatorname{Exp}(\lambda)$, $E[e^{tX}]=\lambda/(\lambda-t)$ for $0

Rate $\lambda$: 1.00 Threshold $a$: 3.00 Choose $t$: 0.60

The graph shows the Chernoff objective $e^{-ta}\lambda/(\lambda-t)$ as a function of $t$.

$$P(X\ge a)\le e^{-ta}\frac{\lambda}{\lambda-t},\qquad 01/\lambda, \qquad P(X\ge a)\le \lambda a e^{1-\lambda a}.$$

Interactive 4

Compare true tail, Markov, Chebyshev, and Chernoff

This reproduces the key Section 7 homework-style comparison for $X\sim\operatorname{Exp}(\lambda)$. For large $a$, Chernoff is much closer because it decays exponentially like the true tail.

Rate $\lambda$: 1.00 Threshold $a$: 4.00

Log-scale plot: lower curves are tighter upper bounds. Values above $1$ are clipped to $1$ because probabilities cannot exceed $1$.

Interactive 5

Jensen inequality: convex functions reward variability

For a convex function $\varphi$, the average value of $\varphi(X)$ is at least $\varphi$ of the average value of $X$. Move probability mass between two points and watch the Jensen gap change.

Point $x_1$: 0.0 Point $x_2$: 2.0 Probability $P(X=x_1)=p$: 0.50 Convex function:

The chord between $(x_1,\varphi(x_1))$ and $(x_2,\varphi(x_2))$ lies above a convex graph. This is the geometric reason for Jensen's inequality.

Practice

Quick checks

1. Markov

If $X\ge0$ and $E[X]=5$, what is an upper bound for $P(X\ge20)$?

2. Chebyshev

If $\mu=10$, $\sigma=2$, what bound do we get for $P(|X-10|\ge6)$?

3. Jensen

For convex $\varphi(x)=x^2$, Jensen gives which familiar inequality?