Why probability inequalities?
Probability inequalities give useful upper bounds when an exact probability is hard to compute. They turn limited information such as a mean, variance, or moment-generating function into a statement about tail probabilities.
Uses only $E[X]$ for nonnegative $X$.
Uses $E[X]$ and $\operatorname{Var}(X)$.
Uses $E[e^{tX}]$ and optimizes over $t$.
Compares $\varphi(E[X])$ and $E[\varphi(X)]$.
Markov inequality: one mean controls one tail
For a nonnegative random variable, large values cannot happen too often unless the mean is large. Try an exponential random variable $X\sim \operatorname{Exp}(\lambda)$, whose true tail is $P(X\ge a)=e^{-\lambda a}$.
Derivation
Since $X\ge 0$, $E[X]\ge aP(X\ge a)$. Therefore $P(X\ge a)\le E[X]/a=1/(\lambda a)$. The bound is useful only after truncating at $1$, because probabilities cannot exceed $1$.The shaded tail is the true exponential probability. The horizontal guide shows the Markov upper bound.
Chebyshev inequality: variance controls spread
Chebyshev's inequality works for any distribution with finite variance. It says that the probability of being far from the mean is small if the variance is small.
The curve is a normal example. Chebyshev's bound is distribution-free, so it is often conservative.
Chernoff bound for an exponential tail
Chernoff uses exponential Markov inequality. For $X\sim\operatorname{Exp}(\lambda)$,
$E[e^{tX}]=\lambda/(\lambda-t)$ for $0 The graph shows the Chernoff objective $e^{-ta}\lambda/(\lambda-t)$ as a function of $t$.
Compare true tail, Markov, Chebyshev, and Chernoff
This reproduces the key Section 7 homework-style comparison for $X\sim\operatorname{Exp}(\lambda)$. For large $a$, Chernoff is much closer because it decays exponentially like the true tail.
Log-scale plot: lower curves are tighter upper bounds. Values above $1$ are clipped to $1$ because probabilities cannot exceed $1$.
Jensen inequality: convex functions reward variability
For a convex function $\varphi$, the average value of $\varphi(X)$ is at least $\varphi$ of the average value of $X$. Move probability mass between two points and watch the Jensen gap change.
The chord between $(x_1,\varphi(x_1))$ and $(x_2,\varphi(x_2))$ lies above a convex graph. This is the geometric reason for Jensen's inequality.
Quick checks
1. Markov
If $X\ge0$ and $E[X]=5$, what is an upper bound for $P(X\ge20)$?
2. Chebyshev
If $\mu=10$, $\sigma=2$, what bound do we get for $P(|X-10|\ge6)$?
3. Jensen
For convex $\varphi(x)=x^2$, Jensen gives which familiar inequality?