MATH 5010 Section 8 — Random Samples and Order Statistics

1. From one random variable to a sample

The central object: $X_1,\ldots,X_n$

A random sample from a distribution $F$ means

$$X_1,X_2,\ldots,X_n \stackrel{iid}{\sim} F.$$

Identically distributed

Each $X_i$ has the same CDF $F$, and therefore the same mean $\mu$ and variance $\sigma^2$ whenever these exist.

Independent

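Knowing any of the observations does not change the distribution of the others. Equivalently, the joint CDF factors: $P(X_1\le x_1,\ldots,X_n\le x_n)=\prod_{i=1}^n F(x_i)$.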

Statistics

A statistic is any function of the sample, such as $\bar X$, $S^2$, $X_{(1)}$, or $X_{(n)}$.

$$\bar X=\frac1n\sum_{i=1}^n X_i,\qquad S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2.$$
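The two formulas above translate directly into code. The following is a minimal sketch using only the Python standard library; the helper names `sample_mean` and `sample_variance`, the seed, and the choice of a standard normal parent distribution are illustrative assumptions, not part of the course materials.

```python
import random
import statistics


def sample_mean(xs):
    """Return the sample mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Return S^2 = (1/(n-1)) * sum of (x_i - xbar)^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("S^2 needs at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Assumption: a standard normal parent distribution F, seeded for repeatability.
rng = random.Random(5010)
sample = [rng.gauss(0.0, 1.0) for _ in range(30)]

xbar = sample_mean(sample)
s2 = sample_variance(sample)
x_min, x_max = min(sample), max(sample)  # the order statistics X_(1) and X_(n)

print(f"mean={xbar:.4f}  S^2={s2:.4f}  min={x_min:.4f}  max={x_max:.4f}")
```

The standard library's `statistics.variance` also divides by $n-1$, so it can serve as a cross-check on `sample_variance`.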

2. Interactive random sample

Generate one sample

Settings: sample size $n$: 30; Bernoulli probability $p$: 0.50

Sample histogram

The histogram shows a single realized sample. Repeated samples fluctuate, but their long-run behavior is governed by the parent distribution.
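One realized sample can also be drawn in a few lines of code. This is a rough sketch assuming a Bernoulli parent distribution with sample size 30 and $p=0.5$; the function name `bernoulli_sample`, the seed, and the text histogram are illustrative choices.

```python
import random
from collections import Counter


def bernoulli_sample(n, p, rng):
    """Draw n iid Bernoulli(p) observations (1 with probability p, else 0)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


rng = random.Random(1)  # fixed seed so the same sample is realized every run
sample = bernoulli_sample(30, 0.5, rng)

# A text histogram: one bar per possible value, bar length = observed count.
counts = Counter(sample)
for value in (0, 1):
    print(value, "#" * counts[value], counts[value])
```

Changing the seed gives a different realized sample, which is the fluctuation described above.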

3. Sampling distribution

Many sample means

Generate many independent samples of size $n$. Each sample gives one $\bar X$. The histogram below is an empirical sampling distribution.

Number of repeated samples: 1000

$$E[\bar X]=\mu,\qquad \operatorname{Var}(\bar X)=\frac{\sigma^2}{n}.$$

Histogram of $\bar X$

As $n$ grows, the distribution of $\bar X$ becomes more concentrated around $\mu$: its standard deviation is $\sigma/\sqrt n$, so quadrupling $n$ halves the spread.
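The moment formulas for $\bar X$ can be checked by simulation. The sketch below assumes a Bernoulli(0.5) parent distribution, so $\mu=p$ and $\sigma^2=p(1-p)$, with $n=30$ and 1000 repeated samples; the function name and seed are hypothetical.

```python
import random
import statistics


def simulate_sample_means(n, p, reps, rng):
    """Return reps sample means, each from a fresh Bernoulli(p) sample of size n."""
    means = []
    for _ in range(reps):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        means.append(sum(sample) / n)
    return means


rng = random.Random(2024)
n, p, reps = 30, 0.5, 1000
means = simulate_sample_means(n, p, reps, rng)

emp_mean = statistics.mean(means)     # should be near mu = p
emp_var = statistics.variance(means)  # should be near sigma^2 / n
theory_var = p * (1 - p) / n

print(f"mean of sample means: {emp_mean:.4f} (theory {p})")
print(f"variance of sample means: {emp_var:.5f} (theory {theory_var:.5f})")
```

With only 1000 repetitions the empirical values agree with theory up to simulation noise, not exactly.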

4. Empirical distribution

Empirical CDF

The empirical CDF estimates the true CDF using the sample:

$$\widehat F_n(x)=\frac{1}{n}\sum_{i=1}^n \mathbf{1}\{X_i\le x\}.$$

For Uniform(0,1), the true CDF is $F(x)=x$ on $0\le x\le 1$. Generate uniform samples and watch the step function approach the diagonal.

Uniform sample size: 25

The maximum vertical gap $D_n=\sup_x|\widehat F_n(x)-F(x)|$, known as the Kolmogorov-Smirnov statistic, is a simple way to measure fit.
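Both the ECDF and the maximum vertical gap are easy to compute directly. The sketch below assumes Uniform(0,1) data, so the true CDF is $F(x)=x$; the names `make_ecdf` and `max_gap_uniform`, the seed, and the sample sizes beyond 25 are illustrative.

```python
import bisect
import random


def make_ecdf(sample):
    """Return the step function F_hat_n(x) = (1/n) * #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)

    def f_hat(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect.bisect_right(xs, x) / n

    return f_hat


def max_gap_uniform(sample):
    """Largest vertical gap between the ECDF and F(x) = x on [0, 1].

    The ECDF jumps from (i-1)/n to i/n at the i-th sorted observation,
    so it is enough to check the gap just before and at each jump.
    """
    xs = sorted(sample)
    n = len(xs)
    # enumerate is 0-based: index i corresponds to the (i+1)-th order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(25)
gaps = {}
for n in (25, 250, 2500):
    sample = [rng.random() for _ in range(n)]
    gaps[n] = max_gap_uniform(sample)
    print(n, round(gaps[n], 4))
```

The gap is random, so it need not shrink at every step, but it tends toward zero as $n$ grows.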

ECDF vs true CDF

5. Order statistics

$k$-th smallest observation

Sort the sample:

$$X_{(1)}\le X_{(2)}\le \cdots \le X_{(n)}.$$

For $X_i\sim \operatorname{Uniform}(0,1)$, the $k$-th order statistic has density

$$f_{X_{(k)}}(x)=\frac{n!}{(k-1)!(n-k)!}x^{k-1}(1-x)^{n-k},\quad 0<x<1.$$

Settings: $n$: 10; $k$: 5

This is the Beta density with parameters $k$ and $n+1-k$, so for Uniform(0,1) samples, $X_{(k)}\sim \operatorname{Beta}(k,n+1-k)$.
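The Beta result can be checked by sorting simulated uniform samples and picking out the $k$-th smallest value. This sketch assumes $n=10$ and $k=5$; the function names, seed, and number of repetitions are illustrative. The moments used for comparison are the standard $\operatorname{Beta}(a,b)$ mean $a/(a+b)$ and variance $ab/((a+b)^2(a+b+1))$.

```python
import math
import random
import statistics


def order_stat_density(x, n, k):
    """Density of the k-th order statistic of n iid Uniform(0,1) draws."""
    c = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return c * x ** (k - 1) * (1 - x) ** (n - k)


def simulate_order_stat(n, k, reps, rng):
    """Return reps simulated values of X_(k) from Uniform(0,1) samples of size n."""
    return [sorted(rng.random() for _ in range(n))[k - 1] for _ in range(reps)]


rng = random.Random(7)
n, k, reps = 10, 5, 5000
draws = simulate_order_stat(n, k, reps, rng)

a, b = k, n + 1 - k  # Beta parameters
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))

sim_mean = statistics.mean(draws)
sim_var = statistics.variance(draws)
print(f"mean: simulated {sim_mean:.4f}, Beta {beta_mean:.4f}")
print(f"variance: simulated {sim_var:.5f}, Beta {beta_var:.5f}")
```

Plotting a histogram of `draws` against `order_stat_density` would give the comparison between simulation and Beta curve.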

Simulation vs Beta curve

6. Simulation preview

Monte Carlo integration

Random samples can approximate integrals. For $U_i\sim\operatorname{Uniform}(0,1)$,

$$\int_0^1 g(x)\,dx = E[g(U)]\approx \frac1m\sum_{i=1}^m g(U_i).$$

Number of Monte Carlo points $m$: 1000

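Monte Carlo error decreases slowly. When $\operatorname{Var}(g(U))$ is finite, the standard error of the estimate is $\sqrt{\operatorname{Var}(g(U))/m}$, so it shrinks like $1/\sqrt m$: halving the error takes four times as many points.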
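A Monte Carlo estimate and its standard error can be computed together. The sketch below assumes the integrand $g(x)=x^2$, whose exact integral on $[0,1]$ is $1/3$; the integrand, the function name `mc_integrate`, and the seed are illustrative choices.

```python
import math
import random


def mc_integrate(g, m, rng):
    """Estimate the integral of g over [0, 1] from m uniform draws.

    Returns (estimate, standard_error), where the standard error is
    sqrt(sample variance of g(U_i) / m).
    """
    values = [g(rng.random()) for _ in range(m)]
    estimate = sum(values) / m
    var = sum((v - estimate) ** 2 for v in values) / (m - 1)
    return estimate, math.sqrt(var / m)


def g(x):
    # Assumed integrand for illustration; its exact integral on [0, 1] is 1/3.
    return x ** 2


rng = random.Random(42)
estimate, std_err = mc_integrate(g, 1000, rng)
print(f"estimate={estimate:.4f}  standard error={std_err:.4f}  exact={1 / 3:.4f}")
```

Rerunning with a larger `m` should shrink the reported standard error roughly in proportion to $1/\sqrt m$.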

Estimate as points accumulate

7. Quick checks

Practice questions

1. If $X_1,\ldots,X_n$ are iid with mean $\mu$ and variance $\sigma^2$, what are $E[\bar X]$ and $\operatorname{Var}(\bar X)$?

Answer: $E[\bar X]=\mu$ and $\operatorname{Var}(\bar X)=\sigma^2/n$.

2. For a Uniform(0,1) sample of size $n$, what is $E[X_{(k)}]$?

Answer: Since $X_{(k)}\sim \operatorname{Beta}(k,n+1-k)$, $E[X_{(k)}]=k/(n+1)$.

3. For $n=5$, what are the expected minimum, median, and maximum of a Uniform(0,1) sample?

Answer: $E[X_{(1)}]=1/6$, $E[X_{(3)}]=3/6=1/2$, and $E[X_{(5)}]=5/6$.

4. Why is $S^2$ divided by $n-1$ instead of $n$?

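Answer: The deviations $X_i-\bar X$ sum to zero, so only $n-1$ of them are free to vary; estimating $\mu$ by $\bar X$ uses up one degree of freedom. As a result, $E\left[\sum_{i=1}^n (X_i-\bar X)^2\right]=(n-1)\sigma^2$, and dividing by $n-1$ makes $S^2$ an unbiased estimator of $\sigma^2$.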
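The effect of the divisor can be seen by averaging both versions of the estimator over many small samples. This sketch assumes Uniform(0,1) data, for which $\sigma^2=1/12$, with $n=5$; the function name, seed, and number of repetitions are illustrative.

```python
import random


def variance_estimates(xs):
    """Return (sum of squares / (n-1), sum of squares / n) for one sample."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (n - 1), ss / n


rng = random.Random(11)
n, reps = 5, 20000
sigma2 = 1 / 12  # variance of Uniform(0,1)

total_unbiased = 0.0
total_biased = 0.0
for _ in range(reps):
    xs = [rng.random() for _ in range(n)]
    s2, v_n = variance_estimates(xs)
    total_unbiased += s2
    total_biased += v_n

avg_s2 = total_unbiased / reps  # should be near sigma^2
avg_vn = total_biased / reps    # should be near (n-1)/n * sigma^2

print(f"true variance        : {sigma2:.4f}")
print(f"average with n-1     : {avg_s2:.4f}")
print(f"average with n       : {avg_vn:.4f}")
```

The divide-by-$n$ version underestimates $\sigma^2$ by the factor $(n-1)/n$ on average, which is most visible at small $n$.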