15 Chapter 14: Hypothesis Tests I — Methods of Finding Tests

This chapter introduces the foundations of hypothesis testing and several general methods for constructing tests. The central ideas are null and alternative hypotheses, test statistics, rejection regions, Type I and Type II errors, significance level, power, $p$-values, likelihood ratio tests, Bayesian tests, union-intersection tests, intersection-union tests, and the Neyman–Pearson lemma.

Topics

Hypothesis testing; null and alternative hypotheses; Type I and Type II errors; significance level; power; $p$-values; likelihood ratio tests; Bayesian tests; union-intersection tests; intersection-union tests; radar detection; coin testing; normal mean tests; sufficient statistics and LRTs.

16 Introduction to Hypothesis Testing

This section develops the mathematical theory behind hypothesis tests and explains how classical tests arise from general testing principles.

In introductory statistics, students often learn concrete tests such as the $Z$-test, $t$-test, chi-square test, two-proportion test, two-sample mean test, and $F$-test. Here we study the theoretical principles behind such procedures.

Key idea

Main goal. A hypothesis test uses observed data to decide between two competing statements about a population parameter.

For example, a pharmaceutical company may want to know whether a new drug is effective in treating a disease. A natural pair of hypotheses is \[H_0: \text{the drug is not effective}, \qquad H_1: \text{the drug is effective}.\] The null hypothesis $H_0$ represents the default or no-effect claim, while the alternative hypothesis $H_1$ represents the new effect or departure from the default.

16.1 The testing decision

A hypothesis test is a rule that specifies which sample values lead us to reject $H_0$ and which sample values lead us not to reject $H_0$.

Definition

Definition 1 (Hypothesis test). Let $X=(X_1,\ldots,X_n)$ be a sample from a population distribution depending on a parameter $\theta$. A hypothesis test is a rule that divides the sample space into two regions:

a rejection region $R$, where we reject $H_0$;
an acceptance region $A=R^c$, where we fail to reject $H_0$.

Equivalently, a test may be specified by a test statistic $W(X)$ and a threshold rule.

Remark

Remark 2 (Terminology). In modern statistical language, one often says “fail to reject $H_0$” instead of “accept $H_0$.” This emphasizes that lack of evidence against $H_0$ is not the same as proof that $H_0$ is true.

16.2 Mathematical formulation

The mathematical formulation of testing is based on partitioning the parameter space.

Definition

Definition 3 (Null and alternative hypotheses). Let $\Theta$ be the parameter space. A hypothesis test compares \[H_0: \theta\in \Theta_0 \qquad \text{versus} \qquad H_1: \theta\in \Theta_0^c.\] The set $\Theta_0$ is the null parameter space, and $\Theta_0^c$ is the alternative parameter space.

Examples include \[H_0:\mu=100 \quad \text{versus}\quad H_1:\mu\ne 100,\] or \[H_0:\mu\ge 100 \quad \text{versus}\quad H_1:\mu<100.\]

17 A First Example: Testing Whether a Coin Is Fair

This example introduces the main ingredients of hypothesis testing through the familiar problem of checking whether a coin is fair.

Suppose $\theta$ is the probability of heads. We want to test \[H_0:\theta=\frac12 \qquad \text{versus}\qquad H_1:\theta\ne \frac12.\] Let $X_i\sim \operatorname{Bernoulli}(\theta)$ represent the result of the $i$th toss, where $X_i=1$ for heads and $X_i=0$ for tails. Let \[X=X_1+\cdots+X_n.\] Then \[X\sim \operatorname{Binomial}(n,\theta).\]

Example

Example 4 (Coin test with 100 tosses). Suppose the coin is tossed $n=100$ times. Under $H_0$, $\theta=1/2$, so \[X\sim \operatorname{Binomial}\left(100,\frac12\right), \qquad \mathbb{E}_{H_0}[X]=50, \qquad \operatorname{Var}_{H_0}(X)=25.\] A natural test rejects $H_0$ when $X$ is too far from $50$.

Solution

Let $t>0$ be a threshold. The test is \[\text{fail to reject }H_0 \text{ if } |X-50|\le t, \qquad \text{reject }H_0 \text{ if } |X-50|>t.\] The threshold should control the probability of rejecting a fair coin: \[\mathbb{P}(\text{Type I error})= \mathbb{P}_{H_0}(|X-50|>t).\] Using the central limit theorem, under $H_0$, \[Y=\frac{X-n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}} =\frac{X-50}{5} \approx \operatorname{Normal}(0,1).\] For significance level $\alpha=0.05$, the two-sided critical value is approximately $1.96$. Thus we reject $H_0$ when \[\left|\frac{X-50}{5}\right|>1.96.\] Equivalently, \[|X-50|>9.8.\] Using integer values, we fail to reject $H_0$ approximately when \[X\in\{41,42,\ldots,59\},\] and reject $H_0$ otherwise.
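The cutoff above comes from a normal approximation, so the true Type I error of the integer rule need not be exactly $0.05$. The following sketch, using only the Python standard library, compares the exact binomial probability of rejecting a fair coin with the approximate one. The helper name `binom_pmf` is introduced here for illustration and is not part of the original notes.

```python
from math import comb
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, theta0 = 100, 0.5

# Acceptance region from the text: 41 <= X <= 59.
accept = range(41, 60)
exact_alpha = 1.0 - sum(binom_pmf(k, n, theta0) for k in accept)

# Normal approximation of P(|X - 50| > 9.8), without continuity correction.
sd = (n * theta0 * (1 - theta0)) ** 0.5  # equals 5
approx_alpha = 2.0 * (1.0 - NormalDist().cdf(9.8 / sd))

print(f"exact Type I error  : {exact_alpha:.4f}")
print(f"normal approximation: {approx_alpha:.4f}")
```

The exact value comes out slightly above the nominal $0.05$, which is why the text says the integer acceptance region holds only approximately.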

17.1 Type I and Type II errors

Every hypothesis test can make two kinds of mistakes.

Definition

Definition 5 (Type I and Type II errors). For a hypothesis test of $H_0$ versus $H_1$:

A Type I error occurs when $H_0$ is true but we reject $H_0$.
A Type II error occurs when $H_1$ is true but we fail to reject $H_0$.

The Type I error probability is usually denoted by $\alpha$, and the Type II error probability is usually denoted by $\beta$.

For the coin example, if $\theta_1\ne 1/2$, then \[\beta(\theta_1)=\mathbb{P}_{\theta_1}(\text{fail to reject }H_0) =\mathbb{P}_{\theta_1}(41\le X\le 59).\] The power function is \[\operatorname{Power}(\theta)=\mathbb{P}_\theta(\text{reject }H_0)=1-\beta(\theta)\] for $\theta$ in the alternative.

Decision / Truth	$H_0$ true	$H_1$ true
Reject $H_0$	Type I error	Correct decision
Fail to reject $H_0$	Correct decision	Type II error

Warning

Trade-off. For a fixed sample size, making $\alpha$ smaller often makes $\beta$ larger. Reducing both errors usually requires increasing the sample size or using a more informative statistic.
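The trade-off can be seen numerically in the coin example. The sketch below computes exact binomial error probabilities for the rule "reject when $|X-50|>t$" at a few integer thresholds. The alternative value $\theta_1=0.6$ and the function names are assumptions chosen for illustration.

```python
from math import comb


def binom_prob_range(lo: int, hi: int, n: int, p: float) -> float:
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


def coin_errors(t: int, theta1: float, n: int = 100) -> tuple[float, float]:
    """Return (alpha, beta) for the test that rejects when |X - n/2| > t."""
    half = n // 2
    lo, hi = half - t, half + t  # acceptance region for integer t
    alpha = 1.0 - binom_prob_range(lo, hi, n, 0.5)  # reject a fair coin
    beta = binom_prob_range(lo, hi, n, theta1)      # miss a biased coin
    return alpha, beta


# t = 9 reproduces the acceptance region {41, ..., 59} used in the text.
for t in (7, 9, 11):
    alpha, beta = coin_errors(t, theta1=0.6)
    print(f"t={t:2d}  alpha={alpha:.4f}  beta={beta:.4f}  power={1 - beta:.4f}")
```

Widening the acceptance region lowers $\alpha$ and raises $\beta$, which is the trade-off described above.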

18 Significance Level, Critical Values, and $p$-Values

This section explains the practical language of hypothesis testing: significance level, rejection region, critical value, and $p$-value.

18.1 Significance level

The significance level controls the probability of falsely rejecting the null hypothesis.

Definition

Definition 6 (Level-$\alpha$ test). A test has significance level $\alpha$ if \[\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\text{reject }H_0)\le \alpha.\] For a simple null hypothesis $H_0:\theta=\theta_0$, this reduces to \[\mathbb{P}_{\theta_0}(\text{reject }H_0)\le \alpha.\]

18.2 $p$-value

The $p$-value measures how extreme the observed statistic is under the null model.

Definition

Definition 7 ($p$-value). For an observed statistic $W(x_1,\ldots,x_n)$, the $p$-value is the probability, computed under $H_0$, of observing a test statistic at least as extreme as the one observed. For a right-sided test, this is often \[p\text{-value}=\mathbb{P}_{H_0}\left(W(X_1,\ldots,X_n)\ge W(x_1,\ldots,x_n)\right).\] For a left-sided test, this is often \[p\text{-value}=\mathbb{P}_{H_0}\left(W(X_1,\ldots,X_n)\le W(x_1,\ldots,x_n)\right).\]

Key idea

Interpretation. The $p$-value is the smallest significance level at which the observed data would lead to rejection of $H_0$.

A small $p$-value indicates that the observed data are unusual under $H_0$, so the data provide evidence against $H_0$.

19 Example: Radar Aircraft Detection

This example shows how hypothesis testing appears in signal detection and illustrates the trade-off between Type I and Type II errors.

A radar system receives a signal $X$. If no aircraft is present, then \[X=W.\] If an aircraft is present, then \[X=1+W.\] Here \[W\sim \operatorname{Normal}\left(0,\sigma^2\right), \qquad \sigma^2=\frac19.\] Equivalently, \[X=\theta+W, \qquad \theta=\begin{cases} 0, & \text{no aircraft is present},\\ 1, & \text{an aircraft is present}. \end{cases}\] We test \[H_0:\theta=0 \qquad \text{versus}\qquad H_1:\theta=1.\]

Example

Example 8 (Level $0.05$ radar test). Construct a level $\alpha=0.05$ test that rejects $H_0$ when $X>c$.

Solution

Under $H_0$, $X=W\sim \operatorname{Normal}(0,1/9)$. Thus \[\mathbb{P}_{H_0}(X>c)=\mathbb{P}(3X>3c)=1-\Phi(3c).\] To make this probability equal to $0.05$, choose \[1-\Phi(3c)=0.05.\] Therefore \[3c=z_{0.95}\approx 1.645, \qquad c\approx \frac{1.645}{3}=0.5483.\] The level-$0.05$ test is \[\text{reject }H_0 \quad \text{if} \quad X>0.5483.\]
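As a numerical check of this threshold, here is a minimal sketch using `statistics.NormalDist` from the Python standard library. The function name `radar_critical_value` is hypothetical and introduced only for this illustration.

```python
from statistics import NormalDist

SIGMA = 1 / 3  # noise standard deviation, since sigma^2 = 1/9


def radar_critical_value(alpha: float, sigma: float = SIGMA) -> float:
    """Threshold c with P_{H0}(X > c) = alpha when X ~ Normal(0, sigma^2)."""
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(1.0 - alpha)


c = radar_critical_value(0.05)
print(f"c = {c:.4f}")
```

Working on the scale of $X$ directly avoids the manual standardization step, because the quantile of $\operatorname{Normal}(0,\sigma^2)$ already equals $\sigma z_{1-\alpha}$.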

Example

Example 9 (Type II error for the radar test). For the level $0.05$ radar test above, compute the Type II error probability.

Solution

Under $H_1$, $X=1+W$ with $W\sim \operatorname{Normal}(0,1/9)$. The Type II error probability is \[\beta=\mathbb{P}_{H_1}(X\le c)=\mathbb{P}(1+W\le c)=\mathbb{P}(W\le c-1).\] Standardizing gives \[\beta=\Phi(3(c-1)).\] Using $c=1.645/3\approx 0.5483$, \[3(c-1)\approx -1.355,\] so \[\beta\approx \Phi(-1.355)\approx 0.0877.\] Thus the probability of missing a present aircraft is about $8.77\%$.
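The same calculation can be sketched in a few lines of Python. The variable names are illustrative; the model ($\sigma=1/3$, signal mean $1$ under $H_1$) is taken from the example.

```python
from statistics import NormalDist

sigma = 1 / 3
null_dist = NormalDist(mu=0.0, sigma=sigma)  # X under H0
alt_dist = NormalDist(mu=1.0, sigma=sigma)   # X under H1

c = null_dist.inv_cdf(0.95)  # level-0.05 threshold
beta = alt_dist.cdf(c)       # P_{H1}(X <= c): aircraft present but missed
power = 1.0 - beta

print(f"c = {c:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```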

Example

Example 10 (Evidence check at level $0.01$). Suppose the observed signal is $X=0.6$. Determine whether there is sufficient evidence to reject $H_0$ at significance level $\alpha=0.01$.

Solution

For a right-sided level-$0.01$ test, \[\mathbb{P}_{H_0}(X>c)=0.01.\] Thus \[3c=z_{0.99}\approx 2.326, \qquad c\approx \frac{2.326}{3}=0.7753.\] Since the observed value is \[0.6<0.7753,\] we do not reject $H_0$ at the $0.01$ level.

Example

Example 11 (Power constraint). Find the largest critical value $c$ for which the probability of missing a present aircraft is at most $5\%$. Then compute the resulting significance level.

Solution

We want \[\beta=\mathbb{P}_{H_1}(X\le c)=0.05.\] Since under $H_1$, $X\sim \operatorname{Normal}(1,1/9)$, \[\mathbb{P}(3(X-1)\le 3(c-1))=0.05.\] Thus \[3(c-1)=z_{0.05}\approx -1.645,\] so \[c=1-\frac{1.645}{3}\approx 0.4517.\] The resulting significance level is \[\alpha=\mathbb{P}_{H_0}(X>c)=1-\Phi(3c).\] Since $3c\approx 1.355$, \[\alpha\approx 1-\Phi(1.355)\approx 0.0877.\] To reduce the miss probability to $5\%$, the false alarm probability must increase to about $8.77\%$.
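A short sketch of the reverse calculation, fixing $\beta$ first and then reading off $\alpha$, is given below. It assumes the same radar model and uses only the standard library.

```python
from statistics import NormalDist

sigma = 1 / 3
null_dist = NormalDist(mu=0.0, sigma=sigma)
alt_dist = NormalDist(mu=1.0, sigma=sigma)

target_beta = 0.05
c = alt_dist.inv_cdf(target_beta)  # P_{H1}(X <= c) = 0.05
alpha = 1.0 - null_dist.cdf(c)     # resulting false-alarm probability

print(f"c = {c:.4f}, alpha = {alpha:.4f}")
```

By the symmetry of the two normal models around $1/2$, the resulting $\alpha$ matches the $\beta$ of the level-$0.05$ test in Example 9.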

Example

Example 12 ($p$-value for the radar test). For the observed value $X_0=0.6$, compute the $p$-value for the right-sided radar test.

Solution

Under $H_0$, $X\sim \operatorname{Normal}(0,1/9)$. The standardized observed statistic is \[Z_0=\frac{0.6}{1/3}=1.8.\] For a right-sided test, \[p\text{-value}=\mathbb{P}_{H_0}(X\ge 0.6)=\mathbb{P}(Z\ge 1.8)=1-\Phi(1.8).\] Numerically, \[p\text{-value}\approx 0.0359.\] Therefore:

at $\alpha=0.05$, reject $H_0$;
at $\alpha=0.01$, do not reject $H_0$.
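The $p$-value and the two decisions can be reproduced with the sketch below. The function name `radar_p_value` is an assumption made for illustration.

```python
from statistics import NormalDist


def radar_p_value(x_obs: float, sigma: float = 1 / 3) -> float:
    """Right-sided p-value P_{H0}(X >= x_obs) for X ~ Normal(0, sigma^2)."""
    return 1.0 - NormalDist(mu=0.0, sigma=sigma).cdf(x_obs)


p = radar_p_value(0.6)
print(f"p-value = {p:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```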

20 Classical Test for a Normal Mean

This section reviews how a familiar normal mean test fits into the general framework of test statistics and rejection regions.

Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2),\] where $\sigma^2$ is known. We test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne \mu_0.\]

Under $H_0$, \[\bar X\sim \operatorname{Normal}\left(\mu_0,\frac{\sigma^2}{n}\right).\] Therefore \[Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}\sim \operatorname{Normal}(0,1).\]

A two-sided level-$\alpha$ test rejects $H_0$ when \[|Z|>z_{1-\alpha/2}.\] Equivalently, \[\left|\bar X-\mu_0\right|> z_{1-\alpha/2}\frac{\sigma}{\sqrt n}.\]
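The two-sided rule can be wrapped in a small function. The sketch below is illustrative only: the function name and the numbers in the usage line ($\bar x=101.2$, $\mu_0=100$, $\sigma=5$, $n=36$) are made up for the example and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(xbar: float, mu0: float, sigma: float, n: int,
                     alpha: float = 0.05) -> tuple[float, float, float, bool]:
    """Return (z, critical value, p-value, reject?) for H0: mu = mu0."""
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)      # z_{1 - alpha/2}
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))    # two-sided p-value
    return z, crit, p_value, abs(z) > crit


# Hypothetical numbers for illustration.
z, crit, p_value, reject = two_sided_z_test(xbar=101.2, mu0=100.0, sigma=5.0, n=36)
print(f"z = {z:.3f}, critical value = {crit:.3f}, p = {p_value:.4f}, reject = {reject}")
```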

When $\sigma^2$ is unknown and the data are normal, the classical test statistic is \[t=\frac{\bar X-\mu_0}{S/\sqrt n}, \qquad S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2,\] and under $H_0$, \[t\sim t_{n-1}.\] The two-sided level-$\alpha$ test rejects when \[|t|>t_{n-1,1-\alpha/2}.\]

21 Methods of Finding Tests

This section introduces several general methods for constructing hypothesis tests.

The main methods discussed in this section are:

likelihood ratio tests;
Bayesian tests;
union-intersection tests;
intersection-union tests;
the Neyman-Pearson lemma for simple hypotheses.

Key idea

Guiding principle. Different testing methods correspond to different statistical philosophies. Likelihood ratio tests compare best likelihoods; Bayesian tests use posterior probabilities; union-intersection and intersection-union tests build complex tests from simpler component tests.

22 Likelihood Ratio Tests

Likelihood ratio tests compare how well the null parameter space explains the data against how well the full parameter space explains the data.

Let $X_1,\ldots,X_n$ be an independent and identically distributed sample from a population with density or mass function $f(x\mid\theta)$. For observed data $x=(x_1,\ldots,x_n)$, the likelihood function is \[L(\theta\mid x)=f(x_1,\ldots,x_n\mid\theta)=\prod_{i=1}^n f(x_i\mid\theta).\]

Definition

Definition 13 (Likelihood ratio statistic). For testing \[H_0:\theta\in\Theta_0 \qquad \text{versus}\qquad H_1:\theta\in\Theta_0^c,\] the likelihood ratio statistic is \[\lambda(x)=\frac{\sup_{\theta\in\Theta_0}L(\theta\mid x)}{\sup_{\theta\in\Theta}L(\theta\mid x)}.\] A likelihood ratio test rejects $H_0$ for small values of $\lambda(x)$: \[R=\{x:\lambda(x)\le c\}, \qquad 0\le c\le 1.\]

Since $\Theta_0\subseteq \Theta$, we always have \[0\le \lambda(x)\le 1.\] A small value of $\lambda(x)$ means that the null model fits much worse than the unrestricted model.

22.1 Simple versus simple likelihood ratio test

The simplest LRT compares two point hypotheses.

Definition

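Definition 14 (Simple likelihood ratio). For \[H_0:\theta=\theta_0 \qquad \text{versus}\qquad H_1:\theta=\theta_1,\] define \[\lambda(x)=\frac{L(\theta_0\mid x)}{L(\theta_1\mid x)}.\] The likelihood ratio test rejects $H_0$ for small values of $\lambda(x)$. Unlike the statistic in Definition 13, whose denominator maximizes over the whole parameter space, this ratio can exceed $1$; for cutoffs $c<1$ the two versions define the same rejection region.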

23 Likelihood Ratio Test: Radar Example

This section rederives the radar test using the likelihood ratio method.

Recall that \[X=\theta+W, \qquad W\sim \operatorname{Normal}\left(0,\frac19\right),\] and we test \[H_0:\theta=0 \qquad \text{versus}\qquad H_1:\theta=1.\]

Example

Example 15 (LRT for radar detection). Find the likelihood ratio statistic and show that the LRT rejects for large values of $X$.

Solution

The density under $\theta=0$ is \[L(0\mid x)=\frac{3}{\sqrt{2\pi}}\exp\left(-\frac{9x^2}{2}\right).\] The density under $\theta=1$ is \[L(1\mid x)=\frac{3}{\sqrt{2\pi}}\exp\left(-\frac{9(x-1)^2}{2}\right).\] Thus the likelihood ratio is \[\lambda(x)=\frac{L(0\mid x)}{L(1\mid x)} =\exp\left(-\frac{9x^2}{2}+\frac{9(x-1)^2}{2}\right).\] Simplifying, \[\lambda(x)=\exp\left(\frac{9(1-2x)}{2}\right).\] Since this is a decreasing function of $x$, rejecting for small $\lambda(x)$ is equivalent to rejecting for large $x$. Therefore the LRT has the form \[\text{reject }H_0 \quad \text{if}\quad x>c'.\] For a level $0.05$ test, \[\mathbb{P}_{H_0}(X>c')=0.05,\] so \[c'=\frac{z_{0.95}}{3}\approx \frac{1.645}{3}=0.5483.\] This is the same test constructed directly from Type I error control.
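The simplification and the monotonicity claim can be checked numerically. The sketch below evaluates the ratio of the two normal densities on a small grid and compares it with the closed form $\exp\left(9(1-2x)/2\right)$. The function names and grid points are illustrative.

```python
from math import exp, isclose
from statistics import NormalDist

SIGMA = 1 / 3


def radar_lambda(x: float) -> float:
    """Likelihood ratio L(0 | x) / L(1 | x) computed from the two densities."""
    return NormalDist(0.0, SIGMA).pdf(x) / NormalDist(1.0, SIGMA).pdf(x)


def radar_lambda_closed_form(x: float) -> float:
    """Simplified expression exp(9 (1 - 2x) / 2)."""
    return exp(9.0 * (1.0 - 2.0 * x) / 2.0)


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [radar_lambda(x) for x in xs]
for x, v in zip(xs, values):
    print(f"x = {x:.2f}  lambda = {v:.4g}  closed form = {radar_lambda_closed_form(x):.4g}")
```

The values shrink as $x$ grows, so a small $\lambda(x)$ corresponds to a large observed signal.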

24 LRT for a Normal Mean with Known Variance

This section shows that the classical two-sided $Z$-test is a likelihood ratio test.

Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2),\] where $\sigma^2$ is known. We test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne\mu_0.\]

Example

Example 16 (Normal mean LRT with known variance). Derive the likelihood ratio statistic.

Solution

The likelihood is \[L(\mu\mid x)= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2\right).\] Under $H_0$, the best null value is fixed at $\mu_0$. Under the full parameter space, the MLE is \[\widehat\mu=\bar x.\] Therefore \[\lambda(x)=\frac{L(\mu_0\mid x)}{L(\bar x\mid x)}.\] Using the identity \[\sum_{i=1}^n (x_i-\mu_0)^2 = \sum_{i=1}^n (x_i-\bar x)^2+n(\bar x-\mu_0)^2,\] we obtain \[\lambda(x)=\exp\left(-\frac{n(\bar x-\mu_0)^2}{2\sigma^2}\right).\] This statistic decreases as $|\bar x-\mu_0|$ increases. Thus rejecting for small $\lambda(x)$ is equivalent to rejecting for large \[\left|\frac{\bar X-\mu_0}{\sigma/\sqrt n}\right|.\] Therefore the level-$\alpha$ LRT rejects when \[\left|\frac{\bar X-\mu_0}{\sigma/\sqrt n}\right|>z_{1-\alpha/2}.\] This is the classical two-sided $Z$-test.

The cutoff $c$ in the likelihood-ratio form can be written as \[c=\exp\left(-\frac{z_{1-\alpha/2}^2}{2}\right).\]
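To see that the two forms of the rule agree, the sketch below computes both $\lambda(x)$ with its cutoff $c$ and $Z$ with its cutoff $z_{1-\alpha/2}$ on a small data set. The data, $\mu_0$, and $\sigma$ are hypothetical values chosen for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist, fmean


def z_lrt(data: list[float], mu0: float, sigma: float,
          alpha: float = 0.05) -> tuple[float, float, float, float]:
    """Return (lambda, c, z, z_crit) for the known-variance normal mean LRT."""
    n = len(data)
    z = (fmean(data) - mu0) / (sigma / sqrt(n))
    lam = exp(-z * z / 2.0)  # equals exp(-n (xbar - mu0)^2 / (2 sigma^2))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    c = exp(-z_crit**2 / 2.0)  # likelihood-ratio cutoff
    return lam, c, z, z_crit


# Hypothetical sample for illustration.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.4]
lam, c, z, z_crit = z_lrt(sample, mu0=5.0, sigma=0.5)
print(f"lambda = {lam:.4f} vs c = {c:.4f};  |z| = {abs(z):.3f} vs {z_crit:.3f}")
print("reject via lambda:", lam < c, "| reject via z:", abs(z) > z_crit)
```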

25 LRT for a Normal Mean with Unknown Variance

This section shows that the classical two-sided Student’s $t$-test can also be derived as a likelihood ratio test.

Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2),\] where both $\mu$ and $\sigma^2$ are unknown. We test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne\mu_0.\]

Example

Example 17 (Normal mean LRT with unknown variance). Derive the LRT and connect it to the Student $t$ statistic.

Solution

Under the full model, the MLEs are \[\widehat\mu=\bar x, \qquad \widehat\sigma^2=\frac1n\sum_{i=1}^n (x_i-\bar x)^2.\] Under $H_0$, $\mu=\mu_0$, and the MLE of $\sigma^2$ is \[\widehat\sigma_0^2=\frac1n\sum_{i=1}^n (x_i-\mu_0)^2.\] After substituting these into the normal likelihood, the likelihood ratio is \[\lambda(x)= \left(\frac{\widehat\sigma_0^2}{\widehat\sigma^2}\right)^{-n/2}.\] Using \[\sum_{i=1}^n (x_i-\mu_0)^2 = \sum_{i=1}^n (x_i-\bar x)^2+n(\bar x-\mu_0)^2,\] we get \[\frac{\widehat\sigma_0^2}{\widehat\sigma^2} =1+\frac{n(\bar x-\mu_0)^2}{\sum_{i=1}^n (x_i-\bar x)^2}.\] Let \[S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2\] and \[t=\frac{\bar X-\mu_0}{S/\sqrt n}.\] Then \[\frac{\widehat\sigma_0^2}{\widehat\sigma^2}=1+\frac{t^2}{n-1}.\] Therefore $\lambda(x)$ is small exactly when $|t|$ is large. The LRT rejects $H_0$ when \[|t|>t_{n-1,1-\alpha/2}.\] This is the classical two-sided Student’s $t$-test with $n-1$ degrees of freedom.
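The key identity $\widehat\sigma_0^2/\widehat\sigma^2=1+t^2/(n-1)$ can be verified numerically. The sketch below does so for a small hypothetical data set; the function name and the numbers are assumptions for illustration.

```python
from math import isclose, sqrt
from statistics import fmean, stdev


def t_and_lambda(data: list[float], mu0: float) -> tuple[float, float, float]:
    """Return (t statistic, variance ratio, likelihood ratio lambda)."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation with n - 1 in the denominator
    t = (xbar - mu0) / (s / sqrt(n))
    sigma2_hat = sum((x - xbar) ** 2 for x in data) / n  # unrestricted MLE
    sigma2_0 = sum((x - mu0) ** 2 for x in data) / n     # MLE under H0
    ratio = sigma2_0 / sigma2_hat
    lam = ratio ** (-n / 2)
    return t, ratio, lam


# Hypothetical sample for illustration.
sample = [10.3, 9.1, 11.8, 10.9, 12.4, 9.7, 11.2, 10.6]
t, ratio, lam = t_and_lambda(sample, mu0=10.0)
n = len(sample)
print(f"t = {t:.4f}, ratio = {ratio:.6f}, 1 + t^2/(n-1) = {1 + t * t / (n - 1):.6f}")
print(f"lambda = {lam:.4f}")
```

Turning the resulting $t$ into a decision needs a $t_{n-1}$ quantile, which the Python standard library does not provide; a table value or a statistics package would be used for that step.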

Remark

Remark 18. The classical $Z$-, $t$-, chi-square, proportion, pooled two-sample $t$-, two-proportion, and $F$-tests can all be interpreted as special cases or asymptotic versions of likelihood ratio tests.

26 LRTs and Sufficient Statistics

This section explains why likelihood ratio tests can be computed using sufficient statistics without losing information.

Theorem

Theorem 19 (LRT based on a sufficient statistic). Suppose $T(X)$ is sufficient for $\theta$. Let $\lambda(x)$ be the likelihood ratio statistic based on the full data $X$, and let $\lambda^*(T(x))$ be the likelihood ratio statistic based on the statistic $T$. Then \[\lambda^*(T(x))=\lambda(x).\]

Proof

Proof. By the factorization theorem, the likelihood can be written as \[L(\theta\mid x)=g(T(x),\theta)h(x),\] where $h(x)$ does not depend on $\theta$. Then \[\lambda(x)=\frac{\sup_{\theta\in\Theta_0}g(T(x),\theta)h(x)}{\sup_{\theta\in\Theta}g(T(x),\theta)h(x)}.\] The factor $h(x)$ cancels, so \[\lambda(x)=\frac{\sup_{\theta\in\Theta_0}g(T(x),\theta)}{\sup_{\theta\in\Theta}g(T(x),\theta)},\] which is exactly the likelihood ratio based on $T(x)$. ◻

Example

Example 20 (Normal mean with known variance using $\bar X$). Suppose $X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2)$ with known $\sigma^2$. Since $\bar X$ is sufficient for $\mu$, derive the LRT using $T=\bar X$.

Solution

The statistic \[T=\bar X\] has distribution \[T\sim \operatorname{Normal}\left(\mu,\frac{\sigma^2}{n}\right).\] Thus the likelihood based on $T=t$ is \[L(\mu\mid t)=\frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left(-\frac{n(t-\mu)^2}{2\sigma^2}\right).\] For testing $H_0:\mu=\mu_0$ versus $H_1:\mu\ne\mu_0$, the unrestricted MLE based on $t$ is $\widehat\mu=t$. Therefore \[\lambda^*(t)=\frac{L(\mu_0\mid t)}{L(t\mid t)} =\exp\left(-\frac{n(t-\mu_0)^2}{2\sigma^2}\right).\] Substituting $t=\bar x$ gives \[\lambda^*(\bar x)=\exp\left(-\frac{n(\bar x-\mu_0)^2}{2\sigma^2}\right),\] which is the same likelihood ratio statistic obtained from the full sample.
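Theorem 19 can be illustrated by computing the likelihood ratio twice, once from the full sample and once from $\bar x$ alone. The sketch below uses hypothetical data and hypothetical function names.

```python
from math import exp, isclose, log, pi
from statistics import fmean


def log_lik_full(mu: float, data: list[float], sigma: float) -> float:
    """Log-likelihood of Normal(mu, sigma^2) for the full sample."""
    n = len(data)
    return (-0.5 * n * log(2.0 * pi * sigma**2)
            - sum((x - mu) ** 2 for x in data) / (2.0 * sigma**2))


def lambda_full(data: list[float], mu0: float, sigma: float) -> float:
    """LRT statistic from the full data: L(mu0 | x) / L(xbar | x)."""
    xbar = fmean(data)
    return exp(log_lik_full(mu0, data, sigma) - log_lik_full(xbar, data, sigma))


def lambda_from_mean(xbar: float, n: int, mu0: float, sigma: float) -> float:
    """LRT statistic based only on T = xbar."""
    return exp(-n * (xbar - mu0) ** 2 / (2.0 * sigma**2))


# Hypothetical sample for illustration.
sample = [2.3, 1.7, 2.9, 2.1, 2.6]
lam_full = lambda_full(sample, mu0=2.0, sigma=0.8)
lam_suff = lambda_from_mean(fmean(sample), len(sample), mu0=2.0, sigma=0.8)
print(f"full data: {lam_full:.6f}   sufficient statistic: {lam_suff:.6f}")
```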

27 Bayesian Tests

This section presents hypothesis testing from a Bayesian viewpoint, where inference is based on posterior probabilities.

In Bayesian statistics, the parameter $\theta$ is treated as a random quantity with prior distribution $\pi(\theta)$. Given data $x=(x_1,\ldots,x_n)$, the posterior distribution is \[\pi(\theta\mid x)=\frac{f(x\mid\theta)\pi(\theta)}{m(x)},\] where \[m(x)=\int f(x\mid\theta)\pi(\theta)\,d\theta\] is the marginal density (or mass function) of the data.

Definition

Definition 21 (Bayesian test by posterior probabilities). For testing \[H_0:\theta\in\Theta_0 \qquad \text{versus}\qquad H_1:\theta\in\Theta_0^c,\] compute the posterior probabilities \[\mathbb{P}(\theta\in\Theta_0\mid x) \quad \text{and}\quad \mathbb{P}(\theta\in\Theta_0^c\mid x).\] A simple Bayesian rule rejects $H_0$ if \[\mathbb{P}(\theta\in\Theta_0\mid x)<\mathbb{P}(\theta\in\Theta_0^c\mid x).\] Equivalently, reject if \[\mathbb{P}(\theta\in\Theta_0\mid x)<\frac12.\] A more conservative rule might reject only if \[\mathbb{P}(\theta\in\Theta_0\mid x)<0.05.\]

27.1 Bayesian normal mean test

We now compute a Bayesian test for a normal mean.

Example

Example 22 (Bayesian test for a normal mean). Suppose \[X_1,\ldots,X_n\mid\mu \sim \operatorname{Normal}(\mu,\sigma^2),\] where $\sigma^2$ is known. Suppose the prior is \[\mu\sim \operatorname{Normal}(\theta,\tau^2).\] Test \[H_0:\mu\le \mu_0 \qquad \text{versus}\qquad H_1:\mu>\mu_0.\] Derive the posterior and the Bayesian decision rule that rejects when the posterior probability of $H_0$ is less than $1/2$.

Solution

The normal-normal conjugate posterior is normal: \[\mu\mid x \sim \operatorname{Normal}(m_n,v_n),\] where \[m_n=\frac{\tau^2\sum_{i=1}^n x_i+\sigma^2\theta}{n\tau^2+\sigma^2}\] and \[v_n=\frac{\sigma^2\tau^2}{\sigma^2+n\tau^2}.\] We reject $H_0$ when \[\mathbb{P}(\mu\le \mu_0\mid x)<\frac12.\] Since the posterior distribution is normal and symmetric, this condition is equivalent to \[m_n>\mu_0.\] Thus the Bayesian test rejects when \[\frac{\tau^2\sum_{i=1}^n x_i+\sigma^2\theta}{n\tau^2+\sigma^2}>\mu_0.\] Equivalently, \[\bar x>\mu_0+\frac{\sigma^2}{n\tau^2}(\mu_0-\theta).\] This shows how the prior mean $\theta$ shifts the rejection threshold.
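The posterior formulas and the equivalent threshold on $\bar x$ can be checked with a short sketch. The data and the prior settings below are hypothetical, and the argument `prior_mean` plays the role of $\theta$ in the example.

```python
from math import sqrt
from statistics import NormalDist, fmean


def posterior(data: list[float], sigma: float, prior_mean: float,
              tau: float) -> tuple[float, float]:
    """Posterior mean m_n and variance v_n for the normal-normal model."""
    n = len(data)
    m_n = (tau**2 * sum(data) + sigma**2 * prior_mean) / (n * tau**2 + sigma**2)
    v_n = sigma**2 * tau**2 / (sigma**2 + n * tau**2)
    return m_n, v_n


def bayes_test(data: list[float], mu0: float, sigma: float, prior_mean: float,
               tau: float) -> tuple[float, float, float]:
    """Return (P(mu <= mu0 | x), posterior mean, equivalent threshold on xbar)."""
    n = len(data)
    m_n, v_n = posterior(data, sigma, prior_mean, tau)
    p_h0 = NormalDist(m_n, sqrt(v_n)).cdf(mu0)
    threshold = mu0 + sigma**2 / (n * tau**2) * (mu0 - prior_mean)
    return p_h0, m_n, threshold


# Hypothetical data and prior for illustration.
sample = [10.4, 9.8, 11.1, 10.9, 10.2]
p_h0, m_n, threshold = bayes_test(sample, mu0=10.0, sigma=1.0, prior_mean=9.0, tau=2.0)
print(f"P(H0 | x) = {p_h0:.4f}, posterior mean = {m_n:.4f}")
print(f"xbar = {fmean(sample):.4f}, threshold on xbar = {threshold:.4f}")
```

With a prior mean below $\mu_0$, the threshold on $\bar x$ sits above $\mu_0$, so the data must overcome the prior before $H_0$ is rejected.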

28 Union-Intersection Tests

This section explains how to construct a test for a complicated alternative by combining simpler component tests.

The union-intersection test is useful when the null hypothesis can be written as an intersection of simpler hypotheses.

Suppose \[H_0:\theta\in\Theta_0=\bigcap_{\gamma\in\Gamma}\Theta_\gamma.\] Then by De Morgan’s law, \[H_1:\theta\in\Theta_0^c=\bigcup_{\gamma\in\Gamma}\Theta_\gamma^c.\] For each $\gamma$, we test \[H_{0\gamma}:\theta\in\Theta_\gamma \qquad \text{versus}\qquad H_{1\gamma}:\theta\in\Theta_\gamma^c.\] Let the rejection region for the $\gamma$th subtest be \[R_\gamma=\{x:T_\gamma(x)>c\}.\] The union-intersection test rejects if any component test rejects: \[R=\bigcup_{\gamma\in\Gamma}R_\gamma.\] Equivalently, \[R=\left\{x:\sup_{\gamma\in\Gamma}T_\gamma(x)>c\right\}.\] Thus the combined test statistic is \[T(x)=\sup_{\gamma\in\Gamma}T_\gamma(x).\]

Example

Example 23 (UIT for a two-sided normal mean test). Suppose $X_1,\ldots,X_n$ are normal and we want to test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne \mu_0.\] Explain how this can be constructed from one-sided tests.

Solution

The null hypothesis can be written as \[H_0:\mu=\mu_0 \quad \Longleftrightarrow \quad \{\mu\le \mu_0\}\cap \{\mu\ge \mu_0\}.\] The alternative is \[H_1:\mu\ne \mu_0 \quad \Longleftrightarrow \quad \{\mu>\mu_0\}\cup \{\mu<\mu_0\}.\] Thus we combine two one-sided tests:

reject $H_{0L}:\mu\le \mu_0$ for large positive values of the standardized statistic;
reject $H_{0R}:\mu\ge \mu_0$ for large negative values of the standardized statistic.

If $\sigma^2$ is known, the standardized statistic is \[Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}.\] Running each one-sided test at level $\alpha/2$, the UIT rejects when either tail is too extreme, that is, when $Z>z_{1-\alpha/2}$ or $Z<-z_{1-\alpha/2}$, which is equivalent to \[|Z|>z_{1-\alpha/2}.\] This is the classical two-sided $Z$-test.

If $\sigma^2$ is unknown, replace $\sigma$ by $S$: \[t=\frac{\bar X-\mu_0}{S/\sqrt n},\] and reject when \[|t|>t_{n-1,1-\alpha/2}.\] This is the classical two-sided $t$-test.
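For the known-variance case, the union-intersection construction can be sketched as a maximum of two one-sided statistics. The function name, data, and parameter values below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def uit_z_test(data: list[float], mu0: float, sigma: float,
               alpha: float = 0.05) -> tuple[float, float, bool, bool]:
    """Return (z, UIT statistic, reject by union rule, reject by |Z| rule)."""
    n = len(data)
    z = (fmean(data) - mu0) / (sigma / sqrt(n))
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # each component run at level alpha/2
    t_upper = z    # large values reject the component null mu <= mu0
    t_lower = -z   # large values reject the component null mu >= mu0
    t_uit = max(t_upper, t_lower)  # sup over the two component statistics
    reject_union = (t_upper > c) or (t_lower > c)
    return z, t_uit, reject_union, abs(z) > c


# Hypothetical sample for illustration.
sample = [48.2, 47.5, 49.1, 46.8, 47.9, 48.4]
z, t_uit, reject_union, reject_two_sided = uit_z_test(sample, mu0=50.0, sigma=1.5)
print(f"z = {z:.3f}, UIT statistic = {t_uit:.3f}")
print("union rule:", reject_union, "| two-sided rule:", reject_two_sided)
```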

29 Intersection-Union Tests

This section explains the complementary construction, where we reject only if all component tests reject.

The intersection-union test is useful when the alternative hypothesis requires several conditions to hold simultaneously.

Suppose \[H_0:\theta\in\Theta_0=\bigcup_{\gamma\in\Gamma}\Theta_\gamma.\] Then by De Morgan’s law, \[H_1:\theta\in\Theta_0^c=\bigcap_{\gamma\in\Gamma}\Theta_\gamma^c.\] For each component test, let \[R_\gamma=\{x:T_\gamma(x)>c\}.\] The IUT rejects only when all component tests reject: \[R=\bigcap_{\gamma\in\Gamma}R_\gamma.\] Equivalently, \[R=\left\{x:\inf_{\gamma\in\Gamma}T_\gamma(x)>c\right\}.\] Thus the combined test statistic is \[T(x)=\inf_{\gamma\in\Gamma}T_\gamma(x).\]

Example

Example 24 (Acceptance sampling for upholstery fabric). A batch of upholstery fabric has two quality parameters:

$\theta_1$: mean breaking strength, which should exceed $50$ pounds;
$\theta_2$: probability of passing a flammability test, which should exceed $0.95$.

Set up an intersection-union test for determining whether a batch is acceptable.

Solution

The batch is acceptable only if both standards are met: \[H_1:\theta_1>50 \quad \text{and}\quad \theta_2>0.95.\] Thus the null hypothesis is that at least one standard fails: \[H_0:\theta_1\le 50 \quad \text{or}\quad \theta_2\le 0.95.\] Equivalently, \[H_0=\{\theta_1\le 50\}\cup \{\theta_2\le 0.95\}\] and \[H_1=\{\theta_1>50\}\cap \{\theta_2>0.95\}.\] We collect two types of data:

breaking strengths $X_1,\ldots,X_n$, often modeled as normal with mean $\theta_1$;
flammability indicators $Y_1,\ldots,Y_m$, where $Y_i=1$ if the item passes and $0$ otherwise, often modeled as Bernoulli with probability $\theta_2$.

We test the two component null hypotheses \[H_{01}:\theta_1\le 50 \qquad \text{and}\qquad H_{02}:\theta_2\le 0.95.\] The intersection-union rule rejects the overall null $H_0$ only if both component null hypotheses are rejected. In words, we declare the batch acceptable only if the strength requirement and the flammability requirement both pass their respective tests.
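One way to sketch this rule in code is with component $p$-values: rejecting both component nulls at level $\alpha$ is the same as requiring the larger $p$-value to be at most $\alpha$. The sketch below makes several assumptions that are not in the example: the breaking-strength standard deviation is treated as known, the flammability component uses an exact binomial tail at the boundary $\theta_2=0.95$, and all data are made up.

```python
from math import comb, sqrt
from statistics import NormalDist, fmean


def strength_p_value(strengths: list[float], sigma: float, bound: float = 50.0) -> float:
    """One-sided z-test p-value for H01: theta1 <= bound (sigma assumed known)."""
    n = len(strengths)
    z = (fmean(strengths) - bound) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


def flammability_p_value(passes: int, m: int, bound: float = 0.95) -> float:
    """Exact p-value P(Y >= passes) with Y ~ Binomial(m, bound) for H02: theta2 <= bound."""
    return sum(comb(m, k) * bound**k * (1 - bound) ** (m - k) for k in range(passes, m + 1))


def iut_accept_batch(strengths: list[float], sigma: float, passes: int, m: int,
                     alpha: float = 0.05) -> tuple[bool, float, float]:
    """Reject the overall null (declare the batch acceptable) only if both components reject."""
    p1 = strength_p_value(strengths, sigma)
    p2 = flammability_p_value(passes, m)
    return max(p1, p2) <= alpha, p1, p2


# Hypothetical measurements for illustration.
strengths = [52.1, 53.4, 51.8, 54.0, 52.7, 53.1, 51.5, 52.9]
accept, p1, p2 = iut_accept_batch(strengths, sigma=2.0, passes=99, m=100)
print(f"p1 = {p1:.5f}, p2 = {p2:.4f}, batch acceptable: {accept}")
```

Each component is run at the full level $\alpha$ rather than a reduced level, which is the usual convention for intersection-union tests.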

30 Neyman-Pearson Lemma: Simple Hypotheses

This section records the fundamental optimality result behind likelihood ratio tests for simple hypotheses.

Although the detailed proof is usually saved for a more advanced treatment, the main message is essential: for simple null and simple alternative hypotheses, the most powerful test at a fixed significance level is a likelihood ratio test.

Theorem

Theorem 25 (Neyman-Pearson lemma). Consider testing two simple hypotheses \[H_0:\theta=\theta_0 \qquad \text{versus}\qquad H_1:\theta=\theta_1.\] Among all tests with Type I error probability at most $\alpha$, the most powerful test rejects $H_0$ for sufficiently small values of \[\frac{L(\theta_0\mid X)}{L(\theta_1\mid X)}.\] Equivalently, it rejects for sufficiently large values of \[\frac{L(\theta_1\mid X)}{L(\theta_0\mid X)}.\]

Remark

Remark 26. The Neyman-Pearson lemma explains why likelihood ratio tests are not only natural, but optimal, for simple-versus-simple testing problems.
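A numerical illustration of the lemma, using the radar model from earlier sections: the sketch below compares the power of the likelihood ratio test (reject when $X>c$) with a competing level-$0.05$ test that rejects when $|X|$ is large. The competing test is chosen here only for comparison and does not appear in the notes.

```python
from statistics import NormalDist

sigma = 1 / 3
null_dist = NormalDist(0.0, sigma)  # X under H0: theta = 0
alt_dist = NormalDist(1.0, sigma)   # X under H1: theta = 1
alpha = 0.05

# Neyman-Pearson (likelihood ratio) test: reject when X > c_np.
c_np = null_dist.inv_cdf(1.0 - alpha)
power_np = 1.0 - alt_dist.cdf(c_np)

# Competing test with the same level: reject when |X| > c_two.
c_two = null_dist.inv_cdf(1.0 - alpha / 2.0)
size_two = 2.0 * (1.0 - null_dist.cdf(c_two))
power_two = (1.0 - alt_dist.cdf(c_two)) + alt_dist.cdf(-c_two)

print(f"LRT test      : level = {alpha:.3f}, power = {power_np:.4f}")
print(f"two-sided test: level = {size_two:.3f}, power = {power_two:.4f}")
```

Both tests have the same Type I error probability, but the two-sided test spends half of it on the left tail, where the alternative places almost no mass, so its power is lower.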

31 Practice Problems

This section gives practice problems that reinforce the main ideas of hypothesis tests, likelihood ratios, $p$-values, and combined tests.

Practice Problem

Practice Problem 27 (Coin test). A coin is tossed $100$ times and $62$ heads are observed. Test \[H_0:p=\frac12 \qquad \text{versus}\qquad H_1:p\ne \frac12\] using the normal approximation at level $\alpha=0.05$.

Solution

Under $H_0$, \[X\sim \operatorname{Binomial}\left(100,\frac12\right), \qquad \mathbb{E}[X]=50, \qquad \operatorname{sd}(X)=5.\] The standardized statistic is \[Z=\frac{62-50}{5}=2.4.\] For a two-sided level-$0.05$ test, the critical value is $1.96$. Since \[|2.4|>1.96,\] we reject $H_0$. The approximate two-sided $p$-value is \[2(1-\Phi(2.4))\approx 0.0164.\]

Practice Problem

Practice Problem 28 (Radar $p$-value). In the radar example, suppose $X=0.4$. Compute the right-sided $p$-value under $H_0$.

Solution

Under $H_0$, $X\sim \operatorname{Normal}(0,1/9)$. Standardizing gives \[Z=3X=3(0.4)=1.2.\] Thus the right-sided $p$-value is \[p=\mathbb{P}(Z\ge 1.2)=1-\Phi(1.2)\approx 0.1151.\] At level $0.05$, we would not reject $H_0$.

Practice Problem

Practice Problem 29 (Normal mean, known variance). Suppose $X_1,\ldots,X_{25}\sim \operatorname{Normal}(\mu,16)$ and $\bar x=53$. Test \[H_0:\mu=50 \qquad \text{versus}\qquad H_1:\mu\ne 50\] at level $0.05$.

Solution

Here $\sigma=4$ and $n=25$, so \[Z=\frac{\bar x-\mu_0}{\sigma/\sqrt n} =\frac{53-50}{4/5}=\frac{3}{0.8}=3.75.\] Since \[|3.75|>1.96,\] we reject $H_0$ at level $0.05$.

Practice Problem

Practice Problem 30 (Normal mean, unknown variance). Suppose $X_1,\ldots,X_{16}$ are normal, $\bar x=10.8$, and $s=2.4$. Test \[H_0:\mu=10 \qquad \text{versus}\qquad H_1:\mu\ne 10\] at level $0.05$.

Solution

The test statistic is \[t=\frac{\bar x-\mu_0}{s/\sqrt n} =\frac{10.8-10}{2.4/4} =\frac{0.8}{0.6}=1.333.\] There are $n-1=15$ degrees of freedom. The two-sided $0.05$ critical value is about \[t_{15,0.975}\approx 2.131.\] Since \[|1.333|<2.131,\] we fail to reject $H_0$.

Practice Problem

Practice Problem 31 (Likelihood ratio for Bernoulli simple hypotheses). Let $X_1,\ldots,X_n\sim \operatorname{Bernoulli}(p)$. Test \[H_0:p=p_0 \qquad \text{versus}\qquad H_1:p=p_1,\] where $p_1>p_0$. Find the likelihood ratio and describe the rejection region.

Solution

Let \[S=\sum_{i=1}^n X_i.\] The likelihood is \[L(p\mid x)=p^S(1-p)^{n-S}.\] Thus \[\lambda(x)=\frac{L(p_0\mid x)}{L(p_1\mid x)} =\left(\frac{p_0}{p_1}\right)^S \left(\frac{1-p_0}{1-p_1}\right)^{n-S}.\] Taking logs, \[\log\lambda(x)=S\log\left(\frac{p_0}{p_1}\right) +(n-S)\log\left(\frac{1-p_0}{1-p_1}\right).\] Because $p_1>p_0$, the likelihood ratio decreases as $S$ increases. Therefore the LRT rejects $H_0$ for large values of $S$, i.e. \[\sum_{i=1}^n X_i \ge k\] for a threshold $k$ chosen to control the Type I error probability.
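Because $S$ is discrete, the threshold $k$ is usually taken as the smallest integer whose upper binomial tail under $p_0$ does not exceed $\alpha$. The sketch below illustrates this with hypothetical values $n=20$, $p_0=0.5$, $p_1=0.8$; the function names are assumptions.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(S >= k) for S ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def smallest_threshold(n: int, p0: float, alpha: float) -> int:
    """Smallest k with P_{p0}(S >= k) <= alpha; k = n + 1 means never reject."""
    for k in range(n + 2):
        if binom_tail(k, n, p0) <= alpha:
            return k
    return n + 1  # not reached, since the tail at n + 1 is 0


# Hypothetical design values for illustration.
n, p0, p1, alpha = 20, 0.5, 0.8, 0.05
k = smallest_threshold(n, p0, alpha)
print(f"reject H0 when S >= {k}")
print(f"attained level = {binom_tail(k, n, p0):.4f}, power at p1 = {binom_tail(k, n, p1):.4f}")
```

The attained level is typically strictly below $\alpha$ because the binomial tail jumps between consecutive integers.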

Practice Problem

Practice Problem 32 (Bayesian posterior decision). Suppose $\mu\mid x\sim \operatorname{Normal}(4,1)$. Test \[H_0:\mu\le 3 \qquad \text{versus}\qquad H_1:\mu>3.\] Using the Bayesian rule “reject $H_0$ if $\mathbb{P}(H_0\mid x)<1/2$,” what is the decision?

Solution

Since the posterior is normal with mean $4$ and variance $1$, \[\mathbb{P}(H_0\mid x)=\mathbb{P}(\mu\le 3\mid x)=\Phi\left(\frac{3-4}{1}\right)=\Phi(-1)\approx 0.1587.\] Since \[0.1587<\frac12,\] we reject $H_0$ using this Bayesian decision rule.

Practice Problem

Practice Problem 33 (Intersection-union logic). A device is acceptable only if its battery life is above $10$ hours and its failure probability is below $0.01$. Write the null and alternative hypotheses for an IUT.

Solution

Let $\theta_1$ be the mean battery life and $\theta_2$ be the failure probability. The device is acceptable only if \[\theta_1>10 \qquad \text{and}\qquad \theta_2<0.01.\] Thus the alternative is \[H_1:\theta_1>10 \text{ and } \theta_2<0.01.\] The null is that at least one requirement fails: \[H_0:\theta_1\le 10 \text{ or } \theta_2\ge 0.01.\] An IUT rejects $H_0$ only if both component tests reject their corresponding component null hypotheses.

32 Summary

This chapter introduced the foundations and construction methods for hypothesis testing.

Key idea

Key takeaways

A hypothesis test divides the sample space into an acceptance region and a rejection region.
Type I error means rejecting a true null; Type II error means failing to reject a false null.
The significance level controls Type I error probability.
The $p$-value is the smallest significance level that would reject $H_0$.
Likelihood ratio tests reject when the null model fits much worse than the unrestricted model.
The classical $Z$-test and $t$-test can be derived from likelihood ratio tests.
Bayesian tests use posterior probabilities of hypotheses.
UITs reject when any component test rejects; IUTs reject only when all component tests reject.

