15  Chapter 14: Hypothesis Tests I — Methods of Finding Tests

This chapter introduces the foundations of hypothesis testing and several general methods for constructing tests. The central ideas are null and alternative hypotheses, test statistics, rejection regions, Type I and Type II errors, significance level, power, \(p\)-values, likelihood ratio tests, Bayesian tests, union-intersection tests, intersection-union tests, and the Neyman–Pearson lemma.

NoteTopics

Hypothesis testing; null and alternative hypotheses; Type I and Type II errors; significance level; power; \(p\)-values; likelihood ratio tests; Bayesian tests; union-intersection tests; intersection-union tests; radar detection; coin testing; normal mean tests; sufficient statistics and LRTs.

16 Introduction to Hypothesis Testing

This section develops the mathematical theory behind hypothesis tests and explains how classical tests arise from general testing principles.

In introductory statistics, students often learn concrete tests such as the \(Z\)-test, \(t\)-test, chi-square test, two-proportion test, two-sample mean test, and \(F\)-test. Here we study the theoretical principles behind such procedures.

TipKey idea

Main goal. A hypothesis test uses observed data to decide between two competing statements about a population parameter.

For example, a pharmaceutical company may want to know whether a new drug is effective in treating a disease. A natural pair of hypotheses is \[H_0: \text{the drug is not effective}, \qquad H_1: \text{the drug is effective}.\] The null hypothesis \(H_0\) represents the default or no-effect claim, while the alternative hypothesis \(H_1\) represents the new effect or departure from the default.

16.1 The testing decision

A hypothesis test is a rule that specifies which sample values lead us to reject \(H_0\) and which sample values lead us not to reject \(H_0\).

NoteDefinition

Definition 1 (Hypothesis test). Let \(X=(X_1,\ldots,X_n)\) be a sample from a population distribution depending on a parameter \(\theta\). A hypothesis test is a rule that divides the sample space into two regions:

  • a rejection region \(R\), where we reject \(H_0\);

  • an acceptance region \(A=R^c\), where we fail to reject \(H_0\).

Equivalently, a test may be specified by a test statistic \(W(X)\) and a threshold rule.

ImportantRemark

Remark 2 (Terminology). In modern statistical language, one often says “fail to reject \(H_0\)” instead of “accept \(H_0\).” This emphasizes that lack of evidence against \(H_0\) is not the same as proof that \(H_0\) is true.

16.2 Mathematical formulation

The mathematical formulation of testing is based on partitioning the parameter space.

NoteDefinition

Definition 3 (Null and alternative hypotheses). Let \(\Theta\) be the parameter space. A hypothesis test compares \[H_0: \theta\in \Theta_0 \qquad \text{versus} \qquad H_1: \theta\in \Theta_0^c.\] The set \(\Theta_0\) is the null parameter space, and \(\Theta_0^c\) is the alternative parameter space.

Examples include \[H_0:\mu=100 \quad \text{versus}\quad H_1:\mu\ne 100,\] or \[H_0:\mu\ge 100 \quad \text{versus}\quad H_1:\mu<100.\]

17 A First Example: Testing Whether a Coin Is Fair

This example introduces the main ingredients of hypothesis testing through the familiar problem of checking whether a coin is fair.

Suppose \(\theta\) is the probability of heads. We want to test \[H_0:\theta=\frac12 \qquad \text{versus}\qquad H_1:\theta\ne \frac12.\] Let \(X_i\sim \operatorname{Bernoulli}(\theta)\) represent the result of the \(i\)th toss, where \(X_i=1\) for heads and \(X_i=0\) for tails. Let \[X=X_1+\cdots+X_n.\] Then \[X\sim \operatorname{Binomial}(n,\theta).\]

NoteExample

Example 4 (Coin test with 100 tosses). Suppose the coin is tossed \(n=100\) times. Under \(H_0\), \(\theta=1/2\), so \[X\sim \operatorname{Binomial}\left(100,\frac12\right), \qquad \mathbb{E}_{H_0}[X]=50, \qquad \operatorname{Var}_{H_0}(X)=25.\] A natural test rejects \(H_0\) when \(X\) is too far from \(50\).

TipSolution

Let \(t>0\) be a threshold. The test is \[\text{fail to reject }H_0 \text{ if } |X-50|\le t, \qquad \text{reject }H_0 \text{ if } |X-50|>t.\] The threshold should control the probability of rejecting a fair coin: \[\mathbb{P}(\text{Type I error})= \mathbb{P}_{H_0}(|X-50|>t).\] By the central limit theorem, under \(H_0\) the standardized count is approximately standard normal: \[Y=\frac{X-n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}} =\frac{X-50}{5} \approx \operatorname{Normal}(0,1).\] For significance level \(\alpha=0.05\), the two-sided critical value is approximately \(1.96\). Thus we reject \(H_0\) when \[\left|\frac{X-50}{5}\right|>1.96.\] Equivalently, \[|X-50|>9.8.\] Since \(X\) takes only integer values, \(|X-50|\le 9.8\) is the same as \(|X-50|\le 9\), so we fail to reject \(H_0\) when \[X\in\{41,42,\ldots,59\},\] and reject \(H_0\) otherwise. Because the cutoff comes from a normal approximation, the level \(0.05\) is attained only approximately.
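The normal approximation can be checked against the exact binomial distribution. The following is a minimal sketch using only the Python standard library; the helper name `binom_pmf` is introduced here for illustration. It sums the exact \(\operatorname{Binomial}(100,1/2)\) probabilities outside the region \(\{41,\ldots,59\}\), and the result should come out a little above the nominal \(0.05\), which reflects the approximation error in the cutoff.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """P(X = k) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


n, theta0 = 100, 0.5
accept = range(41, 60)  # fail to reject H0 when 41 <= X <= 59

# Exact Type I error: probability of falling outside the acceptance
# region when the coin is fair.
exact_size = 1.0 - sum(binom_pmf(k, n, theta0) for k in accept)
print(f"exact size of the test: {exact_size:.4f}")
```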

17.1 Type I and Type II errors

Every hypothesis test can make two kinds of mistakes.

NoteDefinition

Definition 5 (Type I and Type II errors). For a hypothesis test of \(H_0\) versus \(H_1\):

  • A Type I error occurs when \(H_0\) is true but we reject \(H_0\).

  • A Type II error occurs when \(H_1\) is true but we fail to reject \(H_0\).

The Type I error probability is usually denoted by \(\alpha\), and the Type II error probability is usually denoted by \(\beta\).

For the coin example, if \(\theta_1\ne 1/2\), then \[\beta(\theta_1)=\mathbb{P}_{\theta_1}(\text{fail to reject }H_0) =\mathbb{P}_{\theta_1}(41\le X\le 59).\] The power function is \[\operatorname{Power}(\theta)=\mathbb{P}_\theta(\text{reject }H_0)=1-\beta(\theta)\] for \(\theta\) in the alternative.
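As a rough numerical illustration, the power function of the coin test can be evaluated exactly from the binomial distribution. This sketch assumes the rejection rule derived above (reject when \(X\notin\{41,\ldots,59\}\)); the function name `power` and the grid of \(\theta\) values are choices made here, not part of the original example.

```python
from math import comb


def power(theta: float, n: int = 100, lo: int = 41, hi: int = 59) -> float:
    """P_theta(reject H0) = 1 - P_theta(lo <= X <= hi), X ~ Binomial(n, theta)."""
    beta = sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(lo, hi + 1)
    )
    return 1.0 - beta


for theta in (0.5, 0.55, 0.6, 0.7):
    p = power(theta)
    print(f"theta={theta:.2f}  beta={1 - p:.4f}  power={p:.4f}")
```

At \(\theta=1/2\) the function returns the Type I error probability rather than a power, and by symmetry of the acceptance region the values at \(\theta\) and \(1-\theta\) agree.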

| Decision / Truth | \(H_0\) true | \(H_1\) true |
| --- | --- | --- |
| Reject \(H_0\) | Type I error | Correct decision |
| Fail to reject \(H_0\) | Correct decision | Type II error |

WarningWarning

Trade-off. For a fixed sample size and a fixed test statistic, making \(\alpha\) smaller typically makes \(\beta\) larger. Reducing both error probabilities usually requires increasing the sample size or using a more informative statistic.

18 Significance Level, Critical Values, and \(p\)-Values

This section explains the practical language of hypothesis testing: significance level, rejection region, critical value, and \(p\)-value.

18.1 Significance level

The significance level controls the probability of falsely rejecting the null hypothesis.

NoteDefinition

Definition 6 (Level-\(\alpha\) test). A test has significance level \(\alpha\) if \[\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\text{reject }H_0)\le \alpha.\] For a simple null hypothesis \(H_0:\theta=\theta_0\), this reduces to \[\mathbb{P}_{\theta_0}(\text{reject }H_0)\le \alpha.\]

18.2 \(p\)-value

The \(p\)-value measures how extreme the observed statistic is under the null model.

NoteDefinition

Definition 7 (\(p\)-value). For an observed statistic \(W(x_1,\ldots,x_n)\), the \(p\)-value is the probability, computed under \(H_0\), of observing a test statistic at least as extreme as the one observed. For a right-sided test, this is often \[p\text{-value}=\mathbb{P}_{H_0}\left(W(X_1,\ldots,X_n)\ge W(x_1,\ldots,x_n)\right).\] For a left-sided test, this is often \[p\text{-value}=\mathbb{P}_{H_0}\left(W(X_1,\ldots,X_n)\le W(x_1,\ldots,x_n)\right).\]

TipKey idea

Interpretation. The \(p\)-value is the smallest significance level at which the observed data would lead to rejection of \(H_0\).

A small \(p\)-value indicates that the observed data are unusual under \(H_0\), so the data provide evidence against \(H_0\).

19 Example: Radar Aircraft Detection

This example shows how hypothesis testing appears in signal detection and illustrates the trade-off between Type I and Type II errors.

A radar system receives a signal \(X\). If no aircraft is present, then \[X=W.\] If an aircraft is present, then \[X=1+W.\] Here \[W\sim \operatorname{Normal}\left(0,\sigma^2\right), \qquad \sigma^2=\frac19.\] Equivalently, \[X=\theta+W, \qquad \theta=\begin{cases} 0, & \text{no aircraft is present},\\ 1, & \text{an aircraft is present}. \end{cases}\] We test \[H_0:\theta=0 \qquad \text{versus}\qquad H_1:\theta=1.\]

NoteExample

Example 8 (Level \(0.05\) radar test). Construct a level \(\alpha=0.05\) test that rejects \(H_0\) when \(X>c\).

TipSolution

Under \(H_0\), \(X=W\sim \operatorname{Normal}(0,1/9)\). Thus \[\mathbb{P}_{H_0}(X>c)=\mathbb{P}(3X>3c)=1-\Phi(3c).\] To make this probability equal to \(0.05\), choose \[1-\Phi(3c)=0.05.\] Therefore \[3c=z_{0.95}\approx 1.645, \qquad c\approx \frac{1.645}{3}=0.5483.\] The level-\(0.05\) test is \[\text{reject }H_0 \quad \text{if} \quad X>0.5483.\]

NoteExample

Example 9 (Type II error for the radar test). For the level \(0.05\) radar test above, compute the Type II error probability.

TipSolution

Under \(H_1\), \(X=1+W\) with \(W\sim \operatorname{Normal}(0,1/9)\). The Type II error probability is \[\beta=\mathbb{P}_{H_1}(X\le c)=\mathbb{P}(1+W\le c)=\mathbb{P}(W\le c-1).\] Standardizing gives \[\beta=\Phi(3(c-1)).\] Using \(c=1.645/3\approx 0.5483\), \[3(c-1)\approx -1.355,\] so \[\beta\approx \Phi(-1.355)\approx 0.0877.\] Thus the probability of missing a present aircraft is about \(8.77\%\).
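The two radar calculations so far (the critical value and the miss probability) can be reproduced numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

sigma = 1 / 3  # noise standard deviation, since sigma^2 = 1/9
alpha = 0.05

null = NormalDist(mu=0.0, sigma=sigma)  # distribution of X under H0
alt = NormalDist(mu=1.0, sigma=sigma)   # distribution of X under H1

c = null.inv_cdf(1 - alpha)  # solves P_H0(X > c) = alpha
beta = alt.cdf(c)            # P_H1(X <= c), the miss probability

print(f"c = {c:.4f}, beta = {beta:.4f}")
```

The printed values should agree with \(c\approx 0.5483\) and \(\beta\approx 0.0877\) up to rounding.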

NoteExample

Example 10 (Evidence check at level \(0.01\)). Suppose the observed signal is \(X=0.6\). Determine whether there is sufficient evidence to reject \(H_0\) at significance level \(\alpha=0.01\).

TipSolution

For a right-sided level-\(0.01\) test, \[\mathbb{P}_{H_0}(X>c)=0.01.\] Thus \[3c=z_{0.99}\approx 2.326, \qquad c\approx \frac{2.326}{3}=0.7753.\] Since the observed value is \[0.6<0.7753,\] we do not reject \(H_0\) at the \(0.01\) level.

NoteExample

Example 11 (Power constraint). Find a critical value \(c\) so that the probability of missing a present aircraft is at most \(5\%\) while the false alarm probability is as small as possible. Then compute the resulting significance level.

TipSolution

The miss probability \(\beta=\mathbb{P}_{H_1}(X\le c)\) increases with \(c\), while the false alarm probability decreases with \(c\), so we take the boundary case \[\beta=\mathbb{P}_{H_1}(X\le c)=0.05.\] Since under \(H_1\), \(X\sim \operatorname{Normal}(1,1/9)\), \[\mathbb{P}(3(X-1)\le 3(c-1))=0.05.\] Thus \[3(c-1)=z_{0.05}\approx -1.645,\] so \[c=1-\frac{1.645}{3}\approx 0.4517.\] The resulting significance level is \[\alpha=\mathbb{P}_{H_0}(X>c)=1-\Phi(3c).\] Since \(3c\approx 1.355\), \[\alpha\approx 1-\Phi(1.355)\approx 0.0877.\] Compared with the level-\(0.05\) test of Example 8, reducing the miss probability from about \(8.77\%\) to \(5\%\) raises the false alarm probability from \(5\%\) to about \(8.77\%\).
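The trade-off between the two error probabilities can be tabulated by sweeping the threshold \(c\). The sketch below assumes the radar model above; the function name `error_rates` and the particular grid of thresholds are illustrative choices.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal


def error_rates(c: float, sigma: float = 1 / 3) -> tuple:
    """Return (alpha, beta) for the rule 'reject H0 if X > c'."""
    alpha = 1 - std.cdf(c / sigma)   # false alarm: P_H0(X > c)
    beta = std.cdf((c - 1) / sigma)  # miss: P_H1(X <= c)
    return alpha, beta


for c in (0.4517, 0.5, 0.5483, 0.7753):
    a, b = error_rates(c)
    print(f"c={c:.4f}  alpha={a:.4f}  beta={b:.4f}")
```

Raising \(c\) lowers \(\alpha\) and raises \(\beta\); at \(c=0.5\), midway between the two means, the two error probabilities coincide.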

NoteExample

Example 12 (\(p\)-value for the radar test). For the observed value \(X_0=0.6\), compute the \(p\)-value for the right-sided radar test.

TipSolution

Under \(H_0\), \(X\sim \operatorname{Normal}(0,1/9)\). The standardized observed statistic is \[Z_0=\frac{0.6}{1/3}=1.8.\] For a right-sided test, \[p\text{-value}=\mathbb{P}_{H_0}(X\ge 0.6)=\mathbb{P}(Z\ge 1.8)=1-\Phi(1.8).\] Numerically, \[p\text{-value}\approx 0.0359.\] Therefore:

  • at \(\alpha=0.05\), reject \(H_0\);

  • at \(\alpha=0.01\), do not reject \(H_0\).
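The \(p\)-value and the two decisions can be verified in a few lines. This is a sketch under the radar model's assumptions; `right_sided_p_value` is a helper name introduced here.

```python
from statistics import NormalDist


def right_sided_p_value(x_obs: float, sigma: float = 1 / 3) -> float:
    """P_H0(X >= x_obs) when X ~ Normal(0, sigma^2) under H0."""
    return 1 - NormalDist(0.0, sigma).cdf(x_obs)


p = right_sided_p_value(0.6)
print(f"p-value = {p:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"alpha={alpha}: {decision}")
```

Comparing \(p\) with \(\alpha\) gives the same decisions as comparing the observation with the critical values \(0.5483\) and \(0.7753\).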

20 Classical Test for a Normal Mean

This section reviews how a familiar normal mean test fits into the general framework of test statistics and rejection regions.

Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2),\] where \(\sigma^2\) is known. We test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne \mu_0.\]

Under \(H_0\), \[\bar X\sim \operatorname{Normal}\left(\mu_0,\frac{\sigma^2}{n}\right).\] Therefore \[Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}\sim \operatorname{Normal}(0,1).\]

A two-sided level-\(\alpha\) test rejects \(H_0\) when \[|Z|>z_{1-\alpha/2}.\] Equivalently, \[\left|\bar X-\mu_0\right|> z_{1-\alpha/2}\frac{\sigma}{\sqrt n}.\]

When \(\sigma^2\) is unknown and the data are normal, the classical test statistic is \[t=\frac{\bar X-\mu_0}{S/\sqrt n}, \qquad S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2,\] and under \(H_0\), \[t\sim t_{n-1}.\] The two-sided level-\(\alpha\) test rejects when \[|t|>t_{n-1,1-\alpha/2}.\]
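For the known-variance case, the two-sided \(Z\)-test can be sketched with the standard library alone. The function name `z_test` and the sample values are hypothetical. The unknown-variance \(t\)-test is not sketched because the standard library has no Student \(t\) distribution; a statistics package would be needed for its critical values.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test(data, mu0: float, sigma: float, alpha: float = 0.05):
    """Two-sided Z-test of H0: mu = mu0 with known sigma.

    Returns (z, p_value, reject).
    """
    n = len(data)
    z = (fmean(data) - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_crit


# Hypothetical measurements, assumed Normal(mu, 16), i.e. sigma = 4.
sample = [52.1, 49.8, 53.4, 51.0, 50.6, 54.2, 52.9, 51.7]
z, p_value, reject = z_test(sample, mu0=50.0, sigma=4.0)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```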

21 Methods of Finding Tests

This section introduces several general methods for constructing hypothesis tests.

The main methods discussed in this section are:

  1. likelihood ratio tests;

  2. Bayesian tests;

  3. union-intersection tests;

  4. intersection-union tests;

  5. the Neyman-Pearson lemma for simple hypotheses.

TipKey idea

Guiding principle. Different testing methods correspond to different statistical philosophies. Likelihood ratio tests compare best likelihoods; Bayesian tests use posterior probabilities; union-intersection and intersection-union tests build complex tests from simpler component tests.

22 Likelihood Ratio Tests

Likelihood ratio tests compare how well the null parameter space explains the data against how well the full parameter space explains the data.

Let \(X_1,\ldots,X_n\) be an i.i.d. sample from a population distribution with density or mass function \(f(x\mid\theta)\). For observed data \(x=(x_1,\ldots,x_n)\), the likelihood function is \[L(\theta\mid x)=f(x_1,\ldots,x_n\mid\theta)=\prod_{i=1}^n f(x_i\mid\theta).\]

NoteDefinition

Definition 13 (Likelihood ratio statistic). For testing \[H_0:\theta\in\Theta_0 \qquad \text{versus}\qquad H_1:\theta\in\Theta_0^c,\] the likelihood ratio statistic is \[\lambda(x)=\frac{\sup_{\theta\in\Theta_0}L(\theta\mid x)}{\sup_{\theta\in\Theta}L(\theta\mid x)}.\] A likelihood ratio test rejects \(H_0\) for small values of \(\lambda(x)\): \[R=\{x:\lambda(x)\le c\}, \qquad 0\le c\le 1.\]

Since \(\Theta_0\subseteq \Theta\), we always have \[0\le \lambda(x)\le 1.\] A small value of \(\lambda(x)\) means that the null model fits much worse than the unrestricted model.

22.1 Simple versus simple likelihood ratio test

The simplest LRT compares two point hypotheses.

NoteDefinition

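Definition 14 (Simple likelihood ratio). For \[H_0:\theta=\theta_0 \qquad \text{versus}\qquad H_1:\theta=\theta_1,\] define \[\lambda(x)=\frac{L(\theta_0\mid x)}{L(\theta_1\mid x)}.\] The likelihood ratio test rejects \(H_0\) for small values of \(\lambda(x)\). Because the denominator is \(L(\theta_1\mid x)\) rather than a supremum over both points, this ratio can exceed \(1\), unlike the statistic in Definition 13; for cutoffs \(c<1\) the two versions define the same rejection regions.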

23 Likelihood Ratio Test: Radar Example

This section rederives the radar test using the likelihood ratio method.

Recall that \[X=\theta+W, \qquad W\sim \operatorname{Normal}\left(0,\frac19\right),\] and we test \[H_0:\theta=0 \qquad \text{versus}\qquad H_1:\theta=1.\]

NoteExample

Example 15 (LRT for radar detection). Find the likelihood ratio statistic and show that the LRT rejects for large values of \(X\).

TipSolution

The density under \(\theta=0\) is \[L(0\mid x)=\frac{3}{\sqrt{2\pi}}\exp\left(-\frac{9x^2}{2}\right).\] The density under \(\theta=1\) is \[L(1\mid x)=\frac{3}{\sqrt{2\pi}}\exp\left(-\frac{9(x-1)^2}{2}\right).\] Thus the likelihood ratio is \[\lambda(x)=\frac{L(0\mid x)}{L(1\mid x)} =\exp\left(-\frac{9x^2}{2}+\frac{9(x-1)^2}{2}\right).\] Simplifying, \[\lambda(x)=\exp\left(\frac{9(1-2x)}{2}\right).\] Since this is a decreasing function of \(x\), rejecting for small \(\lambda(x)\) is equivalent to rejecting for large \(x\). Therefore the LRT has the form \[\text{reject }H_0 \quad \text{if}\quad x>c'.\] For a level \(0.05\) test, \[\mathbb{P}_{H_0}(X>c')=0.05,\] so \[c'=\frac{z_{0.95}}{3}\approx \frac{1.645}{3}=0.5483.\] This is the same test constructed directly from Type I error control.
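The simplification of \(\lambda(x)\) can be checked numerically. The sketch below compares the ratio of the two normal densities with the closed form \(\exp\left(9(1-2x)/2\right)\) on a few arbitrarily chosen points and confirms that the ratio decreases in \(x\); the function names are introduced here for illustration.

```python
from math import exp, isclose
from statistics import NormalDist

h0 = NormalDist(0.0, 1 / 3)  # X under theta = 0
h1 = NormalDist(1.0, 1 / 3)  # X under theta = 1


def lam(x: float) -> float:
    """Likelihood ratio L(0 | x) / L(1 | x) computed from the densities."""
    return h0.pdf(x) / h1.pdf(x)


def lam_closed(x: float) -> float:
    """Closed form exp(9 (1 - 2x) / 2)."""
    return exp(9 * (1 - 2 * x) / 2)


xs = [-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 1.5]
values = [lam(x) for x in xs]
for x, v in zip(xs, values):
    print(f"x={x:+.2f}  lambda={v:.6g}  closed form={lam_closed(x):.6g}")
```

At \(x=0.5\) both likelihoods are equal, so \(\lambda=1\); values above \(0.5\) favor \(H_1\).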

24 LRT for a Normal Mean with Known Variance

This section shows that the classical two-sided \(Z\)-test is a likelihood ratio test.

Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2),\] where \(\sigma^2\) is known. We test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne\mu_0.\]

NoteExample

Example 16 (Normal mean LRT with known variance). Derive the likelihood ratio statistic.

TipSolution

The likelihood is \[L(\mu\mid x)= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2\right).\] Under \(H_0\), the best null value is fixed at \(\mu_0\). Under the full parameter space, the MLE is \[\widehat\mu=\bar x.\] Therefore \[\lambda(x)=\frac{L(\mu_0\mid x)}{L(\bar x\mid x)}.\] Using the identity \[\sum_{i=1}^n (x_i-\mu_0)^2 = \sum_{i=1}^n (x_i-\bar x)^2+n(\bar x-\mu_0)^2,\] we obtain \[\lambda(x)=\exp\left(-\frac{n(\bar x-\mu_0)^2}{2\sigma^2}\right).\] This statistic decreases as \(|\bar x-\mu_0|\) increases. Thus rejecting for small \(\lambda(x)\) is equivalent to rejecting for large \[\left|\frac{\bar X-\mu_0}{\sigma/\sqrt n}\right|.\] Therefore the level-\(\alpha\) LRT rejects when \[\left|\frac{\bar X-\mu_0}{\sigma/\sqrt n}\right|>z_{1-\alpha/2}.\] This is the classical two-sided \(Z\)-test.

The cutoff \(c\) in the likelihood-ratio form can be written as \[c=\exp\left(-\frac{z_{1-\alpha/2}^2}{2}\right).\]
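The equivalence between the likelihood-ratio form and the \(Z\)-test form can be illustrated numerically. This sketch uses hypothetical data with assumed values \(\mu_0=50\) and \(\sigma^2=16\); it evaluates \(\lambda(x)\) directly from the log-likelihood, compares it with the closed form, and checks that \(\lambda(x)\le c\) gives the same decision as \(|Z|\ge z_{1-\alpha/2}\).

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist, fmean


def log_lik(mu: float, xs, sigma2: float) -> float:
    """Normal log-likelihood with known variance sigma2."""
    n = len(xs)
    return -0.5 * n * log(2 * pi * sigma2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)


xs = [52.3, 47.1, 55.8, 50.4, 53.9, 49.2, 56.1, 51.7]  # hypothetical data
mu0, sigma2, alpha = 50.0, 16.0, 0.05
n, xbar = len(xs), fmean(xs)

# Likelihood ratio from the definition and from the closed form.
lam_def = exp(log_lik(mu0, xs, sigma2) - log_lik(xbar, xs, sigma2))
lam_closed = exp(-n * (xbar - mu0) ** 2 / (2 * sigma2))

# Cutoffs on the Z scale and on the likelihood-ratio scale.
z = (xbar - mu0) / sqrt(sigma2 / n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
c = exp(-z_crit**2 / 2)

print(f"lambda = {lam_def:.4f} (closed form {lam_closed:.4f}), cutoff c = {c:.4f}")
print(f"|z| = {abs(z):.3f}, z_crit = {z_crit:.3f}")
```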

25 LRT for a Normal Mean with Unknown Variance

This section shows that the classical two-sided Student’s \(t\)-test can also be derived as a likelihood ratio test.

Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2),\] where both \(\mu\) and \(\sigma^2\) are unknown. We test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne\mu_0.\]

NoteExample

Example 17 (Normal mean LRT with unknown variance). Derive the LRT and connect it to the Student \(t\) statistic.

TipSolution

Under the full model, the MLEs are \[\widehat\mu=\bar x, \qquad \widehat\sigma^2=\frac1n\sum_{i=1}^n (x_i-\bar x)^2.\] Under \(H_0\), \(\mu=\mu_0\), and the MLE of \(\sigma^2\) is \[\widehat\sigma_0^2=\frac1n\sum_{i=1}^n (x_i-\mu_0)^2.\] After substituting these into the normal likelihood, the likelihood ratio is \[\lambda(x)= \left(\frac{\widehat\sigma_0^2}{\widehat\sigma^2}\right)^{-n/2}.\] Using \[\sum_{i=1}^n (x_i-\mu_0)^2 = \sum_{i=1}^n (x_i-\bar x)^2+n(\bar x-\mu_0)^2,\] we get \[\frac{\widehat\sigma_0^2}{\widehat\sigma^2} =1+\frac{n(\bar x-\mu_0)^2}{\sum_{i=1}^n (x_i-\bar x)^2}.\] Let \[S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2\] and \[t=\frac{\bar X-\mu_0}{S/\sqrt n}.\] Then \[\frac{\widehat\sigma_0^2}{\widehat\sigma^2}=1+\frac{t^2}{n-1}.\] Therefore \(\lambda(x)\) is small exactly when \(|t|\) is large. The LRT rejects \(H_0\) when \[|t|>t_{n-1,1-\alpha/2}.\] This is the classical two-sided Student’s \(t\)-test with \(n-1\) degrees of freedom.
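The key identity \(\widehat\sigma_0^2/\widehat\sigma^2=1+t^2/(n-1)\) is easy to confirm on data. The sample and the value \(\mu_0=10\) below are hypothetical; the sketch only checks the algebra and does not compute \(t\) critical values.

```python
from math import sqrt
from statistics import fmean, stdev

xs = [10.9, 9.4, 12.3, 11.1, 8.7, 10.2, 11.8, 12.0]  # hypothetical data
mu0 = 10.0
n, xbar, s = len(xs), fmean(xs), stdev(xs)  # stdev uses the n - 1 divisor

sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n  # unrestricted MLE
sigma2_0 = sum((x - mu0) ** 2 for x in xs) / n     # MLE under H0

t = (xbar - mu0) / (s / sqrt(n))
ratio = sigma2_0 / sigma2_hat
lam = ratio ** (-n / 2)

print(f"t = {t:.4f}")
print(f"variance ratio = {ratio:.6f}, 1 + t^2/(n-1) = {1 + t**2 / (n - 1):.6f}")
print(f"lambda = {lam:.6f}")
```

Because \(\lambda\) is a decreasing function of \(t^2\), any cutoff on \(\lambda\) corresponds to a cutoff on \(|t|\).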

ImportantRemark

Remark 18. The classical \(Z\)-, \(t\)-, chi-square, proportion, pooled two-sample \(t\)-, two-proportion, and \(F\)-tests can all be interpreted as special cases or asymptotic versions of likelihood ratio tests.

26 LRTs and Sufficient Statistics

This section explains why likelihood ratio tests can be computed using sufficient statistics without losing information.

ImportantTheorem

Theorem 19 (LRT based on a sufficient statistic). Suppose \(T(X)\) is sufficient for \(\theta\). Let \(\lambda(x)\) be the likelihood ratio statistic based on the full data \(X\), and let \(\lambda^*(T(x))\) be the likelihood ratio statistic based on the statistic \(T\). Then \[\lambda^*(T(x))=\lambda(x).\]

NoteProof

Proof. By the factorization theorem, the likelihood can be written as \[L(\theta\mid x)=g(T(x),\theta)h(x),\] where \(h(x)\) does not depend on \(\theta\). Then \[\lambda(x)=\frac{\sup_{\theta\in\Theta_0}g(T(x),\theta)h(x)}{\sup_{\theta\in\Theta}g(T(x),\theta)h(x)}.\] The factor \(h(x)\) cancels, so \[\lambda(x)=\frac{\sup_{\theta\in\Theta_0}g(T(x),\theta)}{\sup_{\theta\in\Theta}g(T(x),\theta)},\] which is exactly the likelihood ratio based on \(T(x)\). ◻

NoteExample

Example 20 (Normal mean with known variance using \(\bar X\)). Suppose \(X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2)\) with known \(\sigma^2\). Since \(\bar X\) is sufficient for \(\mu\), derive the LRT using \(T=\bar X\).

TipSolution

The statistic \[T=\bar X\] has distribution \[T\sim \operatorname{Normal}\left(\mu,\frac{\sigma^2}{n}\right).\] Thus the likelihood based on \(T=t\) is \[L(\mu\mid t)=\frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left(-\frac{n(t-\mu)^2}{2\sigma^2}\right).\] For testing \(H_0:\mu=\mu_0\) versus \(H_1:\mu\ne\mu_0\), the unrestricted MLE based on \(t\) is \(\widehat\mu=t\). Therefore \[\lambda^*(t)=\frac{L(\mu_0\mid t)}{L(t\mid t)} =\exp\left(-\frac{n(t-\mu_0)^2}{2\sigma^2}\right).\] Substituting \(t=\bar x\) gives \[\lambda^*(\bar x)=\exp\left(-\frac{n(\bar x-\mu_0)^2}{2\sigma^2}\right),\] which is the same likelihood ratio statistic obtained from the full sample.

27 Bayesian Tests

This section presents hypothesis testing from a Bayesian viewpoint, where inference is based on posterior probabilities.

In Bayesian statistics, the parameter \(\theta\) is treated as a random quantity with prior distribution \(\pi(\theta)\). Given data \(x=(x_1,\ldots,x_n)\), the posterior distribution is \[\pi(\theta\mid x)=\frac{f(x\mid\theta)\pi(\theta)}{m(x)},\] where \[m(x)=\int f(x\mid\theta)\pi(\theta)\,d\theta\] is the marginal distribution of the data.

NoteDefinition

Definition 21 (Bayesian test by posterior probabilities). For testing \[H_0:\theta\in\Theta_0 \qquad \text{versus}\qquad H_1:\theta\in\Theta_0^c,\] compute the posterior probabilities \[\mathbb{P}(\theta\in\Theta_0\mid x) \quad \text{and}\quad \mathbb{P}(\theta\in\Theta_0^c\mid x).\] A simple Bayesian rule rejects \(H_0\) if \[\mathbb{P}(\theta\in\Theta_0\mid x)<\mathbb{P}(\theta\in\Theta_0^c\mid x).\] Equivalently, reject if \[\mathbb{P}(\theta\in\Theta_0\mid x)<\frac12.\] A more conservative rule might reject only if \[\mathbb{P}(\theta\in\Theta_0\mid x)<0.05.\]

27.1 Bayesian normal mean test

We now compute a Bayesian test for a normal mean.

NoteExample

Example 22 (Bayesian test for a normal mean). Suppose \[X_1,\ldots,X_n\mid\mu \sim \operatorname{Normal}(\mu,\sigma^2),\] where \(\sigma^2\) is known. Suppose the prior is \[\mu\sim \operatorname{Normal}(\theta,\tau^2).\] Test \[H_0:\mu\le \mu_0 \qquad \text{versus}\qquad H_1:\mu>\mu_0.\] Derive the posterior and the Bayesian decision rule that rejects when the posterior probability of \(H_0\) is less than \(1/2\).

TipSolution

The normal-normal conjugate posterior is normal: \[\mu\mid x \sim \operatorname{Normal}(m_n,v_n),\] where \[m_n=\frac{\tau^2\sum_{i=1}^n x_i+\sigma^2\theta}{n\tau^2+\sigma^2}\] and \[v_n=\frac{\sigma^2\tau^2}{\sigma^2+n\tau^2}.\] We reject \(H_0\) when \[\mathbb{P}(\mu\le \mu_0\mid x)<\frac12.\] Since the posterior distribution is normal and symmetric, this condition is equivalent to \[m_n>\mu_0.\] Thus the Bayesian test rejects when \[\frac{\tau^2\sum_{i=1}^n x_i+\sigma^2\theta}{n\tau^2+\sigma^2}>\mu_0.\] Equivalently, \[\bar x>\mu_0+\frac{\sigma^2}{n\tau^2}(\mu_0-\theta).\] This shows how the prior mean \(\theta\) shifts the rejection threshold.
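The posterior formulas and the two equivalent forms of the decision rule can be checked numerically. In this sketch the data, the prior parameters (called `prior_mean` for \(\theta\) and `tau2` for \(\tau^2\)), and \(\mu_0\) are all hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def posterior(xs, sigma2: float, prior_mean: float, tau2: float):
    """Posterior mean m_n and variance v_n for the normal-normal model."""
    n = len(xs)
    m_n = (tau2 * sum(xs) + sigma2 * prior_mean) / (n * tau2 + sigma2)
    v_n = sigma2 * tau2 / (sigma2 + n * tau2)
    return m_n, v_n


xs = [3.2, 4.1, 2.8, 3.9, 3.5]  # hypothetical data
sigma2, prior_mean, tau2, mu0 = 1.0, 2.0, 4.0, 3.0
n = len(xs)

m_n, v_n = posterior(xs, sigma2, prior_mean, tau2)
post_prob_h0 = NormalDist(m_n, sqrt(v_n)).cdf(mu0)  # P(mu <= mu0 | x)
reject = post_prob_h0 < 0.5

# Equivalent rule stated directly in terms of the sample mean.
threshold = mu0 + sigma2 / (n * tau2) * (mu0 - prior_mean)

print(f"m_n = {m_n:.4f}, v_n = {v_n:.4f}, P(H0 | x) = {post_prob_h0:.4f}")
print(f"reject H0: {reject}; xbar = {fmean(xs):.3f} vs threshold {threshold:.3f}")
```

With a prior mean below \(\mu_0\), the threshold for \(\bar x\) sits above \(\mu_0\): the data must overcome the prior's pull toward \(H_0\).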

28 Union-Intersection Tests

This section explains how to construct a test for a complicated alternative by combining simpler component tests.

The union-intersection test is useful when the null hypothesis can be written as an intersection of simpler hypotheses.

Suppose \[H_0:\theta\in\Theta_0=\bigcap_{\gamma\in\Gamma}\Theta_\gamma.\] Then by De Morgan’s law, \[H_1:\theta\in\Theta_0^c=\bigcup_{\gamma\in\Gamma}\Theta_\gamma^c.\] For each \(\gamma\), we test \[H_{0\gamma}:\theta\in\Theta_\gamma \qquad \text{versus}\qquad H_{1\gamma}:\theta\in\Theta_\gamma^c.\] Let the rejection region for the \(\gamma\)th subtest be \[R_\gamma=\{x:T_\gamma(x)>c\}.\] The union-intersection test rejects if any component test rejects: \[R=\bigcup_{\gamma\in\Gamma}R_\gamma.\] Equivalently, \[R=\left\{x:\sup_{\gamma\in\Gamma}T_\gamma(x)>c\right\}.\] Thus the combined test statistic is \[T(x)=\sup_{\gamma\in\Gamma}T_\gamma(x).\]

NoteExample

Example 23 (UIT for a two-sided normal mean test). Suppose \(X_1,\ldots,X_n\) are normal and we want to test \[H_0:\mu=\mu_0 \qquad \text{versus}\qquad H_1:\mu\ne \mu_0.\] Explain how this can be constructed from one-sided tests.

TipSolution

The null hypothesis can be written as \[H_0:\mu=\mu_0 \quad \Longleftrightarrow \quad \{\mu\le \mu_0\}\cap \{\mu\ge \mu_0\}.\] The alternative is \[H_1:\mu\ne \mu_0 \quad \Longleftrightarrow \quad \{\mu>\mu_0\}\cup \{\mu<\mu_0\}.\] Thus we combine two one-sided tests:

  • reject \(H_{0L}:\mu\le \mu_0\) for large positive values of the standardized statistic;

  • reject \(H_{0R}:\mu\ge \mu_0\) for large negative values of the standardized statistic.

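If \(\sigma^2\) is known, the standardized statistic is \[Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}.\] The UIT rejects when either one-sided test rejects. Running each one-sided test at level \(\alpha/2\), with common cutoff \(z_{1-\alpha/2}\), gives the rejection region \(\{Z>z_{1-\alpha/2}\}\cup\{-Z>z_{1-\alpha/2}\}\), which is equivalent to \[|Z|>z_{1-\alpha/2}.\] This is the classical two-sided \(Z\)-test, and it has level \(\alpha\).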

If \(\sigma^2\) is unknown, replace \(\sigma\) by \(S\): \[t=\frac{\bar X-\mu_0}{S/\sqrt n},\] and reject when \[|t|>t_{n-1,1-\alpha/2}.\] This is the classical two-sided \(t\)-test.
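For the known-variance case, the union-intersection construction can be written out directly: two component statistics, a supremum, and a common cutoff. The sketch below assumes each one-sided component is run at level \(\alpha/2\); the function name `uit_two_sided` and the component labels are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def uit_two_sided(xs, mu0: float, sigma: float, alpha: float = 0.05):
    """Union-intersection test of H0: mu = mu0 built from two one-sided tests.

    Returns (combined statistic, reject).
    """
    z = (fmean(xs) - mu0) / (sigma / sqrt(len(xs)))
    components = {
        "H0L: mu <= mu0": z,    # rejected for large positive z
        "H0R: mu >= mu0": -z,   # rejected for large negative z
    }
    t_sup = max(components.values())  # sup over components, equals |z|
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    return t_sup, t_sup > cutoff


stat, reject = uit_two_sided([53.0] * 25, mu0=50.0, sigma=4.0)
print(f"combined statistic = {stat:.3f}, reject H0: {reject}")
```

Taking the maximum of \(Z\) and \(-Z\) is exactly \(|Z|\), which is why the combined test coincides with the two-sided \(Z\)-test.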

29 Intersection-Union Tests

This section explains the complementary construction, where we reject only if all component tests reject.

The intersection-union test is useful when the alternative hypothesis requires several conditions to hold simultaneously.

Suppose \[H_0:\theta\in\Theta_0=\bigcup_{\gamma\in\Gamma}\Theta_\gamma.\] Then by De Morgan’s law, \[H_1:\theta\in\Theta_0^c=\bigcap_{\gamma\in\Gamma}\Theta_\gamma^c.\] For each component test, let \[R_\gamma=\{x:T_\gamma(x)>c\}.\] The IUT rejects only when all component tests reject: \[R=\bigcap_{\gamma\in\Gamma}R_\gamma.\] Equivalently, \[R=\left\{x:\inf_{\gamma\in\Gamma}T_\gamma(x)>c\right\}.\] Thus the combined test statistic is \[T(x)=\inf_{\gamma\in\Gamma}T_\gamma(x).\]

NoteExample

Example 24 (Acceptance sampling for upholstery fabric). A batch of upholstery fabric has two quality parameters:

  • \(\theta_1\): mean breaking strength, which should exceed \(50\) pounds;

  • \(\theta_2\): probability of passing a flammability test, which should exceed \(0.95\).

Set up an intersection-union test for determining whether a batch is acceptable.

TipSolution

The batch is acceptable only if both standards are met: \[H_1:\theta_1>50 \quad \text{and}\quad \theta_2>0.95.\] Thus the null hypothesis is that at least one standard fails: \[H_0:\theta_1\le 50 \quad \text{or}\quad \theta_2\le 0.95.\] Equivalently, \[H_0=\{\theta_1\le 50\}\cup \{\theta_2\le 0.95\}\] and \[H_1=\{\theta_1>50\}\cap \{\theta_2>0.95\}.\] We collect two types of data:

  • breaking strengths \(X_1,\ldots,X_n\), often modeled as normal with mean \(\theta_1\);

  • flammability indicators \(Y_1,\ldots,Y_m\), where \(Y_i=1\) if the item passes and \(0\) otherwise, often modeled as Bernoulli with probability \(\theta_2\).

We test the two component null hypotheses \[H_{01}:\theta_1\le 50 \qquad \text{and}\qquad H_{02}:\theta_2\le 0.95.\] The intersection-union rule rejects the overall null \(H_0\) only if both component null hypotheses are rejected. In words, we declare the batch acceptable only if the strength requirement and the flammability requirement both pass their respective tests.
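One way to sketch this rule in code is through component \(p\)-values: rejecting every component null at level \(\alpha\) is the same as requiring the largest component \(p\)-value to be at most \(\alpha\). The details below are assumptions made for illustration and are not specified in the example: the strength test is a one-sided \(Z\)-test with a known standard deviation, the flammability test is an exact binomial upper-tail test, and each \(p\)-value is computed at the boundary of its component null.

```python
from math import comb, sqrt
from statistics import NormalDist, fmean


def strength_p_value(xs, sigma: float, bound: float = 50.0) -> float:
    """One-sided Z-test p-value for H01: theta1 <= bound (sigma assumed known)."""
    z = (fmean(xs) - bound) / (sigma / sqrt(len(xs)))
    return 1 - NormalDist().cdf(z)


def flammability_p_value(passes: int, m: int, bound: float = 0.95) -> float:
    """Exact binomial p-value P(Y >= passes) at theta2 = bound, for H02: theta2 <= bound."""
    return sum(
        comb(m, k) * bound**k * (1 - bound) ** (m - k) for k in range(passes, m + 1)
    )


def iut_reject(p_values, alpha: float = 0.05) -> bool:
    """IUT: reject the overall null only if every component test rejects."""
    return max(p_values) <= alpha


# Hypothetical batch data.
strengths = [52.4, 51.8, 53.1, 50.9, 52.7, 51.5, 53.3, 52.0, 51.2, 52.9]
p1 = strength_p_value(strengths, sigma=2.0)
p2 = flammability_p_value(passes=99, m=100)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, batch acceptable: {iut_reject([p1, p2])}")
```

With a bound of \(0.95\), even a perfect record of \(m\) passes has \(p\)-value \(0.95^m\), so the flammability component cannot reject at level \(0.05\) unless \(m\) is fairly large.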

30 Neyman-Pearson Lemma: Simple Hypotheses

This section records the fundamental optimality result behind likelihood ratio tests for simple hypotheses.

Although the detailed proof is usually saved for a more advanced treatment, the main message is essential: for simple null and simple alternative hypotheses, the most powerful test at a fixed significance level is a likelihood ratio test.

ImportantTheorem

Theorem 25 (Neyman-Pearson lemma). Consider testing two simple hypotheses \[H_0:\theta=\theta_0 \qquad \text{versus}\qquad H_1:\theta=\theta_1.\] Among all tests with Type I error probability at most \(\alpha\), the most powerful test rejects \(H_0\) for sufficiently small values of \[\frac{L(\theta_0\mid X)}{L(\theta_1\mid X)}.\] Equivalently, it rejects for sufficiently large values of \[\frac{L(\theta_1\mid X)}{L(\theta_0\mid X)}.\]

ImportantRemark

Remark 26. The Neyman-Pearson lemma explains why likelihood ratio tests are not only natural, but optimal, for simple-versus-simple testing problems.
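The lemma can be illustrated, though of course not proved, on the radar problem by comparing the likelihood ratio test with one competitor of the same level. The competitor used here, the two-sided rule "reject if \(|X|>c_2\)", is chosen purely for illustration.

```python
from statistics import NormalDist

std = NormalDist()
sigma, alpha = 1 / 3, 0.05

# Likelihood ratio test: reject if X > c, with P_H0(X > c) = alpha.
c = sigma * std.inv_cdf(1 - alpha)
power_lrt = 1 - std.cdf((c - 1) / sigma)

# Competitor of the same level: reject if |X| > c2, with P_H0(|X| > c2) = alpha.
c2 = sigma * std.inv_cdf(1 - alpha / 2)
power_two_sided = (1 - std.cdf((c2 - 1) / sigma)) + std.cdf((-c2 - 1) / sigma)

print(f"LRT power at theta = 1:       {power_lrt:.4f}")
print(f"two-sided power at theta = 1: {power_two_sided:.4f}")
```

Both tests have Type I error probability \(0.05\), but the two-sided rule spends half of it on large negative signals, which are very unlikely when an aircraft is present, so its power is lower.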

31 Practice Problems

This section gives practice problems that reinforce the main ideas of hypothesis tests, likelihood ratios, \(p\)-values, and combined tests.

WarningPractice Problem

Practice Problem 27 (Coin test). A coin is tossed \(100\) times and \(62\) heads are observed. Test \[H_0:p=\frac12 \qquad \text{versus}\qquad H_1:p\ne \frac12\] using the normal approximation at level \(\alpha=0.05\).

TipSolution

Under \(H_0\), \[X\sim \operatorname{Binomial}\left(100,\frac12\right), \qquad \mathbb{E}[X]=50, \qquad \operatorname{sd}(X)=5.\] The standardized statistic is \[Z=\frac{62-50}{5}=2.4.\] For a two-sided level-\(0.05\) test, the critical value is \(1.96\). Since \[|2.4|>1.96,\] we reject \(H_0\). The approximate two-sided \(p\)-value is \[2(1-\Phi(2.4))\approx 0.0164.\]

WarningPractice Problem

Practice Problem 28 (Radar \(p\)-value). In the radar example, suppose \(X=0.4\). Compute the right-sided \(p\)-value under \(H_0\).

TipSolution

Under \(H_0\), \(X\sim \operatorname{Normal}(0,1/9)\). Standardizing gives \[Z=3X=3(0.4)=1.2.\] Thus the right-sided \(p\)-value is \[p=\mathbb{P}(Z\ge 1.2)=1-\Phi(1.2)\approx 0.1151.\] At level \(0.05\), we would not reject \(H_0\).

WarningPractice Problem

Practice Problem 29 (Normal mean, known variance). Suppose \(X_1,\ldots,X_{25}\sim \operatorname{Normal}(\mu,16)\) and \(\bar x=53\). Test \[H_0:\mu=50 \qquad \text{versus}\qquad H_1:\mu\ne 50\] at level \(0.05\).

TipSolution

Here \(\sigma=4\) and \(n=25\), so \[Z=\frac{\bar x-\mu_0}{\sigma/\sqrt n} =\frac{53-50}{4/5}=\frac{3}{0.8}=3.75.\] Since \[|3.75|>1.96,\] we reject \(H_0\) at level \(0.05\).

WarningPractice Problem

Practice Problem 30 (Normal mean, unknown variance). Suppose \(X_1,\ldots,X_{16}\) are normal, \(\bar x=10.8\), and \(s=2.4\). Test \[H_0:\mu=10 \qquad \text{versus}\qquad H_1:\mu\ne 10\] at level \(0.05\).

TipSolution

The test statistic is \[t=\frac{\bar x-\mu_0}{s/\sqrt n} =\frac{10.8-10}{2.4/4} =\frac{0.8}{0.6}=1.333.\] There are \(n-1=15\) degrees of freedom. The two-sided \(0.05\) critical value is about \[t_{15,0.975}\approx 2.131.\] Since \[|1.333|<2.131,\] we fail to reject \(H_0\).

WarningPractice Problem

Practice Problem 31 (Likelihood ratio for Bernoulli simple hypotheses). Let \(X_1,\ldots,X_n\sim \operatorname{Bernoulli}(p)\). Test \[H_0:p=p_0 \qquad \text{versus}\qquad H_1:p=p_1,\] where \(p_1>p_0\). Find the likelihood ratio and describe the rejection region.

TipSolution

Let \[S=\sum_{i=1}^n X_i.\] The likelihood is \[L(p\mid x)=p^S(1-p)^{n-S}.\] Thus \[\lambda(x)=\frac{L(p_0\mid x)}{L(p_1\mid x)} =\left(\frac{p_0}{p_1}\right)^S \left(\frac{1-p_0}{1-p_1}\right)^{n-S}.\] Taking logs, \[\log\lambda(x)=S\log\left(\frac{p_0}{p_1}\right) +(n-S)\log\left(\frac{1-p_0}{1-p_1}\right).\] Because \(p_1>p_0\), the likelihood ratio decreases as \(S\) increases. Therefore the LRT rejects \(H_0\) for large values of \(S\), i.e. \[\sum_{i=1}^n X_i \ge k\] for a threshold \(k\) chosen to control the Type I error probability.
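Because \(S\) is discrete, the threshold \(k\) is usually taken as the smallest integer whose upper-tail probability under \(p_0\) does not exceed \(\alpha\). The sketch below computes it exactly; the function names and the values \(n=20\), \(p_0=0.5\), \(\alpha=0.05\) are hypothetical choices for illustration.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(S >= k) for S ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def lrt_threshold(n: int, p0: float, alpha: float) -> int:
    """Smallest k with P_{p0}(S >= k) <= alpha.

    Returns n + 1 if even S = n is not extreme enough (the test never rejects).
    """
    for k in range(n + 2):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return n + 1  # not reached: the empty tail at k = n + 1 has probability 0


n, p0, alpha = 20, 0.5, 0.05
k = lrt_threshold(n, p0, alpha)
print(f"reject H0 when S >= {k}; attained size = {upper_tail(k, n, p0):.4f}")
```

The attained size is typically strictly below \(\alpha\), since only finitely many tail probabilities are available. Note that \(k\) depends on \(p_0\) and \(\alpha\) but not on the particular \(p_1>p_0\).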

WarningPractice Problem

Practice Problem 32 (Bayesian posterior decision). Suppose \(\mu\mid x\sim \operatorname{Normal}(4,1)\). Test \[H_0:\mu\le 3 \qquad \text{versus}\qquad H_1:\mu>3.\] Using the Bayesian rule “reject \(H_0\) if \(\mathbb{P}(H_0\mid x)<1/2\),” what is the decision?

TipSolution

Since the posterior is normal with mean \(4\) and variance \(1\), \[\mathbb{P}(H_0\mid x)=\mathbb{P}(\mu\le 3\mid x)=\Phi\left(\frac{3-4}{1}\right)=\Phi(-1)\approx 0.1587.\] Since \[0.1587<\frac12,\] we reject \(H_0\) using this Bayesian decision rule.

WarningPractice Problem

Practice Problem 33 (Intersection-union logic). A device is acceptable only if its battery life is above \(10\) hours and its failure probability is below \(0.01\). Write the null and alternative hypotheses for an IUT.

TipSolution

Let \(\theta_1\) be the mean battery life and \(\theta_2\) be the failure probability. The device is acceptable only if \[\theta_1>10 \qquad \text{and}\qquad \theta_2<0.01.\] Thus the alternative is \[H_1:\theta_1>10 \text{ and } \theta_2<0.01.\] The null is that at least one requirement fails: \[H_0:\theta_1\le 10 \text{ or } \theta_2\ge 0.01.\] An IUT rejects \(H_0\) only if both component tests reject their corresponding component null hypotheses.

32 Summary

This chapter introduced the foundations and construction methods for hypothesis testing.

TipKey idea

Key takeaways

  • A hypothesis test divides the sample space into an acceptance region and a rejection region.

  • Type I error means rejecting a true null; Type II error means failing to reject a false null.

  • The significance level controls Type I error probability.

  • The \(p\)-value is the smallest significance level that would reject \(H_0\).

  • Likelihood ratio tests reject when the null model fits much worse than the unrestricted model.

  • The classical \(Z\)-test and \(t\)-test can be derived from likelihood ratio tests.

  • Bayesian tests use posterior probabilities of hypotheses.

  • UITs reject when any component test rejects; IUTs reject only when all component tests reject.