16 Chapter 15: Hypothesis Tests II — Evaluating Tests

This chapter explains how to evaluate and compare hypothesis tests after constructing them. The main ideas are Type I and Type II errors, power functions, test size and level, unbiased tests, uniformly most powerful tests, $p$-values, and risk functions.

Topics

Evaluating hypothesis tests; Type I and Type II errors; power function; size and level; unbiased tests; uniformly most powerful tests; Neyman–Pearson lemma; binomial and normal UMP tests; $p$-values; loss functions; risk functions; decision-theoretic evaluation of tests.

17 Why We Need to Evaluate Tests

This section explains how to compare hypothesis tests after we have learned how to construct them.

In hypothesis testing, different methods may lead to different tests. A likelihood ratio test, a Bayesian test, a union-intersection test, and a classical test statistic can all be reasonable procedures. The key question is: how do we decide whether one test is better than another?

Key idea

Main idea Hypothesis tests are evaluated and compared through their probabilities of making mistakes. A good test should have small Type I error probability and small Type II error probability, subject to the unavoidable trade-off between the two.

The main evaluation tools in this section are:

the power function;
the size or significance level;
unbiasedness of tests;
uniformly most powerful tests;
risk functions and expected loss.

18 Two Types of Errors

This section reviews the two fundamental errors in hypothesis testing and introduces the power function as a unified way to measure them.

Consider a hypothesis test \[H_0: \theta\in \Theta_0 \qquad \text{versus} \qquad H_1: \theta\in \Theta_0^c.\] Let $R$ be the rejection region of the test. The test rejects $H_0$ when $X\in R$ and fails to reject $H_0$ when $X\in R^c$.

18.1 Type I and Type II errors

This subsection defines the two ways a hypothesis test can be wrong.

Definition

Definition 1 (Type I and Type II errors). For the test $H_0:\theta\in\Theta_0$ versus $H_1:\theta\in\Theta_0^c$:

A Type I error occurs when $\theta\in\Theta_0$ but the test rejects $H_0$.
A Type II error occurs when $\theta\in\Theta_0^c$ but the test fails to reject $H_0$.

Decision	$H_0$ true	$H_1$ true
Reject $H_0$	Type I error	Correct decision
Fail to reject $H_0$	Correct decision	Type II error

For $\theta\in\Theta_0$, the Type I error probability is \[\mathbb{P}_\theta(X\in R).\] For $\theta\in\Theta_0^c$, the Type II error probability is \[\mathbb{P}_\theta(X\in R^c)=1-\mathbb{P}_\theta(X\in R).\]

18.2 Power function

This subsection introduces the main function used to evaluate a test over all parameter values.

Definition

Definition 2 (Power function). The power function of a hypothesis test with rejection region $R$ is \[\beta(\theta):=\mathbb{P}_\theta(X\in R).\] Thus, \[\beta(\theta)= \begin{cases} \mathbb{P}_\theta(\text{Type I error}), & \theta\in\Theta_0,\\[3pt] 1-\mathbb{P}_\theta(\text{Type II error}), & \theta\in\Theta_0^c. \end{cases}\]

Key idea

Ideal power function The ideal test would have \[\beta(\theta)= \begin{cases} 0, & \theta\in\Theta_0,\\ 1, & \theta\in\Theta_0^c. \end{cases}\] This would mean no Type I errors and no Type II errors. In real problems, such a perfect test is usually impossible.

19 Power Function Examples

This section computes power functions for binomial and normal tests and shows the trade-off between Type I and Type II errors.

19.1 Binomial power functions

This subsection compares two tests for the same binomial hypothesis.

Example

Example 3 (Binomial power function: two tests). Let \[X\sim \operatorname{Binomial}(n=5,\theta),\] and consider \[H_0:\theta\le \frac12 \qquad \text{versus} \qquad H_1:\theta>\frac12.\] Compare the following two tests.

Test 1: reject $H_0$ if $X=5$.

The rejection region is $R_1=\{5\}$, so the power function is \[\beta_1(\theta)=\mathbb{P}_\theta(X=5)=\theta^5.\] For $\theta\le 1/2$, \[\beta_1(\theta)\le \left(\frac12\right)^5=\frac{1}{32}\approx 0.03125.\] Thus Test 1 has a small Type I error probability, but for values of $\theta$ only moderately larger than $1/2$, it has a large Type II error probability.

Test 2: reject $H_0$ if $X=3,4,$ or $5$.

The rejection region is $R_2=\{3,4,5\}$, so the power function is \[\beta_2(\theta)=\mathbb{P}_\theta(X=3,4,5) =\binom53\theta^3(1-\theta)^2+\binom54\theta^4(1-\theta)+\binom55\theta^5.\]

Solution

The first test is conservative: it rejects only when all five trials are successes. Its Type I error probability is at most $1/32$ under $H_0$.

The second test rejects more often, so it has larger power for $\theta>1/2$. However, it also has larger Type I error probability. At the boundary $\theta=1/2$, \[\beta_2\left(\frac12\right) =\binom53\left(\frac12\right)^5+\binom54\left(\frac12\right)^5+\binom55\left(\frac12\right)^5 =\frac{10+5+1}{32}=\frac12.\] Thus Test 2 has much larger power but also much larger size. This illustrates the Type I–Type II error trade-off.

Figure 19.1: Power functions for two binomial tests with n = 5.

19.2 Normal power function

This subsection derives the power function of a one-sided normal mean test.

Example

Example 4 (Normal power function). Let \[X_1,\ldots,X_n\sim \operatorname{Normal}(\theta,\sigma^2),\] where $\sigma^2$ is known. Test \[H_0:\theta\le \theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0.\] Consider the test that rejects $H_0$ if \[\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c,\] where $c>0$ is a constant.

Solution

For any fixed $\theta$, \[Z=\frac{\bar X-\theta}{\sigma/\sqrt n}\sim N(0,1).\] Thus the power function is \[\begin{aligned} \beta(\theta) &=\mathbb{P}_\theta\left(\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c\right)\\ &=\mathbb{P}_\theta\left(\frac{\bar X-\theta}{\sigma/\sqrt n}>c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right)\\ &=1-\Phi\left(c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right). \end{aligned}\] Therefore, \[\beta(\theta_0)=1-\Phi(c).\] If $c=1.28$, then $\mathbb{P}(Z>1.28)\approx 0.10$, so the test has about a $10\%$ significance level at the boundary $\theta=\theta_0$.

The power function has the following behavior: \[\theta\to -\infty \implies \beta(\theta)\to 0, \qquad \theta\to +\infty \implies \beta(\theta)\to 1.\] For fixed sample size, decreasing Type I error usually increases Type II error, and increasing power usually increases Type I error.

19.3 Choosing sample size from power requirements

This subsection shows how power calculations determine a sample size.

Example

Example 5 (Choosing $c$ and $n$). Let \[X_1,\ldots,X_n\sim \operatorname{Normal}(\theta,\sigma^2),\] with known $\sigma^2$. Test \[H_0:\theta\le \theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0\] using the rule \[\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c.\] Choose $c$ and $n$ so that \[\mathbb{P}(\text{Type I error})\le 0.10, \qquad \mathbb{P}(\text{Type II error})\le 0.20\quad \text{if } \theta\ge \theta_0+\sigma.\]

Solution

The Type I error is largest at the boundary $\theta=\theta_0$. Hence we require \[\beta(\theta_0)=\mathbb{P}(Z>c)=0.10.\] Thus \[c=z_{0.10}\approx 1.28.\] Next, requiring Type II error at most $0.20$ when $\theta=\theta_0+\sigma$ is equivalent to requiring power at least $0.80$: \[\beta(\theta_0+\sigma)=0.80.\] But \[\beta(\theta_0+\sigma) =\mathbb{P}\left(Z>c-\sqrt n\right).\] So \[\mathbb{P}\left(Z>1.28-\sqrt n\right)=0.80.\] Equivalently, \[1.28-\sqrt n=z_{0.80}^{\text{right tail}},\] where $\mathbb{P}(Z>z_{0.80}^{\text{right tail}})=0.80$. Since $z_{0.80}^{\text{right tail}}\approx -0.84$, \[1.28-\sqrt n\approx -0.84, \qquad \sqrt n\approx 2.12, \qquad n\approx 4.49.\] Therefore, choose \[\boxed{n=5.}\]

20 Size and Level of a Test

This section distinguishes the exact size of a test from the more flexible idea of a level-$\alpha$ test.

20.1 Size and level

This subsection gives the formal definitions.

Definition

Definition 6 (Size-$\alpha$ test). For $\alpha\in[0,1]$, a test with power function $\beta(\theta)$ is called a size $\alpha$ test if \[\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\text{reject }H_0) =\sup_{\theta\in\Theta_0}\beta(\theta)=\alpha.\]

Definition

Definition 7 (Level-$\alpha$ test). A test is called a level $\alpha$ test if \[\sup_{\theta\in\Theta_0}\beta(\theta)\le \alpha.\]

Remark

Remark 8. A size-$\alpha$ test uses the full Type I error budget exactly. A level-$\alpha$ test may be conservative and use less than the allowed Type I error probability.

20.2 Size of likelihood ratio tests

This subsection explains how cutoffs are chosen for likelihood ratio tests.

For a likelihood ratio test with rejection region \[R=\{x:\lambda(x)\le c\},\] the size is \[\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\lambda(X)\le c).\] To obtain a size-$\alpha$ LRT, choose $c$ so that \[\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\lambda(X)\le c)=\alpha.\]

Example

Example 9 (Size-$\alpha$ LRT for a normal mean). Suppose $X_1, \ldots,X_n\sim N(\theta,\sigma^2)$ with known $\sigma^2$, and test \[H_0:\theta=\theta_0 \qquad \text{versus} \qquad H_1:\theta\ne\theta_0.\] The LRT rejects for large values of \[\left|\frac{\bar X-\theta_0}{\sigma/\sqrt n}\right|.\] A size-$\alpha$ test rejects $H_0$ if \[\left|\frac{\bar X-\theta_0}{\sigma/\sqrt n}\right|>z_{\alpha/2},\] where $\mathbb{P}(Z>z_{\alpha/2})=\alpha/2$ for $Z\sim N(0,1)$.

Solution

Under $H_0$, \[Z=\frac{\bar X-\theta_0}{\sigma/\sqrt n}\sim N(0,1).\] Thus \[\mathbb{P}_{\theta_0}(|Z|>z_{\alpha/2})=\alpha.\] This is the classical two-sided $Z$-test. In LRT form, the cutoff is equivalent to \[c=\exp\left(-\frac{z_{\alpha/2}^2}{2}\right).\]

Key idea

Common critical values The following notation is standard: \[\begin{aligned} &z_\alpha: &&\mathbb{P}(Z>z_\alpha)=\alpha, \quad Z\sim N(0,1),\\ &t_{n-1,\alpha}: &&\mathbb{P}(T_{n-1}>t_{n-1,\alpha})=\alpha,\\ &\chi^2_{p,1-\alpha}: &&\mathbb{P}(\chi_p^2>\chi^2_{p,1-\alpha})=\alpha. \end{aligned}\]

21 Unbiased Tests

This section adapts the idea of unbiasedness from estimation to hypothesis testing.

Definition

Definition 10 (Unbiased test). A test with power function $\beta(\theta)$ is called unbiased if \[\beta(\theta')\ge \beta(\theta'')\] for every $\theta'\in\Theta_0^c$ and every $\theta''\in\Theta_0$.

This means that the probability of rejecting $H_0$ under the alternative should be at least as large as the probability of rejecting $H_0$ under the null.

Key idea

Interpretation An unbiased test should not be more likely to reject $H_0$ when $H_0$ is true than when $H_1$ is true. It should point in the correct direction.

Example

Example 11 (One-sided normal test is unbiased). Let $X_1,\ldots,X_n\sim N(\theta,\sigma^2)$ with known $\sigma^2$. Test \[H_0:\theta\le \theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0\] by rejecting $H_0$ when \[\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c.\]

Solution

The power function is \[\beta(\theta)=1-\Phi\left(c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right).\] This function is increasing in $\theta$. Therefore, if $\theta'>\theta_0$ and $\theta''\le\theta_0$, then \[\beta(\theta')\ge \beta(\theta''),\] so the test is unbiased.

22 Uniformly Most Powerful Tests

This section introduces the strongest optimality criterion for hypothesis tests in a class of tests.

22.1 Most powerful and UMP tests

This subsection defines UMP tests.

A level-$\alpha$ test guarantees that the Type I error probability is at most $\alpha$ for all parameter values in the null. After controlling Type I error, we want to make Type II error small. Equivalently, we want the power function to be large under the alternative.

Definition

Definition 12 (Uniformly most powerful test). Let $\mathcal C$ be a class of tests for \[H_0:\theta\in\Theta_0 \qquad \text{versus} \qquad H_1:\theta\in\Theta_0^c.\] A test in $\mathcal C$ with power function $\beta(\theta)$ is called a uniformly most powerful test in $\mathcal C$ if \[\beta(\theta)\ge \beta'(\theta)\] for every $\theta\in\Theta_0^c$ and for every competing power function $\beta'(\theta)$ corresponding to another test in $\mathcal C$.

Remark

Remark 13. UMP tests do not always exist, especially for two-sided alternatives. When they do exist, they provide a very strong optimality guarantee.

22.2 Neyman–Pearson lemma

This subsection gives the fundamental result for finding most powerful tests for simple hypotheses.

Theorem

Theorem 14 (Neyman–Pearson Lemma). Consider testing two simple hypotheses \[H_0:\theta=\theta_0 \qquad \text{versus} \qquad H_1:\theta=\theta_1.\] Let $f(x\mid\theta_i)$ be the pdf or pmf of $X$ under $\theta_i$. Define a rejection region $R$ by \[x\in R \quad \text{if} \quad f(x\mid\theta_1)>k f(x\mid\theta_0),\] and \[x\in R^c \quad \text{if} \quad f(x\mid\theta_1)<k f(x\mid\theta_0),\] for some constant $k\ge0$, chosen so that \[\mathbb{P}_{\theta_0}(X\in R)=\alpha.\] Then any test with this rejection region is a most powerful level-$\alpha$ test for testing $H_0$ against $H_1$.

Proof

Proof. The Neyman–Pearson lemma says that for a fixed Type I error probability, the best way to maximize power against the simple alternative $\theta_1$ is to reject where the likelihood under $H_1$ is large relative to the likelihood under $H_0$. In other words, the test rejects when \[\frac{f(x\mid\theta_1)}{f(x\mid\theta_0)}>k.\] This is exactly the likelihood-ratio principle for simple hypotheses. The full proof compares the power of any competing level-$\alpha$ test with the power of this likelihood-ratio test and uses the sign of \[f(x\mid\theta_1)-k f(x\mid\theta_0)\] on the rejection and non-rejection regions. ◻

Key idea

Likelihood-ratio form The Neyman–Pearson most powerful test rejects $H_0$ for large values of \[\frac{f(x\mid\theta_1)}{f(x\mid\theta_0)}.\] This means that the observed data are much more likely under $H_1$ than under $H_0$.

23 UMP Examples

This section applies the Neyman–Pearson lemma to binomial and normal models.

23.1 UMP binomial test

This subsection works out a discrete example where boundary cases may require special attention.

Example

Example 15 (UMP binomial test). Let \[X\sim \operatorname{Binomial}(2,\theta).\] Test \[H_0:\theta=\theta_0=\frac12 \qquad \text{versus} \qquad H_1:\theta=\theta_1=\frac34.\] Compute the likelihood ratios for $X\in\{0,1,2\}$.

Solution

The pmf is \[f(x\mid\theta)=\binom{2}{x}\theta^x(1-\theta)^{2-x}.\] The likelihood ratio is \[\frac{f(x\mid 3/4)}{f(x\mid 1/2)}.\] For $x=0$, \[\frac{f(0\mid 3/4)}{f(0\mid 1/2)} =\frac{(1/4)^2}{(1/2)^2}=\frac14.\] For $x=1$, \[\frac{f(1\mid 3/4)}{f(1\mid 1/2)} =\frac{2(3/4)(1/4)}{2(1/2)(1/2)}=\frac34.\] For $x=2$, \[\frac{f(2\mid 3/4)}{f(2\mid 1/2)} =\frac{(3/4)^2}{(1/2)^2}=\frac94.\] Thus the likelihood ratios are ordered as \[\frac14<\frac34<\frac94.\] By the Neyman–Pearson lemma, the most powerful test rejects first at $X=2$, then at $X=1$, then at $X=0$, depending on the desired size.

For example:

If $3/4<k<9/4$, reject $H_0$ when $X=2$. The size is \[\alpha=\mathbb{P}_{1/2}(X=2)=\frac14.\]
If $1/4<k<3/4$, reject $H_0$ when $X=1$ or $X=2$. The size is \[\alpha=\mathbb{P}_{1/2}(X=1\text{ or }2)=\frac34.\]
If $k>9/4$, never reject $H_0$, giving size $0$.
If $k<1/4$, always reject $H_0$, giving size $1$.

For the rejection rule $R=\{2\}$, the power function is \[\beta(\theta)=\mathbb{P}_\theta(X=2)=\theta^2.\] For the rejection rule $R=\{1,2\}$, the power function is \[\beta(\theta)=\mathbb{P}_\theta(X\ge1)=1-(1-\theta)^2.\]

23.2 UMP normal test

This subsection derives the one-sided most powerful normal test.

Example

Example 16 (UMP normal test). Suppose \[X_1,\ldots,X_n\sim \operatorname{Normal}(\theta,\sigma^2),\] where $\sigma^2$ is known. Test \[H_0:\theta=\theta_0 \qquad \text{versus} \qquad H_1:\theta=\theta_1,\] where $\theta_1<\theta_0$.

Solution

The sample mean $\bar X$ is sufficient for $\theta$, and \[\bar X\sim \operatorname{Normal}\left(\theta,\frac{\sigma^2}{n}\right).\] The density of $\bar X=t$ is \[g(t\mid \theta) =\frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left\{-\frac{n(t-\theta)^2}{2\sigma^2}\right\}.\] The Neyman–Pearson test rejects $H_0$ when \[g(\bar X\mid\theta_1)>k g(\bar X\mid\theta_0).\] Taking logarithms, this inequality is equivalent to rejecting for small values of $\bar X$, because $\theta_1<\theta_0$. Hence the most powerful test has the form \[\text{Reject }H_0 \quad \text{if} \quad \bar X<c.\] To make this a level-$\alpha$ test, choose $c$ so that \[\alpha=\mathbb{P}_{\theta_0}(\bar X<c).\] Under $H_0$, \[\bar X\sim \operatorname{Normal}\left(\theta_0,\frac{\sigma^2}{n}\right),\] so \[\boxed{c=\theta_0-\frac{\sigma}{\sqrt n}z_\alpha,}\] where $\mathbb{P}(Z>z_\alpha)=\alpha$.

If instead $\theta_1>\theta_0$, then the UMP test rejects for large values of $\bar X$: \[\bar X>\theta_0+\frac{\sigma}{\sqrt n}z_\alpha.\]

24 $p$-Values

This section defines valid $p$-values and connects them to significance levels and rejection rules.

24.1 Definition and interpretation

This subsection explains why $p$-values summarize the strength of evidence against the null hypothesis.

A $p$-value is a test statistic that measures how surprising the observed data are if the null hypothesis is true. Small $p$-values provide evidence in favor of the alternative hypothesis.

Definition

Definition 17 (Valid $p$-value). A statistic $p(X)$ is a valid $p$-value if, for every $\theta\in\Theta_0$ and every $0\le \alpha\le 1$, \[\mathbb{P}_\theta(p(X)\le \alpha)\le \alpha.\]

Remark

Remark 18. A $p$-value is not the probability that $H_0$ is true. It is a probability computed under the assumption that $H_0$ is true.

Equivalently, the $p$-value can be understood as the smallest significance level at which the observed test statistic would lead to rejection: \[p(x)=\inf\{\alpha:\text{ reject }H_0\text{ at level }\alpha\}.\]

24.2 General construction of a $p$-value

This subsection gives a general formula for a valid $p$-value.

Theorem

Theorem 19 (General $p$-value formula). Let $W(X)$ be a test statistic such that large values of $W$ provide evidence in favor of $H_1$. Then \[p(x):=\sup_{\theta\in\Theta_0}\mathbb{P}_\theta\left(W(X)\ge W(x)\right)\] defines a valid $p$-value.

Proof

Proof. The quantity $p(x)$ is the largest null probability of observing a test statistic at least as extreme as the observed statistic. Therefore, for any $\theta\in\Theta_0$, the probability that this tail probability is at most $\alpha$ is no larger than $\alpha$. Hence \[\mathbb{P}_\theta(p(X)\le\alpha)\le\alpha,\] which is the validity condition. ◻

Key idea

Interpretation A $p$-value answers the question: \[\text{How surprising is my observed signal, if the null hypothesis were true?}\] The smaller the $p$-value, the stronger the evidence against $H_0$.

24.3 One-sided normal $p$-value

This subsection computes a standard $p$-value for a one-sided normal test.

Example

Example 20 (One-sided normal $p$-value). Let $X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2)$ with $\sigma$ known. Test \[H_0:\mu\le \mu_0 \qquad \text{versus} \qquad H_1:\mu>\mu_0.\] Use the statistic \[W(X)=Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}.\] Find the $p$-value.

Solution

Large values of $Z$ favor $H_1$. For any $\mu\le\mu_0$, \[\begin{aligned} \mathbb{P}_\mu(W\ge w) &=\mathbb{P}_\mu\left(\frac{\bar X-\mu_0}{\sigma/\sqrt n}\ge w\right)\\ &=\mathbb{P}_\mu\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\ge w+\frac{\mu_0-\mu}{\sigma/\sqrt n}\right). \end{aligned}\] The right tail probability is largest when $\mu=\mu_0$. Hence the supremum over $\Theta_0=\{\mu:\mu\le\mu_0\}$ is attained at $\mu=\mu_0$.

If \[z_{\mathrm{obs}}=\frac{\bar x-\mu_0}{\sigma/\sqrt n},\] then the valid $p$-value is \[\boxed{p(x)=\mathbb{P}_{\mu_0}(Z\ge z_{\mathrm{obs}})=1-\Phi(z_{\mathrm{obs}}).}\]

Example

Example 21 (Numerical $p$-value). Suppose $z_{\mathrm{obs}}=2.1$ in the one-sided normal test above. Compute the $p$-value and state the decision at levels $\alpha=0.05$ and $\alpha=0.01$.

Solution

The one-sided $p$-value is \[p=1-\Phi(2.1)\approx 0.0179.\] At $\alpha=0.05$, since $0.0179<0.05$, reject $H_0$.

At $\alpha=0.01$, since $0.0179>0.01$, fail to reject $H_0$.

25 Loss and Risk Functions for Tests

This section treats hypothesis testing as a decision problem with possible costs for different mistakes.

25.1 Decision-theoretic formulation

This subsection introduces actions, loss functions, and risk functions for hypothesis testing.

In hypothesis testing, there are two possible actions: \[a_0=\text{fail to reject }H_0, \qquad a_1=\text{reject }H_0.\] A decision rule $\delta(x)$ selects an action based on the observed data $x$.

Definition

Definition 22 (Loss function). A loss function \[L:\Theta\times \mathcal A\to \mathbb{R}\] measures the penalty for taking action $a\in\mathcal A$ when the true parameter is $\theta$.

Example

Example 23 (0–1 loss). For testing $H_0:\theta\in\Theta_0$ versus $H_1:\theta\in\Theta_0^c$, the ordinary 0–1 loss is \[L(\theta,a)= \begin{cases} 0, & \theta\in\Theta_0\text{ and }a=a_0,\text{ or }\theta\in\Theta_0^c\text{ and }a=a_1,\\ 1, & \theta\in\Theta_0\text{ and }a=a_1,\text{ or }\theta\in\Theta_0^c\text{ and }a=a_0. \end{cases}\]

Solution

The loss is $0$ for a correct decision and $1$ for an incorrect decision. Thus both Type I and Type II errors are treated as equally costly.

25.2 Generalized 0–1 loss

This subsection allows Type I and Type II errors to have different costs.

Definition

Definition 24 (Generalized 0–1 loss). Let $c_I$ be the cost of a Type I error and $c_{II}$ be the cost of a Type II error. Define \[L(\theta,a_0)= \begin{cases} 0, & \theta\in\Theta_0,\\ c_{II}, & \theta\in\Theta_0^c, \end{cases} \qquad L(\theta,a_1)= \begin{cases} c_I, & \theta\in\Theta_0,\\ 0, & \theta\in\Theta_0^c. \end{cases}\]

If $c_I=c_{II}=1$, this reduces to the ordinary 0–1 loss. If $c_I\ne c_{II}$, the test should account for the relative seriousness of the two errors.

25.3 Risk function

This subsection defines risk as the expected loss of a test.

Definition

Definition 25 (Risk function). The risk function of a decision rule $\delta$ is \[R(\theta,\delta)=\mathbb{E}_\theta[L(\theta,\delta(X))].\]

Let the power function be \[\beta(\theta)=\mathbb{P}_\theta(\delta(X)=a_1),\] the probability of rejecting $H_0$ under parameter $\theta$. Under generalized 0–1 loss, \[R(\theta,\delta)= \begin{cases} c_I\beta(\theta), & \theta\in\Theta_0,\\[3pt] c_{II}\{1-\beta(\theta)\}, & \theta\in\Theta_0^c. \end{cases}\]

Key idea

Risk interpretation For $\theta\in\Theta_0$, the risk is the Type I error probability weighted by the cost $c_I$. For $\theta\in\Theta_0^c$, the risk is the Type II error probability weighted by the cost $c_{II}$.

26 Risk Function Example

This section computes the risk function for a one-sided normal test.

Example

Example 26 (Normal one-sided test risk). Let \[X_1,\ldots,X_n\stackrel{iid}{\sim} N(\theta,\sigma^2), \qquad \sigma=1, \qquad n=25.\] Test \[H_0:\theta\le 0 \qquad \text{versus} \qquad H_1:\theta>0.\] Use the size $\alpha=0.05$ test that rejects $H_0$ when \[\bar X>z_{1-\alpha}\frac{\sigma}{\sqrt n}.\] Take $c_I=2$ and $c_{II}=1$. Compute the power function and risk function.

Solution

Here \[z_{1-\alpha}=z_{0.95}\approx 1.645, \qquad \frac{\sigma}{\sqrt n}=\frac15.\] So the rejection rule is \[\bar X>1.645\cdot \frac15=0.329.\] For a general $\theta$, \[\bar X\sim N\left(\theta,\frac1{25}\right).\] Thus the power function is \[\begin{aligned} \beta(\theta) &=\mathbb{P}_\theta(\bar X>0.329)\\ &=\mathbb{P}\left(\frac{\bar X-\theta}{1/5}>\frac{0.329-\theta}{1/5}\right)\\ &=1-\Phi(1.645-5\theta). \end{aligned}\] Under the generalized 0–1 loss with $c_I=2$ and $c_{II}=1$, \[R(\theta)= \begin{cases} 2\beta(\theta), & \theta\le 0,\\[3pt] 1-\beta(\theta), & \theta>0. \end{cases}\]

Example

Example 27 (Numerical risk values). Using the previous example, compute the risk at $\theta=0,-0.2,0.2,0.5$.

Solution

The power function is \[\beta(\theta)=1-\Phi(1.645-5\theta).\] At the boundary $\theta=0$, \[\beta(0)=0.05, \qquad R(0)=2(0.05)=0.10.\] At $\theta=-0.2$, \[\beta(-0.2)=1-\Phi(1.645+1)=1-\Phi(2.645)\approx 0.0041,\] so \[R(-0.2)=2(0.0041)=0.0082.\] At $\theta=0.2$, \[\beta(0.2)=1-\Phi(1.645-1)=1-\Phi(0.645)\approx 0.260.\] Since $\theta>0$, \[R(0.2)=1-\beta(0.2)\approx 0.740.\] At $\theta=0.5$, \[\beta(0.5)=1-\Phi(1.645-2.5)=1-\Phi(-0.855)=\Phi(0.855)\approx 0.803.\] Thus \[R(0.5)=1-0.803=0.197.\]

Important lesson

Important lesson The risk can be small under parts of the null and large near the alternative boundary. The shape of the risk function depends on both the power function and the relative costs $c_I$ and $c_{II}$.

27 Practice Problems

This section gives practice problems that reinforce the main definitions and calculations from the section.

Practice Problem

Practice Problem 28 (Power function for a binomial test). Let $X\sim\operatorname{Binomial}(6,\theta)$ and test \[H_0:\theta\le \frac12 \qquad \text{versus} \qquad H_1:\theta>\frac12.\] Consider the test that rejects $H_0$ if $X\ge5$.

Find the power function.
Find the size of the test.
Find the Type II error probability when $\theta=0.8$.

Solution

(a) The rejection region is $R=\{5,6\}$, so \[\beta(\theta)=\mathbb{P}_\theta(X\ge5) =\binom65\theta^5(1-\theta)+\binom66\theta^6 =6\theta^5(1-\theta)+\theta^6.\] (b) Since $\beta(\theta)$ is increasing in $\theta$, the size is attained at $\theta=1/2$: \[\alpha=\beta(1/2)=6\left(\frac12\right)^5\left(\frac12\right)+\left(\frac12\right)^6 =\frac{6+1}{64}=\frac7{64}.\] (c) At $\theta=0.8$, \[\beta(0.8)=6(0.8)^5(0.2)+(0.8)^6.\] The Type II error probability is \[1-\beta(0.8)=1-\{6(0.8)^5(0.2)+(0.8)^6\}.\] Numerically, this is approximately $0.3446$.

Practice Problem

Practice Problem 29 (Normal sample size). Let $X_1,\ldots,X_n\sim N(\theta,\sigma^2)$ with known $\sigma^2$. Test \[H_0:\theta\le\theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0\] using a level $0.05$ test. Find a formula for the smallest $n$ such that the power is at least $0.90$ when $\theta=\theta_0+\Delta$, where $\Delta>0$.

Solution

The level $0.05$ test rejects when \[\frac{\bar X-\theta_0}{\sigma/\sqrt n}>z_{0.05}.\] At $\theta=\theta_0+\Delta$, the power is \[\beta(\theta_0+\Delta) =1-\Phi\left(z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\right).\] We require \[1-\Phi\left(z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\right)\ge 0.90.\] Equivalently, \[\Phi\left(z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\right)\le0.10.\] Thus \[z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\le z_{0.90}^{\text{right tail}},\] where $\mathbb{P}(Z>z_{0.90}^{\text{right tail}})=0.90$, so $z_{0.90}^{\text{right tail}}\approx -1.2816$. Hence \[\sqrt n\ge \frac{\sigma}{\Delta}(z_{0.05}+1.2816).\] Therefore \[\boxed{n\ge \left(\frac{\sigma}{\Delta}(z_{0.05}+1.2816)\right)^2.}\] The smallest integer $n$ is the ceiling of the right-hand side.

Practice Problem

Practice Problem 30 (One-sided normal p-value). Let $X_1,\ldots,X_{25}\sim N(\mu,4)$ and test \[H_0:\mu\le 10 \qquad \text{versus} \qquad H_1:\mu>10.\] Suppose $\bar x=10.9$. Compute the $p$-value and state the decision at $\alpha=0.05$.

Solution

Here $\sigma=2$ and $n=25$, so $\sigma/\sqrt n=2/5=0.4$. The observed test statistic is \[z_{\mathrm{obs}}=\frac{10.9-10}{0.4}=2.25.\] The one-sided $p$-value is \[p=1-\Phi(2.25)\approx 0.0122.\] Since $0.0122<0.05$, reject $H_0$ at the $5\%$ level.

Practice Problem

Practice Problem 31 (Risk function). For the one-sided normal test in the previous problem, use generalized 0–1 loss with $c_I=3$ and $c_{II}=1$. Write the risk function in terms of the power function.

Solution

The decision is to reject or fail to reject $H_0:\mu\le10$. The risk function is \[R(\mu,\delta)= \begin{cases} 3\beta(\mu), & \mu\le 10,\\[3pt] 1-\beta(\mu), & \mu>10. \end{cases}\] where \[\beta(\mu)=\mathbb{P}_\mu(\text{reject }H_0).\] For a level-$0.05$ test, \[\beta(\mu)=1-\Phi\left(z_{0.05}+\frac{10-\mu}{2/5}\right).\]

28 Summary

This section summarizes the main concepts used to evaluate hypothesis tests.

Concept	Definition	Meaning
Power function	$\beta(\theta)=\mathbb{P}_\theta(X\in R)$	Probability of rejecting $H_0$ at parameter value $\theta$
Type I error	$\mathbb{P}_\theta(X\in R)$ for $\theta\in\Theta_0$	Rejecting $H_0$ when $H_0$ is true
Type II error	$1-\beta(\theta)$ for $\theta\in\Theta_0^c$	Failing to reject $H_0$ when $H_1$ is true
Size	$\sup_{\theta\in\Theta_0}\beta(\theta)$	Worst-case Type I error probability
Level $\alpha$	$\sup_{\theta\in\Theta_0}\beta(\theta)\le\alpha$	Type I error controlled by $\alpha$
Unbiased test	$\beta(\theta')\ge\beta(\theta'')$ for $\theta'\in\Theta_0^c$, $\theta''\in\Theta_0$	Rejects more often under the alternative than under the null
UMP test	Maximizes power for all $\theta\in\Theta_0^c$ within a class	Strong optimality criterion
$p$-value	Smallest level at which $H_0$ is rejected	Measures evidence against $H_0$
Risk function	$R(\theta,\delta)=\mathbb{E}_\theta[L(\theta,\delta(X))]$	Expected loss of a test

Key idea

Key message A hypothesis test is not evaluated by whether it is correct on one dataset. It is evaluated by its long-run error probabilities, its power function, and, when costs matter, its risk function.

--- title: "Chapter 15: Hypothesis Tests II — Evaluating Tests" format: html: toc: true toc-depth: 3 number-sections: true pdf: toc: true number-sections: true execute: warning: false message: false --- This chapter explains how to evaluate and compare hypothesis tests after constructing them. The main ideas are Type I and Type II errors, power functions, test size and level, unbiased tests, uniformly most powerful tests, $p$-values, and risk functions. ::: {.callout-note title="Topics"} Evaluating hypothesis tests; Type I and Type II errors; power function; size and level; unbiased tests; uniformly most powerful tests; Neyman--Pearson lemma; binomial and normal UMP tests; $p$-values; loss functions; risk functions; decision-theoretic evaluation of tests. ::: # Why We Need to Evaluate Tests This section explains how to compare hypothesis tests after we have learned how to construct them. In hypothesis testing, different methods may lead to different tests. A likelihood ratio test, a Bayesian test, a union-intersection test, and a classical test statistic can all be reasonable procedures. The key question is: *how do we decide whether one test is better than another?* ::: {.callout-tip title="Key idea"} Main idea Hypothesis tests are evaluated and compared through their probabilities of making mistakes. A good test should have small Type I error probability and small Type II error probability, subject to the unavoidable trade-off between the two. ::: The main evaluation tools in this section are: - the **power function**; - the **size** or **significance level**; - **unbiasedness** of tests; - **uniformly most powerful** tests; - **risk functions** and expected loss. # Two Types of Errors This section reviews the two fundamental errors in hypothesis testing and introduces the power function as a unified way to measure them. Consider a hypothesis test $$H_0: \theta\in \Theta_0 \qquad \text{versus} \qquad H_1: \theta\in \Theta_0^c.$$ Let $R$ be the rejection region of the test. The test rejects $H_0$ when $X\in R$ and fails to reject $H_0$ when $X\in R^c$. ## Type I and Type II errors This subsection defines the two ways a hypothesis test can be wrong. ::: {.callout-note title="Definition"} **Definition 1** (Type I and Type II errors). For the test $H_0:\theta\in\Theta_0$ versus $H_1:\theta\in\Theta_0^c$: - A **Type I error** occurs when $\theta\in\Theta_0$ but the test rejects $H_0$. - A **Type II error** occurs when $\theta\in\Theta_0^c$ but the test fails to reject $H_0$. ::: | Decision | $H_0$ true | $H_1$ true | |---|---|---| | Reject $H_0$ | Type I error | Correct decision | | Fail to reject $H_0$ | Correct decision | Type II error | For $\theta\in\Theta_0$, the Type I error probability is $$\mathbb{P}_\theta(X\in R).$$ For $\theta\in\Theta_0^c$, the Type II error probability is $$\mathbb{P}_\theta(X\in R^c)=1-\mathbb{P}_\theta(X\in R).$$ ## Power function This subsection introduces the main function used to evaluate a test over all parameter values. ::: {.callout-note title="Definition"} **Definition 2** (Power function). The **power function** of a hypothesis test with rejection region $R$ is $$\beta(\theta):=\mathbb{P}_\theta(X\in R).$$ Thus, $$\beta(\theta)= \begin{cases} \mathbb{P}_\theta(\text{Type I error}), & \theta\in\Theta_0,\\[3pt] 1-\mathbb{P}_\theta(\text{Type II error}), & \theta\in\Theta_0^c. \end{cases}$$ ::: ::: {.callout-tip title="Key idea"} Ideal power function The ideal test would have $$\beta(\theta)= \begin{cases} 0, & \theta\in\Theta_0,\\ 1, & \theta\in\Theta_0^c. \end{cases}$$ This would mean no Type I errors and no Type II errors. In real problems, such a perfect test is usually impossible. ::: # Power Function Examples This section computes power functions for binomial and normal tests and shows the trade-off between Type I and Type II errors. ## Binomial power functions This subsection compares two tests for the same binomial hypothesis. ::: {.callout-tip title="Example"} **Example 3** (Binomial power function: two tests). Let $$X\sim \operatorname{Binomial}(n=5,\theta),$$ and consider $$H_0:\theta\le \frac12 \qquad \text{versus} \qquad H_1:\theta>\frac12.$$ Compare the following two tests. **Test 1:** reject $H_0$ if $X=5$. The rejection region is $R_1=\{5\}$, so the power function is $$\beta_1(\theta)=\mathbb{P}_\theta(X=5)=\theta^5.$$ For $\theta\le 1/2$, $$\beta_1(\theta)\le \left(\frac12\right)^5=\frac{1}{32}\approx 0.03125.$$ Thus Test 1 has a small Type I error probability, but for values of $\theta$ only moderately larger than $1/2$, it has a large Type II error probability. **Test 2:** reject $H_0$ if $X=3,4,$ or $5$. The rejection region is $R_2=\{3,4,5\}$, so the power function is $$\beta_2(\theta)=\mathbb{P}_\theta(X=3,4,5) =\binom53\theta^3(1-\theta)^2+\binom54\theta^4(1-\theta)+\binom55\theta^5.$$ ::: ::: {.callout-note title="Solution"} The first test is conservative: it rejects only when all five trials are successes. Its Type I error probability is at most $1/32$ under $H_0$. The second test rejects more often, so it has larger power for $\theta>1/2$. However, it also has larger Type I error probability. At the boundary $\theta=1/2$, $$\beta_2\left(\frac12\right) =\binom53\left(\frac12\right)^5+\binom54\left(\frac12\right)^5+\binom55\left(\frac12\right)^5 =\frac{10+5+1}{32}=\frac12.$$ Thus Test 2 has much larger power but also much larger size. This illustrates the Type I--Type II error trade-off. ::: ```{python} #| label: fig-binomial-power-comparison #| fig-cap: "Power functions for two binomial tests with n = 5." #| echo: false import numpy as np import matplotlib.pyplot as plt theta = np.linspace(0, 1, 400) beta1 = theta**5 beta2 = 10*theta**3*(1-theta)**2 + 5*theta**4*(1-theta) + theta**5 plt.figure(figsize=(7, 4.2)) plt.plot(theta, beta1, label=r"$\beta_1(\theta)=\theta^5$") plt.plot(theta, beta2, linestyle="--", label=r"$\beta_2(\theta)=P(X\geq 3)$") plt.axvline(0.5, linestyle=":") plt.xlabel(r"$\theta$") plt.ylabel(r"$\beta(\theta)$") plt.ylim(0, 1.05) plt.grid(True, alpha=0.3) plt.legend() plt.show() ``` ## Normal power function This subsection derives the power function of a one-sided normal mean test. ::: {.callout-tip title="Example"} **Example 4** (Normal power function). Let $$X_1,\ldots,X_n\sim \operatorname{Normal}(\theta,\sigma^2),$$ where $\sigma^2$ is known. Test $$H_0:\theta\le \theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0.$$ Consider the test that rejects $H_0$ if $$\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c,$$ where $c>0$ is a constant. ::: ::: {.callout-note title="Solution"} For any fixed $\theta$, $$Z=\frac{\bar X-\theta}{\sigma/\sqrt n}\sim N(0,1).$$ Thus the power function is $$\begin{aligned} \beta(\theta) &=\mathbb{P}_\theta\left(\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c\right)\\ &=\mathbb{P}_\theta\left(\frac{\bar X-\theta}{\sigma/\sqrt n}>c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right)\\ &=1-\Phi\left(c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right). \end{aligned}$$ Therefore, $$\beta(\theta_0)=1-\Phi(c).$$ If $c=1.28$, then $\mathbb{P}(Z>1.28)\approx 0.10$, so the test has about a $10\%$ significance level at the boundary $\theta=\theta_0$. ::: The power function has the following behavior: $$\theta\to -\infty \implies \beta(\theta)\to 0, \qquad \theta\to +\infty \implies \beta(\theta)\to 1.$$ For fixed sample size, decreasing Type I error usually increases Type II error, and increasing power usually increases Type I error. ## Choosing sample size from power requirements This subsection shows how power calculations determine a sample size. ::: {.callout-tip title="Example"} **Example 5** (Choosing $c$ and $n$). Let $$X_1,\ldots,X_n\sim \operatorname{Normal}(\theta,\sigma^2),$$ with known $\sigma^2$. Test $$H_0:\theta\le \theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0$$ using the rule $$\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c.$$ Choose $c$ and $n$ so that $$\mathbb{P}(\text{Type I error})\le 0.10, \qquad \mathbb{P}(\text{Type II error})\le 0.20\quad \text{if } \theta\ge \theta_0+\sigma.$$ ::: ::: {.callout-note title="Solution"} The Type I error is largest at the boundary $\theta=\theta_0$. Hence we require $$\beta(\theta_0)=\mathbb{P}(Z>c)=0.10.$$ Thus $$c=z_{0.10}\approx 1.28.$$ Next, requiring Type II error at most $0.20$ when $\theta=\theta_0+\sigma$ is equivalent to requiring power at least $0.80$: $$\beta(\theta_0+\sigma)=0.80.$$ But $$\beta(\theta_0+\sigma) =\mathbb{P}\left(Z>c-\sqrt n\right).$$ So $$\mathbb{P}\left(Z>1.28-\sqrt n\right)=0.80.$$ Equivalently, $$1.28-\sqrt n=z_{0.80}^{\text{right tail}},$$ where $\mathbb{P}(Z>z_{0.80}^{\text{right tail}})=0.80$. Since $z_{0.80}^{\text{right tail}}\approx -0.84$, $$1.28-\sqrt n\approx -0.84, \qquad \sqrt n\approx 2.12, \qquad n\approx 4.49.$$ Therefore, choose $$\boxed{n=5.}$$ ::: # Size and Level of a Test This section distinguishes the exact size of a test from the more flexible idea of a level-$\alpha$ test. ## Size and level This subsection gives the formal definitions. ::: {.callout-note title="Definition"} **Definition 6** (Size-$\alpha$ test). For $\alpha\in[0,1]$, a test with power function $\beta(\theta)$ is called a **size $\alpha$ test** if $$\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\text{reject }H_0) =\sup_{\theta\in\Theta_0}\beta(\theta)=\alpha.$$ ::: ::: {.callout-note title="Definition"} **Definition 7** (Level-$\alpha$ test). A test is called a **level $\alpha$ test** if $$\sup_{\theta\in\Theta_0}\beta(\theta)\le \alpha.$$ ::: ::: {.callout-important title="Remark"} *Remark 8*. A size-$\alpha$ test uses the full Type I error budget exactly. A level-$\alpha$ test may be conservative and use less than the allowed Type I error probability. ::: ## Size of likelihood ratio tests This subsection explains how cutoffs are chosen for likelihood ratio tests. For a likelihood ratio test with rejection region $$R=\{x:\lambda(x)\le c\},$$ the size is $$\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\lambda(X)\le c).$$ To obtain a size-$\alpha$ LRT, choose $c$ so that $$\sup_{\theta\in\Theta_0}\mathbb{P}_\theta(\lambda(X)\le c)=\alpha.$$ ::: {.callout-tip title="Example"} **Example 9** (Size-$\alpha$ LRT for a normal mean). Suppose $X_1, \ldots,X_n\sim N(\theta,\sigma^2)$ with known $\sigma^2$, and test $$H_0:\theta=\theta_0 \qquad \text{versus} \qquad H_1:\theta\ne\theta_0.$$ The LRT rejects for large values of $$\left|\frac{\bar X-\theta_0}{\sigma/\sqrt n}\right|.$$ A size-$\alpha$ test rejects $H_0$ if $$\left|\frac{\bar X-\theta_0}{\sigma/\sqrt n}\right|>z_{\alpha/2},$$ where $\mathbb{P}(Z>z_{\alpha/2})=\alpha/2$ for $Z\sim N(0,1)$. ::: ::: {.callout-note title="Solution"} Under $H_0$, $$Z=\frac{\bar X-\theta_0}{\sigma/\sqrt n}\sim N(0,1).$$ Thus $$\mathbb{P}_{\theta_0}(|Z|>z_{\alpha/2})=\alpha.$$ This is the classical two-sided $Z$-test. In LRT form, the cutoff is equivalent to $$c=\exp\left(-\frac{z_{\alpha/2}^2}{2}\right).$$ ::: ::: {.callout-tip title="Key idea"} Common critical values The following notation is standard: $$\begin{aligned} &z_\alpha: &&\mathbb{P}(Z>z_\alpha)=\alpha, \quad Z\sim N(0,1),\\ &t_{n-1,\alpha}: &&\mathbb{P}(T_{n-1}>t_{n-1,\alpha})=\alpha,\\ &\chi^2_{p,1-\alpha}: &&\mathbb{P}(\chi_p^2>\chi^2_{p,1-\alpha})=\alpha. \end{aligned}$$ ::: # Unbiased Tests This section adapts the idea of unbiasedness from estimation to hypothesis testing. ::: {.callout-note title="Definition"} **Definition 10** (Unbiased test). A test with power function $\beta(\theta)$ is called **unbiased** if $$\beta(\theta')\ge \beta(\theta'')$$ for every $\theta'\in\Theta_0^c$ and every $\theta''\in\Theta_0$. ::: This means that the probability of rejecting $H_0$ under the alternative should be at least as large as the probability of rejecting $H_0$ under the null. ::: {.callout-tip title="Key idea"} Interpretation An unbiased test should not be more likely to reject $H_0$ when $H_0$ is true than when $H_1$ is true. It should point in the correct direction. ::: ::: {.callout-tip title="Example"} **Example 11** (One-sided normal test is unbiased). Let $X_1,\ldots,X_n\sim N(\theta,\sigma^2)$ with known $\sigma^2$. Test $$H_0:\theta\le \theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0$$ by rejecting $H_0$ when $$\frac{\bar X-\theta_0}{\sigma/\sqrt n}>c.$$ ::: ::: {.callout-note title="Solution"} The power function is $$\beta(\theta)=1-\Phi\left(c+\frac{\theta_0-\theta}{\sigma/\sqrt n}\right).$$ This function is increasing in $\theta$. Therefore, if $\theta'>\theta_0$ and $\theta''\le\theta_0$, then $$\beta(\theta')\ge \beta(\theta''),$$ so the test is unbiased. ::: # Uniformly Most Powerful Tests This section introduces the strongest optimality criterion for hypothesis tests in a class of tests. ## Most powerful and UMP tests This subsection defines UMP tests. A level-$\alpha$ test guarantees that the Type I error probability is at most $\alpha$ for all parameter values in the null. After controlling Type I error, we want to make Type II error small. Equivalently, we want the power function to be large under the alternative. ::: {.callout-note title="Definition"} **Definition 12** (Uniformly most powerful test). Let $\mathcal C$ be a class of tests for $$H_0:\theta\in\Theta_0 \qquad \text{versus} \qquad H_1:\theta\in\Theta_0^c.$$ A test in $\mathcal C$ with power function $\beta(\theta)$ is called a **uniformly most powerful** test in $\mathcal C$ if $$\beta(\theta)\ge \beta'(\theta)$$ for every $\theta\in\Theta_0^c$ and for every competing power function $\beta'(\theta)$ corresponding to another test in $\mathcal C$. ::: ::: {.callout-important title="Remark"} *Remark 13*. UMP tests do not always exist, especially for two-sided alternatives. When they do exist, they provide a very strong optimality guarantee. ::: ## Neyman--Pearson lemma This subsection gives the fundamental result for finding most powerful tests for simple hypotheses. ::: {.callout-important title="Theorem"} **Theorem 14** (Neyman--Pearson Lemma). *Consider testing two simple hypotheses $$H_0:\theta=\theta_0 \qquad \text{versus} \qquad H_1:\theta=\theta_1.$$ Let $f(x\mid\theta_i)$ be the pdf or pmf of $X$ under $\theta_i$. Define a rejection region $R$ by $$x\in R \quad \text{if} \quad f(x\mid\theta_1)>k f(x\mid\theta_0),$$ and $$x\in R^c \quad \text{if} \quad f(x\mid\theta_1)<k f(x\mid\theta_0),$$ for some constant $k\ge0$, chosen so that $$\mathbb{P}_{\theta_0}(X\in R)=\alpha.$$ Then any test with this rejection region is a most powerful level-$\alpha$ test for testing $H_0$ against $H_1$.* ::: ::: {.callout-note title="Proof"} *Proof.* The Neyman--Pearson lemma says that for a fixed Type I error probability, the best way to maximize power against the simple alternative $\theta_1$ is to reject where the likelihood under $H_1$ is large relative to the likelihood under $H_0$. In other words, the test rejects when $$\frac{f(x\mid\theta_1)}{f(x\mid\theta_0)}>k.$$ This is exactly the likelihood-ratio principle for simple hypotheses. The full proof compares the power of any competing level-$\alpha$ test with the power of this likelihood-ratio test and uses the sign of $$f(x\mid\theta_1)-k f(x\mid\theta_0)$$ on the rejection and non-rejection regions. ◻ ::: ::: {.callout-tip title="Key idea"} Likelihood-ratio form The Neyman--Pearson most powerful test rejects $H_0$ for large values of $$\frac{f(x\mid\theta_1)}{f(x\mid\theta_0)}.$$ This means that the observed data are much more likely under $H_1$ than under $H_0$. ::: # UMP Examples This section applies the Neyman--Pearson lemma to binomial and normal models. ## UMP binomial test This subsection works out a discrete example where boundary cases may require special attention. ::: {.callout-tip title="Example"} **Example 15** (UMP binomial test). Let $$X\sim \operatorname{Binomial}(2,\theta).$$ Test $$H_0:\theta=\theta_0=\frac12 \qquad \text{versus} \qquad H_1:\theta=\theta_1=\frac34.$$ Compute the likelihood ratios for $X\in\{0,1,2\}$. ::: ::: {.callout-note title="Solution"} The pmf is $$f(x\mid\theta)=\binom{2}{x}\theta^x(1-\theta)^{2-x}.$$ The likelihood ratio is $$\frac{f(x\mid 3/4)}{f(x\mid 1/2)}.$$ For $x=0$, $$\frac{f(0\mid 3/4)}{f(0\mid 1/2)} =\frac{(1/4)^2}{(1/2)^2}=\frac14.$$ For $x=1$, $$\frac{f(1\mid 3/4)}{f(1\mid 1/2)} =\frac{2(3/4)(1/4)}{2(1/2)(1/2)}=\frac34.$$ For $x=2$, $$\frac{f(2\mid 3/4)}{f(2\mid 1/2)} =\frac{(3/4)^2}{(1/2)^2}=\frac94.$$ Thus the likelihood ratios are ordered as $$\frac14<\frac34<\frac94.$$ By the Neyman--Pearson lemma, the most powerful test rejects first at $X=2$, then at $X=1$, then at $X=0$, depending on the desired size. For example: - If $3/4<k<9/4$, reject $H_0$ when $X=2$. The size is $$\alpha=\mathbb{P}_{1/2}(X=2)=\frac14.$$ - If $1/4<k<3/4$, reject $H_0$ when $X=1$ or $X=2$. The size is $$\alpha=\mathbb{P}_{1/2}(X=1\text{ or }2)=\frac34.$$ - If $k>9/4$, never reject $H_0$, giving size $0$. - If $k<1/4$, always reject $H_0$, giving size $1$. ::: For the rejection rule $R=\{2\}$, the power function is $$\beta(\theta)=\mathbb{P}_\theta(X=2)=\theta^2.$$ For the rejection rule $R=\{1,2\}$, the power function is $$\beta(\theta)=\mathbb{P}_\theta(X\ge1)=1-(1-\theta)^2.$$ ## UMP normal test This subsection derives the one-sided most powerful normal test. ::: {.callout-tip title="Example"} **Example 16** (UMP normal test). Suppose $$X_1,\ldots,X_n\sim \operatorname{Normal}(\theta,\sigma^2),$$ where $\sigma^2$ is known. Test $$H_0:\theta=\theta_0 \qquad \text{versus} \qquad H_1:\theta=\theta_1,$$ where $\theta_1<\theta_0$. ::: ::: {.callout-note title="Solution"} The sample mean $\bar X$ is sufficient for $\theta$, and $$\bar X\sim \operatorname{Normal}\left(\theta,\frac{\sigma^2}{n}\right).$$ The density of $\bar X=t$ is $$g(t\mid \theta) =\frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left\{-\frac{n(t-\theta)^2}{2\sigma^2}\right\}.$$ The Neyman--Pearson test rejects $H_0$ when $$g(\bar X\mid\theta_1)>k g(\bar X\mid\theta_0).$$ Taking logarithms, this inequality is equivalent to rejecting for small values of $\bar X$, because $\theta_1<\theta_0$. Hence the most powerful test has the form $$\text{Reject }H_0 \quad \text{if} \quad \bar X<c.$$ To make this a level-$\alpha$ test, choose $c$ so that $$\alpha=\mathbb{P}_{\theta_0}(\bar X<c).$$ Under $H_0$, $$\bar X\sim \operatorname{Normal}\left(\theta_0,\frac{\sigma^2}{n}\right),$$ so $$\boxed{c=\theta_0-\frac{\sigma}{\sqrt n}z_\alpha,}$$ where $\mathbb{P}(Z>z_\alpha)=\alpha$. ::: If instead $\theta_1>\theta_0$, then the UMP test rejects for large values of $\bar X$: $$\bar X>\theta_0+\frac{\sigma}{\sqrt n}z_\alpha.$$ # $p$-Values This section defines valid $p$-values and connects them to significance levels and rejection rules. ## Definition and interpretation This subsection explains why $p$-values summarize the strength of evidence against the null hypothesis. A $p$-value is a test statistic that measures how surprising the observed data are if the null hypothesis is true. Small $p$-values provide evidence in favor of the alternative hypothesis. ::: {.callout-note title="Definition"} **Definition 17** (Valid $p$-value). A statistic $p(X)$ is a valid $p$-value if, for every $\theta\in\Theta_0$ and every $0\le \alpha\le 1$, $$\mathbb{P}_\theta(p(X)\le \alpha)\le \alpha.$$ ::: ::: {.callout-important title="Remark"} *Remark 18*. A $p$-value is not the probability that $H_0$ is true. It is a probability computed under the assumption that $H_0$ is true. ::: Equivalently, the $p$-value can be understood as the smallest significance level at which the observed test statistic would lead to rejection: $$p(x)=\inf\{\alpha:\text{ reject }H_0\text{ at level }\alpha\}.$$ ## General construction of a $p$-value This subsection gives a general formula for a valid $p$-value. ::: {.callout-important title="Theorem"} **Theorem 19** (General $p$-value formula). *Let $W(X)$ be a test statistic such that large values of $W$ provide evidence in favor of $H_1$. Then $$p(x):=\sup_{\theta\in\Theta_0}\mathbb{P}_\theta\left(W(X)\ge W(x)\right)$$ defines a valid $p$-value.* ::: ::: {.callout-note title="Proof"} *Proof.* The quantity $p(x)$ is the largest null probability of observing a test statistic at least as extreme as the observed statistic. Therefore, for any $\theta\in\Theta_0$, the probability that this tail probability is at most $\alpha$ is no larger than $\alpha$. Hence $$\mathbb{P}_\theta(p(X)\le\alpha)\le\alpha,$$ which is the validity condition. ◻ ::: ::: {.callout-tip title="Key idea"} Interpretation A $p$-value answers the question: $$\text{How surprising is my observed signal, if the null hypothesis were true?}$$ The smaller the $p$-value, the stronger the evidence against $H_0$. ::: ## One-sided normal $p$-value This subsection computes a standard $p$-value for a one-sided normal test. ::: {.callout-tip title="Example"} **Example 20** (One-sided normal $p$-value). Let $X_1,\ldots,X_n\sim \operatorname{Normal}(\mu,\sigma^2)$ with $\sigma$ known. Test $$H_0:\mu\le \mu_0 \qquad \text{versus} \qquad H_1:\mu>\mu_0.$$ Use the statistic $$W(X)=Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}.$$ Find the $p$-value. ::: ::: {.callout-note title="Solution"} Large values of $Z$ favor $H_1$. For any $\mu\le\mu_0$, $$\begin{aligned} \mathbb{P}_\mu(W\ge w) &=\mathbb{P}_\mu\left(\frac{\bar X-\mu_0}{\sigma/\sqrt n}\ge w\right)\\ &=\mathbb{P}_\mu\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\ge w+\frac{\mu_0-\mu}{\sigma/\sqrt n}\right). \end{aligned}$$ The right tail probability is largest when $\mu=\mu_0$. Hence the supremum over $\Theta_0=\{\mu:\mu\le\mu_0\}$ is attained at $\mu=\mu_0$. If $$z_{\mathrm{obs}}=\frac{\bar x-\mu_0}{\sigma/\sqrt n},$$ then the valid $p$-value is $$\boxed{p(x)=\mathbb{P}_{\mu_0}(Z\ge z_{\mathrm{obs}})=1-\Phi(z_{\mathrm{obs}}).}$$ ::: ::: {.callout-tip title="Example"} **Example 21** (Numerical $p$-value). Suppose $z_{\mathrm{obs}}=2.1$ in the one-sided normal test above. Compute the $p$-value and state the decision at levels $\alpha=0.05$ and $\alpha=0.01$. ::: ::: {.callout-note title="Solution"} The one-sided $p$-value is $$p=1-\Phi(2.1)\approx 0.0179.$$ At $\alpha=0.05$, since $0.0179<0.05$, reject $H_0$. At $\alpha=0.01$, since $0.0179>0.01$, fail to reject $H_0$. ::: # Loss and Risk Functions for Tests This section treats hypothesis testing as a decision problem with possible costs for different mistakes. ## Decision-theoretic formulation This subsection introduces actions, loss functions, and risk functions for hypothesis testing. In hypothesis testing, there are two possible actions: $$a_0=\text{fail to reject }H_0, \qquad a_1=\text{reject }H_0.$$ A decision rule $\delta(x)$ selects an action based on the observed data $x$. ::: {.callout-note title="Definition"} **Definition 22** (Loss function). A loss function $$L:\Theta\times \mathcal A\to \mathbb{R}$$ measures the penalty for taking action $a\in\mathcal A$ when the true parameter is $\theta$. ::: ::: {.callout-tip title="Example"} **Example 23** (0--1 loss). For testing $H_0:\theta\in\Theta_0$ versus $H_1:\theta\in\Theta_0^c$, the ordinary 0--1 loss is $$L(\theta,a)= \begin{cases} 0, & \theta\in\Theta_0\text{ and }a=a_0,\text{ or }\theta\in\Theta_0^c\text{ and }a=a_1,\\ 1, & \theta\in\Theta_0\text{ and }a=a_1,\text{ or }\theta\in\Theta_0^c\text{ and }a=a_0. \end{cases}$$ ::: ::: {.callout-note title="Solution"} The loss is $0$ for a correct decision and $1$ for an incorrect decision. Thus both Type I and Type II errors are treated as equally costly. ::: ## Generalized 0--1 loss This subsection allows Type I and Type II errors to have different costs. ::: {.callout-note title="Definition"} **Definition 24** (Generalized 0--1 loss). Let $c_I$ be the cost of a Type I error and $c_{II}$ be the cost of a Type II error. Define $$L(\theta,a_0)= \begin{cases} 0, & \theta\in\Theta_0,\\ c_{II}, & \theta\in\Theta_0^c, \end{cases} \qquad L(\theta,a_1)= \begin{cases} c_I, & \theta\in\Theta_0,\\ 0, & \theta\in\Theta_0^c. \end{cases}$$ ::: If $c_I=c_{II}=1$, this reduces to the ordinary 0--1 loss. If $c_I\ne c_{II}$, the test should account for the relative seriousness of the two errors. ## Risk function This subsection defines risk as the expected loss of a test. ::: {.callout-note title="Definition"} **Definition 25** (Risk function). The risk function of a decision rule $\delta$ is $$R(\theta,\delta)=\mathbb{E}_\theta[L(\theta,\delta(X))].$$ ::: Let the power function be $$\beta(\theta)=\mathbb{P}_\theta(\delta(X)=a_1),$$ the probability of rejecting $H_0$ under parameter $\theta$. Under generalized 0--1 loss, $$R(\theta,\delta)= \begin{cases} c_I\beta(\theta), & \theta\in\Theta_0,\\[3pt] c_{II}\{1-\beta(\theta)\}, & \theta\in\Theta_0^c. \end{cases}$$ ::: {.callout-tip title="Key idea"} Risk interpretation For $\theta\in\Theta_0$, the risk is the Type I error probability weighted by the cost $c_I$. For $\theta\in\Theta_0^c$, the risk is the Type II error probability weighted by the cost $c_{II}$. ::: # Risk Function Example This section computes the risk function for a one-sided normal test. ::: {.callout-tip title="Example"} **Example 26** (Normal one-sided test risk). Let $$X_1,\ldots,X_n\stackrel{iid}{\sim} N(\theta,\sigma^2), \qquad \sigma=1, \qquad n=25.$$ Test $$H_0:\theta\le 0 \qquad \text{versus} \qquad H_1:\theta>0.$$ Use the size $\alpha=0.05$ test that rejects $H_0$ when $$\bar X>z_{1-\alpha}\frac{\sigma}{\sqrt n}.$$ Take $c_I=2$ and $c_{II}=1$. Compute the power function and risk function. ::: ::: {.callout-note title="Solution"} Here $$z_{1-\alpha}=z_{0.95}\approx 1.645, \qquad \frac{\sigma}{\sqrt n}=\frac15.$$ So the rejection rule is $$\bar X>1.645\cdot \frac15=0.329.$$ For a general $\theta$, $$\bar X\sim N\left(\theta,\frac1{25}\right).$$ Thus the power function is $$\begin{aligned} \beta(\theta) &=\mathbb{P}_\theta(\bar X>0.329)\\ &=\mathbb{P}\left(\frac{\bar X-\theta}{1/5}>\frac{0.329-\theta}{1/5}\right)\\ &=1-\Phi(1.645-5\theta). \end{aligned}$$ Under the generalized 0--1 loss with $c_I=2$ and $c_{II}=1$, $$R(\theta)= \begin{cases} 2\beta(\theta), & \theta\le 0,\\[3pt] 1-\beta(\theta), & \theta>0. \end{cases}$$ ::: ::: {.callout-tip title="Example"} **Example 27** (Numerical risk values). Using the previous example, compute the risk at $\theta=0,-0.2,0.2,0.5$. ::: ::: {.callout-note title="Solution"} The power function is $$\beta(\theta)=1-\Phi(1.645-5\theta).$$ At the boundary $\theta=0$, $$\beta(0)=0.05, \qquad R(0)=2(0.05)=0.10.$$ At $\theta=-0.2$, $$\beta(-0.2)=1-\Phi(1.645+1)=1-\Phi(2.645)\approx 0.0041,$$ so $$R(-0.2)=2(0.0041)=0.0082.$$ At $\theta=0.2$, $$\beta(0.2)=1-\Phi(1.645-1)=1-\Phi(0.645)\approx 0.260.$$ Since $\theta>0$, $$R(0.2)=1-\beta(0.2)\approx 0.740.$$ At $\theta=0.5$, $$\beta(0.5)=1-\Phi(1.645-2.5)=1-\Phi(-0.855)=\Phi(0.855)\approx 0.803.$$ Thus $$R(0.5)=1-0.803=0.197.$$ ::: ::: {.callout-warning title="Important lesson"} Important lesson The risk can be small under parts of the null and large near the alternative boundary. The shape of the risk function depends on both the power function and the relative costs $c_I$ and $c_{II}$. ::: # Practice Problems This section gives practice problems that reinforce the main definitions and calculations from the section. ::: {.callout-caution title="Practice Problem"} **Practice Problem 28** (Power function for a binomial test). Let $X\sim\operatorname{Binomial}(6,\theta)$ and test $$H_0:\theta\le \frac12 \qquad \text{versus} \qquad H_1:\theta>\frac12.$$ Consider the test that rejects $H_0$ if $X\ge5$. 1. Find the power function. 2. Find the size of the test. 3. Find the Type II error probability when $\theta=0.8$. ::: ::: {.callout-note title="Solution"} $a$ The rejection region is $R=\{5,6\}$, so $$\beta(\theta)=\mathbb{P}_\theta(X\ge5) =\binom65\theta^5(1-\theta)+\binom66\theta^6 =6\theta^5(1-\theta)+\theta^6.$$ (b) Since $\beta(\theta)$ is increasing in $\theta$, the size is attained at $\theta=1/2$: $$\alpha=\beta(1/2)=6\left(\frac12\right)^5\left(\frac12\right)+\left(\frac12\right)^6 =\frac{6+1}{64}=\frac7{64}.$$ (c) At $\theta=0.8$, $$\beta(0.8)=6(0.8)^5(0.2)+(0.8)^6.$$ The Type II error probability is $$1-\beta(0.8)=1-\{6(0.8)^5(0.2)+(0.8)^6\}.$$ Numerically, this is approximately $0.3446$. ::: ::: {.callout-caution title="Practice Problem"} **Practice Problem 29** (Normal sample size). Let $X_1,\ldots,X_n\sim N(\theta,\sigma^2)$ with known $\sigma^2$. Test $$H_0:\theta\le\theta_0 \qquad \text{versus} \qquad H_1:\theta>\theta_0$$ using a level $0.05$ test. Find a formula for the smallest $n$ such that the power is at least $0.90$ when $\theta=\theta_0+\Delta$, where $\Delta>0$. ::: ::: {.callout-note title="Solution"} The level $0.05$ test rejects when $$\frac{\bar X-\theta_0}{\sigma/\sqrt n}>z_{0.05}.$$ At $\theta=\theta_0+\Delta$, the power is $$\beta(\theta_0+\Delta) =1-\Phi\left(z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\right).$$ We require $$1-\Phi\left(z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\right)\ge 0.90.$$ Equivalently, $$\Phi\left(z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\right)\le0.10.$$ Thus $$z_{0.05}-\frac{\Delta\sqrt n}{\sigma}\le z_{0.90}^{\text{right tail}},$$ where $\mathbb{P}(Z>z_{0.90}^{\text{right tail}})=0.90$, so $z_{0.90}^{\text{right tail}}\approx -1.2816$. Hence $$\sqrt n\ge \frac{\sigma}{\Delta}(z_{0.05}+1.2816).$$ Therefore $$\boxed{n\ge \left(\frac{\sigma}{\Delta}(z_{0.05}+1.2816)\right)^2.}$$ The smallest integer $n$ is the ceiling of the right-hand side. ::: ::: {.callout-caution title="Practice Problem"} **Practice Problem 30** (One-sided normal p-value). Let $X_1,\ldots,X_{25}\sim N(\mu,4)$ and test $$H_0:\mu\le 10 \qquad \text{versus} \qquad H_1:\mu>10.$$ Suppose $\bar x=10.9$. Compute the $p$-value and state the decision at $\alpha=0.05$. ::: ::: {.callout-note title="Solution"} Here $\sigma=2$ and $n=25$, so $\sigma/\sqrt n=2/5=0.4$. The observed test statistic is $$z_{\mathrm{obs}}=\frac{10.9-10}{0.4}=2.25.$$ The one-sided $p$-value is $$p=1-\Phi(2.25)\approx 0.0122.$$ Since $0.0122<0.05$, reject $H_0$ at the $5\%$ level. ::: ::: {.callout-caution title="Practice Problem"} **Practice Problem 31** (Risk function). For the one-sided normal test in the previous problem, use generalized 0--1 loss with $c_I=3$ and $c_{II}=1$. Write the risk function in terms of the power function. ::: ::: {.callout-note title="Solution"} The decision is to reject or fail to reject $H_0:\mu\le10$. The risk function is $$R(\mu,\delta)= \begin{cases} 3\beta(\mu), & \mu\le 10,\\[3pt] 1-\beta(\mu), & \mu>10. \end{cases}$$ where $$\beta(\mu)=\mathbb{P}_\mu(\text{reject }H_0).$$ For a level-$0.05$ test, $$\beta(\mu)=1-\Phi\left(z_{0.05}+\frac{10-\mu}{2/5}\right).$$ ::: # Summary This section summarizes the main concepts used to evaluate hypothesis tests. **Concept** **Definition** **Meaning** ---------------- -------------------------------------------------------------------------------------- -------------------------------------------------------------- Power function $\beta(\theta)=\mathbb{P}_\theta(X\in R)$ Probability of rejecting $H_0$ at parameter value $\theta$ Type I error $\mathbb{P}_\theta(X\in R)$ for $\theta\in\Theta_0$ Rejecting $H_0$ when $H_0$ is true Type II error $1-\beta(\theta)$ for $\theta\in\Theta_0^c$ Failing to reject $H_0$ when $H_1$ is true Size $\sup_{\theta\in\Theta_0}\beta(\theta)$ Worst-case Type I error probability Level $\alpha$ $\sup_{\theta\in\Theta_0}\beta(\theta)\le\alpha$ Type I error controlled by $\alpha$ Unbiased test $\beta(\theta')\ge\beta(\theta'')$ for $\theta'\in\Theta_0^c$, $\theta''\in\Theta_0$ Rejects more often under the alternative than under the null UMP test Maximizes power for all $\theta\in\Theta_0^c$ within a class Strong optimality criterion $p$-value Smallest level at which $H_0$ is rejected Measures evidence against $H_0$ Risk function $R(\theta,\delta)=\mathbb{E}_\theta[L(\theta,\delta(X))]$ Expected loss of a test ::: {.callout-tip title="Key idea"} Key message A hypothesis test is not evaluated by whether it is correct on one dataset. It is evaluated by its long-run error probabilities, its power function, and, when costs matter, its risk function. :::

Concept	Definition	Meaning
Power function	\(\beta(\theta)=\mathbb{P}_\theta(X\in R)\)	Probability of rejecting \(H_0\) at parameter value \(\theta\)
Type I error	\(\mathbb{P}_\theta(X\in R)\) for \(\theta\in\Theta_0\)	Rejecting \(H_0\) when \(H_0\) is true
Type II error	\(1-\beta(\theta)\) for \(\theta\in\Theta_0^c\)	Failing to reject \(H_0\) when \(H_1\) is true
Size	\(\sup_{\theta\in\Theta_0}\beta(\theta)\)	Worst-case Type I error probability
Level \(\alpha\)	\(\sup_{\theta\in\Theta_0}\beta(\theta)\le\alpha\)	Type I error controlled by \(\alpha\)
Unbiased test	\(\beta(\theta')\ge\beta(\theta'')\) for \(\theta'\in\Theta_0^c\), \(\theta''\in\Theta_0\)	Rejects more often under the alternative than under the null
UMP test	Maximizes power for all \(\theta\in\Theta_0^c\) within a class	Strong optimality criterion
\(p\)-value	Smallest level at which \(H_0\) is rejected	Measures evidence against \(H_0\)
Risk function	\(R(\theta,\delta)=\mathbb{E}_\theta[L(\theta,\delta(X))]\)	Expected loss of a test

16 Chapter 15: Hypothesis Tests II — Evaluating Tests

17 Why We Need to Evaluate Tests

18 Two Types of Errors

18.1 Type I and Type II errors

18.2 Power function

19 Power Function Examples

19.1 Binomial power functions

19.2 Normal power function

19.3 Choosing sample size from power requirements

20 Size and Level of a Test

20.1 Size and level

20.2 Size of likelihood ratio tests

21 Unbiased Tests

22 Uniformly Most Powerful Tests

22.1 Most powerful and UMP tests

22.2 Neyman–Pearson lemma

23 UMP Examples

23.1 UMP binomial test

23.2 UMP normal test

24 \(p\)-Values

24.1 Definition and interpretation

24.2 General construction of a \(p\)-value

24.3 One-sided normal \(p\)-value

25 Loss and Risk Functions for Tests

25.1 Decision-theoretic formulation

25.2 Generalized 0–1 loss

25.3 Risk function

26 Risk Function Example

27 Practice Problems

28 Summary