MATH 5010 Section 1
Foundations of Probability Theory

An interactive teaching page for the first probability section: sample spaces, events, probability axioms, event algebra, conditional probability, independence, total probability, Bayes' theorem, birthday paradox, and geometric probability.

Graduate statistics foundation · Interactive sliders · Self-check questions · Python-lab ready ideas

Learning goals

By the end of this section, students should be able to translate ordinary language into events, compute probabilities using the axioms, recognize conditional probability and independence, and apply total probability and Bayes' theorem to real examples.

Concept

Events as sets

Use union, intersection, complement, and difference to describe probability statements.

Computation

Rules

Apply $P(A^c)=1-P(A)$ and $P(A\cup B)=P(A)+P(B)-P(A\cap B)$.

Inference idea

Updating

Use Bayes' theorem to update a prior probability after observing evidence.

1. Sample space and events

A random experiment has a sample space $\Omega$, the set of all possible outcomes. An event is a subset of $\Omega$.

Example: two coin tosses

$$\Omega=\{HH,HT,TH,TT\}.$$

Let $A=$ “first toss is Head”, $B=$ “second toss is Head”, and $C=$ “the two tosses are the same”. Then

$$A=\{HH,HT\},\quad B=\{HH,TH\},\quad C=\{HH,TT\}.$$

Language to set notation

Phrase | Event notation
not $A$ | $A^c$
$A$ and $B$ | $A\cap B$
$A$ or $B$ or both | $A\cup B$
$A$ but not $B$ | $A\cap B^c$
exactly one of $A,B$ | $(A\cap B^c)\cup(A^c\cap B)$
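Python's built-in `set` type supports the same operations as the table, so the two-coin events can be checked directly. This is a minimal sketch; the variable names are illustrative, not taken from the page.

```python
# Two-coin sample space and the events A, B, C defined above.
omega = {"HH", "HT", "TH", "TT"}
A = {w for w in omega if w[0] == "H"}   # first toss is Head
B = {w for w in omega if w[1] == "H"}   # second toss is Head
C = {w for w in omega if w[0] == w[1]}  # the two tosses are the same

not_A = omega - A                 # A^c (complement relative to omega)
A_and_B = A & B                   # A ∩ B
A_or_B = A | B                    # A ∪ B
A_not_B = A - B                   # A ∩ B^c
exactly_one = (A - B) | (B - A)   # (A ∩ B^c) ∪ (A^c ∩ B), same as A ^ B

print(sorted(not_A))        # ['TH', 'TT']
print(sorted(A_and_B))      # ['HH']
print(sorted(A_or_B))       # ['HH', 'HT', 'TH']
print(sorted(A_not_B))      # ['HT']
print(sorted(exactly_one))  # ['HT', 'TH']
```

Note that the complement must be taken relative to `omega`; Python sets have no built-in notion of a universe.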

2. Probability axioms

Kolmogorov axioms. A probability $P$ assigns a number to each event such that $P(A)\ge 0$ for every event $A$, $P(\Omega)=1$, and, for pairwise disjoint events $A_1,A_2,\ldots$ (countable additivity), $$P\left(\bigcup_i A_i\right)=\sum_i P(A_i).$$

Immediate consequences

  • $P(\emptyset)=0$.
  • $P(A^c)=1-P(A)$.
  • If $A\subseteq B$, then $P(A)\le P(B)$.
  • $P(A\cup B)=P(A)+P(B)-P(A\cap B)$.
Core warning: do not compute $P(A\cup B)$ as $P(A)+P(B)$ unless $A$ and $B$ are disjoint; otherwise the outcomes in $A\cap B$ are counted twice, so $P(A\cap B)$ must be subtracted once.
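The double-counting problem is easy to see numerically on the two-coin example. The sketch below assumes fair coins, so all four outcomes are equally likely (an assumption for illustration; the axioms themselves do not require it), and uses `fractions.Fraction` to keep the arithmetic exact. The helper `prob` is a hypothetical name.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}  # first toss is Head
B = {"HH", "TH"}  # second toss is Head

def prob(event):
    """Equally likely outcomes (assumed): P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

naive = prob(A) + prob(B)                   # counts the outcome HH twice
correct = prob(A) + prob(B) - prob(A & B)   # subtract the overlap once
print(naive, correct, prob(A | B))          # 1 3/4 3/4
print(prob(omega - A) == 1 - prob(A))       # True (complement rule)
```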

3. Interactive event algebra lab

Move the sliders. The page checks whether the chosen values are consistent, which requires $\max(0,P(A)+P(B)-1)\le P(A\cap B)\le\min(P(A),P(B))$, and computes common event probabilities.

Event | Probability
$A\cup B$ | $P(A)+P(B)-P(A\cap B)$
exactly one of $A,B$ | $P(A)+P(B)-2P(A\cap B)$
at most one of $A,B$ | $1-P(A\cap B)$
neither $A$ nor $B$ | $1-P(A\cup B)$
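The sliders are not reproduced here, but the computation behind the lab can be sketched in Python. This is a minimal sketch, not the page's own code: the function name `event_algebra` and its argument names are hypothetical, and it assumes the three inputs are $P(A)$, $P(B)$, and $P(A\cap B)$. It accepts floats or `Fraction`s.

```python
from fractions import Fraction

def event_algebra(p_a, p_b, p_ab):
    """Hypothetical helper: event probabilities from P(A), P(B), P(A ∩ B).

    Raises ValueError if the three numbers cannot come from one probability
    measure: P(A ∩ B) must lie between max(0, P(A)+P(B)-1) and min(P(A), P(B)).
    """
    for p in (p_a, p_b, p_ab):
        if not 0 <= p <= 1:
            raise ValueError("probabilities must lie in [0, 1]")
    if not max(0, p_a + p_b - 1) <= p_ab <= min(p_a, p_b):
        raise ValueError("inconsistent values for P(A), P(B), P(A ∩ B)")
    p_union = p_a + p_b - p_ab            # inclusion-exclusion
    return {
        "A or B": p_union,
        "exactly one": p_union - p_ab,    # = P(A) + P(B) - 2 P(A ∩ B)
        "at most one": 1 - p_ab,          # complement of "both"
        "neither": 1 - p_union,           # complement of "A or B"
    }

# Expected: 7/10, 1/2, 4/5, 3/10
for name, value in event_algebra(Fraction(1, 2), Fraction(2, 5), Fraction(1, 5)).items():
    print(f"{name:12s} {value}")
```

Using `Fraction` inputs avoids floating-point rounding in the printed values; with floats the same function works but results may show tiny rounding errors.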

4. Conditional probability

When $P(B)>0$, the conditional probability of $A$ given $B$ is $$P(A\mid B)=\frac{P(A\cap B)}{P(B)}.$$

Interpretation

Conditioning restricts the sample space from $\Omega$ to $B$. Within the smaller world where $B$ has happened, we ask what fraction of the probability of $B$ also lies in $A$.

Multiplication rule

Rearranging the definition gives

$$P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A).$$
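The "restrict the world to $B$" idea can be checked by counting on two fair coins (equally likely outcomes are assumed). A classic case: given at least one Head, the chance of two Heads is $1/3$, not $1/2$. The event and helper names below are illustrative.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}      # two fair coin tosses, equally likely (assumed)
both_heads = {"HH"}
at_least_one_head = {"HH", "HT", "TH"}

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability 0")
    return prob(a & b) / prob(b)

p = cond_prob(both_heads, at_least_one_head)
print(p)  # 1/3: of the 3 outcomes in the restricted world, only HH is in A

# Multiplication rule: P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
joint = prob(both_heads & at_least_one_head)
assert joint == p * prob(at_least_one_head)
assert joint == cond_prob(at_least_one_head, both_heads) * prob(both_heads)
```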

5. Independence checker

Events $A$ and $B$ are independent if

$$P(A\cap B)=P(A)P(B).$$

When $P(B)>0$, this is equivalent to $P(A\mid B)=P(A)$: observing $B$ does not change the probability of $A$.

Two-coin example

Let $A=$ “first toss is Head” and $C=$ “the two tosses are the same”. Then $P(A)=1/2$, $P(C)=1/2$, and $P(A\cap C)=P(\{HH\})=1/4$. Since $1/4=(1/2)(1/2)$, the events are independent.
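An independence checker for finite, equally likely sample spaces takes only a few lines. In this sketch (names hypothetical, fair coins assumed), `Fraction` arithmetic lets the product rule be tested with exact equality; with floats a tolerance would be needed. The extra event `D`, "at least one Head", shows a dependent pair: since $A\subseteq D$, $P(A\cap D)=1/2\ne(1/2)(3/4)$.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}
A = {w for w in omega if w[0] == "H"}    # first toss is Head
C = {w for w in omega if w[0] == w[1]}   # the two tosses are the same
D = {w for w in omega if "H" in w}       # at least one Head (extra example)

def prob(event):
    return Fraction(len(event), len(omega))

def independent(e1, e2):
    """Exact check of P(E1 ∩ E2) == P(E1) P(E2) under equally likely outcomes."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

print(independent(A, C))  # True:  1/4 == (1/2)(1/2)
print(independent(A, D))  # False: 1/2 != (1/2)(3/4)
```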

6. Law of total probability

If $B_1,\ldots,B_k$ form a partition of the sample space (pairwise disjoint with union $\Omega$) and each $P(B_i)>0$, then

$$P(A)=\sum_{i=1}^k P(A\mid B_i)P(B_i).$$

Medical test simulator

Quantity | Value (default inputs)
$P(+)$ | $0.1085$
$P(D\mid +)$ | $\approx 0.0876$

Teaching point

A positive result can still have a modest posterior probability when the base rate is low. This is the base-rate effect.

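Here $D$ is the event that the patient has the disease and $+$ is the event of a positive test. For the default values $P(D)=0.01$, $P(+\mid D)=0.95$, and $P(+\mid D^c)=0.10$,

$$P(+)=0.95(0.01)+0.10(0.99)=0.1085,\qquad P(D\mid +)=\frac{0.95(0.01)}{0.1085}\approx 0.0876.$$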
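The simulator's arithmetic can be sketched as a two-cell partition $\{D, D^c\}$ fed into the law of total probability. The function names `total_probability` and `medical_test` and their parameter names are hypothetical, chosen for this illustration; the default inputs are the ones stated above.

```python
def total_probability(likelihoods, priors):
    """P(A) = sum_i P(A | B_i) P(B_i) over a partition B_1, ..., B_k."""
    if abs(sum(priors) - 1.0) > 1e-12:
        raise ValueError("priors of a partition must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

def medical_test(prevalence, sensitivity, false_positive_rate):
    """Return (P(+), P(D | +)) for the partition {D, D^c}."""
    priors = [prevalence, 1.0 - prevalence]            # P(D), P(D^c)
    likelihoods = [sensitivity, false_positive_rate]   # P(+|D), P(+|D^c)
    p_pos = total_probability(likelihoods, priors)
    return p_pos, sensitivity * prevalence / p_pos     # Bayes for P(D|+)

p_pos, post = medical_test(prevalence=0.01, sensitivity=0.95, false_positive_rate=0.10)
print(f"P(+) = {p_pos:.4f}, P(D|+) = {post:.4f}")  # P(+) = 0.1085, P(D|+) = 0.0876

# Base-rate effect: the same test applied at different prevalences.
for prevalence in (0.001, 0.01, 0.1):
    _, posterior = medical_test(prevalence, 0.95, 0.10)
    print(f"prevalence {prevalence}: P(D|+) = {posterior:.4f}")
```

Only the prevalence changes in the loop, which isolates the base-rate effect: the posterior rises with the prior even though the test itself is unchanged.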

7. Bayes' theorem

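Bayes' theorem follows from writing $P(A\cap B_j)$ with the multiplication rule in two ways, $P(B_j\mid A)P(A)=P(A\mid B_j)P(B_j)$, and expanding $P(A)$ by the law of total probability: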

$$P(B_j\mid A)=\frac{P(A\mid B_j)P(B_j)}{\sum_i P(A\mid B_i)P(B_i)}.$$

Factory machine example

Quantity | Value (default inputs)
$P(D)$ | $0.016$
$P(I\mid D)$ | $0.25$

Default calculation

Two machines, I and II, produce items, and $D$ is the event that an item is defective. With $P(I)=0.4$, $P(II)=0.6$, $P(D|I)=0.01$, and $P(D|II)=0.02$,

$$P(I|D)=\frac{0.01\cdot0.4}{0.01\cdot0.4+0.02\cdot0.6}=\frac{0.004}{0.016}=0.25.$$
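The same calculation in exact arithmetic, as a minimal sketch: the hypothetical `bayes` helper computes every posterior $P(B_j\mid A)$ at once, with the denominator obtained by total probability. The dictionary keys "I" and "II" label the two machines.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return (P(A), {j: P(B_j | A)}) for a partition given as dict keys."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}  # P(A | B_j) P(B_j)
    evidence = sum(joint.values())                           # P(A) by total probability
    return evidence, {h: joint[h] / evidence for h in joint}

priors = {"I": Fraction(4, 10), "II": Fraction(6, 10)}        # P(I), P(II)
defect_rate = {"I": Fraction(1, 100), "II": Fraction(2, 100)}  # P(D|I), P(D|II)

p_defect, posterior = bayes(priors, defect_rate)
print(p_defect)          # 2/125  (= 0.016)
print(posterior["I"])    # 1/4
print(posterior["II"])   # 3/4
```

Machine II makes only 60% of the items but, with twice the defect rate, accounts for three quarters of the defectives.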

8. Birthday paradox

Ignoring leap years, the probability that at least two people in a group of size $n$ share a birthday is

$$1-\frac{365\cdot364\cdots(365-n+1)}{365^n}.$$

Probability | Value
all birthdays different | $\prod_{k=0}^{n-1}\frac{365-k}{365}$
at least one shared birthday | $1-\prod_{k=0}^{n-1}\frac{365-k}{365}$

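Try $n=23$ and $n=30$: the probability of a shared birthday is about $0.507$ for $n=23$ (already above one half) and about $0.706$ for $n=30$, which is much faster growth than intuition expects.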
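A short Python sketch of the formula, under the same assumptions as above (365 equally likely birthdays, leap years ignored, people independent). Writing the ratio as a product of factors $(365-k)/365$ keeps every intermediate value between 0 and 1 instead of forming $365^n$. The function names are hypothetical.

```python
import math

def p_all_different(n, days=365):
    """P(no shared birthday) = prod_{k=0}^{n-1} (days - k) / days."""
    if n > days:
        return 0.0  # pigeonhole: some birthday must repeat
    return math.prod((days - k) / days for k in range(n))

def p_shared(n, days=365):
    """P(at least one shared birthday), via the complement rule."""
    return 1.0 - p_all_different(n, days)

for n in (10, 23, 30, 50):
    print(n, round(p_shared(n), 4))

smallest = next(n for n in range(1, 366) if p_shared(n) > 0.5)
print(smallest)  # 23
```

`math.prod` requires Python 3.8 or later.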

9. Geometric probability

For a point $(X,Y)$ chosen uniformly from the unit square, probability equals area. For example,

$$P(X^2+Y\le 1)=\int_0^1(1-x^2)\,dx=\frac{2}{3}.$$

Generalize the curve

Now let the exponent $a>0$ vary and consider the event $Y\le 1-X^a$ (the case $a=2$ is the example above). Its probability is

$$\int_0^1(1-x^a)\,dx=\frac{a}{a+1}.$$
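One way to check $a/(a+1)$ is Monte Carlo simulation, a technique swapped in here for illustration rather than taken from the page: sample points uniformly in the unit square and count the fraction that land under the curve. The function name, sample size, and seed are arbitrary choices for this sketch; with 200,000 points the standard error is roughly $0.001$.

```python
import random

def area_under_curve_mc(a, n_points=200_000, seed=0):
    """Estimate P(Y <= 1 - X**a) for (X, Y) uniform on the unit square (a > 0)."""
    rng = random.Random(seed)          # seeded for reproducible estimates
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if y <= 1.0 - x ** a:          # point lies on or under the curve
            hits += 1
    return hits / n_points             # fraction of hits approximates the area

for a in (0.5, 1, 2, 5):
    estimate = area_under_curve_mc(a)
    exact = a / (a + 1)
    print(f"a={a}: simulated {estimate:.4f}, exact {exact:.4f}")
```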


Summary map

Set operations

$A^c$, $A\cup B$, $A\cap B$, $A\setminus B$ translate words into math.

Conditioning

$P(A|B)=P(A\cap B)/P(B)$ means “restrict the world to $B$.”

Updating

Bayes turns $P(E|H)$ and $P(H)$ into $P(H|E)$.

Formula | When to use
$P(A\cup B)=P(A)+P(B)-P(A\cap B)$ | “A or B” with possible overlap
$P(A^c)=1-P(A)$ | Use complements, especially for “at least one”
$P(A\cap B)=P(A\mid B)P(B)$ | Sequential or conditional information
$P(A)=\sum_iP(A\mid B_i)P(B_i)$ | Break into cases
$P(B_j\mid A)=\frac{P(A\mid B_j)P(B_j)}{\sum_iP(A\mid B_i)P(B_i)}$ | Reverse conditional probabilities

Self-check quiz

These are designed for quick in-class checks. Students should explain why the chosen answer is correct.