MATH 5010 Section 20: Bayesian Inference

1. The Bayesian update

Prior (what we believe before data) × Likelihood (what the data say) ∝ Posterior (updated belief after data)

π(θ | x) = L(θ; x)π(θ) / ∫ L(u; x)π(u)du
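As a minimal sketch in Python, the formula can be evaluated on a grid: multiply L(θ; x) by π(θ) pointwise, then divide by a midpoint-rule approximation of the normalizing integral. The data (x = 7 successes in n = 10 trials) and the Beta(2, 2) prior are hypothetical, chosen so the result can be checked against the conjugate answer Beta(9, 5).

```python
import math

def grid_posterior(likelihood, prior, grid, step):
    """Approximate pi(theta | x) at each grid point.

    Numerator: L(theta; x) * pi(theta).
    Denominator: midpoint-rule approximation of the integral of L(u; x) pi(u) du.
    """
    unnorm = [likelihood(t) * prior(t) for t in grid]
    evidence = sum(unnorm) * step
    return [u / evidence for u in unnorm]

# Hypothetical data: x = 7 successes in n = 10 trials, Beta(2, 2) prior on theta.
n, x = 10, 7
likelihood = lambda t: math.comb(n, x) * t**x * (1 - t) ** (n - x)
prior = lambda t: 6 * t * (1 - t)                 # Beta(2, 2) density on (0, 1)

m = 2000
step = 1 / m
grid = [(i + 0.5) * step for i in range(m)]       # midpoints of (0, 1)
post = grid_posterior(likelihood, prior, grid, step)
post_mean = sum(t * p for t, p in zip(grid, post)) * step
print(post_mean)   # should be close to 9/14, the mean of Beta(9, 5)
```

The grid approach works for any one-dimensional prior and likelihood; it becomes impractical as the number of parameters grows, which is one reason simulation methods such as MCMC are used.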

Classical inference often treats the parameter as fixed and the data as random. Bayesian inference treats the parameter as unknown and represents uncertainty about it by a probability distribution.

Why conjugate priors?

A conjugate prior produces a posterior in the same family as the prior. This makes the update easy to compute and excellent for teaching: prior parameters plus data summaries become posterior parameters.

Learning goals

Translate prior + likelihood into posterior.
Compute Bayes estimators under squared error loss.
Interpret credible intervals.
Compare Bayesian tests with p-value tests.
Use posterior predictive simulation.
Understand why MCMC is needed when posteriors are not closed form.

Conjugate-pair summary

Data model	Prior	Posterior	Sufficient statistic	Bayes estimate under squared error
Binomial(n, θ)	Beta(a, b)	Beta(a+x, b+n−x)	x successes	E[θ|x] = (a+x)/(a+b+n)
Poisson(λ)	Gamma(a, b), rate b	Gamma(a+Σxᵢ, b+n)	Σxᵢ	E[λ|x] = (a+Σxᵢ)/(b+n)
Exponential(mean μ)	Inv-Gamma(a, b), scale b	Inv-Gamma(a+n, b+Σxᵢ)	Σxᵢ	E[μ|x] = (b+Σxᵢ)/(a+n−1), if a+n>1
Normal mean μ, known σ	Normal(m₀, s₀²)	Normal(mₙ, sₙ²)	x̄	posterior mean mₙ

2. Beta–Binomial update

Use this for conversion rates, coin bias, success probability, or any binary outcome.

Calculator inputs: prior a, prior b, trials n, successes x. Outputs: posterior distribution, posterior mean, MAP estimate, approximate 95% interval.

θ | x ~ Beta(a+x, b+n−x)
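A minimal sketch of what this calculator computes. The "Approx. 95%" output is assumed here to be a normal approximation (posterior mean ± 1.96 posterior sd, clipped to [0, 1]); the page does not say which method it uses, and exact equal-tailed limits would come from Beta quantiles. The inputs in the usage line are hypothetical.

```python
import math

def beta_binomial_update(a, b, n, x):
    """Posterior parameters for theta | x ~ Beta(a + x, b + n - x)."""
    if not 0 <= x <= n:
        raise ValueError("successes x must satisfy 0 <= x <= n")
    return a + x, b + n - x

def beta_summary(A, B, z=1.96):
    """Posterior mean, MAP, and an approximate 95% interval for Beta(A, B)."""
    mean = A / (A + B)
    # The mode lies strictly inside (0, 1) only when A > 1 and B > 1.
    map_est = (A - 1) / (A + B - 2) if A > 1 and B > 1 else None
    sd = math.sqrt(A * B / ((A + B) ** 2 * (A + B + 1)))
    # Assumption: normal approximation mean +/- z*sd, clipped to [0, 1];
    # exact equal-tailed limits would use Beta quantiles instead.
    interval = (max(0.0, mean - z * sd), min(1.0, mean + z * sd))
    return mean, map_est, interval

# Hypothetical inputs: uniform Beta(1, 1) prior, 14 successes in 20 trials.
A, B = beta_binomial_update(a=1, b=1, n=20, x=14)
mean, map_est, interval = beta_summary(A, B)
print(f"Beta({A}, {B}): mean={mean:.3f}, MAP={map_est:.3f}, approx 95%={interval}")
```

With a uniform prior the MAP equals the sample proportion x/n (here 14/20), while the posterior mean (a+x)/(a+b+n) is pulled slightly toward 1/2.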

3. Gamma–Poisson update

This model underlies the Poisson Bayesian estimation and testing examples: if counts are Poisson with mean λ and λ has a Gamma prior, the posterior is again Gamma.
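Why the update has this form: multiply the Poisson likelihood by the Gamma(a, b) prior density (rate b) and drop every factor that does not involve λ.

```latex
\pi(\lambda \mid x) \;\propto\; \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \cdot \lambda^{a-1} e^{-b\lambda}
\;\propto\; \lambda^{a+S-1}\, e^{-(b+n)\lambda}, \qquad S = \sum_{i=1}^{n} x_i .
```

This is the kernel of Gamma(a+S, b+n), so the data enter the posterior only through n and S.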

Calculator inputs: prior shape a, prior rate b, sample size n, total count S = Σxᵢ, test threshold λ₀. Outputs: posterior distribution, posterior mean, P(λ ≤ λ₀ | x), and the test decision.

λ | x ~ Gamma(a+S, b+n), rate parameterization
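A sketch of the same calculation using only the Python standard library. P(λ ≤ λ₀ | x) is the Gamma(a+S, b+n) CDF, computed here through the regularized lower incomplete gamma function (series for small arguments, continued fraction otherwise, as in standard numerical-recipes treatments). The decision applies the rule of Section 7 with α_B = 0.05; the function names are illustrative, not from the page.

```python
import math

def gamma_cdf(x, shape, rate):
    """P(Y <= x) for Y ~ Gamma(shape, rate), via the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    t = rate * x
    log_pref = -t + shape * math.log(t) - math.lgamma(shape)
    if t < shape + 1:
        # Series: P(s, t) = e^{-t} t^s / Gamma(s) * sum_k t^k / (s (s+1) ... (s+k)).
        term = total = 1.0 / shape
        ap = shape
        for _ in range(10_000):
            ap += 1
            term *= t / ap
            total += term
            if abs(term) < abs(total) * 3e-14:
                break
        return total * math.exp(log_pref)
    # Continued fraction (modified Lentz) for the upper tail Q(s, t); return 1 - Q.
    tiny = 1e-300
    b = t + 1 - shape
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - shape)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 3e-14:
            break
    return 1.0 - h * math.exp(log_pref)

def gamma_poisson_test(a, b, n, S, lam0, alpha_B=0.05):
    """Posterior Gamma(a+S, b+n) (rate form) and the one-sided Bayesian test of lambda <= lam0."""
    A, B = a + S, b + n
    p_null = gamma_cdf(lam0, A, B)                  # P(lambda <= lam0 | x)
    decision = "reject H0" if p_null < alpha_B else "do not reject H0"
    return A, B, A / B, p_null, decision

# Inputs from the Poisson mean example in Section 7.
A, B, post_mean, p_null, decision = gamma_poisson_test(a=2, b=1, n=20, S=130, lam0=5)
print(A, B, round(post_mean, 3), p_null, decision)
```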

4. Inverse-Gamma prior for an Exponential mean

If Xᵢ ~ Exp(μ) with density f(x|μ)=μ⁻¹e^{-x/μ}, then an inverse-gamma prior for μ is conjugate.
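The conjugacy check mirrors the Poisson case. With the Inv-Gamma(a, b) prior density proportional to μ^{−a−1}e^{−b/μ} (scale b):

```latex
\pi(\mu \mid x) \;\propto\; \prod_{i=1}^{n} \mu^{-1} e^{-x_i/\mu} \cdot \mu^{-a-1} e^{-b/\mu}
\;=\; \mu^{-(a+n)-1} \exp\!\left(-\frac{b+S}{\mu}\right), \qquad S = \sum_{i=1}^{n} x_i ,
```

which is the kernel of Inv-Gamma(a+n, b+S).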

Calculator inputs: prior shape a, prior scale b, sample size n, total S = Σxᵢ, test threshold μ₀. Outputs: posterior distribution, posterior mean, P(μ ≤ μ₀ | x), and the test decision.

μ | x ~ Inv-Gamma(a+n, b+S)
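A sketch of this calculator that swaps in Monte Carlo for the exact CDF, to keep it short. It relies on the fact that μ ~ Inv-Gamma(A, B) exactly when 1/μ ~ Gamma(shape A, rate B); Python's `random.gammavariate(alpha, beta)` takes a scale, so the rate B is passed as 1/B. The draw count and seed are arbitrary choices, and the estimate carries Monte Carlo error.

```python
import random

def inv_gamma_test(a, b, n, S, mu0, alpha_B=0.05, draws=200_000, seed=0):
    """Posterior Inv-Gamma(a+n, b+S) and a Monte Carlo estimate of P(mu <= mu0 | x)."""
    A, B = a + n, b + S
    post_mean = B / (A - 1) if A > 1 else float("inf")   # finite only when A > 1
    rng = random.Random(seed)
    # If 1/mu ~ Gamma(shape A, rate B), then mu ~ Inv-Gamma(A, B); gammavariate wants scale 1/B.
    mus = [1.0 / rng.gammavariate(A, 1.0 / B) for _ in range(draws)]
    p_null = sum(m <= mu0 for m in mus) / draws          # Monte Carlo P(mu <= mu0 | x)
    decision = "reject H0" if p_null < alpha_B else "do not reject H0"
    return A, B, post_mean, p_null, decision

# Inputs from the exponential mean example in Section 7.
A, B, post_mean, p_null, decision = inv_gamma_test(a=3, b=10, n=15, S=160, mu0=8)
print(A, B, post_mean, round(p_null, 3), decision)
```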

5. Normal–Normal update for a mean

The sampling standard deviation σ is assumed known. The posterior mean is a precision-weighted average of the prior mean and the sample mean.

Calculator inputs: prior mean m₀, prior sd s₀, known σ, sample size n, sample mean x̄. Outputs: posterior mean, posterior sd, data weight, approximate 95% interval.

sₙ² = 1 / (1/s₀² + n/σ²), mₙ = sₙ²(m₀/s₀² + nx̄/σ²)
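A minimal sketch of these formulas. The "data weight" is taken here to be w = (n/σ²)/(1/s₀² + n/σ²), the share of posterior precision contributed by the data, so that mₙ = (1 − w)m₀ + w·x̄; that reading of the calculator's output is an assumption. The 95% interval uses mₙ ± 1.96 sₙ, and the inputs are hypothetical.

```python
import math

def normal_normal_update(m0, s0, sigma, n, xbar, z=1.96):
    """Posterior N(m_n, s_n^2) for a normal mean with known sampling sd sigma."""
    prior_prec = 1 / s0**2                                        # precision of the prior
    data_prec = n / sigma**2                                      # precision of xbar
    post_var = 1 / (prior_prec + data_prec)                       # s_n^2
    post_mean = post_var * (m0 * prior_prec + xbar * data_prec)   # m_n
    data_weight = data_prec / (prior_prec + data_prec)            # assumed meaning of "data weight"
    post_sd = math.sqrt(post_var)
    interval = (post_mean - z * post_sd, post_mean + z * post_sd)
    return post_mean, post_sd, data_weight, interval

# Hypothetical inputs chosen so the prior and the data carry equal precision.
post_mean, post_sd, w, interval = normal_normal_update(m0=0.0, s0=1.0, sigma=2.0, n=4, xbar=3.0)
print(post_mean, post_sd, w, interval)   # m_n = 1.5, w = 0.5
```

As n grows, w approaches 1 and the posterior mean approaches x̄; a very diffuse prior (large s₀) has the same effect.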

6. Credible interval interpretation

A 95% Bayesian credible interval means: after observing the data and using the prior, the posterior probability that θ lies in the interval is 0.95.

Classical confidence interval: the random procedure has long-run 95% coverage.
Bayesian credible interval: the posterior distribution assigns 95% probability to this interval.

Normal posterior interval calculator

Calculator inputs: posterior mean, posterior sd, credibility level. Outputs: lower endpoint, upper endpoint, width, and meaning (the posterior probability assigned to the interval).
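For a normal posterior the equal-tailed credible interval comes straight from posterior quantiles. A sketch with the standard-library `statistics.NormalDist`; the posterior mean 1.5 and sd √0.5 are hypothetical inputs.

```python
from statistics import NormalDist

def normal_credible_interval(post_mean, post_sd, level=0.95):
    """Equal-tailed interval: cut (1 - level)/2 posterior probability from each tail."""
    post = NormalDist(post_mean, post_sd)
    tail = (1 - level) / 2
    lower, upper = post.inv_cdf(tail), post.inv_cdf(1 - tail)
    return lower, upper, upper - lower

lower, upper, width = normal_credible_interval(1.5, 0.5 ** 0.5, 0.95)
# The "meaning" output: the posterior probability between the endpoints equals the level.
post = NormalDist(1.5, 0.5 ** 0.5)
print(lower, upper, width, post.cdf(upper) - post.cdf(lower))
```

The printed probability is a statement about θ given the observed data, which is exactly the interpretation above; no long-run coverage argument is involved.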

Loss functions and Bayes estimators

Loss	Bayes estimator	Intuition
Squared error	posterior mean	balances signed deviations: E[θ − d | x] = 0
Absolute error	posterior median	puts posterior probability 1/2 on each side
0–1 loss	posterior mode/MAP	most probable parameter value (for continuous θ, the limit as the 0–1 tolerance shrinks)

Teaching connection

Earlier in the course, the mean minimized expected squared error and the median minimized expected absolute error. Bayesian point estimation applies the same principle, but expectation is taken with respect to the posterior distribution.
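A numerical check of this principle, as a sketch: represent a hypothetical Beta(15, 7) posterior by random draws, estimate the posterior expected loss of each candidate estimate on a grid, and compare the minimizers with the sample mean and median. The grid, draw count, and seed are arbitrary choices.

```python
import random
import statistics

# Hypothetical posterior Beta(15, 7), represented by Monte Carlo draws.
rng = random.Random(1)
draws = [rng.betavariate(15, 7) for _ in range(5_000)]

def expected_loss(d, loss):
    """Monte Carlo estimate of the posterior expected loss of estimate d."""
    return statistics.fmean(loss(d, t) for t in draws)

squared = lambda d, t: (d - t) ** 2
absolute = lambda d, t: abs(d - t)

# Candidate estimates between 0.50 and 0.85 in steps of 0.005.
candidates = [0.5 + 0.005 * i for i in range(71)]
best_sq = min(candidates, key=lambda d: expected_loss(d, squared))
best_abs = min(candidates, key=lambda d: expected_loss(d, absolute))

print(best_sq, statistics.fmean(draws))    # squared-loss minimizer vs. posterior mean
print(best_abs, statistics.median(draws))  # absolute-loss minimizer vs. posterior median
print((15 - 1) / (15 + 7 - 2))             # MAP for 0-1 loss: Beta mode (A-1)/(A+B-2) = 0.7
```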

7. Bayesian hypothesis testing

For one-sided tests such as H₀: θ ≤ θ₀ versus H₁: θ > θ₀, a direct Bayesian rule is based on posterior probability.

Reject H₀ if P(θ ≤ θ₀ | data) < α_B

Poisson mean example

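Using the Gamma–Poisson calculator above with n=20, a=2, b=1, S=130, λ₀=5 gives the posterior Gamma(132, 21). The decision rule compares P(λ≤5|data) to 0.05. The posterior mean is 132/21 ≈ 6.29 with posterior sd √132/21 ≈ 0.55, so λ₀ = 5 lies more than two posterior sds below the mean; P(λ≤5|data) is well below 0.05 and H₀ is rejected.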

Exponential mean example

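Using the inverse-gamma calculator above with n=15, a=3, b=10, S=160, μ₀=8 gives the posterior Inv-Gamma(18, 170). The decision rule compares P(μ≤8|data) to 0.05. The posterior mean is 170/17 = 10, and P(μ≤8|data) is roughly 0.2, well above 0.05, so H₀ is not rejected.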

8. Posterior predictive distribution

After learning about θ, Bayesian inference can predict a future observation by averaging over posterior uncertainty.

p(x_new | data) = ∫ p(x_new | θ)π(θ | data)dθ

Beta–Binomial predictive probability

For m future binary trials, given the posterior Beta(A, B), the number of future successes follows a Beta-Binomial(m, A, B) predictive distribution, whose mean is mA/(A+B).
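A sketch of the predictive calculation. Integrating the Binomial(m, θ) probability against Beta(A, B) gives P(X = k) = C(m, k)·B(k+A, m−k+B)/B(A, B), where B(·,·) is the beta function; the code evaluates it in log space with `math.lgamma`. This computes P(X ≥ r) exactly from that pmf, whereas the page's calculator labels its value "approx"; the posterior Beta(15, 7) and the values of m and r are hypothetical.

```python
import math

def log_beta(p, q):
    """Log of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def beta_binomial_pmf(k, m, A, B):
    """Posterior predictive P(X = k) for m future trials when theta ~ Beta(A, B)."""
    return math.comb(m, k) * math.exp(log_beta(k + A, m - k + B) - log_beta(A, B))

def predictive_summary(A, B, m, r):
    """Predictive mean m*A/(A+B) and P(X >= r) for the future success count."""
    mean = m * A / (A + B)
    p_at_least_r = sum(beta_binomial_pmf(k, m, A, B) for k in range(r, m + 1))
    return mean, p_at_least_r

# Hypothetical posterior Beta(15, 7); 10 future trials, target of at least 8 successes.
pred_mean, p_ge_r = predictive_summary(A=15, B=7, m=10, r=8)
print(round(pred_mean, 3), round(p_ge_r, 3))
```

Because the predictive averages over posterior uncertainty in θ, it is more spread out than a Binomial(m, A/(A+B)) that plugs in the posterior mean.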

Calculator inputs: future trials m, target successes r. Outputs: predictive mean, approximate P(X ≥ r), and the posterior parameters A and B.

9. Why MCMC?

When the posterior cannot be written in a convenient closed form, we often approximate it by simulation. Markov chain Monte Carlo builds a dependent sequence whose long-run distribution is the posterior.

θ⁽¹⁾, θ⁽²⁾, … approximately sampled from π(θ | data)

Random-walk Metropolis toy sampler

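Target density: a two-component, posterior-like mixture. Changing the proposal scale shows its effect on acceptance and mixing.

Inputs: proposal scale, number of iterations. Outputs: acceptance rate and sample mean. Target idea: the draws form an approximate posterior sample. Warning: check mixing before trusting the output.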
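A minimal random-walk Metropolis sketch. The page does not give its target, so an equal-weight mixture of N(−2, 1) and N(2, 1) is used as a hypothetical stand-in; only the target's log density up to a constant is needed. Proposals are symmetric Gaussian steps, so the acceptance probability reduces to min(1, target(proposal)/target(current)).

```python
import math
import random

def log_target(theta):
    """Hypothetical target: 0.5*N(-2, 1) + 0.5*N(2, 1), log density up to a constant."""
    a = -0.5 * (theta + 2) ** 2
    b = -0.5 * (theta - 2) ** 2
    hi = max(a, b)                      # log-sum-exp guards against underflow far from 0
    return hi + math.log(0.5 * math.exp(a - hi) + 0.5 * math.exp(b - hi))

def metropolis(n_iter, scale, theta0=0.0, seed=0):
    """Random-walk Metropolis; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    theta, lp = theta0, log_target(theta0)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, scale)      # symmetric random-walk step
        lp_prop = log_target(proposal)
        u = 1.0 - rng.random()                        # uniform on (0, 1], so log(u) is defined
        if math.log(u) < lp_prop - lp:                # accept w.p. min(1, ratio)
            theta, lp = proposal, lp_prop
            accepted += 1
        chain.append(theta)                           # a rejection repeats the current value
    return chain, accepted / n_iter

chain, acc = metropolis(20_000, scale=2.5)
burned = chain[2_000:]                                # discard an initial stretch as burn-in
print(round(acc, 3), round(sum(burned) / len(burned), 3))
```

A very small scale gives high acceptance but tiny steps, so the chain rarely crosses between the two components; a very large scale rejects most proposals. Both show up as poor mixing even though the acceptance rates differ sharply.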

10. Quick self-checks

Q1. For Poisson data with a Gamma(a, b) prior, what statistic is sufficient for updating λ?

Q2. Under squared error loss, the Bayes estimator is the posterior...

Q3. A 95% credible interval means...