Lab 18: Principal Component Analysis

Explore PCA as a way to rotate a data cloud into its own natural coordinate system, keep the most important directions, and compress or denoise data.

1. The PCA story

A dataset is a cloud of points. PCA asks which directions capture the most variation. The first principal component is the direction along which the centered cloud has the largest variance, which is its longest axis. The second is the direction of largest remaining variance, perpendicular to the first.

center, variance, covariance, eigenvectors, SVD, projection
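
The ideas named in this section (center, covariance, eigenvectors, SVD, projection) can be connected in a few lines. The following is a minimal sketch, assuming NumPy and a made-up 2-D cloud; the data, the seed, and all variable names are illustrative assumptions, not part of the lab.

```python
import numpy as np

# Hypothetical 2-D data cloud stretched along a diagonal (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

# 1. Center: subtract the mean of each feature.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix of the centered data (features are columns).
C = np.cov(Xc, rowvar=False)

# 3. Eigenvectors of the covariance matrix are the principal directions.
#    eigh returns eigenvalues in ascending order, so sort largest first.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. SVD of the centered data gives the same directions (up to sign);
#    squared singular values divided by n - 1 are the same variances.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = s**2 / (len(X) - 1)

# 5. Projection: scores are the coordinates along the principal directions.
scores = Xc @ eigvecs

print("variances (eig):", eigvals)
print("variances (SVD):", svd_vars)
```

The eigenvector route and the SVD route should agree up to the sign of each direction, which is why the sketch compares variances rather than raw vectors.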

2. Rotate a data cloud

Each line through the center of the cloud has a projected variance: the variance of the points after they are projected onto that line. PCA chooses the angle where this variance is largest.
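
To make projected variance concrete, here is a small sketch that computes it for one chosen angle in two ways: directly from the projected points, and as the quadratic form u^T C u with the covariance matrix C. The helper name `projected_variance`, the data, and the 30 degree angle are assumptions for illustration.

```python
import numpy as np

# Hypothetical correlated 2-D cloud (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
Xc = X - X.mean(axis=0)  # center first

def projected_variance(Xc, theta):
    """Variance of centered points projected onto the line at angle theta (radians)."""
    u = np.array([np.cos(theta), np.sin(theta)])  # unit vector along the line
    t = Xc @ u                                    # signed position of each point on the line
    return t.var(ddof=1)

theta = np.deg2rad(30.0)
u = np.array([np.cos(theta), np.sin(theta)])
C = np.cov(Xc, rowvar=False)

direct = projected_variance(Xc, theta)  # from the projected points
quadratic_form = u @ C @ u              # same number from the covariance matrix

print(direct, quadratic_form)
```

Because the two numbers agree, the variance along any line can be read off the covariance matrix without projecting the points again.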

3. Variance as a function of direction

Plotting the projected variance against the angle of the line gives a curve. The peak of this curve is the first principal component direction.
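
A sketch of the variance-versus-angle curve, assuming the same kind of made-up 2-D cloud as before: it sweeps the angle over half a turn, finds the peak, and compares the peak angle with the top eigenvector of the covariance matrix. The grid resolution and variable names are illustrative choices.

```python
import numpy as np

# Hypothetical correlated 2-D cloud (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# Sweep half a turn: a line at theta and at theta + 180 degrees is the same line.
thetas = np.linspace(0.0, np.pi, 1801)                  # 0.1 degree steps
U = np.column_stack([np.cos(thetas), np.sin(thetas)])   # one unit vector per angle
variances = np.einsum("ij,jk,ik->i", U, C, U)           # u^T C u for every angle

theta_peak = thetas[np.argmax(variances)]

# Top eigenvector of the covariance matrix (eigh sorts eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(C)
v1 = eigvecs[:, -1]
theta_eig = np.arctan2(v1[1], v1[0]) % np.pi            # fold the angle into [0, pi)

print("peak of curve (deg):   ", np.rad2deg(theta_peak))
print("top eigenvector (deg): ", np.rad2deg(theta_eig))
```

The two angles should match to within the grid spacing, and the height of the peak should match the largest eigenvalue.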

4. Keep only the first component

Keeping one component means projecting each point onto the best line, so each point is described by a single coordinate along that line. This is a compressed version of the data.
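
The projection step can be sketched as follows, assuming NumPy and a hypothetical cloud that lies near a line. Each 2-D point is reduced to one score and then mapped back onto the best line; the noise level and all names are illustrative.

```python
import numpy as np

# Hypothetical cloud: points along a line plus a little isotropic noise.
rng = np.random.default_rng(3)
t = rng.normal(scale=3.0, size=300)
X = np.column_stack([t, 0.5 * t]) + rng.normal(scale=0.3, size=(300, 2))

mean = X.mean(axis=0)
Xc = X - mean

# First principal direction = first right singular vector of the centered data.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
v1 = Vt[0]

scores = Xc @ v1                     # one number per point: the compressed data
X_hat = mean + np.outer(scores, v1)  # back in 2-D: each point projected onto the best line

explained = s[0] ** 2 / np.sum(s ** 2)           # fraction of variance kept
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))  # average squared distance to the line

print(f"explained by PC1: {explained:.3f}, reconstruction MSE: {mse:.4f}")
```

Raising the noise scale in this sketch is one way to experiment with how much variation a single component retains.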

Question. When noise increases, does one component still explain most of the variation?

5. Scaling changes PCA

PCA follows variance. If a feature is numerically enlarged, for example by a change of units, its variance grows and it may dominate the result.
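
A minimal sketch of this effect, assuming two hypothetical correlated features with similar spread: multiplying one feature by 100 pulls the first principal direction onto that feature, and standardizing each feature to unit variance restores the diagonal direction. The helper `first_direction` and the factor of 100 are illustrative assumptions.

```python
import numpy as np

def first_direction(X):
    """First principal direction (unit vector, sign arbitrary) of the rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Two hypothetical correlated features with comparable spread.
rng = np.random.default_rng(4)
z = rng.normal(size=500)
X = np.column_stack([z + 0.3 * rng.normal(size=500),
                     z + 0.3 * rng.normal(size=500)])

v_raw = first_direction(X)             # roughly along the diagonal

X_scaled = X * np.array([1.0, 100.0])  # feature 2 numerically enlarged
v_scaled = first_direction(X_scaled)   # almost entirely along feature 2

# Standardize: zero mean and unit variance for each feature.
X_std = (X_scaled - X_scaled.mean(axis=0)) / X_scaled.std(axis=0, ddof=1)
v_std = first_direction(X_std)         # the diagonal again

print("raw:         ", v_raw)
print("scaled:      ", v_scaled)
print("standardized:", v_std)
```

Standardizing is equivalent to running PCA on the correlation matrix instead of the covariance matrix, which removes the dependence on units.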

6. Scree plot and explained variance
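
As a hedged illustration of the quantities a scree plot displays, the sketch below computes the variance along each component and its share of the total for a made-up 5-feature dataset, then prints a text-only scree chart. The data, the random rotation, and the bar width are assumptions; a plotting library could draw the same numbers.

```python
import numpy as np

# Hypothetical 5-feature data: independent directions with decreasing spread,
# mixed by a random orthogonal matrix.
rng = np.random.default_rng(5)
latent = rng.normal(size=(400, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.2])
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = latent @ Q

Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first

variances = s ** 2 / (len(X) - 1)        # variance along each principal component
ratio = variances / variances.sum()      # explained variance ratio
cumulative = np.cumsum(ratio)            # running total

# Text-only scree chart: one bar per component.
for k, (r, c) in enumerate(zip(ratio, cumulative), start=1):
    bar = "#" * int(round(40 * r))
    print(f"PC{k}: {r:6.1%}  cumulative {c:6.1%}  {bar}")
```

The explained variance ratios sum to one, so the cumulative column shows how many components are needed to reach a chosen share of the total variance.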

Reflection prompts

  1. Why does PCA require centering?
  2. What is the difference between a principal direction and a principal score?
  3. Why is PCA a projection method?
  4. When should you standardize features before PCA?
  5. Why can a high-variance direction fail to be a good prediction direction?