1. From world to vector
A vector is not only an arrow. In data science, a vector is often a compact description of an object. Choosing which features to record is the first mathematical act.
Object
A student, image, house, song, patient, document, or city.
Features
Measurable quantities selected by a human, model, or sensor.
Vector
An ordered list such as [2.1, 4.7, 0, 13].
2. Build a vector from a real object
Change the features below. The object becomes a point in a small feature space. Notice that the numbers are meaningful only because we chose meanings for the coordinates.
Coffee shop representation
Teaching note
This example lets students see that a feature vector is a modeling decision. The same shop can become a 2D, 4D, or 100D vector depending on the question.
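The idea that one shop can become several different vectors can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; the feature names (`price_usd`, `seats`, `wifi`, `walk_minutes`), their values, and the helper `to_vector` are hypothetical and not taken from the activity.

```python
import numpy as np

# Hypothetical measurements for one coffee shop (illustrative values only).
shop = {
    "price_usd": 4.5,      # price of a standard drink
    "seats": 30,           # number of seats
    "wifi": 1,             # 1 = yes, 0 = no
    "walk_minutes": 7,     # walking time from some reference point
}

def to_vector(obj, features):
    """Return the chosen features of obj as an ordered NumPy vector."""
    return np.array([obj[name] for name in features], dtype=float)

# Two different modeling decisions for the same shop.
v2 = to_vector(shop, ["price_usd", "walk_minutes"])
v4 = to_vector(shop, ["price_usd", "seats", "wifi", "walk_minutes"])

print(v2.shape, v2)
print(v4.shape, v4)
```

The order of the feature list matters: swapping two names produces a different vector for the same shop, which is why the coordinates need agreed meanings.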
3. Similarity becomes geometry
Once objects become vectors, we can ask geometric questions: Which two objects are close? Which object is unusual? Which group forms a cluster?
Move point B
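The three questions above can be tried on a handful of points. The sketch below assumes NumPy and uses four made-up 2D points; the labels and coordinates are hypothetical, and "unusual" is taken here to mean "largest average distance to the other points", which is only one of several possible definitions.

```python
import numpy as np

# Hypothetical 2D feature vectors (illustrative values only).
labels = ["A", "B", "C", "D"]
points = np.array([
    [1.0, 1.0],   # A
    [1.5, 1.2],   # B
    [1.2, 0.8],   # C
    [6.0, 5.0],   # D
])

# Pairwise Euclidean distances via broadcasting: dist[i, j] = ||p_i - p_j||.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Nearest neighbor of each point, ignoring the zero distance to itself.
masked = dist.copy()
np.fill_diagonal(masked, np.inf)
nearest = masked.argmin(axis=1)

# One simple notion of "unusual": largest mean distance to the other points.
mean_to_others = dist.sum(axis=1) / (len(points) - 1)
unusual = int(mean_to_others.argmax())

for i, j in enumerate(nearest):
    print(f"{labels[i]} is closest to {labels[j]}")
print("Most unusual point:", labels[unusual])
```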
4. The danger of units and scale
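A distance formula does not know whether one coordinate is dollars, minutes, pixels, or kilograms. Features with large numeric ranges can dominate the geometry: in the table below, rents differ by hundreds of dollars while distances differ by at most 20 minutes, so an unscaled Euclidean distance is determined mostly by rent.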
Three apartments
| Apartment | Rent ($) | Distance (min) | Size (sq ft) |
|---|---|---|---|
| A | 2100 | 8 | 650 |
| B | 2400 | 6 | 720 |
| C | 2200 | 26 | 680 |
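To see how much each column drives the distance, the sketch below measures the share of the squared Euclidean distance between apartments A and C that comes from each feature, first on the raw numbers and then after z-scoring each column. It assumes NumPy; the helper `feature_shares` is a hypothetical name, and z-scoring is only one of several reasonable scaling choices.

```python
import numpy as np

# Rows: apartments A, B, C. Columns: rent ($), distance (min), size (sq ft).
X = np.array([
    [2100.0,  8.0, 650.0],
    [2400.0,  6.0, 720.0],
    [2200.0, 26.0, 680.0],
])

def feature_shares(u, v):
    """Fraction of the squared Euclidean distance contributed by each coordinate."""
    sq = (u - v) ** 2
    return sq / sq.sum()

# Standardize each column to mean 0 and standard deviation 1 (z-scores).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

raw_share = feature_shares(X[0], X[2])      # A vs C on raw numbers
scaled_share = feature_shares(Z[0], Z[2])   # A vs C after scaling

print("raw shares (rent, distance, size):   ", np.round(raw_share, 2))
print("scaled shares (rent, distance, size):", np.round(scaled_share, 2))
```

On the raw numbers, rent accounts for most of the A-to-C distance even though C is 18 minutes farther away than A. After scaling, the distance column becomes the largest contributor.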
5. Images are matrices before they become vectors
A grayscale image is a grid of numbers. Flattening the grid creates a vector. This is one reason linear algebra is everywhere in image processing and machine learning.
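Flattening can be shown on a tiny grid. The sketch below assumes NumPy and a made-up 2 × 3 array of pixel intensities; row-major order is NumPy's default, and other orderings are equally valid as long as they are applied consistently.

```python
import numpy as np

# A hypothetical 2 x 3 grayscale image: each entry is a pixel intensity (0-255).
image = np.array([
    [0,   50, 100],
    [150, 200, 250],
], dtype=np.uint8)

# Flatten row by row: the 2 x 3 grid becomes a vector with 6 coordinates.
vector = image.reshape(-1)

# The grid can be recovered only if the original shape is remembered.
restored = vector.reshape(image.shape)

print(image.shape, "->", vector.shape)
print(vector)

# The same operation turns a 100 x 100 image into 10,000 coordinates.
big = np.zeros((100, 100), dtype=np.uint8)
print(big.reshape(-1).shape)
```

The flattened vector no longer records which pixels were vertical neighbors, which is one example of information lost in the representation.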
6. Text can also become vectors
A document can be represented by word counts. This is a simple version of the idea behind text embeddings: language is translated into coordinates.
Why cosine similarity?
Word-count vectors can be longer simply because a document has more words. Cosine similarity focuses on direction rather than length, so it often captures topical similarity better than raw distance.
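The contrast between direction and length can be checked on three toy documents. This is a minimal sketch, assuming NumPy; the documents and the helpers `count_vector`, `cosine_similarity`, and `euclidean` are hypothetical, and splitting on whitespace stands in for real tokenization.

```python
from collections import Counter

import numpy as np

doc_a = "vectors and matrices"
doc_b = " ".join([doc_a] * 5)    # same topic as doc_a, five times as long
doc_c = "coffee and pastries"    # different topic, same length as doc_a

docs = [doc_a, doc_b, doc_c]

# Shared vocabulary: one coordinate per distinct word, in a fixed order.
vocab = sorted({word for doc in docs for word in doc.split()})

def count_vector(doc):
    """Word-count vector of doc over the shared vocabulary."""
    counts = Counter(doc.split())
    return np.array([counts[word] for word in vocab], dtype=float)

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot product over product of lengths."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

a, b, c = (count_vector(d) for d in docs)

print("cosine(a, b):", cosine_similarity(a, b), " euclidean(a, b):", euclidean(a, b))
print("cosine(a, c):", cosine_similarity(a, c), " euclidean(a, c):", euclidean(a, c))
```

Raw distance places `doc_a` closer to the off-topic `doc_c` than to the longer copy `doc_b`, while cosine similarity treats `doc_a` and `doc_b` as pointing in the same direction.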
7. High-dimensional geometry feels different
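Modern data often has many coordinates: pixels, gene expressions, words, ratings, sensor readings. This activity samples random points and shows how distances behave as dimension grows. For independently sampled points, distances tend to become more alike: the nearest and the farthest neighbor of a point end up at nearly the same distance.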
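A small experiment along these lines can be run outside the page. The sketch below assumes NumPy and points drawn uniformly from the unit cube; the function `relative_contrast`, the sample size, and the list of dimensions are hypothetical choices, and the exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def relative_contrast(dim, n_points=500):
    """(farthest - nearest) / nearest distance from a random query point."""
    points = rng.random((n_points, dim))   # uniform samples in the unit cube
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}

for dim, value in contrasts.items():
    print(f"dimension {dim:>4}: relative contrast = {value:.2f}")
```

A large value means the nearest point is much closer than the farthest one. As the dimension grows the value shrinks, so "nearest neighbor" carries less and less information for this kind of random data.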
8. Reflection and AI companion prompts
Reflection questions
- What information is lost when an object becomes a vector?
- When can distance be misleading?
- Why might scaling be necessary before clustering or prediction?
- What makes high-dimensional data both powerful and dangerous?
AI companion prompts
- Explain feature vectors using a real example from medicine.
- Give me three bad feature choices for comparing colleges.
- Create a tiny dataset where scaling changes the nearest neighbor.
- Explain why a 100 × 100 image is a 10,000-dimensional vector.
Instructor notes
This page can be used as an in-class demonstration, a companion to the Chapter 1 notebook, or a standalone online activity. Suggested pacing: 10 minutes for feature vectors, 10 minutes for distance and scaling, 10 minutes for image/text representations, and 10 minutes for high-dimensional intuition and discussion.
Learning outcomes
- Explain that a vector is an ordered list of meaningful features.
- Compute and interpret simple distances and dot products.
- Identify why units and scaling matter.
- Recognize images and documents as numerical objects.
- Describe one basic phenomenon of high-dimensional geometry.
Suggested extension
Ask students to modify the Jupyter notebook so that their own dataset becomes a matrix: rows are objects, columns are features. Then ask them to normalize columns and compare nearest neighbors before and after normalization.
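A possible starting point for this extension is sketched below. It assumes NumPy and a made-up dataset of four houses with two features; the values, the z-score normalization, and the helper `nearest_neighbors` are hypothetical choices and not the contents of the Chapter 1 notebook.

```python
import numpy as np

# Rows are objects (hypothetical houses); columns are features: price ($), bedrooms.
X = np.array([
    [300000.0, 2.0],
    [301000.0, 5.0],
    [310000.0, 2.0],
    [330000.0, 4.0],
])

def nearest_neighbors(M):
    """Index of the nearest other row for each row, by Euclidean distance."""
    diff = M[:, None, :] - M[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # a row is not its own neighbor
    return dist.argmin(axis=1)

# Normalize each column to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

raw_nn = nearest_neighbors(X)
scaled_nn = nearest_neighbors(Z)

for i, (r, s) in enumerate(zip(raw_nn, scaled_nn)):
    note = "changed" if r != s else "same"
    print(f"row {i}: nearest before = {r}, after = {s} ({note})")
```

Before normalization, row 0 is matched with row 1 because their prices are almost equal, even though one house has 2 bedrooms and the other 5. After normalization, row 0 is matched with row 2, which has the same number of bedrooms.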