Vectors have two properties: (1) magnitude (length) and (2) direction. From a computer science perspective, a vector is just an ordered list of numbers.
- Vectors can be added together and multiplied by a scalar (i.e., scaled).
- A key operation on vectors is the dot product. Conceptually, the dot product a·b is the magnitude of vector a multiplied by the length of the projection of vector b onto a. Projection can be thought of as the “shadow cast” by one vector onto another.
- If their dot product is zero, the two vectors are orthogonal (at 90 degrees) to each other: they don’t overlap with each other at all. (A small sketch of both ideas follows this list.)
- Bringing it back to perceptrons: the set of weights is nothing but a vector (with as many dimensions as there are inputs). The Perceptron Learning Algorithm finds (learns) the right set of weights. That final weight vector is always perpendicular to the hyperplane that divides the coordinate space in two. As the weights of the perceptron change, so does the orientation of the hyperplane.
- Adding a bias term is like moving the hyperplane away from the origin, but without changing its orientation. (A sketch of the learning algorithm, including the bias, also follows this list.)
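A minimal sketch of the dot product, projection, and orthogonality, assuming NumPy and some made-up 2-D vectors:

```python
import numpy as np

a = np.array([3.0, 0.0])
b = np.array([2.0, 2.0])

# Dot product: |a| times the length of b's projection ("shadow") onto a.
dot = np.dot(a, b)                      # 3*2 + 0*2 = 6.0
proj_length = dot / np.linalg.norm(a)   # 6.0 / 3.0 = 2.0

# Orthogonal vectors have a zero dot product.
c = np.array([0.0, 5.0])
print(np.dot(a, c))                     # 0.0 -> a and c are at 90 degrees
```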
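And a minimal sketch of the Perceptron Learning Algorithm, with a hypothetical toy dataset (the data and the 100-epoch cap are my own assumptions, not from the book):

```python
import numpy as np

# Toy, linearly separable 2-D data with +1 / -1 labels (made up for illustration).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weight vector: one entry per input dimension
b = 0.0           # bias: shifts the hyperplane away from the origin

for _ in range(100):                            # repeat until no mistakes remain
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:     # point is misclassified
            w += y_i * x_i                      # nudge w toward/away from x_i
            b += y_i                            # nudge the bias
            errors += 1
    if errors == 0:
        break

# The decision boundary is the set of points where w . x + b = 0;
# the learned weight vector w is perpendicular (normal) to that hyperplane.
print(w, b)
```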
Complex Stuff
- Any function can be thought of as a multidimensional vector. The dimensionality of that vector is determined by the number of points at which you choose to evaluate the function. The most general way to think about what a neural network is doing: it’s transforming one vector into another vector. Pages 297-299 of Why Machines Learn explain this.
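A minimal sketch of that idea (my own toy setup, not the book’s example): sampling sin() at 8 points turns it into an 8-dimensional vector, and a tiny two-layer network with random weights maps that vector to another vector.

```python
import numpy as np

xs = np.linspace(0, np.pi, 8)      # the 8 points we chose to evaluate at
f_vec = np.sin(xs)                 # sin(), viewed as an 8-dimensional vector

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))      # layer 1: 8-dim input -> 16-dim hidden
W2 = rng.normal(size=(8, 16))      # layer 2: 16-dim hidden -> 8-dim output

hidden = np.maximum(0, W1 @ f_vec) # ReLU nonlinearity
g_vec = W2 @ hidden                # the network's output: another 8-dim vector
print(g_vec.shape)                 # (8,)
```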
Primary Resources
- Vectors, what even are they?, a simple written intro by 3Blue1Brown; also available as a great animated video
- Chapter 2, Why Machines Learn by Anil Ananthaswamy
- HuggingFace’s video on model steering describes vectors https://www.youtube.com/watch?v=F2jd5WuT-zg. Worth noting that stretching (scaling) a multidimensional embedding doesn’t change the meaning it encodes. Only the direction matters.
- My post from Jan 2024 about vectors and embeddings (inspired by a post by Dharmesh Shah)