“Embeddings” emphasizes the notion of representing data in a meaningful and structured way, while “[[Vectors]]” refers to the numerical representation itself. ‘Vector embeddings’ is a way to represent different data types (like words, sentences, articles etc) as points in a multidimensional space.
- OpenAI’s vector embedding model is called ada-002 (read their Dec 2022 post announcing it)
- There are open source models too.
- These models take English text and map it into a space with 1,536 dimensions.
So any text (phrase, para, document) can now have coordinates for a single point in 1,536 dimensional space.
- So if you plot another piece of text and compare the two coordinates, you’d be comparing what they mean and if they are similar to each other or not.
- This implicit semantic comparison is like human understanding then… just represented mathematically.
Primary Resources
- Dharmesh’s Jan 2024 post is a very simple, accessible intro to vector embeddings concept. I wrote my own interpretation after reading it.