Embedding Models Explained: From Word2Vec to text-embedding-3
Understand embedding models from Word2Vec to OpenAI text-embedding-3. Learn how vectors power search, recommendations, and RAG with practical code examples.
What Are Embeddings?
An embedding is a numerical representation of data (text, images, audio) as a vector of floating-point numbers. These vectors live in a high-dimensional space where similar items sit close together and dissimilar items sit far apart.
[Figure: Neural network architecture, with data flowing through input, hidden, and output layers.]
This simple idea powers modern search, recommendations, anomaly detection, and retrieval-augmented generation (RAG). If you are building any AI-powered feature, you need to understand embeddings.
The Evolution of Text Embeddings
2013: Word2Vec
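Google's Word2Vec was the breakthrough that made dense word vectors learned from raw text practical. It captured word relationships well enough that simple vector arithmetic recovers analogies (≈ means "the nearest word vector is"):
king - man + woman ≈ queen
paris - france + germany ≈ berlin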
In Google's widely used pretrained release, each word is a 300-dimensional vector (the size is a training choice). Words used in similar contexts end up with similar vectors.
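The analogy arithmetic can be illustrated with a minimal sketch. The 4-dimensional vectors below are hypothetical, hand-picked values (not real Word2Vec output), and `analogy` is an illustrative helper rather than a library function: it returns the word closest by cosine similarity to king - man + woman, skipping the three input words.

```python
import numpy as np

# Hypothetical hand-made 4-dimensional vectors, for illustration only.
# Real Word2Vec vectors are learned from text and typically have ~300 dimensions.
# Dimensions loosely encode: [royalty, male, female, fruit]
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9, 0.0]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
}


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a, b, c, vocab):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, as is usual
    when evaluating word analogies.
    """
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))


print(analogy("king", "man", "woman", vectors))  # queen
```

Libraries such as gensim expose a similar query on trained models (`most_similar(positive=["king", "woman"], negative=["man"])`), but the toy version shows the mechanism: an analogy is a nearest-neighbor lookup, which is why the results are approximate.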
Limitation: One vector per word. "Bank" (river) and "bank" (financial) had the same embedding.
2018: BERT Embeddings
BERT introduced contextual embeddings. The same word got different vectors depending on surrounding text:
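```python
from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
text = "The bank of the river was steep"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only; skip gradient tracking
    outputs = model(**inputs)

# Contextual vector for the token "bank"; its value depends on "river" nearby
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
bank_vector = outputs.last_hidden_state[0, tokens.index("bank")]  # shape: (768,)

# Mean-pool all token vectors into one sentence-level embedding
embedding = outputs.last_hidden_state.mean(dim=1)  # shape: (1, 768)
```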
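bert-base produces 768-dimensional vectors. They capture meaning much better than Word2Vec, but BERT is comparatively slow and was not trained for similarity search, so pooled BERT vectors often perform poorly when comparing sentences.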
[Figure: RAG architecture. User prompts are embedded, matched against a vector store, then fed to an LLM together with the retrieved context.]
2019: Sentence Transformers
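Sentence Transformers (Sentence-BERT) fine-tune transformer encoders so that whole sentences map directly to vectors you can compare with cosine similarity. The key ingredient in models such as all-MiniLM-L6-v2 is contrastive training: embeddings of similar sentences are pulled together and embeddings of dissimilar ones are pushed apart.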
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "How to deploy Docker containers",
    "Docker container deployment guide",
    "Best pizza recipe in New York",
]
embeddings = model.encode(sentences)
# First two will be close together
# Third will be far from both
```
384 dimensions. Fast, accurate, and open-source.
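The contrastive objective can be sketched in NumPy. This is a simplified, InfoNCE-style loss with in-batch negatives, similar in spirit to sentence-transformers' MultipleNegativesRankingLoss; it is not the exact recipe used to train all-MiniLM-L6-v2. `in_batch_contrastive_loss` is a hypothetical helper name, and `scale=20.0` is an assumption borrowed from that loss's default.

```python
import numpy as np


def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """InfoNCE-style contrastive loss with in-batch negatives.

    anchors[i] and positives[i] are a matching pair (e.g. two phrasings of the
    same question); every positives[j] with j != i serves as a negative for
    anchors[i]. Minimizing the loss pulls pairs together and pushes
    non-pairs apart.
    """
    # L2-normalize rows so that dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities

    # Row-wise log-softmax; the "correct" column for row i is column i
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
matched = anchors + 0.05 * rng.normal(size=(4, 8))  # each positive is near its anchor
mismatched = matched[::-1]  # same vectors, but paired with the wrong anchors

print(in_batch_contrastive_loss(anchors, matched))
print(in_batch_contrastive_loss(anchors, mismatched))
```

With matched pairs the loss comes out far lower than when the same vectors are paired with the wrong anchors; training adjusts the encoder's weights to drive this loss down across many batches.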
2024-2025: text-embedding-3 and Beyond
OpenAI's text-embedding-3 family, released in January 2024, is one of the most widely used API-based embedding options:
- text-embedding-3-small: 1536 dimensions, $0.02/1M tokens
- text-embedding-3-large: 3072 dimensions, $0.13/1M tokens
```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Deploy a PostgreSQL database with automated backups",
)
vector = response.data[0].embedding  # list of 1536 floats
```
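The text-embedding-3 models also accept a dimensions parameter that returns a shorter vector while keeping much of its quality. OpenAI reports that text-embedding-3-large shortened to 256 dimensions still outperforms the older, full-size text-embedding-ada-002 (1536 dimensions) on the MTEB benchmark: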
```python
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=256,  # reduce from 3072 to 256
)
```
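If you already have full-length vectors, OpenAI's documentation also describes shortening them yourself: keep the first N values and re-normalize to unit length. A minimal sketch, assuming NumPy; `shorten_embedding` is a hypothetical helper, and the random vector stands in for a real API response.

```python
import numpy as np


def shorten_embedding(vector, dims):
    """Keep the first `dims` values of an embedding and rescale to unit length.

    Only meaningful for models trained to support shortening, such as
    text-embedding-3; truncating other models' vectors can degrade them badly.
    """
    v = np.asarray(vector, dtype=np.float64)[:dims]
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


# Stand-in for a full 3072-dimensional text-embedding-3-large vector
full_vector = np.random.default_rng(42).normal(size=3072)
short = shorten_embedding(full_vector, 256)
print(short.shape)  # (256,)
```

Compare shortened vectors only with vectors shortened to the same length by the same model; the dimensions parameter remains the simpler option when you control the API call.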
How Similarity Search Works
Two vectors are typically compared using cosine similarity, the cosine of the angle between them:
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Ranges from -1.0 to 1.0, whether or not the vectors are normalized:
#  1.0 = same direction (near-identical meaning)
#  0.0 = orthogonal (unrelated)
# -1.0 = opposite direction
```
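Because cosine similarity divides by both vector lengths, scaling stored vectors to unit length once turns the score into a plain dot product, and a whole collection can then be scored with one matrix-vector product. A sketch with NumPy and hypothetical 3-dimensional vectors; `normalize_rows` is an illustrative helper.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row vector to unit L2 length."""
    m = np.asarray(matrix, dtype=np.float64)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Hypothetical 3-dimensional embeddings, for illustration only
docs = np.array([
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.3],
    [0.1, 0.7, 0.6],
])
query = np.array([0.25, 0.85, 0.2])

unit_docs = normalize_rows(docs)
unit_query = query / np.linalg.norm(query)

# After normalization, one matrix-vector product gives every cosine score at once
scores = unit_docs @ unit_query
expected = [cosine_similarity(d, query) for d in docs]
print(np.allclose(scores, expected))  # True
```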
For efficient search over millions of vectors, use approximate nearest neighbor (ANN) algorithms such as HNSW, which trade a small amount of recall for much faster queries. These vector databases implement them:
- pgvector: PostgreSQL extension (great if you already use Postgres)
- ChromaDB: Simple, developer-friendly
- Qdrant: High performance, Rust-based
- Pinecone: Managed service (cloud only)
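ANN indexes approximate exact search, which simply compares the query against every stored vector. A brute-force baseline is still worth having: it works for small collections and gives ground truth for measuring an ANN index's recall. A minimal NumPy sketch; `top_k_exact` is a hypothetical helper and the random data stands in for real embeddings.

```python
import numpy as np


def top_k_exact(query, vectors, k=3):
    """Exact (brute-force) nearest neighbors by cosine similarity.

    Returns (indices, scores) of the k most similar rows, best first.
    Cost grows linearly with the number of stored vectors.
    """
    q = query / np.linalg.norm(query)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = m @ q
    k = min(k, len(scores))
    # argpartition finds the top k in O(n); then sort only those k
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


rng = np.random.default_rng(7)
stored = rng.normal(size=(10_000, 384))  # stand-in for 10,000 stored 384-d embeddings
query = stored[123] + 0.01 * rng.normal(size=384)  # a near-duplicate of row 123
idx, sc = top_k_exact(query, stored, k=5)
print(idx[0])  # 123
```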
Choosing the Right Model
| Use Case | Model | Dimensions | Speed |
|---|---|---|---|
| Quick prototyping | all-MiniLM-L6-v2 | 384 | Very fast |
| Production search | text-embedding-3-small | 1536 | Fast (API) |
| Maximum accuracy | text-embedding-3-large | 3072 | Fast (API) |
| Private/self-hosted | nomic-embed-text | 768 | Fast (local) |
| Multilingual | multilingual-e5-large | 1024 | Moderate |
For self-hosted deployments (which we recommend at TechSaaS), nomic-embed-text via Ollama gives excellent results without external API calls:
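```shell
ollama pull nomic-embed-text

curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Your text here"}'
```
Newer Ollama releases also provide a /api/embed endpoint that takes an "input" field (a single string or a list of strings) and returns an "embeddings" array, which makes batching easier.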
[Figure: ML pipeline, from raw data collection through training, evaluation, deployment, and continuous monitoring.]
Practical Tips
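- Chunk your text: embed passages of a few hundred tokens (256-512 is a common starting point) rather than entire documents. One vector for a long document blurs its topics together, and inputs past the model's limit are rejected or truncated (text-embedding-3 accepts up to 8,191 tokens; all-MiniLM-L6-v2 truncates input longer than 256 word pieces)
- Normalize vectors: once every vector has unit length, cosine similarity is just a dot product. OpenAI embeddings already come normalized to length 1
- Batch requests: embed many texts per API call (OpenAI's input parameter accepts a list) to cut round trips and latency
- Cache aggressively: store embeddings in your database and never re-embed unchanged text
- Test with your data: benchmark candidate models on queries and documents from YOUR domain, not only on generic public benchmarks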
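The chunking, batching, and caching tips can be combined into a small ingestion helper. This is a sketch under several assumptions: whitespace-separated words stand in for tokens, `embed_batch` is a hypothetical function you supply (for example, a wrapper around one batched API call), and the in-memory dict stands in for a database table. `chunk_words`, `EmbeddingCache`, and `fake_embed` are illustrative names, not library APIs.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

# Hypothetical embedding function type: takes a batch of texts and returns one
# vector per text (e.g. a wrapper around a single batched API call).
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def chunk_words(text: str, size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks of about `size` words.

    Words are a rough stand-in for tokens; use the model's tokenizer for exact limits.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


class EmbeddingCache:
    """In-memory cache keyed by (model, text hash); use a DB table in production."""

    def __init__(self, model: str, embed_batch: EmbedFn, batch_size: int = 100):
        self.model = model
        self.embed_batch = embed_batch
        self.batch_size = batch_size
        self.store: Dict[str, List[float]] = {}

    def _key(self, text: str) -> str:
        # Include the model name: vectors from different models are not comparable
        return hashlib.sha256(f"{self.model}\n{text}".encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # Collect texts not embedded before (deduplicated, order preserved)
        missing = list(dict.fromkeys(t for t in texts if self._key(t) not in self.store))
        # Embed only the misses, several texts per call
        for i in range(0, len(missing), self.batch_size):
            batch = missing[i:i + self.batch_size]
            for text, vector in zip(batch, self.embed_batch(batch)):
                self.store[self._key(text)] = vector
        return [self.store[self._key(t)] for t in texts]


calls = []


def fake_embed(batch):
    # Hypothetical stand-in for a real embedding API: records calls, returns dummy vectors
    calls.append(list(batch))
    return [[float(len(t))] for t in batch]


cache = EmbeddingCache("example-model", fake_embed, batch_size=2)
doc = " ".join(f"word{i}" for i in range(10))
chunks = chunk_words(doc, size=4, overlap=1)
cache.embed(chunks)  # 3 chunks, embedded in 2 calls (batches of 2 and 1)
cache.embed(chunks)  # everything cached, so no new calls
print(len(chunks), len(calls))  # 3 2
```

Keying the cache on both the model name and a hash of the text means switching models never mixes incompatible vectors, and re-running ingestion over unchanged documents makes no API calls.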
Embeddings are the foundation of modern AI applications. Whether you are building search, RAG, or recommendations, understanding how they work gives you a massive advantage.