
Embedding Models Explained: From Word2Vec to text-embedding-3

Understand embedding models from Word2Vec to OpenAI text-embedding-3. Learn how vectors power search, recommendations, and RAG with practical code examples.

Yash Pritwani
13 min read

What Are Embeddings?

An embedding is a numerical representation of data — text, images, audio — as a vector of floating-point numbers. Similar items have vectors that are close together in this high-dimensional space. Dissimilar items are far apart.

Neural network architecture: data flows through input, hidden, and output layers.

This simple idea powers modern search, recommendations, anomaly detection, and retrieval-augmented generation (RAG). If you are building any AI-powered feature, you need to understand embeddings.

The Evolution of Text Embeddings

2013: Word2Vec

Google's Word2Vec was the breakthrough that started it all. It learned word relationships from raw text:

king - man + woman = queen
paris - france + germany = berlin

Each word became a 300-dimensional vector. Words used in similar contexts had similar vectors.

Limitation: One vector per word. "Bank" (river) and "bank" (financial) had the same embedding.
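The vector-arithmetic analogy above can be illustrated with a toy example. These are hand-picked 3-dimensional vectors chosen so the gender and royalty offsets line up; real Word2Vec vectors are learned from billions of words and have ~300 dimensions:

```python
import numpy as np

# Toy 3-d vectors, hand-picked so the offsets line up.
# Real Word2Vec vectors are learned from text, not chosen by hand.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.1, 0.8, 0.9]),
}

def closest(target, exclude):
    # Find the nearest word by cosine similarity, skipping the query words
    best, best_sim = None, -1.0
    for word, vec in vectors.items():
        if word in exclude:
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

result = vectors["king"] - vectors["man"] + vectors["woman"]
print(closest(result, exclude={"king", "man", "woman"}))  # queen
```

With real embeddings the arithmetic result is only *near* "queen", so libraries like gensim return the nearest neighbors rather than an exact match.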


2018: BERT Embeddings

BERT introduced contextual embeddings. The same word got different vectors depending on surrounding text:

from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "The bank of the river was steep"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Mean-pool the token vectors into one sentence embedding.
# "bank" here gets a river-related embedding.
embedding = outputs.last_hidden_state.mean(dim=1)

768 dimensions. Much better at understanding meaning, but slow and not designed for similarity search.

RAG architecture: user prompts are embedded, matched against a vector store, then fed to an LLM with retrieved context.

2022: Sentence Transformers

Models specifically trained for similarity search. The key innovation: they were trained with contrastive learning — push similar sentences together, push dissimilar ones apart.
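The contrastive objective can be sketched in a few lines of numpy. This is an InfoNCE-style loss, one common form of contrastive learning; the vector size, temperature, and random data here are purely illustrative:

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    # InfoNCE-style loss: low when the anchor is close to the positive
    # and far from the negatives.
    sims = np.array([cosine(anchor, positive)] +
                    [cosine(anchor, n) for n in negatives]) / temperature
    # Softmax cross-entropy with the positive as the "correct" class
    return -sims[0] + np.log(np.sum(np.exp(sims)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)        # paraphrase: nearby vector
negatives = [rng.normal(size=8) for _ in range(4)]  # unrelated sentences

print(contrastive_loss(anchor, positive, negatives))
```

Training minimizes this loss over many (anchor, positive, negatives) triples, which is what pushes paraphrases together and unrelated sentences apart in the embedding space.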

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = [
    "How to deploy Docker containers",
    "Docker container deployment guide",
    "Best pizza recipe in New York"
]

embeddings = model.encode(sentences)

# First two will be close together
# Third will be far from both

384 dimensions. Fast, accurate, and open-source.

2024-2025: text-embedding-3 and Beyond

OpenAI's text-embedding-3 family represents the current state of the art for API-based embeddings:

  • text-embedding-3-small: 1536 dimensions, $0.02/1M tokens
  • text-embedding-3-large: 3072 dimensions, $0.13/1M tokens

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Deploy a PostgreSQL database with automated backups"
)

vector = response.data[0].embedding  # 1536 floats

A standout feature: the dimensions parameter lets you shorten vectors while preserving most of their quality:

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=256  # Reduce from 3072 to 256
)
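If you already have full-length vectors stored, OpenAI's docs note you can also shorten them yourself: slice off the first N components and re-normalize to unit length so cosine similarity still behaves correctly. A minimal sketch, using a random vector as a stand-in for a real embedding:

```python
import numpy as np

def truncate_embedding(vec, dims):
    # Keep the first `dims` components, then re-normalize to unit length
    # so cosine similarity (a plain dot product) still works.
    v = np.asarray(vec, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)

full = np.random.default_rng(42).normal(size=3072)  # stand-in for a real embedding
short = truncate_embedding(full, 256)
print(short.shape, round(float(np.linalg.norm(short)), 6))  # (256,) 1.0
```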

How Similarity Search Works

Two vectors are compared using cosine similarity — the cosine of the angle between them:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Returns -1.0 to 1.0
#  1.0 = identical direction (same meaning)
#  0.0 = unrelated
# -1.0 = opposite (rare in practice; text embeddings mostly score >= 0)

For efficient search over millions of vectors, use approximate nearest neighbor (ANN) algorithms implemented in vector databases:

  • pgvector: PostgreSQL extension (great if you already use Postgres)
  • ChromaDB: Simple, developer-friendly
  • Qdrant: High performance, Rust-based
  • Pinecone: Managed service (cloud only)
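A brute-force baseline makes clear what these ANN indexes approximate: score every vector, keep the top k. With pre-normalized vectors, a dot product is the cosine similarity. The corpus size and dimensions below are illustrative:

```python
import numpy as np

def top_k(query, corpus, k=5):
    # Exact nearest-neighbor search: what ANN indexes approximate.
    # Assumes all vectors are pre-normalized, so dot product = cosine.
    scores = corpus @ query
    idx = np.argpartition(-scores, k)[:k]   # unordered top-k in O(n)
    return idx[np.argsort(-scores[idx])]    # sort just those k

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 384))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = corpus[123]  # use a known document as the query

print(top_k(query, corpus, k=5)[0])  # 123 (the query matches itself first)
```

This exact scan is fine up to a few hundred thousand vectors; beyond that, ANN indexes like HNSW trade a little recall for orders-of-magnitude faster lookups.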

Choosing the Right Model

| Use Case | Model | Dimensions | Speed |
|---|---|---|---|
| Quick prototyping | all-MiniLM-L6-v2 | 384 | Very fast |
| Production search | text-embedding-3-small | 1536 | Fast (API) |
| Maximum accuracy | text-embedding-3-large | 3072 | Fast (API) |
| Private/self-hosted | nomic-embed-text | 768 | Fast (local) |
| Multilingual | multilingual-e5-large | 1024 | Moderate |

For self-hosted deployments (which we recommend at TechSaaS), nomic-embed-text via Ollama gives excellent results without external API calls:

ollama pull nomic-embed-text

curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Your text here"}'

ML pipeline: from raw data collection through training, evaluation, deployment, and continuous monitoring.

Practical Tips

  1. Chunk your text: Embeddings work best on 256-512 token chunks, not entire documents
  2. Normalize vectors: Pre-normalize for faster cosine similarity (just dot product)
  3. Batch requests: Embed multiple texts in one API call to reduce latency
  4. Cache aggressively: Store embeddings in your database — never re-embed unchanged text
  5. Test with your data: Benchmark models on YOUR domain, not generic benchmarks
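The chunking tip can be sketched as a simple word-based splitter with overlap (word counts stand in for token counts here; a real pipeline should count tokens with the embedding model's own tokenizer):

```python
def chunk_text(text, max_words=300, overlap=50):
    # Rough chunker: splits on whitespace and approximates tokens with words.
    # Overlapping chunks reduce the chance of cutting an idea in half.
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = "word " * 1000
print(len(chunk_text(doc)))  # 4 overlapping chunks of <= 300 words
```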

Embeddings are the foundation of modern AI applications. Whether you are building search, RAG, or recommendations, understanding how they work gives you a massive advantage.

#embeddings #nlp #word2vec #openai #vector-search #rag
