# Qdrant vs Milvus vs Weaviate: Benchmarked at 50 Million Vectors
Every vector database benchmark you've read is lying to you. Not intentionally -- but they're all running at 1M vectors on a beefy machine, which tells you absolutely nothing about production behavior. At 1M vectors, everything is fast. The interesting question is what happens at 50M vectors when your index no longer fits in RAM and your queries start hitting disk.
We ran this benchmark because a client needed to choose a vector DB for a production RAG pipeline ingesting 50M+ document embeddings. The vendor benchmarks were useless at this scale, so we ran our own. Here's what we found.
## Test Setup
All three databases ran on identical hardware: a single node with 128GB of RAM.
All databases were configured for their recommended production settings with replication factor 1 (single shard per node for fair comparison). Each was given 48 hours to fully index before benchmarking.
## Indexing Performance
This is where the first surprise hit:
Qdrant was fastest to index by a significant margin. Its HNSW implementation uses a memory-mapped approach that keeps peak memory consumption lower than Milvus's bulk-loading strategy. Milvus peaked at 112GB during segment compaction -- on a 128GB machine, that's uncomfortably close to OOM territory.
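Qdrant's memory-mapped behavior is tunable at collection-creation time. A hedged sketch of the relevant knobs (parameter names from Qdrant's collection API: `hnsw_config.on_disk` stores the HNSW graph on disk, and `optimizers_config.memmap_threshold` is the segment size in KB above which segments are memory-mapped instead of held in RAM; verify both against your Qdrant version):

```json
{
  "vectors": {
    "size": 768,
    "distance": "Cosine"
  },
  "hnsw_config": {
    "on_disk": true
  },
  "optimizers_config": {
    "memmap_threshold": 20000
  }
}
```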
Weaviate landed in the middle. Its LSM-based storage engine provides predictable memory usage during ingestion, but the tradeoff is slower overall indexing.
## Query Latency (Top-10 Nearest Neighbors)
The numbers that matter. All queries use cosine similarity, top-10 results, single-threaded client:
Qdrant wins on raw latency at every percentile. The gap widens dramatically at p99 -- Qdrant's 8.2ms vs Milvus's 18.7ms. That p99 difference matters in production: when you're serving search results to users, it's the tail latency that determines perceived performance.
At 32 concurrent clients, Qdrant's throughput scales to 14.2x its single-client rate, close to linear. Milvus reaches a 15.1x multiplier, but from a slower single-client baseline, so its absolute throughput stays lower. Weaviate scales to 13.6x.
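Tail figures like the p99 above come straight from raw latency samples. A minimal nearest-rank percentile sketch (the sample values below are illustrative, not from the benchmark):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample >= p% of all samples."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[rank - 1]

latencies_ms = [1.2, 1.3, 1.4, 1.6, 2.1, 2.4, 3.0, 4.8, 7.5, 19.0]
print(percentile(latencies_ms, 50))  # 2.1
print(percentile(latencies_ms, 99))  # 19.0
```

Note how a single slow query dominates the p99 even when the median is low; that is why tail latency, not the median, determines perceived performance.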
## The Recall Question
Raw speed means nothing if the results are wrong. We measured recall@10 against brute-force exact search:
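The recall@10 measurement reduces to set overlap against the brute-force ground truth. A minimal sketch (the IDs below are illustrative):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors the ANN search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # brute-force top-10
approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]  # ANN top-10 with two misses
print(recall_at_k(approx, exact))  # 0.8
```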
All three achieve >0.99 recall when tuned, with modest latency increases. The tuning parameters:
Qdrant collection config:

```json
{
  "hnsw_config": {
    "m": 32,
    "ef_construct": 256
  },
  "quantization_config": {
    "scalar": {
      "type": "int8",
      "quantile": 0.99,
      "always_ram": true
    }
  }
}
```

Milvus index params:

```yaml
index_params:
  index_type: HNSW
  metric_type: COSINE
  params:
    M: 32
    efConstruction: 256
search_params:
  ef: 128
```

Weaviate schema config:

```json
{
  "vectorIndexConfig": {
    "ef": 128,
    "efConstruction": 256,
    "maxConnections": 32,
    "pq": {
      "enabled": true,
      "segments": 96
    }
  }
}
```

## Memory Efficiency: The Hidden Differentiator
Here's where the benchmark gets interesting for anyone running on a budget. Steady-state memory consumption at 50M vectors:
Qdrant's scalar quantization reduces memory footprint dramatically. At 50M vectors x 768 dimensions x 4 bytes = ~143GB raw, Qdrant compresses this to 38GB in RAM with int8 quantization while maintaining 0.991 recall. That's a 73% reduction.
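The arithmetic behind those figures, for anyone sizing their own deployment (the quantized vectors alone come to ~36GB; the observed 38GB includes the HNSW graph and index overhead):

```python
n_vectors, dim = 50_000_000, 768

raw_gb = n_vectors * dim * 4 / 2**30   # float32: 4 bytes per dimension
int8_gb = n_vectors * dim * 1 / 2**30  # int8 scalar quantization: 1 byte

print(round(raw_gb))   # 143 -- the ~143GB raw footprint
print(round(int8_gb))  # 36  -- quantized vectors before index overhead
```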
This has real cost implications. If you're running on AWS:
- Qdrant: r6g.4xlarge (128GB) is sufficient -- $0.80/hr
- Milvus: r6g.8xlarge (256GB) for headroom -- $1.61/hr
- Weaviate: r6g.4xlarge works, but it's tight -- $0.80/hr

At 50M vectors, the annual infrastructure cost difference between Qdrant and Milvus is roughly $7,100/year per node. At three replicas, that's $21K saved annually.
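The cost delta works out directly from the hourly rates above (on-demand pricing; reserved or spot pricing changes the absolute numbers but not the ratio):

```python
rate_4xl, rate_8xl = 0.80, 1.61  # $/hr, r6g.4xlarge vs r6g.8xlarge
hours_per_year = 24 * 365

per_node = (rate_8xl - rate_4xl) * hours_per_year
print(round(per_node))      # 7096  -- the ~$7,100/year per-node figure
print(round(per_node * 3))  # 21287 -- the ~$21K across three replicas
```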
## Operational Complexity
Numbers don't capture everything. Here's the operational reality:
Qdrant is the simplest to operate. Single binary, embedded storage, no external dependencies. Configuration is straightforward. The Raft-based clustering just works. Upgrades are non-disruptive with rolling restarts.
Milvus is the most complex. It requires etcd, MinIO (or S3), and Pulsar (or Kafka) as dependencies. A production Milvus cluster has 6-8 different component types. We spent more time debugging Milvus's dependency chain than tuning query performance.
```yaml
# Milvus docker-compose dependencies (simplified)
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.16
  minio:
    image: minio/minio:RELEASE.2024-11-07
  pulsar:
    image: apachepulsar/pulsar:3.0.7
  milvus-rootcoord:
    image: milvusdb/milvus:v2.5.2
  milvus-proxy:
    image: milvusdb/milvus:v2.5.2
  milvus-querynode:
    image: milvusdb/milvus:v2.5.2
  milvus-datanode:
    image: milvusdb/milvus:v2.5.2
  milvus-indexnode:
    image: milvusdb/milvus:v2.5.2
```

Compare that to Qdrant's single-container deployment:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    volumes:
      - ./qdrant_data:/qdrant/storage
    ports:
      - "6333:6333"
      - "6334:6334"
```

Weaviate falls in between: a single binary like Qdrant, but its module system adds complexity. If you need specific vectorizers or rerankers, you're managing additional containers.
## Filtered Search: The Real-World Test
Pure vector search is a toy benchmark. Real applications combine vector similarity with metadata filters. We tested: "Find the 10 most similar vectors WHERE category = 'technology' AND date > '2025-01-01'" (filtering ~15% of the dataset).
Qdrant's payload indexing is purpose-built for filtered search and it shows. Milvus's post-filtering approach means it retrieves more candidates than needed and discards non-matching ones, which wastes compute at scale.
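The difference between the two strategies is easy to see in miniature. A self-contained sketch (toy data, not a client for either database; the pre-filter path mirrors payload-indexed search, the post-filter path mirrors over-fetch-then-discard):

```python
import math
import random

random.seed(0)
DIM = 8

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus where ~15% of points match the metadata filter.
corpus = [
    {"id": i,
     "vec": [random.gauss(0, 1) for _ in range(DIM)],
     "category": "technology" if random.random() < 0.15 else "other"}
    for i in range(1000)
]
query = [random.gauss(0, 1) for _ in range(DIM)]

def pre_filter_top10(corpus, query):
    """Filter first, then rank only the matching candidates."""
    matches = [d for d in corpus if d["category"] == "technology"]
    return sorted(matches, key=lambda d: cosine(query, d["vec"]), reverse=True)[:10]

def post_filter_top10(corpus, query, overfetch=100):
    """Rank everything, over-fetch, then discard non-matches."""
    ranked = sorted(corpus, key=lambda d: cosine(query, d["vec"]), reverse=True)[:overfetch]
    return [d for d in ranked if d["category"] == "technology"][:10]
```

At ~15% selectivity, an overfetch of 100 yields about 15 matches on average, but any individual query can come up short of 10, and every discarded candidate was distance-computed for nothing. That over-retrieval is exactly the wasted compute at scale.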
## Our Recommendation
Choose Qdrant if: You want the best performance-per-dollar, simplest operations, and your team isn't huge. It's the best choice for most production RAG pipelines.
Choose Milvus if: You're at true enterprise scale (billions of vectors), need GPU-accelerated indexing, or your organization already runs the Kafka/etcd/MinIO stack. Milvus's distributed architecture handles horizontal scaling better than the alternatives.
Choose Weaviate if: You want built-in ML model serving (vectorizers, rerankers) in the same system, or you're building a hybrid search pipeline that combines dense vectors with BM25 text search. Weaviate's module ecosystem is the richest.
For our client with 50M vectors? We deployed Qdrant. The 40% lower infrastructure cost, simpler operations, and superior filtered search performance made it the clear winner for their use case.
## The Benchmark Nobody Runs
One thing every vector DB vendor avoids: what happens when you delete and re-insert 20% of your vectors (simulating a real content pipeline with updates)? We tested that too. Qdrant handled it gracefully with background compaction. Milvus required manual segment compaction to reclaim space. Weaviate's LSM compaction kicked in automatically but caused a 15% latency spike during the process.
If your data is append-only, ignore this. If your vectors represent living documents that change, it matters a lot.
---
*We help teams choose, deploy, and optimize data infrastructure for AI/ML workloads -- including vector databases, embedding pipelines, and RAG architectures. If you're evaluating vector databases for production, [we can save you weeks of benchmarking](https://techsaas.cloud/services).*