Yash Pritwani · 6 min read

# Qdrant vs Milvus vs Weaviate: Benchmarked at 50 Million Vectors

Every vector database benchmark you've read is lying to you. Not intentionally -- but they're all running at 1M vectors on a beefy machine, which tells you absolutely nothing about production behavior. At 1M vectors, everything is fast. The interesting question is what happens at 50M vectors when your index no longer fits in RAM and your queries start hitting disk.

We ran this benchmark because a client needed to choose a vector DB for a production RAG pipeline ingesting 50M+ document embeddings. The vendor benchmarks were useless at this scale, so we ran our own. Here's what we found.

## Test Setup

Identical hardware for all three databases:

- Machine: 3x bare-metal nodes, each with 32 vCPUs (AMD EPYC 7543), 128GB RAM, 2TB NVMe SSD
- Dataset: 50M vectors, 768 dimensions (generated from BGE-base-en-v1.5 embeddings of Wikipedia articles)
- Queries: 10,000 query vectors held out from the same distribution
- Metrics: p50/p95/p99 latency, queries per second (QPS), recall@10, memory usage, disk usage (measurement sketches below)
- Versions: Qdrant 1.12.1, Milvus 2.5.2, Weaviate 1.28.0

All databases were configured for their recommended production settings with replication factor 1 (single shard per node for fair comparison). Each was given 48 hours to fully index before benchmarking.
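For the latency figures in the next sections, each database sat behind a thin query wrapper. The sketch below shows the shape of the single-client measurement loop; `search_fn` is a hypothetical wrapper around whichever client is being tested, not part of any library.

```python
# Minimal sketch of the single-client latency measurement, assuming a
# hypothetical `search_fn(query, limit)` wrapper around each database's client.
import time
import numpy as np

def measure_latency_ms(search_fn, query_vectors):
    """Run every held-out query once; return (p50, p95, p99) in milliseconds."""
    latencies = []
    for q in query_vectors:
        start = time.perf_counter()
        search_fn(q, limit=10)  # top-10 nearest neighbors, cosine similarity
        latencies.append((time.perf_counter() - start) * 1000.0)
    return tuple(np.percentile(latencies, [50, 95, 99]))
```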

## Indexing Performance

This is where the first surprise hit:

| Database | Index Time | Peak Memory During Index | Final Disk Usage |
|----------|------------|--------------------------|------------------|
| Qdrant   | 4h 12m     | 89 GB                    | 142 GB           |
| Milvus   | 6h 47m     | 112 GB                   | 168 GB           |
| Weaviate | 5h 31m     | 95 GB                    | 155 GB           |

Qdrant was fastest to index by a significant margin. Its HNSW implementation uses a memory-mapped approach that keeps peak memory consumption lower than Milvus's bulk-loading strategy. Milvus peaked at 112GB during segment compaction -- on a 128GB machine, that's uncomfortably close to OOM territory.

Weaviate landed in the middle. Its LSM-based storage engine provides predictable memory usage during ingestion, but the tradeoff is slower overall indexing.

## Query Latency (Top-10 Nearest Neighbors)

The numbers that matter. All queries use cosine similarity, top-10 results, single-threaded client:

| Database | p50   | p95   | p99    | QPS (1 client) | QPS (32 clients) |
|----------|-------|-------|--------|----------------|------------------|
| Qdrant   | 2.1ms | 4.8ms | 8.2ms  | 410            | 5,840            |
| Milvus   | 3.4ms | 9.1ms | 18.7ms | 260            | 3,920            |
| Weaviate | 2.8ms | 6.3ms | 12.1ms | 340            | 4,610            |

Qdrant wins on raw latency at every percentile. The gap widens dramatically at p99 -- Qdrant's 8.2ms vs Milvus's 18.7ms. That p99 difference matters in production: when you're serving search results to users, it's the tail latency that determines perceived performance.

At 32 concurrent clients, all three scale well: Qdrant multiplies its single-client throughput 14.2x, Milvus 15.1x (though from a much lower base), and Weaviate 13.6x. Qdrant keeps the absolute throughput lead at every concurrency level.
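Throughput at N clients can be approximated by fanning the same query stream out over a thread pool; a minimal sketch, again assuming the hypothetical `search_fn` wrapper:

```python
# Sketch of the concurrent-throughput measurement: N worker threads share one
# query stream, and QPS is total queries divided by wall-clock time.
from concurrent.futures import ThreadPoolExecutor
import time

def measure_qps(search_fn, query_vectors, num_clients=32):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        list(pool.map(lambda q: search_fn(q, limit=10), query_vectors))
    return len(query_vectors) / (time.perf_counter() - start)
```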

## The Recall Question

Raw speed means nothing if the results are wrong. We measured recall@10 against brute-force exact search:

| Database | Recall@10 (default config) | Recall@10 (tuned) | Latency at tuned recall |
|----------|----------------------------|-------------------|-------------------------|
| Qdrant   | 0.964                      | 0.991             | 3.8ms p50               |
| Milvus   | 0.958                      | 0.989             | 5.9ms p50               |
| Weaviate | 0.961                      | 0.990             | 4.9ms p50               |
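For clarity, recall@10 here is the overlap between each database's top-10 and the exact top-10 from a brute-force scan, averaged over all queries. A minimal sketch of that computation:

```python
# Recall@10: fraction of the exact top-10 neighbors that the ANN index returned,
# averaged over all queries. ID arrays are shaped (num_queries, 10).
import numpy as np

def exact_top10(corpus, query):
    """Brute-force cosine top-10 (rows of `corpus` and `query` are L2-normalized)."""
    return np.argsort(-(corpus @ query))[:10]

def recall_at_10(ann_ids, exact_ids):
    hits = [len(set(a) & set(e)) for a, e in zip(ann_ids, exact_ids)]
    return sum(hits) / (10 * len(ann_ids))
```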

All three reach roughly 0.99 recall when tuned (0.989 to 0.991), with modest latency increases. The tuning parameters:

Qdrant (collection config):

```json
{
  "hnsw_config": {
    "m": 32,
    "ef_construct": 256
  },
  "quantization_config": {
    "scalar": {
      "type": "int8",
      "quantile": 0.99,
      "always_ram": true
    }
  }
}
```

Milvus (index and search params):

```yaml
index_params:
  index_type: HNSW
  metric_type: COSINE
  params:
    M: 32
    efConstruction: 256
search_params:
  ef: 128
```

Weaviate (schema config):

```json
{
  "vectorIndexConfig": {
    "ef": 128,
    "efConstruction": 256,
    "maxConnections": 32,
    "pq": {
      "enabled": true,
      "segments": 96
    }
  }
}
```
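As a usage note, the Qdrant settings above can be applied at collection creation time; a sketch using the Python qdrant-client (collection name, URL, and dataset details are placeholders):

```python
# Sketch: creating a Qdrant collection with the tuned HNSW and int8
# quantization settings from the config above (names are placeholders).
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="wiki_embeddings",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(m=32, ef_construct=256),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8, quantile=0.99, always_ram=True
        )
    ),
)
```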

## Memory Efficiency: The Hidden Differentiator

Here's where the benchmark gets interesting for anyone running on a budget. Steady-state memory consumption at 50M vectors:

| Database | RAM Usage (serving) | Can serve from disk? | Disk-only latency p50 |
|----------|---------------------|----------------------|-----------------------|
| Qdrant   | 38 GB               | Yes (mmap)           | 11ms                  |
| Milvus   | 67 GB               | Partially (DiskANN)  | 22ms                  |
| Weaviate | 52 GB               | Yes (mmap)           | 15ms                  |

Qdrant's scalar quantization reduces the memory footprint dramatically. Raw float32 storage for 50M vectors x 768 dimensions x 4 bytes is ~143GB; Qdrant compresses this to 38GB in RAM with int8 quantization while maintaining 0.991 recall -- a 73% reduction.
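The arithmetic behind those figures, for anyone checking (the measured 38GB also carries HNSW graph links and bookkeeping on top of the raw int8 codes):

```python
# Back-of-envelope memory math for 50M x 768-dim vectors (binary GiB).
num_vectors, dims = 50_000_000, 768
raw_float32_gib = num_vectors * dims * 4 / 2**30   # ~143 GiB uncompressed
int8_codes_gib = num_vectors * dims * 1 / 2**30    # ~36 GiB of quantized codes
print(f"{raw_float32_gib:.0f} GiB raw -> {int8_codes_gib:.0f} GiB of int8 codes")
```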

This has real cost implications. If you're running on AWS:

- Qdrant: r6g.4xlarge (128GB) is sufficient -- $0.80/hr
- Milvus: needs r6g.8xlarge (256GB) for headroom -- $1.61/hr
- Weaviate: r6g.4xlarge works but tight -- $0.80/hr

At 50M vectors, the annual infrastructure cost difference between Qdrant and Milvus is roughly $7,100/year per node. At three replicas, that's $21K saved annually.

## Operational Complexity

Numbers don't capture everything. Here's the operational reality:

Qdrant is the simplest to operate. Single binary, embedded storage, no external dependencies. Configuration is straightforward. The Raft-based clustering just works. Upgrades are non-disruptive with rolling restarts.

Milvus is the most complex. It requires etcd, MinIO (or S3), and Pulsar (or Kafka) as dependencies. A production Milvus cluster has 6-8 different component types. We spent more time debugging Milvus's dependency chain than tuning query performance.

```yaml
# Milvus docker-compose dependencies (simplified)
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.16
  minio:
    image: minio/minio:RELEASE.2024-11-07
  pulsar:
    image: apachepulsar/pulsar:3.0.7
  milvus-rootcoord:
    image: milvusdb/milvus:v2.5.2
  milvus-proxy:
    image: milvusdb/milvus:v2.5.2
  milvus-querynode:
    image: milvusdb/milvus:v2.5.2
  milvus-datanode:
    image: milvusdb/milvus:v2.5.2
  milvus-indexnode:
    image: milvusdb/milvus:v2.5.2
```

Compare that to Qdrant's single container deployment:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    volumes:
      - ./qdrant_data:/qdrant/storage
    ports:
      - "6333:6333"
      - "6334:6334"
```

Weaviate falls in between. It's a single binary like Qdrant, but its module system adds complexity. If you need specific vectorizers or rerankers, you're managing additional containers.

## Filtered Search: The Real-World Test

Pure vector search is a toy benchmark. Real applications combine vector similarity with metadata filters. We tested: "Find the 10 most similar vectors WHERE category = 'technology' AND date > '2025-01-01'", where the filter matches roughly 15% of the dataset.

| Database | Filtered p50 | Filtered QPS | Filter strategy |
|----------|--------------|--------------|-----------------|
| Qdrant   | 3.4ms        | 280          | Pre-filter with payload index |
| Milvus   | 8.1ms        | 115          | Post-filter (pre-filter available but slower to index) |
| Weaviate | 5.2ms        | 185          | Pre-filter with inverted index |

Qdrant's payload indexing is purpose-built for filtered search and it shows. Milvus's post-filtering approach means it retrieves more candidates than needed and discards non-matching ones, which wastes compute at scale.
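For concreteness, here's roughly what that filtered query looks like against Qdrant with the Python qdrant-client, assuming payload indexes on `category` and `date` were created up front (collection name, URL, and the query vector are placeholders):

```python
# Sketch: Qdrant filtered search backed by payload indexes (Python qdrant-client).
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Payload indexes let Qdrant pre-filter instead of scanning candidates afterwards.
client.create_payload_index(
    collection_name="wiki_embeddings",
    field_name="category",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
client.create_payload_index(
    collection_name="wiki_embeddings",
    field_name="date",
    field_schema=models.PayloadSchemaType.DATETIME,
)

query_embedding = [0.0] * 768  # placeholder; a real BGE embedding in practice
hits = client.search(
    collection_name="wiki_embeddings",
    query_vector=query_embedding,
    limit=10,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="category", match=models.MatchValue(value="technology")),
            models.FieldCondition(key="date", range=models.DatetimeRange(gt="2025-01-01T00:00:00Z")),
        ]
    ),
)
```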

## Our Recommendation

Choose Qdrant if: You want the best performance-per-dollar, simplest operations, and your team isn't huge. It's the best choice for most production RAG pipelines.

Choose Milvus if: You're at true enterprise scale (billions of vectors), need GPU-accelerated indexing, or your organization already runs the Kafka/etcd/MinIO stack. Milvus's distributed architecture handles horizontal scaling better than the alternatives.

Choose Weaviate if: You want built-in ML model serving (vectorizers, rerankers) in the same system, or you're building a hybrid search pipeline that combines dense vectors with BM25 text search. Weaviate's module ecosystem is the richest.

For our client with 50M vectors? We deployed Qdrant. The 40% lower infrastructure cost, simpler operations, and superior filtered search performance made it the clear winner for their use case.

## The Benchmark Nobody Runs

One thing every vector DB vendor avoids: what happens when you delete and re-insert 20% of your vectors (simulating a real content pipeline with updates)? We tested that too. Qdrant handled it gracefully with background compaction. Milvus required manual segment compaction to reclaim space. Weaviate's LSM compaction kicked in automatically but caused a 15% latency spike during the process.

If your data is append-only, ignore this. If your vectors represent living documents that change, it matters a lot.
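The churn test itself is simple to reproduce: pick 20% of point IDs, delete them, and re-upsert fresh vectors under the same IDs while watching latency and disk usage. A sketch against Qdrant (the `new_embedding` helper, batch size, and collection name are placeholders):

```python
# Sketch of the update-churn test: delete and re-insert 20% of the points.
import random
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
churn_ids = random.sample(range(50_000_000), k=10_000_000)  # 20% of the corpus

for start in range(0, len(churn_ids), 1000):
    batch = churn_ids[start:start + 1000]
    client.delete(
        collection_name="wiki_embeddings",
        points_selector=models.PointIdsList(points=batch),
    )
    client.upsert(
        collection_name="wiki_embeddings",
        points=[
            models.PointStruct(id=i, vector=new_embedding(i), payload={})  # new_embedding: hypothetical
            for i in batch
        ],
    )
```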

---

*We help teams choose, deploy, and optimize data infrastructure for AI/ML workloads -- including vector databases, embedding pipelines, and RAG architectures. If you're evaluating vector databases for production, [we can save you weeks of benchmarking](https://techsaas.cloud/services).*
