# Qdrant vs Milvus vs Weaviate: Benchmarked at 50 Million Vectors
Every vector database benchmark you've read is lying to you. Not intentionally -- but they're all running at 1M vectors on a beefy machine, which tells you absolutely nothing about production behavior. At 1M vectors, everything is fast. The interesting question is what happens at 50M vectors when your index no longer fits in RAM and your queries start hitting disk.
We ran this benchmark because a client needed to choose a vector DB for a production RAG pipeline ingesting 50M+ document embeddings. The vendor benchmarks were useless at this scale, so we ran our own. Here's what we found.
## Test Setup
All three databases ran on identical hardware: a single node with 128GB of RAM.
All databases were configured for their recommended production settings with replication factor 1 (single shard per node for fair comparison). Each was given 48 hours to fully index before benchmarking.
## Indexing Performance
This is where the first surprise hit:
Qdrant was fastest to index by a significant margin. Its HNSW implementation uses a memory-mapped approach that keeps peak memory consumption lower than Milvus's bulk-loading strategy. Milvus peaked at 112GB during segment compaction -- on a 128GB machine, that's uncomfortably close to OOM territory.
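Qdrant's memory-mapped behavior is tunable at collection-creation time. A hedged sketch of the relevant knobs (parameter names from Qdrant's collection API: `hnsw_config.on_disk` stores the HNSW graph on disk, and `optimizers_config.memmap_threshold` is the segment size in KB above which segments are memory-mapped instead of held in RAM; verify both against your Qdrant version):

```json
{
  "vectors": {
    "size": 768,
    "distance": "Cosine"
  },
  "hnsw_config": {
    "on_disk": true
  },
  "optimizers_config": {
    "memmap_threshold": 20000
  }
}
```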
Weaviate landed in the middle. Its LSM-based storage engine provides predictable memory usage during ingestion, but the tradeoff is slower overall indexing.
## Query Latency (Top-10 Nearest Neighbors)
The numbers that matter. All queries use cosine similarity, top-10 results, single-threaded client:
Qdrant wins on raw latency at every percentile. The gap widens dramatically at p99 -- Qdrant's 8.2ms vs Milvus's 18.7ms. That p99 difference matters in production: when you're serving search results to users, it's the tail latency that determines perceived performance.
At 32 concurrent clients, Qdrant's throughput scales to 14.2x its single-client rate, close to linear. Milvus reaches a 15.1x multiplier, but from a slower single-client baseline, so its absolute throughput stays lower. Weaviate scales to 13.6x.
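Tail figures like the p99 above come straight from raw latency samples. A minimal nearest-rank percentile sketch (the sample values below are illustrative, not from the benchmark):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample >= p% of all samples."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[rank - 1]

latencies_ms = [1.2, 1.3, 1.4, 1.6, 2.1, 2.4, 3.0, 4.8, 7.5, 19.0]
print(percentile(latencies_ms, 50))  # 2.1
print(percentile(latencies_ms, 99))  # 19.0
```

Note how a single slow query dominates the p99 even when the median is low; that is why tail latency, not the median, determines perceived performance.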
## The Recall Question
Raw speed means nothing if the results are wrong. We measured recall@10 against brute-force exact search:
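The recall@10 measurement reduces to set overlap against the brute-force ground truth. A minimal sketch (the IDs below are illustrative):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors the ANN search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # brute-force top-10
approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]  # ANN top-10 with two misses
print(recall_at_k(approx, exact))  # 0.8
```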
All three achieve >0.99 recall when tuned, with modest latency increases. The tuning parameters:
Qdrant collection config:

```json
{
  "hnsw_config": {
    "m": 32,
    "ef_construct": 256
  },
  "quantization_config": {
    "scalar": {
      "type": "int8",
      "quantile": 0.99,
      "always_ram": true
    }
  }
}
```

Milvus index params:

```yaml
index_params:
  index_type: HNSW
  metric_type: COSINE
  params:
    M: 32
    efConstruction: 256
search_params:
  ef: 128
```

Weaviate schema config:

```json
{
  "vectorIndexConfig": {
    "ef": 128,
    "efConstruction": 256,
    "maxConnections": 32,
    "pq": {
      "enabled": true,
      "segments": 96
    }
  }
}
```

## Memory Efficiency: The Hidden Differentiator
Here's where the benchmark gets interesting for anyone running on a budget. Steady-state memory consumption at 50M vectors:
Qdrant's scalar quantization reduces memory footprint dramatically. At 50M vectors x 768 dimensions x 4 bytes = ~143GB raw, Qdrant compresses this to 38GB in RAM with int8 quantization while maintaining 0.991 recall. That's a 73% reduction.
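The arithmetic behind those figures, for anyone sizing their own deployment (the quantized vectors alone come to ~36GB; the observed 38GB includes the HNSW graph and index overhead):

```python
n_vectors, dim = 50_000_000, 768

raw_gb = n_vectors * dim * 4 / 2**30   # float32: 4 bytes per dimension
int8_gb = n_vectors * dim * 1 / 2**30  # int8 scalar quantization: 1 byte

print(round(raw_gb))   # 143 -- the ~143GB raw footprint
print(round(int8_gb))  # 36  -- quantized vectors before index overhead
```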
This has real cost implications. If you're running on AWS:
- Qdrant: r6g.4xlarge (128GB) is sufficient -- $0.80/hr
- Milvus: r6g.8xlarge (256GB) for headroom -- $1.61/hr
- Weaviate: r6g.4xlarge works, but it's tight -- $0.80/hr

At 50M vectors, the annual infrastructure cost difference between Qdrant and Milvus is roughly $7,100/year per node. At three replicas, that's $21K saved annually.
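The cost delta works out directly from the hourly rates above (on-demand pricing; reserved or spot pricing changes the absolute numbers but not the ratio):

```python
rate_4xl, rate_8xl = 0.80, 1.61  # $/hr, r6g.4xlarge vs r6g.8xlarge
hours_per_year = 24 * 365

per_node = (rate_8xl - rate_4xl) * hours_per_year
print(round(per_node))      # 7096  -- the ~$7,100/year per-node figure
print(round(per_node * 3))  # 21287 -- the ~$21K across three replicas
```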
## Operational Complexity
Numbers don't capture everything. Here's the operational reality:
Qdrant is the simplest to operate. Single binary, embedded storage, no external dependencies. Configuration is straightforward. The Raft-based clustering just works. Upgrades are non-disruptive with rolling restarts.
Milvus is the most complex. It requires etcd, MinIO (or S3), and Pulsar (or Kafka) as dependencies. A production Milvus cluster has 6-8 different component types. We spent more time debugging Milvus's dependency chain than tuning query performance.
```yaml
# Milvus docker-compose dependencies (simplified)
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.16
  minio:
    image: minio/minio:RELEASE.2024-11-07
  pulsar:
    image: apachepulsar/pulsar:3.0.7
  milvus-rootcoord:
    image: milvusdb/milvus:v2.5.2
  milvus-proxy:
    image: milvusdb/milvus:v2.5.2
  milvus-querynode:
    image: milvusdb/milvus:v2.5.2
  milvus-datanode:
    image: milvusdb/milvus:v2.5.2
  milvus-indexnode:
    image: milvusdb/milvus:v2.5.2
```

Compare that to Qdrant's single-container deployment:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    volumes:
      - ./qdrant_data:/qdrant/storage
    ports:
      - "6333:6333"
      - "6334:6334"
```

Weaviate falls in between: a single binary like Qdrant, but its module system adds complexity. If you need specific vectorizers or rerankers, you're managing additional containers.
## Filtered Search: The Real-World Test
Pure vector search is a toy benchmark. Real applications combine vector similarity with metadata filters. We tested: "Find the 10 most similar vectors WHERE category = 'technology' AND date > '2025-01-01'" (filtering ~15% of the dataset).
Qdrant's payload indexing is purpose-built for filtered search and it shows. Milvus's post-filtering approach means it retrieves more candidates than needed and discards non-matching ones, which wastes compute at scale.
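The difference between the two strategies is easy to see in miniature. A self-contained sketch (toy data, not a client for either database; the pre-filter path mirrors payload-indexed search, the post-filter path mirrors over-fetch-then-discard):

```python
import math
import random

random.seed(0)
DIM = 8

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus where ~15% of points match the metadata filter.
corpus = [
    {"id": i,
     "vec": [random.gauss(0, 1) for _ in range(DIM)],
     "category": "technology" if random.random() < 0.15 else "other"}
    for i in range(1000)
]
query = [random.gauss(0, 1) for _ in range(DIM)]

def pre_filter_top10(corpus, query):
    """Filter first, then rank only the matching candidates."""
    matches = [d for d in corpus if d["category"] == "technology"]
    return sorted(matches, key=lambda d: cosine(query, d["vec"]), reverse=True)[:10]

def post_filter_top10(corpus, query, overfetch=100):
    """Rank everything, over-fetch, then discard non-matches."""
    ranked = sorted(corpus, key=lambda d: cosine(query, d["vec"]), reverse=True)[:overfetch]
    return [d for d in ranked if d["category"] == "technology"][:10]
```

At ~15% selectivity, an overfetch of 100 yields about 15 matches on average, but any individual query can come up short of 10, and every discarded candidate was distance-computed for nothing. That over-retrieval is exactly the wasted compute at scale.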
## Our Recommendation
Choose Qdrant if: You want the best performance-per-dollar, simplest operations, and your team isn't huge. It's the best choice for most production RAG pipelines.
Choose Milvus if: You're at true enterprise scale (billions of vectors), need GPU-accelerated indexing, or your organization already runs the Kafka/etcd/MinIO stack. Milvus's distributed architecture handles horizontal scaling better than the alternatives.
Choose Weaviate if: You want built-in ML model serving (vectorizers, rerankers) in the same system, or you're building a hybrid search pipeline that combines dense vectors with BM25 text search. Weaviate's module ecosystem is the richest.
For our client with 50M vectors? We deployed Qdrant. The 40% lower infrastructure cost, simpler operations, and superior filtered search performance made it the clear winner for their use case.
## The Benchmark Nobody Runs
One thing every vector DB vendor avoids: what happens when you delete and re-insert 20% of your vectors (simulating a real content pipeline with updates)? We tested that too. Qdrant handled it gracefully with background compaction. Milvus required manual segment compaction to reclaim space. Weaviate's LSM compaction kicked in automatically but caused a 15% latency spike during the process.
If your data is append-only, ignore this. If your vectors represent living documents that change, it matters a lot.
---
*We help teams choose, deploy, and optimize data infrastructure for AI/ML workloads -- including vector databases, embedding pipelines, and RAG architectures. If you're evaluating vector databases for production, [we can save you weeks of benchmarking](https://techsaas.cloud/services).*