Infrastructure Monitoring: Prometheus vs InfluxDB vs VictoriaMetrics

Compare Prometheus, InfluxDB, and VictoriaMetrics for infrastructure monitoring. Storage efficiency, PromQL, cardinality handling, and self-hosted...

Yash Pritwani
14 min read


Monitoring Is Not Optional

If you cannot measure it, you cannot manage it. Infrastructure monitoring tells you:

Is the system healthy right now?
What is the trend over time?
When should you scale up (or down)?
What caused the outage at 3 AM?

<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 200" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="200" rx="12" fill="#1a1a2e"/><rect x="15" y="10" width="570" height="25" rx="6" fill="#6366f1" opacity="0.3"/><circle cx="30" cy="22" r="4" fill="#ef4444"/><circle cx="42" cy="22" r="4" fill="#f59e0b"/><circle cx="54" cy="22" r="4" fill="#2dd4bf"/><text x="300" y="27" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Monitoring Dashboard</text><rect x="20" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="85" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">CPU Usage</text><text x="85" y="88" text-anchor="middle" fill="#2dd4bf" font-size="18" font-family="system-ui" font-weight="bold">23%</text><rect x="160" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="225" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Memory</text><text x="225" y="88" text-anchor="middle" fill="#f59e0b" font-size="18" font-family="system-ui" font-weight="bold">6.2 GB</text><rect x="300" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="365" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Requests/s</text><text x="365" y="88" text-anchor="middle" fill="#6366f1" font-size="18" font-family="system-ui" font-weight="bold">1.2K</text><rect x="440" y="45" width="140" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="510" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Uptime</text><text x="510" y="88" text-anchor="middle" fill="#2dd4bf" font-size="18" font-family="system-ui" font-weight="bold">99.9%</text><rect x="20" y="110" width="560" height="80" rx="6" fill="#6366f1" opacity="0.1"/><text x="45" y="125" fill="#94a3b8" font-size="8" 
font-family="system-ui">Response Time (ms)</text><polyline points="40,170 80,155 120,160 160,140 200,145 240,135 280,150 320,130 360,125 400,140 440,120 480,115 520,125 560,110" fill="none" stroke="#6366f1" stroke-width="2"/><polyline points="40,170 80,155 120,160 160,140 200,145 240,135 280,150 320,130 360,125 400,140 440,120 480,115 520,125 560,110" fill="url(#chartGrad)" stroke="none" opacity="0.3"/><defs><linearGradient id="chartGrad" x1="0" y1="0" x2="0" y2="1"><stop offset="0%" stop-color="#6366f1"/><stop offset="100%" stop-color="transparent"/></linearGradient></defs><line x1="40" y1="130" x2="560" y2="130" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/><line x1="40" y1="150" x2="560" y2="150" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/><line x1="40" y1="170" x2="560" y2="170" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Real-time monitoring dashboard showing CPU, memory, request rate, and response time trends.</p></div>

Prometheus: The Cloud-Native Standard

Prometheus is a CNCF-graduated monitoring system. It uses a pull-based model, scraping metrics over HTTP from each target at a fixed interval, and queries them with PromQL, its label-based query language.
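
To make the pull model concrete: each scrape is an HTTP GET that returns plain text, one sample per line. The sketch below parses a small, made-up response body. It is a simplified illustration, not the parser Prometheus uses; the function name and the sample data are assumptions, and it ignores timestamps and label-value unescaping.

```python
import re

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse scraped text into (name, labels, value) tuples.

    Simplified: skips comments (# HELP / # TYPE), ignores optional
    timestamps, and does not unescape label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line; a real parser would report an error
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical response body from a node exporter target.
SCRAPE_BODY = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.25
node_load1 0.42
"""

for name, labels, value in parse_exposition(SCRAPE_BODY):
    print(name, labels, value)
```

Every unique combination of metric name and labels becomes one time series, which is why label choice drives cardinality.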

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.54.0
    container_name: prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
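    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
      # Required by docker_sd_configs below; the container user must be able to read the socket
      - /var/run/docker.sock:/var/run/docker.sock:ro
    mem_limit: 512m

volumes:
  prometheus-data:

# prometheus.yml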
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'traefik'
    static_configs:
      - targets: ['traefik:8080']
    metrics_path: /metrics

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  # Service discovery for Docker containers
  - job_name: 'docker-services'
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: "true"
        action: keep

PromQL examples:

# CPU usage per container, as a percentage of one core
sum by (name) (rate(container_cpu_usage_seconds_total[5m])) * 100

# Memory usage percentage
container_memory_usage_bytes / container_spec_memory_limit_bytes * 100

# HTTP request rate per service
rate(traefik_service_requests_total[5m])

# 95th percentile latency, aggregated across all series
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Disk usage prediction (when will disk be full?)
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[1h], 24*3600) < 0

# Alert: more than 5% of requests return a 5xx status
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
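
The disk prediction query relies on `predict_linear`, which fits a straight line through the samples in the range and extrapolates it. The sketch below shows the idea with ordinary least squares. It is a simplification: it anchors the prediction at the newest sample, whereas Prometheus anchors it at the query evaluation time, and the sample data is invented.

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares extrapolation over (timestamp, value) samples.

    Fits value = intercept + slope * t and returns the predicted value
    `horizon_seconds` after the newest sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_last = samples[-1][0]
    # Shift timestamps so the newest sample sits at t = 0.
    xs = [t - t_last for t, _ in samples]
    ys = [v for _, v in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_seconds


# Hypothetical data: free bytes sampled every 15 minutes, shrinking 1 GB per sample.
GB = 1_000_000_000
history = [(i * 900, (10 - i) * GB) for i in range(5)]  # 10 GB down to 6 GB

predicted = predict_linear(history, 24 * 3600)
print("disk full within 24h:", predicted < 0)
```

A negative prediction means the fitted line crosses zero free bytes before the horizon, which is exactly the condition the query's `< 0` comparison checks.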

Alerting rules:

# alerts.yml
groups:
  - name: infrastructure
    rules:
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ .Labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space below 15% on {{ .Labels.instance }}"

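      - alert: ContainerDown
        # absent() with a regex matcher fires only when all three containers are missing
        # and carries no name label. Comparing last-seen time keeps one alert per container;
        # this assumes cAdvisor still exports the stale series for a short while.
        expr: time() - container_last_seen{name=~"traefik|postgres|redis"} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Critical container {{ .Labels.name }} is down"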
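
The `for:` clause in each rule delays firing until the expression has been true continuously for that long. A minimal sketch of that pending-to-firing behaviour follows. The function name and timeline are hypothetical, and it models a single alert instance only.

```python
def alert_states(condition_by_eval, for_seconds, eval_interval):
    """Return the alert state after each rule evaluation.

    condition_by_eval: list of booleans, True when the expression returned
    a result at that evaluation. The alert is "pending" until the condition
    has held continuously for `for_seconds`, then "firing".
    """
    states = []
    active_since = None  # evaluation time at which the condition first held
    for i, condition in enumerate(condition_by_eval):
        now = i * eval_interval
        if not condition:
            active_since = None  # any gap resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# HighCPU from the rules above: `for: 5m`, evaluated every 15s.
# Hypothetical timeline: 16 evaluations above 80%, one dip, then 24 above.
timeline = [True] * 16 + [False] + [True] * 24
states = alert_states(timeline, for_seconds=300, eval_interval=15)
print(states[15], states[16], states[17], states[-1])
# prints: pending inactive pending firing
```

The single dip resets the timer, so a short spike never pages anyone. That is the reason to prefer a `for:` duration over a bare threshold.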

InfluxDB: The Time-Series Database

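InfluxDB is a purpose-built time-series database. The 2.x releases used here are queried with Flux, InfluxDB's own functional query language, and ingest data through an HTTP write API (push-based), typically fed by the Telegraf agent.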

services:
  influxdb:
    image: influxdb:2.7-alpine
    environment:
      DOCKER_INFLUXDB_INIT_MODE: setup
      DOCKER_INFLUXDB_INIT_USERNAME: admin
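      DOCKER_INFLUXDB_INIT_PASSWORD: supersecret  # placeholder; use a real secret
      DOCKER_INFLUXDB_INIT_ORG: techsaas
      DOCKER_INFLUXDB_INIT_BUCKET: metrics
      DOCKER_INFLUXDB_INIT_ADMIN_TOKEN: your-token  # must match the token in telegraf.conf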
    volumes:
      - influxdb-data:/var/lib/influxdb2
    mem_limit: 512m

  telegraf:
    image: telegraf:1.32-alpine
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    mem_limit: 128m

volumes:
  influxdb-data:

# telegraf.conf
[agent]
  interval = "10s"
  flush_interval = "10s"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "your-token"
  organization = "techsaas"
  bucket = "metrics"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]

[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  container_names = []
  timeout = "5s"

[[inputs.net]]
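
Telegraf sends these metrics to InfluxDB's write API as line protocol: one text line per point, in the form `measurement,tags fields timestamp`. The sketch below encodes a point by hand to show what travels over the wire. The function names are hypothetical, and the encoder covers only the common escaping rules, so treat it as an illustration rather than a replacement for a client library.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: bools, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Encode one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    # InfluxDB recommends sorting tags by key for write performance.
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical point shaped like the output of [[inputs.cpu]].
line = to_line_protocol(
    "cpu",
    {"host": "web-1", "cpu": "cpu-total"},
    {"usage_idle": 97.5, "usage_user": 1.5},
    1700000000000000000,
)
print(line)
# prints: cpu,cpu=cpu-total,host=web-1 usage_idle=97.5,usage_user=1.5 1700000000000000000
```

Tags are indexed and fields are not, so high-cardinality values such as request IDs belong in fields rather than tags.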

Flux query examples:

// CPU usage over last hour
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_idle")
  |> filter(fn: (r) => r.cpu == "cpu-total")
  |> map(fn: (r) => ({r with _value: 100.0 - r._value}))
  |> aggregateWindow(every: 1m, fn: mean)

// Container memory usage
from(bucket: "metrics")
  |> range(start: -6h)
  |> filter(fn: (r) => r._measurement == "docker_container_mem")
  |> filter(fn: (r) => r._field == "usage_percent")
  |> group(columns: ["container_name"])
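
For readers new to Flux, the first query's `map` and `aggregateWindow` steps amount to "convert idle to usage, then average per minute". A rough Python equivalent under simplified assumptions: the function name and samples are made up, and windows are keyed by start time, whereas Flux labels each window with its stop time by default.

```python
from collections import defaultdict


def usage_per_window(points, window_seconds=60):
    """Convert usage_idle samples to usage and average them per window.

    points: iterable of (unix_timestamp, usage_idle_percent).
    Returns {window_start: mean_usage_percent}, ordered by time.
    """
    buckets = defaultdict(list)
    for ts, idle in points:
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(100.0 - idle)  # the map() step
    return {
        start: sum(values) / len(values)  # the aggregateWindow(fn: mean) step
        for start, values in sorted(buckets.items())
    }


# Hypothetical samples at Telegraf's 10s interval, spanning two minutes.
samples = [
    (0, 90.0), (10, 80.0), (20, 70.0),  # first minute: usage 10, 20, 30
    (60, 50.0), (70, 50.0),             # second minute: usage 50, 50
]
print(usage_per_window(samples))
# prints: {0: 20.0, 60: 50.0}
```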


VictoriaMetrics: The Efficient Alternative

VictoriaMetrics is a Prometheus-compatible time-series database designed to use less storage and memory for the same workload. It accepts PromQL queries and Prometheus remote_write, so it can replace Prometheus as the storage and query backend with little or no change to dashboards.

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.106.0
    container_name: victoriametrics
    command:
      - '--storageDataPath=/storage'
      - '--retentionPeriod=90d'
      - '--httpListenAddr=:8428'
    volumes:
      - vm-data:/storage
    mem_limit: 256m

  # Prometheus scrapes targets, remote_writes to VictoriaMetrics
  prometheus:
    image: prom/prometheus:v2.54.0
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    mem_limit: 256m

volumes:
  vm-data:

# prometheus.yml with remote_write to VictoriaMetrics
remote_write:
  - url: http://victoriametrics:8428/api/v1/write

VictoriaMetrics also provides vmagent, a lightweight scraper that can replace Prometheus for collection. It reads the same scrape configuration and forwards samples to VictoriaMetrics:

services:
  vmagent:
    image: victoriametrics/vmagent:v1.106.0
    command:
      - '--promscrape.config=/etc/prometheus/prometheus.yml'
      - '--remoteWrite.url=http://victoriametrics:8428/api/v1/write'
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    mem_limit: 64m
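
One practical consequence of the Prometheus compatibility is that the same HTTP query API works against either backend. The sketch below builds an instant query URL for `/api/v1/query` and shows that only the base address changes. The helper names are hypothetical. Port 8428 comes from the compose file above, while 9090 is Prometheus's default port and is assumed here because the compose file does not set one.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql, time=None):
    """Build a Prometheus-style instant query URL (/api/v1/query)."""
    params = {"query": promql}
    if time is not None:
        params["time"] = str(time)
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def run_query(base_url, promql, timeout=10):
    """Run the query and return the 'result' list from the JSON response."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Same query, two backends; nothing is sent over the network here.
for base in ("http://prometheus:9090", "http://victoriametrics:8428"):
    print(instant_query_url(base, 'up{job="node"}'))
```

Calling `run_query("http://victoriametrics:8428", "up")` from a container on the same Docker network would return the current `up` series. Grafana does the same thing when a Prometheus data source points at VictoriaMetrics.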

Comparison

| Feature | Prometheus | InfluxDB | VictoriaMetrics |
|---------|------------|----------|-----------------|
| Collection model | Pull (scrape) | Push (write API) | Pull + Push |
| Query language | PromQL | Flux | MetricsQL (PromQL superset) |
| Storage efficiency | Good | Good | Excellent (~10x less) |
| RAM usage (1M series) | 1-2GB | 1-2GB | 300-500MB |
| Disk usage (1M series/30d) | ~10GB | ~8GB | ~1-2GB |
| High availability | Thanos / Cortex | Enterprise clustering | Built-in cluster |
| Long-term storage | Needs Thanos/Mimir | Built-in | Built-in |
| Ecosystem | Massive (exporters) | Telegraf + integrations | Prometheus-compatible |
| Setup complexity | Low | Low | Low |
| Grafana integration | Native | Native | Native (PromQL) |
| License | Apache 2.0 | MIT (OSS) / Proprietary | Apache 2.0 |
| Best for | Kubernetes/cloud-native | IoT, custom metrics | Resource-efficient monitoring |

RAM and disk figures are rough estimates; actual usage depends on scrape interval, label churn, and retention.
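
Disk figures like the ones in the comparison depend heavily on scrape interval, so it is worth running your own numbers. The Prometheus storage documentation gives a back-of-envelope formula: retention in seconds, times ingested samples per second, times bytes per sample, with 1-2 bytes per sample cited as typical. The sketch below applies it; the function name and workload numbers are assumptions for illustration.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample):
    """needed_disk = retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumptions: 1M active series, 30 days retention, and 1.5 bytes per sample
# (midpoint of the 1-2 byte range; measure your own ratio before relying on it).
for interval in (15, 60):
    size = estimate_disk_bytes(1_000_000, interval, 30, 1.5)
    print(f"{interval}s interval: {size / 1e9:.1f} GB")
```

Under these assumptions the estimate lands well above the rough figures in the comparison, so treat the table as a relative ranking and size real deployments from measured ingestion rates.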


Our Monitoring Stack Recommendation

For small-to-medium self-hosted infrastructure:

Metrics: VictoriaMetrics (or Prometheus for ecosystem)
Logs: Grafana Loki + Promtail
Uptime: Uptime Kuma
Dashboards: Grafana
Alerting: Grafana Alerting → Ntfy/Slack

Total RAM: ~1GB for full observability of 50+ services.

At TechSaaS, we use Grafana + Loki + Promtail for log monitoring and Uptime Kuma for availability checking. For clients who need metrics monitoring, we deploy VictoriaMetrics with vmagent. In our deployments it handles the same workload as Prometheus with roughly one-fifth the memory and one-tenth the disk space. Because it is PromQL-compatible, existing Grafana dashboards built on the Prometheus data source generally work without changes.

