Infrastructure Monitoring: Prometheus vs InfluxDB vs VictoriaMetrics
Compare Prometheus, InfluxDB, and VictoriaMetrics for infrastructure monitoring. Storage efficiency, PromQL, cardinality handling, and self-hosted...
Monitoring Is Not Optional
If you cannot measure it, you cannot manage it. Infrastructure monitoring tells you:
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 200" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="200" rx="12" fill="#1a1a2e"/><rect x="15" y="10" width="570" height="25" rx="6" fill="#6366f1" opacity="0.3"/><circle cx="30" cy="22" r="4" fill="#ef4444"/><circle cx="42" cy="22" r="4" fill="#f59e0b"/><circle cx="54" cy="22" r="4" fill="#2dd4bf"/><text x="300" y="27" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Monitoring Dashboard</text><rect x="20" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="85" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">CPU Usage</text><text x="85" y="88" text-anchor="middle" fill="#2dd4bf" font-size="18" font-family="system-ui" font-weight="bold">23%</text><rect x="160" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="225" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Memory</text><text x="225" y="88" text-anchor="middle" fill="#f59e0b" font-size="18" font-family="system-ui" font-weight="bold">6.2 GB</text><rect x="300" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="365" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Requests/s</text><text x="365" y="88" text-anchor="middle" fill="#6366f1" font-size="18" font-family="system-ui" font-weight="bold">1.2K</text><rect x="440" y="45" width="140" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="510" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Uptime</text><text x="510" y="88" text-anchor="middle" fill="#2dd4bf" font-size="18" font-family="system-ui" font-weight="bold">99.9%</text><rect x="20" y="110" width="560" height="80" rx="6" fill="#6366f1" opacity="0.1"/><text x="45" y="125" fill="#94a3b8" font-size="8" font-family="system-ui">Response Time (ms)</text><polyline points="40,170 80,155 120,160 160,140 200,145 240,135 280,150 320,130 360,125 400,140 440,120 480,115 520,125 560,110" fill="none" stroke="#6366f1" stroke-width="2"/><polyline points="40,170 80,155 120,160 160,140 200,145 240,135 280,150 320,130 360,125 400,140 440,120 480,115 520,125 560,110" fill="url(#chartGrad)" stroke="none" opacity="0.3"/><defs><linearGradient id="chartGrad" x1="0" y1="0" x2="0" y2="1"><stop offset="0%" stop-color="#6366f1"/><stop offset="100%" stop-color="transparent"/></linearGradient></defs><line x1="40" y1="130" x2="560" y2="130" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/><line x1="40" y1="150" x2="560" y2="150" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/><line x1="40" y1="170" x2="560" y2="170" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Real-time monitoring dashboard showing CPU, memory, request rate, and response time trends.</p></div>
Prometheus: The Cloud-Native Standard
Prometheus is the CNCF-graduated monitoring system. It uses a pull-based model (scrapes targets) and the powerful PromQL query language.
# docker-compose.yml
services:
prometheus:
image: prom/prometheus:v2.54.0
container_name: prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--storage.tsdb.retention.time=30d'
- '--web.enable-lifecycle'
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
mem_limit: 512m# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'node'
static_configs:
- targets: ['node-exporter:9100']
- job_name: 'docker'
static_configs:
- targets: ['cadvisor:8080']
- job_name: 'traefik'
static_configs:
- targets: ['traefik:8080']
metrics_path: /metrics
- job_name: 'postgres'
static_configs:
- targets: ['postgres-exporter:9187']
# Service discovery for Docker containers
- job_name: 'docker-services'
docker_sd_configs:
- host: unix:///var/run/docker.sock
relabel_configs:
- source_labels: [__meta_docker_container_label_prometheus_scrape]
regex: "true"
action: keepPromQL examples:
# CPU usage percentage per container
100 - (rate(container_cpu_usage_seconds_total[5m]) * 100)
# Memory usage percentage
container_memory_usage_bytes / container_spec_memory_limit_bytes * 100
# HTTP request rate per service
rate(traefik_service_requests_total[5m])
# 95th percentile latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Disk usage prediction (when will disk be full?)
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[1h], 24*3600) < 0
# Alert: High error rate
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05Alerting rules:
# alerts.yml
groups:
- name: infrastructure
rules:
- alert: HighCPU
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage on {{ .Labels.instance }}"
- alert: DiskSpaceLow
expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
for: 10m
labels:
severity: critical
annotations:
summary: "Disk space below 15% on {{ .Labels.instance }}"
- alert: ContainerDown
expr: absent(container_last_seen{name=~"traefik|postgres|redis"}) == 1
for: 1m
labels:
severity: critical
annotations:
summary: "Critical container {{ .Labels.name }} is down"InfluxDB: The Time-Series Database
InfluxDB is a purpose-built time-series database with its own query language (Flux) and an HTTP write API (push-based).
services:
influxdb:
image: influxdb:2.7-alpine
environment:
DOCKER_INFLUXDB_INIT_MODE: setup
DOCKER_INFLUXDB_INIT_USERNAME: admin
DOCKER_INFLUXDB_INIT_PASSWORD: supersecret
DOCKER_INFLUXDB_INIT_ORG: techsaas
DOCKER_INFLUXDB_INIT_BUCKET: metrics
volumes:
- influxdb-data:/var/lib/influxdb2
mem_limit: 512m
telegraf:
image: telegraf:1.32-alpine
volumes:
- ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
mem_limit: 128m# telegraf.conf
[agent]
interval = "10s"
flush_interval = "10s"
[[outputs.influxdb_v2]]
urls = ["http://influxdb:8086"]
token = "your-token"
organization = "techsaas"
bucket = "metrics"
[[inputs.cpu]]
percpu = true
totalcpu = true
[[inputs.mem]]
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
container_names = []
timeout = "5s"
[[inputs.net]]Flux query examples:
// CPU usage over last hour
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu")
|> filter(fn: (r) => r._field == "usage_idle")
|> filter(fn: (r) => r.cpu == "cpu-total")
|> map(fn: (r) => ({r with _value: 100.0 - r._value}))
|> aggregateWindow(every: 1m, fn: mean)
// Container memory usage
from(bucket: "metrics")
|> range(start: -6h)
|> filter(fn: (r) => r._measurement == "docker_container_mem")
|> filter(fn: (r) => r._field == "usage_percent")
|> group(columns: ["container_name"])<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 200" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="200" rx="12" fill="#1a1a2e"/><path d="M100,30 L500,30 L460,65 L140,65 Z" fill="#3b82f6" opacity="0.8"/><text x="300" y="53" text-anchor="middle" fill="#ffffff" font-size="11" font-family="system-ui">Unoptimized Code — 2000ms</text><path d="M140,70 L460,70 L420,105 L180,105 Z" fill="#6366f1" opacity="0.8"/><text x="300" y="93" text-anchor="middle" fill="#ffffff" font-size="11" font-family="system-ui">+ Caching — 800ms</text><path d="M180,110 L420,110 L380,145 L220,145 Z" fill="#a855f7" opacity="0.8"/><text x="300" y="133" text-anchor="middle" fill="#ffffff" font-size="11" font-family="system-ui">+ CDN — 200ms</text><path d="M220,150 L380,150 L350,175 L250,175 Z" fill="#2dd4bf" opacity="0.9"/><text x="300" y="168" text-anchor="middle" fill="#1a1a2e" font-size="11" font-family="system-ui" font-weight="bold">Optimized — 50ms</text><text x="530" y="53" text-anchor="start" fill="#94a3b8" font-size="10" font-family="system-ui">Baseline</text><text x="445" y="93" text-anchor="start" fill="#2dd4bf" font-size="10" font-family="system-ui">-60%</text><text x="405" y="133" text-anchor="start" fill="#2dd4bf" font-size="10" font-family="system-ui">-90%</text><text x="365" y="168" text-anchor="start" fill="#2dd4bf" font-size="10" font-family="system-ui" font-weight="bold">-97.5%</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Performance optimization funnel: each layer of optimization compounds to dramatically reduce response times.</p></div>
VictoriaMetrics: The Efficient Alternative
VictoriaMetrics is a Prometheus-compatible time-series database that uses significantly less storage and memory. It accepts PromQL queries and Prometheus remote_write, making it a drop-in replacement.
services:
victoriametrics:
image: victoriametrics/victoria-metrics:v1.106.0
container_name: victoriametrics
command:
- '--storageDataPath=/storage'
- '--retentionPeriod=90d'
- '--httpListenAddr=:8428'
volumes:
- vm-data:/storage
mem_limit: 256m
# Prometheus scrapes targets, remote_writes to VictoriaMetrics
prometheus:
image: prom/prometheus:v2.54.0
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
mem_limit: 256m# prometheus.yml with remote_write to VictoriaMetrics
remote_write:
- url: http://victoriametrics:8428/api/v1/writeVictoriaMetrics also provides vmagent (lightweight scraper) as a Prometheus replacement:
services:
vmagent:
image: victoriametrics/vmagent:v1.106.0
command:
- '--promscrape.config=/etc/prometheus/prometheus.yml'
- '--remoteWrite.url=http://victoriametrics:8428/api/v1/write'
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
mem_limit: 64mComparison
|---------|------------|----------|-----------------|
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 180" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="180" rx="12" fill="#1a1a2e"/><rect x="30" y="55" width="90" height="50" rx="8" fill="#6366f1" opacity="0.9"/><text x="75" y="85" text-anchor="middle" fill="#ffffff" font-size="12" font-family="system-ui">Code</text><rect x="150" y="55" width="90" height="50" rx="8" fill="#3b82f6" opacity="0.9"/><text x="195" y="85" text-anchor="middle" fill="#ffffff" font-size="12" font-family="system-ui">Build</text><rect x="270" y="55" width="90" height="50" rx="8" fill="#a855f7" opacity="0.9"/><text x="315" y="85" text-anchor="middle" fill="#ffffff" font-size="12" font-family="system-ui">Test</text><rect x="390" y="55" width="90" height="50" rx="8" fill="#2dd4bf" opacity="0.9"/><text x="435" y="85" text-anchor="middle" fill="#1a1a2e" font-size="12" font-family="system-ui">Deploy</text><rect x="510" y="55" width="60" height="50" rx="8" fill="#f59e0b" opacity="0.9"/><text x="540" y="85" text-anchor="middle" fill="#1a1a2e" font-size="12" font-family="system-ui">Live</text><path d="M122,80 L148,80" stroke="#e2e8f0" stroke-width="2" marker-end="url(#arrow1)"/><path d="M242,80 L268,80" stroke="#e2e8f0" stroke-width="2" marker-end="url(#arrow1)"/><path d="M362,80 L388,80" stroke="#e2e8f0" stroke-width="2" marker-end="url(#arrow1)"/><path d="M482,80 L508,80" stroke="#e2e8f0" stroke-width="2" marker-end="url(#arrow1)"/><defs><marker id="arrow1" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6" fill="#e2e8f0"/></marker></defs><text x="300" y="145" text-anchor="middle" fill="#94a3b8" font-size="11" font-family="system-ui">Continuous Integration / Continuous Deployment Pipeline</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">A typical CI/CD pipeline: code flows through build, test, and deploy stages automatically.</p></div>
Our Monitoring Stack Recommendation
For small-to-medium self-hosted infrastructure:
Metrics: VictoriaMetrics (or Prometheus for ecosystem)
Logs: Grafana Loki + Promtail
Uptime: Uptime Kuma
Dashboards: Grafana
Alerting: Grafana Alerting → Ntfy/SlackTotal RAM: ~1GB for full observability of 50+ services.
At TechSaaS, we use Grafana + Loki + Promtail for log monitoring and Uptime Kuma for availability checking. For clients who need metrics monitoring, we deploy VictoriaMetrics with vmagent — it handles the same workload as Prometheus with one-fifth the memory and one-tenth the disk space. The PromQL compatibility means all existing Grafana dashboards work without changes.
Need help with devops?
TechSaaS provides expert consulting and managed services for cloud infrastructure, DevOps, and AI/ML operations.