Log Management: ELK vs Loki vs Datadog — Cost, Scale, and Simplicity
Compare ELK Stack, Grafana Loki, and Datadog for log management. Storage costs, query performance, self-hosted vs SaaS, and when each makes sense.
One owner, one affected system, and the next buyer or recovery deadline mapped.
The Log Management Problem
Modern applications generate enormous volumes of logs. A single server running 50 Docker containers can produce gigabytes of logs per day. You need a system that:
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 200" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="200" rx="12" fill="#1a1a2e"/><rect x="15" y="10" width="570" height="25" rx="6" fill="#6366f1" opacity="0.3"/><circle cx="30" cy="22" r="4" fill="#ef4444"/><circle cx="42" cy="22" r="4" fill="#f59e0b"/><circle cx="54" cy="22" r="4" fill="#2dd4bf"/><text x="300" y="27" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Monitoring Dashboard</text><rect x="20" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="85" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">CPU Usage</text><text x="85" y="88" text-anchor="middle" fill="#2dd4bf" font-size="18" font-family="system-ui" font-weight="bold">23%</text><rect x="160" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="225" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Memory</text><text x="225" y="88" text-anchor="middle" fill="#f59e0b" font-size="18" font-family="system-ui" font-weight="bold">6.2 GB</text><rect x="300" y="45" width="130" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="365" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Requests/s</text><text x="365" y="88" text-anchor="middle" fill="#6366f1" font-size="18" font-family="system-ui" font-weight="bold">1.2K</text><rect x="440" y="45" width="140" height="55" rx="6" fill="#6366f1" opacity="0.2"/><text x="510" y="65" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">Uptime</text><text x="510" y="88" text-anchor="middle" fill="#2dd4bf" font-size="18" font-family="system-ui" font-weight="bold">99.9%</text><rect x="20" y="110" width="560" height="80" rx="6" fill="#6366f1" opacity="0.1"/><text x="45" y="125" fill="#94a3b8" font-size="8" font-family="system-ui">Response Time (ms)</text><polyline points="40,170 80,155 120,160 160,140 200,145 240,135 280,150 320,130 360,125 400,140 440,120 480,115 520,125 560,110" fill="none" stroke="#6366f1" stroke-width="2"/><polyline points="40,170 80,155 120,160 160,140 200,145 240,135 280,150 320,130 360,125 400,140 440,120 480,115 520,125 560,110" fill="url(#chartGrad)" stroke="none" opacity="0.3"/><defs><linearGradient id="chartGrad" x1="0" y1="0" x2="0" y2="1"><stop offset="0%" stop-color="#6366f1"/><stop offset="100%" stop-color="transparent"/></linearGradient></defs><line x1="40" y1="130" x2="560" y2="130" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/><line x1="40" y1="150" x2="560" y2="150" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/><line x1="40" y1="170" x2="560" y2="170" stroke="#e2e8f0" stroke-width="0.3" opacity="0.2"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Real-time monitoring dashboard showing CPU, memory, request rate, and response time trends.</p></div>
ELK Stack: The Established Giant
ELK (Elasticsearch, Logstash, Kibana) has been the standard for log management since 2012. It is incredibly powerful but resource-hungry.
Architecture:
Applications → Filebeat → Logstash → Elasticsearch → Kibana
(or Filebeat → Elasticsearch directly)# docker-compose.yml for ELK
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
volumes:
- es-data:/usr/share/elasticsearch/data
mem_limit: 2g
kibana:
image: docker.elastic.co/kibana/kibana:8.15.0
environment:
ELASTICSEARCH_HOSTS: http://elasticsearch:9200
ports:
- "5601:5601"
mem_limit: 512m
filebeat:
image: docker.elastic.co/beats/filebeat:8.15.0
volumes:
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
mem_limit: 256mElasticsearch query example:
{
"query": {
"bool": {
"must": [
{ "match": { "container.name": "api-server" } },
{ "range": { "@timestamp": { "gte": "now-1h" } } }
],
"filter": [
{ "term": { "level": "error" } }
]
}
},
"sort": [{ "@timestamp": { "order": "desc" } }],
"size": 100
}Grafana Loki: The Lightweight Alternative
Loki is designed by Grafana Labs as a "Prometheus for logs." Unlike Elasticsearch, Loki does not index log content — it only indexes labels (metadata). This makes it dramatically cheaper to run.
Architecture:
Applications → Promtail → Loki → Grafana
(or Alloy, Vector, Fluentd)# docker-compose.yml for Loki stack
services:
loki:
image: grafana/loki:3.3.0
command: -config.file=/etc/loki/config.yaml
volumes:
- ./loki/config.yaml:/etc/loki/config.yaml
- loki-data:/loki
mem_limit: 256m
promtail:
image: grafana/promtail:3.3.0
command: -config.file=/etc/promtail/config.yaml
volumes:
- /var/log:/var/log:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- ./promtail/config.yaml:/etc/promtail/config.yaml
mem_limit: 128m
grafana:
image: grafana/grafana:11.4.0
environment:
GF_AUTH_ANONYMOUS_ENABLED: "true"
ports:
- "3000:3000"
mem_limit: 256mLoki configuration:
# loki/config.yaml
auth_enabled: false
server:
http_listen_port: 3100
common:
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
replication_factor: 1
path_prefix: /loki
schema_config:
configs:
- from: 2024-01-01
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
storage_config:
filesystem:
directory: /loki/chunks
limits_config:
retention_period: 30d
max_query_length: 30d
compactor:
working_directory: /loki/compactor
retention_enabled: trueLogQL query examples:
# All errors from api container
{container="api-server"} |= "error"
# Parse JSON logs and filter
{job="docker"} | json | level="error" | status >= 500
# Count errors per minute
count_over_time({container="api-server"} |= "error" [1m])
# Top 10 error messages
topk(10, sum by (message) (count_over_time({container="api-server"} | json | level="error" [1h])))<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 190" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="190" rx="12" fill="#0d1117"/><rect x="0" y="0" width="600" height="28" rx="12" fill="#1c2333"/><rect x="0" y="12" width="600" height="16" fill="#1c2333"/><circle cx="18" cy="14" r="5" fill="#ef4444"/><circle cx="34" cy="14" r="5" fill="#f59e0b"/><circle cx="50" cy="14" r="5" fill="#2dd4bf"/><text x="300" y="18" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="monospace">Terminal</text><text x="20" y="50" fill="#2dd4bf" font-size="11" font-family="monospace">$</text><text x="35" y="50" fill="#e2e8f0" font-size="11" font-family="monospace">docker compose up -d</text><text x="20" y="70" fill="#94a3b8" font-size="11" font-family="monospace">[+] Running 5/5</text><text x="20" y="88" fill="#2dd4bf" font-size="10" font-family="monospace"> ✓</text><text x="38" y="88" fill="#94a3b8" font-size="10" font-family="monospace">Network app_default Created</text><text x="20" y="106" fill="#2dd4bf" font-size="10" font-family="monospace"> ✓</text><text x="38" y="106" fill="#94a3b8" font-size="10" font-family="monospace">Container web Started</text><text x="20" y="124" fill="#2dd4bf" font-size="10" font-family="monospace"> ✓</text><text x="38" y="124" fill="#94a3b8" font-size="10" font-family="monospace">Container api Started</text><text x="20" y="142" fill="#2dd4bf" font-size="10" font-family="monospace"> ✓</text><text x="38" y="142" fill="#94a3b8" font-size="10" font-family="monospace">Container db Started</text><text x="20" y="165" fill="#2dd4bf" font-size="11" font-family="monospace">$</text><rect x="35" y="155" width="8" height="14" fill="#e2e8f0" opacity="0.7"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Docker Compose brings up your entire stack with a single command.</p></div>
Datadog: The SaaS Powerhouse
Datadog is a cloud-hosted observability platform that combines logs, metrics, traces, and more in a single pane. No infrastructure to manage.
# Docker agent for Datadog
services:
datadog-agent:
image: gcr.io/datadoghq/agent:7
environment:
DD_API_KEY: your-api-key
DD_LOGS_ENABLED: "true"
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true"
DD_CONTAINER_EXCLUDE: "name:datadog-agent"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /proc/:/host/proc/:ro
- /sys/fs/cgroup/:/host/sys/fs/cgroup:roComparison
|---------|-----------|--------------|---------|
Resource Usage: Real Numbers
Running the same workload (50 containers, ~5GB logs/day):
|-----------|-----------|------------|
Loki uses 5x less RAM and 5x less disk for the same log volume. The tradeoff: no full-text search. You grep through logs instead of searching an inverted index.
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 170" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="170" rx="12" fill="#1a1a2e"/><path d="M80,90 Q80,50 120,50 Q130,30 160,35 Q190,25 200,50 Q230,45 230,70 Q240,90 210,95 L100,95 Q70,95 80,90 Z" fill="none" stroke="#3b82f6" stroke-width="1.5"/><text x="155" y="75" text-anchor="middle" fill="#3b82f6" font-size="11" font-family="system-ui">Cloud</text><text x="155" y="120" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">$5,000/mo</text><defs><marker id="arrow9" markerWidth="10" markerHeight="7" refX="10" refY="3.5" orient="auto"><path d="M0,0 L10,3.5 L0,7" fill="#2dd4bf"/></marker></defs><line x1="245" y1="70" x2="340" y2="70" stroke="#2dd4bf" stroke-width="2.5" marker-end="url(#arrow9)"/><text x="293" y="60" text-anchor="middle" fill="#2dd4bf" font-size="10" font-family="system-ui" font-weight="bold">Migrate</text><rect x="355" y="35" width="180" height="70" rx="8" fill="none" stroke="#6366f1" stroke-width="2"/><rect x="365" y="45" width="160" height="15" rx="3" fill="#6366f1" opacity="0.7"/><rect x="365" y="65" width="160" height="15" rx="3" fill="#a855f7" opacity="0.7"/><rect x="365" y="85" width="100" height="10" rx="2" fill="#2dd4bf" opacity="0.5"/><text x="445" y="57" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Bare Metal</text><text x="445" y="77" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Docker + LXC</text><text x="445" y="120" text-anchor="middle" fill="#94a3b8" font-size="9" font-family="system-ui">$200/mo</text><text x="300" y="150" text-anchor="middle" fill="#2dd4bf" font-size="11" font-family="system-ui" font-weight="bold">96% cost reduction</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Cloud to self-hosted migration can dramatically reduce infrastructure costs while maintaining full control.</p></div>
When to Choose Each
Choose ELK when: You need full-text search across log content, you have a dedicated ops team, you process 100GB+ logs/day, or you need complex log analytics.
Choose Loki when: You want minimal resource usage, you already use Grafana, label-based filtering is sufficient, or you are cost-conscious about storage.
Choose Datadog when: You do not want to manage infrastructure, you need integrated logs+metrics+traces, your team is small, or your budget allows SaaS pricing.
At TechSaaS, we run Loki + Promtail + Grafana for our entire log stack. It uses about 480MB total RAM for 50+ containers, and the integration with Grafana gives us dashboards, alerts, and log exploration in one place. The total footprint of our observability stack (Loki + Promtail + Grafana) is 127MB of Docker images. For most self-hosted infrastructure, Loki is the clear winner.
Need the next owner and evidence step mapped?
Send the current system and deadline. Yash replies with the service path, first proof artifact, and handoff owner.