Log Management: ELK vs Loki vs Datadog — Cost, Scale, and Simplicity
Compare ELK Stack, Grafana Loki, and Datadog for log management. Storage costs, query performance, self-hosted vs SaaS, and when each makes sense.
The Log Management Problem
Modern applications generate enormous volumes of logs. A single server running 50 Docker containers can produce gigabytes of logs per day. You need a system that can collect, store, and search them at that scale.
[Figure: Real-time monitoring dashboard showing CPU, memory, request rate, and response time trends.]
ELK Stack: The Established Giant
ELK (Elasticsearch, Logstash, Kibana) has been the standard for log management since 2012. It is incredibly powerful but resource-hungry.
Architecture:
Applications → Filebeat → Logstash → Elasticsearch → Kibana
(or Filebeat → Elasticsearch directly)
# docker-compose.yml for ELK
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    environment:
      - discovery.type=single-node
      # Security is disabled for local testing only; enable it in production
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    mem_limit: 2g
  kibana:
    image: docker.elastic.co/kibana/kibana:8.15.0
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    ports:
      - "5601:5601"
    mem_limit: 512m
  filebeat:
    image: docker.elastic.co/beats/filebeat:8.15.0
    # Root is needed to read the Docker socket and container log files
    user: root
    # Inputs and the Elasticsearch output are defined in filebeat.yml (not shown)
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    mem_limit: 256m
# Named volume used by the elasticsearch service
volumes:
  es-data:
Elasticsearch query example:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "container.name": "api-server" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ],
      "filter": [
        { "term": { "level": "error" } }
      ]
    }
  },
  "sort": [{ "@timestamp": { "order": "desc" } }],
  "size": 100
}
Grafana Loki: The Lightweight Alternative
Loki was designed by Grafana Labs as a "Prometheus for logs." Unlike Elasticsearch, Loki does not index log content; it indexes only labels (metadata). This makes it dramatically cheaper to run.
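To make the indexing difference concrete, here is a toy Python sketch. It is not how either system is actually implemented; the class names and sample log lines are invented for illustration. The full-text store indexes every token of every line, while the label-only store indexes just the label set and scans the matching stream for a substring.

```python
from collections import defaultdict


class FullTextStore:
    """Toy inverted index: every token of every line is indexed."""

    def __init__(self):
        self.lines = []
        self.index = defaultdict(set)  # token -> ids of lines containing it

    def add(self, labels, line):
        # Labels are ignored here; this toy only models content indexing.
        line_id = len(self.lines)
        self.lines.append(line)
        for token in line.lower().split():
            self.index[token].add(line_id)

    def search(self, token):
        # Index lookup: no scanning of log content.
        ids = self.index.get(token.lower(), set())
        return [self.lines[i] for i in sorted(ids)]


class LabelOnlyStore:
    """Toy label index: only label sets are indexed; content is scanned."""

    def __init__(self):
        self.streams = defaultdict(list)  # frozen label set -> log lines

    def add(self, labels, line):
        self.streams[frozenset(labels.items())].append(line)

    def search(self, labels, needle):
        wanted = set(labels.items())
        hits = []
        for stream_labels, lines in self.streams.items():
            if wanted <= stream_labels:  # select streams by label
                hits.extend(l for l in lines if needle in l)  # then "grep"
        return hits


logs = [
    ({"container": "api-server"}, "error connecting to db"),
    ({"container": "api-server"}, "request served in 12ms"),
    ({"container": "web"}, "error rendering page"),
]
full, label = FullTextStore(), LabelOnlyStore()
for labels, line in logs:
    full.add(labels, line)
    label.add(labels, line)

full_hits = full.search("error")
label_hits = label.search({"container": "api-server"}, "error")
index_sizes = (len(full.index), len(label.streams))
```

Even with three log lines, the token index holds ten entries against two label streams. That gap is where the storage and memory savings come from, and the per-query scan is the price paid for them.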
Architecture:
Applications → Promtail → Loki → Grafana
(or Alloy, Vector, Fluentd)
# docker-compose.yml for Loki stack
services:
  loki:
    image: grafana/loki:3.3.0
    command: -config.file=/etc/loki/config.yaml
    volumes:
      - ./loki/config.yaml:/etc/loki/config.yaml
      - loki-data:/loki
    mem_limit: 256m
  promtail:
    image: grafana/promtail:3.3.0
    command: -config.file=/etc/promtail/config.yaml
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail/config.yaml:/etc/promtail/config.yaml
    mem_limit: 128m
  grafana:
    image: grafana/grafana:11.4.0
    environment:
      # Anonymous access is convenient locally; disable it on anything exposed
      GF_AUTH_ANONYMOUS_ENABLED: "true"
    ports:
      - "3000:3000"
    mem_limit: 256m
# Named volume used by the loki service
volumes:
  loki-data:
Loki configuration:
# loki/config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  filesystem:
    directory: /loki/chunks
limits_config:
  retention_period: 30d
  max_query_length: 30d
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  # Loki 3.x requires a delete request store when retention is enabled
  delete_request_store: filesystem
LogQL query examples:
# All errors from api container
{container="api-server"} |= "error"
# Parse JSON logs and filter
{job="docker"} | json | level="error" | status >= 500
# Count errors per minute
count_over_time({container="api-server"} |= "error" [1m])
# Top 10 error messages
topk(10, sum by (message) (count_over_time({container="api-server"} | json | level="error" [1h])))
[Figure: Docker Compose brings up your entire stack with a single command.]
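The LogQL queries above can also be run outside Grafana through Loki's HTTP API. The sketch below builds a request URL for the `/loki/api/v1/query_range` endpoint using only the standard library. The helper name `build_query_range_url`, the `localhost` address, and the timestamps are assumptions for illustration; the port matches `http_listen_port` in the config above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,
        "start": str(start_ns),   # Unix epoch in nanoseconds
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


url = build_query_range_url(
    "http://localhost:3100",  # assumed address of a local Loki instance
    '{container="api-server"} |= "error"',
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_003_600_000_000_000,  # one hour later
)

# Round-trip the URL to confirm the LogQL expression survives encoding.
parsed = urlsplit(url)
params = parse_qs(parsed.query)
print(parsed.path)
print(params["query"][0])
```

Fetching the URL with any HTTP client returns JSON. Building the URL separately from sending it keeps the encoding step easy to inspect, which helps because LogQL expressions are full of braces, quotes, and pipes.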
Datadog: The SaaS Powerhouse
Datadog is a cloud-hosted observability platform that combines logs, metrics, traces, and more in a single pane. No infrastructure to manage.
# Docker agent for Datadog
services:
  datadog-agent:
    image: gcr.io/datadoghq/agent:7
    environment:
      DD_API_KEY: your-api-key
      DD_LOGS_ENABLED: "true"
      DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true"
      DD_CONTAINER_EXCLUDE: "name:datadog-agent"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
Comparison
Resource Usage: Real Numbers
Running the same workload (50 containers, ~5GB logs/day), Loki uses roughly one-fifth of the RAM and one-fifth of the disk that ELK needs for the same log volume. The tradeoff: no full-text search. You grep through logs instead of searching an inverted index.
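For capacity planning, the raw volume implied by this workload is simple to work out. The sketch below multiplies the ~5GB/day figure by the 30-day `retention_period` from the Loki config. The compression ratio is a placeholder assumption, not a measured value, so substitute a number from your own cluster.

```python
def raw_retained_gb(daily_gb, retention_days):
    """Uncompressed log volume held at steady state."""
    return daily_gb * retention_days


def estimated_disk_gb(daily_gb, retention_days, compression_ratio):
    """Rough on-disk size, given an assumed compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_retained_gb(daily_gb, retention_days) / compression_ratio


raw = raw_retained_gb(daily_gb=5, retention_days=30)
# compression_ratio=4.0 is hypothetical; measure your own before relying on it.
disk = estimated_disk_gb(daily_gb=5, retention_days=30, compression_ratio=4.0)
print(f"raw: {raw} GB, estimated on disk: {disk} GB")
```

The raw figure is the number to budget against whichever backend you pick. Index overhead in a full-text store adds to it; compression in a chunk store subtracts from it.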
[Figure: Migrating from cloud ($5,000/mo) to self-hosted bare metal with Docker + LXC ($200/mo), a 96% cost reduction. Cloud to self-hosted migration can dramatically reduce infrastructure costs while maintaining full control.]
When to Choose Each
Choose ELK when: You need full-text search across log content, you have a dedicated ops team, you process 100GB+ logs/day, or you need complex log analytics.
Choose Loki when: You want minimal resource usage, you already use Grafana, label-based filtering is sufficient, or you are cost-conscious about storage.
Choose Datadog when: You do not want to manage infrastructure, you need integrated logs+metrics+traces, your team is small, or your budget allows SaaS pricing.
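The criteria above can be written down as a small decision helper. This is only a sketch: the `Needs` fields and the order of the checks are assumptions made for illustration, and real decisions weigh more factors than these three.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical summary of a team's logging requirements."""
    wants_no_infrastructure: bool = False
    needs_full_text_search: bool = False
    daily_log_gb: float = 0.0


def suggest(needs: Needs) -> str:
    # Checked first: a team that will not run infrastructure
    # cannot self-host either alternative.
    if needs.wants_no_infrastructure:
        return "Datadog"
    # Full-text search or 100GB+/day points at ELK, per the criteria above.
    if needs.needs_full_text_search or needs.daily_log_gb >= 100:
        return "ELK"
    # Otherwise label-based filtering with minimal resources is enough.
    return "Loki"


small_team = suggest(Needs(wants_no_infrastructure=True))
search_heavy = suggest(Needs(needs_full_text_search=True, daily_log_gb=20))
self_hosted = suggest(Needs(daily_log_gb=5))
print(small_team, search_heavy, self_hosted)
```

Writing the rules out this way makes the ordering explicit. If budget matters more than operational effort for your team, you would move the first check lower.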
At TechSaaS, we run Loki + Promtail + Grafana for our entire log stack. It uses about 480MB total RAM for 50+ containers, and the integration with Grafana gives us dashboards, alerts, and log exploration in one place. The total footprint of our observability stack (Loki + Promtail + Grafana) is 127MB of Docker images. For most self-hosted infrastructure, Loki is the clear winner.