# DORA Metrics for Platform Engineering: What Your Dashboard Should Actually Measure
Every platform engineering team has a DORA metrics dashboard. Most of them are lying.
Deployment frequency of 47/day looks great until you realize 40 of those are config changes to a feature flag service. Lead time of 2 hours looks fast until you realize it's measuring time from merge to deploy, not time from first commit to production.
Here's how to build a DORA dashboard that actually tells you something useful.
## The Four Metrics (And What They Actually Mean)
### 1. Deployment Frequency
**What people measure:** `COUNT(deployments) / time`
**What you should measure:** `COUNT(meaningful_deployments) / time`
A meaningful deployment changes user-facing behavior. Config changes, dependency bumps, and CI fixes don't count.
```promql
# Bad: counts everything
sum(increase(deployments_total[24h]))

# Better: filter by deployment type
sum(increase(deployments_total{type="feature"}[24h]))
  + sum(increase(deployments_total{type="bugfix"}[24h]))
```

### 2. Lead Time for Changes
**What people measure:** merge to deploy
**What you should measure:** first commit to production traffic
The time from a developer's first commit to when real users hit the new code. This captures code review wait time, CI queue time, staging validation, and rollout duration — all the friction your platform creates.
```promql
# Capture the full pipeline
histogram_quantile(0.50,
  sum(rate(lead_time_seconds_bucket{
    stage="first_commit_to_production"
  }[7d])) by (le)
)
```

### 3. Change Failure Rate
**What people measure:** `failed_deploys / total_deploys`
**What you should measure:** `deploys_causing_degradation / total_deploys`
A deployment that fails CI and never reaches production isn't a change failure — it's CI working correctly. A deployment that passes everything but causes a 10% error rate spike IS a change failure.
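There is no single canonical query for this. One approach, assuming your deploy pipeline runs a post-deploy verification window and emits a (hypothetical) `deployments_degraded_total` counter whenever that window detects an error-rate or SLO regression:

```promql
# Deploys that caused user-visible degradation, as a share of all deploys
sum(increase(deployments_degraded_total[7d]))
/
sum(increase(deployments_total[7d]))
```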
### 4. Mean Time to Recovery (MTTR)
**What people measure:** time from alert to resolution
**What you should measure:** time from user impact to user recovery
If your alerting has 15 minutes of lag, your MTTR looks 15 minutes better than reality. Measure from the moment error rates spike, not from when PagerDuty fires.
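A sketch of what that could look like, assuming a hypothetical `incident_recovery_seconds` histogram whose clock starts at the detected error-rate spike and stops when error rates return to baseline:

```promql
# Mean time from user impact to user recovery over the last 30 days
sum(rate(incident_recovery_seconds_sum[30d]))
/
sum(rate(incident_recovery_seconds_count[30d]))
```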
## The Dashboard That Works
### Panel 1: Weekly Deployment Velocity
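A possible query for this panel, reusing the typed `deployments_total` counter from above so config noise stays visible but separate from feature work:

```promql
# Deploys per week, broken out by deployment type
sum by (type) (increase(deployments_total[7d]))
```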
### Panel 2: Lead Time Distribution
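Plot several quantiles of the lead-time histogram from earlier rather than the median alone; p90 shown here, repeated for 0.50 and 0.99:

```promql
# p90 lead time from first commit to production
histogram_quantile(0.90,
  sum(rate(lead_time_seconds_bucket{
    stage="first_commit_to_production"
  }[7d])) by (le)
)
```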
### Panel 3: Change Failure Rate Trend
### Panel 4: MTTR by Severity
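Assuming the same hypothetical `incident_recovery_seconds` histogram from the MTTR section carries a `severity` label:

```promql
# Mean recovery time per severity over the last 30 days
sum by (severity) (rate(incident_recovery_seconds_sum[30d]))
/
sum by (severity) (rate(incident_recovery_seconds_count[30d]))
```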
### Panel 5: Platform Health Score
A composite metric that folds all four DORA metrics into a single score:
```
score = (
    normalize(deployment_freq, target=daily) * 0.25 +
    normalize(1/lead_time_hours, target=24h) * 0.25 +
    normalize(1-change_failure_rate, target=0.85) * 0.25 +
    normalize(1/mttr_hours, target=1h) * 0.25
)
```
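In Prometheus terms, one way to realize this is four recording rules, each pre-normalized to a 0-1 score and combined with equal weights. This is a sketch, not a standard; the `dora:*:score` rule names are hypothetical:

```promql
# Equal-weighted composite of four hypothetical recording rules,
# each assumed to be pre-normalized to the range 0-1
  0.25 * dora:deployment_frequency:score
+ 0.25 * dora:lead_time:score
+ 0.25 * dora:change_failure_rate:score
+ 0.25 * dora:mttr:score
```

For the arithmetic to work, the recording rules need matching label sets (easiest: none at all).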
## Common Anti-Patterns
### 1. Gaming the Metrics
Teams split PRs into tiny changes to inflate deployment frequency. Fix: measure feature completion rate alongside deployment frequency.
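One lightweight way to track that, assuming a hypothetical `features_completed_total` counter incremented when a feature flag reaches 100% rollout or a feature ticket closes:

```promql
# Features actually finished per month; trend this next to deploy counts
sum(increase(features_completed_total[30d]))
```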
### 2. Measuring Teams Against Each Other
DORA metrics are for teams to improve themselves, not for management to rank teams. Different services have legitimately different deployment profiles.
### 3. Ignoring Context
A team with 0 deployments during a security incident investigation isn't underperforming — they're doing the right thing. Always annotate metric dashboards with context.
### 4. Snapshot Obsession
Looking at this week's numbers in isolation tells you nothing. The trend over 3-6 months is what matters.
## Implementation: Data Sources for Real DORA
The metrics above are only as good as the data feeding them. Here's where to get each metric:
- **Deployment Frequency:** tag every deployment event with a type label (`feature`, `bugfix`, `config`, `dependency`, `infra`) so noise can be filtered out of the headline number.
- **Lead Time:** `production_deploy_time - first_commit_time` for each PR/branch.
- **Change Failure Rate:** deploy events joined with post-deploy error-rate and SLO data; count a deploy as failed only when it caused user-visible degradation.
- **MTTR:** impact windows derived from monitoring data (error-rate spike to recovery), not from alert or ticket timestamps.
## SPACE Framework: Beyond DORA
DORA measures delivery performance. SPACE (from Microsoft Research) adds developer experience:

- **S**atisfaction and well-being
- **P**erformance
- **A**ctivity
- **C**ommunication and collaboration
- **E**fficiency and flow
The combination of DORA (system performance) + SPACE (human experience) gives you the full picture. A team with elite DORA metrics but 30% satisfaction is one resignation away from collapse.
## Our Recommendation
Start with just two metrics: deployment frequency (filtered by type) and change failure rate. These are the easiest to instrument and the most actionable. Add lead time once you have the data pipeline working. Add MTTR when you have incident tracking mature enough to correlate with deploys.
The dashboard is not the goal. The goal is a team that ships faster with fewer failures. The dashboard just makes the trend visible so you can have evidence-based conversations about where to invest in your platform.
---
Want help building a DORA metrics dashboard that actually drives improvement? [Book a free platform engineering consultation](https://techsaas.cloud/contact) or explore our [DevOps services](https://techsaas.cloud/services).