
Yash Pritwani · 7 min read

# Kong vs Envoy vs Traefik: Choosing Your API Gateway Without Regret

Most teams pick their API gateway based on a blog post they read three years ago. Then they spend the next two years fighting it. We've deployed all three — Kong, Envoy, and Traefik — across different production environments, and the right choice depends on exactly three things: your team's operational capacity, your traffic patterns, and whether you need a gateway or a service mesh. Everything else is noise.

Here's the decision framework we actually use, backed by real benchmark data and battle scars.

## The 30-Second Decision Matrix

| Factor | Kong | Envoy | Traefik |
|--------|------|-------|---------|
| Best for | API-first orgs, plugin ecosystem | Service mesh, high-perf proxying | Docker/K8s-native routing |
| Learning curve | Medium (Lua plugins) | Steep (xDS, WASM filters) | Low (YAML/TOML, labels) |
| Config model | Admin API + DB, or DB-less | xDS control plane or static YAML | File / Docker labels / K8s CRDs |
| Latency added (p99) | 2-4ms | 0.5-1.2ms | 1-2.5ms |
| Throughput | ~28K RPS (single node) | ~45K RPS (single node) | ~35K RPS (single node) |
| Team size needed | 2-3 DevOps | 3-5 platform eng | 1-2 DevOps |
| Plugin ecosystem | 100+ official/community | WASM + Lua + ext_proc | Middleware + plugins |
| License | Apache 2.0 / Enterprise | Apache 2.0 | MIT |

*Benchmarks: 4 vCPU, 8GB RAM, 1KB payload, wrk2 with 100 connections, 10 threads, measured at steady state.*

## Kong: The API Management Platform

Kong shines when your gateway isn't just routing traffic — it's your API management layer. Authentication, rate limiting, request transformation, analytics — Kong has a plugin for it, and most of them work well out of the box.

### When Kong Wins

- You expose APIs to external consumers (partners, mobile apps, third parties)
- You need API key management, OAuth2, or JWT validation at the edge
- Your team wants a UI (Kong Manager) so non-engineers can manage routes
- You need per-consumer rate limiting and analytics
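
Per-consumer rate limiting is a one-block change in declarative config. A minimal sketch, assuming a consumer named `partner-a` already exists (the consumer name and limits are illustrative, not from a real deployment):

```yaml
# Hypothetical: override the anonymous rate limit for one consumer.
# Kong matches the plugin to the consumer by username or id.
plugins:
  - name: rate-limiting
    consumer: partner-a        # illustrative consumer name
    config:
      minute: 600              # partners get 10x the anonymous limit
      policy: redis
      redis_host: redis.internal
```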

### Production Configuration

```yaml
# kong.yml — DB-less declarative config
_format_version: "3.0"

services:
  - name: user-service
    url: http://user-svc.internal:8080
    connect_timeout: 5000
    write_timeout: 10000
    read_timeout: 15000
    retries: 3
    routes:
      - name: user-api
        paths:
          - /api/v1/users
        strip_path: false
        protocols:
          - https
    plugins:
      - name: rate-limiting
        config:
          minute: 60
          policy: redis
          redis_host: redis.internal
          redis_port: 6379
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: correlation-id
        config:
          header_name: X-Request-ID
          generator: uuid

  - name: order-service
    url: http://order-svc.internal:8080
    routes:
      - name: order-api
        paths:
          - /api/v1/orders
    plugins:
      - name: rate-limiting
        config:
          minute: 30
          policy: redis
          redis_host: redis.internal
      - name: request-transformer
        config:
          add:
            headers:
              - "X-Gateway: kong"
              - "X-Forwarded-Prefix: /api/v1"

# Global plugins applied to all routes
plugins:
  - name: prometheus
    config:
      per_consumer: true
  - name: zipkin
    config:
      http_endpoint: http://jaeger.internal:9411/api/v2/spans
      sample_ratio: 0.1
```
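
The `jwt` plugin above only accepts tokens it can match to a known consumer; in DB-less mode those consumers live in the same file. A minimal sketch (the consumer name, key, and secret are placeholders):

```yaml
# Hypothetical consumer entry for the jwt plugin above
consumers:
  - username: mobile-app          # placeholder consumer name
    jwt_secrets:
      - key: mobile-app-issuer    # must match the token's `iss` claim
        algorithm: HS256
        secret: "change-me"       # placeholder; keep real secrets out of the repo
```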

### Kong's Hidden Cost

The enterprise features — RBAC, developer portal, advanced analytics — require Kong Enterprise at $35K+/year. The open-source version is powerful but lacks the management plane. We've seen teams start with OSS Kong, build custom tooling around the Admin API, and end up spending more engineering time than the enterprise license would have cost.

## Envoy: The Performance King

Envoy was built at Lyft to handle millions of requests per second across thousands of services. It adds less than a millisecond of latency in most configurations. But that performance comes with complexity — Envoy's configuration model assumes you have a control plane feeding it updates via the xDS protocol.
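
The xDS wiring itself is only a few lines in the bootstrap. A minimal sketch, assuming a control plane reachable through a cluster named `xds_cluster` (the cluster name is illustrative and would be defined under `static_resources`):

```yaml
# Bootstrap fragment: fetch listeners and clusters over ADS
# instead of defining them statically.
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster   # illustrative; points at your control plane
  lds_config:
    ads: {}
  cds_config:
    ads: {}
```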

### When Envoy Wins

- You're building a service mesh (Envoy is the data plane for Istio, Consul Connect, etc.)
- Sub-millisecond added latency matters (financial services, real-time bidding)
- You need advanced traffic management (canary, shadow, fault injection)
- Your platform team has 3+ engineers who can own the complexity

### Production Configuration

```yaml
# envoy.yaml — front proxy config
static_resources:
  listeners:
    - name: main_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8443
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                codec_type: AUTO
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      log_format:
                        json_format:
                          timestamp: "%START_TIME%"
                          method: "%REQ(:METHOD)%"
                          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                          status: "%RESPONSE_CODE%"
                          duration_ms: "%DURATION%"
                          upstream: "%UPSTREAM_HOST%"
                route_config:
                  name: local_routes
                  virtual_hosts:
                    - name: api
                      domains: ["api.example.com"]
                      routes:
                        - match:
                            prefix: "/api/v1/users"
                          route:
                            cluster: user_service
                            timeout: 15s
                            retry_policy:
                              retry_on: "5xx,reset,connect-failure"
                              num_retries: 2
                              per_try_timeout: 5s
                        - match:
                            prefix: "/api/v1/orders"
                          route:
                            cluster: order_service
                            timeout: 10s
                http_filters:
                  - name: envoy.filters.http.local_ratelimit
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
                      stat_prefix: http_local_rate_limiter
                      token_bucket:
                        max_tokens: 1000
                        tokens_per_fill: 100
                        fill_interval: 1s
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: user_service
      type: STRICT_DNS
      lb_policy: LEAST_REQUEST
      load_assignment:
        cluster_name: user_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: user-svc.internal
                      port_value: 8080
      health_checks:
        - timeout: 2s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /health
      circuit_breakers:
        thresholds:
          - max_connections: 512
            max_pending_requests: 128
            max_requests: 1024
            max_retries: 3
```
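
The canary and shadow traffic mentioned earlier are route-level additions. A sketch of what the `/api/v1/users` route could look like with a 5% canary and full request mirroring — the `user_service_canary` and `user_service_shadow` cluster names are hypothetical and would need their own cluster definitions:

```yaml
# Hypothetical route: 95/5 canary split plus shadowing
- match:
    prefix: "/api/v1/users"
  route:
    weighted_clusters:
      clusters:
        - name: user_service
          weight: 95
        - name: user_service_canary    # hypothetical cluster
          weight: 5
    request_mirror_policies:           # copy every request, ignore the response
      - cluster: user_service_shadow   # hypothetical cluster
        runtime_fraction:
          default_value:
            numerator: 100
```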

### The Envoy Reality Check

That config above is 80 lines for two routes. Kong does the same in 40 lines. Traefik does it in 15. Envoy's verbosity is the tradeoff for precision — every timeout, every retry, every circuit breaker parameter is explicit. For teams that need that control, it's a feature. For teams that don't, it's a maintenance burden.

## Traefik: The Docker-Native Choice

Traefik discovers services automatically via Docker labels or Kubernetes Ingress annotations. No Admin API calls, no xDS protocol, no separate config files — your service definition IS your routing configuration.

### When Traefik Wins

- Your stack is Docker Compose or Kubernetes-native
- You want zero-config service discovery (containers appear, routes exist)
- Your team is small (1-2 people managing infrastructure)
- You need automatic Let's Encrypt certificates
- You're running 10-50 services, not 500

### Production Configuration

```yaml
# docker-compose.yml — Traefik with service discovery
services:
  traefik:
    image: traefik:v3.2
    command:
      - "--api.dashboard=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.email=admin@example.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
      - "--metrics.prometheus=true"
      - "--accesslog=true"
      - "--accesslog.format=json"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik-data:/data

  user-service:
    image: myapp/user-service:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.users.rule=Host(`api.example.com`) && PathPrefix(`/api/v1/users`)"
      - "traefik.http.routers.users.tls.certresolver=letsencrypt"
      - "traefik.http.services.users.loadbalancer.server.port=8080"
      - "traefik.http.routers.users.middlewares=rate-limit,retry"
      - "traefik.http.middlewares.rate-limit.ratelimit.average=60"
      - "traefik.http.middlewares.rate-limit.ratelimit.burst=20"
      - "traefik.http.middlewares.retry.retry.attempts=3"
      - "traefik.http.services.users.loadbalancer.healthcheck.path=/health"
      - "traefik.http.services.users.loadbalancer.healthcheck.interval=10s"

# Named volume for ACME certificate storage
volumes:
  traefik-data:
```

That's the entire routing configuration. Deploy a new container with the right labels, and Traefik picks it up in seconds. No config reload, no API call.
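
On Kubernetes, the same route is expressed as an `IngressRoute` CRD instead of labels. A rough equivalent of the `user-service` labels above, assuming the Traefik Kubernetes provider is enabled (the namespace and Service name are assumptions):

```yaml
# Hypothetical IngressRoute matching the Docker labels above
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: users
  namespace: default              # assumption
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/api/v1/users`)
      kind: Rule
      services:
        - name: user-service      # Kubernetes Service name, assumed
          port: 8080
  tls:
    certResolver: letsencrypt
```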

### Traefik's Ceiling

Traefik starts to strain at very high scale. Above 200 services or 40K+ RPS on a single node, you'll want to consider Envoy. Traefik also lacks Envoy's advanced traffic management — no traffic shadowing, limited canary support, and circuit breakers are basic compared to Envoy's per-host ejection.
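
"Limited canary support" means Traefik does have weighted round-robin, just via the file provider rather than labels. A minimal sketch — the service names are illustrative, and the two referenced services still need their own `loadBalancer` definitions:

```yaml
# dynamic.yml — file provider; Docker labels can't express weighted services
http:
  services:
    users:
      weighted:
        services:
          - name: users-stable    # illustrative service name
            weight: 95
          - name: users-canary    # illustrative service name
            weight: 5
```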

## The Decision Framework

Answer these five questions:

**1. How many services are you routing to?**

- Under 50: Traefik
- 50-200: Kong or Traefik
- 200+: Envoy

**2. Do you need API management (auth, rate limiting, analytics per consumer)?**

- Yes: Kong
- No: Traefik or Envoy

**3. What's your acceptable added latency?**

- <1ms: Envoy
- 1-3ms: Traefik
- 3-5ms: Kong

**4. How large is your platform/DevOps team?**

- 1-2 people: Traefik
- 2-4 people: Kong
- 4+ people: Any (but Envoy becomes viable)

**5. Are you building a service mesh?**

- Yes: Envoy (as the data plane)
- No: Kong or Traefik

If you answered "Traefik" to 3+ questions, start with Traefik. Same logic for the others. When two are tied, default to the simpler option — you can always migrate up.

## Performance Comparison: Real Numbers

We benchmarked all three under identical conditions: 4 vCPU, 8GB RAM, Ubuntu 24.04, proxying to a backend that returns a static 1KB JSON response.

| Metric | Kong 3.8 | Envoy 1.32 | Traefik 3.2 |
|--------|----------|------------|-------------|
| RPS (p50) | 28,400 | 45,200 | 35,100 |
| Latency p50 | 1.8ms | 0.4ms | 0.9ms |
| Latency p99 | 3.6ms | 1.1ms | 2.3ms |
| Memory (idle) | 180MB | 45MB | 60MB |
| Memory (peak) | 420MB | 190MB | 210MB |
| CPU (at 20K RPS) | 65% | 32% | 48% |
| Config reload | ~500ms | Hot reload | Hot reload |

Envoy's performance advantage is real but matters less than you think. At 5K RPS, the difference between 0.4ms and 1.8ms added latency is invisible to users. It only becomes material above 20K RPS where cumulative resource usage diverges.

## Our Recommendation

For 80% of teams reading this: start with Traefik. It has the lowest operational overhead, the fastest time-to-production, and handles more traffic than most teams will ever need. When you outgrow it — and you'll know because you'll start hitting specific limitations, not because a blog told you to — migrate to Kong (if you need API management) or Envoy (if you need raw performance).

We've helped teams at all three stages of this journey. Whether you're setting up your first gateway or migrating from one to another, our infrastructure team can audit your current setup and recommend the right path forward. [Let's talk at techsaas.cloud/services](https://techsaas.cloud/services).
