Platform Engineering

Rate Limiting Patterns: Token Bucket, Sliding Window, and More

Master rate limiting algorithms for APIs. Token bucket, sliding window log, sliding window counter, and fixed window explained with Redis implementations.

Yash Pritwani
12 min read

Why Rate Limiting Matters

Without rate limiting, a single user (or bot) can exhaust your API server's resources, degrade service for everyone else, and run up your infrastructure costs. Rate limiting is not optional for production APIs — it is a fundamental requirement alongside authentication and logging.

[Figure: Microservices architecture: independent services communicate through an API gateway and event bus.]

The Four Major Algorithms

1. Fixed Window Counter

The simplest approach: count requests in fixed time windows (e.g., 100 requests per minute). Reset the counter at the start of each window.

import redis
import time

r = redis.Redis()

def fixed_window(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Returns True if request is allowed."""
    current_window = int(time.time() // window)
    key = f"ratelimit:fw:{user_id}:{current_window}"

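    # INCR and EXPIRE run in one MULTI/EXEC transaction, so a crash between
    # them cannot leave a counter without a TTL
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window)
    current, _ = pipe.execute()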

    return current <= limit

Pros: Simple, low memory, fast.

Cons: Bursts at window boundaries. If windows reset on the minute and a user sends 100 requests at :59 and another 100 at :00, the server admits 200 requests within about 2 seconds, double the intended rate.
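To see the boundary problem without Redis, here is a minimal in-memory sketch of the same fixed-window logic. The FixedWindowLimiter class and its allow(key, now) method are hypothetical names for illustration, and timestamps are passed in explicitly so the run is deterministic.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed window counter (illustrative sketch, single process)."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        # (key, window index) -> request count; old windows are never pruned here
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window_index = int(now // self.window)
        self.counts[(key, window_index)] += 1
        return self.counts[(key, window_index)] <= self.limit


limiter = FixedWindowLimiter(limit=100, window=60)
# 100 requests at t=59.5s (end of window 0), 100 more at t=60.5s (start of window 1)
late = sum(limiter.allow("user-1", 59.5) for _ in range(100))
early = sum(limiter.allow("user-1", 60.5) for _ in range(100))
print(late + early)  # 200 requests admitted within about one second
```

Both bursts pass because each lands in a fresh counter; only the 101st request inside the same window is rejected.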

2. Sliding Window Log

Track the exact timestamp of every request. Count requests in the trailing window.

import uuid

def sliding_window_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Precise but memory-intensive."""
    key = f"ratelimit:swl:{user_id}"
    now = time.time()

    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, now - window)
    # Add current request (rejected requests are logged too); a unique member
    # keeps two requests with the same timestamp from overwriting each other
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    # Count entries in window
    pipe.zcard(key)
    # Set key expiry
    pipe.expire(key, window)

    results = pipe.execute()
    count = results[2]

    return count <= limit

Pros: Perfectly accurate, no boundary bursts.

Cons: High memory (stores every timestamp). For 1000 users making 100 requests/minute, that is 100K sorted set entries.
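The same idea fits in a single-process sketch built on a deque of timestamps. SlidingWindowLog and allow() are hypothetical names; as a deliberate choice for this sketch, rejected requests are not recorded, so a blocked client is not penalized for retrying.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log (illustrative sketch, single process)."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.logs = {}  # key -> deque of accepted request timestamps

    def allow(self, key: str, now: float) -> bool:
        log = self.logs.setdefault(key, deque())
        # Drop timestamps that have fallen out of the trailing window
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False  # unlike the Redis version above, rejections are not logged
        log.append(now)
        return True


limiter = SlidingWindowLog(limit=100, window=60)
late = sum(limiter.allow("user-1", 59.5) for _ in range(100))
early = sum(limiter.allow("user-1", 60.5) for _ in range(100))
print(late, early)  # 100 0: the second burst is still inside the trailing 60 s
```

Replaying the boundary scenario from the fixed window section, the second burst is rejected because all 100 earlier timestamps are still within the trailing minute.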

[Figure: API gateway pattern: a single entry point handles auth, rate limiting, and routing to backend services.]

3. Sliding Window Counter

A hybrid: use two fixed windows and weight them by overlap with the sliding window. Nearly as accurate as the log, nearly as cheap as fixed window.

def sliding_window_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Best balance of accuracy and efficiency."""
    now = time.time()
    current_window = int(now // window)
    previous_window = current_window - 1

    # How far into the current window are we? (0.0 to 1.0)
    window_progress = (now % window) / window

    current_key = f"ratelimit:swc:{user_id}:{current_window}"
    previous_key = f"ratelimit:swc:{user_id}:{previous_window}"

    # One MULTI/EXEC transaction: count this request first, then read the
    # previous window, so concurrent requests cannot all pass a stale check
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window * 2)
    pipe.get(previous_key)
    current_count, _, previous_raw = pipe.execute()
    previous_count = int(previous_raw or 0)

    # Weighted estimate: full current window (including this request)
    # + proportional previous window
    estimated = current_count + previous_count * (1 - window_progress)

    if estimated > limit:
        # Undo our increment so rejected requests do not count against the user
        r.decr(current_key)
        return False

    return True

Pros: Low memory (two counters per user), minimal boundary burst.

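Cons: Slightly approximate. The estimate assumes requests in the previous window were spread evenly, so it can misjudge very bursty traffic. The often-quoted 0.003% figure is the share of requests wrongly allowed or blocked in one large production measurement, not a guaranteed error bound.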
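As a worked example of the weighting, take a limit of 100 per 60 seconds, 84 requests in the previous window, 36 in the current one, and a clock 15 seconds (25%) into the current window. The helper estimate_count is a hypothetical pure function that reproduces the formula used above.

```python
def estimate_count(previous_count: int, current_count: int,
                   window_progress: float) -> float:
    """Sliding window counter estimate: the current window counts in full,
    the previous window is weighted by how much of it still overlaps."""
    return current_count + previous_count * (1 - window_progress)


estimate = estimate_count(previous_count=84, current_count=36, window_progress=15 / 60)
print(estimate)  # 99.0
```

That is 36 + 84 × 0.75 = 99, so one more request fits under the limit of 100. As the window progresses, the previous window's weight shrinks and more capacity frees up.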

4. Token Bucket

A bucket holds tokens. Each request consumes a token. Tokens are added at a fixed rate. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst limit).

# Lua script for an atomic token bucket: read, refill, consume and write back
# happen inside Redis, so concurrent requests cannot race each other
TOKEN_BUCKET_LUA = """
local key = KEYS[1]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now

-- Refill based on time elapsed since the last update, capped at capacity
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * rate)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

-- HSET with multiple fields replaces the deprecated HMSET (Redis >= 4.0)
redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
-- Expire idle buckets after roughly twice the time needed to refill completely
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed
"""

# register_script caches the script and invokes it via EVALSHA
token_bucket_script = r.register_script(TOKEN_BUCKET_LUA)

def token_bucket(user_id: str, rate: float = 10, capacity: int = 20) -> bool:
    """
    rate: tokens added per second
    capacity: max tokens (burst size)
    """
    key = f"ratelimit:tb:{user_id}"
    # Client clock: keep app servers NTP-synced, or refill math will drift
    now = time.time()
    result = token_bucket_script(keys=[key], args=[rate, capacity, now])
    return result == 1

Pros: Allows controlled bursts, smooth rate enforcement, predictable behavior.

Cons: More complex, requires atomic operations (Lua script).
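Stripped of Redis, the same refill-then-consume logic fits in a small class. This is a single-process sketch with hypothetical names (TokenBucket, allow); the clock is passed in explicitly so the burst and refill behavior is easy to follow.

```python
class TokenBucket:
    """In-memory token bucket (illustrative sketch, single process)."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start full, like the Lua script
        self.last_refill = None

    def allow(self, now: float) -> bool:
        if self.last_refill is not None:
            # Refill for the time since the last call, capped at capacity
            elapsed = max(0.0, now - self.last_refill)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=10, capacity=20)
burst = sum(bucket.allow(0.0) for _ in range(25))
print(burst)              # 20: the full bucket absorbs a burst of 20
print(bucket.allow(0.5))  # True: 0.5 s at 10 tokens/s refills 5 tokens
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained throughput once the bucket is drained.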

Choosing the Right Algorithm

| Algorithm | Use Case | Memory | Accuracy |
|-----------|----------|--------|----------|
| Fixed Window | Internal APIs, simple rate limits | Very Low | Low |
| Sliding Window Log | Strict compliance, billing | High | Perfect |
| Sliding Window Counter | Most API rate limiting | Low | High |
| Token Bucket | APIs with burst tolerance | Low | High |

Our recommendation: start with the sliding window counter. It offers the best balance of accuracy and memory for most APIs. Switch to the token bucket if you need explicit burst control.

Production Implementation

A rate limiting middleware for FastAPI (the user lookup and header helpers are application code not shown here):

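import time

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class RateLimitMiddleware(BaseHTTPMiddleware):
    TIERS = {
        "free":       {"limit": 100,   "window": 3600},  # 100/hour
        "pro":        {"limit": 1000,  "window": 3600},  # 1000/hour
        "enterprise": {"limit": 10000, "window": 3600},  # 10000/hour
    }

    async def dispatch(self, request: Request, call_next):
        # Extract user and tier. get_user_by_key, get_reset_time and
        # get_remaining are application helpers assumed to exist elsewhere.
        api_key = request.headers.get("X-API-Key")
        if not api_key:
            # Return responses directly: an HTTPException raised inside
            # BaseHTTPMiddleware bypasses FastAPI's exception handlers
            # and surfaces as a 500
            return JSONResponse({"detail": "API key required"}, status_code=401)

        user = await get_user_by_key(api_key)
        if user is None:
            return JSONResponse({"detail": "Invalid API key"}, status_code=401)
        tier = self.TIERS.get(user.plan, self.TIERS["free"])

        # Check rate limit (sliding_window_counter makes blocking Redis calls;
        # redis.asyncio avoids stalling the event loop under heavy load)
        allowed = sliding_window_counter(
            user_id=user.id,
            limit=tier["limit"],
            window=tier["window"],
        )

        if not allowed:
            # Assumed to return the number of seconds until a slot frees up
            retry_after = int(get_reset_time(user.id))
            return JSONResponse(
                {"detail": "Rate limit exceeded"},
                status_code=429,
                headers={
                    "X-RateLimit-Limit": str(tier["limit"]),
                    "X-RateLimit-Remaining": "0",
                    # Unix timestamp, matching the header convention below
                    "X-RateLimit-Reset": str(int(time.time()) + retry_after),
                    "Retry-After": str(retry_after),
                },
            )

        response = await call_next(request)

        # Add rate limit headers to all successful responses
        remaining = get_remaining(user.id, tier["limit"], tier["window"])
        response.headers["X-RateLimit-Limit"] = str(tier["limit"])
        response.headers["X-RateLimit-Remaining"] = str(remaining)

        return response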
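The middleware leaves get_remaining undefined. One way to compute it, assuming the same two-counter layout as sliding_window_counter, is to reuse the weighted estimate. The pure function below, sliding_window_remaining, is a hypothetical helper that takes the counts as arguments so it can be checked without Redis.

```python
import math


def sliding_window_remaining(limit: int, current_count: int, previous_count: int,
                             window: int, now: float) -> int:
    """Requests left under the sliding window counter estimate (never negative)."""
    # Same weighting as sliding_window_counter: fraction of the current window elapsed
    window_progress = (now % window) / window
    estimated = current_count + previous_count * (1 - window_progress)
    # Round down so the header stays conservative
    return max(0, math.floor(limit - estimated))


print(sliding_window_remaining(limit=100, current_count=36, previous_count=84,
                               window=60, now=135.0))  # 1
```

A Redis-backed get_remaining would fetch the two counters with GET, as sliding_window_counter does, and pass them in.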

[Figure: Workflow automation: triggers, conditions, and actions chain together to eliminate manual processes.]

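Rate Limit Headers (Common Convention)

Include these headers in API responses. The X-RateLimit-* names are a widely used convention rather than a formal standard, while Retry-After is defined by HTTP itself: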

X-RateLimit-Limit: 1000        # Max requests in window
X-RateLimit-Remaining: 847     # Requests left
X-RateLimit-Reset: 1699999999  # Unix timestamp when window resets
Retry-After: 30                # Seconds to wait (only on 429)
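A small helper keeps these headers consistent between the 429 path and normal responses. rate_limit_headers is a hypothetical name, and it assumes reset_epoch (a Unix timestamp) has already been computed by whichever algorithm is in use.

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now: float) -> dict[str, str]:
    """Build the conventional rate limit headers for one response."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # Unix timestamp (seconds)
    }
    if remaining <= 0:
        # Retry-After is a delay in seconds, sent only when the client is limited
        headers["Retry-After"] = str(max(1, reset_epoch - int(now)))
    return headers


print(rate_limit_headers(1000, 0, 1699999999, now=1699999969)["Retry-After"])  # 30
```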

Rate limiting is a pillar of API design. Get it right and your API is resilient and fair. Get it wrong and you are one bot away from downtime. At TechSaaS, rate limiting is part of every API platform we build.

#rate-limiting #api #redis #algorithms #platform-engineering #security
