Rate Limiting Patterns: Token Bucket, Sliding Window, and More
Master rate limiting algorithms for APIs: fixed window, sliding window log, sliding window counter, and token bucket, explained with working Redis implementations.
Why Rate Limiting Matters
Without rate limiting, a single user (or bot) can exhaust your API server's resources, degrade service for everyone else, and run up your infrastructure costs. Rate limiting is not optional for production APIs — it is a fundamental requirement alongside authentication and logging.
The Four Major Algorithms
1. Fixed Window Counter
The simplest approach: count requests in fixed time windows (e.g., 100 requests per minute). Reset the counter at the start of each window.
import redis
import time

r = redis.Redis()

def fixed_window(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Returns True if the request is allowed."""
    current_window = int(time.time() // window)
    key = f"ratelimit:fw:{user_id}:{current_window}"
    current = r.incr(key)
    if current == 1:
        # First request in this window: set a TTL so the key cleans itself up
        r.expire(key, window)
    return current <= limit
Pros: Simple, low memory, fast.
Cons: Bursts at window boundaries. If the window resets at :00 and a user sends 100 requests at :59, they can send another 100 at :00, for 200 requests in roughly two seconds.
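The boundary burst is easy to reproduce with an in-process version of the same counter (a sketch for illustration only; it is not distributed and uses an explicit clock instead of time.time()):

```python
class FixedWindowCounter:
    """In-process fixed window counter, for demonstrating boundary bursts."""

    def __init__(self, limit: int = 100, window: int = 60):
        self.limit = limit
        self.window = window
        self.counts: dict[int, int] = {}

    def allow(self, now: float) -> bool:
        # Same logic as the Redis version: bucket by window index, count, compare
        w = int(now // self.window)
        self.counts[w] = self.counts.get(w, 0) + 1
        return self.counts[w] <= self.limit

rl = FixedWindowCounter(limit=100, window=60)
# 100 requests at t=59 (end of window 0) all pass...
late_burst = sum(rl.allow(59.0) for _ in range(100))
# ...and 100 more at t=60 (start of window 1) also pass: 200 in ~1 second.
early_burst = sum(rl.allow(60.0) for _ in range(100))
print(late_burst, early_burst)  # 100 100
```

Both bursts are admitted in full, which is exactly the weakness the sliding window variants below are designed to close.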
2. Sliding Window Log
Track the exact timestamp of every request. Count requests in the trailing window.
def sliding_window_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Precise but memory-intensive."""
    key = f"ratelimit:swl:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, now - window)
    # Add current request
    pipe.zadd(key, {str(now): now})
    # Count entries in window
    pipe.zcard(key)
    # Set key expiry
    pipe.expire(key, window)
    results = pipe.execute()
    count = results[2]
    return count <= limit
Pros: Perfectly accurate, no boundary bursts.
Cons: High memory (stores every timestamp). For 1000 users making 100 requests/minute, that is 100K sorted set entries.
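A rough back-of-envelope check of that claim (the per-entry overhead here is an assumption; actual Redis memory depends on the sorted-set encoding):

```python
users = 1_000
requests_per_minute = 100
window_entries = users * requests_per_minute   # timestamps held at any moment
bytes_per_entry = 85                           # assumed sorted-set member overhead
approx_mb = window_entries * bytes_per_entry / 1_000_000
print(window_entries, round(approx_mb, 1))  # 100000 8.5
```

A few megabytes is tolerable at this scale, but the cost grows linearly with both user count and request rate, which is why the log approach is usually reserved for strict-accuracy cases.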
3. Sliding Window Counter
A hybrid: use two fixed windows and weight them by overlap with the sliding window. Nearly as accurate as the log, nearly as cheap as fixed window.
def sliding_window_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Best balance of accuracy and efficiency."""
    now = time.time()
    current_window = int(now // window)
    previous_window = current_window - 1
    # How far into the current window are we? (0.0 to 1.0)
    window_progress = (now % window) / window
    current_key = f"ratelimit:swc:{user_id}:{current_window}"
    previous_key = f"ratelimit:swc:{user_id}:{previous_window}"
    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(previous_key)
    results = pipe.execute()
    current_count = int(results[0] or 0)
    previous_count = int(results[1] or 0)
    # Weighted estimate: full current window + proportional previous window
    estimated = current_count + previous_count * (1 - window_progress)
    if estimated >= limit:
        return False
    # Increment current window
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window * 2)
    pipe.execute()
    return True
Pros: Low memory (two counters per user), minimal boundary burst.
Cons: Slightly approximate, since it assumes requests in the previous window were evenly spread (within 0.003% error in practice).
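The weighted estimate is easier to trust once you see it with concrete numbers. Halfway through the current window, half of the previous window's count still falls inside the trailing 60 seconds (this helper just isolates the arithmetic from the code above):

```python
def estimate(current_count: int, previous_count: int,
             now: float, window: int = 60) -> float:
    """Weighted request count for the trailing window."""
    window_progress = (now % window) / window
    return current_count + previous_count * (1 - window_progress)

# 30s into the current window: 30 current requests + half of the 80 previous
print(estimate(current_count=30, previous_count=80, now=90.0))  # 70.0
```

As the window progresses, the previous window's contribution decays linearly to zero, which is what smooths out the fixed window's boundary burst.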
4. Token Bucket
A bucket holds tokens. Each request consumes a token. Tokens are added at a fixed rate. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst limit).
def token_bucket(user_id: str, rate: float = 10, capacity: int = 20) -> bool:
    """
    rate: tokens added per second
    capacity: max tokens (burst size)
    """
    key = f"ratelimit:tb:{user_id}"
    now = time.time()
    # Lua script for atomic token bucket
    lua_script = """
    local key = KEYS[1]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now
    -- Add tokens based on time elapsed
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * rate)
    if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
        return 1
    else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 0
    end
    """
    result = r.eval(lua_script, 1, key, rate, capacity, now)
    return result == 1
Pros: Allows controlled bursts, smooth rate enforcement, predictable behavior.
Cons: More complex, requires atomic operations (Lua script).
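For a single-process service, the same algorithm works without Redis. This in-memory sketch mirrors the Lua script's refill logic (illustration only: it is not safe across multiple servers, and takes an explicit clock value rather than calling time.time()):

```python
class TokenBucket:
    """In-process token bucket mirroring the Lua script's refill logic."""

    def __init__(self, rate: float = 10, capacity: int = 20):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # max tokens (burst size)
        self.tokens = float(capacity)
        self.last_refill = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=10, capacity=20)
burst = sum(tb.allow(0.0) for _ in range(25))  # only the 20-token burst passes
print(burst, tb.allow(0.1))  # 20 True  (0.1s at rate=10 refills one token)
```

Note how capacity bounds the burst while rate bounds the sustained throughput: the two knobs are independent, which is the main reason to prefer token bucket over a pure window counter.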
Choosing the Right Algorithm
| Algorithm | Use Case | Memory | Accuracy |
|---|---|---|---|
| Fixed Window | Internal APIs, simple rate limits | Very Low | Low |
| Sliding Window Log | Strict compliance, billing | High | Perfect |
| Sliding Window Counter | Most API rate limiting | Low | High |
| Token Bucket | APIs with burst tolerance | Low | High |
Our recommendation: Start with sliding window counter. It gives the best balance for nearly all cases. Switch to token bucket if you need explicit burst control.
Production Implementation
A rate limiting middleware for FastAPI (get_user_by_key, get_reset_time, and get_remaining are app-specific helpers you supply). Note that exceptions raised inside BaseHTTPMiddleware bypass FastAPI's exception handlers, so the middleware returns error responses directly instead of raising HTTPException:
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class RateLimitMiddleware(BaseHTTPMiddleware):
    TIERS = {
        "free": {"limit": 100, "window": 3600},          # 100/hour
        "pro": {"limit": 1000, "window": 3600},          # 1000/hour
        "enterprise": {"limit": 10000, "window": 3600},  # 10000/hour
    }

    async def dispatch(self, request: Request, call_next):
        # Extract user and tier
        api_key = request.headers.get("X-API-Key")
        if not api_key:
            return JSONResponse({"detail": "API key required"}, status_code=401)
        user = await get_user_by_key(api_key)
        tier = self.TIERS.get(user.plan, self.TIERS["free"])
        # Check rate limit
        allowed = sliding_window_counter(
            user_id=user.id,
            limit=tier["limit"],
            window=tier["window"],
        )
        if not allowed:
            remaining_time = get_reset_time(user.id)
            return JSONResponse(
                {"detail": "Rate limit exceeded"},
                status_code=429,
                headers={
                    "X-RateLimit-Limit": str(tier["limit"]),
                    "X-RateLimit-Remaining": "0",
                    "X-RateLimit-Reset": str(remaining_time),
                    "Retry-After": str(remaining_time),
                },
            )
        response = await call_next(request)
        # Add rate limit headers to all responses
        remaining = get_remaining(user.id, tier["limit"], tier["window"])
        response.headers["X-RateLimit-Limit"] = str(tier["limit"])
        response.headers["X-RateLimit-Remaining"] = str(remaining)
        return response
Rate Limit Headers (De Facto Standard)
Always include these headers in API responses:
X-RateLimit-Limit: 1000 # Max requests in window
X-RateLimit-Remaining: 847 # Requests left
X-RateLimit-Reset: 1699999999 # Unix timestamp when window resets
Retry-After: 30 # Seconds to wait (only on 429)
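These headers are what makes polite client-side backoff possible. A small hypothetical helper (retry_after_seconds is not from any library; it simply reads the headers listed above):

```python
def retry_after_seconds(status_code: int, headers: dict[str, str],
                        default: float = 1.0) -> float:
    """How long a client should wait before retrying, per the 429 headers."""
    if status_code != 429:
        return 0.0  # not rate limited, no need to wait
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default  # header missing or malformed

print(retry_after_seconds(429, {"Retry-After": "30"}))  # 30.0
print(retry_after_seconds(200, {}))                     # 0.0
```

Well-behaved SDKs sleep for this duration (often with jitter) before retrying, which keeps rejected clients from hammering the limiter.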
Rate limiting is a pillar of API design. Get it right and your API is resilient and fair. Get it wrong and you are one bot away from downtime. At TechSaaS, rate limiting is part of every API platform we build.