
Rate Limiting Patterns: Token Bucket, Sliding Window, and More

Master rate limiting algorithms for APIs. Token bucket, sliding window, fixed window, and leaky bucket explained with Redis implementations and benchmarks.

Yash Pritwani
12 min read

Why Rate Limiting Matters

Without rate limiting, a single user (or bot) can exhaust your API server's resources, degrade service for everyone else, and run up your infrastructure costs. Rate limiting is not optional for production APIs — it is a fundamental requirement alongside authentication and logging.


The Four Major Algorithms

1. Fixed Window Counter

The simplest approach: count requests in fixed time windows (e.g., 100 requests per minute). Reset the counter at the start of each window.

import redis
import time

r = redis.Redis()

def fixed_window(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Returns True if request is allowed."""
    current_window = int(time.time() // window)
    key = f"ratelimit:fw:{user_id}:{current_window}"

    current = r.incr(key)
    if current == 1:
        r.expire(key, window)

    return current <= limit

Pros: Simple, low memory, fast.

Cons: Burst at window boundaries. If the window resets at :00 and a user sends 100 requests at :59, they can send another 100 at :00 — 200 requests in 2 seconds.
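The boundary burst is easy to reproduce. Below is a minimal in-memory sketch of the same fixed-window logic (the InMemoryFixedWindow class is illustrative, not the Redis version above), showing 200 requests accepted within about one second:

```python
class InMemoryFixedWindow:
    """Single-process fixed-window counter, for demonstration only."""

    def __init__(self, limit: int, window: int):
        self.limit = limit
        self.window = window
        self.counts = {}  # window index -> request count

    def allow(self, now: float) -> bool:
        w = int(now // self.window)
        self.counts[w] = self.counts.get(w, 0) + 1
        return self.counts[w] <= self.limit

limiter = InMemoryFixedWindow(limit=100, window=60)

# 100 requests at t=59s (end of window 0) -- all allowed
burst1 = sum(limiter.allow(59.0) for _ in range(100))
# 100 more at t=60s (start of window 1) -- also all allowed
burst2 = sum(limiter.allow(60.0) for _ in range(100))
print(burst1 + burst2)  # 200 requests accepted across the boundary
```

Both bursts pass because the counter resets at the window boundary, which is exactly the weakness the sliding-window variants below address.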


2. Sliding Window Log

Track the exact timestamp of every request. Count requests in the trailing window.

import uuid

def sliding_window_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Precise but memory-intensive."""
    key = f"ratelimit:swl:{user_id}"
    now = time.time()

    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, now - window)
    # Add current request; the unique member prevents concurrent requests
    # with the same timestamp from overwriting each other
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    # Count entries in window
    pipe.zcard(key)
    # Set key expiry
    pipe.expire(key, window)

    results = pipe.execute()
    count = results[2]

    return count <= limit

Pros: Perfectly accurate, no boundary bursts.

Cons: High memory (stores every timestamp). For 1000 users making 100 requests/minute, that is 100K sorted set entries.
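For a single process, the same log can live in a deque instead of a Redis sorted set. A minimal sketch (the InMemorySlidingLog class is illustrative, not a library API):

```python
from collections import deque

class InMemorySlidingLog:
    """Single-process sliding-window log; same idea as the Redis sorted set."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that fell out of the trailing window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True

rl = InMemorySlidingLog(limit=3, window=60)
results = [rl.allow(t) for t in (0, 10, 20, 30, 61)]
print(results)  # [True, True, True, False, True] -- the t=0 entry expired by t=61
```

Note there is no fixed reset point: the request at t=30 is denied because three requests already sit in its trailing 60 seconds, but by t=61 the oldest has aged out.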

API gateway pattern: a single entry point handles auth, rate limiting, and routing to backend services.

3. Sliding Window Counter

A hybrid: use two fixed windows and weight them by overlap with the sliding window. Nearly as accurate as the log, nearly as cheap as fixed window.

def sliding_window_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Best balance of accuracy and efficiency."""
    now = time.time()
    current_window = int(now // window)
    previous_window = current_window - 1

    # How far into the current window are we? (0.0 to 1.0)
    window_progress = (now % window) / window

    current_key = f"ratelimit:swc:{user_id}:{current_window}"
    previous_key = f"ratelimit:swc:{user_id}:{previous_window}"

    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(previous_key)
    results = pipe.execute()

    current_count = int(results[0] or 0)
    previous_count = int(results[1] or 0)

    # Weighted estimate: full current window + proportional previous window
    estimated = current_count + previous_count * (1 - window_progress)

    if estimated >= limit:
        return False

    # Increment current window. Note the check-then-increment is not atomic:
    # concurrent requests may slightly overshoot the limit, which is
    # acceptable for an approximate algorithm.
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window * 2)
    pipe.execute()

    return True

Pros: Low memory (two counters per user), minimal boundary burst.

Cons: Slightly approximate (within 0.003% error in practice).
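The weighted estimate is simple arithmetic. For example, 15 seconds into the current minute (progress 0.25), with 80 requests in the previous window and 30 so far in the current one:

```python
def weighted_estimate(current_count: int, previous_count: int,
                      now: float, window: int = 60) -> float:
    """The estimate computed inside sliding_window_counter above."""
    progress = (now % window) / window
    return current_count + previous_count * (1 - progress)

# 15s into the current minute: 75% of the previous window still overlaps
est = weighted_estimate(30, 80, now=75, window=60)
print(est)  # 30 + 80 * 0.75 = 90.0
```

With a limit of 100, this request is allowed; a pure fixed-window counter would only see the 30 current-window requests and ignore the recent burst entirely.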

4. Token Bucket

A bucket holds tokens. Each request consumes a token. Tokens are added at a fixed rate. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst limit).

def token_bucket(user_id: str, rate: float = 10, capacity: int = 20) -> bool:
    """
    rate: tokens added per second
    capacity: max tokens (burst size)
    """
    key = f"ratelimit:tb:{user_id}"
    now = time.time()

    # Lua script for atomic token bucket
    lua_script = """
    local key = KEYS[1]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now

    -- Add tokens based on time elapsed
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * rate)

    if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
        return 1
    else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 0
    end
    """

    result = r.eval(lua_script, 1, key, rate, capacity, now)
    return result == 1

Pros: Allows controlled bursts, smooth rate enforcement, predictable behavior.

Cons: More complex, requires atomic operations (Lua script).
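When all traffic flows through a single process, the same refill logic works without Redis or Lua. A minimal in-memory sketch (the TokenBucket class here is illustrative, mirroring the script above):

```python
import time

class TokenBucket:
    """Single-process token bucket; the Lua script above is the distributed analogue."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst size
        self.tokens = capacity      # start full
        self.last_refill = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=10, capacity=20)
tb.last_refill = 0.0  # pin the clock for the demo

# Burst of 25 at t=0: the first 20 drain the bucket, the last 5 are denied
allowed = sum(tb.allow(now=0.0) for _ in range(25))
print(allowed)  # 20

# One second later, 10 tokens have been refilled
refilled = tb.allow(now=1.0)
print(refilled)  # True
```

This makes the burst semantics concrete: capacity bounds the worst-case burst, rate bounds the sustained throughput.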

Choosing the Right Algorithm

Algorithm              | Use Case                          | Memory   | Accuracy
-----------------------|-----------------------------------|----------|---------
Fixed Window           | Internal APIs, simple rate limits | Very Low | Low
Sliding Window Log     | Strict compliance, billing        | High     | Perfect
Sliding Window Counter | Most API rate limiting            | Low      | High
Token Bucket           | APIs with burst tolerance         | Low      | High

Our recommendation: Start with sliding window counter. It gives the best balance for nearly all cases. Switch to token bucket if you need explicit burst control.

Production Implementation

A rate limiting middleware for FastAPI (the user-lookup and counter helpers are assumed to exist elsewhere in the codebase):

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class RateLimitMiddleware(BaseHTTPMiddleware):
    TIERS = {
        "free":       {"limit": 100,   "window": 3600},  # 100/hour
        "pro":        {"limit": 1000,  "window": 3600},  # 1000/hour
        "enterprise": {"limit": 10000, "window": 3600},  # 10000/hour
    }

    async def dispatch(self, request: Request, call_next):
        # Extract user and tier
        api_key = request.headers.get("X-API-Key")
        if not api_key:
            # Exceptions raised inside BaseHTTPMiddleware bypass FastAPI's
            # exception handlers, so return a response directly
            return JSONResponse({"detail": "API key required"}, status_code=401)

        user = await get_user_by_key(api_key)
        tier = self.TIERS.get(user.plan, self.TIERS["free"])

        # Check rate limit
        allowed = sliding_window_counter(
            user_id=user.id,
            limit=tier["limit"],
            window=tier["window"]
        )

        if not allowed:
            remaining_time = get_reset_time(user.id)
            return JSONResponse(
                {"detail": "Rate limit exceeded"},
                status_code=429,
                headers={
                    "X-RateLimit-Limit": str(tier["limit"]),
                    "X-RateLimit-Remaining": "0",
                    "X-RateLimit-Reset": str(remaining_time),
                    "Retry-After": str(remaining_time),
                },
            )

        response = await call_next(request)

        # Add rate limit headers to all responses
        remaining = get_remaining(user.id, tier["limit"], tier["window"])
        response.headers["X-RateLimit-Limit"] = str(tier["limit"])
        response.headers["X-RateLimit-Remaining"] = str(remaining)

        return response

Rate Limit Headers (Standard)

Always include these headers in API responses:

X-RateLimit-Limit: 1000        # Max requests in window
X-RateLimit-Remaining: 847     # Requests left
X-RateLimit-Reset: 1699999999  # Unix timestamp when window resets
Retry-After: 30                # Seconds to wait (only on 429)
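A well-behaved client can use these headers to decide how long to back off after a 429. A minimal sketch (retry_wait_seconds is a hypothetical helper, not part of any library):

```python
def retry_wait_seconds(headers: dict, now: float) -> float:
    """Seconds a client should wait after a 429.
    Prefers Retry-After (seconds); falls back to X-RateLimit-Reset (Unix timestamp)."""
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    if "X-RateLimit-Reset" in headers:
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return 1.0  # conservative default when the server gives no hint

print(retry_wait_seconds({"Retry-After": "30"}, now=0))  # 30.0
print(retry_wait_seconds({"X-RateLimit-Reset": "1700000030"}, now=1700000000))  # 30.0
```

Note the two headers have different units: Retry-After is a duration in seconds, while X-RateLimit-Reset is an absolute Unix timestamp, so the fallback must subtract the current time.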

Rate limiting is a pillar of API design. Get it right and your API is resilient and fair. Get it wrong and you are one bot away from downtime. At TechSaaS, rate limiting is part of every API platform we build.

#rate-limiting #api #redis #algorithms #platform-engineering #security
