Rate Limiting Patterns: Token Bucket, Sliding Window, and More
Master rate limiting algorithms for APIs. Token bucket, sliding window, fixed window, and leaky bucket explained with Redis implementations and benchmarks.
Why Rate Limiting Matters
Without rate limiting, a single user (or bot) can exhaust your API server's resources, degrade service for everyone else, and run up your infrastructure costs. Rate limiting is not optional for production APIs — it is a fundamental requirement alongside authentication and logging.
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 220" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="220" rx="12" fill="#1a1a2e"/><rect x="230" y="15" width="140" height="35" rx="8" fill="#6366f1" opacity="0.9"/><text x="300" y="38" text-anchor="middle" fill="#ffffff" font-size="12" font-family="system-ui" font-weight="bold">API Gateway</text><rect x="30" y="80" width="100" height="50" rx="8" fill="#3b82f6" opacity="0.8"/><text x="80" y="100" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Auth</text><text x="80" y="115" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Service</text><rect x="160" y="80" width="100" height="50" rx="8" fill="#a855f7" opacity="0.8"/><text x="210" y="100" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">User</text><text x="210" y="115" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Service</text><rect x="290" y="80" width="100" height="50" rx="8" fill="#2dd4bf" opacity="0.8"/><text x="340" y="100" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Order</text><text x="340" y="115" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Service</text><rect x="420" y="80" width="100" height="50" rx="8" fill="#f59e0b" opacity="0.8"/><text x="470" y="100" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Payment</text><text x="470" y="115" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Service</text><line x1="265" y1="50" x2="80" y2="78" stroke="#e2e8f0" stroke-width="1" opacity="0.5"/><line x1="285" y1="50" x2="210" y2="78" stroke="#e2e8f0" stroke-width="1" opacity="0.5"/><line x1="315" y1="50" x2="340" y2="78" stroke="#e2e8f0" stroke-width="1" opacity="0.5"/><line x1="335" y1="50" x2="470" y2="78" stroke="#e2e8f0" stroke-width="1" opacity="0.5"/><ellipse cx="80" cy="175" rx="35" ry="12" fill="none" stroke="#3b82f6" stroke-width="1.5"/><line x1="45" y1="175" x2="45" y2="190" stroke="#3b82f6" stroke-width="1.5"/><line x1="115" y1="175" x2="115" y2="190" stroke="#3b82f6" stroke-width="1.5"/><ellipse cx="80" cy="190" rx="35" ry="12" fill="none" stroke="#3b82f6" stroke-width="1.5"/><line x1="80" y1="130" x2="80" y2="163" stroke="#94a3b8" stroke-width="1" stroke-dasharray="3,3"/><ellipse cx="340" cy="175" rx="35" ry="12" fill="none" stroke="#2dd4bf" stroke-width="1.5"/><line x1="305" y1="175" x2="305" y2="190" stroke="#2dd4bf" stroke-width="1.5"/><line x1="375" y1="175" x2="375" y2="190" stroke="#2dd4bf" stroke-width="1.5"/><ellipse cx="340" cy="190" rx="35" ry="12" fill="none" stroke="#2dd4bf" stroke-width="1.5"/><line x1="340" y1="130" x2="340" y2="163" stroke="#94a3b8" stroke-width="1" stroke-dasharray="3,3"/><rect x="155" y="160" width="150" height="30" rx="6" fill="#a855f7" opacity="0.3"/><text x="230" y="180" text-anchor="middle" fill="#a855f7" font-size="10" font-family="system-ui">Message Bus / Events</text><line x1="210" y1="130" x2="210" y2="158" stroke="#94a3b8" stroke-width="1" stroke-dasharray="3,3"/><line x1="470" y1="130" x2="470" y2="175" stroke="#94a3b8" stroke-width="1" stroke-dasharray="3,3"/><line x1="305" y1="175" x2="470" y2="175" stroke="#94a3b8" stroke-width="0.5" stroke-dasharray="3,3" opacity="0.3"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Microservices architecture: independent services communicate through an API gateway and event bus.</p></div>
The Four Major Algorithms
1. Fixed Window Counter
The simplest approach: count requests in fixed time windows (e.g., 100 requests per minute). Reset the counter at the start of each window.
import redis
import time
r = redis.Redis()
def fixed_window(user_id: str, limit: int = 100, window: int = 60) -> bool:
"""Returns True if request is allowed."""
current_window = int(time.time() // window)
key = f"ratelimit:fw:{user_id}:{current_window}"
current = r.incr(key)
if current == 1:
r.expire(key, window)
return current <= limitPros: Simple, low memory, fast.
Cons: Burst at window boundaries. If the window resets at :00 and a user sends 100 requests at :59, they can send another 100 at :00 — 200 requests in 2 seconds.
2. Sliding Window Log
Track the exact timestamp of every request. Count requests in the trailing window.
def sliding_window_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
"""Precise but memory-intensive."""
key = f"ratelimit:swl:{user_id}"
now = time.time()
pipe = r.pipeline()
# Remove entries outside the window
pipe.zremrangebyscore(key, 0, now - window)
# Add current request
pipe.zadd(key, {str(now): now})
# Count entries in window
pipe.zcard(key)
# Set key expiry
pipe.expire(key, window)
results = pipe.execute()
count = results[2]
return count <= limitPros: Perfectly accurate, no boundary bursts.
Cons: High memory (stores every timestamp). For 1000 users making 100 requests/minute, that is 100K sorted set entries.
<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 180" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="180" rx="12" fill="#1a1a2e"/><rect x="20" y="20" width="70" height="35" rx="6" fill="#3b82f6" opacity="0.8"/><text x="55" y="42" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Web</text><rect x="20" y="65" width="70" height="35" rx="6" fill="#3b82f6" opacity="0.8"/><text x="55" y="87" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Mobile</text><rect x="20" y="110" width="70" height="35" rx="6" fill="#3b82f6" opacity="0.8"/><text x="55" y="132" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">IoT</text><rect x="150" y="20" width="120" height="130" rx="10" fill="#6366f1" opacity="0.9"/><text x="210" y="50" text-anchor="middle" fill="#ffffff" font-size="12" font-family="system-ui" font-weight="bold">Gateway</text><line x1="165" y1="60" x2="255" y2="60" stroke="#ffffff" stroke-width="0.5" opacity="0.3"/><text x="210" y="80" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Rate Limit</text><text x="210" y="95" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Auth</text><text x="210" y="110" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Load Balance</text><text x="210" y="125" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Transform</text><text x="210" y="140" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">Cache</text><rect x="340" y="15" width="95" height="35" rx="6" fill="#a855f7" opacity="0.8"/><text x="387" y="37" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Service A</text><rect x="340" y="60" width="95" height="35" rx="6" fill="#2dd4bf" opacity="0.8"/><text x="387" y="82" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Service B</text><rect x="340" y="105" width="95" height="35" rx="6" fill="#f59e0b" opacity="0.8"/><text x="387" y="127" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Service C</text><rect x="490" y="55" width="80" height="45" rx="6" fill="none" stroke="#e2e8f0" stroke-width="1"/><text x="530" y="82" text-anchor="middle" fill="#e2e8f0" font-size="10" font-family="system-ui">DB / Cache</text><defs><marker id="arrow7" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6" fill="#e2e8f0"/></marker></defs><line x1="92" y1="37" x2="148" y2="55" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/><line x1="92" y1="82" x2="148" y2="85" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/><line x1="92" y1="127" x2="148" y2="115" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/><line x1="272" y1="55" x2="338" y2="32" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/><line x1="272" y1="85" x2="338" y2="77" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/><line x1="272" y1="115" x2="338" y2="122" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/><line x1="437" y1="77" x2="488" y2="77" stroke="#e2e8f0" stroke-width="1" marker-end="url(#arrow7)"/></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">API gateway pattern: a single entry point handles auth, rate limiting, and routing to backend services.</p></div>
3. Sliding Window Counter
A hybrid: use two fixed windows and weight them by overlap with the sliding window. Nearly as accurate as the log, nearly as cheap as fixed window.
def sliding_window_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
"""Best balance of accuracy and efficiency."""
now = time.time()
current_window = int(now // window)
previous_window = current_window - 1
# How far into the current window are we? (0.0 to 1.0)
window_progress = (now % window) / window
current_key = f"ratelimit:swc:{user_id}:{current_window}"
previous_key = f"ratelimit:swc:{user_id}:{previous_window}"
pipe = r.pipeline()
pipe.get(current_key)
pipe.get(previous_key)
results = pipe.execute()
current_count = int(results[0] or 0)
previous_count = int(results[1] or 0)
# Weighted estimate: full current window + proportional previous window
estimated = current_count + previous_count * (1 - window_progress)
if estimated >= limit:
return False
# Increment current window
pipe = r.pipeline()
pipe.incr(current_key)
pipe.expire(current_key, window * 2)
pipe.execute()
return TruePros: Low memory (two counters per user), minimal boundary burst.
Cons: Slightly approximate (within 0.003% error in practice).
4. Token Bucket
A bucket holds tokens. Each request consumes a token. Tokens are added at a fixed rate. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst limit).
def token_bucket(user_id: str, rate: float = 10, capacity: int = 20) -> bool:
"""
rate: tokens added per second
capacity: max tokens (burst size)
"""
key = f"ratelimit:tb:{user_id}"
now = time.time()
# Lua script for atomic token bucket
lua_script = """
local key = KEYS[1]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
-- Add tokens based on time elapsed
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * rate)
if tokens >= 1 then
tokens = tokens - 1
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return 1
else
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
return 0
end
"""
result = r.eval(lua_script, 1, key, rate, capacity, now)
return result == 1Pros: Allows controlled bursts, smooth rate enforcement, predictable behavior.
Cons: More complex, requires atomic operations (Lua script).
Choosing the Right Algorithm
|-----------|----------|--------|----------|
Our recommendation: Start with sliding window counter. It gives the best balance for nearly all cases. Switch to token bucket if you need explicit burst control.
Production Implementation
A complete rate limiting middleware for FastAPI:
from fastapi import Request, HTTPException
from starlette.middleware.base import BaseHTTPMiddleware
class RateLimitMiddleware(BaseHTTPMiddleware):
TIERS = {
"free": {"limit": 100, "window": 3600}, # 100/hour
"pro": {"limit": 1000, "window": 3600}, # 1000/hour
"enterprise": {"limit": 10000, "window": 3600}, # 10000/hour
}
async def dispatch(self, request: Request, call_next):
# Extract user and tier
api_key = request.headers.get("X-API-Key")
if not api_key:
raise HTTPException(401, "API key required")
user = await get_user_by_key(api_key)
tier = self.TIERS.get(user.plan, self.TIERS["free"])
# Check rate limit
allowed = sliding_window_counter(
user_id=user.id,
limit=tier["limit"],
window=tier["window"]
)
if not allowed:
remaining_time = get_reset_time(user.id)
raise HTTPException(
429,
detail="Rate limit exceeded",
headers={
"X-RateLimit-Limit": str(tier["limit"]),
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": str(remaining_time),
"Retry-After": str(remaining_time)
}
)
response = await call_next(request)
# Add rate limit headers to all responses
remaining = get_remaining(user.id, tier["limit"], tier["window"])
response.headers["X-RateLimit-Limit"] = str(tier["limit"])
response.headers["X-RateLimit-Remaining"] = str(remaining)
return response<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 170" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="170" rx="12" fill="#1a1a2e"/><circle cx="60" cy="85" r="25" fill="#f59e0b" opacity="0.85"/><text x="60" y="82" text-anchor="middle" fill="#1a1a2e" font-size="9" font-family="system-ui" font-weight="bold">Trigger</text><text x="60" y="94" text-anchor="middle" fill="#1a1a2e" font-size="8" font-family="system-ui">webhook</text><polygon points="175,55 210,85 175,115 140,85" fill="#6366f1" opacity="0.85"/><text x="175" y="88" text-anchor="middle" fill="#ffffff" font-size="9" font-family="system-ui">If</text><rect x="250" y="35" width="100" height="40" rx="6" fill="#2dd4bf" opacity="0.85"/><text x="300" y="55" text-anchor="middle" fill="#1a1a2e" font-size="10" font-family="system-ui">Send Email</text><text x="300" y="67" text-anchor="middle" fill="#1a1a2e" font-size="8" font-family="system-ui">SMTP</text><rect x="250" y="95" width="100" height="40" rx="6" fill="#a855f7" opacity="0.85"/><text x="300" y="115" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Log Event</text><text x="300" y="127" text-anchor="middle" fill="#ffffff" font-size="8" font-family="system-ui">database</text><rect x="400" y="55" width="100" height="40" rx="6" fill="#3b82f6" opacity="0.85"/><text x="450" y="75" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Update CRM</text><text x="450" y="87" text-anchor="middle" fill="#ffffff" font-size="8" font-family="system-ui">API call</text><circle cx="545" cy="75" r="18" fill="none" stroke="#2dd4bf" stroke-width="2"/><text x="545" y="79" text-anchor="middle" fill="#2dd4bf" font-size="9" font-family="system-ui">Done</text><defs><marker id="arrow10" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6" fill="#e2e8f0"/></marker></defs><line x1="87" y1="85" x2="138" y2="85" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow10)"/><line x1="210" y1="72" x2="248" y2="55" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow10)"/><line x1="210" y1="98" x2="248" y2="115" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow10)"/><line x1="352" y1="55" x2="398" y2="68" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow10)"/><line x1="352" y1="115" x2="398" y2="82" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow10)"/><line x1="502" y1="75" x2="525" y2="75" stroke="#e2e8f0" stroke-width="1.5" marker-end="url(#arrow10)"/><text x="225" y="45" text-anchor="middle" fill="#2dd4bf" font-size="8" font-family="system-ui">true</text><text x="225" y="120" text-anchor="middle" fill="#a855f7" font-size="8" font-family="system-ui">false</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Workflow automation: triggers, conditions, and actions chain together to eliminate manual processes.</p></div>
Rate Limit Headers (Standard)
Always include these headers in API responses:
X-RateLimit-Limit: 1000 # Max requests in window
X-RateLimit-Remaining: 847 # Requests left
X-RateLimit-Reset: 1699999999 # Unix timestamp when window resets
Retry-After: 30 # Seconds to wait (only on 429)Rate limiting is a pillar of API design. Get it right and your API is resilient and fair. Get it wrong and you are one bot away from downtime. At TechSaaS, rate limiting is part of every API platform we build.
Need help with platform engineering?
TechSaaS provides expert consulting and managed services for cloud infrastructure, DevOps, and AI/ML operations.