Rate Limiting Patterns: Token Bucket, Sliding Window, and More
Master rate limiting algorithms for APIs: fixed window, sliding window log, sliding window counter, and token bucket, explained with working Redis implementations.
Why Rate Limiting Matters
Without rate limiting, a single user (or bot) can exhaust your API server's resources, degrade service for everyone else, and run up your infrastructure costs. Rate limiting is not optional for production APIs — it is a fundamental requirement alongside authentication and logging.
The Four Major Algorithms
1. Fixed Window Counter
The simplest approach: count requests in fixed time windows (e.g., 100 requests per minute). Reset the counter at the start of each window.
import redis
import time

r = redis.Redis()

def fixed_window(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Returns True if the request is allowed."""
    current_window = int(time.time() // window)
    key = f"ratelimit:fw:{user_id}:{current_window}"
    current = r.incr(key)
    if current == 1:
        # First request in this window: set a TTL so the key cleans itself up
        r.expire(key, window)
    return current <= limit
Pros: Simple, low memory, fast.
Cons: Bursts at window boundaries. If the window resets at :00 and a user sends 100 requests at :59, they can send another 100 at :00, for 200 requests in roughly two seconds.
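The boundary burst is easy to reproduce with an in-process version of the same counter (a sketch for illustration only; it is not distributed and uses an explicit clock instead of time.time()):

```python
class FixedWindowCounter:
    """In-process fixed window counter, for demonstrating boundary bursts."""

    def __init__(self, limit: int = 100, window: int = 60):
        self.limit = limit
        self.window = window
        self.counts: dict[int, int] = {}

    def allow(self, now: float) -> bool:
        # Same logic as the Redis version: bucket by window index, count, compare
        w = int(now // self.window)
        self.counts[w] = self.counts.get(w, 0) + 1
        return self.counts[w] <= self.limit

rl = FixedWindowCounter(limit=100, window=60)
# 100 requests at t=59 (end of window 0) all pass...
late_burst = sum(rl.allow(59.0) for _ in range(100))
# ...and 100 more at t=60 (start of window 1) also pass: 200 in ~1 second.
early_burst = sum(rl.allow(60.0) for _ in range(100))
print(late_burst, early_burst)  # 100 100
```

Both bursts are admitted in full, which is exactly the weakness the sliding window variants below are designed to close.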
2. Sliding Window Log
Track the exact timestamp of every request. Count requests in the trailing window.
def sliding_window_log(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Precise but memory-intensive."""
    key = f"ratelimit:swl:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    # Remove entries outside the window
    pipe.zremrangebyscore(key, 0, now - window)
    # Add current request
    pipe.zadd(key, {str(now): now})
    # Count entries in window
    pipe.zcard(key)
    # Set key expiry
    pipe.expire(key, window)
    results = pipe.execute()
    count = results[2]
    return count <= limit
Pros: Perfectly accurate, no boundary bursts.
Cons: High memory (stores every timestamp). For 1000 users making 100 requests/minute, that is 100K sorted set entries.
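A rough back-of-envelope check of that claim (the per-entry overhead here is an assumption; actual Redis memory depends on the sorted-set encoding):

```python
users = 1_000
requests_per_minute = 100
window_entries = users * requests_per_minute   # timestamps held at any moment
bytes_per_entry = 85                           # assumed sorted-set member overhead
approx_mb = window_entries * bytes_per_entry / 1_000_000
print(window_entries, round(approx_mb, 1))  # 100000 8.5
```

A few megabytes is tolerable at this scale, but the cost grows linearly with both user count and request rate, which is why the log approach is usually reserved for strict-accuracy cases.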
3. Sliding Window Counter
A hybrid: use two fixed windows and weight them by overlap with the sliding window. Nearly as accurate as the log, nearly as cheap as fixed window.
def sliding_window_counter(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Best balance of accuracy and efficiency."""
    now = time.time()
    current_window = int(now // window)
    previous_window = current_window - 1
    # How far into the current window are we? (0.0 to 1.0)
    window_progress = (now % window) / window
    current_key = f"ratelimit:swc:{user_id}:{current_window}"
    previous_key = f"ratelimit:swc:{user_id}:{previous_window}"
    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(previous_key)
    results = pipe.execute()
    current_count = int(results[0] or 0)
    previous_count = int(results[1] or 0)
    # Weighted estimate: full current window + proportional previous window
    estimated = current_count + previous_count * (1 - window_progress)
    if estimated >= limit:
        return False
    # Increment current window
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window * 2)
    pipe.execute()
    return True
Pros: Low memory (two counters per user), minimal boundary burst.
Cons: Slightly approximate, since it assumes requests in the previous window were evenly spread (within 0.003% error in practice).
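The weighted estimate is easier to trust once you see it with concrete numbers. Halfway through the current window, half of the previous window's count still falls inside the trailing 60 seconds (this helper just isolates the arithmetic from the code above):

```python
def estimate(current_count: int, previous_count: int,
             now: float, window: int = 60) -> float:
    """Weighted request count for the trailing window."""
    window_progress = (now % window) / window
    return current_count + previous_count * (1 - window_progress)

# 30s into the current window: 30 current requests + half of the 80 previous
print(estimate(current_count=30, previous_count=80, now=90.0))  # 70.0
```

As the window progresses, the previous window's contribution decays linearly to zero, which is what smooths out the fixed window's boundary burst.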
4. Token Bucket
A bucket holds tokens. Each request consumes a token. Tokens are added at a fixed rate. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst limit).
def token_bucket(user_id: str, rate: float = 10, capacity: int = 20) -> bool:
    """
    rate: tokens added per second
    capacity: max tokens (burst size)
    """
    key = f"ratelimit:tb:{user_id}"
    now = time.time()
    # Lua script for atomic token bucket
    lua_script = """
    local key = KEYS[1]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now
    -- Add tokens based on time elapsed
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * rate)
    if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
        return 1
    else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 0
    end
    """
    result = r.eval(lua_script, 1, key, rate, capacity, now)
    return result == 1
Pros: Allows controlled bursts, smooth rate enforcement, predictable behavior.
Cons: More complex, requires atomic operations (Lua script).
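For a single-process service, the same algorithm works without Redis. This in-memory sketch mirrors the Lua script's refill logic (illustration only: it is not safe across multiple servers, and takes an explicit clock value rather than calling time.time()):

```python
class TokenBucket:
    """In-process token bucket mirroring the Lua script's refill logic."""

    def __init__(self, rate: float = 10, capacity: int = 20):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # max tokens (burst size)
        self.tokens = float(capacity)
        self.last_refill = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=10, capacity=20)
burst = sum(tb.allow(0.0) for _ in range(25))  # only the 20-token burst passes
print(burst, tb.allow(0.1))  # 20 True  (0.1s at rate=10 refills one token)
```

Note how capacity bounds the burst while rate bounds the sustained throughput: the two knobs are independent, which is the main reason to prefer token bucket over a pure window counter.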
Choosing the Right Algorithm
| Algorithm | Use Case | Memory | Accuracy |
|---|---|---|---|
| Fixed Window | Internal APIs, simple rate limits | Very Low | Low |
| Sliding Window Log | Strict compliance, billing | High | Perfect |
| Sliding Window Counter | Most API rate limiting | Low | High |
| Token Bucket | APIs with burst tolerance | Low | High |
Our recommendation: Start with sliding window counter. It gives the best balance for nearly all cases. Switch to token bucket if you need explicit burst control.
Production Implementation
A rate limiting middleware for FastAPI (get_user_by_key, get_reset_time, and get_remaining are app-specific helpers you supply). Note that exceptions raised inside BaseHTTPMiddleware bypass FastAPI's exception handlers, so the middleware returns error responses directly instead of raising HTTPException:
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class RateLimitMiddleware(BaseHTTPMiddleware):
    TIERS = {
        "free": {"limit": 100, "window": 3600},          # 100/hour
        "pro": {"limit": 1000, "window": 3600},          # 1000/hour
        "enterprise": {"limit": 10000, "window": 3600},  # 10000/hour
    }

    async def dispatch(self, request: Request, call_next):
        # Extract user and tier
        api_key = request.headers.get("X-API-Key")
        if not api_key:
            return JSONResponse({"detail": "API key required"}, status_code=401)
        user = await get_user_by_key(api_key)
        tier = self.TIERS.get(user.plan, self.TIERS["free"])
        # Check rate limit
        allowed = sliding_window_counter(
            user_id=user.id,
            limit=tier["limit"],
            window=tier["window"],
        )
        if not allowed:
            remaining_time = get_reset_time(user.id)
            return JSONResponse(
                {"detail": "Rate limit exceeded"},
                status_code=429,
                headers={
                    "X-RateLimit-Limit": str(tier["limit"]),
                    "X-RateLimit-Remaining": "0",
                    "X-RateLimit-Reset": str(remaining_time),
                    "Retry-After": str(remaining_time),
                },
            )
        response = await call_next(request)
        # Add rate limit headers to all responses
        remaining = get_remaining(user.id, tier["limit"], tier["window"])
        response.headers["X-RateLimit-Limit"] = str(tier["limit"])
        response.headers["X-RateLimit-Remaining"] = str(remaining)
        return response
Rate Limit Headers (De Facto Standard)
Always include these headers in API responses:
X-RateLimit-Limit: 1000 # Max requests in window
X-RateLimit-Remaining: 847 # Requests left
X-RateLimit-Reset: 1699999999 # Unix timestamp when window resets
Retry-After: 30 # Seconds to wait (only on 429)
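These headers are what makes polite client-side backoff possible. A small hypothetical helper (retry_after_seconds is not from any library; it simply reads the headers listed above):

```python
def retry_after_seconds(status_code: int, headers: dict[str, str],
                        default: float = 1.0) -> float:
    """How long a client should wait before retrying, per the 429 headers."""
    if status_code != 429:
        return 0.0  # not rate limited, no need to wait
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default  # header missing or malformed

print(retry_after_seconds(429, {"Retry-After": "30"}))  # 30.0
print(retry_after_seconds(200, {}))                     # 0.0
```

Well-behaved SDKs sleep for this duration (often with jitter) before retrying, which keeps rejected clients from hammering the limiter.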
Rate limiting is a pillar of API design. Get it right and your API is resilient and fair. Get it wrong and you are one bot away from downtime. At TechSaaS, rate limiting is part of every API platform we build.