Part 26: Rate Limiting - Protecting Systems from Overload

"The art of rate limiting isn't about saying 'no' - it's about saying 'not right now' while keeping the system healthy for everyone."
Welcome to Part 26 of our distributed systems course! After mastering bulkheads, we now explore rate limiting - a crucial technique for protecting services from being overwhelmed.

Why Rate Limiting?

Rate limiting serves multiple purposes:
  • Protection: Prevent resource exhaustion and maintain system stability
  • Fairness: Ensure equitable access among users
  • Cost Control: Limit expensive operations
  • Security: Mitigate denial-of-service attacks
  • Compliance: Enforce usage quotas and SLAs

Token Bucket Algorithm

The token bucket is the most versatile rate limiting algorithm: it permits bursts up to the bucket's capacity while enforcing a sustained refill rate. With a capacity of 100 and a refill rate of 10 tokens/second, a client can fire 100 requests at once, but over time averages at most 10 requests/second:
```go
package ratelimit

import (
	"context"
	"errors"
	"sync"
	"time"
)

var ErrRateLimitExceeded = errors.New("rate limit exceeded")

// TokenBucket implements the token bucket algorithm.
type TokenBucket struct {
	capacity   float64 // Maximum tokens
	tokens     float64 // Current tokens (may go negative while reservations are pending)
	refillRate float64 // Tokens per second
	lastRefill time.Time
	mu         sync.Mutex
}

// TokenBucketConfig configures the token bucket.
type TokenBucketConfig struct {
	Capacity   float64 // Maximum burst size
	RefillRate float64 // Tokens per second (sustained rate)
}

// NewTokenBucket creates a new token bucket.
func NewTokenBucket(cfg TokenBucketConfig) *TokenBucket {
	return &TokenBucket{
		capacity:   cfg.Capacity,
		tokens:     cfg.Capacity, // Start full
		refillRate: cfg.RefillRate,
		lastRefill: time.Now(),
	}
}

// Allow checks if n tokens are available and consumes them.
func (tb *TokenBucket) Allow(n float64) bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	tb.refill()

	if tb.tokens >= n {
		tb.tokens -= n
		return true
	}
	return false
}

// AllowN consumes n tokens, waiting for them if necessary.
func (tb *TokenBucket) AllowN(ctx context.Context, n float64) error {
	tb.mu.Lock()
	tb.refill()

	if tb.tokens >= n {
		tb.tokens -= n
		tb.mu.Unlock()
		return nil
	}

	// Calculate wait time
	deficit := n - tb.tokens
	waitTime := time.Duration(deficit / tb.refillRate * float64(time.Second))
	tb.mu.Unlock()

	// Wait for tokens
	timer := time.NewTimer(waitTime)
	defer timer.Stop()

	select {
	case <-timer.C:
		return tb.AllowN(ctx, n)
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Reservation represents reserved tokens and the time they become usable.
type Reservation struct {
	ok        bool
	tokens    float64
	timeToAct time.Time
	tb        *TokenBucket
}

// Reserve reserves n tokens and returns a reservation.
func (tb *TokenBucket) Reserve(n float64) *Reservation {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	tb.refill()

	if tb.tokens >= n {
		tb.tokens -= n
		return &Reservation{
			ok:        true,
			tokens:    n,
			timeToAct: time.Now(),
			tb:        tb,
		}
	}

	// Calculate when tokens will be available
	deficit := n - tb.tokens
	waitTime := time.Duration(deficit / tb.refillRate * float64(time.Second))
	// Let the balance go negative so future refills pay the debt first;
	// zeroing it out here would hand the same tokens to two callers.
	tb.tokens -= n

	return &Reservation{
		ok:        true,
		tokens:    n,
		timeToAct: time.Now().Add(waitTime),
		tb:        tb,
	}
}

func (r *Reservation) Delay() time.Duration {
	return time.Until(r.timeToAct)
}

func (r *Reservation) Cancel() {
	if r.ok {
		r.tb.mu.Lock()
		r.tb.tokens += r.tokens
		if r.tb.tokens > r.tb.capacity {
			r.tb.tokens = r.tb.capacity
		}
		r.tb.mu.Unlock()
	}
}

func (tb *TokenBucket) refill() {
	now := time.Now()
	elapsed := now.Sub(tb.lastRefill).Seconds()

	tb.tokens += elapsed * tb.refillRate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	tb.lastRefill = now
}

// Status returns current bucket status.
func (tb *TokenBucket) Status() (tokens, capacity float64) {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	tb.refill()
	return tb.tokens, tb.capacity
}
```
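
A minimal usage sketch, assuming the types above are in the same package (the numbers are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// Burst up to 20 requests; refill at 5 tokens/second sustained.
	tb := NewTokenBucket(TokenBucketConfig{Capacity: 20, RefillRate: 5})

	// Fail fast if no token is available.
	if tb.Allow(1) {
		fmt.Println("request allowed")
	}

	// Or wait up to one second for a token before giving up.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	if err := tb.AllowN(ctx, 1); err != nil {
		fmt.Println("timed out waiting for a token:", err)
	}
}
```

For production use, Go's golang.org/x/time/rate package provides a hardened implementation of this same algorithm.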

Sliding Window Rate Limiter

A sliding window counts requests over a rolling interval, avoiding the boundary problem of fixed windows, where a client can send up to twice the limit by straddling a window edge:
```go
// SlidingWindowLimiter implements sliding window rate limiting.
type SlidingWindowLimiter struct {
	limit      int           // Max requests per window
	window     time.Duration // Window size
	timestamps []time.Time   // Request timestamps
	mu         sync.Mutex
}

// NewSlidingWindowLimiter creates a sliding window limiter.
func NewSlidingWindowLimiter(limit int, window time.Duration) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{
		limit:      limit,
		window:     window,
		timestamps: make([]time.Time, 0, limit),
	}
}

// Allow checks if a request is allowed.
func (swl *SlidingWindowLimiter) Allow() bool {
	swl.mu.Lock()
	defer swl.mu.Unlock()

	now := time.Now()
	windowStart := now.Add(-swl.window)

	// Remove expired timestamps in place
	valid := 0
	for _, ts := range swl.timestamps {
		if ts.After(windowStart) {
			swl.timestamps[valid] = ts
			valid++
		}
	}
	swl.timestamps = swl.timestamps[:valid]

	// Check limit
	if len(swl.timestamps) >= swl.limit {
		return false
	}

	// Add new timestamp
	swl.timestamps = append(swl.timestamps, now)
	return true
}

// Count returns the current request count in the window.
func (swl *SlidingWindowLimiter) Count() int {
	swl.mu.Lock()
	defer swl.mu.Unlock()

	now := time.Now()
	windowStart := now.Add(-swl.window)

	count := 0
	for _, ts := range swl.timestamps {
		if ts.After(windowStart) {
			count++
		}
	}
	return count
}

// SlidingWindowCounter is memory-efficient for high volumes.
type SlidingWindowCounter struct {
	limit     int
	window    time.Duration
	precision time.Duration // Sub-window precision
	buckets   map[int64]int
	mu        sync.Mutex
}

// NewSlidingWindowCounter creates a counter-based sliding window.
func NewSlidingWindowCounter(limit int, window, precision time.Duration) *SlidingWindowCounter {
	return &SlidingWindowCounter{
		limit:     limit,
		window:    window,
		precision: precision,
		buckets:   make(map[int64]int),
	}
}

func (swc *SlidingWindowCounter) Allow() bool {
	swc.mu.Lock()
	defer swc.mu.Unlock()

	now := time.Now()
	currentBucket := now.UnixNano() / int64(swc.precision)
	windowStart := now.Add(-swc.window).UnixNano() / int64(swc.precision)

	// Clean old buckets
	for bucket := range swc.buckets {
		if bucket < windowStart {
			delete(swc.buckets, bucket)
		}
	}

	// Count requests in window
	count := 0
	for bucket, n := range swc.buckets {
		if bucket >= windowStart {
			count += n
		}
	}

	if count >= swc.limit {
		return false
	}

	// Increment current bucket
	swc.buckets[currentBucket]++
	return true
}
```
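
The trade-off between the two variants is memory versus exactness: the log keeps one timestamp per allowed request, so a 10,000-request window costs 10,000 entries per key, while the counter keeps one integer per sub-window regardless of volume, at the cost of up to one sub-window of imprecision at the boundary. A brief sketch, with illustrative values:

```go
// Exact: stores up to 1000 timestamps per window.
exact := NewSlidingWindowLimiter(1000, time.Minute)

// Approximate: stores at most ~60 counters (one per 1s sub-window),
// however many requests arrive.
approx := NewSlidingWindowCounter(1000, time.Minute, time.Second)

fmt.Println(exact.Allow(), approx.Allow())
```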

Leaky Bucket Algorithm

The leaky bucket drains queued requests at a constant rate, so bursty arrivals leave as a smooth, predictable output stream:
```go
// LeakyBucket implements the leaky bucket algorithm.
type LeakyBucket struct {
	capacity  int           // Queue capacity
	leakRate  time.Duration // Time between leaks
	queue     chan struct{}
	stopCh    chan struct{}
	mu        sync.Mutex
	processed int64
	dropped   int64
}

// NewLeakyBucket creates a leaky bucket.
func NewLeakyBucket(capacity int, leakRate time.Duration) *LeakyBucket {
	lb := &LeakyBucket{
		capacity: capacity,
		leakRate: leakRate,
		queue:    make(chan struct{}, capacity),
		stopCh:   make(chan struct{}),
	}
	go lb.leak()
	return lb
}

// Add adds a request to the bucket, dropping it if the queue is full.
func (lb *LeakyBucket) Add() bool {
	select {
	case lb.queue <- struct{}{}:
		return true
	default:
		lb.mu.Lock()
		lb.dropped++
		lb.mu.Unlock()
		return false
	}
}

// AddWait waits for space in the bucket.
func (lb *LeakyBucket) AddWait(ctx context.Context) error {
	select {
	case lb.queue <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (lb *LeakyBucket) leak() {
	ticker := time.NewTicker(lb.leakRate)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			select {
			case <-lb.queue:
				lb.mu.Lock()
				lb.processed++
				lb.mu.Unlock()
			default:
				// Bucket empty
			}
		case <-lb.stopCh:
			return
		}
	}
}

// Stop stops the leaky bucket.
func (lb *LeakyBucket) Stop() {
	close(lb.stopCh)
}

// Stats returns bucket statistics.
func (lb *LeakyBucket) Stats() (processed, dropped int64, queueLen int) {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	return lb.processed, lb.dropped, len(lb.queue)
}
```
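
A usage sketch, assuming the types above are in scope. Note that in this design Add and AddWait admit a request as soon as queue space exists; the smoothing comes from the drain loop freeing one slot per tick:

```go
// Hold up to 50 pending requests; free one slot every 10ms (~100/s).
lb := NewLeakyBucket(50, 10*time.Millisecond)
defer lb.Stop()

ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
if err := lb.AddWait(ctx); err != nil {
	// Queue stayed full past the deadline: shed the request.
	return
}
// Proceed with the request.
```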

Distributed Rate Limiting with Redis

An in-memory limiter only sees one instance's traffic. For multi-instance deployments, back the limiter with a shared store such as Redis so every replica enforces the same global limit:
```go
import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// RedisRateLimiter provides distributed rate limiting.
type RedisRateLimiter struct {
	client *redis.Client
	prefix string
}

// NewRedisRateLimiter creates a Redis-backed rate limiter.
func NewRedisRateLimiter(client *redis.Client, prefix string) *RedisRateLimiter {
	return &RedisRateLimiter{
		client: client,
		prefix: prefix,
	}
}

// SlidingWindowAllow checks the rate limit using a sliding window log.
func (rl *RedisRateLimiter) SlidingWindowAllow(ctx context.Context, key string, limit int, window time.Duration) (bool, error) {
	now := time.Now()
	windowStart := now.Add(-window)
	redisKey := fmt.Sprintf("%s:%s", rl.prefix, key)

	// Lua script for an atomic check-and-add
	script := redis.NewScript(`
		local key = KEYS[1]
		local now = tonumber(ARGV[1])
		local window_start = tonumber(ARGV[2])
		local limit = tonumber(ARGV[3])
		local window_ms = tonumber(ARGV[4])

		-- Remove old entries
		redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

		-- Count current entries
		local count = redis.call('ZCARD', key)

		if count < limit then
			-- Add new entry
			redis.call('ZADD', key, now, now .. '-' .. math.random())
			redis.call('PEXPIRE', key, window_ms)
			return 1
		end

		return 0
	`)

	result, err := script.Run(ctx, rl.client,
		[]string{redisKey},
		now.UnixMilli(),
		windowStart.UnixMilli(),
		limit,
		window.Milliseconds(),
	).Int()
	if err != nil {
		return false, err
	}
	return result == 1, nil
}

// TokenBucketAllow implements a token bucket with Redis.
func (rl *RedisRateLimiter) TokenBucketAllow(ctx context.Context, key string, capacity, refillRate float64) (bool, error) {
	redisKey := fmt.Sprintf("%s:%s", rl.prefix, key)

	script := redis.NewScript(`
		local key = KEYS[1]
		local capacity = tonumber(ARGV[1])
		local refill_rate = tonumber(ARGV[2])
		local now = tonumber(ARGV[3])
		local requested = tonumber(ARGV[4])

		local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
		local tokens = tonumber(bucket[1]) or capacity
		local last_refill = tonumber(bucket[2]) or now

		-- Calculate tokens to add
		local elapsed = (now - last_refill) / 1000
		tokens = math.min(capacity, tokens + elapsed * refill_rate)

		local allowed = 0
		if tokens >= requested then
			tokens = tokens - requested
			allowed = 1
		end

		redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
		redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)
		return allowed
	`)

	result, err := script.Run(ctx, rl.client,
		[]string{redisKey},
		capacity,
		refillRate,
		time.Now().UnixMilli(),
		1.0, // Request 1 token
	).Int()
	if err != nil {
		return false, err
	}
	return result == 1, nil
}

// FixedWindowAllow implements fixed window rate limiting.
func (rl *RedisRateLimiter) FixedWindowAllow(ctx context.Context, key string, limit int, window time.Duration) (bool, int, error) {
	now := time.Now()
	windowKey := fmt.Sprintf("%s:%s:%d", rl.prefix, key, now.Unix()/int64(window.Seconds()))

	pipe := rl.client.Pipeline()
	incr := pipe.Incr(ctx, windowKey)
	pipe.Expire(ctx, windowKey, window)
	_, err := pipe.Exec(ctx)
	if err != nil {
		return false, 0, err
	}

	count := int(incr.Val())
	remaining := limit - count
	if remaining < 0 {
		remaining = 0
	}
	return count <= limit, remaining, nil
}
```

Multi-Tier Rate Limiting

Real APIs usually enforce several limits at once (per user, per IP, and a global ceiling), and a request must pass all of them:
```go
// MultiTierLimiter applies multiple rate limits.
type MultiTierLimiter struct {
	limiters map[string]*TokenBucket
	mu       sync.RWMutex
}

// Tier represents a rate limit tier.
type Tier struct {
	Name       string
	Capacity   float64
	RefillRate float64
}

// NewMultiTierLimiter creates a multi-tier limiter.
func NewMultiTierLimiter(tiers []Tier) *MultiTierLimiter {
	mtl := &MultiTierLimiter{
		limiters: make(map[string]*TokenBucket),
	}
	for _, tier := range tiers {
		mtl.limiters[tier.Name] = NewTokenBucket(TokenBucketConfig{
			Capacity:   tier.Capacity,
			RefillRate: tier.RefillRate,
		})
	}
	return mtl
}

// Allow checks all tiers.
func (mtl *MultiTierLimiter) Allow(tierValues map[string]float64) (bool, string) {
	mtl.mu.RLock()
	defer mtl.mu.RUnlock()

	for name, limiter := range mtl.limiters {
		tokens, ok := tierValues[name]
		if !ok {
			tokens = 1
		}
		if !limiter.Allow(tokens) {
			return false, name
		}
	}
	return true, ""
}

// Example: API with user-level, IP-level, and global limits
type APIRateLimiter struct {
	userLimiters  sync.Map // map[userID]*TokenBucket
	ipLimiters    sync.Map // map[IP]*TokenBucket
	globalLimiter *TokenBucket
	userConfig    TokenBucketConfig
	ipConfig      TokenBucketConfig
}

func NewAPIRateLimiter() *APIRateLimiter {
	return &APIRateLimiter{
		userConfig: TokenBucketConfig{
			Capacity:   100, // 100 requests burst
			RefillRate: 10,  // 10 requests/second sustained
		},
		ipConfig: TokenBucketConfig{
			Capacity:   200,
			RefillRate: 20,
		},
		globalLimiter: NewTokenBucket(TokenBucketConfig{
			Capacity:   10000,
			RefillRate: 1000,
		}),
	}
}

func (arl *APIRateLimiter) Allow(userID, ip string) (bool, string) {
	// Check global limit first
	if !arl.globalLimiter.Allow(1) {
		return false, "global"
	}

	// Check user limit
	userLimiter := arl.getOrCreateUserLimiter(userID)
	if !userLimiter.Allow(1) {
		return false, "user"
	}

	// Check IP limit
	ipLimiter := arl.getOrCreateIPLimiter(ip)
	if !ipLimiter.Allow(1) {
		return false, "ip"
	}

	return true, ""
}

func (arl *APIRateLimiter) getOrCreateUserLimiter(userID string) *TokenBucket {
	if limiter, ok := arl.userLimiters.Load(userID); ok {
		return limiter.(*TokenBucket)
	}
	limiter := NewTokenBucket(arl.userConfig)
	actual, _ := arl.userLimiters.LoadOrStore(userID, limiter)
	return actual.(*TokenBucket)
}

func (arl *APIRateLimiter) getOrCreateIPLimiter(ip string) *TokenBucket {
	if limiter, ok := arl.ipLimiters.Load(ip); ok {
		return limiter.(*TokenBucket)
	}
	limiter := NewTokenBucket(arl.ipConfig)
	actual, _ := arl.ipLimiters.LoadOrStore(ip, limiter)
	return actual.(*TokenBucket)
}
```
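
One caveat: the per-user and per-IP sync.Maps grow without bound, since an entry is created for every identity ever seen. A hedged sketch of idle-entry eviction (the trackedBucket wrapper and lastSeen field are illustrative additions, not part of APIRateLimiter above; it assumes imports of sync, sync/atomic, and time):

```go
// trackedBucket pairs a limiter with its last-use time so idle entries
// can be evicted (requires updating lastSeen on every Allow call).
type trackedBucket struct {
	bucket   *TokenBucket
	lastSeen atomic.Int64 // Unix nanoseconds of last use
}

// evictIdle deletes limiters unused for longer than maxIdle. Run it
// periodically from a background goroutine.
func evictIdle(m *sync.Map, maxIdle time.Duration) {
	cutoff := time.Now().Add(-maxIdle).UnixNano()
	m.Range(func(key, value any) bool {
		if tb, ok := value.(*trackedBucket); ok && tb.lastSeen.Load() < cutoff {
			m.Delete(key)
		}
		return true
	})
}
```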

Adaptive Rate Limiting

Static limits can be wrong in both directions. An adaptive limiter tightens the rate when error rates, latency, or resource usage cross thresholds, and relaxes it again as the system recovers:
```go
// AdaptiveRateLimiter adjusts limits based on metrics.
type AdaptiveRateLimiter struct {
	baseLimiter *TokenBucket
	baseRate    float64

	// Health metrics
	errorRate   *RollingAverage
	latencyP99  *RollingAverage
	cpuUsage    func() float64
	memoryUsage func() float64

	// Thresholds
	maxErrorRate   float64
	maxLatencyMs   float64
	maxCPUUsage    float64
	maxMemoryUsage float64

	// Adjustment
	minRate      float64
	maxRate      float64
	adjustFactor float64

	mu sync.Mutex
}

// RollingAverage calculates a rolling average.
type RollingAverage struct {
	values []float64
	index  int
	count  int
	mu     sync.Mutex
}

func NewRollingAverage(size int) *RollingAverage {
	return &RollingAverage{
		values: make([]float64, size),
	}
}

func (ra *RollingAverage) Add(value float64) {
	ra.mu.Lock()
	defer ra.mu.Unlock()

	ra.values[ra.index] = value
	ra.index = (ra.index + 1) % len(ra.values)
	if ra.count < len(ra.values) {
		ra.count++
	}
}

func (ra *RollingAverage) Average() float64 {
	ra.mu.Lock()
	defer ra.mu.Unlock()

	if ra.count == 0 {
		return 0
	}
	sum := 0.0
	for i := 0; i < ra.count; i++ {
		sum += ra.values[i]
	}
	return sum / float64(ra.count)
}

// AdaptiveConfig configures adaptive rate limiting.
type AdaptiveConfig struct {
	BaseRate       float64
	BaseCapacity   float64
	MinRate        float64
	MaxRate        float64
	MaxErrorRate   float64
	MaxLatencyMs   float64
	MaxCPUUsage    float64
	MaxMemoryUsage float64
	AdjustInterval time.Duration
}

// NewAdaptiveRateLimiter creates an adaptive limiter.
func NewAdaptiveRateLimiter(cfg AdaptiveConfig) *AdaptiveRateLimiter {
	arl := &AdaptiveRateLimiter{
		baseLimiter: NewTokenBucket(TokenBucketConfig{
			Capacity:   cfg.BaseCapacity,
			RefillRate: cfg.BaseRate,
		}),
		baseRate:       cfg.BaseRate,
		errorRate:      NewRollingAverage(100),
		latencyP99:     NewRollingAverage(100),
		maxErrorRate:   cfg.MaxErrorRate,
		maxLatencyMs:   cfg.MaxLatencyMs,
		maxCPUUsage:    cfg.MaxCPUUsage,
		maxMemoryUsage: cfg.MaxMemoryUsage,
		minRate:        cfg.MinRate,
		maxRate:        cfg.MaxRate,
		adjustFactor:   0.1,
	}

	go arl.adjustLoop(cfg.AdjustInterval)
	return arl
}

func (arl *AdaptiveRateLimiter) Allow() bool {
	return arl.baseLimiter.Allow(1)
}

// RecordMetrics records request metrics.
func (arl *AdaptiveRateLimiter) RecordMetrics(latencyMs float64, hasError bool) {
	arl.latencyP99.Add(latencyMs)
	if hasError {
		arl.errorRate.Add(1)
	} else {
		arl.errorRate.Add(0)
	}
}

// SetResourceMetrics sets resource metric functions.
func (arl *AdaptiveRateLimiter) SetResourceMetrics(cpuFn, memFn func() float64) {
	arl.cpuUsage = cpuFn
	arl.memoryUsage = memFn
}

func (arl *AdaptiveRateLimiter) adjustLoop(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		arl.adjust()
	}
}

func (arl *AdaptiveRateLimiter) adjust() {
	arl.mu.Lock()
	defer arl.mu.Unlock()

	// Read the current rate under the bucket's own lock to avoid racing
	// with concurrent Allow calls.
	arl.baseLimiter.mu.Lock()
	currentRate := arl.baseLimiter.refillRate
	arl.baseLimiter.mu.Unlock()

	newRate := currentRate

	// Check error rate
	errRate := arl.errorRate.Average()
	if errRate > arl.maxErrorRate {
		newRate *= (1 - arl.adjustFactor)
	}

	// Check latency
	latency := arl.latencyP99.Average()
	if latency > arl.maxLatencyMs {
		newRate *= (1 - arl.adjustFactor)
	}

	// Check CPU
	if arl.cpuUsage != nil && arl.cpuUsage() > arl.maxCPUUsage {
		newRate *= (1 - arl.adjustFactor)
	}

	// Check memory
	if arl.memoryUsage != nil && arl.memoryUsage() > arl.maxMemoryUsage {
		newRate *= (1 - arl.adjustFactor)
	}

	// If all healthy, gradually increase
	if errRate < arl.maxErrorRate/2 &&
		latency < arl.maxLatencyMs/2 &&
		(arl.cpuUsage == nil || arl.cpuUsage() < arl.maxCPUUsage/2) &&
		(arl.memoryUsage == nil || arl.memoryUsage() < arl.maxMemoryUsage/2) {
		newRate *= (1 + arl.adjustFactor/2)
	}

	// Clamp to bounds
	if newRate < arl.minRate {
		newRate = arl.minRate
	}
	if newRate > arl.maxRate {
		newRate = arl.maxRate
	}

	// Apply the new rate, again under the bucket's lock
	arl.baseLimiter.mu.Lock()
	arl.baseLimiter.refillRate = newRate
	arl.baseLimiter.mu.Unlock()
}
```
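
Hooking up the resource probes, given an arl from NewAdaptiveRateLimiter (a sketch: the 4 GiB memory budget is an assumption, and the CPU probe is left as a placeholder because the standard library exposes no process-CPU fraction directly; libraries such as gopsutil are commonly used for that). It assumes the runtime package is imported:

```go
const memBudget = 4 << 30 // assumed 4 GiB memory budget

arl.SetResourceMetrics(
	func() float64 {
		// Placeholder: plug in a real process-CPU probe here.
		return 0
	},
	func() float64 {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		return float64(ms.Alloc) / float64(memBudget)
	},
)
```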

HTTP Middleware

Apply rate limiting in HTTP handlers:
```go
import (
	"fmt"
	"net"
	"net/http"
	"strings"
	"time"
)

// RateLimitMiddleware creates HTTP middleware.
func RateLimitMiddleware(limiter *APIRateLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Extract user ID and IP
			userID := r.Header.Get("X-User-ID")
			ip := getClientIP(r)

			allowed, limitType := limiter.Allow(userID, ip)
			if !allowed {
				w.Header().Set("X-RateLimit-Limit-Type", limitType)
				w.Header().Set("Retry-After", "1")
				http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
				return
			}

			next.ServeHTTP(w, r)
		})
	}
}

func getClientIP(r *http.Request) string {
	// Check X-Forwarded-For header
	xff := r.Header.Get("X-Forwarded-For")
	if xff != "" {
		ips := strings.Split(xff, ",")
		return strings.TrimSpace(ips[0])
	}

	// Check X-Real-IP header
	xri := r.Header.Get("X-Real-IP")
	if xri != "" {
		return xri
	}

	// Fall back to RemoteAddr
	ip, _, _ := net.SplitHostPort(r.RemoteAddr)
	return ip
}

// RateLimitHeaders adds rate limit headers to a response.
type RateLimitHeaders struct {
	Limit     int
	Remaining int
	Reset     time.Time
}

func (rlh RateLimitHeaders) Apply(w http.ResponseWriter) {
	w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", rlh.Limit))
	w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", rlh.Remaining))
	w.Header().Set("X-RateLimit-Reset", fmt.Sprintf("%d", rlh.Reset.Unix()))
}
```

gRPC Rate Limiting

Apply rate limiting to gRPC services:
```go
import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

// UnaryRateLimitInterceptor creates a gRPC unary interceptor.
func UnaryRateLimitInterceptor(limiter *APIRateLimiter) grpc.UnaryServerInterceptor {
	return func(
		ctx context.Context,
		req interface{},
		info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler,
	) (interface{}, error) {
		// Extract identity from metadata
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			return nil, status.Error(codes.Internal, "no metadata")
		}

		userID := ""
		if vals := md.Get("user-id"); len(vals) > 0 {
			userID = vals[0]
		}
		ip := ""
		if vals := md.Get("x-forwarded-for"); len(vals) > 0 {
			ip = vals[0]
		}

		allowed, limitType := limiter.Allow(userID, ip)
		if !allowed {
			return nil, status.Errorf(codes.ResourceExhausted, "rate limit exceeded: %s", limitType)
		}

		return handler(ctx, req)
	}
}

// StreamRateLimitInterceptor creates a gRPC stream interceptor.
func StreamRateLimitInterceptor(limiter *APIRateLimiter) grpc.StreamServerInterceptor {
	return func(
		srv interface{},
		ss grpc.ServerStream,
		info *grpc.StreamServerInfo,
		handler grpc.StreamHandler,
	) error {
		ctx := ss.Context()
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			return status.Error(codes.Internal, "no metadata")
		}

		userID := ""
		if vals := md.Get("user-id"); len(vals) > 0 {
			userID = vals[0]
		}
		ip := ""
		if vals := md.Get("x-forwarded-for"); len(vals) > 0 {
			ip = vals[0]
		}

		allowed, limitType := limiter.Allow(userID, ip)
		if !allowed {
			return status.Errorf(codes.ResourceExhausted, "rate limit exceeded: %s", limitType)
		}

		return handler(srv, ss)
	}
}
```
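
Registering the interceptors on a server is a one-liner each (sketch, assuming the types above are in scope):

```go
limiter := NewAPIRateLimiter()
srv := grpc.NewServer(
	grpc.UnaryInterceptor(UnaryRateLimitInterceptor(limiter)),
	grpc.StreamInterceptor(StreamRateLimitInterceptor(limiter)),
)
```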

Cost-Based Rate Limiting

Not all requests are equal: a bulk export consumes far more resources than a point read. Cost-based limiting charges each operation a number of tokens proportional to its expense:
```go
// CostBasedLimiter limits based on operation cost.
type CostBasedLimiter struct {
	limiter     *TokenBucket
	costMap     map[string]float64
	defaultCost float64
}

// NewCostBasedLimiter creates a cost-based limiter.
func NewCostBasedLimiter(capacity, refillRate float64) *CostBasedLimiter {
	return &CostBasedLimiter{
		limiter: NewTokenBucket(TokenBucketConfig{
			Capacity:   capacity,
			RefillRate: refillRate,
		}),
		costMap:     make(map[string]float64),
		defaultCost: 1.0,
	}
}

// SetCost sets the cost for an operation. The cost map is not synchronized,
// so register all costs at startup, before the limiter serves traffic.
func (cbl *CostBasedLimiter) SetCost(operation string, cost float64) {
	cbl.costMap[operation] = cost
}

// Allow checks if an operation is allowed.
func (cbl *CostBasedLimiter) Allow(operation string) bool {
	cost, ok := cbl.costMap[operation]
	if !ok {
		cost = cbl.defaultCost
	}
	return cbl.limiter.Allow(cost)
}

// Example: API with different operation costs
func setupCostBasedLimiting() *CostBasedLimiter {
	// 1000 units capacity, 100 units/second refill
	limiter := NewCostBasedLimiter(1000, 100)

	// Simple read operations - low cost
	limiter.SetCost("GET /users/:id", 1)
	limiter.SetCost("GET /products/:id", 1)

	// List operations - medium cost
	limiter.SetCost("GET /users", 5)
	limiter.SetCost("GET /products", 5)

	// Write operations - higher cost
	limiter.SetCost("POST /users", 10)
	limiter.SetCost("PUT /users/:id", 5)

	// Search operations - expensive
	limiter.SetCost("GET /search", 20)

	// Bulk operations - very expensive
	limiter.SetCost("POST /bulk-import", 100)
	limiter.SetCost("GET /export", 50)

	return limiter
}
```
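
A sketch of wiring the cost map into HTTP middleware. Note that the raw r.URL.Path (for example /users/123) will not match a registered pattern like "GET /users/:id"; a real router would supply the matched route template, so the key construction here is illustrative:

```go
func costLimitMiddleware(limiter *CostBasedLimiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Illustrative key; substitute the router's matched route template.
		key := r.Method + " " + r.URL.Path
		if !limiter.Allow(key) {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```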

Complete Example

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

func main() {
	// Create rate limiters
	apiLimiter := NewAPIRateLimiter()

	// Create adaptive limiter
	adaptiveLimiter := NewAdaptiveRateLimiter(AdaptiveConfig{
		BaseRate:       100,
		BaseCapacity:   200,
		MinRate:        10,
		MaxRate:        500,
		MaxErrorRate:   0.05,
		MaxLatencyMs:   100,
		MaxCPUUsage:    0.8,
		MaxMemoryUsage: 0.8,
		AdjustInterval: 10 * time.Second,
	})

	// Setup HTTP server
	mux := http.NewServeMux()

	// Apply rate limiting middleware
	handler := RateLimitMiddleware(apiLimiter)(mux)

	// API endpoints
	mux.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		// Check adaptive limiter
		if !adaptiveLimiter.Allow() {
			adaptiveLimiter.RecordMetrics(0, true)
			http.Error(w, "Service overloaded", http.StatusServiceUnavailable)
			return
		}

		// Process request
		time.Sleep(10 * time.Millisecond) // Simulate work

		// Record metrics
		latency := float64(time.Since(start).Milliseconds())
		adaptiveLimiter.RecordMetrics(latency, false)

		json.NewEncoder(w).Encode(map[string]string{
			"status": "ok",
			"data":   "your data here",
		})
	})

	mux.HandleFunc("/api/status", func(w http.ResponseWriter, r *http.Request) {
		tokens, capacity := apiLimiter.globalLimiter.Status()
		json.NewEncoder(w).Encode(map[string]interface{}{
			"global_tokens":   tokens,
			"global_capacity": capacity,
		})
	})

	log.Println("Server starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

Best Practices

  1. Choose the right algorithm
    • Token bucket for bursting with sustained limits
    • Leaky bucket for smooth output
    • Sliding window for accurate counting
  2. Set appropriate limits
    • Base on capacity testing
    • Consider downstream dependencies
    • Leave headroom for spikes
  3. Provide good feedback
    • Return retry-after headers
    • Include remaining quota in responses
    • Log rate limit events for analysis
  4. Handle gracefully
    • Use 429 Too Many Requests status
    • Implement backoff in clients (see the client sketch after this list)
    • Consider priority queuing
  5. Monitor and adjust
    • Track rejection rates
    • Analyze limit hit patterns
    • Adjust limits based on data
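
To make item 4 concrete, here is a hedged client-side sketch that honors Retry-After and otherwise backs off exponentially (it assumes the usual imports: context, errors, net/http, strconv, time; it is only safe for requests without bodies, since a consumed body cannot be resent):

```go
func doWithBackoff(ctx context.Context, client *http.Client, req *http.Request) (*http.Response, error) {
	backoff := 100 * time.Millisecond
	for attempt := 0; attempt < 5; attempt++ {
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		resp.Body.Close()

		// Prefer the server's Retry-After hint over our own schedule.
		wait := backoff
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, err := strconv.Atoi(ra); err == nil {
				wait = time.Duration(secs) * time.Second
			}
		}

		select {
		case <-time.After(wait):
			backoff *= 2 // exponential backoff between attempts
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, errors.New("rate limited: retries exhausted")
}
```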

What's Next?

In Part 27, we'll explore Load Balancing Strategies - distributing traffic effectively across service instances.

"Rate limiting is like traffic control - it's not about stopping traffic, it's about keeping everyone moving."
Tags: rate-limiting, token-bucket, traffic-control, scalability