Module 14: Load Balancing

What is Load Balancing?

Load balancing distributes incoming requests across multiple servers to ensure no single server is overwhelmed.
Without a load balancer, every client hits a single server, which eventually becomes overwhelmed. With a load balancer in front, the same traffic is spread across a pool of servers:

    WITHOUT Load Balancer          WITH Load Balancer

         Clients                        Clients
        ↓ ↓ ↓ ↓                        ↓ ↓ ↓ ↓
      ┌───────────┐               ┌───────────────┐
      │  Server   │               │ Load Balancer │
      │ OVERLOAD! │               └───────┬───────┘
      └───────────┘                 ↓     ↓     ↓
                                  ┌──┐  ┌──┐  ┌──┐
                                  │S1│  │S2│  │S3│
                                  └──┘  └──┘  └──┘

Benefits:
  • High availability (failover)
  • Scalability (add more servers)
  • Performance (parallel processing)
  • Flexibility (rolling deployments)
  • Security (single entry point)

Load Balancing Layers

L4 (Transport Layer)
  • Works with: TCP/UDP packets
  • Sees: IP addresses, ports
  • Cannot see: HTTP headers, URLs, cookies
  • Pros: very fast (no packet inspection), protocol agnostic, lower latency
  • Cons: no content-based routing, no SSL termination
  • Examples: AWS NLB, HAProxy (TCP mode)

L7 (Application Layer)
  • Works with: HTTP/HTTPS requests
  • Sees: URLs, headers, cookies, body
  • Pros: content-based routing, SSL termination, request manipulation, WebSocket support, caching
  • Cons: more resource intensive, higher latency, HTTP/HTTPS only
  • Examples: AWS ALB, nginx, HAProxy (HTTP mode)

Decision guide:
  • Use L4 when you need raw TCP/UDP (databases, gaming), performance is critical, or simple distribution is sufficient.
  • Use L7 when you need URL-based routing, SSL termination, header manipulation, or A/B testing and canary deployments.
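As a minimal illustration of the content-based routing that distinguishes L7 from L4, the following Go sketch chooses a backend pool by URL path prefix. The pool names and prefixes are invented for the example; a real balancer would consult its configuration instead:

```go
package main

import (
	"fmt"
	"strings"
)

// pickPool returns the backend pool for a request path, the way an
// L7 balancer routes on request content. Pool names are illustrative.
func pickPool(path string) string {
	switch {
	case strings.HasPrefix(path, "/api/"):
		return "api-pool"
	case strings.HasPrefix(path, "/static/"):
		return "cdn-pool"
	default:
		return "web-pool"
	}
}

func main() {
	for _, p := range []string{"/api/users", "/static/app.js", "/home"} {
		fmt.Printf("%s -> %s\n", p, pickPool(p))
	}
}
```

An L4 balancer never sees the path at all, so a rule like this is only expressible at L7.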

Load Balancing Algorithms

1. Round Robin
   Request 1 → Server A, Request 2 → Server B, Request 3 → Server C, Request 4 → Server A (back to start).
   Pros: simple, fair distribution. Cons: ignores server capacity/load.

2. Weighted Round Robin
   Server A (weight=3) gets 3 requests, Server B (weight=2) gets 2, Server C (weight=1) gets 1.
   Use when servers have different capacities.

3. Least Connections
   Server A: 10 connections ← next request goes here; Server B: 25; Server C: 30.
   Use when requests have varying duration.

4. Least Response Time
   Server A: 50ms avg ← next request goes here; Server B: 100ms; Server C: 150ms.
   Use when response time matters most.

5. IP Hash
   hash(client_ip) % num_servers = server_index, so the same IP always maps to the same server.
   Use when you need sticky sessions without cookies.

6. Random
   Randomly select a server for each request.
   Use when simple distribution is enough (surprisingly effective at large scale).

Algorithm Implementation in Go

```go
package loadbalancer

import (
	"hash/fnv"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

// Server represents a backend server.
type Server struct {
	URL          string
	Weight       int
	Alive        bool
	Connections  int64
	ResponseTime time.Duration
	mu           sync.RWMutex
}

// LoadBalancer selects the next backend for a request.
type LoadBalancer interface {
	NextServer() *Server
}

// RoundRobin cycles through servers in order.
type RoundRobin struct {
	servers []*Server
	current uint64
}

func NewRoundRobin(servers []*Server) *RoundRobin {
	return &RoundRobin{servers: servers}
}

func (r *RoundRobin) NextServer() *Server {
	for i := 0; i < len(r.servers); i++ {
		idx := atomic.AddUint64(&r.current, 1) % uint64(len(r.servers))
		server := r.servers[idx]
		if server.Alive {
			return server
		}
	}
	return nil
}

// WeightedRoundRobin favors servers with higher weights.
type WeightedRoundRobin struct {
	servers       []*Server
	currentWeight int
	maxWeight     int
	gcd           int
	mu            sync.Mutex
}

func NewWeightedRoundRobin(servers []*Server) *WeightedRoundRobin {
	wrr := &WeightedRoundRobin{servers: servers}
	// Calculate GCD and max weight across all servers.
	wrr.gcd = servers[0].Weight
	wrr.maxWeight = servers[0].Weight
	for _, s := range servers[1:] {
		wrr.gcd = gcd(wrr.gcd, s.Weight)
		if s.Weight > wrr.maxWeight {
			wrr.maxWeight = s.Weight
		}
	}
	return wrr
}

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func (w *WeightedRoundRobin) NextServer() *Server {
	w.mu.Lock()
	defer w.mu.Unlock()

	// Guard against spinning forever when every server is down.
	anyAlive := false
	for _, s := range w.servers {
		if s.Alive {
			anyAlive = true
			break
		}
	}
	if !anyAlive {
		return nil
	}

	for {
		for _, server := range w.servers {
			if server.Alive && server.Weight >= w.currentWeight {
				w.currentWeight -= w.gcd
				if w.currentWeight <= 0 {
					w.currentWeight = w.maxWeight
				}
				return server
			}
		}
		w.currentWeight -= w.gcd
		if w.currentWeight <= 0 {
			w.currentWeight = w.maxWeight
		}
	}
}

// LeastConnections routes to the server with the fewest open connections.
type LeastConnections struct {
	servers []*Server
	mu      sync.Mutex
}

func NewLeastConnections(servers []*Server) *LeastConnections {
	return &LeastConnections{servers: servers}
}

func (l *LeastConnections) NextServer() *Server {
	l.mu.Lock()
	defer l.mu.Unlock()

	var selected *Server
	minConns := int64(^uint64(0) >> 1) // max int64
	for _, server := range l.servers {
		if !server.Alive {
			continue
		}
		conns := atomic.LoadInt64(&server.Connections)
		if conns < minConns {
			minConns = conns
			selected = server
		}
	}
	if selected != nil {
		atomic.AddInt64(&selected.Connections, 1)
	}
	return selected
}

func (l *LeastConnections) ReleaseConnection(server *Server) {
	atomic.AddInt64(&server.Connections, -1)
}

// LeastResponseTime routes to the server with the lowest weighted response time.
type LeastResponseTime struct {
	servers []*Server
	mu      sync.RWMutex
}

func NewLeastResponseTime(servers []*Server) *LeastResponseTime {
	return &LeastResponseTime{servers: servers}
}

func (l *LeastResponseTime) NextServer() *Server {
	l.mu.RLock()
	defer l.mu.RUnlock()

	var selected *Server
	minTime := time.Duration(1<<63 - 1) // max duration
	for _, server := range l.servers {
		if !server.Alive {
			continue
		}
		server.mu.RLock()
		rt := server.ResponseTime
		server.mu.RUnlock()
		// Weight response time by connection count so busy servers score worse.
		score := rt * time.Duration(atomic.LoadInt64(&server.Connections)+1)
		if score < minTime || selected == nil {
			minTime = score
			selected = server
		}
	}
	return selected
}

func (l *LeastResponseTime) UpdateResponseTime(server *Server, duration time.Duration) {
	server.mu.Lock()
	defer server.mu.Unlock()
	// Exponential moving average: 70% old value, 30% new sample.
	server.ResponseTime = (server.ResponseTime*7 + duration*3) / 10
}

// IPHash maps each client IP to a fixed server.
type IPHash struct {
	servers []*Server
}

func NewIPHash(servers []*Server) *IPHash {
	return &IPHash{servers: servers}
}

func (i *IPHash) NextServer() *Server {
	return nil // needs the client IP; use NextServerForIP
}

func (i *IPHash) NextServerForIP(clientIP string) *Server {
	hash := fnv.New32()
	hash.Write([]byte(clientIP))
	idx := hash.Sum32() % uint32(len(i.servers))
	// Probe forward from the hash position until an alive server is found.
	for j := 0; j < len(i.servers); j++ {
		server := i.servers[(int(idx)+j)%len(i.servers)]
		if server.Alive {
			return server
		}
	}
	return nil
}

// Random picks any alive server uniformly at random.
type Random struct {
	servers []*Server
	rng     *rand.Rand
	mu      sync.Mutex
}

func NewRandom(servers []*Server) *Random {
	return &Random{
		servers: servers,
		rng:     rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

func (r *Random) NextServer() *Server {
	r.mu.Lock()
	defer r.mu.Unlock()

	alive := make([]*Server, 0, len(r.servers))
	for _, s := range r.servers {
		if s.Alive {
			alive = append(alive, s)
		}
	}
	if len(alive) == 0 {
		return nil
	}
	return alive[r.rng.Intn(len(alive))]
}

// PowerOfTwoChoices (P2C) picks two random servers and keeps the less loaded one.
type PowerOfTwoChoices struct {
	servers []*Server
	rng     *rand.Rand
	mu      sync.Mutex
}

func NewP2C(servers []*Server) *PowerOfTwoChoices {
	return &PowerOfTwoChoices{
		servers: servers,
		rng:     rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

func (p *PowerOfTwoChoices) NextServer() *Server {
	p.mu.Lock()
	defer p.mu.Unlock()

	alive := make([]*Server, 0, len(p.servers))
	for _, s := range p.servers {
		if s.Alive {
			alive = append(alive, s)
		}
	}
	if len(alive) == 0 {
		return nil
	}
	if len(alive) == 1 {
		return alive[0]
	}

	// Pick two distinct random servers.
	idx1 := p.rng.Intn(len(alive))
	idx2 := p.rng.Intn(len(alive) - 1)
	if idx2 >= idx1 {
		idx2++
	}
	s1, s2 := alive[idx1], alive[idx2]

	// Keep the one with fewer connections.
	if atomic.LoadInt64(&s1.Connections) < atomic.LoadInt64(&s2.Connections) {
		return s1
	}
	return s2
}
```

Health Checks

Types of health checks:
  1. TCP health check (L4): can we connect to port 8080?
  2. HTTP health check (L7): does GET /health return 200 OK?
  3. Deep health check: can the service reach its database and cache?
  4. Startup probe: is the service ready to receive traffic?
  5. Liveness probe: is the service still running correctly?

Health check flow:
  LB ── GET /health ──► Server ── 200 OK ─────► mark as healthy
  LB ── GET /health ──► Server ── 503 Error ──► increment failure count
  LB ── GET /health ──► Server ── timeout ────► mark as unhealthy (after threshold)

Typical parameters:
  • interval: 10s (how often to check)
  • timeout: 5s (how long to wait)
  • threshold: 3 (failures before unhealthy)
  • recovery: 2 (successes before healthy)

Health Check Implementation

```go
package loadbalancer

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// HealthChecker periodically probes servers and flips their Alive flag.
type HealthChecker struct {
	servers       []*Server
	config        HealthConfig
	httpClient    *http.Client
	failureCounts map[*Server]int
	successCounts map[*Server]int
	mu            sync.Mutex
	stopCh        chan struct{}
}

type HealthConfig struct {
	Interval          time.Duration
	Timeout           time.Duration
	FailureThreshold  int
	RecoveryThreshold int
	Type              string // "tcp", "http", "deep"
	HTTPPath          string
}

func NewHealthChecker(servers []*Server, config HealthConfig) *HealthChecker {
	return &HealthChecker{
		servers: servers,
		config:  config,
		httpClient: &http.Client{
			Timeout: config.Timeout,
		},
		failureCounts: make(map[*Server]int),
		successCounts: make(map[*Server]int),
		stopCh:        make(chan struct{}),
	}
}

func (h *HealthChecker) Start() {
	ticker := time.NewTicker(h.config.Interval)
	defer ticker.Stop()

	// Initial check before the first tick.
	h.checkAll()

	for {
		select {
		case <-ticker.C:
			h.checkAll()
		case <-h.stopCh:
			return
		}
	}
}

func (h *HealthChecker) Stop() {
	close(h.stopCh)
}

func (h *HealthChecker) checkAll() {
	var wg sync.WaitGroup
	for _, server := range h.servers {
		wg.Add(1)
		go func(s *Server) {
			defer wg.Done()
			h.checkServer(s)
		}(server)
	}
	wg.Wait()
}

func (h *HealthChecker) checkServer(server *Server) {
	var healthy bool
	switch h.config.Type {
	case "tcp":
		healthy = h.tcpCheck(server)
	case "http":
		healthy = h.httpCheck(server)
	case "deep":
		healthy = h.deepCheck(server)
	default:
		healthy = h.httpCheck(server)
	}

	h.mu.Lock()
	defer h.mu.Unlock()

	if healthy {
		h.failureCounts[server] = 0
		if !server.Alive {
			// Require RecoveryThreshold consecutive successes before re-adding.
			h.successCounts[server]++
			if h.successCounts[server] >= h.config.RecoveryThreshold {
				server.Alive = true
				h.successCounts[server] = 0
				fmt.Printf("Server %s is now healthy\n", server.URL)
			}
		}
	} else {
		h.successCounts[server] = 0
		h.failureCounts[server]++
		if h.failureCounts[server] >= h.config.FailureThreshold && server.Alive {
			server.Alive = false
			fmt.Printf("Server %s is now unhealthy (failures: %d)\n",
				server.URL, h.failureCounts[server])
		}
	}
}

func (h *HealthChecker) tcpCheck(server *Server) bool {
	conn, err := net.DialTimeout("tcp", server.URL, h.config.Timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func (h *HealthChecker) httpCheck(server *Server) bool {
	url := fmt.Sprintf("http://%s%s", server.URL, h.config.HTTPPath)
	resp, err := h.httpClient.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

func (h *HealthChecker) deepCheck(server *Server) bool {
	url := fmt.Sprintf("http://%s/health/deep", server.URL)
	resp, err := h.httpClient.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// Database and Cache are the dependencies a deep health check verifies.
type Database interface {
	Ping(ctx context.Context) error
}

type Cache interface {
	Ping(ctx context.Context) error
}

// HealthHandler is the server-side endpoint the load balancer probes.
func HealthHandler(db Database, cache Cache) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
		defer cancel()

		// Check database connectivity.
		if err := db.Ping(ctx); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			w.Write([]byte(`{"status":"unhealthy","error":"database"}`))
			return
		}
		// Check cache connectivity.
		if err := cache.Ping(ctx); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			w.Write([]byte(`{"status":"unhealthy","error":"cache"}`))
			return
		}
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(`{"status":"healthy"}`))
	}
}
```

Session Persistence (Sticky Sessions)

Problem: stateful applications need the same server on every request.

  Request 1: Client → LB → Server A (creates session)
  Request 2: Client → LB → Server B (session not found!)

Solutions:
  1. Cookie-based affinity
     LB sets: Set-Cookie: SERVERID=server-a
     Client sends: Cookie: SERVERID=server-a
  2. IP-based affinity
     hash(client_ip) → same server
     Problem: NAT and proxies break this
  3. URL parameter
     /path?server=server-a
     Problem: open to URL manipulation

Better solution: an external session store.

  Client → LB → Any Server → Redis (shared sessions)

With shared sessions, no sticky sessions are needed at all.
```go
package loadbalancer

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// StickyLoadBalancer pins clients to a backend via an affinity cookie,
// falling back to the wrapped LoadBalancer when no valid cookie exists.
type StickyLoadBalancer struct {
	servers    []*Server
	lb         LoadBalancer
	cookieName string
}

func NewStickyLB(servers []*Server, lb LoadBalancer, cookieName string) *StickyLoadBalancer {
	return &StickyLoadBalancer{
		servers:    servers,
		lb:         lb,
		cookieName: cookieName,
	}
}

func (s *StickyLoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var target *Server

	// Check for an existing affinity cookie.
	cookie, err := r.Cookie(s.cookieName)
	if err == nil {
		target = s.findServer(cookie.Value)
	}

	// If there is no cookie, or the pinned server is gone or unhealthy,
	// fall back to the load balancing algorithm.
	if target == nil || !target.Alive {
		target = s.lb.NextServer()
		if target == nil {
			http.Error(w, "No servers available", http.StatusServiceUnavailable)
			return
		}
		// Set the affinity cookie for subsequent requests.
		http.SetCookie(w, &http.Cookie{
			Name:     s.cookieName,
			Value:    target.URL,
			Path:     "/",
			HttpOnly: true,
			Secure:   true,
			SameSite: http.SameSiteLaxMode,
		})
	}

	// Proxy the request to the chosen backend.
	targetURL, err := url.Parse("http://" + target.URL)
	if err != nil {
		http.Error(w, "Bad backend URL", http.StatusInternalServerError)
		return
	}
	proxy := httputil.NewSingleHostReverseProxy(targetURL)
	proxy.ServeHTTP(w, r)
}

func (s *StickyLoadBalancer) findServer(url string) *Server {
	for _, server := range s.servers {
		if server.URL == url {
			return server
		}
	}
	return nil
}
```

Complete Load Balancer Example

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// HTTPLoadBalancer is a full-featured L7 load balancer.
type HTTPLoadBalancer struct {
	servers       []*Server
	algorithm     LoadBalancer
	healthChecker *HealthChecker

	// Metrics
	totalRequests  uint64
	activeRequests int64
}

func NewHTTPLoadBalancer(serverURLs []string, algorithm string) *HTTPLoadBalancer {
	servers := make([]*Server, len(serverURLs))
	for i, u := range serverURLs {
		servers[i] = &Server{
			URL:    u,
			Alive:  true,
			Weight: 1,
		}
	}

	var lb LoadBalancer
	switch algorithm {
	case "round-robin":
		lb = NewRoundRobin(servers)
	case "least-conn":
		lb = NewLeastConnections(servers)
	case "least-response-time":
		lb = NewLeastResponseTime(servers)
	case "random":
		lb = NewRandom(servers)
	case "p2c":
		lb = NewP2C(servers)
	default:
		lb = NewRoundRobin(servers)
	}

	healthChecker := NewHealthChecker(servers, HealthConfig{
		Interval:          10 * time.Second,
		Timeout:           5 * time.Second,
		FailureThreshold:  3,
		RecoveryThreshold: 2,
		Type:              "http",
		HTTPPath:          "/health",
	})

	return &HTTPLoadBalancer{
		servers:       servers,
		algorithm:     lb,
		healthChecker: healthChecker,
	}
}

func (lb *HTTPLoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	atomic.AddUint64(&lb.totalRequests, 1)
	atomic.AddInt64(&lb.activeRequests, 1)
	defer atomic.AddInt64(&lb.activeRequests, -1)

	// Select a backend.
	server := lb.algorithm.NextServer()
	if server == nil {
		http.Error(w, "No healthy servers", http.StatusServiceUnavailable)
		return
	}
	// Least-connections increments the counter on selection; release it when done.
	if lc, ok := lb.algorithm.(*LeastConnections); ok {
		defer lc.ReleaseConnection(server)
	}

	// Proxy the request.
	targetURL, err := url.Parse("http://" + server.URL)
	if err != nil {
		http.Error(w, "Bad backend URL", http.StatusInternalServerError)
		return
	}
	proxy := httputil.NewSingleHostReverseProxy(targetURL)

	// Custom error handler for backend failures.
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		log.Printf("Proxy error for %s: %v", server.URL, err)
		http.Error(w, "Backend error", http.StatusBadGateway)
	}

	// Add forwarding headers.
	r.Header.Set("X-Forwarded-For", r.RemoteAddr)
	r.Header.Set("X-Real-IP", r.RemoteAddr)
	r.Header.Set("X-Load-Balancer", "go-lb")

	// Track status code and response time.
	rw := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}
	proxy.ServeHTTP(rw, r)
	duration := time.Since(start)

	// Feed the measured latency back into the algorithm if it uses it.
	if lrt, ok := lb.algorithm.(*LeastResponseTime); ok {
		lrt.UpdateResponseTime(server, duration)
	}

	log.Printf("%s %s → %s [%d] %v", r.Method, r.URL.Path, server.URL, rw.statusCode, duration)
}

type responseWriter struct {
	http.ResponseWriter
	statusCode int
}

func (rw *responseWriter) WriteHeader(code int) {
	rw.statusCode = code
	rw.ResponseWriter.WriteHeader(code)
}

func (lb *HTTPLoadBalancer) StartHealthCheck() {
	go lb.healthChecker.Start()
}

func (lb *HTTPLoadBalancer) Stop() {
	lb.healthChecker.Stop()
}

// MetricsHandler exposes Prometheus-style metrics.
func (lb *HTTPLoadBalancer) MetricsHandler(w http.ResponseWriter, r *http.Request) {
	total := atomic.LoadUint64(&lb.totalRequests)
	active := atomic.LoadInt64(&lb.activeRequests)

	fmt.Fprintf(w, "# HELP lb_requests_total Total requests\n")
	fmt.Fprintf(w, "# TYPE lb_requests_total counter\n")
	fmt.Fprintf(w, "lb_requests_total %d\n", total)

	fmt.Fprintf(w, "# HELP lb_requests_active Active requests\n")
	fmt.Fprintf(w, "# TYPE lb_requests_active gauge\n")
	fmt.Fprintf(w, "lb_requests_active %d\n", active)

	fmt.Fprintf(w, "# HELP lb_backend_status Backend status\n")
	fmt.Fprintf(w, "# TYPE lb_backend_status gauge\n")
	for _, s := range lb.servers {
		status := 0
		if s.Alive {
			status = 1
		}
		fmt.Fprintf(w, "lb_backend_status{backend=%q} %d\n", s.URL, status)
	}
}

func main() {
	backends := []string{
		"localhost:8081",
		"localhost:8082",
		"localhost:8083",
	}

	lb := NewHTTPLoadBalancer(backends, "p2c")
	lb.StartHealthCheck()

	mux := http.NewServeMux()
	mux.Handle("/metrics", http.HandlerFunc(lb.MetricsHandler))
	mux.Handle("/", lb)

	server := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}

	// Graceful shutdown on SIGINT/SIGTERM.
	go func() {
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
		<-sigCh
		log.Println("Shutting down...")
		lb.Stop()
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		server.Shutdown(ctx)
	}()

	log.Printf("Load balancer starting on :8080")
	log.Printf("Backends: %v", backends)
	if err := server.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```

Global Load Balancing (GSLB)

DNS-based routing distributes traffic globally: a GSLB-aware DNS server answers each lookup with the address of a regional endpoint, and each region (e.g. US-East, EU-West, APAC) runs its own load balancers in front of local servers.

Routing methods:
  1. Geolocation: route to the nearest region (US client → US datacenter)
  2. Latency-based: route to the lowest-latency region, based on actual measurements
  3. Weighted: distribute based on capacity (US: 50%, EU: 30%, APAC: 20%)
  4. Failover: route to a backup region when the primary fails (primary: US, backup: EU)

Services: AWS Route 53, Cloudflare, Google Cloud DNS

Best Practices

1. Health checks
   • Always implement health checks
   • Check actual dependencies (DB, cache)
   • Use appropriate intervals (not too aggressive)
   • Require failure thresholds before marking unhealthy

2. Connection draining
   • Allow in-flight requests to complete
   • Stop sending new requests before removal
   • Set a timeout for the drain period

3. Avoid sticky sessions
   • Use external session stores instead
   • If needed, use a short cookie TTL
   • Monitor sticky session distribution

4. Monitor everything
   • Requests per second
   • Latency (p50, p95, p99)
   • Error rates
   • Backend health status
   • Connection pool utilization

5. High availability
   • Run multiple load balancer instances
   • Use managed services when possible
   • Test failover regularly
   • Have runbooks for LB failures

Interview Questions

  1. What's the difference between L4 and L7 load balancing?
    • L4: TCP/UDP level, faster, protocol agnostic
    • L7: HTTP level, content-based routing, more features
  2. Which algorithm would you use for long-running connections?
    • Least connections - accounts for varying request durations
  3. How do you handle session persistence without sticky sessions?
    • External session store (Redis, Memcached)
    • JWT tokens (stateless)
  4. How do health checks prevent cascading failures?
    • Remove unhealthy servers from rotation
    • Prevent requests to failing backends
    • Allow recovery before re-adding
  5. Design a load balancer for a global service
    • DNS-based global routing (GeoDNS)
    • Regional load balancers per region
    • Cross-region failover
    • Consider latency vs consistency trade-offs
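The stateless-JWT answer above can be sketched with nothing but the Go standard library: an HMAC-signed token lets any backend verify a session without sticky routing. This is a deliberately simplified illustration (a real JWT adds a header, expiry, and base64url-encoded JSON claims); the key and payload here are placeholders:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

var key = []byte("demo-secret") // placeholder; load a real secret in practice

// sign produces "payload.signature" so any backend holding the key
// can verify the session without shared server-side state.
func sign(payload string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	sig := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
	return payload + "." + sig
}

// verify recomputes the signature; it needs only the key, not sticky routing.
func verify(token string) (string, bool) {
	i := strings.LastIndex(token, ".")
	if i < 0 {
		return "", false
	}
	payload := token[:i]
	if sign(payload) != token {
		return "", false
	}
	return payload, true
}

func main() {
	t := sign("user=42")
	p, ok := verify(t)
	fmt.Println(p, ok) // prints "user=42 true"
	_, ok = verify(t + "x")
	fmt.Println(ok) // prints "false": tampered token rejected
}
```

Because verification is pure computation, every backend can validate every request, which is exactly what removes the need for session affinity.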

Summary

Layers:
  • L4: fast, simple, TCP/UDP
  • L7: feature-rich, HTTP, content routing

Algorithms:
  • Round Robin: simple, equal distribution
  • Least Connections: good for varying loads
  • P2C: best of random + least connections

Key components:
  • Health checks: essential for reliability
  • Session handling: prefer external stores
  • Metrics: monitor everything

Key insight: "Load balancing is the foundation of horizontal scaling. Get it right, and everything else becomes easier."

Tags: load-balancing, traffic-management, high-availability