Module 14: Load Balancing

What is Load Balancing?

Load balancing distributes incoming requests across multiple servers to ensure no single server is overwhelmed.
Without a load balancer, every client hits a single server, which eventually becomes overwhelmed. With a load balancer in front, the same traffic is spread across a pool of servers:

    WITHOUT Load Balancer          WITH Load Balancer

         Clients                        Clients
        ↓ ↓ ↓ ↓                        ↓ ↓ ↓ ↓
      ┌───────────┐               ┌───────────────┐
      │  Server   │               │ Load Balancer │
      │ OVERLOAD! │               └───────┬───────┘
      └───────────┘                 ↓     ↓     ↓
                                  ┌──┐  ┌──┐  ┌──┐
                                  │S1│  │S2│  │S3│
                                  └──┘  └──┘  └──┘

Benefits:
  • High availability (failover)
  • Scalability (add more servers)
  • Performance (parallel processing)
  • Flexibility (rolling deployments)
  • Security (single entry point)

Load Balancing Layers

L4 (Transport Layer)
  • Works with: TCP/UDP packets
  • Sees: IP addresses, ports
  • Cannot see: HTTP headers, URLs, cookies
  • Pros: very fast (no packet inspection), protocol agnostic, lower latency
  • Cons: no content-based routing, no SSL termination
  • Examples: AWS NLB, HAProxy (TCP mode)

L7 (Application Layer)
  • Works with: HTTP/HTTPS requests
  • Sees: URLs, headers, cookies, body
  • Pros: content-based routing, SSL termination, request manipulation, WebSocket support, caching
  • Cons: more resource intensive, higher latency, HTTP/HTTPS only
  • Examples: AWS ALB, nginx, HAProxy (HTTP mode)

Decision guide:
  • Use L4 when you need raw TCP/UDP (databases, gaming), performance is critical, or simple distribution is sufficient.
  • Use L7 when you need URL-based routing, SSL termination, header manipulation, or A/B testing and canary deployments.
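As a minimal illustration of the content-based routing that distinguishes L7 from L4, the following Go sketch chooses a backend pool by URL path prefix. The pool names and prefixes are invented for the example; a real balancer would consult its configuration instead:

```go
package main

import (
	"fmt"
	"strings"
)

// pickPool returns the backend pool for a request path, the way an
// L7 balancer routes on request content. Pool names are illustrative.
func pickPool(path string) string {
	switch {
	case strings.HasPrefix(path, "/api/"):
		return "api-pool"
	case strings.HasPrefix(path, "/static/"):
		return "cdn-pool"
	default:
		return "web-pool"
	}
}

func main() {
	for _, p := range []string{"/api/users", "/static/app.js", "/home"} {
		fmt.Printf("%s -> %s\n", p, pickPool(p))
	}
}
```

An L4 balancer never sees the path at all, so a rule like this is only expressible at L7.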

Load Balancing Algorithms

1. Round Robin
   Request 1 → Server A, Request 2 → Server B, Request 3 → Server C, Request 4 → Server A (back to start).
   Pros: simple, fair distribution. Cons: ignores server capacity/load.

2. Weighted Round Robin
   Server A (weight=3) gets 3 requests, Server B (weight=2) gets 2, Server C (weight=1) gets 1.
   Use when servers have different capacities.

3. Least Connections
   Server A: 10 connections ← next request goes here; Server B: 25; Server C: 30.
   Use when requests have varying duration.

4. Least Response Time
   Server A: 50ms avg ← next request goes here; Server B: 100ms; Server C: 150ms.
   Use when response time matters most.

5. IP Hash
   hash(client_ip) % num_servers = server_index, so the same IP always maps to the same server.
   Use when you need sticky sessions without cookies.

6. Random
   Randomly select a server for each request.
   Use when simple distribution is enough (surprisingly effective at large scale).

Algorithm Implementation in Go

```go
package loadbalancer

import (
	"hash/fnv"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

// Server represents a backend server.
type Server struct {
	URL          string
	Weight       int
	Alive        bool
	Connections  int64
	ResponseTime time.Duration
	mu           sync.RWMutex
}

// LoadBalancer selects the next backend for a request.
type LoadBalancer interface {
	NextServer() *Server
}

// RoundRobin cycles through servers in order.
type RoundRobin struct {
	servers []*Server
	current uint64
}

func NewRoundRobin(servers []*Server) *RoundRobin {
	return &RoundRobin{servers: servers}
}

func (r *RoundRobin) NextServer() *Server {
	for i := 0; i < len(r.servers); i++ {
		idx := atomic.AddUint64(&r.current, 1) % uint64(len(r.servers))
		server := r.servers[idx]
		if server.Alive {
			return server
		}
	}
	return nil
}

// WeightedRoundRobin favors servers with higher weights.
type WeightedRoundRobin struct {
	servers       []*Server
	currentWeight int
	maxWeight     int
	gcd           int
	mu            sync.Mutex
}

func NewWeightedRoundRobin(servers []*Server) *WeightedRoundRobin {
	wrr := &WeightedRoundRobin{servers: servers}
	// Calculate GCD and max weight across all servers.
	wrr.gcd = servers[0].Weight
	wrr.maxWeight = servers[0].Weight
	for _, s := range servers[1:] {
		wrr.gcd = gcd(wrr.gcd, s.Weight)
		if s.Weight > wrr.maxWeight {
			wrr.maxWeight = s.Weight
		}
	}
	return wrr
}

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func (w *WeightedRoundRobin) NextServer() *Server {
	w.mu.Lock()
	defer w.mu.Unlock()

	// Guard against spinning forever when every server is down.
	anyAlive := false
	for _, s := range w.servers {
		if s.Alive {
			anyAlive = true
			break
		}
	}
	if !anyAlive {
		return nil
	}

	for {
		for _, server := range w.servers {
			if server.Alive && server.Weight >= w.currentWeight {
				w.currentWeight -= w.gcd
				if w.currentWeight <= 0 {
					w.currentWeight = w.maxWeight
				}
				return server
			}
		}
		w.currentWeight -= w.gcd
		if w.currentWeight <= 0 {
			w.currentWeight = w.maxWeight
		}
	}
}

// LeastConnections routes to the server with the fewest open connections.
type LeastConnections struct {
	servers []*Server
	mu      sync.Mutex
}

func NewLeastConnections(servers []*Server) *LeastConnections {
	return &LeastConnections{servers: servers}
}

func (l *LeastConnections) NextServer() *Server {
	l.mu.Lock()
	defer l.mu.Unlock()

	var selected *Server
	minConns := int64(^uint64(0) >> 1) // max int64
	for _, server := range l.servers {
		if !server.Alive {
			continue
		}
		conns := atomic.LoadInt64(&server.Connections)
		if conns < minConns {
			minConns = conns
			selected = server
		}
	}
	if selected != nil {
		atomic.AddInt64(&selected.Connections, 1)
	}
	return selected
}

func (l *LeastConnections) ReleaseConnection(server *Server) {
	atomic.AddInt64(&server.Connections, -1)
}

// LeastResponseTime routes to the server with the lowest weighted response time.
type LeastResponseTime struct {
	servers []*Server
	mu      sync.RWMutex
}

func NewLeastResponseTime(servers []*Server) *LeastResponseTime {
	return &LeastResponseTime{servers: servers}
}

func (l *LeastResponseTime) NextServer() *Server {
	l.mu.RLock()
	defer l.mu.RUnlock()

	var selected *Server
	minTime := time.Duration(1<<63 - 1) // max duration
	for _, server := range l.servers {
		if !server.Alive {
			continue
		}
		server.mu.RLock()
		rt := server.ResponseTime
		server.mu.RUnlock()
		// Weight response time by connection count so busy servers score worse.
		score := rt * time.Duration(atomic.LoadInt64(&server.Connections)+1)
		if score < minTime || selected == nil {
			minTime = score
			selected = server
		}
	}
	return selected
}

func (l *LeastResponseTime) UpdateResponseTime(server *Server, duration time.Duration) {
	server.mu.Lock()
	defer server.mu.Unlock()
	// Exponential moving average: 70% old value, 30% new sample.
	server.ResponseTime = (server.ResponseTime*7 + duration*3) / 10
}

// IPHash maps each client IP to a fixed server.
type IPHash struct {
	servers []*Server
}

func NewIPHash(servers []*Server) *IPHash {
	return &IPHash{servers: servers}
}

func (i *IPHash) NextServer() *Server {
	return nil // needs the client IP; use NextServerForIP
}

func (i *IPHash) NextServerForIP(clientIP string) *Server {
	hash := fnv.New32()
	hash.Write([]byte(clientIP))
	idx := hash.Sum32() % uint32(len(i.servers))
	// Probe forward from the hash position until an alive server is found.
	for j := 0; j < len(i.servers); j++ {
		server := i.servers[(int(idx)+j)%len(i.servers)]
		if server.Alive {
			return server
		}
	}
	return nil
}

// Random picks any alive server uniformly at random.
type Random struct {
	servers []*Server
	rng     *rand.Rand
	mu      sync.Mutex
}

func NewRandom(servers []*Server) *Random {
	return &Random{
		servers: servers,
		rng:     rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

func (r *Random) NextServer() *Server {
	r.mu.Lock()
	defer r.mu.Unlock()

	alive := make([]*Server, 0, len(r.servers))
	for _, s := range r.servers {
		if s.Alive {
			alive = append(alive, s)
		}
	}
	if len(alive) == 0 {
		return nil
	}
	return alive[r.rng.Intn(len(alive))]
}

// PowerOfTwoChoices (P2C) picks two random servers and keeps the less loaded one.
type PowerOfTwoChoices struct {
	servers []*Server
	rng     *rand.Rand
	mu      sync.Mutex
}

func NewP2C(servers []*Server) *PowerOfTwoChoices {
	return &PowerOfTwoChoices{
		servers: servers,
		rng:     rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

func (p *PowerOfTwoChoices) NextServer() *Server {
	p.mu.Lock()
	defer p.mu.Unlock()

	alive := make([]*Server, 0, len(p.servers))
	for _, s := range p.servers {
		if s.Alive {
			alive = append(alive, s)
		}
	}
	if len(alive) == 0 {
		return nil
	}
	if len(alive) == 1 {
		return alive[0]
	}

	// Pick two distinct random servers.
	idx1 := p.rng.Intn(len(alive))
	idx2 := p.rng.Intn(len(alive) - 1)
	if idx2 >= idx1 {
		idx2++
	}
	s1, s2 := alive[idx1], alive[idx2]

	// Keep the one with fewer connections.
	if atomic.LoadInt64(&s1.Connections) < atomic.LoadInt64(&s2.Connections) {
		return s1
	}
	return s2
}
```

Health Checks

Types of health checks:
  1. TCP health check (L4): can we connect to port 8080?
  2. HTTP health check (L7): does GET /health return 200 OK?
  3. Deep health check: can the service reach its database and cache?
  4. Startup probe: is the service ready to receive traffic?
  5. Liveness probe: is the service still running correctly?

Health check flow:
  LB ── GET /health ──► Server ── 200 OK ─────► mark as healthy
  LB ── GET /health ──► Server ── 503 Error ──► increment failure count
  LB ── GET /health ──► Server ── timeout ────► mark as unhealthy (after threshold)

Typical parameters:
  • interval: 10s (how often to check)
  • timeout: 5s (how long to wait)
  • threshold: 3 (failures before unhealthy)
  • recovery: 2 (successes before healthy)

Health Check Implementation

```go
package loadbalancer

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// HealthChecker periodically probes servers and flips their Alive flag.
type HealthChecker struct {
	servers       []*Server
	config        HealthConfig
	httpClient    *http.Client
	failureCounts map[*Server]int
	successCounts map[*Server]int
	mu            sync.Mutex
	stopCh        chan struct{}
}

type HealthConfig struct {
	Interval          time.Duration
	Timeout           time.Duration
	FailureThreshold  int
	RecoveryThreshold int
	Type              string // "tcp", "http", "deep"
	HTTPPath          string
}

func NewHealthChecker(servers []*Server, config HealthConfig) *HealthChecker {
	return &HealthChecker{
		servers: servers,
		config:  config,
		httpClient: &http.Client{
			Timeout: config.Timeout,
		},
		failureCounts: make(map[*Server]int),
		successCounts: make(map[*Server]int),
		stopCh:        make(chan struct{}),
	}
}

func (h *HealthChecker) Start() {
	ticker := time.NewTicker(h.config.Interval)
	defer ticker.Stop()

	// Initial check before the first tick.
	h.checkAll()

	for {
		select {
		case <-ticker.C:
			h.checkAll()
		case <-h.stopCh:
			return
		}
	}
}

func (h *HealthChecker) Stop() {
	close(h.stopCh)
}

func (h *HealthChecker) checkAll() {
	var wg sync.WaitGroup
	for _, server := range h.servers {
		wg.Add(1)
		go func(s *Server) {
			defer wg.Done()
			h.checkServer(s)
		}(server)
	}
	wg.Wait()
}

func (h *HealthChecker) checkServer(server *Server) {
	var healthy bool
	switch h.config.Type {
	case "tcp":
		healthy = h.tcpCheck(server)
	case "http":
		healthy = h.httpCheck(server)
	case "deep":
		healthy = h.deepCheck(server)
	default:
		healthy = h.httpCheck(server)
	}

	h.mu.Lock()
	defer h.mu.Unlock()

	if healthy {
		h.failureCounts[server] = 0
		if !server.Alive {
			// Require RecoveryThreshold consecutive successes before re-adding.
			h.successCounts[server]++
			if h.successCounts[server] >= h.config.RecoveryThreshold {
				server.Alive = true
				h.successCounts[server] = 0
				fmt.Printf("Server %s is now healthy\n", server.URL)
			}
		}
	} else {
		h.successCounts[server] = 0
		h.failureCounts[server]++
		if h.failureCounts[server] >= h.config.FailureThreshold && server.Alive {
			server.Alive = false
			fmt.Printf("Server %s is now unhealthy (failures: %d)\n",
				server.URL, h.failureCounts[server])
		}
	}
}

func (h *HealthChecker) tcpCheck(server *Server) bool {
	conn, err := net.DialTimeout("tcp", server.URL, h.config.Timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func (h *HealthChecker) httpCheck(server *Server) bool {
	url := fmt.Sprintf("http://%s%s", server.URL, h.config.HTTPPath)
	resp, err := h.httpClient.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300
}

func (h *HealthChecker) deepCheck(server *Server) bool {
	url := fmt.Sprintf("http://%s/health/deep", server.URL)
	resp, err := h.httpClient.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// Database and Cache are the dependencies a deep health check verifies.
type Database interface {
	Ping(ctx context.Context) error
}

type Cache interface {
	Ping(ctx context.Context) error
}

// HealthHandler is the server-side endpoint the load balancer probes.
func HealthHandler(db Database, cache Cache) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
		defer cancel()

		// Check database connectivity.
		if err := db.Ping(ctx); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			w.Write([]byte(`{"status":"unhealthy","error":"database"}`))
			return
		}
		// Check cache connectivity.
		if err := cache.Ping(ctx); err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			w.Write([]byte(`{"status":"unhealthy","error":"cache"}`))
			return
		}
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(`{"status":"healthy"}`))
	}
}
```

Session Persistence (Sticky Sessions)

Problem: stateful applications need the same server on every request.

  Request 1: Client → LB → Server A (creates session)
  Request 2: Client → LB → Server B (session not found!)

Solutions:
  1. Cookie-based affinity
     LB sets: Set-Cookie: SERVERID=server-a
     Client sends: Cookie: SERVERID=server-a
  2. IP-based affinity
     hash(client_ip) → same server
     Problem: NAT and proxies break this
  3. URL parameter
     /path?server=server-a
     Problem: open to URL manipulation

Better solution: an external session store.

  Client → LB → Any Server → Redis (shared sessions)

With shared sessions, no sticky sessions are needed at all.
```go
package loadbalancer

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// StickyLoadBalancer pins clients to a backend via an affinity cookie,
// falling back to the wrapped LoadBalancer when no valid cookie exists.
type StickyLoadBalancer struct {
	servers    []*Server
	lb         LoadBalancer
	cookieName string
}

func NewStickyLB(servers []*Server, lb LoadBalancer, cookieName string) *StickyLoadBalancer {
	return &StickyLoadBalancer{
		servers:    servers,
		lb:         lb,
		cookieName: cookieName,
	}
}

func (s *StickyLoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var target *Server

	// Check for an existing affinity cookie.
	cookie, err := r.Cookie(s.cookieName)
	if err == nil {
		target = s.findServer(cookie.Value)
	}

	// If there is no cookie, or the pinned server is gone or unhealthy,
	// fall back to the load balancing algorithm.
	if target == nil || !target.Alive {
		target = s.lb.NextServer()
		if target == nil {
			http.Error(w, "No servers available", http.StatusServiceUnavailable)
			return
		}
		// Set the affinity cookie for subsequent requests.
		http.SetCookie(w, &http.Cookie{
			Name:     s.cookieName,
			Value:    target.URL,
			Path:     "/",
			HttpOnly: true,
			Secure:   true,
			SameSite: http.SameSiteLaxMode,
		})
	}

	// Proxy the request to the chosen backend.
	targetURL, err := url.Parse("http://" + target.URL)
	if err != nil {
		http.Error(w, "Bad backend URL", http.StatusInternalServerError)
		return
	}
	proxy := httputil.NewSingleHostReverseProxy(targetURL)
	proxy.ServeHTTP(w, r)
}

func (s *StickyLoadBalancer) findServer(url string) *Server {
	for _, server := range s.servers {
		if server.URL == url {
			return server
		}
	}
	return nil
}
```

Complete Load Balancer Example

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// HTTPLoadBalancer is a full-featured L7 load balancer.
type HTTPLoadBalancer struct {
	servers       []*Server
	algorithm     LoadBalancer
	healthChecker *HealthChecker

	// Metrics
	totalRequests  uint64
	activeRequests int64
}

func NewHTTPLoadBalancer(serverURLs []string, algorithm string) *HTTPLoadBalancer {
	servers := make([]*Server, len(serverURLs))
	for i, u := range serverURLs {
		servers[i] = &Server{
			URL:    u,
			Alive:  true,
			Weight: 1,
		}
	}

	var lb LoadBalancer
	switch algorithm {
	case "round-robin":
		lb = NewRoundRobin(servers)
	case "least-conn":
		lb = NewLeastConnections(servers)
	case "least-response-time":
		lb = NewLeastResponseTime(servers)
	case "random":
		lb = NewRandom(servers)
	case "p2c":
		lb = NewP2C(servers)
	default:
		lb = NewRoundRobin(servers)
	}

	healthChecker := NewHealthChecker(servers, HealthConfig{
		Interval:          10 * time.Second,
		Timeout:           5 * time.Second,
		FailureThreshold:  3,
		RecoveryThreshold: 2,
		Type:              "http",
		HTTPPath:          "/health",
	})

	return &HTTPLoadBalancer{
		servers:       servers,
		algorithm:     lb,
		healthChecker: healthChecker,
	}
}

func (lb *HTTPLoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	atomic.AddUint64(&lb.totalRequests, 1)
	atomic.AddInt64(&lb.activeRequests, 1)
	defer atomic.AddInt64(&lb.activeRequests, -1)

	// Select a backend.
	server := lb.algorithm.NextServer()
	if server == nil {
		http.Error(w, "No healthy servers", http.StatusServiceUnavailable)
		return
	}
	// Least-connections increments the counter on selection; release it when done.
	if lc, ok := lb.algorithm.(*LeastConnections); ok {
		defer lc.ReleaseConnection(server)
	}

	// Proxy the request.
	targetURL, err := url.Parse("http://" + server.URL)
	if err != nil {
		http.Error(w, "Bad backend URL", http.StatusInternalServerError)
		return
	}
	proxy := httputil.NewSingleHostReverseProxy(targetURL)

	// Custom error handler for backend failures.
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		log.Printf("Proxy error for %s: %v", server.URL, err)
		http.Error(w, "Backend error", http.StatusBadGateway)
	}

	// Add forwarding headers.
	r.Header.Set("X-Forwarded-For", r.RemoteAddr)
	r.Header.Set("X-Real-IP", r.RemoteAddr)
	r.Header.Set("X-Load-Balancer", "go-lb")

	// Track status code and response time.
	rw := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}
	proxy.ServeHTTP(rw, r)
	duration := time.Since(start)

	// Feed the measured latency back into the algorithm if it uses it.
	if lrt, ok := lb.algorithm.(*LeastResponseTime); ok {
		lrt.UpdateResponseTime(server, duration)
	}

	log.Printf("%s %s → %s [%d] %v", r.Method, r.URL.Path, server.URL, rw.statusCode, duration)
}

type responseWriter struct {
	http.ResponseWriter
	statusCode int
}

func (rw *responseWriter) WriteHeader(code int) {
	rw.statusCode = code
	rw.ResponseWriter.WriteHeader(code)
}

func (lb *HTTPLoadBalancer) StartHealthCheck() {
	go lb.healthChecker.Start()
}

func (lb *HTTPLoadBalancer) Stop() {
	lb.healthChecker.Stop()
}

// MetricsHandler exposes Prometheus-style metrics.
func (lb *HTTPLoadBalancer) MetricsHandler(w http.ResponseWriter, r *http.Request) {
	total := atomic.LoadUint64(&lb.totalRequests)
	active := atomic.LoadInt64(&lb.activeRequests)

	fmt.Fprintf(w, "# HELP lb_requests_total Total requests\n")
	fmt.Fprintf(w, "# TYPE lb_requests_total counter\n")
	fmt.Fprintf(w, "lb_requests_total %d\n", total)

	fmt.Fprintf(w, "# HELP lb_requests_active Active requests\n")
	fmt.Fprintf(w, "# TYPE lb_requests_active gauge\n")
	fmt.Fprintf(w, "lb_requests_active %d\n", active)

	fmt.Fprintf(w, "# HELP lb_backend_status Backend status\n")
	fmt.Fprintf(w, "# TYPE lb_backend_status gauge\n")
	for _, s := range lb.servers {
		status := 0
		if s.Alive {
			status = 1
		}
		fmt.Fprintf(w, "lb_backend_status{backend=%q} %d\n", s.URL, status)
	}
}

func main() {
	backends := []string{
		"localhost:8081",
		"localhost:8082",
		"localhost:8083",
	}

	lb := NewHTTPLoadBalancer(backends, "p2c")
	lb.StartHealthCheck()

	mux := http.NewServeMux()
	mux.Handle("/metrics", http.HandlerFunc(lb.MetricsHandler))
	mux.Handle("/", lb)

	server := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}

	// Graceful shutdown on SIGINT/SIGTERM.
	go func() {
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
		<-sigCh
		log.Println("Shutting down...")
		lb.Stop()
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		server.Shutdown(ctx)
	}()

	log.Printf("Load balancer starting on :8080")
	log.Printf("Backends: %v", backends)
	if err := server.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```

Global Load Balancing (GSLB)

DNS-based routing distributes traffic globally: a GSLB-aware DNS server answers each lookup with the address of a regional endpoint, and each region (e.g. US-East, EU-West, APAC) runs its own load balancers in front of local servers.

Routing methods:
  1. Geolocation: route to the nearest region (US client → US datacenter)
  2. Latency-based: route to the lowest-latency region, based on actual measurements
  3. Weighted: distribute based on capacity (US: 50%, EU: 30%, APAC: 20%)
  4. Failover: route to a backup region when the primary fails (primary: US, backup: EU)

Services: AWS Route 53, Cloudflare, Google Cloud DNS

Best Practices

1. Health checks
   • Always implement health checks
   • Check actual dependencies (DB, cache)
   • Use appropriate intervals (not too aggressive)
   • Require failure thresholds before marking unhealthy

2. Connection draining
   • Allow in-flight requests to complete
   • Stop sending new requests before removal
   • Set a timeout for the drain period

3. Avoid sticky sessions
   • Use external session stores instead
   • If needed, use a short cookie TTL
   • Monitor sticky session distribution

4. Monitor everything
   • Requests per second
   • Latency (p50, p95, p99)
   • Error rates
   • Backend health status
   • Connection pool utilization

5. High availability
   • Run multiple load balancer instances
   • Use managed services when possible
   • Test failover regularly
   • Have runbooks for LB failures

Interview Questions

  1. What's the difference between L4 and L7 load balancing?
    • L4: TCP/UDP level, faster, protocol agnostic
    • L7: HTTP level, content-based routing, more features
  2. Which algorithm would you use for long-running connections?
    • Least connections - accounts for varying request durations
  3. How do you handle session persistence without sticky sessions?
    • External session store (Redis, Memcached)
    • JWT tokens (stateless)
  4. How do health checks prevent cascading failures?
    • Remove unhealthy servers from rotation
    • Prevent requests to failing backends
    • Allow recovery before re-adding
  5. Design a load balancer for a global service
    • DNS-based global routing (GeoDNS)
    • Regional load balancers per region
    • Cross-region failover
    • Consider latency vs consistency trade-offs
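The stateless-JWT answer above can be sketched with nothing but the Go standard library: an HMAC-signed token lets any backend verify a session without sticky routing. This is a deliberately simplified illustration (a real JWT adds a header, expiry, and base64url-encoded JSON claims); the key and payload here are placeholders:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

var key = []byte("demo-secret") // placeholder; load a real secret in practice

// sign produces "payload.signature" so any backend holding the key
// can verify the session without shared server-side state.
func sign(payload string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	sig := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
	return payload + "." + sig
}

// verify recomputes the signature; it needs only the key, not sticky routing.
func verify(token string) (string, bool) {
	i := strings.LastIndex(token, ".")
	if i < 0 {
		return "", false
	}
	payload := token[:i]
	if sign(payload) != token {
		return "", false
	}
	return payload, true
}

func main() {
	t := sign("user=42")
	p, ok := verify(t)
	fmt.Println(p, ok) // prints "user=42 true"
	_, ok = verify(t + "x")
	fmt.Println(ok) // prints "false": tampered token rejected
}
```

Because verification is pure computation, every backend can validate every request, which is exactly what removes the need for session affinity.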

Summary

Layers:
  • L4: fast, simple, TCP/UDP
  • L7: feature-rich, HTTP, content routing

Algorithms:
  • Round Robin: simple, equal distribution
  • Least Connections: good for varying loads
  • P2C: best of random + least connections

Key components:
  • Health checks: essential for reliability
  • Session handling: prefer external stores
  • Metrics: monitor everything

Key insight: "Load balancing is the foundation of horizontal scaling. Get it right, and everything else becomes easier."

Tags: load-balancing, traffic-management, high-availability