Module 13: Horizontal vs Vertical Scaling

The Scaling Dilemma

When your system can't handle the load, you have two options: scale up or scale out.
SCALING OPTIONS

  VERTICAL SCALING (Scale Up)
    One bigger server:
    • More CPU
    • More RAM
    • Faster disk
    Simple, but limited.

  HORIZONTAL SCALING (Scale Out)
    Multiple small servers (S1, S2, S3, S4):
    • Load balancer in front
    • Distributed state
    Complex, but effectively unlimited.

Vertical Scaling Deep Dive

What you upgrade:
  CPU:     4 cores → 16 cores → 64 cores → 128 cores
  RAM:     16GB → 64GB → 256GB → 1TB → 12TB
  Disk:    HDD → SSD → NVMe → NVMe RAID
  Network: 1Gbps → 10Gbps → 25Gbps → 100Gbps

AWS EC2 example (on-demand, approximate):

  Instance        vCPU   RAM     Cost/hr   Scale
  ──────────────────────────────────────────────
  t3.medium       2      4GB     $0.04     1x
  m5.xlarge       4      16GB    $0.19     4x
  m5.4xlarge      16     64GB    $0.77     16x
  m5.16xlarge     64     256GB   $3.07     64x
  x2idn.32xl      128    2TB     $13.34    512x
  u-24tb1.metal   448    24TB    $218.40   6000x

Pros:                           Cons:
  • Simple to implement           • Hardware limits
  • No code changes               • Single point of failure
  • No distributed complexity     • Diminishing returns
  • Consistent latency            • Downtime during upgrade
  • Lower operational overhead    • Expensive at high end

When to Choose Vertical Scaling

GOOD FIT:
  ✓ Databases with complex queries (large working sets benefit from more RAM)
  ✓ In-memory caches (e.g. Redis with a 256GB dataset)
  ✓ Single-threaded applications (a faster CPU helps more than more CPUs)
  ✓ Early-stage startups (simple operations, focus on product)
  ✓ Batch processing jobs (run faster, not parallel)

WARNING SIGNS (time to scale horizontally):
  ⚠ Approaching the largest instance size
  ⚠ Cost increasing faster than load
  ⚠ Can't afford downtime for upgrades
  ⚠ A single point of failure is unacceptable
  ⚠ Need geographic distribution

Horizontal Scaling Deep Dive

Architecture:

            ┌─────────────┐
            │   Clients   │
            └──────┬──────┘
                   │
            ┌──────▼──────┐
            │    Load     │
            │  Balancer   │
            └──────┬──────┘
          ┌────────┼────────┐
          │        │        │
     ┌────▼───┐┌───▼────┐┌──▼─────┐
     │Server 1││Server 2││Server 3│   ← stateless, identical
     └────┬───┘└───┬────┘└──┬─────┘
          │        │        │
          └────────┼────────┘
            ┌──────▼──────┐
            │   Shared    │
            │   Storage   │   ← database, cache, queue
            └─────────────┘

Key requirements:
  1. Stateless application servers
  2. Shared data layer
  3. Load balancing
  4. Service discovery
  5. Health checks

Stateless Service Design

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"

	"github.com/redis/go-redis/v9"
)

// StatelessService demonstrates horizontally scalable design.
type StatelessService struct {
	// No local state - everything lives in external stores.
	redis    *redis.Client
	serverID string
}

// Session is stored externally, not in memory.
type Session struct {
	UserID    string                 `json:"user_id"`
	CreatedAt time.Time              `json:"created_at"`
	Data      map[string]interface{} `json:"data"`
}

func NewStatelessService(redisAddr string) *StatelessService {
	hostname, _ := os.Hostname()
	return &StatelessService{
		redis: redis.NewClient(&redis.Options{
			Addr: redisAddr,
		}),
		serverID: hostname,
	}
}

// GetSession - any server can handle any request.
func (s *StatelessService) GetSession(ctx context.Context, sessionID string) (*Session, error) {
	// The session lives in Redis, not in local memory.
	data, err := s.redis.Get(ctx, "session:"+sessionID).Bytes()
	if err != nil {
		return nil, fmt.Errorf("session not found: %w", err)
	}
	var session Session
	if err := json.Unmarshal(data, &session); err != nil {
		return nil, err
	}
	return &session, nil
}

// CreateSession creates a session in the shared store.
func (s *StatelessService) CreateSession(ctx context.Context, userID string) (string, error) {
	sessionID := generateID()
	session := Session{
		UserID:    userID,
		CreatedAt: time.Now(),
		Data:      make(map[string]interface{}),
	}
	data, err := json.Marshal(session)
	if err != nil {
		return "", err
	}
	if err := s.redis.Set(ctx, "session:"+sessionID, data, 24*time.Hour).Err(); err != nil {
		return "", err
	}
	return sessionID, nil
}

// ServeHTTP - stateless request handling.
func (s *StatelessService) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()

	// Add the server ID to the response for debugging.
	w.Header().Set("X-Server-ID", s.serverID)

	sessionID := r.Header.Get("X-Session-ID")
	if sessionID == "" {
		http.Error(w, "missing session", http.StatusUnauthorized)
		return
	}

	session, err := s.GetSession(ctx, sessionID)
	if err != nil {
		http.Error(w, "invalid session", http.StatusUnauthorized)
		return
	}

	// Process the request - any server can handle it.
	response := map[string]interface{}{
		"user_id":   session.UserID,
		"server":    s.serverID,
		"timestamp": time.Now(),
	}
	json.NewEncoder(w).Encode(response)
}

// generateID returns a random session ID. A bare timestamp would
// collide under concurrent requests, so use random bytes instead.
func generateID() string {
	b := make([]byte, 16)
	rand.Read(b)
	return hex.EncodeToString(b)
}
```

Scaling Patterns Comparison

Pattern 1: SCALE VERTICALLY FIRST
  1K users → 10K → 100K (keep upgrading the server)
  Then add read replicas when the vertical limit is reached.
  Then add sharding when replicas are not enough.
  ✓ Simple to operate
  ✓ Delays distributed complexity
  ✗ May require migration later

Pattern 2: HORIZONTAL FROM THE START
  Design for horizontal scaling from day 1:
  • Stateless services
  • Partitioned data
  • Message queues
  ✓ Ready for growth
  ✓ High availability built in
  ✗ More complex initially
  ✗ May be over-engineered for small scale

Pattern 3: HYBRID (recommended)
  Stateless services (horizontal) +
  vertical database (scale up as needed) +
  read replicas/sharding when required.
  ✓ Best of both worlds
  ✓ Scale where needed
  ✓ Manage complexity incrementally

Scaling Different Components

WEB SERVERS:
  → HORIZONTAL (stateless, easy to scale)
    Auto-scale based on CPU/request count.
    Use container orchestration (Kubernetes).

APPLICATION SERVERS:
  → HORIZONTAL (if stateless)
    Store sessions in Redis/Memcached.
    Use sticky sessions only if absolutely necessary.

DATABASES:
  → VERTICAL first, then:
    • Read replicas for read scaling
    • Sharding for write scaling
    • Caching layer to reduce load

CACHES:
  → HORIZONTAL with clustering
    Redis Cluster, Memcached pool.
    Consistent hashing for key distribution.

QUEUES:
  → HORIZONTAL (partitions)
    Kafka partitions, SQS (managed).
    Scale consumers with partitions.

FILE STORAGE:
  → Use cloud storage (S3, GCS)
    Already horizontally scaled.
    CDN for distribution.
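The consistent hashing mentioned for the cache tier can be sketched as a toy hash ring. Each node is placed at many virtual points so keys spread evenly and adding or removing a node only remaps a small fraction of keys (CRC32 and 100 virtual nodes are illustrative choices; Redis Cluster uses fixed hash slots instead):

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring is a minimal consistent-hash ring.
type Ring struct {
	points []uint32          // sorted hash positions on the ring
	owner  map[uint32]string // position -> node name
}

func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		// Place each node at vnodes virtual points for even spread.
		for i := 0; i < vnodes; i++ {
			h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", n, i)))
			r.points = append(r.points, h)
			r.owner[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Node returns the cache node responsible for key: the first point on
// the ring at or after the key's hash, wrapping around at the end.
func (r *Ring) Node(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"cache-a", "cache-b", "cache-c"}, 100)
	fmt.Println(ring.Node("session:42")) // same key always maps to the same node
}
```

With plain modulo hashing, adding a fourth node would remap roughly 3/4 of all keys; with the ring, only about 1/4 move, so the cache hit rate survives the scaling event.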

Auto-Scaling

Scaling triggers:

  METRIC-BASED
  • CPU utilization > 70%
  • Memory utilization > 80%
  • Request latency > 200ms
  • Queue depth > 1000
  • Custom metrics (business-specific)

  SCHEDULE-BASED
  • Scale up at 9 AM, down at 6 PM
  • More capacity on weekends
  • Prepare for known events (Black Friday)

  PREDICTIVE
  • ML-based traffic prediction
  • Scale before the spike hits
  • Learn from historical patterns

Auto-scaling timeline: when load crosses the threshold, a scale-up is
triggered, but the new instance needs warm-up time before it can serve
traffic — so triggers must fire before capacity is actually exhausted.

Key parameters:
  • Cooldown period (prevents thrashing)
  • Min/max instances
  • Scale-up vs scale-down thresholds
  • Warm-up time for new instances

Auto-Scaling Implementation

```go
package autoscale

import (
	"context"
	"log"
	"sync"
	"time"
)

// AutoScaler manages instance scaling.
type AutoScaler struct {
	mu            sync.Mutex
	config        ScaleConfig
	metrics       MetricsCollector
	scaler        InstanceScaler
	currentCount  int
	lastScaleTime time.Time
}

// ScaleConfig defines scaling behavior.
type ScaleConfig struct {
	MinInstances       int
	MaxInstances       int
	ScaleUpThreshold   float64       // CPU above this % triggers scale up
	ScaleDownThreshold float64       // CPU below this % triggers scale down
	CooldownPeriod     time.Duration // min time between scaling actions
	ScaleUpStep        int           // how many instances to add
	ScaleDownStep      int           // how many instances to remove
}

// MetricsCollector provides current metrics.
type MetricsCollector interface {
	GetAverageCPU(ctx context.Context) (float64, error)
	GetAverageMemory(ctx context.Context) (float64, error)
	GetRequestLatencyP99(ctx context.Context) (time.Duration, error)
}

// InstanceScaler manages actual instances.
type InstanceScaler interface {
	GetCurrentCount(ctx context.Context) (int, error)
	ScaleTo(ctx context.Context, count int) error
}

func NewAutoScaler(config ScaleConfig, metrics MetricsCollector, scaler InstanceScaler) *AutoScaler {
	return &AutoScaler{
		config:  config,
		metrics: metrics,
		scaler:  scaler,
	}
}

// Run starts the auto-scaling loop.
func (a *AutoScaler) Run(ctx context.Context) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := a.evaluate(ctx); err != nil {
				log.Printf("auto-scale evaluation error: %v", err)
			}
		}
	}
}

func (a *AutoScaler) evaluate(ctx context.Context) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	// Respect the cooldown to prevent thrashing.
	if time.Since(a.lastScaleTime) < a.config.CooldownPeriod {
		return nil
	}

	// Get the current state.
	currentCount, err := a.scaler.GetCurrentCount(ctx)
	if err != nil {
		return err
	}
	a.currentCount = currentCount

	// Get metrics.
	cpu, err := a.metrics.GetAverageCPU(ctx)
	if err != nil {
		return err
	}

	// Determine the scaling action.
	var targetCount int
	switch {
	case cpu > a.config.ScaleUpThreshold:
		targetCount = min(currentCount+a.config.ScaleUpStep, a.config.MaxInstances)
		if targetCount > currentCount {
			log.Printf("Scaling UP: CPU=%.1f%%, %d -> %d instances", cpu, currentCount, targetCount)
		}
	case cpu < a.config.ScaleDownThreshold:
		targetCount = max(currentCount-a.config.ScaleDownStep, a.config.MinInstances)
		if targetCount < currentCount {
			log.Printf("Scaling DOWN: CPU=%.1f%%, %d -> %d instances", cpu, currentCount, targetCount)
		}
	default:
		return nil // no action needed
	}

	// Execute the scaling action.
	if targetCount != currentCount {
		if err := a.scaler.ScaleTo(ctx, targetCount); err != nil {
			return err
		}
		a.lastScaleTime = time.Now()
		a.currentCount = targetCount
	}
	return nil
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// Advanced: multi-metric scaling.
type MultiMetricScaler struct {
	*AutoScaler
	rules []ScaleRule
}

type ScaleRule struct {
	Name      string
	Metric    string
	Threshold float64
	Direction string  // "up" or "down"
	Weight    float64 // importance of this rule
}

func (m *MultiMetricScaler) evaluateRules(ctx context.Context) (string, error) {
	var upVotes, downVotes float64

	for _, rule := range m.rules {
		value, err := m.getMetric(ctx, rule.Metric)
		if err != nil {
			continue
		}

		triggered := false
		if rule.Direction == "up" {
			triggered = value > rule.Threshold
		} else {
			triggered = value < rule.Threshold
		}

		if triggered {
			if rule.Direction == "up" {
				upVotes += rule.Weight
			} else {
				downVotes += rule.Weight
			}
		}
	}

	if upVotes > downVotes && upVotes > 0.5 {
		return "up", nil
	}
	if downVotes > upVotes && downVotes > 0.5 {
		return "down", nil
	}
	return "none", nil
}

func (m *MultiMetricScaler) getMetric(ctx context.Context, name string) (float64, error) {
	switch name {
	case "cpu":
		return m.metrics.GetAverageCPU(ctx)
	case "memory":
		return m.metrics.GetAverageMemory(ctx)
	default:
		return 0, nil
	}
}
```

Database Scaling Strategies

Stage 1: SINGLE DATABASE
  App server → one primary database.
  Limit: ~10K queries/sec

Stage 2: READ REPLICAS
  App server sends writes to the primary and reads to replicas;
  the primary replicates changes to the read replicas.
  Limit: reads scale with replicas, writes still ~10K/sec

Stage 3: SHARDING
  A shard router splits data across shards
  (e.g. users A-H / I-P / Q-Z).
  Limit: scales roughly linearly with the number of shards

Read Replica Router

```go
package db

import (
	"context"
	"database/sql"
	"sync/atomic"

	_ "github.com/lib/pq" // registers the "postgres" driver used below
)

// ReplicaRouter routes reads to replicas and writes to the primary.
type ReplicaRouter struct {
	primary  *sql.DB
	replicas []*sql.DB
	current  uint64 // round-robin counter
}

func NewReplicaRouter(primaryDSN string, replicaDSNs []string) (*ReplicaRouter, error) {
	primary, err := sql.Open("postgres", primaryDSN)
	if err != nil {
		return nil, err
	}

	var replicas []*sql.DB
	for _, dsn := range replicaDSNs {
		replica, err := sql.Open("postgres", dsn)
		if err != nil {
			return nil, err
		}
		replicas = append(replicas, replica)
	}

	return &ReplicaRouter{
		primary:  primary,
		replicas: replicas,
	}, nil
}

// Read routes to a replica (round-robin); falls back to the primary
// when no replicas are configured.
func (r *ReplicaRouter) Read() *sql.DB {
	if len(r.replicas) == 0 {
		return r.primary
	}
	idx := atomic.AddUint64(&r.current, 1)
	return r.replicas[idx%uint64(len(r.replicas))]
}

// Write always goes to the primary.
func (r *ReplicaRouter) Write() *sql.DB {
	return r.primary
}

// QueryRead executes a read query on a replica.
func (r *ReplicaRouter) QueryRead(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
	return r.Read().QueryContext(ctx, query, args...)
}

// ExecWrite executes a write on the primary.
func (r *ReplicaRouter) ExecWrite(ctx context.Context, query string, args ...interface{}) (sql.Result, error) {
	return r.Write().ExecContext(ctx, query, args...)
}

// ReadAfterWrite reads from the primary when read-after-write
// consistency is needed (replicas may lag behind the primary).
func (r *ReplicaRouter) ReadAfterWrite(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
	return r.Write().QueryContext(ctx, query, args...)
}
```

Cost Analysis

Scenario: 10,000 requests/second

VERTICAL:
  1x m5.16xlarge (64 vCPU, 256GB)
  Cost: ~$3.07/hr ≈ $2,210/month
  Pros: simple, no distribution overhead
  Cons: single point of failure

HORIZONTAL:
  16x m5.xlarge (4 vCPU, 16GB each)
  Cost: 16 × $0.19/hr = $3.04/hr ≈ $2,189/month
  + Load balancer: ~$20/month
  + Coordination overhead: ~10% more instances
  Total: ~$2,400/month
  Pros: fault tolerant, can scale further
  Cons: ~10% more expensive

Break-even varies by workload:
  • CPU-bound: vertical often cheaper
  • I/O-bound: horizontal often better
  • Stateless: horizontal preferred
  • Stateful: vertical simpler
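The monthly figures in the comparison are just the hourly price times roughly 720 hours (30 days); a quick sanity-check sketch:

```go
package main

import "fmt"

// monthlyCost converts an hourly on-demand price to an approximate
// monthly cost, assuming ~720 hours (30 days) per month.
func monthlyCost(hourly float64, instances int) float64 {
	return hourly * float64(instances) * 720
}

func main() {
	fmt.Printf("vertical:   $%.0f/month\n", monthlyCost(3.07, 1))  // 1x m5.16xlarge ≈ $2,210
	fmt.Printf("horizontal: $%.0f/month\n", monthlyCost(0.19, 16)) // 16x m5.xlarge ≈ $2,189
}
```

Reserved instances or savings plans cut these numbers substantially, but they shift the comparison equally for both options, so the ~10% horizontal overhead conclusion holds either way.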

Best Practices

1. DESIGN FOR HORIZONTAL, DEPLOY VERTICAL FIRST
   • Write stateless services from day 1
   • Use external state stores (Redis, DB)
   • Start with a single large server
   • Scale horizontally when needed

2. SCALE THE BOTTLENECK
   • Profile before scaling
   • Identify the actual bottleneck
   • Scale that component specifically
   • Don't scale everything blindly

3. AUTO-SCALE WITH SAFETY
   • Set min/max bounds
   • Use cooldown periods
   • Alert on scaling events
   • Test scale-down behavior

4. CONSIDER THE FULL STACK
   • Web tier: horizontal
   • App tier: horizontal
   • Cache tier: horizontal with clustering
   • DB tier: vertical, then horizontal
   • Each tier may scale differently

5. CAPACITY PLANNING
   • Know your limits before you hit them
   • Load test regularly
   • Plan for 2-3x current peak
   • Have a scaling runbook ready

Interview Questions

  1. When would you choose vertical over horizontal scaling?
    • Single-threaded workloads
    • Small team/simple operations
    • Database before read replicas
    • Workload still fits on a single machine
  2. What makes a service horizontally scalable?
    • Stateless design
    • External session storage
    • No local file dependencies
    • Idempotent operations
  3. How do you handle database scaling?
    • Vertical first
    • Read replicas for read scaling
    • Caching layer
    • Sharding for write scaling
  4. What are the challenges of auto-scaling?
    • Warm-up time
    • Thrashing prevention
    • State synchronization
    • Connection pool limits
  5. Design an auto-scaling strategy for an e-commerce site during Black Friday
    • Pre-scale based on predictions
    • Aggressive scale-up, slow scale-down
    • Scale per tier independently
    • Have capacity headroom

Summary

VERTICAL:                      HORIZONTAL:
  • Bigger server                • More servers
  • Simpler                      • Complex
  • Limited ceiling              • No practical ceiling
  • Single point of failure      • Fault tolerant

Scaling journey:
  1. Single server (vertical)
  2. Add caching
  3. Read replicas
  4. Horizontal app servers
  5. Database sharding
  6. Multi-region

Key insight: "Scale when needed, not before. But design so you can."

Tags: scaling, horizontal-scaling, vertical-scaling, architecture