Module 13: Horizontal vs Vertical Scaling

The Scaling Dilemma

When your system can't handle the load, you have two options: scale up or scale out.
SCALING OPTIONS

  VERTICAL SCALING (Scale Up)
    One bigger server:
    • More CPU
    • More RAM
    • Faster disk
    Simple, but limited.

  HORIZONTAL SCALING (Scale Out)
    Multiple small servers (S1, S2, S3, S4):
    • Load balancer in front
    • Distributed state
    Complex, but effectively unlimited.

Vertical Scaling Deep Dive

What you upgrade:
  CPU:     4 cores → 16 cores → 64 cores → 128 cores
  RAM:     16GB → 64GB → 256GB → 1TB → 12TB
  Disk:    HDD → SSD → NVMe → NVMe RAID
  Network: 1Gbps → 10Gbps → 25Gbps → 100Gbps

AWS EC2 example (on-demand, approximate):

  Instance        vCPU   RAM     Cost/hr   Scale
  ──────────────────────────────────────────────
  t3.medium       2      4GB     $0.04     1x
  m5.xlarge       4      16GB    $0.19     4x
  m5.4xlarge      16     64GB    $0.77     16x
  m5.16xlarge     64     256GB   $3.07     64x
  x2idn.32xl      128    2TB     $13.34    512x
  u-24tb1.metal   448    24TB    $218.40   6000x

Pros:                           Cons:
  • Simple to implement           • Hardware limits
  • No code changes               • Single point of failure
  • No distributed complexity     • Diminishing returns
  • Consistent latency            • Downtime during upgrade
  • Lower operational overhead    • Expensive at high end

When to Choose Vertical Scaling

GOOD FIT:
  ✓ Databases with complex queries (large working sets benefit from more RAM)
  ✓ In-memory caches (e.g. Redis with a 256GB dataset)
  ✓ Single-threaded applications (a faster CPU helps more than more CPUs)
  ✓ Early-stage startups (simple operations, focus on product)
  ✓ Batch processing jobs (run faster, not parallel)

WARNING SIGNS (time to scale horizontally):
  ⚠ Approaching the largest instance size
  ⚠ Cost increasing faster than load
  ⚠ Can't afford downtime for upgrades
  ⚠ A single point of failure is unacceptable
  ⚠ Need geographic distribution

Horizontal Scaling Deep Dive

Architecture:

            ┌─────────────┐
            │   Clients   │
            └──────┬──────┘
                   │
            ┌──────▼──────┐
            │    Load     │
            │  Balancer   │
            └──────┬──────┘
          ┌────────┼────────┐
          │        │        │
     ┌────▼───┐┌───▼────┐┌──▼─────┐
     │Server 1││Server 2││Server 3│   ← stateless, identical
     └────┬───┘└───┬────┘└──┬─────┘
          │        │        │
          └────────┼────────┘
            ┌──────▼──────┐
            │   Shared    │
            │   Storage   │   ← database, cache, queue
            └─────────────┘

Key requirements:
  1. Stateless application servers
  2. Shared data layer
  3. Load balancing
  4. Service discovery
  5. Health checks

Stateless Service Design

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"

	"github.com/redis/go-redis/v9"
)

// StatelessService demonstrates horizontally scalable design.
type StatelessService struct {
	// No local state - everything lives in external stores.
	redis    *redis.Client
	serverID string
}

// Session is stored externally, not in memory.
type Session struct {
	UserID    string                 `json:"user_id"`
	CreatedAt time.Time              `json:"created_at"`
	Data      map[string]interface{} `json:"data"`
}

func NewStatelessService(redisAddr string) *StatelessService {
	hostname, _ := os.Hostname()
	return &StatelessService{
		redis: redis.NewClient(&redis.Options{
			Addr: redisAddr,
		}),
		serverID: hostname,
	}
}

// GetSession - any server can handle any request.
func (s *StatelessService) GetSession(ctx context.Context, sessionID string) (*Session, error) {
	// The session lives in Redis, not in local memory.
	data, err := s.redis.Get(ctx, "session:"+sessionID).Bytes()
	if err != nil {
		return nil, fmt.Errorf("session not found: %w", err)
	}
	var session Session
	if err := json.Unmarshal(data, &session); err != nil {
		return nil, err
	}
	return &session, nil
}

// CreateSession creates a session in the shared store.
func (s *StatelessService) CreateSession(ctx context.Context, userID string) (string, error) {
	sessionID := generateID()
	session := Session{
		UserID:    userID,
		CreatedAt: time.Now(),
		Data:      make(map[string]interface{}),
	}
	data, err := json.Marshal(session)
	if err != nil {
		return "", err
	}
	if err := s.redis.Set(ctx, "session:"+sessionID, data, 24*time.Hour).Err(); err != nil {
		return "", err
	}
	return sessionID, nil
}

// ServeHTTP - stateless request handling.
func (s *StatelessService) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()

	// Add the server ID to the response for debugging.
	w.Header().Set("X-Server-ID", s.serverID)

	sessionID := r.Header.Get("X-Session-ID")
	if sessionID == "" {
		http.Error(w, "missing session", http.StatusUnauthorized)
		return
	}

	session, err := s.GetSession(ctx, sessionID)
	if err != nil {
		http.Error(w, "invalid session", http.StatusUnauthorized)
		return
	}

	// Process the request - any server can handle it.
	response := map[string]interface{}{
		"user_id":   session.UserID,
		"server":    s.serverID,
		"timestamp": time.Now(),
	}
	json.NewEncoder(w).Encode(response)
}

// generateID returns a random session ID. A bare timestamp would
// collide under concurrent requests, so use random bytes instead.
func generateID() string {
	b := make([]byte, 16)
	rand.Read(b)
	return hex.EncodeToString(b)
}
```

Scaling Patterns Comparison

Pattern 1: SCALE VERTICALLY FIRST
  1K users → 10K → 100K (keep upgrading the server)
  Then add read replicas when the vertical limit is reached.
  Then add sharding when replicas are not enough.
  ✓ Simple to operate
  ✓ Delays distributed complexity
  ✗ May require migration later

Pattern 2: HORIZONTAL FROM THE START
  Design for horizontal scaling from day 1:
  • Stateless services
  • Partitioned data
  • Message queues
  ✓ Ready for growth
  ✓ High availability built in
  ✗ More complex initially
  ✗ May be over-engineered for small scale

Pattern 3: HYBRID (recommended)
  Stateless services (horizontal) +
  vertical database (scale up as needed) +
  read replicas/sharding when required.
  ✓ Best of both worlds
  ✓ Scale where needed
  ✓ Manage complexity incrementally

Scaling Different Components

WEB SERVERS:
  → HORIZONTAL (stateless, easy to scale)
    Auto-scale based on CPU/request count.
    Use container orchestration (Kubernetes).

APPLICATION SERVERS:
  → HORIZONTAL (if stateless)
    Store sessions in Redis/Memcached.
    Use sticky sessions only if absolutely necessary.

DATABASES:
  → VERTICAL first, then:
    • Read replicas for read scaling
    • Sharding for write scaling
    • Caching layer to reduce load

CACHES:
  → HORIZONTAL with clustering
    Redis Cluster, Memcached pool.
    Consistent hashing for key distribution.

QUEUES:
  → HORIZONTAL (partitions)
    Kafka partitions, SQS (managed).
    Scale consumers with partitions.

FILE STORAGE:
  → Use cloud storage (S3, GCS)
    Already horizontally scaled.
    CDN for distribution.
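The consistent hashing mentioned for the cache tier can be sketched as a toy hash ring. Each node is placed at many virtual points so keys spread evenly and adding or removing a node only remaps a small fraction of keys (CRC32 and 100 virtual nodes are illustrative choices; Redis Cluster uses fixed hash slots instead):

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring is a minimal consistent-hash ring.
type Ring struct {
	points []uint32          // sorted hash positions on the ring
	owner  map[uint32]string // position -> node name
}

func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		// Place each node at vnodes virtual points for even spread.
		for i := 0; i < vnodes; i++ {
			h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", n, i)))
			r.points = append(r.points, h)
			r.owner[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Node returns the cache node responsible for key: the first point on
// the ring at or after the key's hash, wrapping around at the end.
func (r *Ring) Node(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"cache-a", "cache-b", "cache-c"}, 100)
	fmt.Println(ring.Node("session:42")) // same key always maps to the same node
}
```

With plain modulo hashing, adding a fourth node would remap roughly 3/4 of all keys; with the ring, only about 1/4 move, so the cache hit rate survives the scaling event.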

Auto-Scaling

Scaling triggers:

  METRIC-BASED
  • CPU utilization > 70%
  • Memory utilization > 80%
  • Request latency > 200ms
  • Queue depth > 1000
  • Custom metrics (business-specific)

  SCHEDULE-BASED
  • Scale up at 9 AM, down at 6 PM
  • More capacity on weekends
  • Prepare for known events (Black Friday)

  PREDICTIVE
  • ML-based traffic prediction
  • Scale before the spike hits
  • Learn from historical patterns

Auto-scaling timeline: when load crosses the threshold, a scale-up is
triggered, but the new instance needs warm-up time before it can serve
traffic — so triggers must fire before capacity is actually exhausted.

Key parameters:
  • Cooldown period (prevents thrashing)
  • Min/max instances
  • Scale-up vs scale-down thresholds
  • Warm-up time for new instances

Auto-Scaling Implementation

```go
package autoscale

import (
	"context"
	"log"
	"sync"
	"time"
)

// AutoScaler manages instance scaling.
type AutoScaler struct {
	mu            sync.Mutex
	config        ScaleConfig
	metrics       MetricsCollector
	scaler        InstanceScaler
	currentCount  int
	lastScaleTime time.Time
}

// ScaleConfig defines scaling behavior.
type ScaleConfig struct {
	MinInstances       int
	MaxInstances       int
	ScaleUpThreshold   float64       // CPU above this % triggers scale up
	ScaleDownThreshold float64       // CPU below this % triggers scale down
	CooldownPeriod     time.Duration // min time between scaling actions
	ScaleUpStep        int           // how many instances to add
	ScaleDownStep      int           // how many instances to remove
}

// MetricsCollector provides current metrics.
type MetricsCollector interface {
	GetAverageCPU(ctx context.Context) (float64, error)
	GetAverageMemory(ctx context.Context) (float64, error)
	GetRequestLatencyP99(ctx context.Context) (time.Duration, error)
}

// InstanceScaler manages actual instances.
type InstanceScaler interface {
	GetCurrentCount(ctx context.Context) (int, error)
	ScaleTo(ctx context.Context, count int) error
}

func NewAutoScaler(config ScaleConfig, metrics MetricsCollector, scaler InstanceScaler) *AutoScaler {
	return &AutoScaler{
		config:  config,
		metrics: metrics,
		scaler:  scaler,
	}
}

// Run starts the auto-scaling loop.
func (a *AutoScaler) Run(ctx context.Context) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := a.evaluate(ctx); err != nil {
				log.Printf("auto-scale evaluation error: %v", err)
			}
		}
	}
}

func (a *AutoScaler) evaluate(ctx context.Context) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	// Respect the cooldown to prevent thrashing.
	if time.Since(a.lastScaleTime) < a.config.CooldownPeriod {
		return nil
	}

	// Get the current state.
	currentCount, err := a.scaler.GetCurrentCount(ctx)
	if err != nil {
		return err
	}
	a.currentCount = currentCount

	// Get metrics.
	cpu, err := a.metrics.GetAverageCPU(ctx)
	if err != nil {
		return err
	}

	// Determine the scaling action.
	var targetCount int
	switch {
	case cpu > a.config.ScaleUpThreshold:
		targetCount = min(currentCount+a.config.ScaleUpStep, a.config.MaxInstances)
		if targetCount > currentCount {
			log.Printf("Scaling UP: CPU=%.1f%%, %d -> %d instances", cpu, currentCount, targetCount)
		}
	case cpu < a.config.ScaleDownThreshold:
		targetCount = max(currentCount-a.config.ScaleDownStep, a.config.MinInstances)
		if targetCount < currentCount {
			log.Printf("Scaling DOWN: CPU=%.1f%%, %d -> %d instances", cpu, currentCount, targetCount)
		}
	default:
		return nil // no action needed
	}

	// Execute the scaling action.
	if targetCount != currentCount {
		if err := a.scaler.ScaleTo(ctx, targetCount); err != nil {
			return err
		}
		a.lastScaleTime = time.Now()
		a.currentCount = targetCount
	}
	return nil
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// Advanced: multi-metric scaling.
type MultiMetricScaler struct {
	*AutoScaler
	rules []ScaleRule
}

type ScaleRule struct {
	Name      string
	Metric    string
	Threshold float64
	Direction string  // "up" or "down"
	Weight    float64 // importance of this rule
}

func (m *MultiMetricScaler) evaluateRules(ctx context.Context) (string, error) {
	var upVotes, downVotes float64

	for _, rule := range m.rules {
		value, err := m.getMetric(ctx, rule.Metric)
		if err != nil {
			continue
		}

		triggered := false
		if rule.Direction == "up" {
			triggered = value > rule.Threshold
		} else {
			triggered = value < rule.Threshold
		}

		if triggered {
			if rule.Direction == "up" {
				upVotes += rule.Weight
			} else {
				downVotes += rule.Weight
			}
		}
	}

	if upVotes > downVotes && upVotes > 0.5 {
		return "up", nil
	}
	if downVotes > upVotes && downVotes > 0.5 {
		return "down", nil
	}
	return "none", nil
}

func (m *MultiMetricScaler) getMetric(ctx context.Context, name string) (float64, error) {
	switch name {
	case "cpu":
		return m.metrics.GetAverageCPU(ctx)
	case "memory":
		return m.metrics.GetAverageMemory(ctx)
	default:
		return 0, nil
	}
}
```

Database Scaling Strategies

Stage 1: SINGLE DATABASE
  App server → one primary database.
  Limit: ~10K queries/sec

Stage 2: READ REPLICAS
  App server sends writes to the primary and reads to replicas;
  the primary replicates changes to the read replicas.
  Limit: reads scale with replicas, writes still ~10K/sec

Stage 3: SHARDING
  A shard router splits data across shards
  (e.g. users A-H / I-P / Q-Z).
  Limit: scales roughly linearly with the number of shards

Read Replica Router

```go
package db

import (
	"context"
	"database/sql"
	"sync/atomic"

	_ "github.com/lib/pq" // registers the "postgres" driver used below
)

// ReplicaRouter routes reads to replicas and writes to the primary.
type ReplicaRouter struct {
	primary  *sql.DB
	replicas []*sql.DB
	current  uint64 // round-robin counter
}

func NewReplicaRouter(primaryDSN string, replicaDSNs []string) (*ReplicaRouter, error) {
	primary, err := sql.Open("postgres", primaryDSN)
	if err != nil {
		return nil, err
	}

	var replicas []*sql.DB
	for _, dsn := range replicaDSNs {
		replica, err := sql.Open("postgres", dsn)
		if err != nil {
			return nil, err
		}
		replicas = append(replicas, replica)
	}

	return &ReplicaRouter{
		primary:  primary,
		replicas: replicas,
	}, nil
}

// Read routes to a replica (round-robin); falls back to the primary
// when no replicas are configured.
func (r *ReplicaRouter) Read() *sql.DB {
	if len(r.replicas) == 0 {
		return r.primary
	}
	idx := atomic.AddUint64(&r.current, 1)
	return r.replicas[idx%uint64(len(r.replicas))]
}

// Write always goes to the primary.
func (r *ReplicaRouter) Write() *sql.DB {
	return r.primary
}

// QueryRead executes a read query on a replica.
func (r *ReplicaRouter) QueryRead(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
	return r.Read().QueryContext(ctx, query, args...)
}

// ExecWrite executes a write on the primary.
func (r *ReplicaRouter) ExecWrite(ctx context.Context, query string, args ...interface{}) (sql.Result, error) {
	return r.Write().ExecContext(ctx, query, args...)
}

// ReadAfterWrite reads from the primary when read-after-write
// consistency is needed (replicas may lag behind the primary).
func (r *ReplicaRouter) ReadAfterWrite(ctx context.Context, query string, args ...interface{}) (*sql.Rows, error) {
	return r.Write().QueryContext(ctx, query, args...)
}
```

Cost Analysis

Scenario: 10,000 requests/second

VERTICAL:
  1x m5.16xlarge (64 vCPU, 256GB)
  Cost: ~$3.07/hr ≈ $2,210/month
  Pros: simple, no distribution overhead
  Cons: single point of failure

HORIZONTAL:
  16x m5.xlarge (4 vCPU, 16GB each)
  Cost: 16 × $0.19/hr = $3.04/hr ≈ $2,189/month
  + Load balancer: ~$20/month
  + Coordination overhead: ~10% more instances
  Total: ~$2,400/month
  Pros: fault tolerant, can scale further
  Cons: ~10% more expensive

Break-even varies by workload:
  • CPU-bound: vertical often cheaper
  • I/O-bound: horizontal often better
  • Stateless: horizontal preferred
  • Stateful: vertical simpler
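The monthly figures in the comparison are just the hourly price times roughly 720 hours (30 days); a quick sanity-check sketch:

```go
package main

import "fmt"

// monthlyCost converts an hourly on-demand price to an approximate
// monthly cost, assuming ~720 hours (30 days) per month.
func monthlyCost(hourly float64, instances int) float64 {
	return hourly * float64(instances) * 720
}

func main() {
	fmt.Printf("vertical:   $%.0f/month\n", monthlyCost(3.07, 1))  // 1x m5.16xlarge ≈ $2,210
	fmt.Printf("horizontal: $%.0f/month\n", monthlyCost(0.19, 16)) // 16x m5.xlarge ≈ $2,189
}
```

Reserved instances or savings plans cut these numbers substantially, but they shift the comparison equally for both options, so the ~10% horizontal overhead conclusion holds either way.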

Best Practices

1. DESIGN FOR HORIZONTAL, DEPLOY VERTICAL FIRST
   • Write stateless services from day 1
   • Use external state stores (Redis, DB)
   • Start with a single large server
   • Scale horizontally when needed

2. SCALE THE BOTTLENECK
   • Profile before scaling
   • Identify the actual bottleneck
   • Scale that component specifically
   • Don't scale everything blindly

3. AUTO-SCALE WITH SAFETY
   • Set min/max bounds
   • Use cooldown periods
   • Alert on scaling events
   • Test scale-down behavior

4. CONSIDER THE FULL STACK
   • Web tier: horizontal
   • App tier: horizontal
   • Cache tier: horizontal with clustering
   • DB tier: vertical, then horizontal
   • Each tier may scale differently

5. CAPACITY PLANNING
   • Know your limits before you hit them
   • Load test regularly
   • Plan for 2-3x current peak
   • Have a scaling runbook ready

Interview Questions

  1. When would you choose vertical over horizontal scaling?
    • Single-threaded workloads
    • Small team/simple operations
    • Database before read replicas
    • Workload still fits on a single machine
  2. What makes a service horizontally scalable?
    • Stateless design
    • External session storage
    • No local file dependencies
    • Idempotent operations
  3. How do you handle database scaling?
    • Vertical first
    • Read replicas for read scaling
    • Caching layer
    • Sharding for write scaling
  4. What are the challenges of auto-scaling?
    • Warm-up time
    • Thrashing prevention
    • State synchronization
    • Connection pool limits
  5. Design an auto-scaling strategy for an e-commerce site during Black Friday
    • Pre-scale based on predictions
    • Aggressive scale-up, slow scale-down
    • Scale per tier independently
    • Have capacity headroom

Summary

VERTICAL:                      HORIZONTAL:
  • Bigger server                • More servers
  • Simpler                      • Complex
  • Limited ceiling              • No practical ceiling
  • Single point of failure      • Fault tolerant

Scaling journey:
  1. Single server (vertical)
  2. Add caching
  3. Read replicas
  4. Horizontal app servers
  5. Database sharding
  6. Multi-region

Key insight: "Scale when needed, not before. But design so you can."

Tags: scaling, horizontal-scaling, vertical-scaling, architecture