Part 42: System Design Case Studies - Applying Principles to Real Problems

"The true test of understanding is not reciting principles but applying them. In this chapter, we walk through real system designs, showing how the concepts we've explored come together to solve actual problems at scale."

Case Study 1: URL Shortener

The URL shortener is a classic system design problem because it appears simple but involves many distributed systems concepts.
The core requirement is straightforward: given a long URL, produce a short code that redirects to the original URL. Users can create short URLs; anyone with the short code can be redirected.
For the short code, we need a compact representation that's unique across all shortened URLs. A common approach uses base62 encoding (a-z, A-Z, 0-9) of a unique identifier. A 7-character base62 code provides about 3.5 trillion unique values—plenty for most use cases.
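The encoding step can be sketched in a few lines. This is a minimal illustration, not a production implementation; the alphabet ordering is just a convention, and any fixed ordering works as long as encode and decode agree.

```python
# Base62 short-code sketch: map a numeric ID to a fixed-width code and back.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int, width: int = 7) -> str:
    """Encode a non-negative integer as a fixed-width base62 string."""
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    code = "".join(reversed(chars))
    return code.rjust(width, ALPHABET[0])  # left-pad to the fixed width

def decode_base62(code: str) -> int:
    """Invert encode_base62."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

With a 7-character width, the largest representable ID is 62^7 - 1, roughly 3.5 trillion, which matches the capacity estimate above.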
Generating unique identifiers at scale requires thought. A single database sequence creates a bottleneck. Distributed ID schemes like Snowflake IDs or UUIDs provide uniqueness without coordination. Alternatively, hash the long URL and use a prefix of the hash, with collision handling.
The data model is simple: a mapping from short code to long URL, plus metadata (creation time, owner, click count). This mapping must be durable—losing it means all short links break.
The read path dominates traffic: for every URL creation, there might be thousands of redirects. Caching is essential. A Redis cache fronts the database. The working set of popular URLs fits in memory. Cache misses fall through to the database.
Geographic distribution reduces redirect latency. Users worldwide should reach nearby servers. A CDN can cache redirects at the edge. Multi-region database replicas serve reads locally.
Analytics add complexity. Tracking click counts, referrers, and geographic distribution requires writing data on every redirect—the hot path. Asynchronous logging to a stream (Kafka) decouples analytics from the redirect path. Stream processors aggregate the analytics data.
The system architecture emerges: load balancers distribute traffic to stateless application servers. These servers check Redis cache first, then database if needed. Redirects are logged asynchronously to Kafka. Analytics processors consume the stream and update aggregates.
A design like this can handle millions of redirects per second with sub-50-millisecond latency, using familiar patterns: caching, replication, asynchronous processing, and geographic distribution.

Case Study 2: Distributed Cache

Memcached and Redis are ubiquitous, but understanding how to build a distributed cache illuminates core concepts.
A cache must be fast—that's its purpose. Memory storage is essential; disk is too slow. The data structure is typically a hash table for O(1) lookups.
Single-node caches have limited capacity. Distributed caches shard data across multiple nodes using consistent hashing. Each key maps to a node; clients route directly to the right node.
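One way to sketch the consistent-hashing idea, with virtual nodes to smooth the distribution (a simplification; production clients add weighting and replication):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each node owns many points on the ring,
    and a key routes to the first node point clockwise from its hash."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []                       # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):          # virtual nodes smooth the split
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)  # wrap around
        return self.ring[idx][1]
```

The payoff is that adding or removing a node only remaps the keys that fall between its ring points and their neighbors, rather than reshuffling everything.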
Cache eviction removes old entries when capacity is reached. LRU (least recently used) is common: when space is needed, evict entries that haven't been accessed recently. Implementing LRU efficiently requires a hash table plus a linked list to track access order.
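The hash-table-plus-linked-list combination mentioned above is exactly what Python's `OrderedDict` provides, which makes a compact sketch possible:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction sketch: OrderedDict supplies the hash table and the
    access-order linked list in one structure."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
```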
Hot keys challenge even distributed caches. A single popular item might receive enormous traffic, overwhelming the node that holds it. Replicating hot keys across multiple nodes spreads the load. Detecting hot keys requires access counting, which itself adds overhead.
Cache invalidation is the hard problem. When the source data changes, cached copies become stale. Push-based invalidation sends updates to caches when data changes. Pull-based invalidation uses time-to-live (TTL) values that expire entries. Push is fresher but more complex; pull is simpler but allows temporary staleness.
The client library handles complexity. It hashes keys to find nodes, routes requests, handles node failures, and potentially retries. A well-designed client makes the distributed cache feel like a simple key-value store.
For high availability, cache nodes can replicate data. Primary nodes handle writes; replicas serve reads and take over if primaries fail. This mirrors database replication patterns.

Case Study 3: Message Queue

Message queues decouple producers from consumers, provide buffering, and enable asynchronous processing. Understanding their design reveals durability and ordering challenges.
The core abstraction is the queue: producers append messages; consumers remove them. For durability, messages must persist to disk. For performance, messages should also reside in memory when possible.
Append-only logs are an efficient storage model. Messages are appended sequentially—the fastest disk access pattern. Consumers track their position in the log. Unlike traditional queues that delete consumed messages, log-based queues retain messages for a retention period.
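The contrast with a traditional queue is easy to show in miniature: messages are never deleted on consumption; each consumer just advances its own offset. This is an in-memory sketch, with a list standing in for on-disk segment files.

```python
class Log:
    """Log-based queue sketch: append-only storage plus per-consumer offsets."""

    def __init__(self):
        self.messages = []        # stands in for on-disk log segments
        self.offsets = {}         # consumer name -> next position to read

    def append(self, message) -> int:
        self.messages.append(message)
        return len(self.messages) - 1    # the message's offset

    def poll(self, consumer: str):
        pos = self.offsets.get(consumer, 0)
        if pos >= len(self.messages):
            return None                  # consumer is caught up
        self.offsets[consumer] = pos + 1
        return self.messages[pos]
```

Because consumption only moves an offset, two consumers can read the same messages independently, which is what makes fan-out and replay cheap.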
Partitioning enables parallel consumption. A topic is divided into partitions; each partition is a separate log. Different consumers can read different partitions concurrently. Within a partition, order is preserved; across partitions, no ordering guarantee exists.
Consumer groups coordinate multiple consumers reading the same topic. Each consumer in a group handles a subset of partitions. If a consumer fails, its partitions are reassigned to others. This provides both parallelism and fault tolerance.
At-least-once delivery is easier than exactly-once. The broker tracks which messages have been delivered but not acknowledged. If a consumer fails before acknowledging, the message is redelivered. Consumers must be idempotent to handle potential duplicates.
Exactly-once semantics require coordination. Kafka achieves this through idempotent producers (deduplicating based on sequence numbers) and transactional consumers (atomically reading and committing offsets).
High availability requires replication. Each partition has multiple replicas across different brokers. One replica is the leader handling all reads and writes; others are followers. If the leader fails, a follower becomes the new leader. Electing that leader safely requires consensus: Kafka historically delegated this to ZooKeeper, and its newer KRaft mode uses Raft for the control plane.

Case Study 4: Search System

Search systems like Elasticsearch power everything from website search to log analysis. Their design involves inverted indexes, distributed query processing, and relevance scoring.
The inverted index is the core data structure. For each term (word), it lists the documents containing that term. Searching for "distributed systems" retrieves the postings list for "distributed" and the postings list for "systems," then intersects the two.
Building indexes requires parsing documents, extracting terms (tokenization), normalizing them (stemming, lowercasing), and updating the inverted index. For large document collections, this is a significant processing task.
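The indexing and query steps above can be sketched end to end. This toy version tokenizes by whitespace and lowercases only; real analyzers add stemming, stop words, and positional information.

```python
from collections import defaultdict

# Toy inverted index: term -> set of document IDs containing the term.
index = defaultdict(set)

def add_document(doc_id: int, text: str):
    for term in text.lower().split():     # crude tokenization + lowercasing
        index[term].add(doc_id)

def search(query: str) -> set:
    """AND-query: intersect the postings lists of all query terms."""
    postings = [index[t] for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

add_document(1, "distributed systems are hard")
add_document(2, "systems programming")
add_document(3, "distributed databases")
```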
Sharding distributes documents across nodes. Each shard contains a subset of documents and maintains its own inverted index. Queries are broadcast to all shards; results are merged.
Query processing involves parsing the query, finding matching documents, scoring them for relevance, and returning the top results. Relevance scoring (like TF-IDF or BM25) considers term frequency, document frequency, and field weights.
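The TF-IDF intuition, frequent in this document but rare across the corpus, fits in one formula. This is the classic textbook form; real engines like BM25 add saturation and length normalization on top of it.

```python
import math

def tf_idf(term_count: int, doc_len: int, num_docs: int, docs_with_term: int) -> float:
    """Classic TF-IDF score for one term in one document."""
    tf = term_count / doc_len                  # how frequent in this document
    idf = math.log(num_docs / docs_with_term)  # how rare across the corpus
    return tf * idf
```

A term appearing in every document gets an IDF of log(1) = 0, so ubiquitous words contribute nothing to relevance.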
Distributed query execution requires coordination. A coordinator receives the query, routes it to all shards, waits for results, merges and re-ranks them, and returns the final list. This scatter-gather pattern is common in distributed systems.
Near-real-time indexing allows recently added documents to be searchable quickly. New documents are first indexed in memory (a segment). Periodically, memory segments are flushed to disk. Segments are periodically merged for efficiency.
High availability uses replication. Each shard has replicas on different nodes. Queries can be served by any replica, spreading load. If a primary fails, a replica takes over.

Case Study 5: Payment System

Payment systems have stringent requirements: correctness (never lose money), durability (never lose transactions), and consistency (balances must be accurate). These requirements shape the architecture.
The double-entry bookkeeping principle underlies financial systems. Every transaction involves at least two entries: a debit from one account and a credit to another. The sum of all entries is always zero. This invariant catches errors.
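The zero-sum invariant is worth seeing concretely. In this sketch a transfer writes two ledger entries, a debit and a matching credit, and the invariant that all entries sum to zero holds after every transaction:

```python
# Double-entry sketch: every transfer is a debit plus an equal credit.
ledger = []  # list of (transaction_id, account, amount) entries

def transfer(txn_id: str, src: str, dst: str, amount: int):
    ledger.append((txn_id, src, -amount))   # debit the source
    ledger.append((txn_id, dst, +amount))   # credit the destination

def balance(account: str) -> int:
    return sum(amt for _, acct, amt in ledger if acct == account)

transfer("t1", "alice", "bob", 100)
```

If a bug ever wrote one entry without its pair, the global sum would drift from zero, which is exactly the kind of error the invariant catches.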
Idempotency is essential. Network failures might cause payment requests to be retried. A request that creates two payments instead of one would be a serious bug. Every payment request has a unique ID; the system tracks which IDs have been processed and returns the previous result for duplicates.
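The idempotency-key mechanism can be sketched as a lookup before processing (the names here are illustrative; in practice the key-to-result mapping lives in a durable store, not a dict):

```python
# Idempotency-key sketch: the first request with a given key is processed;
# any retry with the same key gets the previously stored result back.
results = {}          # idempotency key -> previously returned payment ID
payments_made = []    # side effects actually performed

def create_payment(idempotency_key: str, amount: int) -> str:
    if idempotency_key in results:
        return results[idempotency_key]        # retry: return the same result
    payment_id = f"pay-{len(payments_made) + 1}"
    payments_made.append((payment_id, amount)) # the real side effect
    results[idempotency_key] = payment_id
    return payment_id
```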
Strong consistency is typically required. If a user's balance shows $100, a payment of $100 should succeed. If it shows $50, it should fail. This requires linearizable reads—seeing the true current balance, not a stale value.
Transactions span multiple accounts. Transferring money from Account A to Account B must debit A and credit B atomically. This is a distributed transaction across potentially different databases or services.
Saga patterns handle complex payment flows. A purchase might involve checking inventory, authorizing payment, capturing funds, and shipping. If any step fails, compensation actions (refunds, inventory release) restore consistency.
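A saga coordinator can be sketched as a loop that records a compensation for each completed step and, on failure, runs them in reverse. The step names below are illustrative, not a real payment API.

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs; actions return True/False.
    On failure, compensations for completed steps run in reverse order."""
    done = []
    for action, compensate in steps:
        if action():
            done.append(compensate)
        else:
            for comp in reversed(done):   # undo completed steps, newest first
                comp()
            return False
    return True

attempted = []
outcome = run_saga([
    (lambda: attempted.append("reserve") or True, lambda: attempted.append("release")),
    (lambda: attempted.append("charge") or True,  lambda: attempted.append("refund")),
    (lambda: attempted.append("ship") and False,  lambda: None),   # shipping fails
])
```

After the failed shipping step, the coordinator refunds the charge and releases the inventory, restoring consistency without a distributed transaction.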
Reconciliation catches discrepancies. Periodically compare records: transactions in the database versus transactions at the payment processor versus transactions in accounting systems. Discrepancies indicate bugs or fraud.
Audit logging records everything. Who initiated the transaction? When? What approvals occurred? This supports debugging, fraud detection, and regulatory compliance.

Case Study 6: Real-Time Collaboration

Google Docs-style real-time collaboration allows multiple users to edit simultaneously, seeing each other's changes instantly. This requires solving consistency, conflict resolution, and latency challenges.
Operational Transformation (OT) or CRDTs handle concurrent edits. When two users type at the same position simultaneously, their edits must be merged sensibly. OT transforms operations to account for concurrent changes. CRDTs design data structures where concurrent operations naturally commute.
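Text CRDTs are intricate, but the commuting-merge idea shows up in miniature in a grow-only counter: each replica increments only its own slot, and merging takes the element-wise maximum, so it does not matter in what order concurrent updates arrive.

```python
class GCounter:
    """Grow-only counter CRDT sketch: per-replica counts, merged by max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}                 # replica id -> its local count

    def increment(self, n: int = 1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter"):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())
```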
Client-side prediction provides instant feedback. When you type, the character appears immediately—without waiting for server confirmation. The client optimistically applies the edit, sends it to the server, and adjusts if the server's view differs.
Event synchronization keeps clients in sync. Each edit is an event sent to the server, which broadcasts to all clients. Events might arrive out of order; the reconciliation algorithm handles this.
Presence features show where other users are editing. Cursor positions, selection highlights, and user indicators help collaborators coordinate. These are high-frequency, low-importance updates—eventual consistency is fine.
Persistent storage saves documents. Edits are applied to an in-memory representation for fast collaboration. Periodically, the current state is persisted. The event log might also be stored for history and undo.

Bringing It Together

These case studies demonstrate patterns recurring across different problems: consistent hashing for distribution, replication for availability, caching for performance, asynchronous processing for decoupling, consensus for consistency.
The specific combinations differ based on requirements. A cache prioritizes speed over durability. A payment system prioritizes consistency over speed. A search system prioritizes query performance over write latency.
Understanding both the patterns and the tradeoffs enables you to design systems tailored to your requirements—combining the building blocks we've explored throughout this course into coherent, effective architectures.

"System design is not about memorizing solutions; it's about understanding patterns and tradeoffs well enough to compose novel solutions for novel problems. The case studies illuminate the patterns; your creativity applies them."