Module 6: CAP Theorem and Beyond

The CAP Theorem

Eric Brewer's CAP theorem (conjectured in 2000, proved by Gilbert and Lynch in 2002) states that a distributed system can provide at most two of three guarantees simultaneously.
The three guarantees:

  • Consistency (C): linearizability — every read sees the most recent write
  • Availability (A): every request receives a response
  • Partition tolerance (P): the system keeps working despite network splits

IMPORTANT: a distributed system must always tolerate partitions, so the real choice is CP or AP.

Understanding Each Property

CONSISTENCY (C):
  • Every read receives the most recent write (linearizability)
  • All nodes appear to see the same data at the same time
  • After a write completes, all subsequent reads return that value

AVAILABILITY (A):
  • Every request receives a non-error response, without a guarantee that it contains the most recent write
  • The system stays operational: requests get responses, with no timeouts or errors (ignoring network failures)

PARTITION TOLERANCE (P):
  • The system continues operating despite network partitions
  • The network can drop or delay messages between nodes, and the system must handle this rather than give up entirely

Why Partition Tolerance is Mandatory

"A network partition is not a matter of IF but WHEN."

Real-world partition causes:
  • Switch or router failure
  • Fiber cuts (construction, animals, earthquakes)
  • Network congestion (buffer overflow)
  • Misconfigured firewall rules
  • Cloud provider network issues
  • GC pauses long enough to trigger timeouts
  • Asymmetric network issues (A→B works, B→A fails)

A "CA" system must become unavailable during a partition: it cannot guarantee consistency across the split, so it cannot serve requests without violating consistency and must shut down or return errors. In practice, "CA" means either "single node" or "accept downtime."

For any multi-node system, you MUST handle partitions. The real question is what to sacrifice while one is in progress.

CP vs AP: The Real Choice

CP Systems (Consistency over Availability)

During a partition, a CP system sacrifices availability to preserve consistency.

Before the partition, all nodes are synchronized and all requests are served:

    Node A ◄──────► Node B ◄──────► Node C
    (x=1)           (x=1)           (x=1)

During the partition, Node C is cut off from A and B:

    Node A ◄──────► Node B    ✗    Node C
    (x=1)           (x=1)          (x=1)

  • Majority side: A and B continue serving requests
  • Minority side: C rejects requests
  • A write x=2 arriving at A succeeds (the majority confirms it); the same write at C returns an error (it cannot reach a majority)

CP system examples:
  • ZooKeeper (coordination service)
  • etcd (distributed key-value store)
  • Google Spanner (distributed database)
  • HBase (Hadoop database)
  • MongoDB (with majority write concern)
  • PostgreSQL (single leader)
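The majority-quorum behavior above can be sketched in a few lines. This is an illustrative in-memory model, not a real database client: `Cluster`, `quorum_write`, and the `reachable` set are all invented names used to show the CP decision rule (refuse writes when a majority is unreachable).

```python
class QuorumError(Exception):
    """Raised when a write cannot reach a majority of replicas."""

class Cluster:
    def __init__(self, replicas):
        self.data = {name: {} for name in replicas}
        self.reachable = set(replicas)  # replicas this node can currently contact

    def quorum_write(self, key, value):
        majority = len(self.data) // 2 + 1
        if len(self.reachable) < majority:
            # CP choice: refuse the write rather than risk divergent values
            raise QuorumError("cannot reach a majority of replicas")
        for name in self.reachable:
            self.data[name][key] = value
        return True

cluster = Cluster(["A", "B", "C"])
cluster.quorum_write("x", 2)        # all 3 reachable: write succeeds everywhere

cluster.reachable = {"C"}           # partition: we are on the minority side
try:
    cluster.quorum_write("x", 3)
except QuorumError:
    print("minority side rejects the write")  # CP behavior during partition
```

The key line is the majority check: with 3 replicas, 2 acknowledgments are required, so the minority side can never accept a conflicting write.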

AP Systems (Availability over Consistency)

During a partition, an AP system sacrifices consistency to preserve availability.

Before the partition, all nodes are synchronized:

    Node A ◄──────► Node B ◄──────► Node C
    (x=1)           (x=1)           (x=1)

During the partition, ALL nodes continue serving requests:

    Node A ◄──────► Node B    ✗    Node C
    (x=1)           (x=1)          (x=1)

  • Write x=2 to A: A and B now hold x=2
  • Write x=3 to C: C now holds x=3
  • CONFLICT: different values on different sides of the partition

After the partition heals, A and B hold x=2 while C holds x=3, so conflict resolution is needed:
  • Last-write-wins (by timestamp)
  • Version vectors (detect the conflict)
  • Application-level merge

AP system examples:
  • Cassandra (distributed database)
  • DynamoDB (AWS NoSQL)
  • CouchDB (document store)
  • Riak (distributed key-value store)
  • DNS (the domain name system)

PACELC: CAP Extended

Daniel Abadi extended CAP with PACELC (2012): even when there is no partition, a trade-off remains.
If there is a Partition (P), choose between Availability and Consistency.
Else (E), during normal operation, choose between Latency and Consistency.

    PACELC = (P → A or C) + (E → L or C)

  • During a partition: availability (serve requests) OR consistency (reject the minority side)
  • During normal operation: low latency OR strong consistency (synchronous writes)

Key insight: even without partitions, strong consistency requires waiting for replica acknowledgment, which adds latency.
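The E-side trade-off can be made concrete with a toy latency calculation. The numbers below are made up for illustration; the point is only that a synchronous (EC) write is bounded by the slowest replica round trip, while an asynchronous (EL) write acknowledges after the local write and replicates in the background.

```python
# Hypothetical round-trip times (ms) to three replicas, plus local write cost.
replica_rtts_ms = [2, 15, 40]
local_write_ms = 1

# Synchronous replication (EC): acknowledge only after every replica confirms,
# so latency is dominated by the slowest replica.
sync_latency = local_write_ms + max(replica_rtts_ms)

# Asynchronous replication (EL): acknowledge after the local write;
# replicas catch up later, so a reader may briefly see stale data.
async_latency = local_write_ms

print(sync_latency, async_latency)   # 41 vs 1: consistency costs latency
```

This is why PA/EL systems like Cassandra and DynamoDB default to fast, weakly consistent operation, while PC/EC systems like Spanner pay the replication wait on every write.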

PACELC Classification of Databases

    Database         | If Partition | Else (Normal) | Classification
    -----------------+--------------+---------------+---------------
    DynamoDB         | A            | L             | PA/EL
    Cassandra        | A            | L             | PA/EL
    Riak             | A            | L             | PA/EL
    CouchDB          | A            | L             | PA/EL
    MongoDB          | C            | L             | PC/EL
    PNUTS (Yahoo!)   | C            | L             | PC/EL
    HBase            | C            | C             | PC/EC
    Spanner          | C            | C             | PC/EC
    VoltDB           | C            | C             | PC/EC
    Cosmos DB        | Tunable      | Tunable       | Tunable
    Cassandra*       | Tunable      | Tunable       | Tunable

    * With per-request consistency levels, Cassandra's placement is tunable.

  • PA/EL: high availability, eventual consistency. Best for social media, content delivery.
  • PC/EL: consistent during partitions, fast during normal operation. Best for most web applications.
  • PC/EC: always consistent (the strongest). Best for financial systems, coordination.

CAP in Practice

When to Choose CP

Choose CP when inconsistency causes serious problems:

  1. FINANCIAL TRANSACTIONS
    • Bank transfers: must not create or destroy money
    • Payment processing: must not double-charge
    • Stock trading: must execute at the correct price
  2. COORDINATION / LOCKING
    • Leader election: only one leader at a time
    • Distributed locks: only one holder
    • Resource allocation: no double-booking
  3. UNIQUE CONSTRAINTS
    • Username registration: must be unique
    • Inventory: can't sell the same item twice
    • Booking: one customer per time slot
  4. SEQUENTIAL OPERATIONS
    • Sequence number generation
    • Order processing (must happen in order)
    • Version control (must serialize commits)

Consequence of choosing CP: during a partition, affected operations return errors, and users may see "Service temporarily unavailable."

When to Choose AP

Choose AP when availability matters more than perfect consistency:

  1. CONTENT DELIVERY
    • Static content: images, videos, documents
    • An old version is better than no version
    • CDNs are inherently AP
  2. SOCIAL MEDIA
    • News feeds: slightly stale is OK
    • Like counts: eventually consistent
    • Comments: can appear in different orders
  3. SHOPPING CARTS
    • Amazon's famous example
    • "Add to cart" should always work
    • Resolve conflicts at checkout time
  4. SENSOR DATA / METRICS
    • IoT sensor readings
    • Application metrics
    • Log aggregation
    • Slightly inconsistent data beats no data
  5. DNS
    • Must always resolve (availability is critical)
    • A stale record is better than no record
    • TTL-based eventual consistency

Consequence of choosing AP: during a partition, operations succeed but may conflict, and conflicts must be detected and resolved later.

Hybrid Approaches

Per-Operation Consistency

Real systems often use different consistency levels for different operations.

E-commerce example:

    Operation          | Consistency | Reason
    -------------------+-------------+---------------------------
    Browse products    | Eventual    | Stale prices OK
    View cart          | Session     | See own additions
    Check inventory    | Eventual    | Final check at order time
    Place order        | Strong      | Must not oversell
    Process payment    | Strong      | Financial accuracy
    Send confirmation  | Eventual    | Async is fine
    Update inventory   | Strong      | Must be accurate

One possible implementation:
  • Product catalog: Cassandra (AP, cached heavily)
  • Shopping cart: DynamoDB (AP, per-user)
  • Orders and payments: PostgreSQL (CP, ACID)
  • Inventory: Redis + PostgreSQL (fast reads, CP writes)
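One way to wire up per-operation consistency is a small policy table that the data-access layer consults on every call. This is a hedged sketch with invented names (`CONSISTENCY_POLICY`, `read_mode`), mirroring the e-commerce table above rather than any real framework:

```python
# Policy: each operation declares the consistency it needs.
CONSISTENCY_POLICY = {
    "browse_products": "eventual",
    "view_cart":       "session",
    "place_order":     "strong",
    "process_payment": "strong",
}

def read_mode(operation):
    """Map an operation to the read path its consistency policy demands.

    'strong' routes to the leader (or a ConsistentRead-style flag);
    weaker levels can be served from any replica or a cache."""
    level = CONSISTENCY_POLICY.get(operation, "strong")  # default to the safe side
    return "leader" if level == "strong" else "replica"

print(read_mode("browse_products"))  # replica: stale prices are acceptable
print(read_mode("place_order"))      # leader: must not oversell
```

Defaulting unknown operations to "strong" is a deliberate design choice here: it is safer to pay extra latency on an unclassified operation than to silently serve it stale data.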

Tunable Consistency

Many databases allow consistency to be tuned per request or per session.

CASSANDRA (in cqlsh, the consistency level is set for the session before running queries; the old `USING CONSISTENCY` clause was removed from CQL):

    -- Eventual consistency (fastest): one replica must respond
    CONSISTENCY ONE;
    SELECT * FROM users WHERE id = 123;

    -- Strongest (slowest): all replicas must respond
    CONSISTENCY ALL;
    SELECT * FROM users WHERE id = 123;

    -- Balanced: a majority of replicas must respond
    CONSISTENCY QUORUM;
    SELECT * FROM users WHERE id = 123;

COSMOS DB: five consistency levels, each selectable per request, with different cost and latency characteristics:

    Strong → Bounded Staleness → Session → Consistent Prefix → Eventual

DYNAMODB (Python, boto3):

    # Eventually consistent read (default, half the read cost)
    result = table.get_item(Key={'id': '123'})

    # Strongly consistent read (2x the read cost)
    result = table.get_item(
        Key={'id': '123'},
        ConsistentRead=True
    )

Common Misconceptions

  ✗ MISCONCEPTION 1: "Pick any 2 of 3"
    Reality: P is mandatory for distributed systems. Network partitions WILL happen, so the real choice is CP or AP during a partition.

  ✗ MISCONCEPTION 2: "A system is always CP or always AP"
    Reality: the choice applies PER OPERATION. The same system can be CP for some operations and AP for others, and consistency can often be tuned per request.

  ✗ MISCONCEPTION 3: "CAP applies all the time"
    Reality: the trade-off only bites during partitions. When the network is healthy, a system can offer both C and A; PACELC captures this (E = Else, normal operation).

  ✗ MISCONCEPTION 4: "AP means no consistency at all"
    Reality: AP means eventual consistency. Data converges after the partition heals, and the system can still provide causal consistency and session guarantees.

  ✗ MISCONCEPTION 5: "Consistency in CAP = ACID consistency"
    Reality: they have different meanings. CAP consistency means linearizability (single-object); ACID consistency means preserving database invariants (multi-object).

Partition Detection and Handling

Detection methods:
  • Heartbeat timeouts (no response may mean a partition)
  • Consensus failures (cannot reach a quorum)
  • Increased latency (approaching the timeout)
  • Network health monitoring

CP system response:
  1. Detect which side has the majority
  2. The majority side continues operating
  3. The minority side:
    • Stops accepting writes
    • Optionally serves stale reads (with a warning)
    • Returns errors for operations requiring consistency
  4. Log everything for later reconciliation

AP system response:
  1. Continue operating on all sides
  2. Track all writes with vector clocks or versions
  3. When the partition heals:
    • Detect conflicts (divergent versions)
    • Apply a conflict resolution strategy
    • Merge or choose a winning value
  4. Notify the application of conflicts if needed

Interview Questions

Conceptual Questions

  1. Explain the CAP theorem. Why can't we have all three?
    • C (Consistency): All nodes see same data
    • A (Availability): Every request gets response
    • P (Partition tolerance): Works despite network splits
    • During partition: can't ensure consistency without sacrificing availability (or vice versa)
    • Must choose: respond with possibly stale data (AP) or reject request (CP)
  2. Why is partition tolerance not optional?
    • Networks fail in practice (switch failure, cable cuts, etc.)
    • "CA" means single machine or accepting downtime during partitions
    • Any distributed system must handle partitions
  3. What is PACELC? Why does it extend CAP?
    • PACELC: If Partition → A or C, Else → L or C
    • CAP only addresses partition scenarios
    • Even without partitions, trade-off exists between latency and consistency
    • Strong consistency requires synchronous replication (adds latency)
  4. How does a CP system behave during partition?
    • Majority side continues serving requests
    • Minority side rejects writes (or goes read-only)
    • Ensures no conflicting writes
    • Users on minority side see errors/timeouts

System Design Questions

  1. Design a payment system - CP or AP?
    • CP: Can't have conflicting payment states
    • Use consensus (Raft/Paxos) for payment processing
    • During partition: reject payments on minority side
    • Better to fail than double-charge
  2. Design a shopping cart - CP or AP?
    • AP: Availability more important
    • "Add to cart" should always work
    • Use CRDTs or merge at checkout
    • During partition: allow all adds, merge later
    • At checkout: verify inventory (CP operation)
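The "use CRDTs or merge at checkout" step can be illustrated with the simplest CRDT, a grow-only set (G-Set): merge is set union, so adds from both sides of a partition always survive. This is a deliberately minimal sketch; supporting removal would need a richer CRDT such as an OR-Set.

```python
def merge_carts(cart_a, cart_b):
    """G-Set merge: union keeps every 'add to cart' from both replicas.

    Union is commutative, associative, and idempotent, so replicas can
    merge in any order, any number of times, and still converge."""
    return cart_a | cart_b

# During a partition, the same user's adds land on different replicas:
side_1 = {"book", "mug"}        # adds that reached replicas A and B
side_2 = {"mug", "headphones"}  # adds that reached replica C

merged = merge_carts(side_1, side_2)
print(sorted(merged))   # ['book', 'headphones', 'mug']: no add is lost
```

The cost of this design is visible at checkout: the merged cart may contain items the user thought they removed, which is why inventory and payment are then verified as a strong (CP) operation.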

Summary

  1. CAP theorem: you cannot have C, A, and P simultaneously
    • Partitions are inevitable
    • The real choice is CP or AP during a partition
  2. CP: consistency over availability
    • For financial systems, coordination, unique constraints
    • The minority side becomes unavailable during a partition
  3. AP: availability over consistency
    • For content delivery, social media, shopping carts
    • May accept conflicting writes during a partition
  4. PACELC extends CAP
    • Even without partitions there is a latency vs. consistency trade-off
    • PA/EL, PC/EL, PC/EC classifications
  5. In practice: mix strategies
    • Different operations may need different consistency
    • Use tunable consistency when available
    • CP for critical operations, AP for read-heavy data
  6. Common misconception: "pick 2 of 3"
    • P is mandatory, not optional
    • The trade-off only applies during partitions

Next Module: Replication Strategies - Leader-follower, multi-leader, and leaderless replication patterns.
Tags: cap-theorem, pacelc, consistency, availability, tradeoffs