Consistency Models in Distributed Systems


🔥 The Problem

Your user updates their profile name from "Bob" to "Robert." They hit save, see the success message, refresh the page and see "Bob" again. They update again, refresh, and now it says "Robert." They tell their friend to check their profile. The friend sees "Bob." The user refreshes and sees "Robert." Their friend refreshes and finally sees "Robert." Nobody trusts your application anymore.
This isn't a bug in your code. Your code correctly wrote "Robert" to the primary database. The problem is that your user's reads hit different replicas, each at a different point in replication. Their friend's reads went to yet another replica. Everyone saw a technically "correct" value, just not the same value, and not in an order that made sense to humans.
When we first built a multi-region product, we distributed database replicas across three continents for low latency. Writes went to the nearest region and replicated asynchronously. Users in Europe could write, then call a user in Asia, who would look at the same data and see something different. Support tickets flooded in: "Your app is broken: my colleague and I are looking at the same page and seeing different numbers."
We learned that "eventually consistent" isn't just a technical term; it's a user experience decision. If users expect to see their writes immediately and see the same data as people they're collaborating with, eventual consistency will frustrate them. But if you want low latency across regions and high availability during network partitions, eventual consistency might be your only option.
Consistency models define the contract between a distributed system and its users about what they can expect to see and when. Choosing the wrong model means either unusable latency or confused users. Understanding these models means you can make the right tradeoff for each use case.

💡 Inspiration

The question of consistency has been central to distributed systems research since the 1970s. Early work focused on databases: if multiple transactions run concurrently, what guarantees do we provide about the outcome?
Leslie Lamport formalized sequential consistency in 1979, asking: can we make a distributed system behave like a single machine with operations happening in some order? This was already weaker than the "real-time" ordering we might naively expect, but even sequential consistency proved expensive to implement.
As systems scaled and went global, researchers and practitioners realized that strong consistency fundamentally conflicts with availability and low latency. Eric Brewer's CAP theorem (1999, proven in 2002) formalized this: you can't have consistency, availability, and partition tolerance simultaneously.
This led to a proliferation of weaker consistency models. Werner Vogels, CTO of Amazon, wrote his influential 2009 post "Eventually Consistent" explaining why Amazon's systems use weak consistency and how to make it work for users. The Dynamo paper and the Cassandra architecture showed how to build systems with "tunable consistency," letting developers choose the right tradeoff per operation.
The insight is that consistency isn't binary. There's a spectrum from linearizability (strongest, most expensive) through sequential consistency, causal consistency, read-your-writes, monotonic reads, down to eventual consistency (weakest, cheapest). Different operations in the same application might need different consistency levels. Your bank balance needs strong consistency; your news feed doesn't.

🛠️ The Solution (Overview)

Here's the consistency spectrum from strongest to weakest:
Linearizability (also called "strong consistency" or "atomic consistency") is the gold standard. Every operation appears to happen instantaneously at some point between when it was invoked and when it returned. All clients see the same ordering of operations, and that ordering respects real-time: if operation A completes before operation B starts, A appears before B. This is what you'd get from a single machine, and it's incredibly expensive in a distributed system because it requires coordination on every operation.
Sequential Consistency relaxes the real-time requirement. All clients still see the same total ordering of operations, and that ordering respects each client's program order (their operations appear in the order they issued them). But the global ordering doesn't have to match real-time. If client A's write completes before client B's read starts (in real time), B might still see the old value. This is cheaper than linearizability because it doesn't require real-time coordination.
Causal Consistency relaxes further by only ordering causally related operations. If operation A could have influenced operation B (A wrote data that B read, or they're from the same client), then everyone sees A before B. But concurrent operations (ones with no causal relationship) might appear in different orders to different clients. This matches human intuition well: replies appear after the posts they reply to, but independent posts can appear in any order.
Eventual Consistency provides the weakest guarantee: if no new writes occur, eventually all replicas will converge to the same value. During convergence, different clients might see different values, or the same client might see values "go backward" on different reads. This is cheap and highly available, but it pushes complexity to application developers who must handle inconsistent reads.
Session Guarantees sit between causal and eventual consistency, providing guarantees within a single client session: read-your-writes (you see your own updates), monotonic reads (you don't see values go backward), monotonic writes (your writes are applied in order), and writes-follow-reads (your writes are ordered after data you've read).

🔍 Detailed Explanation

Linearizability: The Gold Standard

Linearizability, defined by Herlihy and Wing in 1990, is the consistency model that makes a distributed system behave like a single machine. Every operation appears to take effect atomically at some instant between its invocation and response, called the linearization point.
Here's what this means concretely. Consider two clients, Alice and Bob:
Time ──────────────────────────────────────────────────────────────►
Alice:  ├───── write(x=1) ─────┤
                   │
                   ▼ (linearization point somewhere in here)
Bob:                               ├───── read(x) ─────┤ → must return 1
Alice writes x=1. Bob's read starts after Alice's write completes. In a linearizable system, Bob must see the value 1 (or a later value). He cannot see the old value of x. (Had Bob's read overlapped Alice's write, either the old or the new value would be acceptable, depending on where the two linearization points fall.)
More precisely, linearizability requires:
  1. Real-time ordering: If operation A completes before operation B invokes, then A appears before B in the linearization order.
  2. Legal sequential history: There exists some total ordering of all operations such that each read returns the value of the most recent write in that ordering.
Here's a non-linearizable execution:
Time ──────────────────────────────────────────────────────────────►
Alice:  ├──────────────────── write(x=1) ────────────────────┤
Bob:        ├─── read(x) ───┤ → returns 1
Carol:                                  ├─── read(x) ───┤ → returns 0 ✗
Carol's read starts after Bob's read completes, and Bob saw 1. In a linearizable system, once any client has observed a write, every read that starts afterward must see that write (or a later one). Carol reading 0 violates linearizability.
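To make the two rules concrete, here is a minimal brute-force checker for a single register, written as a sketch rather than a production tool. The `Op` record and the numeric start/end times are assumptions made for illustration; the checker simply tries every total order and keeps the ones that respect real-time ordering and register semantics.

```python
from itertools import permutations
from typing import NamedTuple

class Op(NamedTuple):
    client: str
    kind: str      # "write" or "read"
    value: int     # value written, or value the read returned
    start: float   # invocation time
    end: float     # response time

def is_linearizable(history: list[Op], initial: int = 0) -> bool:
    """Brute force: look for one total order that satisfies both rules."""
    for order in permutations(history):
        # Rule 1: real-time ordering. An operation that completed before
        # another one was invoked may not be placed after it.
        respects_real_time = all(
            not (later.end < earlier.start)
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if not respects_real_time:
            continue
        # Rule 2: every read returns the most recent write in this order.
        current, legal = initial, True
        for op in order:
            if op.kind == "write":
                current = op.value
            elif op.value != current:
                legal = False
                break
        if legal:
            return True
    return False

# Bob reads after Alice's write completed and sees 1.
after_write = [Op("alice", "write", 1, 0, 2), Op("bob", "read", 1, 3, 5)]
# Carol reads 0 after Bob's read of 1 has already completed.
carol_stale = [
    Op("alice", "write", 1, 0, 10),
    Op("bob", "read", 1, 1, 3),
    Op("carol", "read", 0, 4, 6),
]
print(is_linearizable(after_write))  # True
print(is_linearizable(carol_stale))  # False
```

Trying every permutation is factorial in the history length, so this only works for toy histories; real checkers prune the search aggressively.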
Why is linearizability expensive? To guarantee that "once a write is visible, it's visible to everyone," you need coordination. The typical approaches:
  1. Single leader: All operations go through one node. This node is the linearization point. Problem: that node becomes a bottleneck and single point of failure.
  2. Consensus protocol (Paxos, Raft): Every write requires a majority of replicas to acknowledge before returning. This guarantees that any subsequent read will hit at least one node that knows about the write (since read and write quorums overlap). Problem: cross-region consensus is slow, roughly 100-300ms per operation.
  3. Synchronized clocks (Spanner's TrueTime): With bounded clock uncertainty, you can assign timestamps that respect real-time ordering and then wait out the uncertainty before committing. Problem: requires specialized hardware (GPS, atomic clocks).
In practice, linearizability costs you 10-100x in latency compared to eventual consistency for geographically distributed systems. Use it only where correctness requires it.
When to use linearizability:
  • Distributed locks and leader election: The entire point is that only one entity holds the lock at a time. Without linearizability, two nodes might both think they're the leader.
  • Unique constraints: Checking if a username is taken and then creating it must be atomic. Without linearizability, two users might both successfully register the same username.
  • Financial transactions: Moving money between accounts cannot have intermediate states visible. Double-spending must be impossible.
  • Sequence number generation: If two requests both need the "next" order ID, they can't get the same one.
When not to use linearizability:
  • Read-heavy workloads with tolerance for staleness: User profiles, product catalogs, analytics dashboards.
  • High-availability requirements: During a network partition, a linearizable system must refuse requests to maintain consistency. An eventually consistent system can keep serving.
  • Multi-region deployments: Cross-continent consensus latency is unacceptable for user-facing operations.

Sequential Consistency

Sequential consistency, defined by Lamport in 1979, is slightly weaker than linearizability. It requires that all operations appear to execute in some sequential order that is consistent with the program order of each individual process, but this order doesn't have to match real-time.
The difference is subtle but important:
Time ──────────────────────────────────────────────────────────────►
Alice:  ├─ write(x=1) ─┤                                  (Alice's write completes)
Bob:                     ├─ write(x=2) ─┤                 (Bob's write starts after Alice's completes)
Carol:                                    ├─ read(x) ─┤   → returns 1
In a linearizable system, Carol must read 2 (or later), because Bob's write started after Alice's write completed and Carol's read started after Bob's write completed.
In a sequentially consistent system, Carol reading 1 is valid. The sequential order could be: write(x=2), write(x=1), read(x) → 1. This order respects each client's program order (Alice only did one operation, Bob only did one operation), but it doesn't respect the real-time ordering between Alice's and Bob's operations.
This seems like a minor relaxation, but it makes a huge difference in implementation cost. Sequential consistency can be achieved by having each client stick to one replica (so they see a consistent view) without requiring coordination between replicas about the exact real-time ordering of operations from different clients.
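A small sketch can show the difference from linearizability: the checker below receives no timestamps at all, only each client's operations in program order, and searches for any interleaving that behaves like a single register. The `History` shape and function name are assumptions made for this example.

```python
# Each client's operations in program order:
# ("write", value_written) or ("read", value_returned).
History = dict[str, list[tuple[str, int]]]

def is_sequentially_consistent(history: History, initial: int = 0) -> bool:
    clients = list(history)

    def search(positions: tuple[int, ...], current: int) -> bool:
        # positions[i] = how many of client i's operations are already placed
        if all(p == len(history[c]) for p, c in zip(positions, clients)):
            return True
        for i, c in enumerate(clients):
            p = positions[i]
            if p == len(history[c]):
                continue
            kind, value = history[c][p]
            if kind == "read" and value != current:
                continue  # this read cannot be the next operation
            advanced = positions[:i] + (p + 1,) + positions[i + 1:]
            if search(advanced, value if kind == "write" else current):
                return True
        return False

    return search(tuple(0 for _ in clients), initial)

# The example above: valid, because write(x=2) can be ordered first.
example = {
    "alice": [("write", 1)],
    "bob": [("write", 2)],
    "carol": [("read", 1)],
}
# Two readers that disagree about the order of the two writes: invalid.
disagreement = {
    "alice": [("write", 1)],
    "bob": [("write", 2)],
    "carol": [("read", 1), ("read", 2)],
    "dave": [("read", 2), ("read", 1)],
}
print(is_sequentially_consistent(example))       # True
print(is_sequentially_consistent(disagreement))  # False
```

The second history is the kind of thing sequential consistency still forbids: everyone must agree on one total order, even if that order ignores wall-clock time.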
ZooKeeper provides sequential consistency for reads. Writes go through the leader and are linearizable. But reads can go to any replica, which might be behind the leader. A client will never see its own writes go backward (ZooKeeper ensures this), but two clients reading from different replicas might see different views.
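Why does sequential consistency matter? It's the reference model for multi-threaded programs. Languages such as Java and C++ promise sequentially consistent behavior for data-race-free code (and for sequentially consistent atomics), while most CPUs provide something weaker by default. When you write lock-free concurrent code, you're reasoning about sequential consistency, not linearizability. Understanding that distinction helps you reason about what barriers and fences you need.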

Causal Consistency

Causal consistency sits between sequential consistency and eventual consistency. It only requires ordering between causally related operations: operations that could have influenced each other.
Two operations are causally related if:
  1. Same client: A client's later operations depend on their earlier operations
  2. Read-write dependency: A read that returns a value depends on the write that wrote that value
  3. Transitivity: If A causally precedes B and B causally precedes C, then A causally precedes C
Operations with no causal relationship are concurrent and may appear in different orders to different observers.
Here's an example:
Alice posts: "Anyone want coffee?"       (event A)
      │
      ▼ (Bob reads Alice's post)
Bob replies: "I do!"                     (event B, causally follows A)
Charlie posts: "Nice weather today"      (event C, concurrent with A and B)
Causal consistency requires that everyone sees Alice's post before Bob's reply (since Bob read Alice's post before replying). But Charlie's post is concurrent with both: some observers might see it before Alice's post, others after Bob's reply.
Valid orderings a client might see:
  • Alice, Bob, Charlie ✓
  • Alice, Charlie, Bob ✓
  • Charlie, Alice, Bob ✓
  • Bob, Alice, Charlie ✗ (violates Alice → Bob)
  • Charlie, Bob, Alice ✗ (violates Alice → Bob)
Implementing causal consistency typically involves tracking causal dependencies. Each operation carries metadata about what operations it depends on:
  1. Vector clocks: Each operation carries a vector timestamp. A server only delivers an operation to a client when all its causal dependencies have been delivered.
  2. Explicit dependency tracking: Each operation lists the IDs of operations it depends on. The server waits for dependencies before applying.
  3. Lamport timestamps with session tracking: Simpler but only works within sessions.
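As a rough illustration of the vector-clock approach, the sketch below compares two clocks and applies the usual causal-delivery test. The function names and the dict-based clock representation are assumptions for this example, not any particular library's API.

```python
VectorClock = dict[str, int]  # node ID -> number of events seen from that node

def compare(a: VectorClock, b: VectorClock) -> str:
    """Return "before", "after", "equal", or "concurrent" for a relative to b."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

def can_deliver(msg: VectorClock, sender: str, local: VectorClock) -> bool:
    """Causal delivery: msg is the sender's next event, and everything
    else it depends on has already been delivered locally."""
    if msg.get(sender, 0) != local.get(sender, 0) + 1:
        return False
    return all(
        count <= local.get(node, 0)
        for node, count in msg.items()
        if node != sender
    )

post = {"alice": 1}               # Alice: "Anyone want coffee?"
reply = {"alice": 1, "bob": 1}    # Bob replies after reading Alice's post
weather = {"charlie": 1}          # Charlie's unrelated post

print(compare(post, reply))             # before
print(compare(post, weather))           # concurrent
print(can_deliver(reply, "bob", {}))    # False: Alice's post not seen yet
print(can_deliver(reply, "bob", post))  # True
```

A replica that receives Bob's reply first would hold it back until Alice's post arrives, which is exactly the "replies after posts" behavior described above.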
MongoDB's "causal sessions" provide causal consistency within a session. You start a causal session, and all operations within that session see each other's effects in causal order. This is implemented by tracking a cluster time and ensuring reads go to replicas that have caught up to the cluster time of the last write in the session.
Causal consistency is a sweet spot for many applications. It's much cheaper than linearizability (no cross-region coordination required), but it matches human intuition about ordering (you see replies after posts, you see your own writes). It's the strongest consistency model achievable in an always-available system during network partitions.

Session Guarantees

Most applications don't need full causal consistency across all clients. They need guarantees within a single user session. The four session guarantees, defined by Terry et al. in 1994, provide this:
Read Your Writes (RYW): If a session writes a value, subsequent reads in that session will see that write (or a later one). This prevents the frustrating experience of updating your profile and then seeing the old value.
Implementation: Track the timestamp (or version) of the session's last write. Route subsequent reads to replicas that have caught up to at least that timestamp.
```go
// Pseudocode for read-your-writes. `db` and `findReplicaWithTimestamp`
// stand in for your storage client and replica-selection logic.
type Session struct {
	lastWriteTimestamp int64
}

func (s *Session) Write(key, value string) {
	ts := db.Write(key, value)
	s.lastWriteTimestamp = ts
}

func (s *Session) Read(key string) string {
	// Find a replica caught up to our last write
	replica := findReplicaWithTimestamp(s.lastWriteTimestamp)
	return replica.Read(key)
}
```
Monotonic Reads (MR): If a session reads a value v, subsequent reads will return v or a later value, never an earlier one. This prevents the disorienting experience of seeing a counter go from 10 to 8.
Implementation: Track the timestamp of the session's last read. Subsequent reads go to replicas at or ahead of that timestamp.
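A minimal sketch of that idea, assuming each replica exposes the highest write timestamp it has applied (the `Replica` and `MonotonicSession` classes here are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    applied_ts: int = 0                       # highest write timestamp applied
    data: dict = field(default_factory=dict)

    def apply(self, key, value, ts):
        self.data[key] = value
        self.applied_ts = max(self.applied_ts, ts)

class MonotonicSession:
    def __init__(self, replicas):
        self.replicas = replicas
        self.last_read_ts = 0  # position of the newest replica state we have read

    def read(self, key):
        for replica in self.replicas:
            # Skip replicas that are behind what this session already saw.
            if replica.applied_ts >= self.last_read_ts:
                self.last_read_ts = replica.applied_ts
                return replica.data.get(key)
        raise RuntimeError("no replica has caught up to this session's last read")

a = Replica("a")
a.apply("counter", 8, ts=1)
a.apply("counter", 10, ts=2)
b = Replica("b")
b.apply("counter", 8, ts=1)   # b is lagging behind a

session = MonotonicSession([a, b])
print(session.read("counter"))  # 10, served by a
session.replicas = [b, a]       # pretend the load balancer now prefers b
print(session.read("counter"))  # still 10: b is skipped because it is behind
```

Raising when no replica qualifies is one possible policy; waiting for a replica to catch up or falling back to the primary are equally reasonable choices.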
Monotonic Writes (MW): A session's writes are applied in the order they were issued, at all replicas. This ensures that if you write x=1 then x=2, no replica will see x=2 before x=1.
Implementation: Include a session identifier and sequence number with each write. Replicas apply writes in sequence-number order per session.
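Here is a sketch of the replica side of that scheme, under the assumption that sequence numbers start at 0 per session and that out-of-order arrivals are buffered until the gap is filled. The class and method names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.next_seq = {}  # session_id -> next sequence number to apply
        self.pending = {}   # session_id -> {seq: (key, value)} waiting for a gap
        self.applied = []   # log of applied writes, kept only for inspection

    def receive(self, session_id, seq, key, value):
        expected = self.next_seq.get(session_id, 0)
        if seq < expected:
            return  # duplicate of a write that was already applied
        buffered = self.pending.setdefault(session_id, {})
        buffered[seq] = (key, value)
        # Apply as many consecutive writes as are now available.
        while expected in buffered:
            k, v = buffered.pop(expected)
            self.data[k] = v
            self.applied.append((session_id, expected, k, v))
            expected += 1
        self.next_seq[session_id] = expected

replica = Replica()
replica.receive("s1", 1, "x", 2)  # x=2 arrives first and is held back
print(replica.data)               # {}
replica.receive("s1", 0, "x", 1)  # x=1 arrives; both are applied in order
print(replica.data)               # {'x': 2}
```

Because the replica never applies write 1 before write 0, no reader of this replica can observe x=2 and later see it replaced by the older x=1.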
Writes Follow Reads (WFR): If a session reads a value and then writes, the write is ordered after the read. This ensures that if you read a post and then write a reply, everyone sees the post before the reply.
Implementation: Track the dependencies of what the session has read, and include them with subsequent writes.
Together, these four guarantees constitute "causal consistency within a session." They're cheap to implement (no cross-session coordination) and solve most of the UX problems that plague eventually consistent systems.

Eventual Consistency

Eventual consistency is the weakest useful consistency model. It guarantees only that if no new writes occur, eventually all replicas will converge to the same value. That's it. No guarantee about when convergence happens, what intermediate values you'll read, or what order operations appear in.
In practice, "eventually" is usually short: replication lag in well-tuned systems is typically under 100ms. But during network issues, it can be minutes or longer.
What can go wrong with eventual consistency:
  1. Stale reads: You read an old value because your replica hasn't received recent updates yet.
  2. Non-monotonic reads: You read value 10 from replica A, then read value 8 from replica B (which is behind). It looks like the value went backward.
  3. Lost updates: Two clients both read x=0, both increment to 1, both write 1. The expected result was 2, but you got 1.
  4. Divergence during partitions: During a network partition, replicas accept writes independently and diverge. When the partition heals, you have conflicting values.
Conflict resolution in eventually consistent systems:
Since concurrent writes can occur, you need a strategy for resolving conflicts:
  1. Last-Write-Wins (LWW): Each write carries a timestamp. The write with the highest timestamp wins. Simple, but can lose data (the "losing" write is silently discarded). Also dangerous if clocks are skewed.
  2. Multi-Value (siblings): Keep all concurrent values and let the application resolve. Amazon's shopping cart does this: if there's a conflict, it merges both carts. The user might see some duplicate items, which is better than losing items.
  3. CRDTs (Conflict-free Replicated Data Types): Design data structures that can always be merged without conflicts. A grow-only set can merge by union. A counter can have each node track its own increments and merge by summing. CRDTs guarantee eventual convergence without coordination.
  4. Application-specific resolution: For documents, you might do operational transformation or three-way merge. For inventory, you might take the minimum (conservative). For social features, you might show both versions and let the user choose.
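To see why the choice of strategy matters, the sketch below resolves the same pair of concurrent shopping carts two ways. The `Versioned` record, the tie-break on node ID, and the union merge are assumptions chosen for the example.

```python
from typing import NamedTuple

class Versioned(NamedTuple):
    value: frozenset
    timestamp: float  # wall-clock time reported by the writer
    node: str         # writer's node ID, used only to break timestamp ties

def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-Write-Wins: keep the version with the highest timestamp."""
    return max(a, b, key=lambda v: (v.timestamp, v.node))

def sibling_merge(a: Versioned, b: Versioned) -> frozenset:
    """Application-level merge for carts: keep every item from both siblings."""
    return a.value | b.value

# Two replicas accepted concurrent updates to the same cart.
eu_cart = Versioned(frozenset({"book", "pen"}), timestamp=100.0, node="eu")
us_cart = Versioned(frozenset({"book", "lamp"}), timestamp=100.5, node="us")

print(sorted(lww_merge(eu_cart, us_cart).value))  # ['book', 'lamp']
print(sorted(sibling_merge(eu_cart, us_cart)))    # ['book', 'lamp', 'pen']
```

Under LWW the pen silently disappears because the EU write carried the lower timestamp; the union merge keeps it, at the cost of possibly resurrecting items the user meant to remove.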
Strong Eventual Consistency (SEC) is eventual consistency with a convergence guarantee: replicas that have received the same set of updates (regardless of order) are guaranteed to be in the same state. This requires using CRDTs or similar structures whose merge operation is commutative, associative, and idempotent.
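A grow-only counter (G-Counter) is probably the smallest CRDT that shows these properties. The sketch below is a simplified state-based version; the class layout is an assumption for illustration.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id, counts=None):
        self.node_id = node_id
        self.counts = dict(counts or {})

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent.
        nodes = self.counts.keys() | other.counts.keys()
        merged = {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        return GCounter(self.node_id, merged)

a = GCounter("a")
b = GCounter("b")
a.increment(2)   # two increments accepted by node a
b.increment(3)   # three increments accepted by node b during a partition

print(a.merge(b).value())           # 5
print(b.merge(a).value())           # 5 (order of merging does not matter)
print(a.merge(b).merge(b).value())  # 5 (merging the same state twice is harmless)
```

Compare this with the lost-update scenario earlier: because each node only ever raises its own slot, concurrent increments are never overwritten.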

Quorum Systems

Quorum systems let you tune consistency on a per-operation basis. The key parameters:
  • N: Total number of replicas
  • W: Write quorum, the number of replicas that must acknowledge a write before it returns success
  • R: Read quorum, the number of replicas that must respond to a read
The fundamental rule: W + R > N ensures strong consistency. A read quorum and a write quorum always overlap, so a read will always see the latest write.
Example with N=5:
Replicas:   [1]   [2]   [3]   [4]   [5]
             ▲     ▲     ▲
             └─────┴─────┘ Write quorum (W=3)
                   ▲     ▲     ▲
                   └─────┴─────┘ Read quorum (R=3)
Overlap: Replicas 2 and 3
At least one replica in the read quorum has the latest write.
Common configurations:
| N | W | R | Trade-off |
|---|---|---|-----------|
| 3 | 2 | 2 | Balanced. Tolerates 1 failure. Standard choice. |
| 3 | 1 | 3 | Fast writes, slow reads. Good for write-heavy workloads. |
| 3 | 3 | 1 | Slow writes, fast reads. Good for read-heavy workloads. |
| 5 | 3 | 3 | High availability. Tolerates 2 failures. |
| 3 | 1 | 1 | Eventual consistency. No overlap guarantee. Fastest but weakest. |
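The overlap rule is easy to check by brute force for small clusters. The sketch below enumerates every possible write quorum and read quorum for the configurations in the table; the function names are made up for this example.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every possible write quorum shares a replica with every read quorum."""
    replicas = range(n)
    return all(
        set(write_q) & set(read_q)
        for write_q in combinations(replicas, w)
        for read_q in combinations(replicas, r)
    )

def tolerated_failures(n: int, w: int, r: int) -> int:
    """Replicas that can be down while both reads and writes still reach a quorum."""
    return min(n - w, n - r)

configs = [(3, 2, 2), (3, 1, 3), (3, 3, 1), (5, 3, 3), (3, 1, 1)]
for n, w, r in configs:
    overlap = quorums_always_overlap(n, w, r)
    print(f"N={n} W={w} R={r}: overlap={overlap}, "
          f"W+R>N={w + r > n}, tolerates={tolerated_failures(n, w, r)}")
```

For these configurations the brute-force result agrees with W + R > N; only the last row (W=1, R=1) admits a read quorum that misses the latest write.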
Sloppy quorums relax the requirement that the W and R replicas be from the "home" replicas for a key. During a partition, writes can go to any available nodes (hinted handoff), which are then forwarded to the home replicas when they become available. This improves availability but weakens consistency: you might write to a quorum that doesn't overlap with subsequent reads.

Practical Implementation: Read-Your-Writes

Let's walk through implementing read-your-writes consistency, the most common session guarantee:
Approach 1: Read from primary after write
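```python
# `primary`, `any_replica`, and the session helpers are placeholders for
# your own storage clients and session store.
def update_profile(user_id, new_name, session):
    key = f"user:{user_id}"
    primary.write(key, {"name": new_name})
    # Remember the write so later reads in this session go to the primary
    session.record_write(key)

def get_profile(user_id, session):
    key = f"user:{user_id}"
    if session.recently_wrote_to(key):
        # Read from primary to see our write
        return primary.read(key)
    # Can read from any replica
    return any_replica.read(key)
```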
Problem: Doesn't scale reads if there are many writes.
Approach 2: Track write timestamp, read from caught-up replica
```python
class Session:
    def __init__(self):
        self.last_write_ts = 0

    def write(self, key, value):
        ts = db.write(key, value)
        self.last_write_ts = max(self.last_write_ts, ts)

    def read(self, key):
        # Find replica that has caught up to our last write
        for replica in replicas:
            if replica.current_position >= self.last_write_ts:
                return replica.read(key)
        # Fallback to primary if no replica is caught up
        return primary.read(key)
```
This is what MongoDB does with its "afterClusterTime" read concern.
Approach 3: Client-side caching
```javascript
class SessionCache {
  constructor() {
    this.pendingWrites = new Map(); // key -> {value, timestamp}
  }

  write(key, value) {
    const ts = Date.now();
    this.pendingWrites.set(key, { value, timestamp: ts });
    // Send to server asynchronously
    server.write(key, value).then(() => {
      // Could clear cache entry after confirmation
    });
    return value;
  }

  // Async because the server call is; compares client and server clocks,
  // so it assumes they are roughly in sync.
  async read(key) {
    const cached = this.pendingWrites.get(key);
    const serverValue = await server.read(key);
    if (cached && cached.timestamp > serverValue.timestamp) {
      // Our write hasn't replicated yet, return cached
      return cached.value;
    }
    return serverValue.value;
  }
}
```
This provides optimistic read-your-writes without waiting for replication.

πŸ—οΈ Architecture

Here's how different consistency models fit into a typical architecture:
CONSISTENCY ARCHITECTURE
┌─ APPLICATION LAYER ─────────────────────────────────────────────────
│
│  OPERATION ROUTER
│  Decides consistency level per operation:
│
│    CreateUser()      ──► LINEARIZABLE (unique username)
│    GetUserProfile()  ──► EVENTUAL (cache OK)
│    TransferMoney()   ──► LINEARIZABLE (financial)
│    GetNewsFeed()     ──► EVENTUAL (staleness OK)
│    PostComment()     ──► CAUSAL (after post it replies to)
│    GetDashboard()    ──► READ-YOUR-WRITES (see own changes)
│
└───────┬────────────────────────────────────┬────────────────────────
        │                                    │
        ▼                                    ▼
  STRONG CONSISTENCY PATH              EVENTUAL CONSISTENCY PATH
  COORDINATION SERVICE                 LOCAL REPLICAS
    etcd / ZooKeeper                     US-East  EU-West  APAC
    Raft consensus                         [R]      [R]    [R]
    Latency: 10-300ms                       └────────┴──────┘
        │                                Async replication
        ▼                                Latency: <10ms local
  STRONGLY CONSISTENT DB                 Convergence: 50-500ms
    Spanner, CockroachDB
    Single-region RDS                  CACHING LAYER
                                         Redis / Memcached
                                         TTL-based invalidation
                                         Read-through / Write-behind
                                         Latency: <1ms
                                         Staleness: up to TTL
┌─ SESSION MANAGEMENT ────────────────────────────────────────────────
│
│  Tracks per-session:
│    • Last write timestamp (for read-your-writes)
│    • Last read timestamp (for monotonic reads)
│    • Causal dependencies (for causal consistency)
│
│  Session State:
│    {
│      "session_id": "abc123",
│      "last_write_ts": 1705315800000,
│      "last_read_ts": 1705315799000,
│      "causal_clock": {"node_a": 5, "node_b": 3}
│    }
│
└─────────────────────────────────────────────────────────────────────
┌─ CONFLICT RESOLUTION ───────────────────────────────────────────────
│
│  For eventual consistency, when conflicts detected:
│
│  Strategy 1: Last-Write-Wins
│    Compare timestamps, keep higher. Simple but loses data.
│
│  Strategy 2: Multi-Value / Siblings
│    Keep both values. Application/user resolves.
│    Shopping cart: union. Document: show diff. Counter: max.
│
│  Strategy 3: CRDTs
│    Data structures designed for automatic merge.
│    G-Counter, PN-Counter, OR-Set, LWW-Register, etc.
│    No conflicts by construction.
│
└─────────────────────────────────────────────────────────────────────
Key architectural decisions:
  1. Per-operation consistency selection: Different operations need different consistency. Don't use one global setting.
  2. Separate storage systems: Use strongly consistent storage (etcd, Spanner) for operations that need it, eventually consistent storage (Cassandra, DynamoDB) for others.
  3. Session state: Track session information to provide session guarantees without global coordination.
  4. Conflict resolution strategy: Decide upfront how you'll handle conflicts in eventually consistent data.
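Per-operation selection can start as a plain lookup table. The sketch below mirrors the operation names from the router in the diagram; the enum, the policy dict, the "default to the strongest level" rule, and the backend labels are all assumptions for illustration.

```python
from enum import Enum

class Consistency(Enum):
    LINEARIZABLE = "linearizable"
    CAUSAL = "causal"
    READ_YOUR_WRITES = "read-your-writes"
    EVENTUAL = "eventual"

# One explicit decision per operation instead of a single global setting.
POLICY = {
    "CreateUser": Consistency.LINEARIZABLE,        # unique username
    "GetUserProfile": Consistency.EVENTUAL,        # cache OK
    "TransferMoney": Consistency.LINEARIZABLE,     # financial
    "GetNewsFeed": Consistency.EVENTUAL,           # staleness OK
    "PostComment": Consistency.CAUSAL,             # after the post it replies to
    "GetDashboard": Consistency.READ_YOUR_WRITES,  # see own changes
}

def consistency_for(operation: str) -> Consistency:
    # Assumption: unknown operations get the safest (strongest) level.
    return POLICY.get(operation, Consistency.LINEARIZABLE)

def backend_for(level: Consistency) -> str:
    # Assumption: only linearizable operations pay for the coordinated path;
    # causal and session-level guarantees are layered on replicas via session state.
    return "strong-path" if level is Consistency.LINEARIZABLE else "replica-path"

for op in ("TransferMoney", "GetNewsFeed", "PostComment", "DeleteAccount"):
    level = consistency_for(op)
    print(f"{op}: {level.value} via {backend_for(level)}")
```

Keeping the policy in one table makes the consistency decisions reviewable in a single place, which is most of the value of routing per operation.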

⚖️ Tradeoffs

| Consistency Model | Latency | Availability During Partition | Complexity | Use Case |
|---|---|---|---|---|
| Linearizability | High (10-300ms cross-region) | Low (must reject operations) | Low (simple semantics) | Locks, unique constraints, financial transactions |
| Sequential | Medium | Medium | Low | CPU memory model, ZooKeeper reads |
| Causal | Low-Medium | High (available during partitions) | Medium (track dependencies) | Social features, collaborative apps |
| Session Guarantees (RYW, MR) | Low | High | Low | User sessions, most CRUD apps |
| Eventual | Lowest (<1ms local) | Highest | High (handle conflicts) | Caching, analytics, high-scale reads |
Choosing the right model:
  1. Start with session guarantees (read-your-writes). This covers most use cases and is easy to implement.
  2. Use linearizability only where correctness requires it: unique constraints, distributed locks, financial operations.
  3. Consider causal consistency for social/collaborative features: it matches user expectations without the cost of linearizability.
  4. Use eventual consistency for read-heavy, staleness-tolerant workloads: news feeds, analytics, caching.
  5. Mix models in the same application: Different operations have different requirements.

πŸ“ Summary

Consistency models define the contract between a distributed system and its clients about what they'll observe. The spectrum runs from linearizability (strongest, most expensive) through sequential consistency, causal consistency, session guarantees, down to eventual consistency (weakest, cheapest).
Linearizability makes a distributed system behave like a single machine. Every operation appears to happen atomically, and the ordering respects real time. It's essential for locks, unique constraints, and financial operations, but it requires coordination that adds significant latency.
Sequential consistency relaxes the real-time requirement. Operations still appear in a total order consistent with program order, but that order doesn't have to match wall-clock time. It's what ZooKeeper provides for reads.
Causal consistency orders only causally related operations, that is, those where one could have influenced the other. Concurrent operations may appear in different orders to different observers. It's the strongest consistency achievable in an always-available system.
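One common way to tell causally related operations from concurrent ones is to compare vector clocks. The sketch below assumes each clock is a plain dictionary from a node name to a counter; the function name and the string results are illustrative choices, not any particular system's API.

```python
from typing import Dict

VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the relationship between two vector clocks.

    Missing entries count as 0. Returns "equal", "before" (a happened
    before b), "after" (b happened before a), or "concurrent".
    """
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Each clock is ahead on some node, so neither could have seen the other.
    return "concurrent"


post = {"alice": 1}
reply = {"alice": 1, "bob": 1}   # bob replied after seeing alice's post
other = {"carol": 1}             # carol wrote without seeing either

print(compare(post, reply))   # before
print(compare(reply, other))  # concurrent
```

Only the "before" and "after" pairs have to be delivered in order; "concurrent" pairs are the ones different observers may legitimately see in different orders.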
Session guarantees (read-your-writes, monotonic reads, monotonic writes, writes-follow-reads) provide causal consistency within a single session. They're cheap and solve most UX problems.
Eventual consistency guarantees only that replicas will eventually converge. During convergence, clients may see inconsistent data. It's the cheapest model but pushes complexity to applications.
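Quorum systems let you tune consistency per operation. W + R > N guarantees that read and write quorums overlap, so a read sees the latest completed write; on its own this is still weaker than linearizability. Smaller values give eventual consistency with better latency.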
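The W + R > N condition is a pigeonhole argument, and it can be checked by brute force for small clusters. The sketch below is illustrative only: the function names are made up, and replicas are modeled as integers rather than real nodes.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The arithmetic rule: write and read quorums must share a replica."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute force: does every write set of size w meet every read set of size r?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# The rule and the brute-force check agree for every small configuration.
for n in range(1, 6):
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            assert quorums_overlap(n, w, r) == always_intersect(n, w, r)

print(quorums_overlap(3, 2, 2))  # True: any 2 of 3 replicas share one with any other 2
print(quorums_overlap(3, 1, 1))  # False: a read can miss the single written replica
```

Overlap only guarantees that a read quorum contains at least one replica holding the latest completed write. The reader still needs versions or timestamps to pick that value out of the responses.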
The key insight is that consistency isn't one-size-fits-all. Different operations in the same application need different consistency levels. Use linearizability where correctness demands it, eventual consistency where latency matters, and session guarantees for most user-facing operations.

❓ Questions to Think About

  1. A developer argues that eventual consistency is "good enough" because replication lag is usually only 50ms. Why might this reasoning be flawed?
    This question probes understanding of worst-case vs. average-case behavior. Replication lag is usually 50ms, but during network issues, node failures, or high load, it can spike to seconds or minutes. Eventual consistency guarantees are about what can happen, not what usually happens. If your application logic assumes writes are visible within 100ms, you'll have subtle bugs that appear rarely but catastrophically.
  2. You're building a collaborative document editor. Users expect to see each other's changes in real time. Do you need linearizability?
    This explores whether linearizability is necessary for collaboration. Surprisingly, no: causal consistency is sufficient and often preferred. Users expect to see updates in causal order (your edit after my edit that you responded to), but they don't need a global real-time ordering of all edits. Systems like Google Docs use operational transformation with causal ordering, not linearizability. Linearizability would add unacceptable latency for real-time collaboration.
  3. Your e-commerce site uses eventual consistency for the product catalog. A customer adds an item to their cart, but then the product page shows "out of stock." What happened and how could you prevent it?
    This tests practical understanding of eventual consistency anomalies. The cart service and product service are reading from different replicas. The inventory update hasn't reached the replica serving the product page. Solutions: (1) read-your-writes within a session, (2) check inventory at checkout time with strong consistency, (3) use the same replica for related reads, (4) accept the inconsistency and handle it at checkout (show "sorry, this item became unavailable").
  4. Why is quorum consistency (W+R>N) not the same as linearizability?
    This probes the subtlety between quorum overlap and linearizability. Quorum overlap ensures a read will see the latest completed write. But during a write that hasn't completed, different read quorums might include different subsets of the write quorum, seeing different values. Also, sloppy quorums (writing to non-home nodes during partitions) break the overlap guarantee entirely. True linearizability requires either waiting for all replicas or using consensus to establish a global order.
  5. MongoDB offers "majority" read concern. Is this linearizable?
    This tests whether "majority" equals linearizability. It doesn't. "Majority" read concern means you read data that has been acknowledged by a majority of replicas, so it won't be rolled back. But between when a write completes on the primary and when a majority acknowledges it, other clients reading with majority concern might not see the write yet. It's not linearizable. MongoDB does offer "linearizable" read concern separately, which adds a round-trip to ensure you're reading the latest majority-committed data.
  6. You're implementing a distributed lock service. A client acquires a lock, does some work, then releases it. Another client acquires the same lock. Do you need linearizability?
    This highlights a subtle correctness requirement. Yes, you need linearizability for the lock operations themselves. If the lock isn't linearizable, two clients might both successfully "acquire" the same lock if their requests go to different replicas. The "fencing token" pattern helps: each lock acquisition gets a monotonically increasing token, and the protected resource rejects operations with old tokens. But the token assignment itself needs to be linearizable (or at least use a consensus protocol).
  7. Causal consistency tracks "happens-before" relationships. What's the practical challenge in implementing this at scale?
    This explores implementation complexity. Tracking causal dependencies requires metadata: either vector clocks (O(n) space, where n is the number of nodes/clients) or explicit dependency lists. As the system scales, this metadata grows. Delivery must be delayed until all causal dependencies are satisfied, which requires buffering and reordering. In practice, systems like MongoDB limit causal tracking to a single session (much smaller vector). Cross-session causal consistency at scale is expensive.
  8. Your system uses last-write-wins (LWW) for conflict resolution. Two users edit a document simultaneously. What happens to the "losing" edit?
    This highlights the data loss risk in LWW. The losing edit is silently discarded. The user who made it doesn't get an error; their write "succeeded," but their data is gone. This is why LWW is dangerous for user data. Alternatives: multi-value (keep both, let user merge), operational transformation (merge automatically), CRDTs (design data structures where merge is lossless). LWW is acceptable only when losing data is acceptable (e.g., caching, metrics).
  9. A team decides to use Spanner for all their data because they want strong consistency everywhere. What problems might they encounter?
    This tests understanding of consistency costs. Spanner provides linearizability, but at a cost: higher latency (commit-wait of 7-10ms), higher operational cost (TrueTime requires GPS and atomic clocks), and reduced availability during partitions. For read-heavy workloads with staleness tolerance (user profiles, product catalogs), they're paying this cost unnecessarily. The solution is polyglot persistence: use Spanner for operations requiring strong consistency, use cheaper eventually consistent storage for others.
  10. You're debugging an issue where a user's session sometimes sees stale data after their own writes. Your system implements read-your-writes consistency. What could be wrong?
    This is a practical debugging question. Possible causes: (1) Session ID isn't being preserved across requests (load balancer, new browser tab). (2) The last-write-timestamp isn't being tracked correctly. (3) The replica selection logic has a bug: it's not actually checking the replica's position. (4) The write timestamp comes from the client, but there's clock skew with the server. (5) There's a caching layer between the app and database that doesn't respect session consistency. (6) The "caught-up" check is using >= when it should use > (an off-by-one that matters, for example, when several writes can share the same timestamp). Debugging requires tracing the full request path.
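For the last question, it helps to have a reference picture of what a read-your-writes check looks like. The sketch below is a minimal illustration under stated assumptions: `Replica`, `Session`, `record_write`, and `pick_replica` are hypothetical names, and positions are assumed to be server-assigned log positions that are unique per write.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    applied_position: int  # highest log position this replica has applied


@dataclass
class Session:
    last_write_position: int = 0  # position of the session's latest write


def record_write(session: Session, commit_position: int) -> None:
    """Remember the newest write; max() keeps the value from moving backwards."""
    session.last_write_position = max(session.last_write_position, commit_position)


def pick_replica(session: Session, replicas: List[Replica], primary: Replica) -> Replica:
    """Return the first replica that has applied the session's last write.

    Positions are unique per write here, so >= is sufficient. With coarse
    timestamps that several writes can share, equality would not prove
    that the session's write has been applied.
    """
    for replica in replicas:
        if replica.applied_position >= session.last_write_position:
            return replica
    return primary  # no replica has caught up: fall back to the primary


replicas = [Replica("r1", 5), Replica("r2", 12)]
primary = Replica("primary", 20)
session = Session()

record_write(session, 10)
print(pick_replica(session, replicas, primary).name)  # r2
record_write(session, 15)
print(pick_replica(session, replicas, primary).name)  # primary
```

Each cause in the answer above maps onto a piece of this sketch: a lost session loses `last_write_position`, a client-supplied timestamp corrupts it, and a cache in front of `pick_replica` bypasses the check entirely.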

Next Module: CAP Theorem and Beyond. Understanding the fundamental impossibility results that constrain distributed system design, and what PACELC adds to the picture.
Tags: consistency, eventual-consistency, linearizability, causal-consistency, distributed-databases, quorum