Part 34: CRDTs - Conflict-Free Replicated Data Types

"What if we could design data structures so clever that conflicts simply couldn't occur? CRDTs represent a fundamental shift in thinking about distributed data: instead of coordinating writes, we make coordination unnecessary."

The Coordination Problem

Throughout this course, we've encountered the fundamental tension in distributed systems: consistency requires coordination, but coordination is expensive. Strong consistency demands that nodes agree on the order of operations, which requires communication, which means latency, which means unavailability during partitions. We've explored various points on this spectrum—from linearizability to eventual consistency—accepting different tradeoffs.

But what if there were a way out? What if we could design data structures that achieve consistency without coordination? Data structures where every node can accept writes locally, replicate them asynchronously, and still converge to the same state—without conflicts, without coordination, without complex resolution logic?

This is the promise of Conflict-free Replicated Data Types, or CRDTs. They achieve the seemingly impossible: strong eventual consistency—guaranteed convergence to the same state—without requiring any synchronous coordination.

The Key Insight

The key insight behind CRDTs is mathematical: certain operations are commutative, meaning their order doesn't matter. If you're adding numbers, 3 + 5 + 2 gives the same result as 2 + 3 + 5 or 5 + 2 + 3. If all your operations have this property, then nodes can apply operations in any order and still arrive at the same final state.

But it's not just commutativity. CRDTs leverage a broader mathematical structure: semilattices. A semilattice has a merge operation that is commutative (order doesn't matter), associative (grouping doesn't matter), and idempotent (applying the same thing twice has no additional effect). When you merge states rather than applying operations, duplicate deliveries and out-of-order arrivals don't cause problems.

This mathematical foundation ensures that no matter how messages are delayed, duplicated, or reordered by the network, all nodes will eventually converge to the same state. The network can be unreliable, and the system still works correctly.

Two Flavors: State-Based and Operation-Based

CRDTs come in two flavors that achieve the same goal through different mechanisms.

State-based CRDTs, also called convergent replicated data types (CvRDTs), replicate by sending their entire state. When a node receives the state of a remote replica, it merges that state with its own using a join operation. The join must form a semilattice—it must be commutative, associative, and idempotent. No matter how many times you merge the same states, in any order, you get the same result.

The advantage of state-based CRDTs is simplicity in the replication layer. You just periodically send your state and merge incoming states. The disadvantage is bandwidth: sending the entire state can be expensive for large data structures.

Operation-based CRDTs, also called commutative replicated data types (CmRDTs), replicate by sending operations rather than states. When a node performs an operation locally, it broadcasts that operation to other nodes. The operations must be commutative so that applying them in any order yields the same result. Additionally, the delivery layer must guarantee that each operation is delivered exactly once to each node.

The advantage of operation-based CRDTs is efficiency: operations are typically much smaller than full states. The disadvantage is the requirement for exactly-once delivery, which shifts complexity to the messaging layer.

In practice, many systems use hybrid approaches, combining aspects of both flavors.

The G-Counter: Counting Without Coordination

The simplest CRDT to understand is the G-Counter, a grow-only counter. Imagine you want to count events across multiple servers—page views, for instance. Each server sees some of the views and needs to increment a shared counter.

The naive approach—each server holding an integer that it increments and broadcasts—doesn't work. If Server A has value 5 and Server B has value 7, what's the merged value? You might think 12, but what if some increments were double-counted due to message redelivery?

The G-Counter solves this elegantly. Instead of a single integer, each node maintains a map from node identifiers to integers. Each node only increments its own entry. To get the total count, you sum all entries. To merge two counters, you take the maximum for each node.

Suppose we have three nodes: A, B, and C. Node A has seen 5 events, Node B has seen 7, and Node C has seen 3. Each node's counter looks like:

Node A: {A: 5, B: 0, C: 0} Node B: {A: 0, B: 7, C: 0} Node C: {A: 0, B: 0, C: 3}

After replication and merging, all nodes have: {A: 5, B: 7, C: 3}

And the total count is 15. No matter what order the merges happen, no matter if some messages are delayed or duplicated, all nodes converge to the same state with the correct total.

The PN-Counter: Counting Up and Down

The G-Counter can only grow. What if you need a counter that can also decrement—for tracking inventory, for instance, where items are both added and removed?

The PN-Counter builds on the G-Counter by maintaining two G-Counters: one for increments (P for positive) and one for decrements (N for negative). The value of the counter is P - N. Incrementing adds to P; decrementing adds to N.

This might seem wasteful—why track increments and decrements separately instead of just maintaining a single value? The reason is convergence. A single value that goes up and down can't be merged correctly without knowing the sequence of operations. But the two G-Counters can always be merged by taking element-wise maximums, and the difference gives the correct count.

This pattern—decomposing a complex operation into multiple monotonically growing components—appears frequently in CRDT design. Monotonic growth makes merge operations straightforward.

LWW-Register: Last Writer Wins

What about a simple mutable value—a register that can be set to any value? The challenge is concurrent writes. If Node A sets the value to "apple" and Node B simultaneously sets it to "banana", what should the merged value be?

The Last-Writer-Wins Register (LWW-Register) resolves this by associating each value with a timestamp. When merging, the value with the higher timestamp wins. Concurrency is resolved by time: the later write is considered the "correct" one.

This is simple and effective, but it has limitations. The notion of "later" depends on clocks, which might not be perfectly synchronized across nodes. A write that logically should win might lose because of clock skew. Additionally, concurrent writes mean one value is silently discarded—there's no indication that a conflict occurred.

For some use cases, these limitations are acceptable. User preferences, session data, and caches often work well with LWW semantics. For others, more sophisticated approaches are needed.

MV-Register: Preserving Concurrent Values

The Multi-Value Register (MV-Register) takes a different approach to concurrent writes: instead of discarding all but one value, it keeps all concurrently written values. When a conflict occurs, the register returns a set of values, leaving conflict resolution to the application.

This might seem like punting on the problem, but it's often exactly what's needed. Consider a collaborative text editor where two users simultaneously change the document title. Rather than silently picking one title, the system can present both options to a user for manual resolution.

MV-Registers use vector clocks to track causality. Each value is tagged with a vector clock indicating when it was written. When merging, values are compared. If one value's vector clock dominates another (it's clearly later), the dominated value is discarded. If neither dominates (they're concurrent), both values are kept.

When a new value is written, it supersedes all current values. The new value's vector clock is set to be greater than all current values' clocks, ensuring that subsequent merges will discard the old values in favor of the new one.

Sets: Add and Remove

Sets present an interesting challenge. Consider a set that allows both adding and removing elements. If Node A adds element X, Node B removes element X (not having seen the add yet), and then the operations are replicated, what should the set contain?

Different CRDT set types give different answers, reflecting different semantic choices.

The G-Set (grow-only set) sidesteps the problem by only allowing additions. Elements can be added but never removed. Merging is simple set union. This works for use cases like tagging or membership where removal isn't needed.

The 2P-Set (two-phase set) allows removal by maintaining two G-Sets: one for added elements and one for removed elements. An element is in the set if it's in the add set but not in the remove set. Once an element is removed, it can never be re-added.

The LWW-Element-Set uses timestamps, similar to LWW-Register. Each element has an add timestamp and a remove timestamp. The element is present if the add timestamp is greater than the remove timestamp. This allows re-adding elements but depends on clock synchronization.

The OR-Set (observed-remove set) is more sophisticated. Each add operation generates a unique tag for the element. Remove operations remove specific tags. An element is present if it has any tags. This allows an element to be added, removed, and re-added without conflict, as long as the re-add wasn't concurrent with the remove.

The semantic differences matter. Choosing the right set CRDT requires understanding your use case: Can elements be removed? Can they be re-added? How should concurrent add and remove be resolved?

Maps and More Complex Structures

CRDTs can be composed to build more complex data structures. A CRDT map is essentially a mapping from keys to CRDT values. Each key can hold any CRDT—a register, a counter, a set, or another map.

Consider a user profile stored across multiple regions. The profile is a map: username might be an LWW-Register, friend IDs might be an OR-Set, and login count might be a G-Counter. Each field uses the appropriate CRDT for its semantics, and the map merges each field using that field's merge function.

Nested CRDTs allow representing complex data models. A document could be a map of paragraphs, each paragraph a sequence of characters. Each level uses CRDT semantics, enabling conflict-free replication of the entire structure.

Text Editing with CRDTs

Collaborative text editing is a flagship use case for CRDTs. Multiple users edit the same document simultaneously, each seeing their changes immediately, with all changes eventually converging to the same document.

The challenge is representing text as a CRDT. A naive approach—treating the document as a sequence of characters—doesn't work because insertion positions are ambiguous under concurrent editing. If User A inserts at position 5 and User B simultaneously inserts at position 10, the positions become invalid once both insertions are merged.

CRDT approaches to text editing assign unique, ordered identifiers to each character. Instead of positions, insertions specify "insert after character with ID X." These identifiers can be compared to determine order, and they remain stable under concurrent editing.

Algorithms like LSEQ, RGA, and YATA provide different schemes for generating and ordering these identifiers. They make various tradeoffs between identifier size, insertion performance, and worst-case behavior under adversarial editing patterns.

Modern collaborative applications like Google Docs, Figma, and various open-source tools use these algorithms to provide real-time collaboration with eventual consistency.

Practical Considerations

While CRDTs are theoretically elegant, practical deployment involves considerations beyond the mathematics.

Garbage collection is often necessary. Many CRDTs grow monotonically: the 2P-Set's remove set, the vector clocks in MV-Registers, and the tombstones tracking deleted elements. Over time, this metadata can grow large. Systems need strategies to prune old metadata, typically by coordinating (somewhat ironically) to establish points past which all replicas have seen all updates.

Metadata overhead can be significant. CRDTs often trade space for coordination freedom. A simple counter might require per-node counters; a set might require per-element unique identifiers and timestamps. For small data volumes, this overhead is negligible. At scale, it requires careful consideration.

Not everything fits the CRDT model. CRDTs work well when operations can be reordered without changing the semantic result. But some applications have inherently sequential semantics—financial ledgers where order determines legality, auctions where timing matters, or systems where exactly-once matters not just eventually-once.

Integration with existing systems requires thought. CRDTs define merge semantics, but you still need replication infrastructure. You need to decide when and how replicas exchange state or operations. You need to handle schema evolution and version compatibility.

The Philosophical Shift

CRDTs represent a philosophical shift in how we think about distributed data. Instead of enforcing a global order on operations—which requires coordination—we design data structures that are order-insensitive. Instead of preventing conflicts—which requires locking—we make conflicts impossible by construction.

This shift has implications beyond the specific data structures. It suggests a design principle: when possible, choose data models and operations that are inherently coordination-free. If you must count, use a CRDT counter. If you must collect elements, use a CRDT set. If you must track key-value pairs, use a CRDT map.

Of course, not all problems fit this model. But for those that do, CRDTs offer a powerful combination: strong eventual consistency with no coordination overhead, partition tolerance with no availability sacrifice, and local latency with global distribution.

As distributed systems become more prevalent and global distribution becomes more common, CRDTs and their underlying principles will only grow in importance. Understanding them is understanding a fundamental tool in the distributed systems designer's toolkit.

"CRDTs don't eliminate the challenges of distributed systems; they dissolve one particular challenge—write conflicts—by designing data structures where conflicts cannot occur. It's not magic; it's mathematics applied with elegance and purpose."