Redis Deep Dive: Architecture, Performance & Production Scaling
Introduction
Redis is often described as "just a cache," but that undersells what is actually one of the most elegantly engineered pieces of systems software. Understanding Redis at the architectural level transforms how you use it—from a black-box cache to a predictable, tunable component in your distributed system.
This guide goes beyond the documentation. We'll examine why Redis achieves sub-millisecond latencies, how its data structures are implemented, and what actually happens during persistence and failover.
Chapter 1: What Redis Actually Is
The Core Abstraction
Redis is an in-memory data structure server. Not just a key-value store—a server that exposes rich data structures (strings, hashes, lists, sets, sorted sets, streams, hyperloglogs) with atomic operations.
REDIS SERVER
─────────────────────────────────────────────────────────────────
  ┌───────────┐   ┌─────────────┐   ┌───────────┐
  │  Strings  │   │   Hashes    │   │   Lists   │
  │ "user:1"  │   │  "session"  │   │  "queue"  │
  └───────────┘   └─────────────┘   └───────────┘

  ┌───────────┐   ┌─────────────┐   ┌───────────┐
  │   Sets    │   │ Sorted Sets │   │  Streams  │
  │ "tags:1"  │   │"leaderboard"│   │ "events"  │
  └───────────┘   └─────────────┘   └───────────┘

  ALL DATA LIVES IN RAM
─────────────────────────────────────────────────────────────────
Common Production Use Cases
1. CACHING
   └── Database query results, API responses, computed values
   └── TTL-based expiration
   └── Cache-aside, write-through, write-behind patterns

2. SESSION STORAGE
   └── User sessions with automatic expiration
   └── Distributed session management across app servers

3. RATE LIMITING
   └── Token bucket, sliding window implementations
   └── Atomic INCR with EXPIRE

4. DISTRIBUTED LOCKING
   └── SETNX-based locks
   └── Redlock algorithm for locking across multiple instances

5. MESSAGE QUEUES
   └── LPUSH/BRPOP for simple queues
   └── Streams for persistent, consumer-group based messaging

6. REAL-TIME LEADERBOARDS
   └── Sorted sets with ZADD/ZRANGE
   └── O(log N) insertions, O(log N + M) range queries

7. PUB/SUB
   └── Real-time notifications
   └── Event broadcasting
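The INCR + EXPIRE pattern from item 3 is small enough to sketch. The snippet below is a toy, not real client code: `FakeRedis` is an in-memory stand-in for just the two commands the pattern needs, and `allow_request` is a hypothetical helper showing how a fixed-window limiter would issue them:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis commands the pattern needs."""
    def __init__(self):
        self.store = {}   # key -> (value, expires_at or None)

    def incr(self, key):
        val, exp = self.store.get(key, (0, None))
        if exp is not None and time.monotonic() >= exp:
            val, exp = 0, None          # key expired: start a fresh window
        val += 1
        self.store[key] = (val, exp)
        return val

    def expire(self, key, seconds):
        if key in self.store:
            val, _ = self.store[key]
            self.store[key] = (val, time.monotonic() + seconds)

def allow_request(r, user_id, limit=5, window=60):
    """Fixed-window rate limit: at most `limit` requests per `window` seconds."""
    key = f"ratelimit:{user_id}"
    count = r.incr(key)          # atomic in real Redis: no race between clients
    if count == 1:
        r.expire(key, window)    # first hit starts the window
    return count <= limit

r = FakeRedis()
results = [allow_request(r, "alice", limit=3) for _ in range(5)]
print(results)  # first 3 allowed, rest denied
```

Note that against real Redis this two-command version has a known gap: if the client crashes between INCR and EXPIRE, the key is left with no TTL. `SET key 1 NX EX window` followed by INCR, or a small Lua script, closes that hole.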
Chapter 2: Why Redis Is Fast
Redis consistently delivers sub-millisecond latencies. Let's understand exactly why.
2.1 In-Memory Storage
The most obvious reason: RAM is fast.
Storage Latency Comparison:
─────────────────────────────────────────────────────────────────
L1 Cache:    ~1 ns
L2 Cache:    ~4 ns
L3 Cache:    ~12 ns
RAM:         ~100 ns          ← Redis operates here
NVMe SSD:    ~20,000 ns       (200x slower)
SATA SSD:    ~100,000 ns      (1,000x slower)
HDD:         ~10,000,000 ns   (100,000x slower)
─────────────────────────────────────────────────────────────────
Redis keeps everything in RAM. No disk seeks, no filesystem overhead, no page faults during normal operations.
2.2 Single-Threaded Event Loop
This is the counterintuitive genius of Redis: one thread handles all commands.
Traditional Multi-threaded Server:

  Request 1 ──► Thread 1 ──► Lock ──► Data ──► Unlock ──► Response
  Request 2 ──► Thread 2 ──► Lock (wait) ──────────────────────►
  Request 3 ──► Thread 3 ──► Lock (wait) ──────────────────────►

  Problems:
  - Lock contention
  - Context switching overhead
  - Cache line invalidation
  - Complex synchronization

Redis Single-threaded Model:

  Request 1 ─┐
  Request 2 ─┼──► Event Queue ──► Single Thread ──► Responses
  Request 3 ─┘         │               │
                (epoll waits)   (processes one
                                command at a time)

  Benefits:
  - No locks needed (single writer)
  - No context switching
  - CPU cache stays hot
  - Predictable latency
Why single-threaded works:
- Memory operations are so fast that one thread can handle 100K+ ops/sec
- Lock overhead would cost more than the actual work
- CPU cache efficiency is maximized

(One caveat: since Redis 6, optional I/O threads can parallelize socket reads and writes, but command execution itself remains single-threaded.)
2.3 Event-Driven I/O (epoll/kqueue)
Redis uses non-blocking I/O multiplexing:
Event Loop Architecture:
─────────────────────────────────────────────────────────────────
                 ┌────────────────────────────┐
                 │         Event Loop         │
                 │   (ae_epoll / ae_kqueue)   │
                 └─────────────┬──────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
  │ Client Socket 1 │ │ Client Socket 2 │ │ Client Socket N │
  │   (readable)    │ │   (writable)    │ │   (readable)    │
  └─────────────────┘ └─────────────────┘ └─────────────────┘

Loop iteration:
1. epoll_wait() - block until sockets ready
2. For each ready socket:
   - If readable: read command, execute, queue response
   - If writable: flush pending responses
3. Process time events (background tasks)
4. Repeat

No thread-per-connection. One thread handles thousands of connections.
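The same readiness-driven loop can be demonstrated with Python's `selectors` module, which wraps epoll/kqueue. This is an illustrative toy, not Redis code: a socketpair stands in for a client connection, and one loop services both ends:

```python
import selectors, socket

# One selector (epoll on Linux, kqueue on BSD/macOS) watches both ends
# of a socket pair; a single loop services whichever side is ready.
sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()
for s in (server_side, client_side):
    s.setblocking(False)

responses = []

def serve(sock):
    data = sock.recv(1024)            # "command" arrives
    sock.send(b"+OK " + data)         # execute + queue response inline

def receive(sock):
    responses.append(sock.recv(1024))

sel.register(server_side, selectors.EVENT_READ, serve)
sel.register(client_side, selectors.EVENT_READ, receive)

client_side.send(b"PING")             # client writes a request

while not responses:                  # run the loop until a response lands
    for key, _ in sel.select(timeout=1):
        key.data(key.fileobj)         # dispatch to the registered handler

print(responses)  # [b'+OK PING']
```

The structural point is the same as Redis's ae loop: no thread per connection, just one thread dispatching whichever sockets the kernel reports as ready.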
2.4 Optimized Memory Allocator
Redis uses jemalloc by default on Linux (libc malloc or tcmalloc can be selected at build time):
Why jemalloc matters:
─────────────────────────────────────────────────────────────────
Standard malloc:
- Memory fragmentation over time
- Lock contention when multi-threaded (not Redis's issue)
- Unpredictable allocation latency

jemalloc optimizations:
- Thread-local caches (fast allocation path)
- Size-class based allocation (reduces fragmentation)
- Efficient memory reuse
- Predictable latency

Redis memory efficiency:
- Small strings (≤ 44 bytes): stored inline with the object header
- Integer strings: encoded as actual integers (8 bytes)
- Shared objects: common values like small integers shared
- Ziplist encoding: compact representation for small collections
2.5 Complete Request Flow
Let's trace a GET user:123 command from request to response:

Client Request to Response:
─────────────────────────────────────────────────────────────────
1. CLIENT SENDS COMMAND
   TCP socket → kernel buffer → epoll signals "readable"
   Wire format (RESP): *2\r\n$3\r\nGET\r\n$8\r\nuser:123\r\n
         │
         ▼
2. EVENT LOOP WAKES
   epoll_wait() returns
   Redis reads from socket into client query buffer
   Parser identifies complete command
         │
         ▼
3. COMMAND DISPATCH
   Lookup command in command table: "GET" → getCommand()
   Validate argument count
   Check ACL permissions (Redis 6+)
         │
         ▼
4. KEY LOOKUP
   Hash "user:123" → dictionary bucket
   Traverse bucket chain (usually 1-2 entries)
   Find dictEntry with matching key
   Return pointer to value object
   Time: O(1) average, few pointer dereferences
         │
         ▼
5. RESPONSE GENERATION
   Format value as RESP: $5\r\nhello\r\n
   Append to client output buffer
   Mark socket as "writable" for epoll
         │
         ▼
6. RESPONSE SENT
   Next epoll iteration: socket writable
   Write output buffer to socket
   Kernel handles TCP delivery

Total time: ~100-200 microseconds (including network)
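The RESP framing in steps 1 and 5 is simple enough to implement directly. A minimal sketch, covering only the reply types shown above (the function names are mine, not from any client library):

```python
CRLF = b"\r\n"

def encode_command(*args):
    """Encode a command as a RESP array of bulk strings (what clients send)."""
    parts = [b"*%d" % len(args) + CRLF]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d" % len(data) + CRLF + data + CRLF)
    return b"".join(parts)

def decode_reply(buf):
    """Decode the simple reply types; returns (value, bytes_consumed)."""
    head, _, rest = buf.partition(CRLF)
    kind, body = head[:1], head[1:]
    consumed = len(head) + 2
    if kind in (b"+", b"-"):                       # simple string / error
        return body.decode(), consumed
    if kind == b":":                               # integer
        return int(body), consumed
    if kind == b"$":                               # bulk string
        n = int(body)
        if n == -1:
            return None, consumed                  # null bulk ($-1)
        return rest[:n], consumed + n + 2
    raise ValueError("unsupported reply type: %r" % kind)

wire = encode_command("GET", "user:123")
print(wire)                                  # b'*2\r\n$3\r\nGET\r\n$8\r\nuser:123\r\n'
print(decode_reply(b"$5\r\nhello\r\n")[0])   # b'hello'
```

The protocol's length-prefixed design is part of why parsing is cheap: the server never scans for delimiters inside payloads, it reads exactly the byte counts it was told.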
Chapter 3: Data Structure Internals
Redis's power comes from its data structures. Let's examine how they're actually implemented.
3.1 Strings
The simplest but most optimized structure:
String Encodings:
─────────────────────────────────────────────────────────────────
1. INT ENCODING (numbers that fit in a long)
   Value: "12345"
   ┌───────────────────────────────────────┐
   │ redisObject                           │
   │   type: STRING                        │
   │   encoding: INT                       │
   │   ptr: 12345 (actual value, not ptr!) │ ← Only 32 bytes total
   └───────────────────────────────────────┘

2. EMBSTR ENCODING (strings ≤ 44 bytes)
   Value: "hello"
   ┌───────────────────────────────────────┐
   │ redisObject + SDS header + data       │
   │ [type|enc|ptr][len|free][h|e|l|l|o]   │ ← Single allocation
   └───────────────────────────────────────┘

3. RAW ENCODING (strings > 44 bytes)
   Value: "very long string..."
   ┌───────────────┐     ┌────────────────────┐
   │ redisObject   │────►│ SDS (separate      │
   │ type: STRING  │     │ allocation)        │
   │ encoding: RAW │     │ [len][free][data]  │
   └───────────────┘     └────────────────────┘

SDS (Simple Dynamic String):

  struct sdshdr {
      int len;     // Current string length
      int free;    // Free space at end
      char buf[];  // Actual string data
  };

  Benefits:
  - O(1) length retrieval (no strlen)
  - Binary safe (can contain \0)
  - Prevents buffer overflow
  - Reduces reallocations via preallocation
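The preallocation benefit is easy to model. A toy SDS in Python, with field names following the struct above; the doubling-up-to-1MB growth policy mirrors the classic sdscatlen behavior, which is an assumption of this sketch rather than a spec:

```python
class SDS:
    """Toy model of Redis's sds: tracks len and free space explicitly,
    preallocating on growth (double, capped at +1MB beyond need)."""
    SDS_MAX_PREALLOC = 1024 * 1024

    def __init__(self, data=b""):
        self.buf = bytearray(data)
        self.len = len(data)
        self.free = 0

    def cat(self, data):
        needed = self.len + len(data)
        if needed > self.len + self.free:          # must grow the buffer
            new_size = needed * 2 if needed < self.SDS_MAX_PREALLOC \
                       else needed + self.SDS_MAX_PREALLOC
            self.buf.extend(b"\0" * (new_size - len(self.buf)))
            self.free = new_size - needed
        else:
            self.free -= len(data)                 # reuse preallocated space
        self.buf[self.len:self.len + len(data)] = data
        self.len = needed                          # O(1) length, no strlen
        return self

    def value(self):
        return bytes(self.buf[:self.len])

s = SDS(b"hello")
s.cat(b", world")
print(s.value(), s.len, s.free)   # b'hello, world' 12 12
```

After one append the string carries 12 spare bytes, so the next small append touches no allocator at all; that amortization is exactly what "reduces reallocations via preallocation" means.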
3.2 Hashes
Two internal representations:
ZIPLIST (small hashes):
─────────────────────────────────────────────────────────────────
When: ≤ 128 fields AND all values ≤ 64 bytes (configurable via
hash-max-ziplist-entries / hash-max-ziplist-value)

Structure:
┌───────┬──────┬────────┬────────┬────────┬────────┬─────┐
│zlbytes│zltail│ entry1 │ entry2 │ entry3 │ entry4 │zlend│
│       │      │(field) │(value) │(field) │(value) │     │
└───────┴──────┴────────┴────────┴────────┴────────┴─────┘

- Contiguous memory block
- Linear scan for lookup (but fast for small N)
- Cache-friendly
- Memory efficient (no pointers)

HASHTABLE (large hashes):
─────────────────────────────────────────────────────────────────
When: Exceeds ziplist thresholds

Structure:
  dict {
      dictEntry **table[0]   // Main hash table
      dictEntry **table[1]   // Rehashing table (when active)
      long size              // Table size (power of 2)
      long used              // Number of entries
      int rehashidx          // Rehashing progress (-1 if idle)
  }

  Hash function: SipHash (keyed, prevents hash-flooding attacks)
  Collision resolution: Chaining
  Load factor trigger: > 1.0 (or > 5.0 during BGSAVE)

Incremental Rehashing:
- When table needs to grow/shrink
- Don't rehash all at once (would block)
- Rehash one bucket per operation
- Both tables active during transition
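Incremental rehashing can be sketched in a few dozen lines. This toy dict, my simplification with exactly one bucket migrated per operation and chaining via Python lists, shows both tables staying live during migration:

```python
class IncrementalDict:
    """Sketch of Redis's two-table incremental rehash."""
    def __init__(self, size=4):
        self.tables = [[[] for _ in range(size)], None]
        self.used = 0
        self.rehashidx = -1              # -1 means no rehash in progress

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step(self):
        """Move one bucket from table[0] to table[1], then advance."""
        if self.rehashidx < 0:
            return
        src, dst = self.tables
        for k, v in src[self.rehashidx]:
            self._bucket(dst, k).append((k, v))
        src[self.rehashidx] = []
        self.rehashidx += 1
        if self.rehashidx == len(src):   # migration complete: swap tables
            self.tables = [dst, None]
            self.rehashidx = -1

    def set(self, key, value):
        self._step()
        if self.rehashidx < 0 and self.used + 1 > len(self.tables[0]):
            # load factor > 1.0: start rehashing into a table twice the size
            self.tables[1] = [[] for _ in range(2 * len(self.tables[0]))]
            self.rehashidx = 0
        table = self.tables[1] if self.rehashidx >= 0 else self.tables[0]
        bucket = self._bucket(table, key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.used += 1

    def get(self, key):
        self._step()
        for table in self.tables:        # check BOTH tables during rehash
            if table is None:
                continue
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None

d = IncrementalDict()
for i in range(10):
    d.set(f"key:{i}", i)
print([d.get(f"key:{i}") for i in range(10)])
```

The key idea survives the simplification: no single operation ever pays for a full rehash, so latency stays flat even while the table doubles.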
3.3 Lists
QUICKLIST (Redis 3.2+):
─────────────────────────────────────────────────────────────────
Previous implementations:
- Linked list: Fast push/pop, high memory overhead
- Ziplist: Memory efficient, slow middle insertions

Quicklist: Best of both worlds

  quicklist:
  ┌──────────┐    ┌──────────┐    ┌──────────┐
  │ ziplist  │◄──►│ ziplist  │◄──►│ ziplist  │
  │ (≤8KB)   │    │ (≤8KB)   │    │ (≤8KB)   │
  │[a][b][c] │    │[d][e][f] │    │[g][h][i] │
  └──────────┘    └──────────┘    └──────────┘

  Doubly-linked list of ziplists

  Operations:
  LPUSH/RPUSH: O(1) - prepend/append to head/tail ziplist
  LPOP/RPOP:   O(1) - remove from head/tail
  LINDEX:      O(N) - linear scan
  LINSERT:     O(N) - find position, insert in ziplist
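A toy quicklist makes the shape concrete: a sequence of small nodes, each capped at a few entries (a stand-in for the ~8KB ziplist limit; node_cap is my parameter, not a Redis one):

```python
from collections import deque

class Quicklist:
    """Toy quicklist: a deque of small lists ("nodes"), each capped."""
    def __init__(self, node_cap=3):
        self.node_cap = node_cap
        self.nodes = deque()

    def rpush(self, value):
        if not self.nodes or len(self.nodes[-1]) >= self.node_cap:
            self.nodes.append([])          # tail node full: start a new one
        self.nodes[-1].append(value)

    def lpush(self, value):
        if not self.nodes or len(self.nodes[0]) >= self.node_cap:
            self.nodes.appendleft([])
        self.nodes[0].insert(0, value)

    def lpop(self):
        value = self.nodes[0].pop(0)
        if not self.nodes[0]:
            self.nodes.popleft()           # drop the emptied node
        return value

    def lindex(self, i):
        for node in self.nodes:            # O(N): walk node by node
            if i < len(node):
                return node[i]
            i -= len(node)
        return None

q = Quicklist(node_cap=3)
for ch in "abcdefgh":
    q.rpush(ch)
print([len(n) for n in q.nodes])   # nodes stay small: [3, 3, 2]
popped = q.lpop()
print(popped, q.lindex(4))         # a f
```

Pushes and pops only ever touch the head or tail node, which is why they stay O(1) while memory overhead per element remains low.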
3.4 Sets
Two encodings:
─────────────────────────────────────────────────────────────────
INTSET (small sets of integers):

  intset {
      uint32_t encoding;   // int16, int32, or int64
      uint32_t length;
      int8_t contents[];   // Sorted array of integers
  }

  - Binary search for lookup: O(log N)
  - Memory efficient
  - Auto-upgrades encoding when needed

HASHTABLE (general sets):

  Same dict structure as hashes
  Values are NULL (only keys matter)
  O(1) add/remove/contains
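The intset behavior, a sorted array with binary search and encoding upgrades, can be sketched with `bisect`. The encoding here is tracked as a byte width; a real intset also re-packs the whole array on upgrade, which this toy skips:

```python
import bisect

class IntSet:
    """Toy intset: sorted array + binary search, with the encoding
    auto-upgrade tracked as a width in bytes (2 -> 4 -> 8)."""
    def __init__(self):
        self.contents = []
        self.encoding = 2                  # start at int16

    def _width_for(self, value):
        if -2**15 <= value < 2**15:
            return 2
        if -2**31 <= value < 2**31:
            return 4
        return 8

    def add(self, value):
        self.encoding = max(self.encoding, self._width_for(value))
        i = bisect.bisect_left(self.contents, value)
        if i == len(self.contents) or self.contents[i] != value:
            self.contents.insert(i, value)     # keep the array sorted

    def __contains__(self, value):             # O(log N) binary search
        i = bisect.bisect_left(self.contents, value)
        return i < len(self.contents) and self.contents[i] == value

s = IntSet()
for v in (7, 3, 100000):
    s.add(v)
print(s.contents, s.encoding)   # [3, 7, 100000] 4  (upgraded to int32)
print(3 in s, 4 in s)           # True False
```

Adding any non-integer member to a real Redis set forces conversion from intset to the hashtable encoding; that conversion is one-way.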
3.5 Sorted Sets (Skip List + Hash Table)
The most sophisticated structure:
Sorted Set Internals:
─────────────────────────────────────────────────────────────────
Two structures maintained simultaneously:

1. SKIP LIST (for range queries):

   Level 3: HEAD ─────────────────────────────────► 90 ─► NIL
   Level 2: HEAD ─────────► 30 ─────────────────► 90 ─► NIL
   Level 1: HEAD ──► 10 ──► 30 ─────────► 70 ──► 90 ─► NIL
   Level 0: HEAD ──► 10 ──► 30 ──► 50 ──► 70 ──► 90 ─► NIL
                     ▲      ▲      ▲      ▲      ▲
                   score  score  score  score  score

   Probabilistic balancing (no rotations like trees)
   Average O(log N) for all operations

2. HASH TABLE (for O(1) score lookup by member):

   member → score mapping
   "player:alice" → 10
   "player:bob"   → 30
   ...

Why both?
- Skip list: Efficient range queries (ZRANGE, ZRANGEBYSCORE)
- Hash table: O(1) ZSCORE, membership check

Memory overhead is acceptable because:
- Skip list nodes point to the same SDS strings (not copied)
- Enables both use cases efficiently
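A minimal skiplist shows why range queries are cheap: descend from the top level to find the range start in roughly O(log N), then walk level 0. This sketch keeps only score ordering; real zskiplist nodes also carry spans for rank queries and a backward pointer, which are omitted here:

```python
import random

class Node:
    def __init__(self, score, member, level):
        self.score, self.member = score, member
        self.forward = [None] * level      # one forward pointer per level

class SkipList:
    """Minimal skiplist in the spirit of zskiplist: coin-flip levels."""
    MAX_LEVEL, P = 16, 0.25

    def __init__(self):
        self.head = Node(float("-inf"), None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        level = 1
        while random.random() < self.P and level < self.MAX_LEVEL:
            level += 1
        return level

    def insert(self, score, member):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):   # descend level by level
            while node.forward[i] and node.forward[i].score < score:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(score, member, level)
        for i in range(level):                    # splice into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def range_by_score(self, lo, hi):
        node = self.head
        for i in range(self.level - 1, -1, -1):   # O(log N) descent to lo
            while node.forward[i] and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        out = []
        while node and node.score <= hi:          # then walk level 0
            out.append((node.member, node.score))
            node = node.forward[0]
        return out

zsl = SkipList()
for member, score in [("alice", 10), ("bob", 30), ("carol", 50), ("dave", 70)]:
    zsl.insert(score, member)
print(zsl.range_by_score(20, 60))   # [('bob', 30), ('carol', 50)]
```

Note how the levels are probabilistic: no rebalancing rotations are ever needed, which keeps inserts simple and cheap compared to balanced trees.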
Chapter 4: Persistence in Redis
Redis offers two persistence mechanisms, each with distinct trade-offs.
4.1 RDB (Redis Database) Snapshots
RDB: Point-in-time snapshots
─────────────────────────────────────────────────────────────────
Trigger conditions:
- SAVE command (blocking)
- BGSAVE command (background fork)
- Automatic based on config:
    save 900 1      # After 900 sec if >= 1 key changed
    save 300 10     # After 300 sec if >= 10 keys changed
    save 60 10000   # After 60 sec if >= 10000 keys changed

BGSAVE Process:

  1. Redis forks a child process

     Parent Process              Child Process
     (continues serving)         (writes snapshot)
     ┌─────────────┐             ┌─────────────┐
     │   Memory    │◄───────────►│   Memory    │
     │ (shared via │             │  (copy-on-  │
     │    COW)     │             │   write)    │
     └──────┬──────┘             └──────┬──────┘
            │                           │
            ▼                           ▼
     Handles requests            Writes dump.rdb

  2. Child iterates all keys
  3. Serializes to RDB format
  4. Writes to temp file
  5. Renames temp → dump.rdb (atomic)
  6. Child exits

RDB File Format:
  [REDIS][RDB-VERSION][METADATA][DB-SELECTOR][KEY-VALUE-PAIRS]
  [EOF][CRC64-CHECKSUM]

Pros:
+ Compact binary format
+ Fast restarts (bulk load)
+ Good for backups

Cons:
- Data loss between snapshots
- Fork can be slow for large datasets
- Memory spike during fork (COW pages)
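The fork trick is easy to demonstrate with `os.fork` (Unix only; JSON stands in for the RDB format here). The parent mutates immediately after forking, yet the child serializes the fork-time state, exactly the point-in-time consistency BGSAVE relies on:

```python
import json, os, tempfile

# The parent keeps mutating while the child serializes its fork-time copy
# of the dataset, mirroring how BGSAVE gets a consistent snapshot.
data = {"counter": 1}
path = tempfile.mktemp(suffix=".json")   # stand-in for dump.rdb

pid = os.fork()
if pid == 0:                             # --- child: the "BGSAVE" process
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f)               # sees the state as of fork()
    os.rename(tmp, path)                 # atomic rename, like Redis
    os._exit(0)

data["counter"] = 999                    # parent mutates after the fork
os.waitpid(pid, 0)                       # wait for the snapshot to finish

with open(path) as f:
    snapshot = json.load(f)
os.unlink(path)
print(snapshot, data)   # {'counter': 1} {'counter': 999}
```

The write-to-temp-then-rename step matters too: readers never observe a half-written snapshot file, because rename on the same filesystem is atomic.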
4.2 AOF (Append Only File)
AOF: Command logging
─────────────────────────────────────────────────────────────────
Every write command is appended to the log:

  appendonly.aof:
  *3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n
  *3\r\n$3\r\nSET\r\n$3\r\nbaz\r\n$3\r\nqux\r\n
  *2\r\n$3\r\nDEL\r\n$3\r\nfoo\r\n
  ...

fsync Policies:

  appendfsync always
  └── fsync after every write
  └── Safest: ~1 command loss max
  └── Slowest: disk I/O on every write

  appendfsync everysec (recommended)
  └── fsync every second in background
  └── Balance: ~1 second data loss max
  └── Good performance

  appendfsync no
  └── Let OS decide when to flush
  └── Fastest but unpredictable durability
  └── Could lose minutes of data

AOF Rewrite (compaction):

  Problem: the AOF grows forever

  Original AOF:        After Rewrite:
  SET x 1              SET x 100
  SET x 2              SET y 200
  SET x 3              (History collapsed to
  ...                   current state)
  SET x 100
  SET y 200

  Trigger: auto-aof-rewrite-percentage 100
           auto-aof-rewrite-min-size 64mb

  Process:
  1. Fork child (like BGSAVE)
  2. Child writes current state as commands
  3. Parent buffers new writes
  4. Child finishes, parent appends buffer
  5. Atomic swap of AOF files
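The rewrite step is just "replay the log, then re-emit one command per live key". A sketch over a simplified command log, with tuples standing in for RESP frames and only SET/DEL modeled:

```python
def replay(aof):
    """Apply a SET/DEL command log to an empty dict."""
    state = {}
    for cmd, *args in aof:
        if cmd == "SET":
            state[args[0]] = args[1]
        elif cmd == "DEL":
            state.pop(args[0], None)
    return state

def rewrite(aof):
    """AOF compaction: replay history, then emit one SET per live key."""
    return [("SET", k, v) for k, v in replay(aof).items()]

history = [("SET", "x", 1), ("SET", "x", 2), ("SET", "x", 3),
           ("SET", "y", 200), ("DEL", "y"), ("SET", "x", 100),
           ("SET", "y", 200)]
compact = rewrite(history)
print(compact)                                   # two commands, not seven
assert replay(compact) == replay(history)        # same final state
```

The invariant in the last line is the whole contract of a rewrite: the compacted log must replay to exactly the state the full history would.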
4.3 Crash Recovery
Recovery Scenarios:
─────────────────────────────────────────────────────────────────
SCENARIO 1: RDB Only
  Last snapshot: 10:00 AM
  Crash time:    10:15 AM
  Data loss:     15 minutes of writes

  Recovery:
  1. Load dump.rdb
  2. State restored to 10:00 AM

SCENARIO 2: AOF (everysec)
  Last fsync: 10:14:59
  Crash time: 10:15:00
  Data loss:  ~1 second

  Recovery:
  1. Replay AOF commands from start
  2. State restored to ~10:14:59

  Note: Replay can be slow for a large AOF

SCENARIO 3: RDB + AOF (recommended)
  Recovery uses AOF (more complete)
  But RDB provides a faster disaster recovery backup

  Best practice:
  - Enable both
  - Use AOF for durability
  - Use RDB for backups (copy off-server)
4.4 Replication
Master-Replica Replication:
─────────────────────────────────────────────────────────────────

                 ┌───────────┐
           ┌────►│  Replica  │ (read-only)
           │     └───────────┘
           │
     ┌─────┴─────┐
     │  Master   │ (all writes)
     └─────┬─────┘
           │
           │     ┌───────────┐
           └────►│  Replica  │ (read-only)
                 └───────────┘

Replication Process:

1. INITIAL SYNC
   - Replica sends PSYNC to master
   - Master starts BGSAVE
   - Master sends RDB to replica
   - Replica loads RDB
   - Master sends the backlog of writes buffered during BGSAVE

2. CONTINUOUS REPLICATION
   - Master streams commands to replicas
   - Asynchronous (eventual consistency)
   - Replica acknowledges progress (replication offset)

3. PARTIAL RESYNC
   - If a replica disconnects briefly
   - Master keeps a backlog buffer
   - Resume from offset (no full sync needed)

Replication is ASYNCHRONOUS by default:
- Master doesn't wait for replica ACK
- Can use the WAIT command for synchronous semantics
- Trade-off: performance vs durability
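Partial resync can be modeled with a byte ring plus a global offset. The sizes here are shrunk to a toy 16 bytes (the real backlog defaults to 1MB), and the class is my sketch, not Redis's actual structure:

```python
class ReplicationBacklog:
    """Toy model of the master's replication backlog: a fixed-size byte
    ring plus a global offset. A replica reconnecting with an offset
    still covered by the ring gets a partial resync; otherwise full."""
    def __init__(self, size=16):
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0           # total bytes ever written

    def feed(self, data: bytes):
        for b in data:                   # ring write, overwriting the oldest
            self.buf[self.master_offset % self.size] = b
            self.master_offset += 1

    def psync(self, replica_offset: int):
        oldest = max(0, self.master_offset - self.size)
        if replica_offset < oldest:
            return "FULLRESYNC", None    # history fell out of the ring
        out = bytes(self.buf[i % self.size]
                    for i in range(replica_offset, self.master_offset))
        return "CONTINUE", out

log = ReplicationBacklog(size=16)
log.feed(b"SET a 1;")                    # replica sees this, then drops
replica_offset = log.master_offset
log.feed(b"SET b 2;")                    # written while the replica is away
status1, delta = log.psync(replica_offset)
print(status1, delta)                    # partial resync: just the gap
log.feed(b"X" * 32)                      # ring wraps; old history gone
status2, _ = log.psync(replica_offset)
print(status2)                           # now only a full resync works
```

This is why `repl_backlog_size` is worth tuning: a larger ring means longer replica outages can still be healed without the cost of a full RDB transfer.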
Chapter 5: Scaling Redis
5.1 Vertical Scaling Limits
Single Redis Instance Limits:
─────────────────────────────────────────────────────────────────
Memory:
- Practical limit: ~100GB per instance
- Beyond this: fork for BGSAVE becomes problematic
- Memory fragmentation accumulates

CPU:
- Single-threaded: one core maximum for command execution
- Typical: 100K-200K ops/sec per instance
- Network can become the bottleneck before CPU

Network:
- 10Gbps can be saturated with large values
- Small values: CPU-bound before network
5.2 Redis Cluster
Redis Cluster Architecture:
─────────────────────────────────────────────────────────────────

  16,384 hash slots distributed across nodes

  Slots:   0────5460       5461────10922     10923────16383
              │                 │                 │
              ▼                 ▼                 ▼
        ┌──────────┐      ┌──────────┐      ┌──────────┐
        │  Node 1  │      │  Node 2  │      │  Node 3  │
        │ (Master) │      │ (Master) │      │ (Master) │
        └────┬─────┘      └────┬─────┘      └────┬─────┘
             │                 │                 │
             ▼                 ▼                 ▼
        ┌──────────┐      ┌──────────┐      ┌──────────┐
        │ Replica  │      │ Replica  │      │ Replica  │
        └──────────┘      └──────────┘      └──────────┘

Slot Assignment:
  key  = "user:123"
  slot = CRC16(key) % 16384 = 7342
  node = lookup_slot_owner(7342) = Node 2

Hash Tags (force co-location):
  key1 = "{user:123}:profile"
  key2 = "{user:123}:sessions"
  slot = CRC16("user:123") % 16384   # Same slot for both!
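The slot computation is fully specified and small enough to implement: CRC16-CCITT (XMODEM variant) mod 16384, with the hash-tag rule applied first:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash-slot rule: if the key contains a non-empty {...} tag,
    only the tag is hashed, so tagged keys co-locate on one node."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # an empty "{}" tag is ignored
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("foo"))                       # 12182, matches CLUSTER KEYSLOT foo
print(key_slot("{user:123}:profile") == key_slot("{user:123}:sessions"))
```

Co-location via hash tags is what makes multi-key operations (MGET, transactions, Lua scripts) possible in Cluster mode: they are only allowed when every key maps to the same slot.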
5.3 Cluster Operations
Adding a Node:
─────────────────────────────────────────────────────────────────
1. Start a new Redis instance with cluster-enabled yes
2. redis-cli --cluster add-node new:6379 existing:6379
3. redis-cli --cluster reshard existing:6379
   - Move slots from existing nodes to the new node
   - Data migrates automatically

Removing a Node:
─────────────────────────────────────────────────────────────────
1. Reshard its slots away to other nodes
2. redis-cli --cluster del-node existing:6379 <node-id>

Failover:
─────────────────────────────────────────────────────────────────
1. Master fails (detected via gossip protocol)
2. A replica of the failed master requests a failover; the other
   masters vote on it
3. The winning replica is promoted to master
4. Cluster updates slot mappings
5. Clients redirected via MOVED response

Failover time: typically a few seconds, governed by cluster-node-timeout
5.4 Redis Sentinel (HA without Cluster)
Sentinel for High Availability:
─────────────────────────────────────────────────────────────────

  ┌────────────┐     ┌────────────┐     ┌────────────┐
  │ Sentinel 1 │     │ Sentinel 2 │     │ Sentinel 3 │
  └──────┬─────┘     └──────┬─────┘     └──────┬─────┘
         │                  │                  │
         └──────────────────┼──────────────────┘
                            │ (gossip + quorum)
                            ▼
                     ┌─────────────┐
                     │   Master    │
                     └──────┬──────┘
                            │
               ┌────────────┴────────────┐
               ▼                         ▼
        ┌─────────────┐           ┌─────────────┐
        │  Replica 1  │           │  Replica 2  │
        └─────────────┘           └─────────────┘

Sentinel Functions:
1. MONITORING: Continuous health checks
2. NOTIFICATION: Alert when the master fails
3. AUTOMATIC FAILOVER: Promote a replica
4. CONFIGURATION PROVIDER: Clients query Sentinel for the master address

Quorum:
- Need majority agreement for failover
- 3 Sentinels = tolerate 1 failure
- 5 Sentinels = tolerate 2 failures

Use Sentinel when:
- The dataset fits on a single node
- You need HA but not horizontal scaling
- You want something simpler than Cluster
Chapter 6: Production Considerations
6.1 Common Pitfalls
PITFALL 1: Keys Without TTL
─────────────────────────────────────────────────────────────────
Problem:  Memory grows unbounded
Solution: Always set a TTL on cache keys
          Use SCAN + TTL audits to find offenders

PITFALL 2: Blocking Commands in Production
─────────────────────────────────────────────────────────────────
Problem:  KEYS *, SMEMBERS on large sets block the server
Solution: Use SCAN variants (SCAN, SSCAN, HSCAN, ZSCAN)
          Never use KEYS in production

PITFALL 3: Hot Keys
─────────────────────────────────────────────────────────────────
Problem:  One key gets all the traffic; a single node bottlenecks
Solution: Client-side caching, key sharding, read replicas

PITFALL 4: Large Values
─────────────────────────────────────────────────────────────────
Problem:  10MB values block the server during read/write
Solution: Keep values < 100KB
          Chunk large data
          Consider different storage for blobs

PITFALL 5: Fork Under Memory Pressure
─────────────────────────────────────────────────────────────────
Problem:  The BGSAVE fork fails when memory is near the limit
Solution: Keep used_memory < 50% of total for safe forking
          Monitor copy-on-write pages
6.2 Monitoring Essentials
Critical Metrics:
─────────────────────────────────────────────────────────────────
Memory:
- used_memory / maxmemory (keep < 80%)
- mem_fragmentation_ratio (1.0-1.5 normal)
- evicted_keys (should be 0 if possible)

Performance:
- instantaneous_ops_per_sec
- latency percentiles (redis-cli --latency)
- slowlog (commands > threshold)

Replication:
- master_link_status (must be "up")
- master_last_io_seconds_ago (should be < 10)
- repl_backlog_size

Clients:
- connected_clients
- blocked_clients (waiting on BRPOP etc.)
- client_recent_max_output_buffer

Persistence:
- rdb_last_bgsave_status
- aof_last_write_status
- rdb_last_bgsave_time_sec
6.3 Configuration Recommendations
Production redis.conf:
─────────────────────────────────────────────────────────────────
# Memory
maxmemory 8gb
maxmemory-policy allkeys-lru

# Persistence (durable setup)
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# RDB (for backups, not primary durability)
save 900 1
save 300 10
save 60 10000

# Network
tcp-backlog 511
timeout 0
tcp-keepalive 300

# Limits
maxclients 10000

# Slow log
slowlog-log-slower-than 10000   # 10ms (value is in microseconds)
slowlog-max-len 128

# Memory management
activedefrag yes                # Redis 4.0+
Summary
Redis achieves extraordinary performance through deliberate architectural choices:
- In-memory storage eliminates disk latency
- Single-threaded execution eliminates lock overhead
- Event-driven I/O handles thousands of connections efficiently
- Optimized data structures minimize memory and CPU
- Flexible persistence balances durability and performance
Understanding these internals helps you:
- Right-size your deployment
- Avoid common pitfalls
- Troubleshoot performance issues
- Make informed trade-offs
Redis is not magic—it's carefully engineered. Knowing how it works makes you a better engineer.