Module 11: Apache Kafka Deep Dive

What is Kafka?

A distributed streaming platform for building real-time data pipelines and streaming applications.

┌─────────────────────────────────────────────────────────────────┐
│                    KAFKA OVERVIEW                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Traditional Message Queue:      Kafka:                         │
│  ┌─────────────────────┐        ┌─────────────────────┐        │
│  │ Producer → Queue    │        │ Producer → Topic    │        │
│  │            ↓        │        │            ↓        │        │
│  │         Consumer    │        │    ┌───────┴───────┐│        │
│  │         (message    │        │    │               ││        │
│  │          deleted)   │        │ Consumer A   Consumer B      │
│  └─────────────────────┘        │ (independent, replay) │      │
│                                 └─────────────────────┘        │
│                                                                 │
│  Key differences:                                               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ • Messages persisted (not deleted after consumption)    │   │
│  │ • Multiple consumers can read same message              │   │
│  │ • Consumers track their own position (offset)           │   │
│  │ • Replay possible (seek to any offset)                  │   │
│  │ • Designed for high throughput                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Use cases:                                                     │
│  • Event sourcing                                              │
│  • Log aggregation                                             │
│  • Stream processing                                           │
│  • Metrics collection                                          │
│  • CDC (Change Data Capture)                                   │
│  • Message queuing (with retention)                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Core Concepts

Topics and Partitions

┌─────────────────────────────────────────────────────────────────┐
│              TOPICS AND PARTITIONS                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  TOPIC: Named stream of messages (like a database table)        │
│                                                                 │
│  Topic: "user-events"                                           │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  Partition 0: [0|1|2|3|4|5|6|7|8|9|...]  → appends here │   │
│  │                                                          │   │
│  │  Partition 1: [0|1|2|3|4|5|6|...]                        │   │
│  │                                                          │   │
│  │  Partition 2: [0|1|2|3|4|...]                            │   │
│  │                                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Why partitions?                                                │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 1. PARALLELISM                                          │   │
│  │    Each partition can be consumed independently         │   │
│  │    More partitions = more parallel consumers            │   │
│  │                                                          │   │
│  │ 2. SCALABILITY                                          │   │
│  │    Partitions spread across brokers                     │   │
│  │    Topic can be larger than single broker               │   │
│  │                                                          │   │
│  │ 3. ORDERING                                              │   │
│  │    Order guaranteed WITHIN partition                    │   │
│  │    No global order across partitions                    │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Partition assignment:                                          │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ partition = hash(key) % num_partitions                  │   │
│  │                                                          │   │
│  │ Same key → Same partition → Guaranteed order            │   │
│  │ user_id=123 always goes to partition 2                  │   │
│  │                                                          │   │
│  │ No key (null) → Round-robin across partitions           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Brokers and Replication

┌─────────────────────────────────────────────────────────────────┐
│              KAFKA CLUSTER ARCHITECTURE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Cluster with 3 brokers, replication factor = 3                 │
│                                                                 │
│  Topic: "orders" (3 partitions)                                 │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ Broker 1            Broker 2            Broker 3          │ │
│  │ ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │ │
│  │ │ P0 (Leader) │    │ P0 (Follower)│    │ P0 (Follower)│    │ │
│  │ │ P1 (Follower)│    │ P1 (Leader) │    │ P1 (Follower)│    │ │
│  │ │ P2 (Follower)│    │ P2 (Follower)│    │ P2 (Leader) │    │ │
│  │ └─────────────┘    └─────────────┘    └─────────────┘    │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                 │
│  Leader: Handles all reads/writes for partition                 │
│  Follower: Replicates data from leader                         │
│                                                                 │
│  ISR (In-Sync Replicas):                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Replicas that are "caught up" to leader                 │   │
│  │                                                          │   │
│  │ P0 ISR = [Broker1, Broker2, Broker3]                    │   │
│  │                                                          │   │
│  │ If Broker3 falls behind:                                │   │
│  │ P0 ISR = [Broker1, Broker2]                             │   │
│  │                                                          │   │
│  │ Messages only "committed" when in all ISR               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Broker failure:                                                │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Broker 1 dies → P0 leader election                      │   │
│  │ New leader: Broker 2 (from ISR)                         │   │
│  │ Automatic failover, no data loss (if acks=all)          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Producer Deep Dive

Producer Configuration

┌─────────────────────────────────────────────────────────────────┐
│              PRODUCER CONFIGURATION                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ACKS (Acknowledgment):                                         │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ acks=0:  Don't wait for any acknowledgment              │   │
│  │          Fastest, but may lose messages                  │   │
│  │                                                          │   │
│  │ acks=1:  Wait for leader acknowledgment only            │   │
│  │          Default. May lose if leader fails before        │   │
│  │          replication.                                    │   │
│  │                                                          │   │
│  │ acks=all: Wait for all ISR acknowledgment               │   │
│  │           Slowest, but strongest durability              │   │
│  │           Combined with min.insync.replicas              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  acks=all flow:                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Producer     Leader        Follower1     Follower2      │   │
│  │    │           │              │              │          │   │
│  │    │──send────►│              │              │          │   │
│  │    │           │──replicate──►│              │          │   │
│  │    │           │──replicate──────────────────►│         │   │
│  │    │           │◄─────ack─────│              │          │   │
│  │    │           │◄─────ack─────────────────────│         │   │
│  │    │◄───ack────│              │              │          │   │
│  │    │           │              │              │          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  BATCHING:                                                      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ batch.size = 16384 (bytes)                              │   │
│  │ linger.ms = 5 (wait up to 5ms to fill batch)           │   │
│  │                                                          │   │
│  │ Trade-off: Latency vs Throughput                        │   │
│  │ Larger batch/longer linger → Higher throughput          │   │
│  │ Smaller batch/shorter linger → Lower latency           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  COMPRESSION:                                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ compression.type = none | gzip | snappy | lz4 | zstd    │   │
│  │                                                          │   │
│  │ Compression happens at batch level                      │   │
│  │ Larger batches → Better compression                     │   │
│  │                                                          │   │
│  │ Typical ratios: 4-7x for JSON data                      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Idempotent and Transactional Producers

┌─────────────────────────────────────────────────────────────────┐
│              EXACTLY-ONCE SEMANTICS                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  PROBLEM: Network failure during ack                            │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Producer sends message                                  │   │
│  │ Broker receives and writes                              │   │
│  │ Ack lost in network                                     │   │
│  │ Producer retries → DUPLICATE!                           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  IDEMPOTENT PRODUCER:                                           │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ enable.idempotence = true                               │   │
│  │                                                          │   │
│  │ How it works:                                           │   │
│  │ • Producer gets ProducerID (PID)                        │   │
│  │ • Each message has sequence number                      │   │
│  │ • Broker deduplicates by (PID, partition, sequence)     │   │
│  │                                                          │   │
│  │ Message: {PID: 123, seq: 5, data: "..."}               │   │
│  │ Retry:   {PID: 123, seq: 5, data: "..."}               │   │
│  │ Broker: "Already have seq 5 from PID 123, skip"        │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  TRANSACTIONAL PRODUCER:                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Atomic writes across partitions/topics                  │   │
│  │                                                          │   │
│  │ producer.initTransactions()                             │   │
│  │ producer.beginTransaction()                             │   │
│  │ producer.send(topic1, msg1)                             │   │
│  │ producer.send(topic2, msg2)                             │   │
│  │ producer.commitTransaction()  // or abortTransaction()  │   │
│  │                                                          │   │
│  │ Use case: Exactly-once stream processing                │   │
│  │ Read from topic A, process, write to topic B           │   │
│  │ Commit consumer offset + output atomically              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Consumer Deep Dive

Consumer Groups

┌─────────────────────────────────────────────────────────────────┐
│                   CONSUMER GROUPS                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Topic with 4 partitions, Consumer Group with 2 consumers:      │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  Partition 0 ─────┐                                      │   │
│  │  Partition 1 ─────┼────► Consumer A                     │   │
│  │                   │                                      │   │
│  │  Partition 2 ─────┐                                      │   │
│  │  Partition 3 ─────┼────► Consumer B                     │   │
│  │                                                          │   │
│  │  Each partition consumed by exactly one consumer        │   │
│  │  Each consumer can handle multiple partitions           │   │
│  │                                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  SCALING CONSUMERS:                                             │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 4 partitions, 2 consumers → 2 partitions each           │   │
│  │ 4 partitions, 4 consumers → 1 partition each           │   │
│  │ 4 partitions, 6 consumers → 4 active, 2 idle!          │   │
│  │                                                          │   │
│  │ Max parallelism = number of partitions                  │   │
│  │ More consumers than partitions → wasted resources       │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  MULTIPLE CONSUMER GROUPS:                                      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  Topic: orders                                          │   │
│  │      │                                                   │   │
│  │      ├─────► Group: "payment-service"                   │   │
│  │      │       (processes payments)                       │   │
│  │      │                                                   │   │
│  │      ├─────► Group: "analytics"                         │   │
│  │      │       (builds reports)                           │   │
│  │      │                                                   │   │
│  │      └─────► Group: "fraud-detection"                   │   │
│  │              (real-time fraud check)                    │   │
│  │                                                          │   │
│  │  Each group gets ALL messages independently              │   │
│  │  Different groups at different offsets                  │   │
│  │                                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Offset Management

┌─────────────────────────────────────────────────────────────────┐
│                 OFFSET MANAGEMENT                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Offset: Position in partition (like a bookmark)                │
│                                                                 │
│  Partition:  [0|1|2|3|4|5|6|7|8|9|10|11|12]                    │
│                          ↑                                      │
│                   Current offset                                │
│                   (Consumer has processed up to here)           │
│                                                                 │
│  OFFSET COMMIT:                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ AUTO COMMIT:                                            │   │
│  │   enable.auto.commit = true                             │   │
│  │   auto.commit.interval.ms = 5000                        │   │
│  │                                                          │   │
│  │   Problem: Might commit before processing complete       │   │
│  │            → Message loss on crash                       │   │
│  │                                                          │   │
│  │ MANUAL COMMIT:                                          │   │
│  │   enable.auto.commit = false                            │   │
│  │   consumer.commitSync()  // after processing            │   │
│  │   consumer.commitAsync() // non-blocking                │   │
│  │                                                          │   │
│  │   Safer: Commit only after successful processing        │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  DELIVERY SEMANTICS:                                            │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ AT-MOST-ONCE:                                           │   │
│  │   Commit before processing                              │   │
│  │   Crash → message lost                                  │   │
│  │                                                          │   │
│  │ AT-LEAST-ONCE:                                          │   │
│  │   Commit after processing                               │   │
│  │   Crash → message reprocessed (duplicate)               │   │
│  │   Make processing idempotent!                           │   │
│  │                                                          │   │
│  │ EXACTLY-ONCE:                                           │   │
│  │   Transactional producer + consumer                     │   │
│  │   Atomic: process + commit offset                       │   │
│  │   isolation.level = read_committed                      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Rebalancing

┌─────────────────────────────────────────────────────────────────┐
│                 CONSUMER REBALANCING                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  When rebalance occurs:                                         │
│  • Consumer joins/leaves group                                 │
│  • Consumer crashes (heartbeat timeout)                        │
│  • Partitions added to topic                                   │
│                                                                 │
│  Rebalance process:                                             │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 1. Consumers stop processing (revoke partitions)        │   │
│  │ 2. Group Coordinator reassigns partitions               │   │
│  │ 3. Consumers resume with new assignments                │   │
│  │                                                          │   │
│  │ Problem: ALL consumers stop during rebalance!           │   │
│  │          "Stop-the-world" pause                         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  COOPERATIVE REBALANCING (Kafka 2.4+):                          │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ partition.assignment.strategy =                         │   │
│  │   CooperativeStickyAssignor                             │   │
│  │                                                          │   │
│  │ Only affected partitions reassigned                     │   │
│  │ Unaffected consumers continue processing                │   │
│  │ Much shorter pause times                                │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  STATIC GROUP MEMBERSHIP:                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ group.instance.id = "consumer-1"                        │   │
│  │                                                          │   │
│  │ Consumer identified by instance ID (not member ID)      │   │
│  │ Short restart doesn't trigger rebalance                 │   │
│  │ session.timeout.ms can be longer                        │   │
│  │                                                          │   │
│  │ Good for: Kubernetes deployments                        │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Performance Tuning

┌─────────────────────────────────────────────────────────────────┐
│              KAFKA PERFORMANCE TUNING                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  PRODUCER TUNING:                                               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ # Increase batch size for throughput                    │   │
│  │ batch.size = 65536  # 64KB                              │   │
│  │ linger.ms = 10      # Wait up to 10ms                   │   │
│  │                                                          │   │
│  │ # Compression                                            │   │
│  │ compression.type = lz4  # Fast compression              │   │
│  │                                                          │   │
│  │ # Buffer memory                                          │   │
│  │ buffer.memory = 67108864  # 64MB                        │   │
│  │                                                          │   │
│  │ # Parallel requests                                      │   │
│  │ max.in.flight.requests.per.connection = 5               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  CONSUMER TUNING:                                               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ # Fetch size                                             │   │
│  │ fetch.min.bytes = 1       # Return immediately         │   │
│  │ fetch.max.bytes = 52428800  # 50MB max                 │   │
│  │ max.partition.fetch.bytes = 1048576  # 1MB/partition   │   │
│  │                                                          │   │
│  │ # Batch processing                                       │   │
│  │ max.poll.records = 500    # Messages per poll()        │   │
│  │                                                          │   │
│  │ # Session timeout (failover speed vs false positives)   │   │
│  │ session.timeout.ms = 45000                              │   │
│  │ heartbeat.interval.ms = 15000                           │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  BROKER TUNING:                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ # Number of partitions (affects parallelism)            │   │
│  │ num.partitions = 12  # Per topic default               │   │
│  │                                                          │   │
│  │ # Replication                                            │   │
│  │ default.replication.factor = 3                         │   │
│  │ min.insync.replicas = 2                                │   │
│  │                                                          │   │
│  │ # Log retention                                          │   │
│  │ log.retention.hours = 168  # 7 days                    │   │
│  │ log.retention.bytes = -1   # No size limit             │   │
│  │                                                          │   │
│  │ # Network/IO threads                                     │   │
│  │ num.network.threads = 8                                │   │
│  │ num.io.threads = 16                                    │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Common Patterns

Event Sourcing with Kafka

┌─────────────────────────────────────────────────────────────────┐
│              EVENT SOURCING PATTERN                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Instead of storing current state, store all events:            │
│                                                                 │
│  Topic: "account-events"                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ {type: "AccountCreated", id: "A1", balance: 100}        │   │
│  │ {type: "Deposited", id: "A1", amount: 50}               │   │
│  │ {type: "Withdrawn", id: "A1", amount: 30}               │   │
│  │ {type: "Deposited", id: "A1", amount: 100}              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Current state = replay all events:                             │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ balance = 100 + 50 - 30 + 100 = 220                     │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Benefits:                                                      │
│  • Complete audit trail                                        │
│  • Replay to any point in time                                 │
│  • Build new projections from events                           │
│  • Debug by replaying                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

CQRS with Kafka

┌─────────────────────────────────────────────────────────────────┐
│              CQRS PATTERN                                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Command Query Responsibility Segregation:                      │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  Commands ──► Write Model ──► Kafka ──► Read Model      │   │
│  │  (writes)    (events)        (events)   (projections)   │   │
│  │                                                          │   │
│  │  Queries ◄────────────────────────────── Read Model     │   │
│  │  (reads)                                 (optimized     │   │
│  │                                           for reads)    │   │
│  │                                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Example:                                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Write: POST /orders → OrderCreated event → Kafka        │   │
│  │                                                          │   │
│  │ Read Model builders (consumers):                        │   │
│  │ • orders-by-customer (for customer dashboard)           │   │
│  │ • orders-by-status (for operations)                     │   │
│  │ • orders-by-region (for analytics)                      │   │
│  │                                                          │   │
│  │ Each optimized for specific query patterns              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Interview Questions

Conceptual Questions

How does Kafka achieve high throughput?
- Sequential I/O (append-only logs)
- Batching (producer and consumer)
- Zero-copy (sendfile syscall)
- Partitioning (parallelism)
- Page cache utilization
Explain consumer groups and rebalancing.
- Consumer group: Set of consumers sharing work
- Each partition assigned to one consumer
- Rebalance: Redistribute partitions on change
- Cooperative rebalancing reduces pauses
What's the difference between acks=1 and acks=all?
- acks=1: Leader acknowledges, may lose on leader failure
- acks=all: All ISR acknowledge, no data loss
- Trade-off: Latency vs durability
How do you achieve exactly-once semantics?
- Idempotent producer (dedup by sequence number)
- Transactional producer (atomic multi-partition writes)
- Consumer with read_committed isolation
- Process + commit offset atomically

System Design Questions

Design a log aggregation system with Kafka.
- Each service produces to topic per service
- Log shipper consumers (e.g., Elasticsearch)
- Retention for replay/debugging
- Partitioning by host for ordering
How would you handle Kafka consumer lag?
- Add more consumers (up to partition count)
- Increase partitions (requires topic recreation)
- Optimize processing (async, batching)
- Temporarily increase fetch size

Summary

┌─────────────────────────────────────────────────────────────────┐
│              MODULE 11 KEY TAKEAWAYS                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Kafka is a distributed commit log                           │
│     • Persisted messages (not deleted on consumption)          │
│     • Multiple consumers, replay capability                    │
│                                                                 │
│  2. Topics and partitions                                       │
│     • Partition = unit of parallelism                          │
│     • Order guaranteed within partition                        │
│     • Key-based partitioning for ordering                      │
│                                                                 │
│  3. Producer guarantees                                         │
│     • acks=0/1/all for durability levels                       │
│     • Idempotent producer for deduplication                    │
│     • Transactions for atomic writes                           │
│                                                                 │
│  4. Consumer groups                                             │
│     • Partition assignment within group                        │
│     • Multiple groups for different use cases                  │
│     • Offset management (auto vs manual)                       │
│                                                                 │
│  5. Delivery semantics                                          │
│     • At-most-once, at-least-once, exactly-once               │
│     • Make consumers idempotent for at-least-once             │
│                                                                 │
│  6. Common patterns                                             │
│     • Event sourcing, CQRS                                     │
│     • Log aggregation, CDC                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Next Module: Caching Strategies - From local caches to distributed caching with Redis and Memcached.