Aerospike Deep Dive: High-Performance Hybrid Memory Architecture
Introduction
When Redis's memory limits become constraints, when you need predictable sub-millisecond latencies at terabyte scale, when every microsecond of ad-tech bidding matters—that's when Aerospike enters the conversation.
Aerospike is engineered for a specific problem: massive datasets with predictable low-latency access. Understanding its architecture reveals why companies like PayPal, Airtel, and major ad exchanges choose it for their most demanding workloads.
This guide examines Aerospike's hybrid memory model, its approach to persistence, and why it can outperform pure in-memory solutions for large datasets.
Chapter 1: What Aerospike Actually Is
Core Identity
Aerospike is a distributed NoSQL database with a hybrid memory architecture. Unlike Redis (pure in-memory) or traditional databases (pure disk), Aerospike keeps indexes in RAM while storing data on SSDs—getting the speed of memory-based lookups with the capacity of disk storage.
Traditional Database vs Redis vs Aerospike: ───────────────────────────────────────────────────────────────── TRADITIONAL DB (PostgreSQL, MySQL): ┌─────────────────────────────────────────────────────────────────┐ │ Index: Disk (B-tree on SSD) │ │ Data: Disk │ │ Read path: Disk seek → Read index → Disk seek → Read data │ │ Latency: 1-10ms typical │ └─────────────────────────────────────────────────────────────────┘ REDIS: ┌─────────────────────────────────────────────────────────────────┐ │ Index: RAM │ │ Data: RAM │ │ Read path: Hash lookup → Return data │ │ Latency: <1ms (typically 100-500μs) │ │ Constraint: Dataset must fit in RAM │ └─────────────────────────────────────────────────────────────────┘ AEROSPIKE: ┌─────────────────────────────────────────────────────────────────┐ │ Index: RAM (always) │ │ Data: RAM or SSD (configurable per namespace) │ │ Read path: RAM hash → Single SSD read (if on disk) │ │ Latency: <1ms (even for disk-resident data) │ │ Capacity: Terabytes per node │ └─────────────────────────────────────────────────────────────────┘
Where Aerospike Excels
Primary Use Cases: ───────────────────────────────────────────────────────────────── 1. REAL-TIME BIDDING (Ad Tech) - 100ms total request budget - Lookup user profile + segments - Billions of user records - Predictable latency critical 2. FRAUD DETECTION (Fintech) - Check transaction against patterns - Milliseconds decision window - Massive historical data - Write-heavy (continuous updates) 3. RECOMMENDATION ENGINES - User → Item affinity lookups - Product catalogs - Real-time personalization - TB+ datasets 4. SESSION STORES - Distributed sessions - High write throughput - Automatic expiration - Cross-datacenter replication 5. CACHING WITH PERSISTENCE - Can't afford cold start - Need cache + source of truth - Complex cache invalidation avoided
Chapter 2: Why Aerospike Is Fast
2.1 Hybrid Memory Architecture
The key innovation: index in RAM, data optionally on SSD.
Aerospike Memory Model: ───────────────────────────────────────────────────────────────── ┌─────────────────────────────────────────────────────────────────┐ │ RAM │ ├─────────────────────────────────────────────────────────────────┤ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ PRIMARY INDEX │ │ │ │ │ │ │ │ 64 bytes per record: │ │ │ │ ├── 20 bytes: Key digest (hash) │ │ │ │ ├── 8 bytes: Metadata (generation, TTL, etc.) │ │ │ │ ├── 28 bytes: Location pointer (device + offset) │ │ │ │ └── 8 bytes: Other flags │ │ │ │ │ │ │ │ 1 billion records = ~64GB RAM for index │ │ │ │ │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ Optional: Data in RAM (for hot datasets) │ └─────────────────────────────────────────────────────────────────┘ │ │ Location pointer ▼ ┌─────────────────────────────────────────────────────────────────┐ │ SSD │ ├─────────────────────────────────────────────────────────────────┤ │ ┌──────────────────────────────────────────────────────────┐ │ │ │ RECORD DATA │ │ │ │ │ │ │ │ Written in large blocks (128KB default) │ │ │ │ Direct device access (bypasses filesystem) │ │ │ │ Optimized for SSD access patterns │ │ │ │ │ │ │ │ Can store terabytes per node │ │ │ └──────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ Read Path: 1. Hash key → Find index entry in RAM (nanoseconds) 2. Get device + offset from index 3. Single SSD read at exact location (microseconds) 4. Return data No index traversal on disk. No filesystem overhead.
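The read path above can be sketched as a toy model: a RAM hash map for the primary index pointing at exact offsets on an append-only "device". This is an illustrative simplification, not Aerospike's actual implementation (sha1 stands in for RIPEMD-160; a bytearray stands in for the raw SSD).

```python
# Toy model of the hybrid memory layout: index in RAM, data on a "device".
import hashlib

class HybridStore:
    def __init__(self):
        self.index = {}            # RAM: 20-byte digest -> (offset, length)
        self.device = bytearray()  # stand-in for a raw SSD device

    def digest(self, key: str) -> bytes:
        # Aerospike uses RIPEMD-160; sha1 is a 20-byte stand-in here
        return hashlib.sha1(key.encode()).digest()

    def put(self, key: str, value: bytes):
        offset = len(self.device)
        self.device.extend(value)                    # append-only write
        self.index[self.digest(key)] = (offset, len(value))

    def get(self, key: str) -> bytes:
        # 1. RAM hash lookup  2. single "device" read at the exact offset
        offset, length = self.index[self.digest(key)]
        return bytes(self.device[offset:offset + length])

store = HybridStore()
store.put("user:12345", b'{"name": "Alice"}')
print(store.get("user:12345"))
```

The key property to notice: `get` never searches the device. The index lookup yields a precise location, so a disk-resident record costs exactly one device read.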
2.2 Direct Device Access
When configured against raw devices, Aerospike bypasses the filesystem entirely:
Traditional Database I/O: ───────────────────────────────────────────────────────────────── Application │ ▼ Filesystem (ext4, xfs) │ ├── Path lookup ├── inode resolution ├── Block mapping ├── Page cache check │ ▼ Block Device Driver │ ▼ SSD Aerospike Direct Device I/O: ───────────────────────────────────────────────────────────────── Application (Aerospike) │ ├── Own index (in RAM) ├── Own block management ├── O_DIRECT (bypass page cache) │ ▼ Block Device Driver (raw device: /dev/nvme0n1) │ ▼ SSD Benefits: - No filesystem overhead - No double-buffering (app cache + page cache) - Predictable I/O latency - Full control over write patterns
2.3 SSD Optimization
Aerospike is specifically engineered for SSD characteristics:
SSD Characteristics Aerospike Exploits: ───────────────────────────────────────────────────────────────── 1. RANDOM READ PERFORMANCE └── SSDs excel at random reads (unlike HDDs) └── Aerospike leverages this for single-record fetches └── ~50μs per random 4KB read on modern NVMe 2. WRITE AMPLIFICATION AWARENESS ┌─────────────────────────────────────────────────────────────┐ │ SSD Problem: Can't overwrite in place │ │ - Must erase entire block (512KB-4MB) │ │ - Write new data elsewhere │ │ - GC old blocks │ │ │ │ Aerospike Solution: │ │ - Write in large sequential blocks (128KB) │ │ - Defragmentation runs in background │ │ - Keeps SSD write patterns optimal │ └─────────────────────────────────────────────────────────────┘ 3. PARALLELISM └── Modern NVMe: 64+ parallel operations └── Aerospike: Multi-threaded I/O └── Saturates device capabilities 4. WEAR LEVELING └── Aerospike distributes writes evenly └── No hot spots that kill SSD early
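The large-block write pattern from point 2 can be sketched as follows: small records are staged in a 128KB buffer and flushed as one sequential write, which is what keeps write amplification low. This is a minimal sketch of the buffering idea, not Aerospike's storage engine.

```python
# Stage small records into 128KB blocks; flush each block as one write.
WRITE_BLOCK = 128 * 1024   # matches the default write-block-size

class BlockWriter:
    def __init__(self):
        self.buffer = bytearray()
        self.flushed_blocks = 0

    def write_record(self, record: bytes):
        if len(self.buffer) + len(record) > WRITE_BLOCK:
            self.flush()
        self.buffer.extend(record)

    def flush(self):
        if self.buffer:
            # In Aerospike this would be one O_DIRECT write to the raw device
            self.flushed_blocks += 1
            self.buffer.clear()

w = BlockWriter()
for _ in range(1000):
    w.write_record(b"x" * 1024)   # 1000 records of 1KB each
w.flush()
print(w.flushed_blocks)           # 8 block writes instead of 1000 record writes
```

A thousand 1KB records become eight ~128KB device writes: the SSD sees a handful of large sequential writes instead of a thousand small random ones.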
2.4 Multi-Threaded Architecture
Unlike Redis, Aerospike is heavily multi-threaded:
Aerospike Thread Model: ───────────────────────────────────────────────────────────────── ┌─────────────────────────────────────────────────────────────────┐ │ │ │ SERVICE THREADS (configurable, default: CPU cores) │ │ ├── Handle incoming client requests │ │ ├── Parse requests │ │ ├── Route to transaction threads │ │ └── Parallel request handling │ │ │ │ TRANSACTION THREADS (per partition) │ │ ├── Execute read/write operations │ │ ├── Index operations │ │ └── Lock-free for most operations │ │ │ │ FABRIC THREADS (cluster communication) │ │ ├── Replication │ │ ├── Migration │ │ └── Cluster protocol │ │ │ │ STORAGE THREADS │ │ ├── SSD read queue │ │ ├── SSD write queue │ │ └── Defragmentation │ │ │ └─────────────────────────────────────────────────────────────────┘ Parallelism Benefits: - Utilizes all CPU cores - Concurrent requests processed simultaneously - No single-thread bottleneck - Better utilization of NVMe parallelism
2.5 Lock-Free Data Structures
Aerospike Lock-Free Operations: ───────────────────────────────────────────────────────────────── INDEX OPERATIONS: ┌─────────────────────────────────────────────────────────────────┐ │ Read: Lock-free (copy-on-write semantics) │ │ Write: Per-record locking (fine-grained) │ │ │ │ Index is partitioned: │ │ - 4096 partitions by default │ │ - Each partition: independent lock domain │ │ - Parallel operations across partitions │ └─────────────────────────────────────────────────────────────────┘ Why This Matters: - Read operations never block - Writes only contend on same key - Massive read parallelism - No global locks
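The partitioned lock domains can be illustrated with a striped-lock sketch: one lock per partition instead of one global lock, so writers only contend when they hit the same partition. This is an assumption-laden simplification (Aerospike's real index additionally allows lock-free reads and per-record locks, which Python threading cannot reproduce).

```python
# Striped locking: one lock per partition, no global lock.
import threading

N_PARTITIONS = 4096

class StripedStore:
    def __init__(self):
        self.locks = [threading.Lock() for _ in range(N_PARTITIONS)]
        self.data = [{} for _ in range(N_PARTITIONS)]

    def partition(self, key: str) -> int:
        return hash(key) % N_PARTITIONS

    def put(self, key: str, value):
        p = self.partition(key)
        with self.locks[p]:          # contend only with same-partition writers
            self.data[p][key] = value

    def get(self, key: str):
        return self.data[self.partition(key)].get(key)

s = StripedStore()
threads = [threading.Thread(target=s.put, args=(f"k{i}", i)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(s.get("k42"))
```

With 4096 independent lock domains, two concurrent writes almost never touch the same lock; contention is limited to genuinely conflicting keys.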
2.6 Why Aerospike Can Be Faster Than Redis at Scale
This is nuanced. Let's be precise:
Redis vs Aerospike: Performance Comparison ───────────────────────────────────────────────────────────────── SCENARIO 1: Small Dataset (<50GB), Pure Cache ┌─────────────────────────────────────────────────────────────────┐ │ Winner: Redis │ │ │ │ Why: │ │ - Pure RAM access │ │ - No SSD latency │ │ - Simpler architecture │ │ - Optimized for small-medium datasets │ │ │ │ Redis: ~100μs p99 │ │ Aerospike (RAM namespace): ~150μs p99 │ └─────────────────────────────────────────────────────────────────┘ SCENARIO 2: Large Dataset (500GB+), Read Heavy ┌─────────────────────────────────────────────────────────────────┐ │ Winner: Aerospike │ │ │ │ Why: │ │ - Redis: Needs $$$$ RAM or cluster complexity │ │ - Aerospike: Index in 32GB RAM, data on 1TB SSD │ │ │ │ Redis (10-node cluster): ~200μs p99 + cluster overhead │ │ Aerospike (2 nodes): ~300μs p99 (single SSD read) │ │ │ │ Cost: Aerospike significantly cheaper (SSD vs RAM) │ └─────────────────────────────────────────────────────────────────┘ SCENARIO 3: High Write Throughput ┌─────────────────────────────────────────────────────────────────┐ │ Winner: Depends on durability requirements │ │ │ │ Redis (no persistence): ~500K writes/sec │ │ Redis (AOF everysec): ~200K writes/sec │ │ Redis (AOF always): ~50K writes/sec │ │ │ │ Aerospike (commit-to-device): ~200K writes/sec │ │ Aerospike (commit-to-memory): ~400K writes/sec │ │ │ │ Key difference: Aerospike writes are durable by default │ └─────────────────────────────────────────────────────────────────┘ SCENARIO 4: Predictable Latency Under Load ┌─────────────────────────────────────────────────────────────────┐ │ Winner: Aerospike │ │ │ │ Redis under memory pressure: │ │ - Evictions cause unpredictable latency │ │ - BGSAVE causes copy-on-write spikes │ │ - Single-threaded: one slow command blocks all │ │ │ │ Aerospike under load: │ │ - SSD latency consistent │ │ - No eviction pressure (more capacity) │ │ - Multi-threaded: slow operations don't block others │ │ │ │ p99.9 latency: │ │ Redis under fork/COW pressure: 5-50ms spikes │ │ Aerospike: 1-2ms consistent │ └─────────────────────────────────────────────────────────────────┘
The nuanced truth:
- Redis is faster for small, pure-cache workloads
- Aerospike is faster (or comparable) at scale with better economics
- Aerospike has more predictable tail latencies
- Aerospike handles larger datasets with far less operational complexity than a comparably sized Redis cluster
Chapter 3: Aerospike Data Model
3.1 Hierarchical Structure
Aerospike Data Hierarchy: ───────────────────────────────────────────────────────────────── NAMESPACE (like a database) ├── Storage configuration (RAM vs SSD) ├── Replication factor ├── TTL defaults ├── Conflict resolution policy │ ├── SET (like a table, but schema-less) │ │ │ └── RECORD (like a row) │ │ │ ├── KEY (primary identifier) │ │ └── Digested to 20-byte hash │ │ │ ├── METADATA │ │ ├── Generation (version counter) │ │ ├── TTL (time-to-live) │ │ └── Last-update-time │ │ │ └── BINS (like columns, but schema-less) │ ├── name: "John" │ ├── age: 30 │ ├── tags: ["admin", "verified"] │ └── profile: {...} Example: ───────────────────────────────────────────────────────────────── Namespace: "users" Set: "profiles" Key: "user:12345" Bins: - name: "Alice" - email: "alice@example.com" - last_login: 1704067200 - preferences: {"theme": "dark", "lang": "en"}
3.2 Key Digesting
Key → Digest Process: ───────────────────────────────────────────────────────────────── Original Key: "user:alice@example.com" │ ▼ RIPEMD-160 Hash │ ▼ Digest: 0x3a9f7b2c1e8d4f6a... (20 bytes) │ ▼ Partition: digest[0:2] % 4096 = 1847 │ ▼ Node: partition_map[1847] = Node 2 Benefits: - Fixed 20-byte digest regardless of key length - Uniform partition distribution - Original key not stored by default (space savings); storing it is optional - Fast hashing
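The digest-to-partition walk above can be traced in a few lines. Note this is a simplification: Aerospike's real digest input also includes the set name and a key-type byte in a specific wire format, and `ripemd160` availability in `hashlib` depends on the local OpenSSL build, so sha1 (also 20 bytes) is used as a fallback purely for illustration.

```python
# Trace: key -> 20-byte digest -> partition ID -> owning node.
import hashlib

def key_digest(set_name: str, key: str) -> bytes:
    try:
        h = hashlib.new("ripemd160")   # what Aerospike actually uses
    except ValueError:
        h = hashlib.sha1()             # 20-byte stand-in if unavailable
    h.update(set_name.encode())
    h.update(key.encode())
    return h.digest()

def partition_id(digest: bytes) -> int:
    # Partition derived from the leading digest bytes, mod 4096
    return int.from_bytes(digest[0:2], "little") % 4096

digest = key_digest("profiles", "user:alice@example.com")
pid = partition_id(digest)
print(len(digest), pid)
```

The important invariants: the digest is always 20 bytes regardless of key length, and the same key always maps to the same partition, so routing needs no coordination.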
3.3 Partitioning
Partition Architecture: ───────────────────────────────────────────────────────────────── ┌─────────────────────────────────────────────────────────────────┐ │ 4096 PARTITIONS │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ P0 P1 P2 P3 P4 ... P4095 │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ ▼ ▼ ▼ ▼ ▼ ▼ │ │ │ │ 3-Node Cluster Example (RF=2): │ │ ───────────────────────────────────────────────────────────── │ │ │ │ Node 1 (Master) Node 2 (Master) Node 3 (Master) │ │ P0, P3, P6... P1, P4, P7... P2, P5, P8... │ │ │ │ Node 1 (Replica) Node 2 (Replica) Node 3 (Replica) │ │ P1, P4, P7... P2, P5, P8... P0, P3, P6... │ │ │ │ Each partition has: │ │ - One master (handles writes) │ │ - RF-1 replicas (receive replicated writes) │ │ │ └─────────────────────────────────────────────────────────────────┘ Why 4096 Partitions? - Fine-grained distribution - Efficient rebalancing (move partitions, not records) - Good parallelism - Fixed overhead
Chapter 4: Persistence and Durability
4.1 Storage Engines
Aerospike Storage Options: ───────────────────────────────────────────────────────────────── 1. DATA IN MEMORY (RAM Namespace) ┌─────────────────────────────────────────────────────────────────┐ │ storage-engine memory │ │ │ │ - All data in RAM │ │ - Similar to Redis │ │ - Optionally persist to disk for restart │ │ - Fastest, but RAM-limited │ │ │ │ Use: Hot data, session caches, rate limiters │ └─────────────────────────────────────────────────────────────────┘ 2. DATA ON SSD (Device Namespace) ┌─────────────────────────────────────────────────────────────────┐ │ storage-engine device { │ │ device /dev/nvme0n1 │ │ device /dev/nvme1n1 │ │ write-block-size 128K │ │ } │ │ │ │ - Index in RAM, data on SSD │ │ - Terabytes of capacity │ │ - Sub-millisecond reads │ │ - Durable by default │ │ │ │ Use: Large datasets, user profiles, fraud detection │ └─────────────────────────────────────────────────────────────────┘ 3. ALL FLASH (Index on Flash) ┌─────────────────────────────────────────────────────────────────┐ │ index-type flash { │ │ mount /mnt/nvme_index │ │ } │ │ storage-engine device { ... } │ │ │ │ - Index AND data on SSD │ │ - Trillions of keys possible │ │ - Higher latency than RAM index │ │ │ │ Use: Massive key counts, archive data │ └─────────────────────────────────────────────────────────────────┘
4.2 Write Path and Durability
Aerospike Write Flow (Commit-to-Device): ───────────────────────────────────────────────────────────────── Client Write Request │ ▼ ┌───────────────────────────────────────────────────────────────┐ │ 1. Master Node Receives Request │ │ ├── Parse request │ │ ├── Validate record │ │ └── Acquire record lock │ └───────────────────┬───────────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────┐ │ 2. Write to SSD (Master) │ │ ├── Append to write buffer │ │ ├── When buffer full (128KB): write to device │ │ ├── Direct I/O (O_DIRECT) │ │ └── Wait for device acknowledgment │ └───────────────────┬───────────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────┐ │ 3. Update In-Memory Index │ │ ├── Create/update index entry │ │ └── Point to new device location │ └───────────────────┬───────────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────┐ │ 4. Replicate to Replica Nodes (in parallel) │ │ ├── Send record to RF-1 replicas │ │ ├── Each replica: write to SSD + update index │ │ └── Wait for acknowledgments (configurable) │ └───────────────────┬───────────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────┐ │ 5. Acknowledge to Client │ │ ├── All replicas confirmed (strong consistency) │ │ └── OR master + N replicas (tunable) │ └───────────────────────────────────────────────────────────────┘ Write Policies: ───────────────────────────────────────────────────────────────── COMMIT_ALL: Wait for all replicas (strongest) COMMIT_MASTER: Wait for master only (faster, less durable)
4.3 Recovery
Crash Recovery Process: ───────────────────────────────────────────────────────────────── SCENARIO: Node crashes and restarts ┌─────────────────────────────────────────────────────────────────┐ │ COLD START RECOVERY │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ 1. Scan SSD for valid records │ │ ├── Read write blocks sequentially │ │ ├── Validate record checksums │ │ └── Build index in memory │ │ │ │ 2. Time: Proportional to data size │ │ └── ~1TB data = ~5-10 minutes │ │ │ │ 3. Parallel recovery │ │ └── Multiple threads scan different device regions │ │ │ └─────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────┐ │ FAST RESTART (Enterprise Feature) │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ - Index persisted to shared memory │ │ - On restart: map shared memory │ │ - Recovery: seconds instead of minutes │ │ │ │ Use: Planned maintenance, quick recovery │ │ │ └─────────────────────────────────────────────────────────────────┘
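Cold-start recovery (scan blocks, verify checksums, rebuild the index) can be sketched with a toy record format. The length-prefix-plus-CRC layout here is an assumption for illustration, not Aerospike's actual on-device format.

```python
# Toy cold start: scan the device sequentially and rebuild the RAM index.
import struct
import zlib

def write_record(device: bytearray, key: bytes, value: bytes):
    payload = struct.pack(">H", len(key)) + key + value
    header = struct.pack(">II", len(payload), zlib.crc32(payload))
    device.extend(header + payload)

def rebuild_index(device: bytearray) -> dict:
    index, offset = {}, 0
    while offset < len(device):
        length, crc = struct.unpack_from(">II", device, offset)
        payload = device[offset + 8 : offset + 8 + length]
        if zlib.crc32(payload) == crc:          # skip torn/corrupt records
            (klen,) = struct.unpack_from(">H", payload, 0)
            key = bytes(payload[2 : 2 + klen])
            index[key] = offset                 # index points at the record
        offset += 8 + length
    return index

device = bytearray()
write_record(device, b"user:1", b"Alice")
write_record(device, b"user:2", b"Bob")
index = rebuild_index(device)
print(sorted(index))
```

Because the scan is purely sequential and each region is independent, real recovery parallelizes across threads and runs at near device bandwidth, which is why time scales with data size.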
4.4 Strong Consistency Mode
Aerospike Consistency Modes: ───────────────────────────────────────────────────────────────── AVAILABILITY MODE (AP - Default): ┌─────────────────────────────────────────────────────────────────┐ │ - Last-write-wins conflict resolution │ │ - Read from any replica │ │ - Higher availability during partitions │ │ - Possible stale reads after network partition │ │ │ │ Use: Caching, user profiles, analytics │ └─────────────────────────────────────────────────────────────────┘ STRONG CONSISTENCY MODE (CP): ┌─────────────────────────────────────────────────────────────────┐ │ strong-consistency true │ │ │ │ - Linearizable reads and writes │ │ - Roster-based quorum │ │ - Reads reflect most recent write │ │ - May reject operations during partition │ │ │ │ Use: Financial transactions, inventory, counters │ │ │ │ Mechanism: │ │ - Regime numbers track partition ownership │ │ - Writes require quorum acknowledgment │ │ - Reads go to partition master │ │ - No split-brain possible │ └─────────────────────────────────────────────────────────────────┘
Chapter 5: Scaling Aerospike
5.1 Automatic Data Distribution
Adding a Node to Cluster: ───────────────────────────────────────────────────────────────── BEFORE (3 Nodes): ┌─────────────────────────────────────────────────────────────────┐ │ Node 1: P0, P3, P6, P9, ... P4093 (~1365 partitions) │ │ Node 2: P1, P4, P7, P10, ... P4094 (~1365 partitions) │ │ Node 3: P2, P5, P8, P11, ... P4095 (~1365 partitions) │ └─────────────────────────────────────────────────────────────────┘ ADD NODE 4: ┌─────────────────────────────────────────────────────────────────┐ │ 1. Node 4 joins cluster (heartbeat discovery) │ │ 2. Cluster recalculates partition ownership │ │ 3. Migration begins: │ │ - Node 1 migrates ~341 partitions to Node 4 │ │ - Node 2 migrates ~341 partitions to Node 4 │ │ - Node 3 migrates ~341 partitions to Node 4 │ │ 4. Migration happens in background │ │ 5. No downtime, data remains available during migration │ └─────────────────────────────────────────────────────────────────┘ AFTER (4 Nodes): ┌─────────────────────────────────────────────────────────────────┐ │ Node 1: ~1024 partitions │ │ Node 2: ~1024 partitions │ │ Node 3: ~1024 partitions │ │ Node 4: ~1024 partitions │ └─────────────────────────────────────────────────────────────────┘ Migration Speed: - Configurable throughput limits - Background process (doesn't impact foreground latency) - Typically completes in minutes to hours depending on data size
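The migration arithmetic above follows directly from even partition distribution, and is worth checking: 4096 partitions over 3 nodes is ~1365 each; over 4 nodes, exactly 1024 each, so ~341 partitions move off each old node.

```python
# Partition counts before/after adding a node to a 3-node cluster.
N_PARTITIONS = 4096

def partitions_per_node(n_nodes: int) -> list:
    base, extra = divmod(N_PARTITIONS, n_nodes)
    return [base + (1 if i < extra else 0) for i in range(n_nodes)]

before = partitions_per_node(3)   # [1366, 1365, 1365]
after = partitions_per_node(4)    # [1024, 1024, 1024, 1024]

# Partitions the three original nodes hand off to the new node
moved_to_new_node = sum(b - a for b, a in zip(before, after))
print(before, after, moved_to_new_node)
```

Migration moves whole partitions, not individual records, so the rebalance plan is just this small difference of two lists; the data copy itself is the only expensive part.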
5.2 Rack-Aware Replication
Rack Awareness: ───────────────────────────────────────────────────────────────── ┌─────────────────────────────────────────────────────────────────┐ │ DATA CENTER │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ RACK 1 RACK 2 RACK 3 │ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────┐ │ │ │ Node 1 │ │ Node 3 │ │ Node 5 │ │ │ │ P0 (Master) │ │ P0 (Replica) │ │ │ │ │ │ P1 (Replica) │ │ P1 (Master) │ │ │ │ │ └─────────────────┘ └─────────────────┘ └─────────┘ │ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────┐ │ │ │ Node 2 │ │ Node 4 │ │ Node 6 │ │ │ │ │ │ │ │ │ │ │ └─────────────────┘ └─────────────────┘ └─────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ Configuration: rack-id 1 # On nodes in Rack 1 Benefit: - Replicas placed on different racks - Survive entire rack failure - No data loss if one rack goes down
5.3 Cross-Datacenter Replication (XDR)
XDR Architecture: ───────────────────────────────────────────────────────────────── ┌─────────────────────────────────────────────────────────────────┐ │ │ │ DATACENTER 1 (US-WEST) DATACENTER 2 (US-EAST) │ │ ┌───────────────────┐ ┌───────────────────┐ │ │ │ │ │ │ │ │ │ ┌─────┐ ┌─────┐ │ XDR │ ┌─────┐ ┌─────┐ │ │ │ │ │Node1│ │Node2│ │◄───────►│ │Node1│ │Node2│ │ │ │ │ └─────┘ └─────┘ │ │ └─────┘ └─────┘ │ │ │ │ ┌─────┐ ┌─────┐ │ │ ┌─────┐ ┌─────┐ │ │ │ │ │Node3│ │Node4│ │ │ │Node3│ │Node4│ │ │ │ │ └─────┘ └─────┘ │ │ └─────┘ └─────┘ │ │ │ │ │ │ │ │ │ └───────────────────┘ └───────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ XDR Features: - Asynchronous replication (doesn't impact write latency) - Compression over WAN - Conflict resolution (timestamp or generation) - Selective replication (specific sets or namespaces) - Active-active supported - Topology: star, ring, mesh Use Cases: - Disaster recovery - Geographic locality (serve from nearest DC) - Active-active writes in multiple regions
Chapter 6: When to Choose Aerospike Over Redis
Decision Framework
Choose REDIS When: ───────────────────────────────────────────────────────────────── ✓ Dataset fits comfortably in RAM (<100GB per node) ✓ Pure caching (can reload from source) ✓ Need rich data structures (sorted sets, streams, pub/sub) ✓ Simple operational model preferred ✓ Existing Redis expertise on team ✓ Cost of RAM is acceptable Examples: - Session cache for web app - Rate limiting - Real-time leaderboards - Pub/sub messaging - Small-medium caching layer Choose AEROSPIKE When: ───────────────────────────────────────────────────────────────── ✓ Dataset exceeds reasonable RAM budget (500GB+) ✓ Need predictable latency at scale ✓ Can't afford cold start (need persistence) ✓ Write-heavy workloads with durability ✓ Cost optimization critical (SSD vs RAM) ✓ TB+ scale with consistent performance Examples: - Ad-tech user profiles (billions of keys) - Fraud detection (massive pattern database) - Fintech session/transaction data - Real-time recommendation features - Large-scale feature stores (ML)
Real-World Scenario Comparison
SCENARIO: User Profile Store for Ad Platform ───────────────────────────────────────────────────────────────── Requirements: - 2 billion user profiles - Average record size: 1KB - Total data: ~2TB - Read latency: <5ms p99 - Read QPS: 500K - Write QPS: 50K - 99.99% availability REDIS SOLUTION: ┌─────────────────────────────────────────────────────────────────┐ │ 20-node cluster (100GB RAM each) │ │ Cost: ~$50,000/month (cloud instances) │ │ │ │ Challenges: │ │ - 20 nodes to manage │ │ - Cluster resharding complexity │ │ - Cold start takes hours (reload from DB) │ │ - Memory pressure affects latency │ │ - Failover: ~30 seconds per shard │ └─────────────────────────────────────────────────────────────────┘ AEROSPIKE SOLUTION: ┌─────────────────────────────────────────────────────────────────┐ │ 4-node cluster: │ │ - 128GB RAM (for index: 2B records × 64B = 128GB) │ │ - 2TB NVMe SSD each │ │ - RF=2 for redundancy │ │ │ │ Cost: ~$15,000/month (cloud instances) │ │ │ │ Benefits: │ │ - 4 nodes vs 20 (simpler) │ │ - Cold start: 5-10 minutes │ │ - Consistent latency under load │ │ - 70% cost reduction │ │ - Data durable on SSD │ └─────────────────────────────────────────────────────────────────┘
Chapter 7: Production Considerations
7.1 Capacity Planning
Aerospike Capacity Planning: ───────────────────────────────────────────────────────────────── RAM REQUIREMENTS: ┌─────────────────────────────────────────────────────────────────┐ │ Primary Index: │ │ └── 64 bytes × number_of_records │ │ │ │ Secondary Index (if used): │ │ └── ~50 bytes × number_of_indexed_values │ │ │ │ Example: │ │ 1 billion records = 64GB primary index │ │ + Secondary index on email = ~50GB │ │ + Overhead (~20%) = ~23GB │ │ Total RAM: ~137GB │ │ │ │ Use: namespace config memory-size 140G │ └─────────────────────────────────────────────────────────────────┘ SSD REQUIREMENTS: ┌─────────────────────────────────────────────────────────────────┐ │ Raw data size × defrag overhead (1.5x) × replication factor │ │ │ │ Example: │ │ 1 billion records × 1KB average = 1TB raw │ │ × 1.5 defrag = 1.5TB │ │ × RF 2 = 3TB total (1.5TB per node for 2-node) │ │ │ │ SSD Endurance: │ │ - Minimum 1 DWPD (Drive Writes Per Day) │ │ - Enterprise NVMe recommended │ └─────────────────────────────────────────────────────────────────┘
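The sizing arithmetic above condenses into a small helper. The 64-byte index entry, 1.5x defrag headroom, and ~20% overhead are the planning rules of thumb from the text, treat them as estimates to validate against your own workload, not exact figures.

```python
# Back-of-envelope Aerospike capacity planner (rules of thumb from the text).
INDEX_BYTES_PER_RECORD = 64

def index_ram_gb(records: int, overhead: float = 0.2) -> float:
    """Primary index RAM: 64 bytes per record plus ~20% overhead."""
    return records * INDEX_BYTES_PER_RECORD * (1 + overhead) / 1e9

def ssd_gb(records: int, avg_record_bytes: int,
           replication_factor: int = 2, defrag_headroom: float = 1.5) -> float:
    """Cluster-wide SSD: raw data x defrag headroom x replication factor."""
    return records * avg_record_bytes * defrag_headroom * replication_factor / 1e9

# 1 billion records, ~1KB each, RF=2:
print(index_ram_gb(1_000_000_000))        # ~76.8 GB of RAM for the index
print(ssd_gb(1_000_000_000, 1024))        # 3072.0 GB of SSD across the cluster
```

Run the same numbers before ordering hardware: the index RAM requirement scales with record count only, while SSD scales with record count times record size, which is why billions of small records are index-bound, not capacity-bound.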
7.2 Monitoring Essentials
Critical Aerospike Metrics: ───────────────────────────────────────────────────────────────── MEMORY: - system_free_mem_pct (keep > 10%) - memory_used_bytes vs memory-size - index memory usage STORAGE: - device_available_pct (keep > 20%) - device_used_bytes - defrag_q (should be low) LATENCY: - read_latency (histogram) - write_latency (histogram) - proxy_latency (cross-node) THROUGHPUT: - client_read_success - client_write_success - client_read_error - client_write_error CLUSTER: - cluster_size (matches expected) - cluster_key (stable after convergence) - migrate_rx_partitions_active - migrate_tx_partitions_active REPLICATION: - xdr_ship_outstanding - xdr_ship_success - xdr_ship_bytes
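A minimal alerting rule over these metrics might look like the sketch below. The thresholds mirror the guidance above (free memory > 10%, device free > 20%); the metric names are from the list, but how you scrape them (asinfo, an exporter) is up to your monitoring stack.

```python
# Toy health check over scraped Aerospike metrics (hypothetical pipeline).
THRESHOLDS = {
    "system_free_mem_pct": 10,    # alert when free RAM drops below 10%
    "device_available_pct": 20,   # alert when free device space drops below 20%
}

def check_health(metrics: dict) -> list:
    """Return one alert string per metric breaching its floor."""
    alerts = []
    for name, floor in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value < floor:
            alerts.append(f"{name}={value} below {floor}")
    return alerts

print(check_health({"system_free_mem_pct": 25, "device_available_pct": 15}))
# ['device_available_pct=15 below 20']
```

Both thresholds matter because breaching either one degrades the cluster differently: low index memory causes write failures, while low device space stalls defragmentation.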
7.3 Common Pitfalls
PITFALL 1: Undersized Index Memory ───────────────────────────────────────────────────────────────── Problem: More records than index memory can hold Result: Write failures, cluster instability Solution: Calculate 64 bytes × expected records + 30% buffer PITFALL 2: Non-Enterprise SSDs ───────────────────────────────────────────────────────────────── Problem: Consumer SSDs have low endurance Result: Drive failure in months Solution: Use enterprise NVMe (Intel Optane, Samsung PM1733) PITFALL 3: Ignoring Defrag ───────────────────────────────────────────────────────────────── Problem: defrag_q growing, latency increasing Result: Write performance degradation Solution: Tune defrag-sleep, ensure 30%+ free space PITFALL 4: Hot Keys ───────────────────────────────────────────────────────────────── Problem: One key receiving 50% of traffic Result: Single node bottleneck Solution: Application-side caching, key sharding PITFALL 5: Network Partition Without SC ───────────────────────────────────────────────────────────────── Problem: Split-brain in AP mode Result: Divergent data between partitions Solution: Enable strong-consistency for critical data
7.4 Configuration Recommendations
Production aerospike.conf: ───────────────────────────────────────────────────────────────── service { proto-fd-max 15000 # Max client connections transaction-queues 8 # Match CPU cores transaction-threads-per-queue 4 } network { heartbeat { mode mesh address any port 3002 interval 150 timeout 10 } } namespace mydata { replication-factor 2 memory-size 64G default-ttl 0 # No default expiration storage-engine device { device /dev/nvme0n1 device /dev/nvme1n1 write-block-size 128K defrag-lwm-pct 50 # Start defrag at 50% full defrag-sleep 1000 # Microseconds between defrag } # Optional: Strong consistency # strong-consistency true } # XDR for multi-datacenter xdr { dc DC2 { node-address-port 10.0.1.1 3000 node-address-port 10.0.1.2 3000 namespace mydata { } } }
Chapter 8: Comparison Summary
┌───────────────────┬─────────────────────┬─────────────────────┐ │ Feature │ Redis │ Aerospike │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Data Storage │ RAM only │ RAM or SSD │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Memory Model │ All data in RAM │ Index RAM, Data SSD │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Threading │ Single-threaded │ Multi-threaded │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Latency (small) │ ~100μs │ ~150μs │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Latency (large) │ Degrades with scale │ Consistent │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Max Dataset │ RAM limited │ TB+ per node │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Persistence │ RDB/AOF (optional) │ Native (always) │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Scaling │ Cluster (16K slots) │ Auto-partition │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Consistency │ Eventual │ Eventual or Strong │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Data Structures │ Rich (sets, lists) │ Basic (bins) │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Cost at Scale │ High (RAM) │ Lower (SSD) │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Operational │ Simpler │ More complex │ │ Complexity │ │ │ ├───────────────────┼─────────────────────┼─────────────────────┤ │ Best Use Case │ Cache, sessions, │ Large datasets, │ │ │ pub/sub, small data │ ad-tech, fintech │ └───────────────────┴─────────────────────┴─────────────────────┘
Conclusion
Aerospike and Redis solve different problems, though they overlap in use cases.
Redis is the right choice when:
- Simplicity and rich data structures matter
- Dataset fits in RAM
- Pure caching with reload capability
- Team has Redis expertise
Aerospike is the right choice when:
- Scale exceeds RAM economics
- Predictable latency at TB scale is critical
- Data must survive restarts (not just cache)
- Cost optimization is important
The key insight about Aerospike's performance isn't just "it uses SSDs well"—it's the architectural decision to keep indexes in RAM while leveraging SSDs for their random-read performance. This hybrid approach gives you memory-like lookup speeds with disk-like capacity.
Neither database is universally "better." Understanding their architectures helps you make the right choice for your specific requirements.
Both Redis and Aerospike continue to evolve. Redis 6+ added multi-threaded I/O; Aerospike continues optimizing for newer storage technologies. Always validate performance claims against your actual workload.