Caching Strategies: From Local Memory to Distributed Systems
A comprehensive guide to implementing caching strategies across multiple tiers, from in-memory application caches to distributed Redis clusters and CDN edge caching. Learn when to use cache-aside vs write-through patterns, how to choose between ElastiCache and MemoryDB, and how to prevent cache stampede in production.
Caching seems straightforward until you're staring at a 15% hit rate wondering why your expensive Redis cluster isn't helping. Or worse, watching your database buckle under load when a popular cache key expires and 5,000 simultaneous requests rush in to regenerate it.
I've learned that effective caching isn't about adding Redis and calling it done. It's about understanding the complete hierarchy from in-memory application caches to distributed systems to CDN edge caching, and knowing which pattern solves which problem.
This guide covers the technical decisions I've encountered: when cache-aside makes sense vs write-through, how to choose between AWS ElastiCache and MemoryDB (hint: they're not interchangeable), implementing consistent hashing for distributed cache scaling, and preventing cache stampede before it takes down your database.
Understanding Cache Patterns
Cache patterns aren't just academic concepts. The difference between cache-aside and write-through can determine whether you get stale data complaints or slow write performance. Here's what each pattern actually does in production.
Cache-Aside (Lazy Loading)
The application manages both cache and database directly. On read, check cache first. On miss, fetch from database and populate cache. This is the most common pattern because it's simple and efficient.
When to use cache-aside:
- Read-heavy workloads where not all data is accessed frequently
- Data that can tolerate slight staleness
- You want to cache only what's actually used
Trade-offs:
- Initial request experiences cache miss latency
- Risk of cache stampede on popular expired keys (we'll fix this)
- Efficient memory usage since only accessed data is cached
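The read path above can be sketched as follows (a synchronous, Map-backed store stands in for Redis and the database; all names are illustrative):

```typescript
// Cache-aside sketch: check the cache first, fall back to the
// database on a miss, then populate the cache for later reads.
// Map-backed stores stand in for a real Redis client and database.
const cache = new Map<string, string>();
const database = new Map<string, string>([["user:1", "Ada"]]);

function getUser(id: string): string | undefined {
  const key = `user:${id}`;
  const cached = cache.get(key);        // 1. check cache
  if (cached !== undefined) return cached;
  const value = database.get(key);      // 2. miss: read from database
  if (value !== undefined) cache.set(key, value); // 3. populate cache
  return value;
}
```

Note that only keys actually requested ever enter the cache, which is where the efficient memory usage comes from.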
Write-Through Pattern
Every write goes to both cache and database. The cache stays synchronized with the database, and readers always get fresh data from cache.
When to use write-through:
- Strong consistency requirements between cache and database
- Write operations are frequent
- Read-heavy workloads benefit from always-fresh cache
Trade-offs:
- Write latency increases (must update both cache and database)
- Caches data that may never be read
- Higher cache hit rates since cache is always populated
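As a sketch (again with Map stand-ins for the real stores), write-through looks like this:

```typescript
// Write-through sketch: every write updates database and cache
// together, so readers always see fresh data from the cache.
// Map-backed stores are stand-ins for real clients.
const cache = new Map<string, string>();
const database = new Map<string, string>();

function putUser(id: string, name: string): void {
  const key = `user:${id}`;
  database.set(key, name); // write the database first...
  cache.set(key, name);    // ...then keep the cache synchronized
}

function getUser(id: string): string | undefined {
  return cache.get(`user:${id}`); // readers hit the always-fresh cache
}
```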
Write-Behind (Write-Back) Pattern
Writes go to cache immediately, then are asynchronously written to database. This provides excellent write performance but introduces complexity and potential data loss risk.
When to use write-behind:
- Write-heavy workloads (analytics, logs, metrics)
- Can tolerate potential data loss on cache failure
- Database write performance is a bottleneck
Trade-offs:
- Risk of data loss if cache fails before persistence
- More complex implementation and monitoring
- Excellent write performance through batching
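A minimal sketch of the write-behind flow, using an in-memory queue in place of a real async flush worker:

```typescript
// Write-behind sketch: writes land in the cache plus a pending queue;
// a flush step later batches them into the database. If the cache is
// lost before flush() runs, queued writes are lost -- the durability
// trade-off described above. Maps stand in for real stores.
const cache = new Map<string, string>();
const database = new Map<string, string>();
const pending: Array<[string, string]> = [];

function put(key: string, value: string): void {
  cache.set(key, value);      // fast path: cache only
  pending.push([key, value]); // remember for async persistence
}

function flush(): number {
  const batch = pending.splice(0, pending.length);
  for (const [k, v] of batch) database.set(k, v); // one batched DB write
  return batch.length;
}
```

In production the flush would run on a timer or queue consumer, with monitoring on the pending backlog.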
Preventing Cache Stampede
Cache stampede (thundering herd) happens when a popular cache key expires and hundreds or thousands of requests simultaneously try to regenerate it. Your database connection pool gets exhausted and everything cascades.
Here's how to prevent it:
Probabilistic Early Expiration
Instead of waiting for cache to expire, refresh it probabilistically before expiration based on remaining TTL. This spreads out the refresh load.
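One common formulation is the "XFetch" check: refresh when the current time, pushed forward by a random multiple of the recompute cost, passes the expiry. A sketch (the `rand` parameter is injectable purely so the behavior is testable):

```typescript
// Probabilistic early expiration sketch: each reader may refresh the
// key *before* its TTL ends, with probability rising as expiry
// approaches, so refreshes spread out instead of stampeding.
function shouldRefresh(
  nowMs: number,
  expiresAtMs: number,
  recomputeCostMs: number, // how long a refresh typically takes
  beta = 1.0,              // >1 refreshes earlier, <1 later
  rand = Math.random(),    // injectable for deterministic testing
): boolean {
  // ln(rand) is negative, so this pushes "now" forward by a random
  // amount proportional to the recompute cost.
  return nowMs - recomputeCostMs * beta * Math.log(rand) >= expiresAtMs;
}
```

Callers that get `true` regenerate the value and reset the TTL; everyone else keeps serving the still-valid cached copy.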
Distributed Locking
When cache misses, use Redis to coordinate who regenerates the data. Other requests wait briefly and retry.
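With real Redis you would acquire the lock with `SET lockKey token NX PX <ttl>` and release it with a compare-and-delete (typically a Lua script). The sketch below models those semantics with an in-memory Map so the logic is visible:

```typescript
// Lock sketch modeling Redis `SET key token NX`: only one caller
// acquires the lock and regenerates the value; the others wait
// briefly and re-check the cache. Map stands in for Redis, and the
// PX auto-expiry that guards against crashed holders is omitted.
const locks = new Map<string, string>();

function acquireLock(key: string, token: string): boolean {
  if (locks.has(key)) return false; // NX: fail if the lock exists
  locks.set(key, token);
  return true;
}

function releaseLock(key: string, token: string): void {
  // Only the holder may release (compare-and-delete semantics).
  if (locks.get(key) === token) locks.delete(key);
}
```

The unique token matters: without it, a slow holder whose lock expired could delete a lock now owned by someone else.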
Request Coalescing
Deduplicate identical in-flight requests at the application level. If 100 requests come in for the same cache key, only one actually fetches data.
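A sketch of coalescing with a shared-promise map (`fakeFetch` is a stand-in for the real backend call):

```typescript
// Request-coalescing sketch: identical in-flight requests share one
// promise, so N concurrent callers trigger a single backend fetch.
const inFlight = new Map<string, Promise<string>>();
let fetchCount = 0; // instrumentation for the example only

function loadUser(id: string): Promise<string> {
  const key = `user:${id}`;
  const existing = inFlight.get(key);
  if (existing) return existing; // piggyback on the in-flight fetch
  const p = fakeFetch(id).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

function fakeFetch(id: string): Promise<string> {
  fetchCount++; // each call here represents one real backend hit
  return Promise.resolve(`user-${id}`);
}
```

The `finally` cleanup is important: once the fetch settles, the entry is removed so later requests get fresh data rather than a stale promise.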
AWS Caching Services: When to Use What
AWS offers ElastiCache, MemoryDB, and DAX. They're not interchangeable - each serves different use cases.
ElastiCache for Redis
Best for:
- Session management across multiple application servers
- General-purpose caching layer (cache-aside pattern)
- Pub/sub messaging patterns
- Leaderboards, rate limiting, real-time analytics
Technical specs:
- Latency: Sub-millisecond
- Persistence: Optional snapshots (not real-time)
- Consistency: Eventual
- Pricing: ~$150/month per node
MemoryDB for Redis
Best for:
- Primary database for microservices (not just cache)
- Real-time analytics requiring durability
- Mission-critical applications needing Redis speed + ACID guarantees
- Financial transactions, inventory management
Technical specs:
- Latency: Sub-millisecond reads, single-digit millisecond writes
- Persistence: Full durable persistence via transaction log
- Consistency: Strong (synchronous replication)
- Multi-AZ: Automatic failover with zero data loss
- Pricing: ~$293/month (~2x ElastiCache)
When to choose MemoryDB over ElastiCache:
- Need Redis as primary database (not just cache)
- Cannot tolerate any data loss
- Require strong consistency guarantees
- Want to eliminate separate database + cache architecture
DynamoDB Accelerator (DAX)
Best for:
- DynamoDB-specific acceleration only
- Read-heavy DynamoDB workloads (gaming leaderboards)
- Eventually consistent reads acceptable
- Need microsecond latency at scale
Technical specs:
- Latency: Microseconds for cached reads
- Integration: Native DynamoDB API compatibility
- Consistency: Eventually consistent reads only
- Pricing: ~$0.40/hour for dax.r4.large
Important limitations:
- Only works with DynamoDB (not general-purpose)
- Query/scan cache separate from get/batch-get cache
- No strongly consistent read support
- Cannot cache conditional updates
Decision Matrix

| | ElastiCache for Redis | MemoryDB for Redis | DAX |
|---|---|---|---|
| Primary role | General-purpose cache | Durable Redis primary database | DynamoDB read accelerator |
| Read latency | Sub-millisecond | Sub-millisecond | Microseconds (cached) |
| Write latency | Sub-millisecond | Single-digit milliseconds | Write-through to DynamoDB |
| Durability | Optional snapshots | Transaction log, zero data loss | DynamoDB is the source of truth |
| Consistency | Eventual | Strong | Eventually consistent reads only |
| Approx. cost | ~$150/month per node | ~$293/month per node | ~$0.40/hour per node |
Consistent Hashing for Distributed Caches
When you have multiple cache nodes, how do you decide which node stores which key? Simple modulo hashing (hash(key) % N) remaps almost everything when N changes:
- Add a server: most keys (roughly N/(N+1)) move, e.g. going from 3 to 4 nodes remaps ~75% of keys
- Remove a server: similarly, the large majority of keys move
Consistent hashing reduces redistribution to roughly 1/N of keys per node added or removed.
Implementation
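A self-contained sketch of a hash ring with virtual nodes. FNV-1a stands in for a production hash (murmur3 or xxHash would be typical); node names and the vnode count are illustrative:

```typescript
// Consistent-hash ring with virtual nodes. Each physical node is
// placed on the ring many times; a key maps to the first vnode
// clockwise from its hash.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  private ring: Array<[number, string]> = []; // sorted [position, node]

  constructor(nodes: string[], private vnodes = 150) {
    for (const n of nodes) this.add(n);
  }

  add(node: string): void {
    for (let v = 0; v < this.vnodes; v++) {
      this.ring.push([fnv1a(`${node}#${v}`), node]);
    }
    this.ring.sort((a, b) => a[0] - b[0]);
  }

  getNode(key: string): string {
    const h = fnv1a(key);
    // first ring position clockwise from the key (wrap to the start)
    const hit = this.ring.find(([pos]) => pos >= h) ?? this.ring[0];
    return hit[1];
  }
}
```

Adding a node only claims the ring segments immediately before its vnodes, which is why only ~1/N of keys move.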
Why Virtual Nodes Matter
Without virtual nodes, simple consistent hashing can create uneven distribution. Virtual nodes (vnodes) solve this:
- Each physical node gets 100-200 virtual nodes scattered on the ring
- More uniform data distribution
- Smoother load balancing when adding/removing nodes
- Can weight servers by capacity (more vnodes = more data)
Multi-Tier Caching Architecture
Real performance comes from layering caches strategically. Here's a practical three-tier architecture:
L1: In-Process Memory Cache
- Size: 50-100 MB per instance
- TTL: 30-60 seconds
- Purpose: Ultra-fast access for hot data
- Technology: LRU cache
L2: Distributed Redis Cache
- Size: 10-100 GB cluster
- TTL: 5-60 minutes
- Purpose: Shared cache across instances
- Technology: ElastiCache Redis cluster
L3: CDN Edge Cache
- Size: Unlimited (CloudFront)
- TTL: 1 hour - 1 year
- Purpose: Global edge distribution
- Technology: CloudFront
Implementation
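The read path through the tiers can be sketched like this (Map-backed stores stand in for the real in-process cache, Redis cluster, and origin; CDN handling happens before the request reaches this code):

```typescript
// Multi-tier read sketch: check L1 (in-process), then L2 (shared
// Redis stand-in), then the origin; backfill the upper tiers on the
// way out so subsequent reads get faster.
const l1 = new Map<string, string>();      // in-process, short TTLs
const l2 = new Map<string, string>();      // shared Redis cluster
const origin = new Map<string, string>([["page:home", "<html>home</html>"]]);

function get(key: string): string | undefined {
  let v = l1.get(key);
  if (v !== undefined) return v;               // L1 hit: fastest path
  v = l2.get(key);
  if (v !== undefined) {
    l1.set(key, v);                            // backfill L1
    return v;
  }
  v = origin.get(key);
  if (v !== undefined) {
    l2.set(key, v);                            // backfill both tiers
    l1.set(key, v);
  }
  return v;
}
```

In a real implementation each tier would also carry its own TTL (30-60s for L1, minutes for L2, per the table above).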
CloudFront Caching Strategies
CDN caching is different from application caching. You're distributing content globally with long TTLs, which means invalidation strategy matters.
Cache Behavior Configuration
Different content types need different cache policies:
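For example, a per-path policy table might look like the following. The values are illustrative choices, and the field names are simplified for the sketch rather than the exact CloudFront API shape:

```typescript
// Illustrative per-path cache policies: static assets cache long,
// images medium, APIs short or not at all. TTLs are in seconds.
const cacheBehaviors = [
  { pathPattern: "/static/*", minTtl: 86_400, defaultTtl: 31_536_000, compress: true },
  { pathPattern: "/images/*", minTtl: 3_600, defaultTtl: 86_400, compress: true },
  { pathPattern: "/api/*", minTtl: 0, defaultTtl: 0, compress: false },
];
```

Long static-asset TTLs are safe only when combined with versioned URLs, covered next.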
Invalidation Strategy
CloudFront invalidation costs add up ($0.005 per path after first 1,000/month). Use versioned URLs instead:
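A minimal sketch of the versioned-URL approach (the path layout and hash are illustrative; bundlers like webpack or Vite generate these names for you):

```typescript
// Versioned-URL sketch: embed a content hash in the filename so each
// deploy produces a brand-new URL. Old URLs keep serving old content
// until their TTL lapses; nothing ever needs invalidating.
function versionedAssetUrl(name: string, contentHash: string): string {
  const dot = name.lastIndexOf(".");
  return `/static/${name.slice(0, dot)}.${contentHash}${name.slice(dot)}`;
}
```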
Client-Side Caching with React Query
Frontend caching is often overlooked but critical for user experience. React Query (TanStack Query) provides sophisticated client-side caching with stale-while-revalidate pattern.
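As a sketch, the client-wide defaults are a plain options object you would pass to `new QueryClient(...)`. The values here are example choices, not recommendations, and `gcTime` is the v5 name for what v4 called `cacheTime`:

```typescript
// Illustrative TanStack Query defaults implementing
// stale-while-revalidate: serve cached data instantly, refetch in
// the background once it goes stale.
const queryClientOptions = {
  defaultOptions: {
    queries: {
      staleTime: 30_000,          // data counts as fresh for 30s: no refetch
      gcTime: 5 * 60_000,         // unused cache entries kept for 5 min
      refetchOnWindowFocus: true, // revalidate stale data on tab focus
      retry: 2,                   // retry failed fetches twice
    },
  },
};
```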
Prefetching for Better UX
Prefetch data before users need it for instant navigation:
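A prefetch-on-hover sketch. The minimal interface below mirrors TanStack Query's `queryClient.prefetchQuery`; `fetchProduct` and the handler name are illustrative stubs:

```typescript
// Prefetch sketch: warm the query cache when the user hovers a link,
// so the subsequent navigation renders from cache instantly.
interface PrefetchClient {
  prefetchQuery(opts: {
    queryKey: unknown[];
    queryFn: () => Promise<unknown>;
    staleTime?: number;
  }): Promise<void>;
}

async function fetchProduct(id: string): Promise<{ id: string }> {
  return { id }; // stand-in for a real API call
}

async function onProductHover(client: PrefetchClient, productId: string) {
  await client.prefetchQuery({
    queryKey: ["product", productId], // same key the detail page uses
    queryFn: () => fetchProduct(productId),
    staleTime: 60_000, // skip the prefetch if cached data is still fresh
  });
}
```

The key detail is that the prefetch and the page component must use the same `queryKey`, or the warmed entry is never found.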
Cache Monitoring and Optimization
You can't optimize what you don't measure. Here are the critical metrics:
Key Metrics
1. Hit Rate
Target: 85-95% depending on workload
- Below 80%: Investigate cache key design, TTL settings
- Formula: `(hits / (hits + misses)) * 100`
2. Latency Percentiles
- P50: ~1-2ms for Redis
- P99: Should be <10ms
- P99.9: Alert if >50ms
3. Memory Utilization
- Target: 70-80% usage
- Alert: >90% (risk of evictions)
4. Eviction Rate
- High eviction = need more memory or shorter TTLs
Monitoring Implementation
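A sketch of turning raw stats into the metrics and alerts above. The field names follow Redis's `INFO` output (`keyspace_hits`, `evicted_keys`, etc.); the thresholds are the ones from the preceding section:

```typescript
// Monitoring sketch: derive hit rate, memory utilization, and alert
// conditions from a Redis INFO-style stats snapshot.
interface CacheStats {
  keyspace_hits: number;
  keyspace_misses: number;
  evicted_keys: number;
  used_memory: number; // bytes
  maxmemory: number;   // bytes; 0 means unlimited
}

function summarize(s: CacheStats) {
  const total = s.keyspace_hits + s.keyspace_misses;
  const hitRate = total === 0 ? 0 : s.keyspace_hits / total;
  const memUtil = s.maxmemory === 0 ? 0 : s.used_memory / s.maxmemory;
  return {
    hitRatePct: hitRate * 100,
    memoryPct: memUtil * 100,
    alerts: [
      ...(total > 0 && hitRate < 0.8 ? ["hit rate below 80%"] : []),
      ...(memUtil > 0.9 ? ["memory above 90%"] : []),
      ...(s.evicted_keys > 0 ? ["evictions occurring"] : []),
    ],
  };
}
```

In production you would feed these numbers to CloudWatch (or your metrics system of choice) on a fixed interval and alert on the thresholds.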
Common Pitfalls and Lessons
1. Over-Caching Dynamic Data
Caching user-specific data with long TTL leads to users seeing stale data and increased support tickets.
Solution: Classify data by volatility:
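One simple way to encode the classification is a volatility-to-TTL map that cache writes consult. The classes and TTL values below are illustrative:

```typescript
// Illustrative volatility classes -> TTLs in seconds. Cache writes
// look up the class for the data being stored instead of hard-coding
// a TTL at every call site.
const ttlByVolatility: Record<string, number> = {
  static: 24 * 3600, // reference data, config: rarely changes
  slow: 3600,        // product catalog: changes a few times a day
  fast: 60,          // prices, inventory counts: minutes at most
  realtime: 0,       // user-specific state: don't cache (or seconds)
};
```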
2. Poor Cache Key Design
Including timestamps or random values in cache keys destroys hit rate.
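The fix is to make key construction deterministic: identical logical requests must always produce the identical key. A sketch (the `_ts` and `nonce` parameter names are illustrative examples of volatile inputs):

```typescript
// Cache-key sketch: sort query parameters and drop volatile ones so
// the same logical request always maps to the same key.
function cacheKey(path: string, params: Record<string, string>): string {
  const stable = Object.keys(params)
    .filter((k) => k !== "_ts" && k !== "nonce") // drop volatile params
    .sort()                                      // order-independent
    .map((k) => `${k}=${params[k]}`)
    .join("&");
  return `${path}?${stable}`;
}
```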
3. Ignoring Cache Failures
Cache failure shouldn't take down your application. Always implement fallback:
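The fallback can be as simple as a wrapper that swallows cache errors and degrades to the origin (shown synchronously with stub accessors for clarity):

```typescript
// Graceful-degradation sketch: a cache error must fall through to
// the origin, never surface to the caller. The accessor parameters
// stand in for real Redis and database clients.
function getWithFallback(
  key: string,
  cacheGet: (k: string) => string | undefined, // may throw if cache is down
  dbGet: (k: string) => string | undefined,
): string | undefined {
  try {
    const hit = cacheGet(key);
    if (hit !== undefined) return hit;
  } catch {
    // cache is down: emit a metric/log here, then degrade to the DB
  }
  return dbGet(key);
}
```

Pair this with a short client-side timeout on cache calls so a slow cache degrades as cleanly as a dead one.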
4. CloudFront Invalidation Abuse
Frequent invalidation racks up costs. Version asset URLs (e.g. `app.3f9c2a.js`) so each deploy produces new paths and nothing needs invalidating.
Cost Optimization
AWS Service Pricing (us-east-1)
ElastiCache Redis (cache.r6g.large: 13.07 GB):
- On-Demand: ~$150/month per node
- 3-node cluster: ~$450/month
MemoryDB (db.r6g.large: 13.07 GB):
- On-Demand: ~$293/month per node
- 3-node cluster: ~$879/month (~2x ElastiCache)
CloudFront:
- First 10 TB/month: $0.085/GB
- HTTP/HTTPS requests: $0.0075 per 10,000
- Invalidation: First 1,000 paths free, $0.005 per path after
Right-Sizing Strategy
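A starting point is to size the cluster from the working set at the target memory utilization above (70-80%), then re-check monthly against actual usage. A sketch with illustrative numbers:

```typescript
// Right-sizing sketch: estimate node count from working-set size at a
// target memory utilization, leaving headroom against evictions.
function nodesNeeded(
  workingSetGb: number,
  nodeGb: number,        // e.g. 13.07 for cache.r6g.large
  targetUtil = 0.75,     // keep usage in the 70-80% band
): number {
  return Math.ceil(workingSetGb / (nodeGb * targetUtil));
}
```

For example, a 25 GB working set on cache.r6g.large nodes at 75% utilization needs 3 nodes; if monitoring later shows utilization drifting well below the band, step the cluster down a node.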
Key Takeaways
Working with caching across multiple projects has taught me these patterns:
1. Cache patterns matter: Cache-aside for read-heavy, write-through for consistency, write-behind for write-heavy. Choose based on your actual workload.
2. Prevent stampede early: Implement distributed locking and request coalescing before you have a problem. It's much harder to add after an incident.
3. AWS services aren't interchangeable: ElastiCache for general caching, MemoryDB when you need durability, DAX only for DynamoDB. Don't overpay for features you don't need.
4. Multi-tier caching works: L1 in-memory + L2 Redis + L3 CDN provides the best performance per cost. Each layer serves a purpose.
5. Monitor continuously: Cache hit rate, latency, memory usage, and cost per request. Right-size monthly based on actual utilization.
6. Design for failure: Cache should improve performance, not become a single point of failure. Always implement graceful degradation.
7. Version URLs, don't invalidate: CloudFront invalidation costs add up. Versioned assets are free and instant.
The difference between a 15% hit rate and 90% hit rate is often just proper cache key design and TTL management. Start with the basics, monitor everything, and optimize based on real metrics.