
Caching Strategies: From Local Memory to Distributed Systems

A comprehensive guide to implementing caching strategies across multiple tiers, from in-memory application caches to distributed Redis clusters and CDN edge caching. Learn when to use cache-aside vs write-through patterns, how to choose between ElastiCache and MemoryDB, and how to prevent cache stampede in production.

Caching seems straightforward until you're staring at a 15% hit rate wondering why your expensive Redis cluster isn't helping. Or worse, watching your database buckle under load when a popular cache key expires and 5,000 simultaneous requests rush in to regenerate it.

I've learned that effective caching isn't about adding Redis and calling it done. It's about understanding the complete hierarchy from in-memory application caches to distributed systems to CDN edge caching, and knowing which pattern solves which problem.

This guide covers the technical decisions I've encountered: when cache-aside makes sense vs write-through, how to choose between AWS ElastiCache and MemoryDB (hint: they're not interchangeable), implementing consistent hashing for distributed cache scaling, and preventing cache stampede before it takes down your database.

Understanding Cache Patterns

Cache patterns aren't just academic concepts. The difference between cache-aside and write-through can determine whether you get stale data complaints or slow write performance. Here's what each pattern actually does in production.

Cache-Aside (Lazy Loading)

The application manages both cache and database directly. On read, check cache first. On miss, fetch from database and populate cache. This is the most common pattern because it's simple and efficient.

```typescript
class UserRepository {
  private redis: Redis;
  private db: Database;

  async getUser(id: string): Promise<User> {
    // Check cache first
    const cached = await this.redis.get(`user:${id}`);
    if (cached) {
      return JSON.parse(cached);
    }

    // Cache miss - fetch from database
    const user = await this.db.users.findById(id);

    // Store in cache with TTL (skip caching missing users)
    if (user) {
      await this.redis.set(
        `user:${id}`,
        JSON.stringify(user),
        'EX',
        3600 // 1 hour
      );
    }

    return user;
  }
}
```

When to use cache-aside:

  • Read-heavy workloads where not all data is accessed frequently
  • Data that can tolerate slight staleness
  • You want to cache only what's actually used

Trade-offs:

  • Initial request experiences cache miss latency
  • Risk of cache stampede on popular expired keys (we'll fix this)
  • Efficient memory usage since only accessed data is cached

Write-Through Pattern

Every write goes to both cache and database. The cache stays synchronized with the database, and readers always get fresh data from cache.

```typescript
class UserRepository {
  async updateUser(id: string, data: Partial<User>): Promise<User> {
    // Update database first
    const user = await this.db.users.update(id, data);

    // Immediately update cache
    await this.redis.set(
      `user:${id}`,
      JSON.stringify(user),
      'EX',
      3600
    );

    return user;
  }

  async getUser(id: string): Promise<User> {
    // Check cache (should always be there for recently updated users)
    const cached = await this.redis.get(`user:${id}`);
    if (cached) {
      return JSON.parse(cached);
    }

    // Fallback to cache-aside for cache miss
    const user = await this.db.users.findById(id);
    await this.redis.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
    return user;
  }
}
```

When to use write-through:

  • Strong consistency requirements between cache and database
  • Write operations are frequent
  • Read-heavy workloads benefit from always-fresh cache

Trade-offs:

  • Write latency increases (must update both cache and database)
  • Caches data that may never be read
  • Higher cache hit rates since cache is always populated

Write-Behind (Write-Back) Pattern

Writes go to cache immediately, then are asynchronously written to database. This provides excellent write performance but introduces complexity and potential data loss risk.

```typescript
class AnalyticsRepository {
  async trackEvent(event: Event): Promise<void> {
    // Write to cache immediately (fast response).
    // RPUSH keeps the queue FIFO: oldest events sit at the head.
    await this.redis.rpush(
      'analytics:queue',
      JSON.stringify(event)
    );

    // Background worker processes queue asynchronously
  }

  // Separate background worker
  async processQueue(): Promise<void> {
    while (true) {
      // Batch process events from the head of the queue
      const events = await this.redis.lrange('analytics:queue', 0, 99);

      if (events.length > 0) {
        // Batch insert to database
        await this.db.analytics.batchInsert(
          events.map(e => JSON.parse(e))
        );

        // Remove exactly the events we processed, so events
        // pushed to the tail in the meantime are never dropped
        await this.redis.ltrim('analytics:queue', events.length, -1);
      }

      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
}
```

When to use write-behind:

  • Write-heavy workloads (analytics, logs, metrics)
  • Can tolerate potential data loss on cache failure
  • Database write performance is a bottleneck

Trade-offs:

  • Risk of data loss if cache fails before persistence
  • More complex implementation and monitoring
  • Excellent write performance through batching

Preventing Cache Stampede

Cache stampede (thundering herd) happens when a popular cache key expires and hundreds or thousands of requests simultaneously try to regenerate it. Your database connection pool gets exhausted and everything cascades.

Here's how to prevent it:

Probabilistic Early Expiration

Instead of waiting for cache to expire, refresh it probabilistically before expiration based on remaining TTL. This spreads out the refresh load.

```typescript
async function getWithProbabilisticRefresh<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttl: number,
  beta: number = 1.0
): Promise<T> {
  const result = await redis.get(key);

  if (result) {
    const data = JSON.parse(result);
    const now = Date.now();
    const timeUntilExpiry = (data.expiresAt - now) / 1000;

    // Probabilistic early refresh:
    // as expiry approaches, probability of refresh increases
    const shouldRefresh =
      timeUntilExpiry / ttl < Math.random() * beta;

    if (shouldRefresh) {
      // Refresh in background without blocking
      // (backgroundRefresh re-runs the fetcher and resets the key)
      backgroundRefresh(key, fetcher, ttl);
    }

    return data.value;
  }

  // Cache miss - use lock to prevent stampede
  return getWithLock(key, fetcher, ttl);
}
```

Distributed Locking

When cache misses, use Redis to coordinate who regenerates the data. Other requests wait briefly and retry.

```typescript
async function getWithLock<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttl: number
): Promise<T> {
  const lockKey = `lock:${key}`;

  // Try to acquire lock (10 second timeout)
  const lockAcquired = await redis.set(
    lockKey,
    '1',
    'NX', // Only set if not exists
    'EX',
    10
  );

  if (lockAcquired) {
    try {
      // We got the lock - fetch data
      const value = await fetcher();

      const data = {
        value,
        expiresAt: Date.now() + ttl * 1000,
      };

      await redis.set(
        key,
        JSON.stringify(data),
        'EX',
        ttl
      );

      return value;
    } finally {
      // Always release lock. Production code should store a unique
      // token and compare-and-delete, so one holder can't free a
      // lock that has already expired and been re-acquired.
      await redis.del(lockKey);
    }
  } else {
    // Another request is fetching - wait and retry
    await new Promise(resolve => setTimeout(resolve, 100));
    return getWithProbabilisticRefresh(key, fetcher, ttl);
  }
}
```

Request Coalescing

Deduplicate identical in-flight requests at the application level. If 100 requests come in for the same cache key, only one actually fetches data.

```typescript
class CacheManager {
  private inflightRequests = new Map<string, Promise<any>>();

  async get<T>(
    key: string,
    fetcher: () => Promise<T>
  ): Promise<T> {
    // Check cache first
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);

    // Check if request is already in flight
    const existing = this.inflightRequests.get(key);
    if (existing) {
      // Piggyback on existing request
      return existing;
    }

    // Create new request
    const promise = fetcher()
      .then(async value => {
        await redis.set(
          key,
          JSON.stringify(value),
          'EX',
          300
        );
        this.inflightRequests.delete(key);
        return value;
      })
      .catch(error => {
        this.inflightRequests.delete(key);
        throw error;
      });

    this.inflightRequests.set(key, promise);
    return promise;
  }
}
```

AWS Caching Services: When to Use What

AWS offers ElastiCache, MemoryDB, and DAX. They're not interchangeable - each serves different use cases.

ElastiCache for Redis

Best for:

  • Session management across multiple application servers
  • General-purpose caching layer (cache-aside pattern)
  • Pub/sub messaging patterns
  • Leaderboards, rate limiting, real-time analytics

Technical specs:

  • Latency: Sub-millisecond
  • Persistence: Optional snapshots (not real-time)
  • Consistency: Eventual
  • Pricing: ~$0.206/hour for cache.r6g.large (13.07 GB) = ~$150/month per node
```typescript
import Redis from 'ioredis';

const redis = new Redis.Cluster(
  [
    {
      host: 'redis-cluster.xxx.cache.amazonaws.com',
      port: 6379,
    },
  ],
  {
    redisOptions: {
      password: process.env.REDIS_PASSWORD,
      tls: {},
    },
    clusterRetryStrategy: times =>
      Math.min(100 * times, 3000),
    enableReadyCheck: true,
    maxRetriesPerRequest: 3,
  }
);
```

MemoryDB for Redis

Best for:

  • Primary database for microservices (not just cache)
  • Real-time analytics requiring durability
  • Mission-critical applications needing Redis speed + ACID guarantees
  • Financial transactions, inventory management

Technical specs:

  • Latency: Sub-millisecond reads, single-digit millisecond writes
  • Persistence: Full durable persistence via transaction log
  • Consistency: Strong (synchronous replication)
  • Multi-AZ: Automatic failover with zero data loss
  • Pricing: ~$0.406/hour for db.r6g.large = ~$293/month per node (~2x ElastiCache)

When to choose MemoryDB over ElastiCache:

  • Need Redis as primary database (not just cache)
  • Cannot tolerate any data loss
  • Require strong consistency guarantees
  • Want to eliminate separate database + cache architecture

DynamoDB Accelerator (DAX)

Best for:

  • DynamoDB-specific acceleration only
  • Read-heavy DynamoDB workloads (gaming leaderboards)
  • Eventually consistent reads acceptable
  • Need microsecond latency at scale

Technical specs:

  • Latency: Microseconds for cached reads
  • Integration: Native DynamoDB API compatibility
  • Consistency: Eventually consistent reads only
  • Pricing: ~$0.40/hour for dax.r4.large

Important limitations:

  • Only works with DynamoDB (not general-purpose)
  • Query/scan cache separate from get/batch-get cache
  • No strongly consistent read support
  • Cannot cache conditional updates
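
Because DAX speaks the DynamoDB API, adopting it is mostly a client swap. A wiring sketch (configuration fragment, not runnable on its own) using the `amazon-dax-client` package with the v2 AWS SDK; the cluster endpoint, table name, and key are placeholders:

```typescript
import AWS from 'aws-sdk';
import AmazonDaxClient from 'amazon-dax-client';

// Point the DocumentClient at the DAX cluster instead of
// DynamoDB directly. Endpoint below is a placeholder.
const dax = new AmazonDaxClient({
  endpoints: ['my-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com:8111'],
  region: 'us-east-1',
});

const daxDoc = new AWS.DynamoDB.DocumentClient({
  service: dax as unknown as AWS.DynamoDB,
});

// Same GetItem call as plain DynamoDB, now served from the
// DAX item cache when warm
async function getPlayer(id: string) {
  return daxDoc
    .get({ TableName: 'Leaderboard', Key: { id } })
    .promise();
}
```

Writes still pass through to DynamoDB synchronously; only reads benefit, which is why DAX fits read-heavy workloads.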

Decision Matrix

  Service           | Latency                             | Durability               | Consistency             | Relative cost
  ElastiCache Redis | Sub-millisecond                     | Optional snapshots       | Eventual                | 1x (~$150/mo per node)
  MemoryDB          | Sub-ms reads, single-digit ms writes | Durable transaction log  | Strong                  | ~2x ElastiCache
  DAX               | Microseconds (cached reads)         | Backed by DynamoDB       | Eventually consistent   | DynamoDB workloads only

Consistent Hashing for Distributed Caches

When you have multiple cache nodes, how do you decide which node stores which key? Simple modulo hashing (hash(key) % N) causes massive redistribution when nodes change:

  • Add a server: most keys (roughly N/(N+1)) map to a new node
  • Remove a server: similarly, most keys are remapped

Consistent hashing minimizes redistribution to ~1/N of keys.
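
To make that concrete, here is a small sketch (toy integer keys standing in for hashed key values) that counts how many keys land on a different node when a fourth node joins a three-node cluster under modulo placement:

```typescript
// Count keys remapped when a 3-node cluster grows to 4 nodes
// under plain modulo placement. Toy integer keys stand in for
// hash(key) values.
const totalKeys = 100_000;
let moved = 0;

for (let key = 0; key < totalKeys; key++) {
  if (key % 3 !== key % 4) moved++;
}

const fractionMoved = moved / totalKeys;
console.log(fractionMoved.toFixed(2)); // ~0.75: most keys move
```

Roughly three quarters of the keys move (N/(N+1) for N=3), whereas a consistent-hash ring would move only about a quarter of them.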

Implementation

```typescript
import crypto from 'crypto';

class ConsistentHash {
  private ring: Map<number, string> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodes: number = 150;

  private hash(key: string): number {
    return parseInt(
      crypto
        .createHash('md5')
        .update(key)
        .digest('hex')
        .substring(0, 8),
      16
    );
  }

  addServer(server: string): void {
    // Create virtual nodes for even distribution
    for (let i = 0; i < this.virtualNodes; i++) {
      const hash = this.hash(`${server}:vnode:${i}`);
      this.ring.set(hash, server);
      this.sortedKeys.push(hash);
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  removeServer(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const hash = this.hash(`${server}:vnode:${i}`);
      this.ring.delete(hash);
      const index = this.sortedKeys.indexOf(hash);
      if (index > -1) {
        this.sortedKeys.splice(index, 1);
      }
    }
  }

  getServer(key: string): string | undefined {
    if (this.sortedKeys.length === 0) return undefined;

    const hash = this.hash(key);

    // Find the next server clockwise on the ring
    // (linear scan here; use binary search for large rings)
    let idx = this.sortedKeys.findIndex(k => k >= hash);
    if (idx === -1) idx = 0; // Wrap around

    const serverHash = this.sortedKeys[idx];
    return this.ring.get(serverHash);
  }
}

// Usage
const hashRing = new ConsistentHash();
hashRing.addServer('cache-node-1');
hashRing.addServer('cache-node-2');
hashRing.addServer('cache-node-3');

const server = hashRing.getServer('user:12345');
// e.g. 'cache-node-2' (which node depends on the hash)
```

Why Virtual Nodes Matter

Without virtual nodes, simple consistent hashing can create uneven distribution. Virtual nodes (vnodes) solve this:

  • Each physical node gets 100-200 virtual nodes scattered on the ring
  • More uniform data distribution
  • Smoother load balancing when adding/removing nodes
  • Can weight servers by capacity (more vnodes = more data)
```typescript
// Weight by capacity (assumes addServer is extended to accept
// an optional per-server vnode count: addServer(name, vnodeCount))
const optimalVnodes = Math.ceil(
  150 * (serverCapacity / averageCapacity)
);

// High-capacity server gets more data
hashRing.addServer('high-capacity', 225); // 1.5x
hashRing.addServer('low-capacity', 75); // 0.5x
```

Multi-Tier Caching Architecture

Real performance comes from layering caches strategically. Here's a practical three-tier architecture:

L1: In-Process Memory Cache

  • Size: 50-100 MB per instance
  • TTL: 30-60 seconds
  • Purpose: Ultra-fast access for hot data
  • Technology: LRU cache

L2: Distributed Redis Cache

  • Size: 10-100 GB cluster
  • TTL: 5-60 minutes
  • Purpose: Shared cache across instances
  • Technology: ElastiCache Redis cluster

L3: CDN Edge Cache

  • Size: Unlimited (CloudFront)
  • TTL: 1 hour - 1 year
  • Purpose: Global edge distribution
  • Technology: CloudFront

Implementation

```typescript
import LRU from 'lru-cache';
import Redis from 'ioredis';

class MultiTierCache {
  private l1Cache: LRU<string, any>;
  private l2Cache: Redis;

  constructor() {
    this.l2Cache = new Redis(process.env.REDIS_URL);
    this.l1Cache = new LRU({
      max: 500, // Max items
      maxSize: 50 * 1024 * 1024, // 50 MB
      sizeCalculation: (value) => {
        return JSON.stringify(value).length;
      },
      ttl: 1000 * 60, // 1 minute
    });
  }

  async get<T>(
    key: string,
    fetcher: () => Promise<T>
  ): Promise<T> {
    // L1: Check in-memory cache
    if (this.l1Cache.has(key)) {
      return this.l1Cache.get(key);
    }

    // L2: Check Redis
    const l2Result = await this.l2Cache.get(key);
    if (l2Result) {
      const value = JSON.parse(l2Result);
      // Populate L1
      this.l1Cache.set(key, value);
      return value;
    }

    // Cache miss - fetch from origin
    const value = await fetcher();

    // Populate all cache layers
    this.l1Cache.set(key, value);
    await this.l2Cache.set(
      key,
      JSON.stringify(value),
      'EX',
      3600
    );

    return value;
  }

  async invalidate(key: string): Promise<void> {
    // Invalidate all tiers
    this.l1Cache.delete(key);
    await this.l2Cache.del(key);
  }
}
```

CloudFront Caching Strategies

CDN caching is different from application caching. You're distributing content globally with long TTLs, which means invalidation strategy matters.

Cache Behavior Configuration

Different content types need different cache policies:

```typescript
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as cdk from 'aws-cdk-lib';

// Inside a cdk.Stack construct ("this" below is the stack)

// Static assets (images, CSS, JS)
const staticBehavior = {
  pathPattern: '/static/*',
  cachePolicy: new cloudfront.CachePolicy(
    this,
    'StaticCachePolicy',
    {
      minTtl: cdk.Duration.seconds(0),
      defaultTtl: cdk.Duration.hours(24),
      maxTtl: cdk.Duration.days(365),
      enableAcceptEncodingGzip: true,
      enableAcceptEncodingBrotli: true,
      queryStringBehavior:
        cloudfront.CacheQueryStringBehavior.none(),
      headerBehavior:
        cloudfront.CacheHeaderBehavior.none(),
      cookieBehavior:
        cloudfront.CacheCookieBehavior.none(),
    }
  ),
};

// API responses (short-lived)
const apiCacheBehavior = {
  pathPattern: '/api/public/*',
  cachePolicy: new cloudfront.CachePolicy(
    this,
    'ApiCachePolicy',
    {
      minTtl: cdk.Duration.seconds(0),
      defaultTtl: cdk.Duration.seconds(60),
      maxTtl: cdk.Duration.minutes(5),
      queryStringBehavior:
        cloudfront.CacheQueryStringBehavior.all(),
      headerBehavior:
        cloudfront.CacheHeaderBehavior.allowList(
          'Authorization'
        ),
    }
  ),
};

// Dynamic content (no cache)
const dynamicBehavior = {
  pathPattern: '/api/user/*',
  cachePolicy: cloudfront.CachePolicy.CACHING_DISABLED,
};
```

Invalidation Strategy

CloudFront invalidation costs add up ($0.005 per path after first 1,000/month). Use versioned URLs instead:
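
A quick back-of-the-envelope helper, using the rates quoted above (verify against current AWS pricing), shows how routine deploys turn into a recurring bill:

```typescript
// Estimated monthly CloudFront invalidation cost:
// first 1,000 paths free, then $0.005 per path.
function invalidationCost(pathsPerMonth: number): number {
  const billable = Math.max(0, pathsPerMonth - 1000);
  return billable * 0.005;
}

// A pipeline invalidating 50 paths per deploy, 4 deploys/day:
console.log(invalidationCost(50 * 4 * 30)); // 6,000 paths ≈ $25/month
```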

```typescript
// Bad: Requires invalidation
// (cloudfront here is an AWS SDK CloudFront client)
const assetUrl = '/static/app.js';
await cloudfront.createInvalidation({
  DistributionId: 'E1234567890',
  InvalidationBatch: {
    CallerReference: Date.now().toString(),
    Paths: {
      Quantity: 1,
      Items: ['/static/app.js'],
    },
  },
});

// Good: Versioned URL (no invalidation needed)
const buildHash = process.env.BUILD_HASH;
const versionedUrl = `/static/app.${buildHash}.js`;
// New version = new URL = automatic cache busting
```

Client-Side Caching with React Query

Frontend caching is often overlooked but critical for user experience. React Query (TanStack Query) provides sophisticated client-side caching with stale-while-revalidate pattern.

```typescript
import {
  useQuery,
  useMutation,
  useQueryClient,
} from '@tanstack/react-query';

function UserProfile({ userId }: { userId: string }) {
  const queryClient = useQueryClient();

  // Query with caching and stale-while-revalidate
  const { data: user, isLoading } = useQuery({
    queryKey: ['user', userId],
    queryFn: () => fetchUser(userId),
    staleTime: 5 * 60 * 1000, // Fresh for 5 minutes
    gcTime: 30 * 60 * 1000, // Keep in cache for 30 minutes
    refetchOnWindowFocus: true,
    refetchOnReconnect: true,
  });

  // Mutation with optimistic updates
  const updateMutation = useMutation({
    mutationFn: (data: Partial<User>) =>
      updateUser(userId, data),

    onMutate: async newData => {
      // Cancel outgoing refetches
      await queryClient.cancelQueries({
        queryKey: ['user', userId],
      });

      // Snapshot previous value
      const previous = queryClient.getQueryData([
        'user',
        userId,
      ]);

      // Optimistically update cache
      queryClient.setQueryData(
        ['user', userId],
        (old: any) => ({
          ...old,
          ...newData,
        })
      );

      return { previous };
    },

    onError: (err, variables, context) => {
      // Rollback on error
      queryClient.setQueryData(
        ['user', userId],
        context?.previous
      );
    },

    onSettled: () => {
      // Refetch after mutation
      queryClient.invalidateQueries({
        queryKey: ['user', userId],
      });
    },
  });

  return (
    <div>
      {isLoading ? 'Loading...' : user?.name}
      <button
        onClick={() =>
          updateMutation.mutate({ name: 'New Name' })
        }
      >
        Update
      </button>
    </div>
  );
}
```

Prefetching for Better UX

Prefetch data before users need it for instant navigation:

```typescript
function UserList() {
  const queryClient = useQueryClient();

  const { data: users } = useQuery({
    queryKey: ['users'],
    queryFn: fetchUsers,
  });

  // Prefetch on hover
  const handleUserHover = (userId: string) => {
    queryClient.prefetchQuery({
      queryKey: ['user', userId],
      queryFn: () => fetchUser(userId),
    });
  };

  return (
    <ul>
      {users?.map(user => (
        <li
          key={user.id}
          onMouseEnter={() => handleUserHover(user.id)}
        >
          <Link to={`/user/${user.id}`}>
            {user.name}
          </Link>
        </li>
      ))}
    </ul>
  );
}
```

Cache Monitoring and Optimization

You can't optimize what you don't measure. Here are the critical metrics:

Key Metrics

1. Hit Rate

```typescript
class CacheMetrics {
  private hits = 0;
  private misses = 0;

  recordHit(): void {
    this.hits++;
  }

  recordMiss(): void {
    this.misses++;
  }

  getHitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : (this.hits / total) * 100;
  }
}
```

Target: 85-95% depending on workload

  • Below 80%: Investigate cache key design, TTL settings
  • Formula: (hits / (hits + misses)) * 100

2. Latency Percentiles

  • P50: ~1-2ms for Redis
  • P99: Should be <10ms
  • P99.9: Alert if >50ms
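
To alert on these you need percentiles over raw latency samples. A minimal nearest-rank sketch (sample values are illustrative):

```typescript
// Nearest-rank percentile over a batch of latency samples (ms).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [1, 1, 2, 2, 3, 5, 8, 9, 12, 60]; // ms
console.log(percentile(latencies, 50)); // 3
console.log(percentile(latencies, 99)); // 60
```

Note how a single 60 ms outlier is invisible at P50 but dominates P99, which is why percentile alerts beat averages.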

3. Memory Utilization

  • Target: 70-80% usage
  • Alert: >90% (risk of evictions)

4. Eviction Rate

  • High eviction = need more memory or shorter TTLs
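
Redis exposes the raw counters behind both metrics via `INFO` (`keyspace_hits`, `keyspace_misses`, `evicted_keys`). A small parser for the standard `key:value` line format, shown here against a canned sample:

```typescript
// Parse the "key:value" lines of a Redis INFO response into numbers.
function parseInfoStats(info: string): Record<string, number> {
  const stats: Record<string, number> = {};
  for (const line of info.split('\n')) {
    const [key, value] = line.trim().split(':');
    if (key && value !== undefined && !Number.isNaN(Number(value))) {
      stats[key] = Number(value);
    }
  }
  return stats;
}

// Abridged sample of what INFO stats returns:
const sample = `# Stats
keyspace_hits:81250
keyspace_misses:8750
evicted_keys:1200`;

const stats = parseInfoStats(sample);
const hitRate =
  (stats.keyspace_hits /
    (stats.keyspace_hits + stats.keyspace_misses)) * 100;
console.log(hitRate.toFixed(1) + '%'); // 90.3%
```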

Monitoring Implementation

```typescript
import { CloudWatch } from 'aws-sdk';

class CacheMonitor {
  private cloudwatch: CloudWatch;

  constructor() {
    this.cloudwatch = new CloudWatch();
  }

  async trackMetrics(
    cacheKey: string,
    hit: boolean,
    latency: number
  ): Promise<void> {
    await this.cloudwatch
      .putMetricData({
        Namespace: 'CustomCache',
        MetricData: [
          {
            MetricName: 'CacheHitRate',
            Value: hit ? 1 : 0,
            Unit: 'Count',
            Dimensions: [
              { Name: 'CacheLayer', Value: 'Redis' },
            ],
          },
          {
            MetricName: 'CacheLatency',
            Value: latency,
            Unit: 'Milliseconds',
            Dimensions: [
              { Name: 'CacheLayer', Value: 'Redis' },
            ],
          },
        ],
      })
      .promise();
  }

  async getCacheHitRate(
    period: number = 300
  ): Promise<number> {
    const result = await this.cloudwatch
      .getMetricStatistics({
        Namespace: 'CustomCache',
        MetricName: 'CacheHitRate',
        StartTime: new Date(Date.now() - period * 1000),
        EndTime: new Date(),
        Period: period,
        Statistics: ['Average'],
        Dimensions: [
          { Name: 'CacheLayer', Value: 'Redis' },
        ],
      })
      .promise();

    return result.Datapoints?.[0]?.Average ?? 0;
  }
}
```

Common Pitfalls and Lessons

1. Over-Caching Dynamic Data

Caching user-specific data with long TTL leads to users seeing stale data and increased support tickets.

Solution: Classify data by volatility:

```typescript
const cacheStrategies = {
  static: {
    ttl: 86400 * 7, // 1 week
    pattern: 'static:*',
  },
  config: {
    ttl: 3600, // 1 hour
    pattern: 'config:*',
  },
  userProfile: {
    ttl: 300, // 5 minutes
    pattern: 'user:*',
    invalidateOn: ['user.updated'],
  },
  realtime: {
    ttl: 0, // Don't cache
    pattern: 'inventory:*',
  },
};
```

2. Poor Cache Key Design

Including timestamps or random values in cache keys destroys hit rate.

```typescript
// Bad: Unnecessary variability
const badKey = `user:${userId}:${timestamp}:${requestId}`;

// Good: Deterministic and minimal
const userKey = `user:${userId}`;

// Good: Include only meaningful parameters
const postsKey = `user:${userId}:posts:${page}`;
```

3. Ignoring Cache Failures

Cache failure shouldn't take down your application. Always implement fallback:

```typescript
class ResilientCache {
  async get<T>(
    key: string,
    fetcher: () => Promise<T>
  ): Promise<T> {
    try {
      const cached = await Promise.race([
        redis.get(key),
        this.timeout(100), // 100ms timeout
      ]);

      if (cached) return JSON.parse(cached);
    } catch (error) {
      // Log but don't throw
      logger.warn('Cache failure, using origin', {
        key,
        error,
      });
    }

    // Fetch from origin regardless
    return fetcher();
  }

  // Rejects after ms so a slow cache can't block the request
  private timeout(ms: number): Promise<never> {
    return new Promise((_, reject) =>
      setTimeout(() => reject(new Error('cache timeout')), ms)
    );
  }
}
```

4. CloudFront Invalidation Abuse

Frequent invalidation racks up costs. Use versioned URLs instead:

```typescript
class AssetVersioning {
  private buildHash: string;

  constructor() {
    this.buildHash =
      process.env.BUILD_HASH || Date.now().toString();
  }

  // Automatic cache busting via URL
  getAssetUrl(path: string): string {
    return `${path}?v=${this.buildHash}`;
  }
}
```

Cost Optimization

AWS Service Pricing (us-east-1)

ElastiCache Redis (cache.r6g.large: 13.07 GB):

  • On-Demand: $0.206/hour = ~$150/month per node
  • 3-node cluster: ~$450/month

MemoryDB (db.r6g.large: 13.07 GB):

  • On-Demand: $0.406/hour = ~$293/month per node
  • 3-node cluster: ~$879/month (~2x ElastiCache)

CloudFront:

  • First 10 TB/month: $0.085/GB
  • HTTP/HTTPS requests: $0.0075 per 10,000
  • Invalidation: First 1,000 paths free, $0.005 per path after
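
Plugging those rates into a rough estimator (first pricing tier only; real bills vary by region and volume discounts):

```typescript
// Rough CloudFront monthly cost at first-tier us-east-1 rates.
function cloudfrontMonthlyCost(
  gbTransferred: number,
  requests: number
): number {
  const transferCost = gbTransferred * 0.085;       // first 10 TB tier
  const requestCost = (requests / 10_000) * 0.0075; // per 10k requests
  return transferCost + requestCost;
}

// 2 TB out and 50M requests per month:
console.log(cloudfrontMonthlyCost(2000, 50_000_000).toFixed(2)); // "207.50"
```

Transfer, not requests, dominates at typical asset sizes, which is why compression (gzip/Brotli) pays for itself quickly.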

Right-Sizing Strategy

```typescript
class CacheOptimization {
  async analyzeUtilization(): Promise<Report> {
    const metrics = await this.getWeeklyMetrics();

    const avgMemoryUsage = metrics.memory.average;
    const currentCapacity = this.getCurrentCapacity();

    const recommendations = [];

    // Consistently low usage
    if (avgMemoryUsage < currentCapacity * 0.6) {
      const recommendedSize =
        this.calculateOptimalSize(metrics.memory.peak);
      const savings = this.calculateSavings(
        currentCapacity,
        recommendedSize
      );

      recommendations.push({
        type: 'DOWNSIZE',
        currentSize: currentCapacity,
        recommendedSize,
        monthlySavings: savings,
      });
    }

    // High eviction rate
    if (metrics.evictions.perDay > 1000) {
      recommendations.push({
        type: 'UPSIZE',
        reason: 'High eviction rate impacting hit rate',
        impact: 'Hit rate could improve by 15-20%',
      });
    }

    return { metrics, recommendations };
  }
}
```

Key Takeaways

Working with caching across multiple projects has taught me these patterns:

1. Cache patterns matter: Cache-aside for read-heavy, write-through for consistency, write-behind for write-heavy. Choose based on your actual workload.

2. Prevent stampede early: Implement distributed locking and request coalescing before you have a problem. It's much harder to add after an incident.

3. AWS services aren't interchangeable: ElastiCache for general caching, MemoryDB when you need durability, DAX only for DynamoDB. Don't overpay for features you don't need.

4. Multi-tier caching works: L1 in-memory + L2 Redis + L3 CDN provides the best performance per cost. Each layer serves a purpose.

5. Monitor continuously: Cache hit rate, latency, memory usage, and cost per request. Right-size monthly based on actual utilization.

6. Design for failure: Cache should improve performance, not become a single point of failure. Always implement graceful degradation.

7. Version URLs, don't invalidate: CloudFront invalidation costs add up. Versioned assets are free and instant.

The difference between a 15% hit rate and 90% hit rate is often just proper cache key design and TTL management. Start with the basics, monitor everything, and optimize based on real metrics.
