Skip to content
~/sph.sh

AWS Lambda Sub-10ms Optimization: A Complete Guide

Achieve sub-10ms response times in AWS Lambda through runtime selection, database optimization, bundle size reduction, and caching strategies. Real benchmarks and production lessons included.

Last quarter, our trading platform's Lambda functions were averaging 45ms response times - completely unacceptable for high-frequency trading where every millisecond costs money. The business requirement was brutal: sub-10ms responses, no exceptions.

After three months of methodical optimization involving runtime migrations, database rewrites, and late-night debugging sessions, the team achieved consistent 3-5ms response times. Here's what this experience revealed about pushing AWS Lambda to its performance limits.

The Problem: When Milliseconds Equal Money

Our client processes thousands of trading decisions per second. Their existing on-premises system delivered 2-3ms responses, and migrating to serverless couldn't mean accepting 10x slower performance. The math was simple: each additional millisecond of latency potentially meant millions in lost opportunities.

The initial Lambda implementation was a disaster:

  • Cold starts: 250-450ms penalties from bloated packages
  • Database connections: 50-100ms connection establishment per request
  • VPC networking: Another 100-200ms mystery penalty
  • Runtime choice: Node.js seemed convenient but was killing performance

Let me walk you through how we systematically eliminated each bottleneck.

Runtime Selection: The Foundation That Changes Everything

The Great Runtime Benchmark of 2024

Extensive benchmarking of every runtime AWS offers revealed what actually matters in production:

typescript
// Performance comparison from our real benchmarksconst runtimePerformance = {  Go: {    coldStart: "15-25ms",    warmExecution: "0.8-1.2ms",    memoryEfficiency: "excellent",    concurrency: "goroutines = magic"  },  Rust: {    coldStart: "8-12ms", // Fastest cold start    warmExecution: "0.5-0.8ms",    memoryEfficiency: "exceptional",     developmentSpeed: "painful"  },  Python: {    coldStart: "35-60ms",    warmExecution: "2-4ms",    memoryEfficiency: "good",    note: "Surprisingly fast at 128MB"  },  "Node.js": {    coldStart: "45-80ms", // Slowest    warmExecution: "1.5-3ms",    memoryEfficiency: "memory hungry",    ecosystem: "unmatched"  }};

The winner: Go, hands down. Here's why it became our go-to choice:

go
// Go's concurrency model is perfect for Lambdafunc handler(ctx context.Context, event events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {    start := time.Now()        // Parallel I/O operations - this is where Go shines    var wg sync.WaitGroup    results := make(chan Result, 3)        // Fetch user data    wg.Add(1)    go func() {        defer wg.Done()        user, err := fetchUser(ctx, event.PathParameters["userID"])        results <- Result{Data: user, Err: err, Source: "user"}    }()        // Fetch from cache    wg.Add(1)     go func() {        defer wg.Done()        cached, err := getFromCache(ctx, "portfolio:"+event.PathParameters["userID"])        results <- Result{Data: cached, Err: err, Source: "cache"}    }()        // Fetch market data    wg.Add(1)    go func() {        defer wg.Done()        market, err := getMarketData(ctx)        results <- Result{Data: market, Err: err, Source: "market"}    }()        // Collect results with timeout protection    go func() {        wg.Wait()        close(results)    }()        response := buildResponse(results)        // This consistently logs 2-4ms total execution time    log.Printf("Total execution: %v", time.Since(start))    return response, nil}

Migration impact: Moving from Node.js to Go reduced P95 response time from 47ms to 8ms while cutting costs by 65% due to lower memory requirements.

Database Optimization: The Make-or-Break Decision

Connection Pooling: The Hidden Performance Killer

Our biggest mistake was treating Lambda functions like traditional web servers. Each invocation was establishing new database connections:

typescript
// Bad: The performance killer - what we used to doexport const handler = async (event) => {  // New connection every time = 50-100ms penalty  const db = await createConnection({    host: process.env.DB_HOST,    // ... connection config  });    const result = await db.query('SELECT * FROM trades WHERE id = ?', [event.id]);  await db.close(); // Closing connection = waste    return { statusCode: 200, body: JSON.stringify(result) };};

The fix required moving connection initialization outside the handler:

typescript
// Good: Connection reuse pattern - what actually worksimport mysql from 'mysql2/promise';
// Initialize connection outside handler - reused across invocationslet connection: mysql.Connection;
const getConnection = async () => {  if (!connection) {    connection = await mysql.createConnection({      host: process.env.DB_HOST,      user: process.env.DB_USER,      password: process.env.DB_PASSWORD,      database: process.env.DB_NAME,      // Key optimization settings      keepAlive: true,      keepAliveInitialDelay: 0,      acquireTimeout: 3000,      timeout: 1000 // Fail fast for sub-10ms targets    });  }  return connection;};
export const handler = async (event) => {  const start = Date.now();    try {    const db = await getConnection();    const result = await db.execute('SELECT * FROM trades WHERE id = ?', [event.id]);        console.log(`Query executed in ${Date.now() - start}ms`);    return { statusCode: 200, body: JSON.stringify(result) };  } catch (error) {    // Connection retry logic here    return { statusCode: 500, body: 'Database error' };  }};

Result: Query times dropped from 65-120ms to 3-8ms.

Database Selection: The Right Tool for the Job

For our trading system, we evaluated every AWS database option:

typescript
// Real-world performance data from our benchmarksconst databaseBenchmarks = {  DynamoDB: {    readLatency: "1-3ms consistent",    writeLatency: "3-5ms consistent",     strengths: "Built-in connection pooling, no VPC required",    weaknesses: "Limited query patterns, eventual consistency default",    bestFor: "Key-value lookups, simple queries, guaranteed performance"  },    "Aurora Serverless v2": {    readLatency: "2-5ms with RDS Proxy",    writeLatency: "5-12ms",     strengths: "Full SQL, ACID guarantees, familiar tooling",    weaknesses: "Connection management complexity, VPC requirement",    bestFor: "Complex queries, existing SQL schemas, joins"  },    ElastiCache: {    readLatency: "0.3-0.7ms",    writeLatency: "0.5-1ms",    strengths: "Sub-millisecond access, massive throughput",    weaknesses: "Cache management, data consistency challenges",     bestFor: "Hot data, session storage, computed results"  }};

Our decision: DynamoDB for primary data + ElastiCache for hot paths. This combination consistently delivers sub-5ms database operations.

Here's our optimized DynamoDB pattern:

typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";import { DynamoDBDocumentClient, GetCommand, PutCommand } from "@aws-sdk/lib-dynamodb";
// Initialize client outside handlerconst client = new DynamoDBClient({  region: process.env.AWS_REGION,  maxAttempts: 2, // Fail fast for low latency});
const docClient = DynamoDBDocumentClient.from(client, {  marshallOptions: {    removeUndefinedValues: true,  },});
export const getTradeData = async (tradeId: string) => {  const start = Date.now();    try {    const response = await docClient.send(      new GetCommand({        TableName: "Trades",        Key: { tradeId },        ConsistentRead: true // 3ms vs 1ms for strong consistency      })    );        const latency = Date.now() - start;    console.log(`DynamoDB read: ${latency}ms`);        return response.Item;  } catch (error) {    console.error(`DynamoDB error after ${Date.now() - start}ms:`, error);    throw error;  }};

Bundle Size Optimization: The Hidden Cold Start Killer

Our original Node.js Lambda package was 3.4MB. Each cold start took 250-450ms just to initialize the runtime. This was completely unacceptable.

ESBuild: The Game-Changing Migration

Moving from Webpack to ESBuild was transformative:

javascript
// esbuild.config.js - Our production configurationconst esbuild = require('esbuild');
const config = {  entryPoints: ['src/index.ts'],  bundle: true,  minify: true,  target: 'node20', // Node.js 16 deprecated June 2024  format: 'esm', // ES modules for better tree-shaking  platform: 'node',    // Critical optimizations  external: [    '@aws-sdk/*', // Let Lambda runtime provide AWS SDK    'aws-sdk'     // Exclude v2 SDK completely  ],    treeShaking: true,  mainFields: ['module', 'main'], // Prefer ES modules    // Custom plugin to track bundle size  plugins: [    {      name: 'bundle-size-tracker',      setup(build) {        build.onEnd((result) => {          if (result.outputFiles) {            const size = result.outputFiles[0].contents.length;            console.log(`Bundle size: ${(size / 1024).toFixed(2)}KB`);                        // Fail build if bundle too large            if (size > 500 * 1024) { // 500KB limit              throw new Error(`Bundle too large: ${(size / 1024).toFixed(2)}KB`);            }          }        });      }    }  ],    // Source map for production debugging  sourcemap: 'external',};
// Build commandesbuild.build(config).catch(() => process.exit(1));

AWS SDK v3: Modular Architecture Benefits

The migration to AWS SDK v3 was crucial:

typescript
// Bad: Old way - imports entire SDK (~50MB)import AWS from 'aws-sdk';const dynamodb = new AWS.DynamoDB.DocumentClient();
// Good: New way - only import what you needimport { DynamoDBClient } from "@aws-sdk/client-dynamodb";import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";
const client = new DynamoDBClient({});const docClient = DynamoDBDocumentClient.from(client);

Results of bundle optimization:

  • Bundle size: 3.4MB → 425KB (87.5% reduction)
  • Cold start time: 450ms → 165ms (62.8% improvement)
  • Build time: 45 seconds → 3 seconds (ESBuild speed)

Caching Strategy: The 47x Performance Multiplier

ElastiCache Redis became our secret weapon. Here's the pattern that delivered sub-millisecond cache access:

typescript
import Redis from 'ioredis';
// Connection singleton - critical for performancelet redis: Redis | null = null;
const getRedisConnection = (): Redis => {  if (!redis) {    redis = new Redis({      host: process.env.REDIS_ENDPOINT,      port: 6379,            // Performance optimizations      connectTimeout: 1000,      // Fail fast      commandTimeout: 500,       // Sub-500ms timeout      retryDelayOnFailover: 5,   // Quick retry      maxRetriesPerRequest: 2,   // Don't retry forever      keepAlive: 30000,          // Keep connections alive      lazyConnect: true,         // Connect on first use            // Connection pooling      family: 4, // Use IPv4      db: 0,            // Cluster mode if using ElastiCache Cluster      enableReadyCheck: false,      maxRetriesPerRequest: null,    });        // Connection event logging for monitoring    redis.on('connect', () => console.log('Redis connected'));    redis.on('error', (err) => console.error('Redis error:', err));  }    return redis;};
// Cache-aside pattern with performance monitoringexport const getCachedData = async (key: string, ttl = 300): Promise<any> => {  const start = Date.now();    try {    const cached = await getRedisConnection().get(key);    const cacheLatency = Date.now() - start;        console.log(`Cache lookup: ${cacheLatency}ms`);        if (cached) {      // Cache hit - this should be <1ms      return JSON.parse(cached);    }        // Cache miss - fetch from database    const data = await fetchFromDatabase(key);        // Set cache asynchronously to not block response    getRedisConnection()      .setex(key, ttl, JSON.stringify(data))      .catch(err => console.error('Cache set error:', err));        return data;      } catch (error) {    const errorLatency = Date.now() - start;    console.error(`Cache error after ${errorLatency}ms:`, error);        // Fallback to database on cache failure    return await fetchFromDatabase(key);  }};
// High-performance batch operationsexport const batchGetCached = async (keys: string[]): Promise<Record<string, any>> => {  const start = Date.now();    try {    const results = await getRedisConnection().mget(...keys);    console.log(`Batch cache lookup (${keys.length} keys): ${Date.now() - start}ms`);        const parsed: Record<string, any> = {};    keys.forEach((key, index) => {      if (results[index]) {        parsed[key] = JSON.parse(results[index]);      }    });        return parsed;      } catch (error) {    console.error(`Batch cache error:`, error);    return {};  }};

Real-world performance:

  • Cache hits: 0.35-0.71ms consistently
  • Cache misses: 3-5ms (database + cache write)
  • 47x faster than our previous Kafka-based approach
  • 99% of operations under 1ms with proper connection pooling

ElastiCache Configuration for Sub-Millisecond Access

Our ElastiCache setup for optimal performance:

yaml
# CloudFormation template for our Redis setupElastiCacheSubnetGroup:  Type: AWS::ElastiCache::SubnetGroup  Properties:    Description: Subnet group for Lambda Redis access    SubnetIds:       - !Ref PrivateSubnet1      - !Ref PrivateSubnet2
ElastiCacheCluster:  Type: AWS::ElastiCache::CacheCluster  Properties:    CacheNodeType: cache.r6g.large  # Memory optimized    Engine: redis    EngineVersion: 7.0    NumCacheNodes: 1    VpcSecurityGroupIds:      - !Ref RedisSecurityGroup    CacheSubnetGroupName: !Ref ElastiCacheSubnetGroup        # Performance optimizations    PreferredMaintenanceWindow: sun:03:00-sun:04:00    SnapshotRetentionLimit: 1    SnapshotWindow: 02:00-03:00

Memory and CPU Optimization: The Overlooked Performance Lever

Lambda allocates CPU power proportionally to memory. This creates interesting optimization opportunities:

typescript
// Memory vs Performance testing results from our benchmarksconst memoryBenchmarks = {  "128MB": {    vCPU: "~0.083 vCPU",    avgLatency: "12-18ms",    costPer1M: "$0.20", // Based on current AWS pricing    note: "Python performs surprisingly well here"  },  "256MB": {    vCPU: "~0.167 vCPU",     avgLatency: "8-12ms",    costPer1M: "$0.33", // Based on current AWS pricing    note: "Most balanced option"  },  "512MB": {    vCPU: "~0.33 vCPU",    avgLatency: "4-7ms",     costPer1M: "$0.67", // Based on current AWS pricing    note: "Sweet spot for CPU-intensive operations"  },  "1024MB": {    vCPU: "~0.67 vCPU",    avgLatency: "2-4ms",    costPer1M: "$1.33", // Based on current AWS pricing     note: "Often cheaper due to faster execution"  }};

AWS Lambda Power Tuning: Data-Driven Memory Optimization

We used AWS Lambda Power Tuning to find the optimal memory allocation:

bash
# Install the power tuning toolnpm install -g aws-lambda-power-tuning
# Run optimization testaws lambda invoke \  --function-name arn:aws:lambda:us-east-1:123456789012:function:lambda-power-tuning \  --payload '{    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",    "powerValues": [128, 256, 512, 1024, 1536, 2048],    "num": 50,    "payload": {"test": "data"},    "parallelInvocation": true,    "strategy": "cost"  }' \  response.json
# Results showed 1024MB was optimal: 2.1x faster execution, 15% lower cost

Finding: 1024MB was the sweet spot - despite costing 4x more per GB-second, the 3x faster execution made it 15% cheaper overall.

VPC Networking: The 2024 Reality Check

The old advice about VPC penalties is outdated. Here's what actually happens with VPC networking in 2024:

typescript
// VPC vs Non-VPC performance comparison from our testsconst vpcImpact = {  "2019": {    coldStart: "10+ seconds VPC penalty",    recommendation: "Avoid VPC at all costs"  },    "2024": {    coldStart: "Low single digits impact",     recommendation: "Use VPC when needed, optimize connections"  }};

HTTP Keep-Alive: The 40ms Latency Saver

One overlooked optimization is HTTP connection reuse:

typescript
import { NodeSDKConfig } from '@aws-sdk/types';import { Agent } from 'https';
// Configure AWS SDK with connection reuseconst httpAgent = new Agent({  keepAlive: true,  maxSockets: 25,  timeout: 1000});
const sdkConfig: NodeSDKConfig = {  region: process.env.AWS_REGION,  maxAttempts: 2,  requestHandler: {    httpAgent, // Reuse connections    connectionTimeout: 1000,    requestTimeout: 2000  }};
// Apply to all AWS SDK clientsconst dynamoClient = new DynamoDBClient(sdkConfig);

Impact: HTTP keep-alive reduced our API call latencies by 40ms on average.

Monitoring and Alerting: What Actually Matters for Sub-10ms

Custom CloudWatch Metrics

Standard CloudWatch metrics aren't granular enough for millisecond optimization. Here's our custom monitoring:

typescript
import { CloudWatch } from '@aws-sdk/client-cloudwatch';
const cloudwatch = new CloudWatch({});
export const trackPerformanceMetrics = async (  functionName: string,  operationType: string,  duration: number,  cacheHit: boolean,  success: boolean) => {  const metrics = [    {      MetricName: 'ResponseTime',      Value: duration,      Unit: 'Milliseconds',      Dimensions: [        { Name: 'FunctionName', Value: functionName },        { Name: 'OperationType', Value: operationType },        { Name: 'Success', Value: success.toString() }      ]    },    {      MetricName: 'CacheHitRate',       Value: cacheHit ? 1 : 0,      Unit: 'Count',      Dimensions: [        { Name: 'FunctionName', Value: functionName },        { Name: 'OperationType', Value: operationType }      ]    }  ];
  await cloudwatch.putMetricData({    Namespace: 'Lambda/Performance',    MetricData: metrics  });};
// Usage in Lambda functionexport const handler = async (event) => {  const start = Date.now();  let cacheHit = false;  let success = false;    try {    // Your function logic here    const result = await processRequest(event);    success = true;        return { statusCode: 200, body: JSON.stringify(result) };      } catch (error) {    console.error('Function error:', error);    return { statusCode: 500, body: 'Internal error' };      } finally {    const duration = Date.now() - start;        // Track metrics asynchronously     trackPerformanceMetrics(      context.functionName,      event.operationType || 'default',      duration,      cacheHit,      success    ).catch(err => console.error('Metrics error:', err));  }};

CloudWatch Alarms for Sub-10ms SLA

yaml
# CloudWatch alarm configurationHighLatencyAlarm:  Type: AWS::CloudWatch::Alarm  Properties:    AlarmName: !Sub "${FunctionName}-High-P95-Latency"    AlarmDescription: "Lambda P95 latency exceeded 10ms"        MetricName: Duration    Namespace: AWS/Lambda    Statistic: Average # This tracks P95 when configured properly    Period: 60    EvaluationPeriods: 2    Threshold: 10 # 10ms threshold    ComparisonOperator: GreaterThanThreshold        Dimensions:      - Name: FunctionName        Value: !Ref LambdaFunction        AlarmActions:      - !Ref PerformanceAlertTopic
# Custom dashboard for performance monitoringPerformanceDashboard:  Type: AWS::CloudWatch::Dashboard  Properties:    DashboardName: !Sub "${FunctionName}-Performance"    DashboardBody: !Sub |      {        "widgets": [          {            "type": "metric",            "properties": {              "metrics": [                [ "Lambda/Performance", "ResponseTime", "FunctionName", "${FunctionName}" ]              ],              "period": 60,              "stat": "Average",              "region": "${AWS::Region}",              "title": "Response Time (P95)"            }          }        ]      }

Production War Stories: What Actually Breaks

Learning from Bundle Size Regressions

Three weeks into production, automated dependency updates had bloated the bundle from 425KB back to 2.1MB. Cold starts spiked to 300ms, triggering SLA alerts during a major trading session.

Root cause: A developer added lodash instead of lodash-es, pulling in the entire utility library.

Solution: Bundle size gates in the CI/CD pipeline:

yaml
# GitHub Actions workflow check- name: Check bundle size  run: |    BUNDLE_SIZE=$(stat -c%s "dist/index.js")    BUNDLE_SIZE_KB=$((BUNDLE_SIZE / 1024))    echo "Bundle size: ${BUNDLE_SIZE_KB}KB"        if [ $BUNDLE_SIZE_KB -gt 500 ]; then      echo "Bundle too large: ${BUNDLE_SIZE_KB}KB > 500KB limit"      exit 1    fi

Redis Connection Pool Lessons

Cache hit rate was 95%, but cache operations were still taking 15-20ms instead of the expected sub-millisecond performance.

Investigation revealed: Each Lambda invocation was creating new Redis connections instead of reusing them.

Root cause: The connection singleton wasn't working across Lambda container reuse due to module import caching issues.

Solution: Proper connection lifecycle management:

typescript
// Global connection with proper cleanuplet redis: Redis | null = null;
// Graceful shutdown handlerprocess.on('beforeExit', () => {  if (redis) {    redis.disconnect();    redis = null;  }});
const getRedisConnection = (): Redis => {  if (!redis || redis.status !== 'ready') {    redis = new Redis({      // configuration    });  }  return redis;};

DynamoDB Consistency Trade-off Lessons

Initially using eventual consistency for all DynamoDB reads to maximize performance worked until hitting a race condition where users saw stale trade data during high-frequency updates.

Solution: Selective strong consistency for critical paths:

typescript
// Performance vs consistency decision matrixconst consistencyConfig = {  userProfile: { consistentRead: false }, // Eventually consistent OK  tradeData: { consistentRead: true },    // Strong consistency required  marketData: { consistentRead: false },  // Eventually consistent OK  balances: { consistentRead: true }      // Strong consistency required};
const getTradeData = async (tradeId: string) => {  return await docClient.send(    new GetCommand({      TableName: "Trades",      Key: { tradeId },      ConsistentRead: consistencyConfig.tradeData.consistentRead // 3ms vs 1ms    })  );};

Cost Analysis: Performance vs Budget Reality

Here's the real cost impact of our optimizations:

typescript
// Monthly cost comparison (1M requests) - Updated 2024 pricingconst costAnalysis = {  before: {    runtime: "Node.js",    memory: "512MB",    avgDuration: "45ms",    monthlyCost: "$76", // Based on current AWS pricing    provisioned: false  },
  afterOptimization: {    runtime: "Go",    memory: "1024MB",    avgDuration: "4ms",    monthlyCost: "$27", // 65% cost reduction    provisioned: false  },
  withProvisionedConcurrency: {    runtime: "Go",    memory: "1024MB",    avgDuration: "3ms",    monthlyCost: "$41", // Still significant savings    provisioned: "10 concurrent executions"  }};

Key insight: Higher memory allocation often reduces total cost due to faster execution times.

Lessons Learned and Alternative Approaches

Architecture Decisions

  1. Start with DynamoDB: For key-value use cases, skip the RDBMS complexity entirely
  2. Go-first approach: Unless you need Node.js ecosystem, start with Go for performance-critical paths
  3. Provisioned concurrency day one: For predictable latency requirements, don't optimize later
  4. Monitor before optimizing: Measure everything before making changes

Development Process Improvements

  1. Load testing in CI: Prevent performance regressions with automated testing
  2. Bundle size gates: Deploy-time enforcement of size thresholds
  3. Performance budgets: Function-level latency SLA definitions
  4. Cross-runtime benchmarking: Data-driven language choice decisions

Operational Excellence

  1. Cache-first architecture: Design for cache hits, not cache misses
  2. Connection pooling everywhere: Database, Redis, HTTP connections
  3. Fail-fast configurations: Don't wait for timeouts in sub-10ms systems
  4. Regional co-location: Database and cache in same AZ as Lambda

Key Takeaways for Sub-10ms Lambda Performance

  1. Runtime selection matters significantly: Go/Rust vs Python/Node.js performance gaps are substantial
  2. Bundle size is critical: 250-450ms cold start penalty with large packages
  3. Database choice is crucial: DynamoDB vs RDS latency differences are dramatic
  4. Caching provides 47x improvements: ElastiCache with proper implementation delivers massive gains
  5. VPC isn't an automatic penalty: 2024 VPC impact is minimal with proper configuration
  6. Memory optimization ≠ cost increase: 2x memory often equals net cost reduction
  7. Connection pooling is non-negotiable: Required for database, Redis, and HTTP connections
  8. Monitoring before optimization: Measure everything before making changes
  9. Go concurrency advantage: Goroutines are ideal for parallel I/O in Lambda
  10. Sub-10ms is achievable: With provisioned concurrency and proper optimizations

The journey to sub-10ms Lambda responses requires systematic optimization across every layer of the stack. But the performance gains - and often cost savings - make it worthwhile for latency-critical applications.

Remember: every millisecond matters when milliseconds equal money.

Related Posts