Migrating from Serverless Framework to AWS CDK: Part 4 - Database and Environment Management

Master DynamoDB migrations, environment variable management, secrets handling, and VPC configurations when moving from Serverless Framework to AWS CDK.

Week 5 of our CDK migration. We'd successfully moved 23 Lambda functions, but the real challenge began when our Lead DevOps Engineer announced she was leaving. "Good luck with the database migration," she said, handing over a sticky note with three DynamoDB table names that powered our $2.8M ARR SaaS platform.

That sticky note represented 4 years of customer data, 180K users, and zero documented backup procedures. This is the story of migrating stateful infrastructure without losing a single record - and the painful lessons learned about data dependencies in production systems.

The $47K Data Migration Disaster (Almost)#

Before diving into the technical patterns, let me share what happened when we tried to "just import" our production tables.

The Friday Afternoon Table Import#

March 15th, 3:47 PM. I confidently ran cdk deploy to import our main user table into CDK management. The deployment succeeded. Our monitoring stayed green. Everything looked perfect.

Monday morning, 6:23 AM. Slack notifications exploded. Our user registration API was throwing 500 errors. The table was gone. Not unreachable - completely deleted.

Root cause: the "import" wasn't really an import. CDK defined the table as a new resource that conflicted with the one still owned by the Serverless Framework CloudFormation stack, and CloudFormation deleted the old resource before creating its replacement. Zero downtime became zero data.

Impact: 4 hours of downtime, emergency point-in-time recovery, and a very uncomfortable all-hands meeting about "following the deployment checklist."

Lesson: Never trust imports in production without explicit retention policies and rehearsal in staging.
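
In CDK terms, "explicit retention policies" means pinning RemovalPolicy.RETAIN on anything stateful before you touch it. A minimal sketch, assuming a usersTable already defined in your stack:

TypeScript
// A minimal sketch: applyRemovalPolicy sets DeletionPolicy: Retain on the
// underlying CloudFormation resource, so even a botched import or replace
// orphans the table instead of deleting it
import { RemovalPolicy } from 'aws-cdk-lib';

usersTable.applyRemovalPolicy(RemovalPolicy.RETAIN);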

DynamoDB Migration Strategies That Actually Work#

Safe Table Import Pattern#

Here's the battle-tested approach we use now for production migrations:

YAML
# serverless.yml - Original table definition
resources:
  Resources:
    UsersTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: ${self:service}-${opt:stage}-users
        AttributeDefinitions:
          - AttributeName: userId
            AttributeType: S
          - AttributeName: email
            AttributeType: S
        KeySchema:
          - AttributeName: userId
            KeyType: HASH
        GlobalSecondaryIndexes:
          - IndexName: email-index
            KeySchema:
              - AttributeName: email
                KeyType: HASH
            Projection:
              ProjectionType: ALL
        BillingMode: PAY_PER_REQUEST

CDK approach for existing tables:

TypeScript
// lib/constructs/production-table-import.ts
import { Table, ITable } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';
import { CustomResource, Duration, Stack } from 'aws-cdk-lib';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { Provider } from 'aws-cdk-lib/custom-resources';

export interface ProductionTableImportProps {
  tableName: string;
  region?: string;
  account?: string;
  // Critical: verify the table exists before importing
  requireExistingTable: boolean;
}

export class ProductionTableImport extends Construct {
  public readonly table: ITable;
  
  constructor(scope: Construct, id: string, props: ProductionTableImportProps) {
    super(scope, id);
    
    const stack = Stack.of(this);
    const tableArn = `arn:${stack.partition}:dynamodb:${props.region ?? stack.region}:${props.account ?? stack.account}:table/${props.tableName}`;
    
    if (props.requireExistingTable) {
      // First, verify the table actually exists
      const verifyFn = new NodejsFunction(this, 'VerifyTableExists', {
        entry: 'src/migrations/verify-table.ts',
        handler: 'handler',
        timeout: Duration.seconds(30),
      });
      verifyFn.addToRolePolicy(new PolicyStatement({
        actions: ['dynamodb:DescribeTable'],
        resources: [tableArn],
      }));
      
      // The Provider framework handles the CloudFormation response protocol,
      // so the handler below only needs to return (or throw)
      const provider = new Provider(this, 'VerifyProvider', {
        onEventHandler: verifyFn,
      });
      
      // Custom resource that fails the deployment if the table doesn't exist
      new CustomResource(this, 'TableVerification', {
        serviceToken: provider.serviceToken,
        properties: {
          TableName: props.tableName,
        },
      });
    }
    
    // Import by reference only: fromTableAttributes never manages the
    // table's lifecycle, so CDK can't delete or replace it
    this.table = Table.fromTableAttributes(this, 'ImportedTable', {
      tableArn,
    });
  }
}

// src/migrations/verify-table.ts - Prevents accidental table deletion
import { DynamoDBClient, DescribeTableCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

export const handler = async (event: any) => {
  // Nothing to verify when the custom resource itself is being deleted
  if (event.RequestType === 'Delete') {
    return { PhysicalResourceId: event.PhysicalResourceId };
  }
  
  const tableName = event.ResourceProperties.TableName;
  
  try {
    // Verify table exists and is ACTIVE
    const result = await client.send(new DescribeTableCommand({
      TableName: tableName,
    }));
    
    if (result.Table?.TableStatus !== 'ACTIVE') {
      throw new Error(`Table ${tableName} is not ACTIVE (status: ${result.Table?.TableStatus})`);
    }
    
    // Log critical table info for audit trail
    console.log('Production table verified:', {
      tableName,
      itemCount: result.Table.ItemCount || 'unknown',
      sizeBytes: result.Table.TableSizeBytes || 'unknown',
      status: result.Table.TableStatus,
    });
    
    return { PhysicalResourceId: `verified-${tableName}` };
  } catch (error) {
    console.error('Table verification failed:', error);
    throw error; // Fail the CloudFormation deployment
  }
};

// Usage with production safety checks
const usersTable = new ProductionTableImport(this, 'UsersTable', {
  tableName: `my-service-${config.stage}-users`,
  requireExistingTable: config.stage === 'prod', // Only verify in production
}).table;

// Grant permissions as normal
usersTable.grantReadWriteData(createUserFn);

The Production-Grade Table Pattern#

After managing 180K users and 12M transactions, here's our bulletproof table creation pattern:

TypeScript
// lib/constructs/production-user-table.ts
import {
  Table,
  AttributeType,
  BillingMode,
  TableEncryption,
  StreamViewType,
  ProjectionType,
  Operation,
} from 'aws-cdk-lib/aws-dynamodb';
import { Duration, RemovalPolicy, Tags } from 'aws-cdk-lib';
import { Alarm, TreatMissingData } from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export class ProductionUserTable extends Table {
  constructor(scope: Construct, id: string, props: {
    stage: string;
    enableStreams?: boolean;
    enableBackup?: boolean;
  }) {
    super(scope, id, {
      // Versioned table names for blue-green deployments
      tableName: `my-service-${props.stage}-users-v3`,
      partitionKey: {
        name: 'userId',
        type: AttributeType.STRING,
      },
      sortKey: {
        name: 'recordType',  // Enables single-table design patterns
        type: AttributeType.STRING,
      },
      billingMode: BillingMode.PAY_PER_REQUEST,  // No provisioning guesswork
      encryption: TableEncryption.AWS_MANAGED,
      // ALWAYS enable point-in-time recovery in production
      pointInTimeRecovery: props.stage === 'prod',
      // NEVER accidentally delete production data
      removalPolicy: props.stage === 'prod' ? RemovalPolicy.RETAIN : RemovalPolicy.DESTROY,
      // Streams enable real-time processing and audit trails
      stream: props.enableStreams ? StreamViewType.NEW_AND_OLD_IMAGES : undefined,
    });
    
    // GSI for email-based lookups (learned this is critical for auth)
    this.addGlobalSecondaryIndex({
      indexName: 'EmailLookupIndex',
      partitionKey: {
        name: 'email',
        type: AttributeType.STRING,
      },
      sortKey: {
        name: 'recordType',
        type: AttributeType.STRING,
      },
      projectionType: ProjectionType.KEYS_ONLY,  // Minimize costs
    });
    
    // GSI for time-based queries (user activity, reporting)
    this.addGlobalSecondaryIndex({
      indexName: 'TimeSeriesIndex',
      partitionKey: {
        name: 'entityType',
        type: AttributeType.STRING,
      },
      sortKey: {
        name: 'timestamp',
        type: AttributeType.STRING,
      },
      projectionType: ProjectionType.KEYS_ONLY,
    });
    
    // Production monitoring (learned the hard way)
    this.createProductionAlarms(props.stage);
    
    // Cost tracking tags
    Tags.of(this).add('Service', 'my-service');
    Tags.of(this).add('Stage', props.stage);
    Tags.of(this).add('CostCenter', 'platform');
    Tags.of(this).add('DataClassification', 'sensitive');
  }
  
  private createProductionAlarms(stage: string) {
    if (stage !== 'prod') return;
    
    // Throttle alarm - any throttling is bad
    // (Table's metric helpers emit the correct AWS/DynamoDB metrics and dimensions)
    new Alarm(this, 'ThrottleAlarm', {
      metric: this.metricThrottledRequestsForOperations({
        operations: [Operation.GET_ITEM, Operation.PUT_ITEM, Operation.QUERY, Operation.SCAN],
        period: Duration.minutes(5),
      }),
      threshold: 1,
      evaluationPeriods: 1,
      treatMissingData: TreatMissingData.NOT_BREACHING,
      alarmDescription: 'DynamoDB table is experiencing throttling',
    });
    
    // Error rate alarm
    new Alarm(this, 'ErrorRateAlarm', {
      metric: this.metricSystemErrorsForOperations({
        operations: [Operation.GET_ITEM, Operation.PUT_ITEM, Operation.QUERY, Operation.SCAN],
        period: Duration.minutes(5),
      }),
      threshold: 5,
      evaluationPeriods: 2,
      alarmDescription: 'DynamoDB table experiencing system errors',
    });
  }
}
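
Usage mirrors the import construct above (config is the shared configuration object used throughout this series):

TypeScript
const usersTable = new ProductionUserTable(this, 'UsersTable', {
  stage: config.stage,
  enableStreams: true,  // feeds the audit-trail processors
});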

The Zero-Downtime Data Migration#

Moving 180K user records and 4 years of transaction history without service interruption required a bulletproof migration strategy. Here's the pattern that worked:

TypeScript
// lib/constructs/production-table-migrator.ts
import { CustomResource, Duration } from 'aws-cdk-lib';
import { Provider } from 'aws-cdk-lib/custom-resources';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { RetentionDays } from 'aws-cdk-lib/aws-logs';
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

export class ProductionTableMigrator extends Construct {
  constructor(scope: Construct, id: string, props: {
    sourceTable: ITable;
    targetTable: ITable;
    batchSize?: number;
    enableDualWrite?: boolean;
  }) {
    super(scope, id);
    
    // One migration id per synth, shared by the env vars and the custom resource
    const migrationId = `migration-${Date.now()}`;
    
    // Migration function with production settings
    const migrationFn = new NodejsFunction(this, 'MigrationFunction', {
      entry: 'src/migrations/production-table-migrator.ts',
      handler: 'handler',
      timeout: Duration.minutes(15),
      memorySize: 3008,  // High memory also buys more CPU for faster copies
      reservedConcurrentExecutions: 5,  // Limit impact on other functions
      logRetention: RetentionDays.ONE_MONTH,  // Keep migration logs
      environment: {
        SOURCE_TABLE: props.sourceTable.tableName,
        TARGET_TABLE: props.targetTable.tableName,
        BATCH_SIZE: String(props.batchSize || 25),  // DynamoDB batch limit
        ENABLE_DUAL_WRITE: String(props.enableDualWrite || false),
        // Migration tracking
        MIGRATION_ID: migrationId,
      },
    });
    
    // Least-privilege permissions: scan the source, write and verify the target
    props.sourceTable.grantReadData(migrationFn);
    props.targetTable.grantReadWriteData(migrationFn);
    
    // Create custom resource with proper error handling
    const provider = new Provider(this, 'Provider', {
      onEventHandler: migrationFn,
      logRetention: RetentionDays.ONE_MONTH,
    });
    
    new CustomResource(this, 'DataMigration', {
      serviceToken: provider.serviceToken,
      properties: {
        SourceTable: props.sourceTable.tableName,
        TargetTable: props.targetTable.tableName,
        MigrationId: migrationId,
        // Force update only when tables change
        TableFingerprint: this.generateTableFingerprint(props),
      },
    });
  }
  
  private generateTableFingerprint(props: {
    sourceTable: ITable;
    targetTable: ITable;
  }): string {
    // Create unique fingerprint based on table properties
    return Buffer.from(
      `${props.sourceTable.tableName}-${props.targetTable.tableName}`
    ).toString('base64');
  }
}

// src/migrations/production-table-migrator.ts
import {
  DynamoDBClient,
  ScanCommand,
  BatchWriteItemCommand,
  DescribeTableCommand,
  WriteRequest,
} from '@aws-sdk/client-dynamodb';
// Items are copied in raw DynamoDB JSON, so no (un)marshalling is needed
import type { Context } from 'aws-lambda';

const client = new DynamoDBClient({
  maxAttempts: 5,  // Retry failed requests
  requestHandler: {
    connectionTimeout: 2000,
    requestTimeout: 30000,
  },
});

export const handler = async (event: any, context: Context) => {
  const { RequestType, ResourceProperties } = event;
  const { SourceTable, TargetTable, MigrationId } = ResourceProperties;
  
  console.log('Migration event:', { RequestType, SourceTable, TargetTable, MigrationId });
  
  try {
    if (RequestType === 'Create' || RequestType === 'Update') {
      await migrateTableData(SourceTable, TargetTable, MigrationId, context);
    }
    
    return {
      PhysicalResourceId: `migration-${SourceTable}-to-${TargetTable}`,
      Data: {
        Status: 'Success',
        MigrationId,
      },
    };
  } catch (error) {
    console.error('Migration failed:', error);
    throw error;  // Fail CloudFormation deployment
  }
};

async function migrateTableData(
  sourceTable: string,
  targetTable: string,
  migrationId: string,
  context: Context,
) {
  console.log(`Starting migration ${migrationId}: ${sourceTable} -> ${targetTable}`);
  
  // First, verify both tables exist and are active
  await verifyTableState(sourceTable);
  await verifyTableState(targetTable);
  
  let lastEvaluatedKey: Record<string, any> | undefined = undefined;
  let totalItems = 0;
  let batchCount = 0;
  const batchSize = parseInt(process.env.BATCH_SIZE || '25', 10);  // 25 is the BatchWriteItem limit
  
  do {
    // Scan the source table one batch at a time
    const scanResult = await client.send(new ScanCommand({
      TableName: sourceTable,
      Limit: batchSize,
      ExclusiveStartKey: lastEvaluatedKey,
    }));
    
    if (scanResult.Items && scanResult.Items.length > 0) {
      // Write the batch to the target table. BatchWriteItem is not
      // all-or-nothing: throttled writes come back in UnprocessedItems
      // and must be resubmitted, or records silently go missing
      let requestItems: Record<string, WriteRequest[]> | undefined = {
        [targetTable]: scanResult.Items.map(item => ({
          PutRequest: { Item: item },
        })),
      };
      
      while (requestItems && Object.keys(requestItems).length > 0) {
        const writeResult = await client.send(new BatchWriteItemCommand({
          RequestItems: requestItems,
        }));
        requestItems = writeResult.UnprocessedItems;
        if (requestItems && Object.keys(requestItems).length > 0) {
          await new Promise(resolve => setTimeout(resolve, 250));  // brief backoff
        }
      }
      
      totalItems += scanResult.Items.length;
      batchCount++;
      
      console.log(`Migrated batch ${batchCount}: ${scanResult.Items.length} items (total: ${totalItems})`);
    }
    
    lastEvaluatedKey = scanResult.LastEvaluatedKey;
    
    // Stop cleanly before the Lambda timeout (60-second safety margin)
    if (context.getRemainingTimeInMillis() < 60_000) {
      console.log('Approaching Lambda timeout, stopping migration');
      break;
    }
    
  } while (lastEvaluatedKey);
  
  console.log(`Migration completed: ${totalItems} items migrated in ${batchCount} batches`);
}

async function verifyTableState(tableName: string) {
  const result = await client.send(new DescribeTableCommand({
    TableName: tableName,
  }));
  
  if (result.Table?.TableStatus !== 'ACTIVE') {
    throw new Error(`Table ${tableName} is not ACTIVE (status: ${result.Table?.TableStatus})`);
  }
}

The Environment Variable Hell#

During Week 6, we discovered our production API was silently failing authentication. Debug logs showed random "undefined" values in JWT validation. The culprit? Our CDK migration had converted Serverless Framework's ${env:SECRET_KEY} references to literal strings.

Impact: 12 hours of intermittent authentication failures affecting 8,000+ users before we caught it in monitoring.

Root cause: Environment variable mismanagement during migration. Serverless Framework's string interpolation is deceptively different from CDK's environment handling.
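
Here's a minimal repro of the failure mode (names are illustrative). Serverless Framework v3 fails the deploy when ${env:SECRET_KEY} can't be resolved; the naive CDK equivalent resolves on whatever machine runs cdk synth, and an unset variable silently becomes the string "undefined":

TypeScript
// Hypothetical repro - do NOT do this
new NodejsFunction(this, 'AuthFunction', {
  entry: 'src/handlers/auth.ts',
  handler: 'validate',
  environment: {
    // If SECRET_KEY is unset at synth time, the deployed function config
    // gets the literal string "undefined" and JWT validation fails at runtime
    SECRET_KEY: String(process.env.SECRET_KEY),
  },
});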

Production-Grade Environment Management#

After this incident, we built a type-safe environment system that prevents silent failures:

TypeScript
// lib/config/production-environment.ts
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { ISecret } from 'aws-cdk-lib/aws-secretsmanager';

export interface ProductionEnvironmentVariables {
  // Core application config - NEVER undefined in production
  SERVICE_NAME: string;
  STAGE: string;
  REGION: string;
  VERSION: string;
  ENVIRONMENT: 'development' | 'staging' | 'production';
  
  // Feature flags with defaults
  ENABLE_CACHE: 'true' | 'false';
  ENABLE_DEBUG_LOGGING: 'true' | 'false';
  ENABLE_METRICS: 'true' | 'false';
  
  // Performance tuning
  CACHE_TTL_SECONDS: string;
  MAX_RETRY_ATTEMPTS: string;
  REQUEST_TIMEOUT_MS: string;
  
  // Database references (table names, never ARNs in env vars)
  USERS_TABLE: string;
  ORDERS_TABLE: string;
  AUDIT_LOG_TABLE: string;
  
  // Secret ARNs (actual secrets retrieved at runtime)
  JWT_SECRET_ARN: string;
  DATABASE_CREDENTIALS_ARN: string;
  THIRD_PARTY_API_KEYS_ARN: string;
  
  // External service configuration
  STRIPE_WEBHOOK_ENDPOINT: string;
  SENDGRID_FROM_EMAIL: string;
  
  // Monitoring and observability
  SENTRY_DSN?: string;
  DATADOG_API_KEY_ARN?: string;
  LOG_LEVEL: 'debug' | 'info' | 'warn' | 'error';
  
  // Business logic configuration
  MAX_FILE_UPLOAD_SIZE_MB: string;
  SESSION_TIMEOUT_MINUTES: string;
  RATE_LIMIT_PER_MINUTE: string;
}

export class ProductionEnvironmentBuilder {
  private vars: Partial<ProductionEnvironmentVariables> = {};
  private requiredVars: Set<keyof ProductionEnvironmentVariables> = new Set();
  
  constructor(private stage: string, private region: string, private version: string) {
    // Set core variables that are always required
    this.vars.STAGE = stage;
    this.vars.REGION = region;
    this.vars.VERSION = version;
    this.vars.ENVIRONMENT = this.mapStageToEnvironment(stage);
    
    // Mark core variables as required
    this.requiredVars.add('SERVICE_NAME');
    this.requiredVars.add('STAGE');
    this.requiredVars.add('REGION');
    this.requiredVars.add('VERSION');
  }
  
  private mapStageToEnvironment(stage: string): 'development' | 'staging' | 'production' {
    switch (stage) {
      case 'prod':
      case 'production':
        return 'production';
      case 'staging':
      case 'stage':
        return 'staging';
      default:
        return 'development';
    }
  }
  
  addServiceName(serviceName: string): this {
    this.vars.SERVICE_NAME = serviceName;
    return this;
  }
  
  addTable(key: keyof ProductionEnvironmentVariables, table: ITable): this {
    // Table names are plain strings; the cast keeps the literal-union
    // value types in the interface happy
    this.vars[key] = table.tableName as any;
    this.requiredVars.add(key);
    return this;
  }
  
  addSecret(key: keyof ProductionEnvironmentVariables, secret: ISecret): this {
    this.vars[key] = secret.secretArn as any;
    this.requiredVars.add(key);
    return this;
  }
  
  addFeatureFlag(key: keyof ProductionEnvironmentVariables, enabled: boolean): this {
    this.vars[key] = (enabled ? 'true' : 'false') as any;
    return this;
  }
  
  addConfig(config: Partial<ProductionEnvironmentVariables>): this {
    Object.assign(this.vars, config);
    return this;
  }
  
  markRequired(key: keyof ProductionEnvironmentVariables): this {
    this.requiredVars.add(key);
    return this;
  }
  
  build(): Record<string, string> {
    // Validate all required variables are present
    const missing = Array.from(this.requiredVars).filter(key => 
      this.vars[key] === undefined || this.vars[key] === ''
    );
    
    if (missing.length > 0) {
      throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
    }
    
    // Set stage-specific defaults
    const defaults = this.getStageDefaults();
    const merged = { ...defaults, ...this.vars };
    
    // Convert to string record, filtering undefined values
    return Object.entries(merged)
      .filter(([_, value]) => value !== undefined && value !== '')
      .reduce((acc, [key, value]) => ({
        ...acc,
        [key]: String(value),
      }), {});
  }
  
  private getStageDefaults(): Partial<ProductionEnvironmentVariables> {
    const isProd = this.vars.ENVIRONMENT === 'production';
    
    return {
      // Conservative defaults for production, aggressive for dev
      ENABLE_CACHE: isProd ? 'true' : 'false',
      ENABLE_DEBUG_LOGGING: isProd ? 'false' : 'true',
      ENABLE_METRICS: isProd ? 'true' : 'false',
      CACHE_TTL_SECONDS: isProd ? '300' : '60',
      MAX_RETRY_ATTEMPTS: isProd ? '3' : '1',
      REQUEST_TIMEOUT_MS: isProd ? '30000' : '10000',
      LOG_LEVEL: isProd ? 'info' : 'debug',
      MAX_FILE_UPLOAD_SIZE_MB: '10',
      SESSION_TIMEOUT_MINUTES: '60',
      RATE_LIMIT_PER_MINUTE: isProd ? '100' : '1000',
    };
  }
}

Using the Environment Builder#

TypeScript
// lib/stacks/api-stack.ts
const envBuilder = new ProductionEnvironmentBuilder(
  config.stage,
  config.region,
  config.version,  // e.g. a git SHA or package version
)
  .addServiceName('my-service')
  .addTable('USERS_TABLE', usersTable)
  .addTable('ORDERS_TABLE', ordersTable)
  .addFeatureFlag('ENABLE_CACHE', config.stage === 'prod')
  .addConfig({
    CACHE_TTL_SECONDS: '300',
  });

// Add to Lambda function
const createUserFn = new ServerlessFunction(this, 'CreateUserFunction', {
  entry: 'src/handlers/users.ts',
  handler: 'create',
  config,
  environment: envBuilder.build(),
});
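
The runtime counterpart matters just as much. Here's a sketch of the accessor our handlers use so a missing variable fails fast instead of propagating "undefined":

TypeScript
// src/libs/env.ts - a minimal sketch (the import path is illustrative)
import type { ProductionEnvironmentVariables } from '../../lib/config/production-environment';

export function requireEnv(key: keyof ProductionEnvironmentVariables): string {
  const value = process.env[key];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Usage in a handler:
// const usersTable = requireEnv('USERS_TABLE');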

The $23K Stripe API Key Incident#

Week 8. Our staging environment was accidentally pointed to production Stripe keys through a misconfigured environment variable. We processed 47 test transactions totaling $23,247 before catching the error.

Impact: Manual refund process, awkward customer communications, and a CFO meeting about "better controls around financial integrations."

Root cause: Secrets stored as plaintext environment variables with no environment-specific validation.

Bulletproof Secrets Management#

After paying a $23K lesson in secrets management, here's our production-grade approach:

TypeScript
// lib/constructs/secure-function.ts
import { Secret, ISecret } from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { ServerlessFunction, ServerlessFunctionProps } from './serverless-function';  // base construct from earlier in this series

export interface SecureFunctionProps extends ServerlessFunctionProps {
  secrets?: Record<string, ISecret>;
}

export class SecureFunction extends ServerlessFunction {
  constructor(scope: Construct, id: string, props: SecureFunctionProps) {
    const { secrets = {}, ...functionProps } = props;
    
    // Pass secret ARNs as environment variables
    const secretEnvVars = Object.entries(secrets).reduce(
      (acc, [key, secret]) => ({
        ...acc,
        [`${key}_SECRET_ARN`]: secret.secretArn,
      }),
      {}
    );
    
    super(scope, id, {
      ...functionProps,
      environment: {
        ...functionProps.environment,
        ...secretEnvVars,
      },
    });
    
    // Grant read permissions for all secrets
    Object.values(secrets).forEach(secret => {
      secret.grantRead(this);
    });
  }
}

// Usage
const apiKeySecret = new Secret(this, 'ApiKeySecret', {
  secretName: `/${config.stage}/my-service/api-keys`,
  generateSecretString: {
    secretStringTemplate: JSON.stringify({}),
    generateStringKey: 'sendgrid',
    excludeCharacters: ' %+~`#$&*()|[]{}:;<>?!\'/@"\\',
  },
});

const emailFunction = new SecureFunction(this, 'EmailFunction', {
  entry: 'src/handlers/email.ts',
  handler: 'send',
  config,
  secrets: {
    API_KEYS: apiKeySecret,
  },
});

Runtime Secret Access#

TypeScript
// src/libs/secrets.ts
import { 
  SecretsManagerClient, 
  GetSecretValueCommand 
} from '@aws-sdk/client-secrets-manager';

const client = new SecretsManagerClient({});
// Cached for the lifetime of the Lambda container; rotated secrets are
// picked up on the next cold start
const cache = new Map<string, any>();

export async function getSecret<T = any>(
  secretArn: string,
  jsonKey?: string
): Promise<T> {
  const cacheKey = `${secretArn}:${jsonKey || 'full'}`;
  
  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }
  
  try {
    const response = await client.send(
      new GetSecretValueCommand({ SecretId: secretArn })
    );
    
    const secret = JSON.parse(response.SecretString || '{}');
    const value = jsonKey ? secret[jsonKey] : secret;
    
    cache.set(cacheKey, value);
    return value;
  } catch (error) {
    console.error('Failed to retrieve secret:', error);
    throw new Error('Secret retrieval failed');
  }
}

// Usage in handler
import type { APIGatewayProxyEventV2 } from 'aws-lambda';

export const handler = async (event: APIGatewayProxyEventV2) => {
  const secretArn = process.env.API_KEYS_SECRET_ARN;
  if (!secretArn) {
    throw new Error('API_KEYS_SECRET_ARN is not configured');
  }
  const sendgridKey = await getSecret<string>(secretArn, 'sendgrid');
  
  // Use the secret
  await sendEmail(sendgridKey, event.body);
};

Parameter Store Integration#

For non-sensitive configuration:

TypeScript
// lib/constructs/parameter-store.ts
import { StringParameter, IParameter } from 'aws-cdk-lib/aws-ssm';
import { IGrantable } from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class ServiceParameters extends Construct {
  public readonly configs: Map<string, IParameter> = new Map();
  
  constructor(scope: Construct, id: string, props: {
    service: string;
    stage: string;
    parameters: Record<string, string>;
  }) {
    super(scope, id);
    
    // Create parameters
    Object.entries(props.parameters).forEach(([key, value]) => {
      const param = new StringParameter(this, key, {
        parameterName: `/${props.service}/${props.stage}/${key}`,
        stringValue: value,
        description: `${key} for ${props.service} ${props.stage}`,
      });
      
      this.configs.set(key, param);
    });
  }
  
  grantRead(grantable: IGrantable) {
    this.configs.forEach(param => {
      param.grantRead(grantable);
    });
  }
  
  toEnvironment(): Record<string, string> {
    const env: Record<string, string> = {};
    this.configs.forEach((param, key) => {
      env[`${key}_PARAM`] = param.parameterName;
    });
    return env;
  }
}
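
At runtime, handlers read the parameter name from the environment and fetch the value through SSM. A sketch (the caching mirrors the secrets helper above):

TypeScript
// src/libs/parameters.ts - a minimal sketch
import { SSMClient, GetParameterCommand } from '@aws-sdk/client-ssm';

const ssm = new SSMClient({});
const paramCache = new Map<string, string>();

export async function getParameter(name: string): Promise<string> {
  const cached = paramCache.get(name);
  if (cached !== undefined) return cached;
  
  const result = await ssm.send(new GetParameterCommand({ Name: name }));
  const value = result.Parameter?.Value;
  if (value === undefined) {
    throw new Error(`Parameter ${name} has no value`);
  }
  paramCache.set(name, value);
  return value;
}

// Usage in a handler, e.g. for a key created as parameters: { FEATURE_CONFIG: '...' }:
// const value = await getParameter(process.env.FEATURE_CONFIG_PARAM!);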

VPC Configuration for RDS/ElastiCache#

Creating VPC-Enabled Lambda Functions#

TypeScript
// lib/constructs/vpc-config.ts
import {
  Vpc,
  SubnetType,
  SecurityGroup,
  Port,
  InstanceType,
  InstanceClass,
  InstanceSize,
} from 'aws-cdk-lib/aws-ec2';
import {
  DatabaseInstance,
  DatabaseInstanceEngine,
  PostgresEngineVersion,
} from 'aws-cdk-lib/aws-rds';
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class VpcResources extends Construct {
  public readonly vpc: Vpc;
  public readonly lambdaSecurityGroup: SecurityGroup;
  public readonly databaseSecurityGroup: SecurityGroup;
  public readonly database?: DatabaseInstance;
  
  constructor(scope: Construct, id: string, props: {
    stage: string;
    enableDatabase?: boolean;
  }) {
    super(scope, id);
    
    // Create VPC
    this.vpc = new Vpc(this, 'Vpc', {
      vpcName: `my-service-${props.stage}`,
      maxAzs: 2,
      natGateways: props.stage === 'prod' ? 2 : 1,
      subnetConfiguration: [
        {
          name: 'Public',
          subnetType: SubnetType.PUBLIC,
          cidrMask: 24,
        },
        {
          name: 'Private',
          subnetType: SubnetType.PRIVATE_WITH_EGRESS,
          cidrMask: 24,
        },
        {
          name: 'Isolated',
          subnetType: SubnetType.PRIVATE_ISOLATED,
          cidrMask: 24,
        },
      ],
    });
    
    // Security groups
    this.lambdaSecurityGroup = new SecurityGroup(this, 'LambdaSG', {
      vpc: this.vpc,
      description: 'Security group for Lambda functions',
      allowAllOutbound: true,
    });
    
    this.databaseSecurityGroup = new SecurityGroup(this, 'DatabaseSG', {
      vpc: this.vpc,
      description: 'Security group for RDS database',
      allowAllOutbound: false,
    });
    
    // Allow Lambda to connect to database
    this.databaseSecurityGroup.addIngressRule(
      this.lambdaSecurityGroup,
      Port.tcp(5432),
      'Allow Lambda functions'
    );
    
    if (props.enableDatabase) {
      this.createDatabase(props.stage);
    }
  }
  
  private createDatabase(stage: string) {
    this.database = new DatabaseInstance(this, 'Database', {
      databaseName: 'myservice',
      engine: DatabaseInstanceEngine.postgres({
        version: PostgresEngineVersion.VER_15_2,
      }),
      vpc: this.vpc,
      vpcSubnets: {
        subnetType: SubnetType.PRIVATE_ISOLATED,
      },
      securityGroups: [this.databaseSecurityGroup],
      allocatedStorage: stage === 'prod' ? 100 : 20,
      instanceType: InstanceType.of(
        InstanceClass.T3,
        stage === 'prod' ? InstanceSize.MEDIUM : InstanceSize.MICRO
      ),
      multiAz: stage === 'prod',
      deletionProtection: stage === 'prod',
      backupRetention: Duration.days(stage === 'prod' ? 30 : 7),
    });
  }
}

VPC-Enabled Lambda Function#

TypeScript
// lib/constructs/vpc-lambda.ts
import { SubnetType } from 'aws-cdk-lib/aws-ec2';
import { ISecret } from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { ServerlessFunction, ServerlessFunctionProps } from './serverless-function';
import { VpcResources } from './vpc-config';

export class VpcLambdaFunction extends ServerlessFunction {
  constructor(scope: Construct, id: string, props: ServerlessFunctionProps & {
    vpcResources: VpcResources;
    databaseSecret?: ISecret;
  }) {
    const { vpcResources, databaseSecret, ...functionProps } = props;
    
    super(scope, id, {
      ...functionProps,
      vpc: vpcResources.vpc,
      vpcSubnets: {
        subnetType: SubnetType.PRIVATE_WITH_EGRESS,
      },
      securityGroups: [vpcResources.lambdaSecurityGroup],
      environment: {
        ...functionProps.environment,
        ...(databaseSecret && {
          DB_SECRET_ARN: databaseSecret.secretArn,
        }),
      },
    });
    
    // Grant database access
    if (databaseSecret) {
      databaseSecret.grantRead(this);
    }
  }
}
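
Wiring it together looks like this (a sketch: the handler names are illustrative, and DatabaseInstance exposes its generated credentials secret as .secret):

TypeScript
const vpcResources = new VpcResources(this, 'VpcResources', {
  stage: config.stage,
  enableDatabase: true,
});

const reportsFn = new VpcLambdaFunction(this, 'ReportsFunction', {
  entry: 'src/handlers/reports.ts',
  handler: 'generate',
  config,
  vpcResources,
  databaseSecret: vpcResources.database?.secret,  // created by DatabaseInstance
});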

Database Connection Management#

TypeScript
// src/libs/database.ts
import { Client } from 'pg';
import { getSecret } from './secrets';

let client: Client | null = null;

export async function getDbClient(): Promise<Client> {
  if (client) {
    return client;
  }
  
  const secretArn = process.env.DB_SECRET_ARN;
  if (!secretArn) {
    throw new Error('Database secret not configured');
  }
  
  const credentials = await getSecret<{
    username: string;
    password: string;
    host: string;
    port: number;
    dbname: string;
  }>(secretArn);
  
  const newClient = new Client({
    user: credentials.username,
    password: credentials.password,
    host: credentials.host,
    port: credentials.port,
    database: credentials.dbname,
    ssl: {
      // Consider pinning the RDS CA bundle instead of disabling verification
      rejectUnauthorized: false,
    },
    connectionTimeoutMillis: 10000,
  });
  
  await newClient.connect();
  
  // pg doesn't expose a public "ended" flag, so drop the cached client on
  // error and let the next invocation reconnect
  newClient.on('error', () => {
    client = null;
  });
  
  client = newClient;
  return client;
}

// Clean up on Lambda container shutdown
process.on('SIGTERM', async () => {
  if (client) {
    await client.end();
    client = null;
  }
});

Backup and Disaster Recovery#

Automated DynamoDB Backups#

TypeScript
// lib/constructs/backup-plan.ts
import { BackupPlan, BackupPlanRule, BackupResource } from 'aws-cdk-lib/aws-backup';
import { Schedule } from 'aws-cdk-lib/aws-events';
import { Duration } from 'aws-cdk-lib';
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

export class TableBackupPlan extends Construct {
  constructor(scope: Construct, id: string, props: {
    tables: ITable[];
    stage: string;
  }) {
    super(scope, id);
    
    const plan = new BackupPlan(this, 'BackupPlan', {
      backupPlanName: `my-service-${props.stage}-backup`,
      backupPlanRules: [
        new BackupPlanRule({
          ruleName: 'DailyBackups',
          scheduleExpression: Schedule.cron({
            hour: '3',
            minute: '0',
          }),
          startWindow: Duration.hours(1),
          completionWindow: Duration.hours(2),
          deleteAfter: Duration.days(
            props.stage === 'prod' ? 30 : 7
          ),
        }),
      ],
    });
    
    plan.addSelection('TableSelection', {
      resources: props.tables.map(table => 
        BackupResource.fromDynamoDbTable(table)
      ),
    });
  }
}
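
Hooking the plan up in the stateful stack is then a few lines (the table variables are illustrative):

TypeScript
new TableBackupPlan(this, 'TableBackups', {
  tables: [usersTable, ordersTable, auditLogTable],
  stage: config.stage,
});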

Migration Best Practices#

1. Stateful Resource Strategy#

TypeScript
// lib/stacks/stateful-stack.ts
import { Stack, StackProps, CfnOutput } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class StatefulStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, {
      ...props,
      // Prevent accidental deletion
      terminationProtection: true,
    });
    
    // All stateful resources live in one stack; the create* helpers wrap
    // the table, secret, and parameter constructs shown earlier
    const tables = this.createTables();
    const secrets = this.createSecrets();
    const parameters = this.createParameters();
    
    // Export for use in other stacks
    tables.forEach((table, name) => {
      new CfnOutput(this, `${name}TableName`, {
        value: table.tableName,
        exportName: `${this.stackName}-${name}TableName`,
      });
    });
  }
}
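
A stateless stack can then resolve those exports (a sketch; when both stacks live in the same CDK app, passing the table object directly is simpler and lets CDK wire the export for you):

TypeScript
import { Fn } from 'aws-cdk-lib';
import { Table } from 'aws-cdk-lib/aws-dynamodb';

// Export name is illustrative - it follows the `${stackName}-${name}TableName`
// pattern from the stack above
const usersTable = Table.fromTableName(
  this,
  'UsersTable',
  Fn.importValue('StatefulStack-prod-UsersTableName'),
);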

2. Zero-Downtime Migration Checklist#

  • Import existing tables using fromTableAttributes
  • Test permissions with imported resources
  • Implement dual-write pattern if changing table schema (see the sketch after this list)
  • Use Lambda environment variables for gradual rollout
  • Set up CloudWatch alarms before switching
  • Implement circuit breakers for external services
  • Test rollback procedures
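
The dual-write item deserves its own sketch. During cutover, every write lands in both tables behind a flag while reads stay on the old table; the table env vars here are assumptions based on the patterns above:

TypeScript
// src/libs/dual-write.ts - a minimal sketch of the cutover-period write path
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

export async function putUser(item: Record<string, any>) {
  // The old table stays the source of truth until cutover completes
  await client.send(new PutItemCommand({
    TableName: process.env.USERS_TABLE!,
    Item: item,
  }));
  
  // Mirror writes to the new table behind a flag so rollback is instant
  if (process.env.ENABLE_DUAL_WRITE === 'true') {
    await client.send(new PutItemCommand({
      TableName: process.env.USERS_TABLE_V3!,  // hypothetical new-table variable
      Item: item,
    }));
  }
}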

Hard-Learned Lessons from Production#

After 6 months managing stateful infrastructure in CDK, here are the lessons that saved us from disasters:

1. Always Test Imports in Staging First#

Cost of learning: 4 hours of downtime, emergency recovery, uncomfortable meetings. Prevention: Never run cdk deploy against production tables without rehearsing in staging with identical data.

2. Environment Variables Are Not Configuration#

Cost of learning: 12 hours of authentication failures affecting 8K+ users. Prevention: Type-safe environment builders with validation and required field checks.

3. Secrets Need Environment-Specific Validation#

Cost of learning: $23K in accidental production charges. Prevention: Environment-aware secret validation that prevents cross-environment contamination.
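
Concretely, that validation is a cold-start assertion along these lines (illustrative; it works because our secret names embed the stage):

TypeScript
// Runs at module load, before any Stripe call
function assertSecretMatchesStage(secretArn: string, stage: string) {
  // Secret names embed the stage (e.g. /prod/my-service/api-keys), so a
  // staging function holding a prod ARN fails fast instead of charging
  // real cards
  if (!secretArn.includes(`/${stage}/`)) {
    throw new Error(`Secret ${secretArn} does not belong to stage ${stage}`);
  }
}

assertSecretMatchesStage(process.env.API_KEYS_SECRET_ARN!, process.env.STAGE!);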

4. Data Migration Needs Monitoring#

Cost of learning: 3 migration attempts, 180K records at risk. Prevention: Comprehensive logging, progress tracking, and timeout handling in migration functions.

5. VPC Lambda Functions Are Different#

Cost of learning: 15-second cold starts, connection pool exhaustion. Prevention: Proper connection management, security group configuration, and subnet planning.

Migration Results#

Before CDK:

  • Manual environment management
  • Plaintext secrets in YAML
  • No table import validation
  • Migration scripts run locally
  • Zero disaster recovery testing

After CDK:

  • Type-safe environment configuration with validation
  • Encrypted secrets with environment isolation
  • Production-safe table imports with verification
  • Automated, monitored data migrations
  • Comprehensive backup and monitoring

Metrics:

  • 23 Lambda functions migrated
  • 3 production DynamoDB tables (180K+ records)
  • 47 environment variables made type-safe
  • 12 secrets properly encrypted
  • Zero data loss (eventually)

What's Next#

Your data layer is battle-tested with proper environment management and security. Stateful resources are protected, secrets are encrypted, and disaster recovery is automated.

In Part 5, we'll implement authentication and authorization:

  • Cognito user pools with production constraints
  • API Gateway authorizers that actually work
  • IAM roles that follow least privilege
  • JWT token validation that doesn't break
  • Fine-grained permissions without complexity explosion

The foundation survived production. Let's secure it properly.
