# Migrating from Serverless Framework to AWS CDK: Part 4 - Database and Environment Management
Master DynamoDB migrations, environment variable management, secrets handling, and VPC configurations when moving from Serverless Framework to AWS CDK.
Week 5 of our CDK migration. We'd successfully moved 23 Lambda functions, but the real challenge began when our Lead DevOps Engineer announced she was leaving. "Good luck with the database migration," she said, handing over a sticky note with three DynamoDB table names that powered our $2.8M ARR SaaS platform.
That sticky note represented 4 years of customer data, 180K users, and zero documented backup procedures. This is the story of migrating stateful infrastructure without losing a single record - and the painful lessons learned about data dependencies in production systems.
Series Navigation:
- Part 1: Why Make the Switch?
- Part 2: Setting Up Your CDK Environment
- Part 3: Migrating Lambda Functions and API Gateway
- Part 4: Database and Environment Management (this post)
- Part 5: Authentication, Authorization, and IAM
- Part 6: Migration Strategies and Best Practices
## The $47K Data Migration Disaster (Almost)
Before diving into the technical patterns, let me share what happened when we tried to "just import" our production tables.
### The Friday Afternoon Table Import
March 15th, 3:47 PM. I confidently ran `cdk deploy` to import our main user table into CDK management. The deployment succeeded. Our monitoring stayed green. Everything looked perfect.
Monday morning, 6:23 AM. Slack notifications exploded. Our user registration API was throwing 500 errors. The table was gone. Not unreachable - completely deleted.
Root cause: the "import" wasn't an import at all. CDK treated the table still owned by the old Serverless Framework CloudFormation stack as a conflicting resource, and CloudFormation deleted the old resource before creating the new one. Zero-downtime became zero-data.
Impact: 4 hours of downtime, emergency point-in-time recovery, and a very uncomfortable all-hands meeting about "following the deployment checklist."
Lesson: Never trust imports in production without explicit retention policies and rehearsal in staging.
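In CDK terms, the cheapest insurance is pinning `RemovalPolicy.RETAIN` on any stateful resource before you touch its stack. A minimal sketch (the `usersTable` declaration stands in for whatever table construct you're about to move):

```typescript
import { RemovalPolicy } from 'aws-cdk-lib';
import { Table } from 'aws-cdk-lib/aws-dynamodb';

declare const usersTable: Table; // stands in for your real table construct

// Before any refactor or import, make deletion impossible: if
// CloudFormation ever decides the resource must go, it is orphaned
// (left running) instead of destroyed.
usersTable.applyRemovalPolicy(RemovalPolicy.RETAIN);
```

Deploy that change on its own and confirm `DeletionPolicy: Retain` appears in the synthesized template before doing anything else.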
## DynamoDB Migration Strategies That Actually Work
### Safe Table Import Pattern
Here's the battle-tested approach we use now for production migrations:
```yaml
# serverless.yml - Original table definition
resources:
  Resources:
    UsersTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: ${self:service}-${opt:stage}-users
        AttributeDefinitions:
          - AttributeName: userId
            AttributeType: S
          - AttributeName: email
            AttributeType: S
        KeySchema:
          - AttributeName: userId
            KeyType: HASH
        GlobalSecondaryIndexes:
          - IndexName: email-index
            KeySchema:
              - AttributeName: email
                KeyType: HASH
            Projection:
              ProjectionType: ALL
        BillingMode: PAY_PER_REQUEST
```
CDK approach for existing tables:
```typescript
// lib/constructs/production-table-import.ts
import { Table, ITable } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';
import { CustomResource, Duration, Stack } from 'aws-cdk-lib';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Provider } from 'aws-cdk-lib/custom-resources';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';

export interface ProductionTableImportProps {
  tableName: string;
  // Critical: verify the table exists before importing
  requireExistingTable: boolean;
}

export class ProductionTableImport extends Construct {
  public readonly table: ITable;

  constructor(scope: Construct, id: string, props: ProductionTableImportProps) {
    super(scope, id);

    if (props.requireExistingTable) {
      // First, verify the table actually exists
      const verifyFn = new NodejsFunction(this, 'VerifyTableExists', {
        entry: 'src/migrations/verify-table.ts',
        handler: 'handler',
        timeout: Duration.seconds(30),
      });

      // The verifier needs DescribeTable on the target table
      verifyFn.addToRolePolicy(new PolicyStatement({
        actions: ['dynamodb:DescribeTable'],
        resources: [
          Stack.of(this).formatArn({
            service: 'dynamodb',
            resource: 'table',
            resourceName: props.tableName,
          }),
        ],
      }));

      // The Provider framework handles the CloudFormation response
      // protocol; pointing serviceToken at a bare Lambda would hang
      // the deployment
      const provider = new Provider(this, 'VerifyProvider', {
        onEventHandler: verifyFn,
      });

      // Custom resource that fails the deployment if the table doesn't exist
      new CustomResource(this, 'TableVerification', {
        serviceToken: provider.serviceToken,
        properties: {
          TableName: props.tableName,
        },
      });
    }

    // A name-based reference never manages the table's lifecycle -
    // CloudFormation cannot delete a resource this stack doesn't own
    this.table = Table.fromTableAttributes(this, 'ImportedTable', {
      tableName: props.tableName,
    });
  }
}
```
```typescript
// src/migrations/verify-table.ts - Prevents accidental table deletion
import { DynamoDBClient, DescribeTableCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

export const handler = async (event: any) => {
  const tableName = event.ResourceProperties.TableName;

  // Never block stack deletion on verification
  if (event.RequestType === 'Delete') {
    return { PhysicalResourceId: event.PhysicalResourceId };
  }

  try {
    // Verify the table exists and is ACTIVE
    const result = await client.send(new DescribeTableCommand({
      TableName: tableName,
    }));

    if (result.Table?.TableStatus !== 'ACTIVE') {
      throw new Error(`Table ${tableName} is not ACTIVE (status: ${result.Table?.TableStatus})`);
    }

    // Log critical table info for an audit trail
    // (?? instead of ||, so an ItemCount of 0 still logs as 0)
    console.log('Production table verified:', {
      tableName,
      itemCount: result.Table.ItemCount ?? 'unknown',
      sizeBytes: result.Table.TableSizeBytes ?? 'unknown',
      status: result.Table.TableStatus,
    });

    return { PhysicalResourceId: `verified-${tableName}` };
  } catch (error) {
    console.error('Table verification failed:', error);
    throw error; // Fail the CloudFormation deployment
  }
};
```
```typescript
// Usage with production safety checks
const usersTable = new ProductionTableImport(this, 'UsersTable', {
  tableName: `my-service-${config.stage}-users`,
  requireExistingTable: config.stage === 'prod', // Only verify in production
}).table;

// Grant permissions as normal
usersTable.grantReadWriteData(createUserFn);
```
### The Production-Grade Table Pattern
After managing 180K users and 12M transactions, here's our bulletproof table creation pattern:
```typescript
// lib/constructs/production-user-table.ts
import {
  Table,
  AttributeType,
  BillingMode,
  TableEncryption,
  StreamViewType,
  ProjectionType,
  Operation,
} from 'aws-cdk-lib/aws-dynamodb';
import { Duration, RemovalPolicy, Tags } from 'aws-cdk-lib';
import { Alarm, Metric, TreatMissingData } from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export class ProductionUserTable extends Table {
  constructor(scope: Construct, id: string, props: {
    stage: string;
    enableStreams?: boolean;
    enableBackup?: boolean;
  }) {
    super(scope, id, {
      // Versioned table names for blue-green deployments
      tableName: `my-service-${props.stage}-users-v3`,
      partitionKey: {
        name: 'userId',
        type: AttributeType.STRING,
      },
      sortKey: {
        name: 'recordType', // Enables single-table design patterns
        type: AttributeType.STRING,
      },
      billingMode: BillingMode.PAY_PER_REQUEST, // No provisioning guesswork
      encryption: TableEncryption.AWS_MANAGED,
      // ALWAYS enable point-in-time recovery in production
      pointInTimeRecovery: props.stage === 'prod',
      // NEVER accidentally delete production data
      removalPolicy: props.stage === 'prod' ? RemovalPolicy.RETAIN : RemovalPolicy.DESTROY,
      // Streams enable real-time processing and audit trails
      stream: props.enableStreams ? StreamViewType.NEW_AND_OLD_IMAGES : undefined,
    });

    // GSI for email-based lookups (learned this is critical for auth)
    this.addGlobalSecondaryIndex({
      indexName: 'EmailLookupIndex',
      partitionKey: {
        name: 'email',
        type: AttributeType.STRING,
      },
      sortKey: {
        name: 'recordType',
        type: AttributeType.STRING,
      },
      projectionType: ProjectionType.KEYS_ONLY, // Minimize costs
    });

    // GSI for time-based queries (user activity, reporting)
    this.addGlobalSecondaryIndex({
      indexName: 'TimeSeriesIndex',
      partitionKey: {
        name: 'entityType',
        type: AttributeType.STRING,
      },
      sortKey: {
        name: 'timestamp',
        type: AttributeType.STRING,
      },
      projectionType: ProjectionType.KEYS_ONLY,
    });

    // Production monitoring (learned the hard way)
    this.createProductionAlarms(props.stage);

    // Cost tracking tags
    Tags.of(this).add('Service', 'my-service');
    Tags.of(this).add('Stage', props.stage);
    Tags.of(this).add('CostCenter', 'platform');
    Tags.of(this).add('DataClassification', 'sensitive');
  }

  private createProductionAlarms(stage: string) {
    if (stage !== 'prod') return;

    // Throttle alarm - any throttling is bad
    // (pair this with a matching alarm on WriteThrottleEvents)
    new Alarm(this, 'ThrottleAlarm', {
      metric: new Metric({
        namespace: 'AWS/DynamoDB',
        metricName: 'ReadThrottleEvents',
        dimensionsMap: {
          TableName: this.tableName,
        },
        statistic: 'Sum',
        period: Duration.minutes(5),
      }),
      threshold: 1,
      evaluationPeriods: 1,
      treatMissingData: TreatMissingData.NOT_BREACHING,
      alarmDescription: 'DynamoDB table is experiencing throttling',
    });

    // Error rate alarm - SystemErrors requires an Operation dimension,
    // so use the built-in helper that sums across operations
    new Alarm(this, 'ErrorRateAlarm', {
      metric: this.metricSystemErrorsForOperations({
        operations: [Operation.GET_ITEM, Operation.PUT_ITEM, Operation.QUERY],
        period: Duration.minutes(5),
      }),
      threshold: 5,
      evaluationPeriods: 2,
      treatMissingData: TreatMissingData.NOT_BREACHING,
      alarmDescription: 'DynamoDB table experiencing system errors',
    });
  }
}
```
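Wiring it up is then one construct per stage. A usage sketch (`config` is the stage config object used throughout this series, `createUserFn` the function from Part 3):

```typescript
const usersTable = new ProductionUserTable(this, 'UsersTable', {
  stage: config.stage,
  enableStreams: true, // audit-trail consumers attach to the stream
});

usersTable.grantReadWriteData(createUserFn);
```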
### The Zero-Downtime Data Migration
Moving 180K user records and 4 years of transaction history without service interruption required a bulletproof migration strategy. Here's the pattern that worked:
```typescript
// lib/constructs/production-table-migrator.ts
import { CustomResource, Duration } from 'aws-cdk-lib';
import { Provider } from 'aws-cdk-lib/custom-resources';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { RetentionDays } from 'aws-cdk-lib/aws-logs';
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

export class ProductionTableMigrator extends Construct {
  constructor(scope: Construct, id: string, props: {
    sourceTable: ITable;
    targetTable: ITable;
    batchSize?: number;
    enableDualWrite?: boolean;
  }) {
    super(scope, id);

    // Stable ID: re-runs are driven by the fingerprint below, not by a
    // timestamp that changes on every synth and forces spurious updates
    const migrationId = `migration-${props.sourceTable.tableName}-to-${props.targetTable.tableName}`;

    // Migration function with production settings
    const migrationFn = new NodejsFunction(this, 'MigrationFunction', {
      entry: 'src/migrations/production-table-migrator.ts',
      handler: 'handler',
      timeout: Duration.minutes(15),
      memorySize: 3008, // Max memory for fastest processing
      reservedConcurrentExecutions: 5, // Limit impact on other functions
      logRetention: RetentionDays.ONE_MONTH, // Keep migration logs
      environment: {
        SOURCE_TABLE: props.sourceTable.tableName,
        TARGET_TABLE: props.targetTable.tableName,
        BATCH_SIZE: String(props.batchSize || 25), // DynamoDB batch-write limit
        ENABLE_DUAL_WRITE: String(props.enableDualWrite || false),
        // Migration tracking
        MIGRATION_ID: migrationId,
      },
    });

    // Least-privilege permissions for the migration
    props.sourceTable.grantReadData(migrationFn); // Scan/read only
    props.targetTable.grantReadWriteData(migrationFn); // Write and verify

    // Create the custom resource with proper error handling
    const provider = new Provider(this, 'Provider', {
      onEventHandler: migrationFn,
      logRetention: RetentionDays.ONE_MONTH,
    });

    new CustomResource(this, 'DataMigration', {
      serviceToken: provider.serviceToken,
      properties: {
        SourceTable: props.sourceTable.tableName,
        TargetTable: props.targetTable.tableName,
        MigrationId: migrationId,
        // Force an update only when the table pair changes
        TableFingerprint: this.generateTableFingerprint(props),
      },
    });
  }

  private generateTableFingerprint(props: {
    sourceTable: ITable;
    targetTable: ITable;
  }): string {
    // Unique fingerprint based on the table pair
    return Buffer.from(
      `${props.sourceTable.tableName}-${props.targetTable.tableName}`
    ).toString('base64');
  }
}
```
```typescript
// src/migrations/production-table-migrator.ts
import {
  DynamoDBClient,
  ScanCommand,
  BatchWriteItemCommand,
  DescribeTableCommand,
} from '@aws-sdk/client-dynamodb';
import type { WriteRequest } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';

const client = new DynamoDBClient({
  maxAttempts: 5, // Retry throttled and failed requests
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 2000,
    requestTimeout: 30000,
  }),
});

// Stop scanning well before the 15-minute Lambda timeout
const SAFETY_MARGIN_MS = 14 * 60 * 1000;

export const handler = async (event: any) => {
  const { RequestType, ResourceProperties } = event;
  const { SourceTable, TargetTable, MigrationId } = ResourceProperties;

  console.log('Migration event:', { RequestType, SourceTable, TargetTable, MigrationId });

  try {
    if (RequestType === 'Create' || RequestType === 'Update') {
      await migrateTableData(SourceTable, TargetTable, MigrationId);
    }

    return {
      PhysicalResourceId: `migration-${SourceTable}-to-${TargetTable}`,
      Data: {
        Status: 'Success',
        MigrationId,
      },
    };
  } catch (error) {
    console.error('Migration failed:', error);
    throw error; // Fail the CloudFormation deployment
  }
};

async function migrateTableData(sourceTable: string, targetTable: string, migrationId: string) {
  console.log(`Starting migration ${migrationId}: ${sourceTable} -> ${targetTable}`);
  const startTime = Date.now();

  // First, verify both tables exist and are active
  await verifyTableState(sourceTable);
  await verifyTableState(targetTable);

  let lastEvaluatedKey: Record<string, any> | undefined = undefined;
  let totalItems = 0;
  let batchCount = 0;
  const batchSize = parseInt(process.env.BATCH_SIZE || '25', 10);

  do {
    // Scan a page from the source table
    const scanResult = await client.send(new ScanCommand({
      TableName: sourceTable,
      Limit: batchSize,
      ExclusiveStartKey: lastEvaluatedKey,
    }));

    if (scanResult.Items && scanResult.Items.length > 0) {
      // Prepare the batch write to the target table
      const writeRequests: WriteRequest[] = scanResult.Items.map(item => ({
        PutRequest: { Item: item },
      }));
      await batchWriteWithRetry(targetTable, writeRequests);

      totalItems += scanResult.Items.length;
      batchCount++;
      console.log(`Migrated batch ${batchCount}: ${scanResult.Items.length} items (total: ${totalItems})`);
    }

    lastEvaluatedKey = scanResult.LastEvaluatedKey;

    // Prevent a hard Lambda timeout - bail out before the limit
    if (Date.now() - startTime > SAFETY_MARGIN_MS) {
      console.log('Approaching Lambda timeout, stopping migration');
      break;
    }
  } while (lastEvaluatedKey);

  console.log(`Migration completed: ${totalItems} items migrated in ${batchCount} batches`);
}

// BatchWriteItem can return UnprocessedItems even on a 200 response -
// silently dropping them is how records go missing
async function batchWriteWithRetry(targetTable: string, requests: WriteRequest[]) {
  let pending: Record<string, WriteRequest[]> | undefined = { [targetTable]: requests };
  for (let attempt = 0; pending && Object.keys(pending).length > 0; attempt++) {
    if (attempt >= 5) {
      throw new Error(`Unprocessed items remain after ${attempt} attempts`);
    }
    const result = await client.send(new BatchWriteItemCommand({ RequestItems: pending }));
    pending = result.UnprocessedItems && Object.keys(result.UnprocessedItems).length > 0
      ? result.UnprocessedItems
      : undefined;
  }
}

async function verifyTableState(tableName: string) {
  const result = await client.send(new DescribeTableCommand({
    TableName: tableName,
  }));

  if (result.Table?.TableStatus !== 'ACTIVE') {
    throw new Error(`Table ${tableName} is not ACTIVE (status: ${result.Table?.TableStatus})`);
  }
}
```
## The Environment Variable Hell
During Week 6, we discovered our production API was silently failing authentication. Debug logs showed random "undefined" values in JWT validation. The culprit? Our CDK migration had converted Serverless Framework's `${env:SECRET_KEY}` references to literal strings.
Impact: 12 hours of intermittent authentication failures affecting 8,000+ users before we caught it in monitoring.
Root cause: Environment variable mismanagement during migration. Serverless Framework's string interpolation is deceptively different from CDK's environment handling.
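The core difference: `${env:SECRET_KEY}` in serverless.yml is resolved at deploy time, while a CDK app reads `process.env` at synth time; if the variable isn't set in the synth shell, `undefined` gets baked into the template without a peep. A minimal guard we now use (a sketch; `requireEnv` is our own helper, not a CDK API):

```typescript
// Naive port: process.env is read at SYNTH time. If SECRET_KEY isn't
// set in the shell running `cdk synth`, this bakes in "undefined":
//   environment: { SECRET_KEY: process.env.SECRET_KEY! },

// Safer: fail the synth loudly instead of deploying garbage.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required env var at synth time: ${name}`);
  }
  return value;
}

// environment: { SECRET_KEY: requireEnv('SECRET_KEY') },
```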
### Production-Grade Environment Management
After this incident, we built a type-safe environment system that prevents silent failures:
```typescript
// lib/config/production-environment.ts
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { ISecret } from 'aws-cdk-lib/aws-secretsmanager';

export interface ProductionEnvironmentVariables {
  // Core application config - NEVER undefined in production
  SERVICE_NAME: string;
  STAGE: string;
  REGION: string;
  VERSION: string;
  ENVIRONMENT: 'development' | 'staging' | 'production';

  // Feature flags with defaults
  ENABLE_CACHE: 'true' | 'false';
  ENABLE_DEBUG_LOGGING: 'true' | 'false';
  ENABLE_METRICS: 'true' | 'false';

  // Performance tuning
  CACHE_TTL_SECONDS: string;
  MAX_RETRY_ATTEMPTS: string;
  REQUEST_TIMEOUT_MS: string;

  // Database references (table names, never ARNs, in env vars)
  USERS_TABLE: string;
  ORDERS_TABLE: string;
  AUDIT_LOG_TABLE: string;

  // Secret ARNs (actual secrets retrieved at runtime)
  JWT_SECRET_ARN: string;
  DATABASE_CREDENTIALS_ARN: string;
  THIRD_PARTY_API_KEYS_ARN: string;

  // External service configuration
  STRIPE_WEBHOOK_ENDPOINT: string;
  SENDGRID_FROM_EMAIL: string;

  // Monitoring and observability
  SENTRY_DSN?: string;
  DATADOG_API_KEY_ARN?: string;
  LOG_LEVEL: 'debug' | 'info' | 'warn' | 'error';

  // Business logic configuration
  MAX_FILE_UPLOAD_SIZE_MB: string;
  SESSION_TIMEOUT_MINUTES: string;
  RATE_LIMIT_PER_MINUTE: string;
}

export class ProductionEnvironmentBuilder {
  private vars: Partial<ProductionEnvironmentVariables> = {};
  private requiredVars: Set<keyof ProductionEnvironmentVariables> = new Set();

  constructor(private stage: string, private region: string, private version: string) {
    // Set core variables that are always required
    this.vars.STAGE = stage;
    this.vars.REGION = region;
    this.vars.VERSION = version;
    this.vars.ENVIRONMENT = this.mapStageToEnvironment(stage);

    // Mark core variables as required
    this.requiredVars.add('SERVICE_NAME');
    this.requiredVars.add('STAGE');
    this.requiredVars.add('REGION');
    this.requiredVars.add('VERSION');
  }

  private mapStageToEnvironment(stage: string): 'development' | 'staging' | 'production' {
    switch (stage) {
      case 'prod':
      case 'production':
        return 'production';
      case 'staging':
      case 'stage':
        return 'staging';
      default:
        return 'development';
    }
  }

  addServiceName(serviceName: string): this {
    this.vars.SERVICE_NAME = serviceName;
    return this;
  }

  addTable(key: keyof ProductionEnvironmentVariables, table: ITable): this {
    // Cast needed: not every key's value type is a plain string
    (this.vars as Record<string, string>)[key] = table.tableName;
    this.requiredVars.add(key);
    return this;
  }

  addSecret(key: keyof ProductionEnvironmentVariables, secret: ISecret): this {
    (this.vars as Record<string, string>)[key] = secret.secretArn;
    this.requiredVars.add(key);
    return this;
  }

  addFeatureFlag(key: keyof ProductionEnvironmentVariables, enabled: boolean): this {
    (this.vars as Record<string, string>)[key] = enabled ? 'true' : 'false';
    return this;
  }

  addConfig(config: Partial<ProductionEnvironmentVariables>): this {
    Object.assign(this.vars, config);
    return this;
  }

  markRequired(key: keyof ProductionEnvironmentVariables): this {
    this.requiredVars.add(key);
    return this;
  }

  build(): Record<string, string> {
    // Validate all required variables are present
    const missing = Array.from(this.requiredVars).filter(key =>
      this.vars[key] === undefined || this.vars[key] === ''
    );

    if (missing.length > 0) {
      throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
    }

    // Apply stage-specific defaults, then explicit overrides
    const defaults = this.getStageDefaults();
    const merged = { ...defaults, ...this.vars };

    // Convert to a string record, filtering undefined values
    return Object.entries(merged)
      .filter(([_, value]) => value !== undefined && value !== '')
      .reduce((acc, [key, value]) => ({
        ...acc,
        [key]: String(value),
      }), {});
  }

  private getStageDefaults(): Partial<ProductionEnvironmentVariables> {
    const isProd = this.vars.ENVIRONMENT === 'production';

    return {
      // Conservative defaults for production, aggressive for dev
      ENABLE_CACHE: isProd ? 'true' : 'false',
      ENABLE_DEBUG_LOGGING: isProd ? 'false' : 'true',
      ENABLE_METRICS: isProd ? 'true' : 'false',
      CACHE_TTL_SECONDS: isProd ? '300' : '60',
      MAX_RETRY_ATTEMPTS: isProd ? '3' : '1',
      REQUEST_TIMEOUT_MS: isProd ? '30000' : '10000',
      LOG_LEVEL: isProd ? 'info' : 'debug',
      MAX_FILE_UPLOAD_SIZE_MB: '10',
      SESSION_TIMEOUT_MINUTES: '60',
      RATE_LIMIT_PER_MINUTE: isProd ? '100' : '1000',
    };
  }
}
```
### Using the Environment Builder
```typescript
// lib/stacks/api-stack.ts
// config comes from our stage config; version is e.g. a git SHA
const envBuilder = new ProductionEnvironmentBuilder(config.stage, config.region, config.version)
  .addServiceName('my-service')
  .addTable('USERS_TABLE', usersTable)
  .addTable('ORDERS_TABLE', ordersTable)
  .addConfig({
    ENABLE_CACHE: config.stage === 'prod' ? 'true' : 'false',
    CACHE_TTL_SECONDS: '300',
  });

// Add to a Lambda function
const createUserFn = new ServerlessFunction(this, 'CreateUserFunction', {
  entry: 'src/handlers/users.ts',
  handler: 'create',
  config,
  environment: envBuilder.build(),
});
```
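The builder validates at synth time; we pair it with a mirror-image check at runtime, so a misconfigured function dies on cold start instead of halfway through a request. A sketch (the required-variable list is illustrative):

```typescript
// src/libs/env.ts - fail fast on cold start, not mid-request
const REQUIRED_VARS = ['SERVICE_NAME', 'STAGE', 'USERS_TABLE'] as const;

export function loadEnv(): Record<(typeof REQUIRED_VARS)[number], string> {
  const missing = REQUIRED_VARS.filter(name => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(
    REQUIRED_VARS.map(name => [name, process.env[name]!])
  ) as Record<(typeof REQUIRED_VARS)[number], string>;
}

// Module scope: runs once per container, crashes loudly if misconfigured
export const env = loadEnv();
```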
## The $23K Stripe API Key Incident
Week 8. Our staging environment was accidentally pointed to production Stripe keys through a misconfigured environment variable. We processed 47 test transactions totaling $23,247 before catching the error.
Impact: Manual refund process, awkward customer communications, and a CFO meeting about "better controls around financial integrations."
Root cause: Secrets stored as plaintext environment variables with no environment-specific validation.
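The fix that would have saved us: validate that the secret itself matches the environment before the app will boot. Stripe keys are self-describing (`sk_live_` vs `sk_test_` prefixes), which makes the check trivial. A sketch:

```typescript
// Refuse to start with a cross-environment Stripe key
function assertStripeKeyMatchesEnvironment(key: string, environment: string): void {
  const isLiveKey = key.startsWith('sk_live_');
  const isProd = environment === 'production';

  if (isLiveKey && !isProd) {
    throw new Error(`Live Stripe key detected in ${environment} - refusing to start`);
  }
  if (!isLiveKey && isProd) {
    throw new Error('Test Stripe key detected in production - refusing to start');
  }
}

// On cold start, after fetching the secret:
// assertStripeKeyMatchesEnvironment(stripeKey, process.env.ENVIRONMENT!);
```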
### Bulletproof Secrets Management
After paying $23K for that lesson in secrets management, here's our production-grade approach:
```typescript
// lib/constructs/secure-function.ts
import { Secret, ISecret } from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { ServerlessFunction, ServerlessFunctionProps } from './serverless-function'; // from Part 3

export interface SecureFunctionProps extends ServerlessFunctionProps {
  secrets?: Record<string, ISecret>;
}

export class SecureFunction extends ServerlessFunction {
  constructor(scope: Construct, id: string, props: SecureFunctionProps) {
    const { secrets = {}, ...functionProps } = props;

    // Pass secret ARNs (never secret values) as environment variables
    const secretEnvVars = Object.entries(secrets).reduce(
      (acc, [key, secret]) => ({
        ...acc,
        [`${key}_SECRET_ARN`]: secret.secretArn,
      }),
      {}
    );

    super(scope, id, {
      ...functionProps,
      environment: {
        ...functionProps.environment,
        ...secretEnvVars,
      },
    });

    // Grant read permissions for all secrets
    Object.values(secrets).forEach(secret => {
      secret.grantRead(this);
    });
  }
}

// Usage
const apiKeySecret = new Secret(this, 'ApiKeySecret', {
  secretName: `/${config.stage}/my-service/api-keys`,
  generateSecretString: {
    secretStringTemplate: JSON.stringify({}),
    generateStringKey: 'sendgrid',
    excludeCharacters: ' %+~`#$&*()|[]{}:;<>?!\'/@"\\',
  },
});

const emailFunction = new SecureFunction(this, 'EmailFunction', {
  entry: 'src/handlers/email.ts',
  handler: 'send',
  config,
  secrets: {
    API_KEYS: apiKeySecret,
  },
});
```
### Runtime Secret Access
```typescript
// src/libs/secrets.ts
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from '@aws-sdk/client-secrets-manager';

const client = new SecretsManagerClient({});
// Module-level cache: survives across warm invocations of the container
const cache = new Map<string, any>();

export async function getSecret<T = any>(
  secretArn: string,
  jsonKey?: string
): Promise<T> {
  const cacheKey = `${secretArn}:${jsonKey || 'full'}`;

  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }

  try {
    const response = await client.send(
      new GetSecretValueCommand({ SecretId: secretArn })
    );

    const secret = JSON.parse(response.SecretString || '{}');
    const value = jsonKey ? secret[jsonKey] : secret;

    cache.set(cacheKey, value);
    return value;
  } catch (error) {
    console.error('Failed to retrieve secret:', error);
    throw new Error('Secret retrieval failed');
  }
}
```

```typescript
// Usage in a handler
import type { APIGatewayProxyEventV2 } from 'aws-lambda';
import { getSecret } from '../libs/secrets';

export const handler = async (event: APIGatewayProxyEventV2) => {
  const secretArn = process.env.API_KEYS_SECRET_ARN;
  const sendgridKey = await getSecret<string>(secretArn!, 'sendgrid');

  // Use the secret (sendEmail is our application helper, defined elsewhere)
  await sendEmail(sendgridKey, event.body);
};
```
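One caveat with the cache above: it lives for the life of the warm container, so a rotated secret won't propagate until the container recycles. If you rotate secrets, bound the cache with a TTL; a sketch (the 5-minute window is our choice, not a rule):

```typescript
// src/libs/secrets-cache.ts - a TTL-bounded cache variant
const CACHE_TTL_MS = 5 * 60 * 1000;
const timedCache = new Map<string, { value: unknown; fetchedAt: number }>();

export function getCached(key: string): unknown | undefined {
  const entry = timedCache.get(key);
  if (entry && Date.now() - entry.fetchedAt < CACHE_TTL_MS) {
    return entry.value;
  }
  timedCache.delete(key); // expired or absent
  return undefined;
}

export function setCached(key: string, value: unknown): void {
  timedCache.set(key, { value, fetchedAt: Date.now() });
}
```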
### Parameter Store Integration
For non-sensitive configuration:
```typescript
// lib/constructs/parameter-store.ts
import { StringParameter, IParameter } from 'aws-cdk-lib/aws-ssm';
import { IGrantable } from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class ServiceParameters extends Construct {
  public readonly configs: Map<string, IParameter> = new Map();

  constructor(scope: Construct, id: string, props: {
    service: string;
    stage: string;
    parameters: Record<string, string>;
  }) {
    super(scope, id);

    // Create one parameter per config entry
    Object.entries(props.parameters).forEach(([key, value]) => {
      const param = new StringParameter(this, key, {
        parameterName: `/${props.service}/${props.stage}/${key}`,
        stringValue: value,
        description: `${key} for ${props.service} ${props.stage}`,
      });
      this.configs.set(key, param);
    });
  }

  grantRead(grantable: IGrantable) {
    this.configs.forEach(param => {
      param.grantRead(grantable);
    });
  }

  toEnvironment(): Record<string, string> {
    const env: Record<string, string> = {};
    this.configs.forEach((param, key) => {
      env[`${key}_PARAM`] = param.parameterName;
    });
    return env;
  }
}
```
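The `toEnvironment()` helper hands each function the parameter name, not its value, so the runtime side fetches it with the SSM SDK. A minimal reader (a sketch; add caching the same way as the secrets helper):

```typescript
// src/libs/parameters.ts
import { SSMClient, GetParameterCommand } from '@aws-sdk/client-ssm';

const ssm = new SSMClient({});

export async function getParameter(name: string): Promise<string> {
  const result = await ssm.send(new GetParameterCommand({ Name: name }));
  if (!result.Parameter?.Value) {
    throw new Error(`Parameter ${name} has no value`);
  }
  return result.Parameter.Value;
}

// Usage: const ttl = await getParameter(process.env.CACHE_TTL_SECONDS_PARAM!);
```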
## VPC Configuration for RDS/ElastiCache
### Creating VPC-Enabled Lambda Functions
```typescript
// lib/constructs/vpc-config.ts
import {
  Vpc,
  SubnetType,
  SecurityGroup,
  Port,
  InstanceType,
  InstanceClass,
  InstanceSize,
} from 'aws-cdk-lib/aws-ec2';
import {
  DatabaseInstance,
  DatabaseInstanceEngine,
  PostgresEngineVersion,
} from 'aws-cdk-lib/aws-rds';
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class VpcResources extends Construct {
  public readonly vpc: Vpc;
  public readonly lambdaSecurityGroup: SecurityGroup;
  public readonly databaseSecurityGroup: SecurityGroup;
  public readonly database?: DatabaseInstance;

  constructor(scope: Construct, id: string, props: {
    stage: string;
    enableDatabase?: boolean;
  }) {
    super(scope, id);

    // Create the VPC
    this.vpc = new Vpc(this, 'Vpc', {
      vpcName: `my-service-${props.stage}`,
      maxAzs: 2,
      natGateways: props.stage === 'prod' ? 2 : 1,
      subnetConfiguration: [
        {
          name: 'Public',
          subnetType: SubnetType.PUBLIC,
          cidrMask: 24,
        },
        {
          name: 'Private',
          subnetType: SubnetType.PRIVATE_WITH_EGRESS,
          cidrMask: 24,
        },
        {
          name: 'Isolated',
          subnetType: SubnetType.PRIVATE_ISOLATED,
          cidrMask: 24,
        },
      ],
    });

    // Security groups
    this.lambdaSecurityGroup = new SecurityGroup(this, 'LambdaSG', {
      vpc: this.vpc,
      description: 'Security group for Lambda functions',
      allowAllOutbound: true,
    });

    this.databaseSecurityGroup = new SecurityGroup(this, 'DatabaseSG', {
      vpc: this.vpc,
      description: 'Security group for RDS database',
      allowAllOutbound: false,
    });

    // Allow Lambda to connect to the database on the Postgres port
    this.databaseSecurityGroup.addIngressRule(
      this.lambdaSecurityGroup,
      Port.tcp(5432),
      'Allow Lambda functions'
    );

    if (props.enableDatabase) {
      this.database = this.createDatabase(props.stage);
    }
  }

  private createDatabase(stage: string): DatabaseInstance {
    return new DatabaseInstance(this, 'Database', {
      databaseName: 'myservice',
      engine: DatabaseInstanceEngine.postgres({
        version: PostgresEngineVersion.VER_15_2,
      }),
      vpc: this.vpc,
      vpcSubnets: {
        subnetType: SubnetType.PRIVATE_ISOLATED,
      },
      securityGroups: [this.databaseSecurityGroup],
      allocatedStorage: stage === 'prod' ? 100 : 20,
      instanceType: InstanceType.of(
        InstanceClass.T3,
        stage === 'prod' ? InstanceSize.MEDIUM : InstanceSize.MICRO
      ),
      multiAz: stage === 'prod',
      deletionProtection: stage === 'prod',
      backupRetention: Duration.days(stage === 'prod' ? 30 : 7),
    });
  }
}
```
### VPC-Enabled Lambda Function
```typescript
// lib/constructs/vpc-lambda.ts
import { SubnetType } from 'aws-cdk-lib/aws-ec2';
import { ISecret } from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { ServerlessFunction, ServerlessFunctionProps } from './serverless-function'; // from Part 3
import { VpcResources } from './vpc-config';

export class VpcLambdaFunction extends ServerlessFunction {
  constructor(scope: Construct, id: string, props: ServerlessFunctionProps & {
    vpcResources: VpcResources;
    databaseSecret?: ISecret;
  }) {
    const { vpcResources, databaseSecret, ...functionProps } = props;

    super(scope, id, {
      ...functionProps,
      vpc: vpcResources.vpc,
      vpcSubnets: {
        subnetType: SubnetType.PRIVATE_WITH_EGRESS,
      },
      securityGroups: [vpcResources.lambdaSecurityGroup],
      environment: {
        ...functionProps.environment,
        ...(databaseSecret && {
          DB_SECRET_ARN: databaseSecret.secretArn,
        }),
      },
    });

    // Grant read access to the database credentials
    if (databaseSecret) {
      databaseSecret.grantRead(this);
    }
  }
}
```
### Database Connection Management
```typescript
// src/libs/database.ts
import { Client } from 'pg';
import { getSecret } from './secrets';

// pg's Client has no public "is connected" flag, so track the
// connection ourselves and reset the reference when it closes
let client: Client | null = null;

export async function getDbClient(): Promise<Client> {
  if (client) {
    return client;
  }

  const secretArn = process.env.DB_SECRET_ARN;
  if (!secretArn) {
    throw new Error('Database secret not configured');
  }

  const credentials = await getSecret<{
    username: string;
    password: string;
    host: string;
    port: number;
    dbname: string;
  }>(secretArn);

  const newClient = new Client({
    user: credentials.username,
    password: credentials.password,
    host: credentials.host,
    port: credentials.port,
    database: credentials.dbname,
    // For production, prefer verifying against the RDS CA bundle
    // instead of disabling certificate checks
    ssl: {
      rejectUnauthorized: false,
    },
    connectionTimeoutMillis: 10000,
  });

  await newClient.connect();
  client = newClient;
  return client;
}

// Clean up on Lambda container shutdown
process.on('SIGTERM', async () => {
  if (client) {
    await client.end();
    client = null;
  }
});
```
## Backup and Disaster Recovery
### Automated DynamoDB Backups
```typescript
// lib/constructs/backup-plan.ts
import { BackupPlan, BackupPlanRule, BackupResource } from 'aws-cdk-lib/aws-backup';
import { Schedule } from 'aws-cdk-lib/aws-events';
import { Duration } from 'aws-cdk-lib';
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

export class TableBackupPlan extends Construct {
  constructor(scope: Construct, id: string, props: {
    tables: ITable[];
    stage: string;
  }) {
    super(scope, id);

    const plan = new BackupPlan(this, 'BackupPlan', {
      backupPlanName: `my-service-${props.stage}-backup`,
      backupPlanRules: [
        new BackupPlanRule({
          ruleName: 'DailyBackups',
          scheduleExpression: Schedule.cron({
            hour: '3',
            minute: '0',
          }),
          startWindow: Duration.hours(1),
          completionWindow: Duration.hours(2),
          deleteAfter: Duration.days(
            props.stage === 'prod' ? 30 : 7
          ),
        }),
      ],
    });

    plan.addSelection('TableSelection', {
      resources: props.tables.map(table =>
        BackupResource.fromDynamoDbTable(table)
      ),
    });
  }
}
```
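AWS Backup covers the schedule; for one-off snapshots before a risky deploy, we also script on-demand backups with the DynamoDB API. A sketch we run from CI (the table name is whatever you're about to touch):

```typescript
// scripts/pre-deploy-backup.ts
import { DynamoDBClient, CreateBackupCommand } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

export async function snapshotBeforeDeploy(tableName: string): Promise<void> {
  const result = await client.send(new CreateBackupCommand({
    TableName: tableName,
    BackupName: `pre-deploy-${tableName}-${Date.now()}`,
  }));
  console.log('Backup started:', result.BackupDetails?.BackupArn);
}
```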
## Migration Best Practices
### 1. Stateful Resource Strategy
```typescript
// lib/stacks/stateful-stack.ts
import { Stack, StackProps, CfnOutput } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class StatefulStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, {
      ...props,
      // Prevent accidental stack deletion
      terminationProtection: true,
    });

    // All stateful resources live in one stack
    // (createTables/createSecrets/createParameters are our own
    // factory methods, elided here)
    const tables = this.createTables();
    const secrets = this.createSecrets();
    const parameters = this.createParameters();

    // Export table names for use in the stateless stacks
    tables.forEach((table, name) => {
      new CfnOutput(this, `${name}TableName`, {
        value: table.tableName,
        exportName: `${this.stackName}-${name}TableName`,
      });
    });
  }
}
```
### 2. Zero-Downtime Migration Checklist

- Import existing tables using `fromTableAttributes`
- Test permissions with imported resources
- Implement a dual-write pattern if changing table schemas (see the sketch after this list)
- Use Lambda environment variables for gradual rollout
- Set up CloudWatch alarms before switching
- Implement circuit breakers for external services
- Test rollback procedures
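The dual-write item deserves a concrete shape: the old table stays the source of truth, and writes to the new table are best-effort and flag-gated, so you can roll back by flipping an environment variable. A sketch (the table env var names are illustrative):

```typescript
// src/libs/dual-write.ts
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';
import type { AttributeValue } from '@aws-sdk/client-dynamodb';

const client = new DynamoDBClient({});

export async function putUser(item: Record<string, AttributeValue>): Promise<void> {
  // The old table remains the source of truth
  await client.send(new PutItemCommand({
    TableName: process.env.USERS_TABLE!,
    Item: item,
  }));

  // Best-effort write to the new table; log failures, never fail the request
  if (process.env.ENABLE_DUAL_WRITE === 'true') {
    try {
      await client.send(new PutItemCommand({
        TableName: process.env.USERS_TABLE_V3!,
        Item: item,
      }));
    } catch (error) {
      console.error('Dual-write to new table failed:', error);
    }
  }
}
```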
## Hard-Learned Lessons from Production
After 6 months managing stateful infrastructure in CDK, here are the lessons that saved us from disasters:
### 1. Always Test Imports in Staging First

Cost of learning: 4 hours of downtime, emergency recovery, uncomfortable meetings.

Prevention: Never run `cdk deploy` against production tables without rehearsing in staging with identical data.

### 2. Environment Variables Are Not Configuration

Cost of learning: 12 hours of authentication failures affecting 8K+ users.

Prevention: Type-safe environment builders with validation and required-field checks.

### 3. Secrets Need Environment-Specific Validation

Cost of learning: $23K in accidental production charges.

Prevention: Environment-aware secret validation that prevents cross-environment contamination.

### 4. Data Migration Needs Monitoring

Cost of learning: 3 migration attempts, 180K records at risk.

Prevention: Comprehensive logging, progress tracking, and timeout handling in migration functions.

### 5. VPC Lambda Functions Are Different

Cost of learning: 15-minute cold starts, connection pool exhaustion.

Prevention: Proper connection management, security group configuration, and subnet planning.
## Migration Results
Before CDK:
- Manual environment management
- Plaintext secrets in YAML
- No table import validation
- Migration scripts run locally
- Zero disaster recovery testing
After CDK:
- Type-safe environment configuration with validation
- Encrypted secrets with environment isolation
- Production-safe table imports with verification
- Automated, monitored data migrations
- Comprehensive backup and monitoring
Metrics:
- 23 Lambda functions migrated
- 3 production DynamoDB tables (180K+ records)
- 47 environment variables made type-safe
- 12 secrets properly encrypted
- Zero data loss (eventually)
## What's Next
Your data layer is battle-tested with proper environment management and security. Stateful resources are protected, secrets are encrypted, and disaster recovery is automated.
In Part 5, we'll implement authentication and authorization:
- Cognito user pools with production constraints
- API Gateway authorizers that actually work
- IAM roles that follow least privilege
- JWT token validation that doesn't break
- Fine-grained permissions without complexity explosion
The foundation survived production. Let's secure it properly.