Building Ephemeral Preview Environments with AWS CDK and Serverless

Learn to build automated preview environments using AWS CDK, Lambda, and GitHub Actions for seamless PR testing and review workflows

The Problem with Shared Staging Environments

Working with development teams, I've learned that shared staging environments often become bottlenecks. When multiple PRs compete for the same environment, testing becomes unreliable and conflicts are inevitable. Feature branches can't be properly isolated, and the feedback loop slows down dramatically.

Here's what I've seen work: ephemeral preview environments that spin up automatically for each pull request and clean themselves up when done. This approach eliminates staging conflicts and gives each PR its own isolated testing space.

Architecture Overview

The solution combines AWS serverless services with GitHub Actions to create fully automated preview environments. Each PR gets its own subdomain and infrastructure stack that mirrors production but scales down appropriately.
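
As a minimal sketch of the per-PR naming scheme, the helpers below derive a stack name and hostname from a PR number. The function names are hypothetical; the `preview-pr-<n>` and `pr-<n>.<domain>` patterns match the conventions used in the stack and workflow later in this post.

```typescript
// Hypothetical helpers for illustration; not part of any library.
function previewStackName(prNumber: number): string {
  // Reject anything that is not a positive integer so a bad input
  // never ends up in a CloudFormation stack name.
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`Invalid PR number: ${prNumber}`);
  }
  return `preview-pr-${prNumber}`;
}

function previewHostname(prNumber: number, baseDomain: string): string {
  // Each PR gets its own subdomain under the shared preview domain.
  return `pr-${prNumber}.${baseDomain}`;
}
```

Deriving both names from the PR number alone means the deploy and cleanup jobs can find the same stack without sharing any state.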

Core Implementation

CDK Stack Architecture

The foundation is a parameterized CDK stack that creates identical infrastructure for each PR:

typescript
// lib/preview-environment-stack.ts
import {
  Aspects, CfnOutput, CfnResource, Duration, IAspect, RemovalPolicy,
  Stack, StackProps, TagManager,
} from 'aws-cdk-lib';
import { Construct, IConstruct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as route53 from 'aws-cdk-lib/aws-route53';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';

export interface PreviewStackProps extends StackProps {
  prNumber: string;
  commitSha: string;
  domain: string;
  certificateArn: string;
}

export class PreviewEnvironmentStack extends Stack {
  constructor(scope: Construct, id: string, props: PreviewStackProps) {
    super(scope, id, props);

    const { prNumber, commitSha, domain } = props;

    // Common tags for resource management
    const commonTags = {
      Environment: 'preview',
      PRNumber: prNumber,
      CommitSha: commitSha.substring(0, 8),
      CreatedBy: 'github-actions',
      TTL: this.calculateTTL(72), // 72 hours
    };

    // Lambda function for the application
    const appFunction = new lambda.Function(this, 'AppFunction', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('dist'),
      timeout: Duration.seconds(30),
      memorySize: 256,
      environment: {
        STAGE: 'preview',
        PR_NUMBER: prNumber,
        COMMIT_SHA: commitSha,
      },
    });

    // API Gateway (REST API)
    const api = new apigateway.RestApi(this, 'PreviewApi', {
      restApiName: `preview-api-pr-${prNumber}`,
      description: `Preview environment for PR ${prNumber}`,
      binaryMediaTypes: ['*/*'],
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
      },
    });

    // Lambda integration
    const lambdaIntegration = new apigateway.LambdaIntegration(appFunction, {
      requestTemplates: { 'application/json': '{ "statusCode": "200" }' },
    });

    api.root.addMethod('ANY', lambdaIntegration);
    api.root.addProxy({
      defaultIntegration: lambdaIntegration,
    });

    // Hosted zone lookup for the custom domain (fromLookup needs an explicit
    // account and region in the stack's env). The alias record and domain
    // mapping for previewDomain are omitted from this listing.
    const previewDomain = `pr-${prNumber}.${domain}`;
    const hostedZone = route53.HostedZone.fromLookup(this, 'HostedZone', {
      domainName: domain,
    });

    // CloudFront distribution for caching
    // (CloudFrontWebDistribution is deprecated in favor of cloudfront.Distribution)
    const distribution = new cloudfront.CloudFrontWebDistribution(this, 'Distribution', {
      originConfigs: [{
        customOriginSource: {
          domainName: api.restApiId + '.execute-api.' + this.region + '.amazonaws.com',
          originPath: '/prod',
        },
        behaviors: [{ isDefaultBehavior: true }],
      }],
      comment: `Preview distribution for PR ${prNumber}`,
    });

    // Stack output that the workflow reads from cdk-outputs.json
    new CfnOutput(this, 'PreviewURL', {
      value: `https://${distribution.distributionDomainName}`,
    });

    // Apply common tags to all resources
    Object.entries(commonTags).forEach(([key, value]) => {
      Aspects.of(this).add(new TagAspect(key, value));
    });

    // Apply cleanup aspect
    Aspects.of(this).add(new AutoCleanupAspect());
  }

  private calculateTTL(hours: number): string {
    const expiryDate = new Date(Date.now() + hours * 60 * 60 * 1000);
    return expiryDate.toISOString();
  }
}

// Cleanup aspect for proper resource deletion
class AutoCleanupAspect implements IAspect {
  visit(node: IConstruct): void {
    if (node instanceof CfnResource) {
      // DeletionPolicy is a resource attribute, so set it through the removal policy
      node.applyRemovalPolicy(RemovalPolicy.DESTROY);
    }
  }
}

// Tagging aspect for cost tracking
class TagAspect implements IAspect {
  constructor(private key: string, private value: string) {}

  visit(node: IConstruct): void {
    if (TagManager.isTaggable(node)) {
      node.tags.setTag(this.key, this.value);
    }
  }
}

GitHub Actions Workflow

The automation starts with a GitHub Actions workflow that responds to PR events:

yaml
# .github/workflows/preview-environment.yml
name: Preview Environment

on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  deploy-preview:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest

    env:
      PR_NUMBER: ${{ github.event.number }}
      COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
      PREVIEW_DOMAIN: pr-${{ github.event.number }}.preview.company.com

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Install dependencies
        run: |
          npm ci
          npm run build

      # The stack takes prNumber and commitSha as props, not CloudFormation
      # parameters, so pass them as context (assumes the CDK app reads them
      # with tryGetContext).
      - name: Deploy CDK stack
        run: |
          npx cdk deploy preview-pr-${{ env.PR_NUMBER }} \
            --context prNumber=${{ env.PR_NUMBER }} \
            --context commitSha=${{ env.COMMIT_SHA }} \
            --require-approval never \
            --outputs-file cdk-outputs.json

      - name: Extract deployment URL
        id: extract-url
        run: |
          PREVIEW_URL=$(jq -r '.["preview-pr-${{ env.PR_NUMBER }}"].PreviewURL' cdk-outputs.json)
          echo "url=$PREVIEW_URL" >> "$GITHUB_OUTPUT"

      - name: Wait for deployment readiness
        run: |
          for i in {1..30}; do
            if curl -f -s "${{ steps.extract-url.outputs.url }}/health" > /dev/null; then
              echo "Deployment is ready!"
              exit 0
            fi
            echo "Waiting for deployment... ($i/30)"
            sleep 10
          done
          # Fail the job instead of running tests against an unready environment
          echo "Deployment did not become ready in time" >&2
          exit 1

      - name: Run E2E tests
        env:
          CYPRESS_BASE_URL: ${{ steps.extract-url.outputs.url }}
          # Exposed to the tests as Cypress.env('PR_NUMBER') and Cypress.env('COMMIT_SHA')
          CYPRESS_PR_NUMBER: ${{ env.PR_NUMBER }}
          CYPRESS_COMMIT_SHA: ${{ env.COMMIT_SHA }}
        run: |
          npm run test:e2e

      - name: Update PR comment
        uses: actions/github-script@v7
        with:
          script: |
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });

            const botComment = comments.find(comment =>
              comment.user.type === 'Bot' &&
              comment.body.includes('Preview Environment')
            );

            const body = `## Preview Environment

            **URL:** ${{ steps.extract-url.outputs.url }}
            **Status:** Ready
            **Commit:** \`${{ env.COMMIT_SHA }}\`

            E2E tests: Passed
            `;

            if (botComment) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
                body: body
              });
            } else {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body: body
              });
            }

  cleanup-preview:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      # cdk destroy synthesizes the app first, so dependencies must be installed
      - name: Install dependencies
        run: npm ci

      - name: Destroy CDK stack
        run: |
          npx cdk destroy preview-pr-${{ github.event.number }} \
            --context prNumber=${{ github.event.number }} \
            --context commitSha=${{ github.event.pull_request.head.sha }} \
            --force

      - name: Update PR comment
        uses: actions/github-script@v7
        with:
          script: |
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });

            const botComment = comments.find(comment =>
              comment.user.type === 'Bot' &&
              comment.body.includes('Preview Environment')
            );

            if (botComment) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
                body: botComment.body + '\n\n**Status:** Cleaned up'
              });
            }

OIDC Authentication Setup

Instead of storing long-lived AWS credentials, use GitHub's OIDC provider for secure, temporary access:

typescript
// iam/github-oidc-role.ts
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as iam from 'aws-cdk-lib/aws-iam';

export class GitHubOIDCStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Create OIDC provider for GitHub Actions
    const provider = new iam.OpenIdConnectProvider(this, 'GitHubProvider', {
      url: 'https://token.actions.githubusercontent.com',
      clientIds: ['sts.amazonaws.com'],
      thumbprints: ['6938fd4d98bab03faadb97b34396831e3780aea1'],
    });

    // IAM role for GitHub Actions
    const role = new iam.Role(this, 'GitHubActionsRole', {
      assumedBy: new iam.WebIdentityPrincipal(
        provider.openIdConnectProviderArn,
        {
          StringEquals: {
            'token.actions.githubusercontent.com:aud': 'sts.amazonaws.com',
          },
          StringLike: {
            'token.actions.githubusercontent.com:sub': 'repo:your-org/your-repo:*',
          },
        }
      ),
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('PowerUserAccess'),
      ],
    });

    // Additional policies for CDK operations
    role.addToPolicy(new iam.PolicyStatement({
      effect: iam.Effect.Allow,
      actions: [
        'iam:CreateRole',
        'iam:DeleteRole',
        'iam:AttachRolePolicy',
        'iam:DetachRolePolicy',
        'iam:PassRole',
      ],
      resources: ['*'],
    }));
  }
}
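
To see which workflows the `StringLike` condition on the `sub` claim admits, the sketch below approximates IAM's wildcard matching (`*` for any run of characters, `?` for exactly one). The `stringLike` helper is hypothetical and only mimics the condition operator for illustration. It assumes GitHub's documented subject format, where pull request workflows present `repo:<owner>/<repo>:pull_request` and branch pushes present `repo:<owner>/<repo>:ref:refs/heads/<branch>`.

```typescript
// Hypothetical approximation of IAM's StringLike operator, for reasoning
// about trust policies locally. It is not an AWS API.
function stringLike(pattern: string, value: string): boolean {
  // Escape regex metacharacters, leaving the IAM wildcards * and ? alone
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(
    '^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$',
  );
  return regex.test(value);
}

// The pattern used in the role above admits every workflow in the repo
const broad = 'repo:your-org/your-repo:*';
// A tighter pattern that admits only pull request workflows
const prOnly = 'repo:your-org/your-repo:pull_request';

const prSubject = 'repo:your-org/your-repo:pull_request';
const mainSubject = 'repo:your-org/your-repo:ref:refs/heads/main';

console.log(stringLike(broad, prSubject), stringLike(broad, mainSubject));
console.log(stringLike(prOnly, prSubject), stringLike(prOnly, mainSubject));
```

If only PR workflows should be able to deploy previews, narrowing the `sub` pattern this way is one option to consider.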

Cost Optimization and Monitoring

Resource Right-Sizing

I've learned that preview environments need to balance cost with functionality. Here's a practical cost breakdown for a 72-hour preview environment:

typescript
// Cost optimization configuration
import { Duration } from 'aws-cdk-lib';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';

const previewConfig = {
  lambda: {
    memorySize: 256, // MB - sufficient for most workloads
    timeout: Duration.seconds(30),
    reservedConcurrency: 5, // Limit concurrent executions
  },
  apiGateway: {
    throttling: {
      rateLimit: 100,
      burstLimit: 200,
    },
  },
  cloudfront: {
    priceClass: cloudfront.PriceClass.PRICE_CLASS_100, // Cheapest price class (North America and Europe)
    defaultTtl: Duration.hours(1), // Short TTL for development
  },
};

// Estimated costs per 72-hour environment:
// - API Gateway: ~$0.12 (REST API requests at $3.50/million)
// - Lambda: ~$0.08 (compute at $0.0000166667 per GB-second)
// - CloudFront: ~$0.04 (1GB data transfer)
// - Route53: ~$0.02 (hosted zone queries)
// Total: ~$0.26 per preview environment
//
// Note: these are rounded-up budget figures. At the list prices above,
// 1,000 requests cost about $0.0035 and 50 GB-seconds about $0.0008,
// so light usage lands well under the estimate.
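
The request and compute portions of that estimate can be worked out directly. The sketch below is a hypothetical calculator (`estimatePreviewCost` and its input shape are illustrative names) using only the two unit prices quoted above; it ignores CloudFront, Route53, and Lambda's per-request charge, and prices vary by region.

```typescript
// Hypothetical calculator; unit prices are the ones quoted in the estimate above.
interface PreviewUsage {
  apiRequests: number;      // REST API requests over the environment's lifetime
  lambdaGbSeconds: number;  // memory (GB) x billed duration (s), summed over invocations
}

const API_GATEWAY_PER_MILLION_REQUESTS = 3.5;
const LAMBDA_PER_GB_SECOND = 0.0000166667;

function estimatePreviewCost(usage: PreviewUsage): number {
  const apiCost = (usage.apiRequests / 1_000_000) * API_GATEWAY_PER_MILLION_REQUESTS;
  const lambdaCost = usage.lambdaGbSeconds * LAMBDA_PER_GB_SECOND;
  return apiCost + lambdaCost;
}

// 1,000 requests and 50 GB-seconds come to roughly $0.004
console.log(estimatePreviewCost({ apiRequests: 1000, lambdaGbSeconds: 50 }).toFixed(4));
```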

Automated Cleanup

Cleanup automation prevents cost runaway and ensures environments don't accumulate:

typescript
// lib/cleanup-function.ts
import {
  CloudFormationClient,
  DeleteStackCommand,
  paginateDescribeStacks,
  Stack,
} from '@aws-sdk/client-cloudformation';

const DELETABLE_STATUSES = ['CREATE_COMPLETE', 'UPDATE_COMPLETE'];

export const handler = async (event: any) => {
  const cfn = new CloudFormationClient({});

  try {
    // Find stacks with expired TTL tags. DescribeStacks is used because
    // ListStacks summaries do not include tags.
    const expiredStacks: Stack[] = [];

    for await (const page of paginateDescribeStacks({ client: cfn }, {})) {
      for (const stack of page.Stacks ?? []) {
        if (!stack.StackName?.startsWith('preview-pr-')) continue;
        if (!DELETABLE_STATUSES.includes(stack.StackStatus ?? '')) continue;

        const ttlTag = stack.Tags?.find(tag => tag.Key === 'TTL');
        if (!ttlTag?.Value) continue;

        const expiryDate = new Date(ttlTag.Value);
        if (expiryDate < new Date()) expiredStacks.push(stack);
      }
    }

    // Delete expired stacks
    for (const stack of expiredStacks) {
      console.log(`Deleting expired stack: ${stack.StackName}`);

      await cfn.send(new DeleteStackCommand({
        StackName: stack.StackName,
      }));
    }

    return {
      statusCode: 200,
      body: JSON.stringify({
        message: `Cleaned up ${expiredStacks.length} expired stacks`,
        deletedStacks: expiredStacks.map(s => s.StackName),
      }),
    };
  } catch (error) {
    console.error('Cleanup failed:', error);
    throw error;
  }
};
typescript
// Schedule cleanup Lambda with EventBridge
// (inside a Stack constructor; cleanupFunction is the Lambda built from the handler above)
import { Duration } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const cleanupRule = new events.Rule(this, 'CleanupRule', {
  schedule: events.Schedule.rate(Duration.hours(6)),
  description: 'Clean up expired preview environments',
});

cleanupRule.addTarget(new targets.LambdaFunction(cleanupFunction));
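
The expiry decision itself is easy to isolate and unit test. The sketch below pulls it into two pure functions; `ttlFromNow` and `isExpired` are hypothetical names, and the only assumption carried over is that the `TTL` tag holds an ISO 8601 timestamp, as in the stack above.

```typescript
// Hypothetical helpers mirroring the TTL tag logic; not part of the AWS SDK.
function ttlFromNow(hours: number, now: Date = new Date()): string {
  // Same arithmetic as calculateTTL in the stack: now + hours, as ISO 8601
  return new Date(now.getTime() + hours * 60 * 60 * 1000).toISOString();
}

function isExpired(ttl: string | undefined, now: Date = new Date()): boolean {
  if (!ttl) return false; // untagged stacks are never deleted automatically
  const expiry = new Date(ttl);
  if (Number.isNaN(expiry.getTime())) return false; // unparseable tag: leave the stack alone
  return expiry.getTime() < now.getTime();
}
```

Passing `now` explicitly keeps the functions deterministic, so the cleanup rule can be tested without waiting 72 hours or mocking the clock.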

E2E Testing Integration

Cypress Configuration

Here's how I've integrated E2E testing with preview environments:

typescript
// cypress.config.ts
import { defineConfig } from 'cypress';

export default defineConfig({
  e2e: {
    baseUrl: process.env.CYPRESS_BASE_URL || 'http://localhost:3000',
    video: true,
    screenshotOnRunFailure: true,
    defaultCommandTimeout: 10000,
    requestTimeout: 15000,
    responseTimeout: 15000,

    setupNodeEvents(on, config) {
      // Log screenshots taken on failure
      on('after:screenshot', (details) => {
        console.log('Screenshot taken:', details.path);
      });

      // Custom commands for preview environment testing
      on('task', {
        waitForDeployment() {
          // Custom logic to wait for deployment readiness
          return null;
        },
      });
    },
  },
});
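
One way the `waitForDeployment` placeholder could be filled in is a bounded polling loop, sketched below. The `Probe` type, `healthProbe`, and the default limits (30 attempts, 10 seconds apart, matching the workflow's shell loop) are assumptions for illustration; the `/health` path is the one the workflow already polls.

```typescript
// Hypothetical polling helper; names and defaults are illustrative.
type Probe = () => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForDeployment(
  probe: Probe,
  maxAttempts = 30,
  delayMs = 10_000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A rejected probe (DNS not resolving yet, connection refused) counts as "not ready"
    const ready = await probe().catch(() => false);
    if (ready) return true;
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  return false;
}

// Example probe using the global fetch available in Node 18+
const healthProbe = (baseUrl: string): Probe => async () =>
  (await fetch(`${baseUrl}/health`)).ok;
```

Inside the task, this could be called as `return waitForDeployment(healthProbe(baseUrl))`, since Cypress tasks may return a promise. Injecting the probe keeps the retry logic testable without a live environment.
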
javascript
// cypress/e2e/preview-environment.cy.js
describe('Preview Environment Tests', () => {
  beforeEach(() => {
    // Ensure we're testing the right environment
    cy.visit('/');
    cy.get('[data-testid="environment-indicator"]')
      .should('contain', 'Preview');
  });

  it('should load the application correctly', () => {
    cy.get('[data-testid="app-header"]').should('be.visible');
    cy.get('[data-testid="main-content"]').should('be.visible');
  });

  it('should handle API requests', () => {
    cy.intercept('GET', '/api/health').as('healthCheck');
    cy.visit('/dashboard');
    cy.wait('@healthCheck').then((interception) => {
      expect(interception.response?.statusCode).to.equal(200);
    });
  });

  it('should display correct environment information', () => {
    // Cypress.env() reads CYPRESS_PR_NUMBER and CYPRESS_COMMIT_SHA from the environment
    cy.visit('/debug');
    cy.get('[data-testid="pr-number"]')
      .should('contain', Cypress.env('PR_NUMBER'));
    cy.get('[data-testid="commit-sha"]')
      .should('contain', Cypress.env('COMMIT_SHA'));
  });
});

Security Best Practices

Network Security

Working with preview environments exposed to the internet, I've learned these security patterns work well:

typescript
// VPC and security groups for Lambda
import * as ec2 from 'aws-cdk-lib/aws-ec2';

const vpc = new ec2.Vpc(this, 'PreviewVPC', {
  maxAzs: 2,
  natGateways: 1, // Cost optimization: one NAT gateway instead of one per AZ
  subnetConfiguration: [
    {
      cidrMask: 24,
      name: 'Private',
      subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
    },
    {
      cidrMask: 24,
      name: 'Public',
      subnetType: ec2.SubnetType.PUBLIC,
    },
  ],
});

const securityGroup = new ec2.SecurityGroup(this, 'LambdaSecurityGroup', {
  vpc,
  description: 'Security group for preview Lambda functions',
  allowAllOutbound: true,
});

// Restrict inbound traffic to HTTPS. Security groups deny inbound traffic by
// default, and Lambda network interfaces do not accept inbound connections,
// so this rule only matters if the group is shared with other resources.
securityGroup.addIngressRule(
  ec2.Peer.anyIpv4(),
  ec2.Port.tcp(443),
  'HTTPS traffic only'
);

Secrets Management

typescript
// Systems Manager Parameter Store for secrets
import * as ssm from 'aws-cdk-lib/aws-ssm';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Note: StringParameter creates a plaintext String parameter (CloudFormation
// cannot create SecureString parameters), and the value below is a placeholder.
const dbPassword = new ssm.StringParameter(this, 'DatabasePassword', {
  parameterName: `/preview/${prNumber}/database/password`,
  stringValue: 'generated-secure-password',
  tier: ssm.ParameterTier.STANDARD,
});

// Lambda environment variables reference the parameter name, not its value
const appFunction = new lambda.Function(this, 'AppFunction', {
  // ... other configuration
  environment: {
    DATABASE_PASSWORD_PARAM: dbPassword.parameterName,
  },
});

// Grant Lambda permission to read the parameter
dbPassword.grantRead(appFunction);

Real-World Lessons Learned

Common Pitfalls I've Encountered

DNS Propagation Delays: Route53 changes can take 30-60 seconds to propagate. I learned to add health checks before marking deployments as ready:

typescript
// Health check implementation
const healthCheck = new route53.HealthCheck(this, 'HealthCheck', {
  type: route53.HealthCheckType.HTTPS,
  resourcePath: '/health',
  fqdn: previewDomain,
  requestInterval: Duration.seconds(30),
  failureThreshold: 3,
});

Resource Cleanup Failures: Sometimes CDK destroy operations fail due to resource dependencies. Here's a retry mechanism that works:

bash
#!/bin/bash
# Enhanced cleanup script
STACK_NAME="preview-pr-$1"
MAX_RETRIES=3

for i in $(seq 1 "$MAX_RETRIES"); do
  echo "Attempt $i to destroy stack $STACK_NAME"

  if npx cdk destroy "$STACK_NAME" --force; then
    echo "Stack destroyed successfully"
    exit 0
  fi

  if [ "$i" -lt "$MAX_RETRIES" ]; then
    echo "Retry in 30 seconds..."
    sleep 30
  fi
done

echo "Failed to destroy stack after $MAX_RETRIES attempts"
exit 1

Cold Start Performance: Lambda cold starts can make initial tests fail. Pre-warming helps:

typescript
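// Lambda warmer function
import { Duration } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const warmerFunction = new lambda.Function(this, 'Warmer', {
  runtime: lambda.Runtime.NODEJS_22_X,
  handler: 'index.handler', // inline code is written to index.js
  // Node.js 18+ runtimes bundle AWS SDK v3 rather than the v2 'aws-sdk' package
  code: lambda.Code.fromInline(`
    const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');
    const client = new LambdaClient({});

    exports.handler = async () => {
      await client.send(new InvokeCommand({
        FunctionName: process.env.TARGET_FUNCTION,
        InvocationType: 'Event',
        Payload: Buffer.from(JSON.stringify({ warmer: true })),
      }));
    };
  `),
  environment: {
    TARGET_FUNCTION: appFunction.functionName,
  },
});

// The warmer needs permission to invoke the application function
appFunction.grantInvoke(warmerFunction);

// Schedule pre-warming
const warmupRule = new events.Rule(this, 'WarmupRule', {
  schedule: events.Schedule.rate(Duration.minutes(5)),
});

warmupRule.addTarget(new targets.LambdaFunction(warmerFunction));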
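
On the receiving side, the application handler should return early for warm-up pings so they do not run real request logic. A minimal sketch, assuming the `{ warmer: true }` payload sent above; `skipWarmup` is a hypothetical wrapper, not a library function.

```typescript
// Hypothetical wrapper that short-circuits warm-up invocations.
type Handler<E, R> = (event: E) => Promise<R>;

function skipWarmup<E extends object, R>(
  handler: Handler<E, R>,
  warmResponse: R,
): Handler<E, R> {
  return async (event: E): Promise<R> => {
    // The warmer sends { warmer: true }; anything else is a real request
    if ((event as { warmer?: unknown }).warmer === true) {
      return warmResponse;
    }
    return handler(event);
  };
}

// Usage: wrap the real handler before exporting it from the Lambda module
const appHandler = skipWarmup(
  async (_event: object) => ({ statusCode: 200, body: 'real response' }),
  { statusCode: 200, body: 'warm' },
);
```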

Performance Optimizations

Parallel CDK Deployments: When a preview environment spans several stacks, a matrix strategy deploys them in parallel instead of one after another:

yaml
# Matrix strategy for parallel deployments
strategy:
  matrix:
    include:
      - stack: frontend
        directory: packages/frontend
      - stack: backend
        directory: packages/backend
      - stack: infrastructure
        directory: infrastructure

steps:
  - name: Deploy ${{ matrix.stack }}
    working-directory: ${{ matrix.directory }}
    run: |
      npx cdk deploy preview-pr-${{ env.PR_NUMBER }}-${{ matrix.stack }} \
        --require-approval never

CloudFront Caching Strategy: Balance freshness with performance:

typescript
const distribution = new cloudfront.CloudFrontWebDistribution(this, 'Distribution', {
  originConfigs: [{
    customOriginSource: {
      domainName: api.restApiId + '.execute-api.' + this.region + '.amazonaws.com',
      originPath: '/prod',
    },
    behaviors: [
      {
        isDefaultBehavior: true,
        allowedMethods: cloudfront.CloudFrontAllowedMethods.ALL,
        cachedMethods: cloudfront.CloudFrontAllowedCachedMethods.GET_HEAD_OPTIONS,
        // Behaviors on CloudFrontWebDistribution take TTLs directly;
        // cache policy IDs belong to the newer cloudfront.Distribution construct
        defaultTtl: Duration.minutes(5), // Short TTL for development
        maxTtl: Duration.hours(1),
        minTtl: Duration.seconds(0),
      },
    ],
  }],
});

Monitoring and Alerting

Cost Monitoring

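Track spending to prevent budget surprises. The AWS/Billing metric below is account-wide; a per-PR breakdown depends on activating the PRNumber tag as a cost allocation tag: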

typescript
// CloudWatch dashboard for preview environments
// (AWS/Billing metrics are published only in us-east-1 and require billing alerts to be enabled)
import { Duration } from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

const estimatedCharges = new cloudwatch.Metric({
  namespace: 'AWS/Billing',
  metricName: 'EstimatedCharges',
  dimensionsMap: {
    Currency: 'USD',
  },
  statistic: 'Maximum',
  period: Duration.hours(6), // Billing data is only refreshed a few times a day
});

const dashboard = new cloudwatch.Dashboard(this, 'PreviewDashboard', {
  dashboardName: 'Preview-Environments',
  widgets: [
    [
      new cloudwatch.GraphWidget({
        title: 'Preview Environment Costs',
        left: [estimatedCharges],
      }),
    ],
    [
      new cloudwatch.SingleValueWidget({
        title: 'Active Preview Environments',
        metrics: [
          new cloudwatch.Metric({
            namespace: 'Custom/Preview',
            metricName: 'ActiveEnvironments',
          }),
        ],
      }),
    ],
  ],
});

// Cost alert
const costAlarm = new cloudwatch.Alarm(this, 'PreviewCostAlarm', {
  metric: estimatedCharges,
  threshold: 50, // Alert if monthly costs exceed $50
  evaluationPeriods: 1,
});

Deployment Success Tracking

typescript
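// Custom metrics for deployment tracking (CDK side, for dashboards and alarms)
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

const deploymentMetric = new cloudwatch.Metric({
  namespace: 'Custom/Preview',
  metricName: 'DeploymentSuccess',
  dimensionsMap: {
    Environment: 'preview',
    PRNumber: prNumber,
  },
});

// Send success metric after deployment (runtime side, e.g. from a deploy script).
// PutMetricDataCommand comes from the AWS SDK, not the CDK cloudwatch module,
// and has no effect until it is sent through a client.
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';

const cloudWatchClient = new CloudWatchClient({});

await cloudWatchClient.send(new PutMetricDataCommand({
  Namespace: 'Custom/Preview',
  MetricData: [{
    MetricName: 'DeploymentSuccess',
    Value: 1,
    Unit: 'Count',
    Dimensions: [
      { Name: 'Environment', Value: 'preview' },
      { Name: 'PRNumber', Value: prNumber },
    ],
  }],
}));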

Key Takeaways

After implementing this pattern across multiple projects, here's what I've learned:

  1. Start Simple: Begin with basic Lambda + API Gateway. Add complexity as your team grows comfortable with the automation.

  2. Cost Control is Critical: Without proper tagging and cleanup, preview environments can quickly become expensive. The automated cleanup is non-negotiable.

  3. Security from Day One: Use OIDC instead of long-lived credentials. It's more secure and eliminates credential rotation headaches.

  4. Monitor Everything: Failed deployments and runaway costs are much easier to catch with proper monitoring from the start.

  5. Test the Cleanup: Your cleanup automation will eventually fail. Test it regularly and have manual fallbacks ready.

The investment in automation pays off quickly. Teams report faster review cycles, fewer staging environment conflicts, and more confidence in their deployments. Most importantly, it eliminates the friction that often slows down development workflows.

Working with this pattern, I've seen deployment-to-ready times consistently under 5 minutes, with costs staying below $0.30 per 72-hour environment. The developer experience improvement alone makes this architectural pattern worthwhile.
