
Build a RAG Agent with AWS Bedrock and CDK

Building a RAG agent on AWS Bedrock + Knowledge Bases + OpenSearch Serverless with CDK in TypeScript — architecture, IAM wiring, automated ingestion, and the chat UI.

AWS markets Bedrock Agent plus Knowledge Bases plus OpenSearch Serverless as the managed-RAG path: ship retrieval-grounded chat without owning the retrieval code. I ported the DigitalOcean rag-assistant blueprint shape onto AWS-native services with CDK in TypeScript, then deployed it end-to-end and watched the first run fail. The AWS Bedrock managed-RAG stack works with CDK, but two doc-gap traps will fail your first deploy: a cross-region inference-profile IAM hole, and the absence of an automatic ingestion job.

Architecture mapping

The shape is one-for-one with the DO blueprint. Where DO bundles a GenAI Agent on a managed inference cluster and a KBaaS with a managed vector store, AWS splits the pieces into named services that you wire together in IaC. Seven services, one CDK stack, 51 resources.

DigitalOcean Blueprint → AWS Equivalent (this repo)
GenAI Agent (Nemotron) → Bedrock Agent + Claude Sonnet 4.5
Knowledge Base (Qwen3 embed) → Bedrock KB + Titan Embed v2
KBaaS managed vector store → OpenSearch Serverless VECTORSEARCH
Guardrails (jailbreak/content/PII) → Bedrock Guardrails (content + PII)
App Platform FastAPI chat UI → CloudFront + S3 SPA, Lambda Function URL
tor1-only region → Any Bedrock region (deployed in us-east-1)
Terraform → AWS CDK (TypeScript)

The full stack took ~757 seconds to deploy clean in us-east-1. Both traps below surfaced on the first run.

How the stack is wired

The CDK code lives in three small custom constructs — VectorStore, KnowledgeBase, Agent — each composed of aws-cdk-lib/aws-bedrock L1 primitives (CfnKnowledgeBase, CfnDataSource, CfnAgent, CfnGuardrail) plus stock S3 and IAM. No third-party CDK libraries. Three concrete shapes are worth seeing before the traps.

S3 bucket + seed corpus. The documents bucket is a plain s3.Bucket with auto-delete on stack destroy. A BucketDeployment ships the seed corpus from a local docs-sample/ directory into the bucket at synth time, so the first deploy is functional end-to-end without a manual upload step.

ts
const documents = new s3.Bucket(this, 'Documents', {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  removalPolicy: cdk.RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
});

new s3deploy.BucketDeployment(this, 'SeedDocs', {
  sources: [s3deploy.Source.asset(path.join(__dirname, '..', '..', 'docs-sample'))],
  destinationBucket: documents,
});

Knowledge Base + S3 data source. CfnKnowledgeBase points at the OpenSearch Serverless collection and names the field mapping Bedrock expects. CfnDataSource attaches the S3 bucket and sets a fixed-size chunking strategy with 20% overlap, which works well for dense reference material.

ts
const kb = new bedrock.CfnKnowledgeBase(this, 'Kb', {
  name: `${basename}-kb`,
  roleArn: kbRole.roleArn,
  knowledgeBaseConfiguration: {
    type: 'VECTOR',
    vectorKnowledgeBaseConfiguration: {
      embeddingModelArn: `arn:aws:bedrock:${region}::foundation-model/amazon.titan-embed-text-v2:0`,
    },
  },
  storageConfiguration: {
    type: 'OPENSEARCH_SERVERLESS',
    opensearchServerlessConfiguration: {
      collectionArn: vectorStore.collectionArn,
      vectorIndexName: vectorStore.indexName,
      fieldMapping: {
        vectorField: 'bedrock-knowledge-base-default-vector',
        textField: 'AMAZON_BEDROCK_TEXT_CHUNK',
        metadataField: 'AMAZON_BEDROCK_METADATA',
      },
    },
  },
});

new bedrock.CfnDataSource(this, 'S3DataSource', {
  knowledgeBaseId: kb.attrKnowledgeBaseId,
  name: `${basename}-s3-source`,
  dataSourceConfiguration: {
    type: 'S3',
    s3Configuration: { bucketArn: documents.bucketArn },
  },
  vectorIngestionConfiguration: {
    chunkingConfiguration: {
      chunkingStrategy: 'FIXED_SIZE',
      fixedSizeChunkingConfiguration: { maxTokens: 512, overlapPercentage: 20 },
    },
  },
});
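The fixed-size strategy is easy to reason about with a toy model. Here is a minimal sketch (my own illustration of the windowing arithmetic, not Bedrock's actual chunker, which also respects tokenizer and document boundaries) of 512-token windows with 20% overlap:

```typescript
// Toy model of FIXED_SIZE chunking: windows of `maxTokens` tokens, each
// window starting `maxTokens * (1 - overlapPct/100)` tokens after the last.
function fixedSizeChunks(tokens: string[], maxTokens: number, overlapPct: number): string[][] {
  const step = Math.max(1, Math.floor(maxTokens * (1 - overlapPct / 100)));
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // last window reached the tail
  }
  return chunks;
}

// 1200 fake tokens -> windows advance by floor(512 * 0.8) = 409 tokens,
// so consecutive chunks share 103 tokens of context.
const tokens = Array.from({ length: 1200 }, (_, i) => `t${i}`);
const chunks = fixedSizeChunks(tokens, 512, 20);
```

The overlap is what lets a retrieval hit near a chunk boundary still carry the surrounding context into the prompt.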

Guardrail. CfnGuardrail takes a content policy (six filter types with input/output strength) and a sensitive-information policy (PII entities with anonymise-or-block actions). Attaching it to the agent is a single guardrailConfiguration block on CfnAgent that references this resource's attrGuardrailId.

ts
new bedrock.CfnGuardrail(this, 'Guardrail', {
  name: `${basename}-guardrail`,
  blockedInputMessaging: 'I cannot process that request.',
  blockedOutputsMessaging: 'I cannot share that response.',
  contentPolicyConfig: {
    filtersConfig: [
      { type: 'SEXUAL',        inputStrength: 'HIGH',   outputStrength: 'HIGH' },
      { type: 'VIOLENCE',      inputStrength: 'HIGH',   outputStrength: 'HIGH' },
      { type: 'HATE',          inputStrength: 'HIGH',   outputStrength: 'HIGH' },
      { type: 'INSULTS',       inputStrength: 'MEDIUM', outputStrength: 'MEDIUM' },
      { type: 'MISCONDUCT',    inputStrength: 'MEDIUM', outputStrength: 'MEDIUM' },
      { type: 'PROMPT_ATTACK', inputStrength: 'HIGH',   outputStrength: 'NONE' },
    ],
  },
  sensitiveInformationPolicyConfig: {
    piiEntitiesConfig: [
      { type: 'EMAIL',                     action: 'ANONYMIZE' },
      { type: 'PHONE',                     action: 'ANONYMIZE' },
      { type: 'CREDIT_DEBIT_CARD_NUMBER',  action: 'BLOCK' },
      { type: 'US_SOCIAL_SECURITY_NUMBER', action: 'BLOCK' },
    ],
  },
});

The full source for each construct is in the repo's lib/constructs/ directory. The traps below all sit somewhere inside this shape.

The cross-region inference profile IAM hole

The starter snippets I worked from grant bedrock:InvokeModel on arn:aws:bedrock:${region}::foundation-model/*. That ARN is region-pinned to whatever region the stack is deployed in. The Claude Sonnet 4.5 model ID I pass to the Agent is the cross-region inference profile form: us.anthropic.claude-sonnet-4-5-20250929-v1:0. Profiles in the us.* family route the underlying invocation to whichever pool region has capacity — from us-east-1 that pool is us-east-1, us-east-2, and us-west-2; the routed-destination set can differ from other source regions, so consult Bedrock's inference-profile docs for your case. IAM checks the routed region, not the caller's region.

So the deploy succeeds. Agent Prepare succeeds. Ingestion succeeds (Titan Embed v2 is region-local, which masks the bug). Then the first InvokeAgent call from the SPA comes back with this:

"Access denied when calling Bedrock. Check your request permissions and retry the request."

No service trace, no foundation-model ARN in the error, nothing pointing at the region pin. I pulled aws iam get-role-policy on the AgentRole, scanned the foundation-model resource line, and saw that ${region} had been baked into a single region value. The fix is a region wildcard for foundation models and an explicit inference-profile/* resource for the profile itself.

ts
agentRole.addToPolicy(
  new iam.PolicyStatement({
    actions: ['bedrock:InvokeModel', 'bedrock:InvokeModelWithResponseStream'],
    resources: [
      `arn:aws:bedrock:*::foundation-model/*`,
      `arn:aws:bedrock:*:${account}:inference-profile/*`,
    ],
  }),
);

Foundation-model ARNs have no account segment by design; the * in the region slot widens the policy to any region the us.* profile can route to. The profile ARN itself does carry an account segment.
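To see the mismatch concretely, here is a toy resource matcher (a deliberate simplification of IAM's evaluator, for illustration only; the ARNs are the ones from this stack) checking the region-pinned policy against an invocation the us.* profile routed to us-west-2:

```typescript
// Toy glob match: '*' in a policy resource matches any run of characters.
// A simplification of IAM's matcher, for illustration only.
function resourceMatches(pattern: string, arn: string): boolean {
  const re = new RegExp(
    '^' +
      pattern
        .split('*')
        .map(s => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex specials
        .join('.*') +
      '$',
  );
  return re.test(arn);
}

// The us.* profile routed this call to us-west-2...
const routedArn = 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0';

// ...but the starter policy's resource was synthesized in us-east-1.
const pinned = 'arn:aws:bedrock:us-east-1::foundation-model/*';
const fixed = 'arn:aws:bedrock:*::foundation-model/*';

resourceMatches(pinned, routedArn); // false -> AccessDenied
resourceMatches(fixed, routedArn);  // true
```

Titan Embed v2 never trips the pinned form because ingestion stays in the deploy region, which is exactly why the bug hides until the first InvokeAgent call.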

Side gotcha that cost me one extra cycle: cdk deploy --hotswap reports "no changes" for IAM-only diffs because hotswap only touches Lambda/Step Functions code. Use a full cdk deploy to actually push the policy.

No ingestion job at deploy time

CfnKnowledgeBase plus CfnDataSource are declarative resources. They create the KB and link the S3 data source. They do not start an ingestion job. The seeded S3 corpus stays at zero vectors in OpenSearch until something calls bedrock-agent start-ingestion-job.

This trap is worse than the IAM one because the failure mode looks like a retrieval-quality problem. The chat returns a generic "no information available" answer to every question. The natural first reaction is to blame chunking, embeddings, or the corpus itself. The actual cause is that nothing in the stack ever embedded the documents.

I weighed three triggers:

  • A post-deploy shell step that runs aws bedrock-agent start-ingestion-job. Works, but lives outside IaC and gets forgotten on the next deploy.
  • An EventBridge rule plus a tiny Lambda. Two extra resources and a permissions hop for behaviour that should fire exactly once per deploy.
  • An AwsCustomResource that calls the same API directly from CloudFormation.

I went with AwsCustomResource. It fires inside CloudFormation, has no separate runtime to monitor, and re-fires on every deploy when I rotate the PhysicalResourceId. The policy stays scoped to a single action on a single ARN.

ts
const ingestParams = {
  knowledgeBaseId: kb.attrKnowledgeBaseId,
  dataSourceId: dataSource.attrDataSourceId,
};
const physicalIdBase = `${cdk.Stack.of(this).stackName}-ingest`;

const ingestionTrigger = new cr.AwsCustomResource(this, 'IngestionTrigger', {
  onCreate: {
    service: 'BedrockAgent',
    action: 'StartIngestionJob',
    parameters: ingestParams,
    physicalResourceId: cr.PhysicalResourceId.of(`${physicalIdBase}-create`),
  },
  onUpdate: {
    service: 'BedrockAgent',
    action: 'StartIngestionJob',
    parameters: ingestParams,
    physicalResourceId: cr.PhysicalResourceId.of(`${physicalIdBase}-${Date.now()}`),
  },
  policy: cr.AwsCustomResourcePolicy.fromStatements([
    new iam.PolicyStatement({
      actions: ['bedrock:StartIngestionJob'],
      resources: [kb.attrKnowledgeBaseArn],
    }),
  ]),
  installLatestAwsSdk: false,
});

ingestionTrigger.node.addDependency(seedDeployment);
ingestionTrigger.node.addDependency(dataSource);

Two pieces matter. The Date.now() in the update PhysicalResourceId makes every synth look like a real change to CloudFormation, which forces re-ingestion on every deploy. Subsequent runs are incremental: Bedrock only re-processes new or changed S3 objects. The construct also depends on the BucketDeployment that seeds the corpus, so the job is never queued against an empty bucket.

The call is async. CDK does not wait for ingestion to finish; the custom resource returns as soon as the job is queued. In this test, three small markdown documents were indexed roughly 40 seconds after cdk deploy returned. That behaviour does not generalise: ingestion time depends on object count, chunk size, and how busy Bedrock is. Treat 40 seconds as a sanity-check number, not a forecast.
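Because nothing in the stack waits for the job, I check completion out-of-band. A generic polling helper sketch (my own code; in practice the injected check would wrap GetIngestionJob from the Bedrock Agent SDK and return its status string, which is an assumption about your client setup):

```typescript
// Poll an injected status check until it reports a terminal state.
// In practice `check` would call GetIngestionJob and return its status
// ('STARTING', 'IN_PROGRESS', 'COMPLETE', 'FAILED'); injecting it keeps
// the helper testable without AWS credentials.
async function pollUntilComplete(
  check: () => Promise<string>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await check();
    if (status === 'COMPLETE' || status === 'FAILED') return status;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`ingestion still running after ${maxAttempts} attempts`);
}

// Fake status sequence standing in for successive GetIngestionJob responses.
const statuses = ['STARTING', 'IN_PROGRESS', 'IN_PROGRESS', 'COMPLETE'];
let calls = 0;
const result = pollUntilComplete(async () => statuses[calls++], 1, 10);
```

The same check works as a post-deploy smoke test before pointing the chat UI at the agent.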

What it costs (and how to keep it cheap)

OpenSearch Serverless dominates the bill. The collection runs at a 2-OCU minimum and charges hourly even when nothing is indexing or querying, which works out to roughly $350 per month in us-east-1 if left running (2 OCU × $0.24/OCU-hour × 730 hours). Bedrock invocations, Titan embeddings, S3, Lambda, and CloudWatch are per-request and negligible at demo scale. For learning projects the only workflow that makes sense is deploy, test, cdk destroy, all inside the same hour.
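The arithmetic behind that figure, as a sanity check (the $0.24/OCU-hour rate is the us-east-1 price at the time of writing; verify against current pricing):

```typescript
// Back-of-envelope OpenSearch Serverless floor cost. Per-request costs
// (Bedrock invocations, embeddings, S3, Lambda) are excluded as negligible
// at demo scale.
const minOcus = 2;            // collection minimum, even when idle
const ocuHourUsd = 0.24;      // us-east-1 rate at time of writing (assumption)
const hoursPerMonth = 730;

const hourlyFloorUsd = minOcus * ocuHourUsd;               // ~0.48/hour
const monthlyFloorUsd = hourlyFloorUsd * hoursPerMonth;    // ~350.40/month
```

At roughly $0.48/hour, a deploy-test-destroy cycle inside one hour costs cents; forgetting to destroy costs the monthly floor.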

Closing

If you are evaluating Bedrock managed RAG, clone the repo, follow the README, and expect the two traps to surface in the order above: IAM first, ingestion second. Bedrock managed RAG is a real shortcut compared to building retrieval from scratch, but the docs treat each resource in isolation and the wiring between them is on you.

This is a small-corpus learning report. Production-scale OCU sizing, multi-tenant access patterns, and rerank-model integration are not covered.
