
Building Production-Ready AI Agents with AWS Bedrock AgentCore

Learn how AWS Bedrock AgentCore solves the infrastructure challenges of deploying agentic AI at scale - from prototype to production with runtime, memory, gateway, and multi-agent coordination.

The Production Gap

Many teams have built impressive LangChain or CrewAI prototypes that demonstrate real value - until it's time to deploy them. The jump from "it works on my laptop" to production involves session isolation, credential management, memory persistence, observability, and security controls. Building this infrastructure from scratch takes months, which is a key reason an estimated 70% of AI projects never make it past the pilot phase.

AWS Bedrock AgentCore (GA October 2025) addresses this production gap. It's not another agent framework competing with LangChain or CrewAI. Instead, it's the managed infrastructure layer that agents built with ANY framework need to run at scale. Think of it as "Lambda for AI agents" - you bring your agent code, AgentCore handles runtime, memory, tool management, and security.

This post explores how AgentCore solves real infrastructure challenges and when it makes sense to use it over self-hosted alternatives.

AgentCore Architecture

AgentCore consists of five integrated services that work independently or together:

Runtime: Serverless execution environment with 8-hour session windows and automatic session isolation using dedicated microVMs per user.

Memory: Managed storage for both short-term conversation context and long-term user preferences, facts, and summaries - without building your own vector database.

Gateway: Centralized tool management using the Model Context Protocol (MCP). Convert Lambda functions, REST APIs, and existing services into agent-accessible tools.

Identity: Secure credential management with OAuth 2.0 integration. Agents access third-party APIs on behalf of users without storing credentials.

Observability: OpenTelemetry-compatible metrics and traces exported to CloudWatch, Datadog, or LangSmith.

Runtime: Deploy Any Framework

The fundamental challenge with production agents is providing secure, isolated execution environments. AgentCore Runtime handles this through consumption-based microVM allocation.

Here's how to deploy a Strands agent to AgentCore:

```python
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    agent = Agent(
        model="anthropic.claude-sonnet-4-20250514-v1:0",
        instructions="You are a customer support agent with access to order history and return policies."
    )
    return agent.run(payload.get("message"))

if __name__ == "__main__":
    app.run()  # start the AgentCore runtime server
```

Deploy with the CLI:

```bash
agentcore configure
agentcore launch --region us-east-1
```

Key runtime characteristics:

  • 8-hour execution windows: Industry-leading for async agentic workflows. Traditional serverless functions time out at 15 minutes.
  • Session isolation: Each user gets a dedicated microVM. No data leakage between sessions.
  • Consumption pricing: Pay for active CPU/memory only, not I/O wait time. This can be significantly cheaper than pre-allocated Lambda configurations for agentic workloads that spend most of their time waiting on LLM responses (see the sketch after this list).
  • ARM64 containers: The Runtime requires linux/arm64 images; x86 builds fail to deploy. Use --platform=linux/arm64 in Docker builds.
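
To see why consumption pricing matters for agents, here's a back-of-envelope comparison. All rates here are illustrative placeholders, not published AWS pricing; only the ratio between active time and wall-clock time matters:

```python
# Back-of-envelope: pre-allocated vs consumption billing for one session.
# All rates below are illustrative placeholders, NOT published pricing.

session_seconds = 120       # wall-clock time for one agent session
active_seconds = 15         # time actually computing (the rest is LLM I/O wait)
memory_gb = 2

preallocated_rate = 0.0000167  # hypothetical $/GB-second, billed for full duration
consumption_rate = 0.0000167   # hypothetical $/GB-second, billed for active time only

preallocated_cost = session_seconds * memory_gb * preallocated_rate
consumption_cost = active_seconds * memory_gb * consumption_rate

print(f"Pre-allocated: ${preallocated_cost:.6f} per session")
print(f"Consumption:   ${consumption_cost:.6f} per session")
# With ~87% of the session spent waiting on LLM responses, consumption
# billing is ~8x cheaper here even at identical per-second rates.
```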

Common pitfall: Not handling the Mcp-Session-Id header. AgentCore auto-injects this for stateless MCP servers:

```python
from fastapi import FastAPI, Header

app = FastAPI()

@app.post("/mcp")
async def mcp_endpoint(
    mcp_session_id: str = Header(None, alias="Mcp-Session-Id")
):
    # AgentCore manages session isolation.
    # Your server must accept platform-generated IDs.
    # load_session is your application's own session lookup (placeholder).
    session_state = load_session(mcp_session_id)
    return {"status": "ok"}
```

Memory: Context Without Infrastructure

Building production memory for agents requires solving two problems: short-term conversation context and long-term knowledge persistence. AgentCore Memory handles both.

Implementing the memory extraction pipeline with three strategies:

```python
from bedrock_agentcore.memory import (
    MemoryClient,
    UserPreferenceMemoryStrategy,
    SemanticMemoryStrategy,
    SummaryMemoryStrategy
)

memory_client = MemoryClient()

# Create memory with multiple strategies
memory = memory_client.create_memory(
    name="customer-support-memory",
    strategies=[
        UserPreferenceMemoryStrategy(),  # Learn user patterns
        SemanticMemoryStrategy(),        # Store facts/knowledge
        SummaryMemoryStrategy()          # Compress sessions
    ],
    encryption_key_arn="arn:aws:kms:us-east-1:123456789012:key/abc123"
)

# Store a conversation event
memory_client.create_event(
    memory_id=memory.id,
    event_data={
        "type": "conversation",
        "content": "User prefers technical explanations with code examples"
    }
)
```

Strategy selection guide:

  • Customer Support: UserPreferences + Summaries (remember communication style)
  • Technical Assistant: SemanticFacts + Summaries (remember codebase knowledge)
  • Personal Agent: All three strategies (comprehensive personalization)
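
Whichever combination you pick, memories only pay off when they are read back at inference time. A minimal retrieval sketch, assuming a retrieve_memories helper on the same MemoryClient (the method name and argument shape are assumptions; check the bedrock_agentcore SDK for the actual API):

```python
# Retrieval sketch. NOTE: retrieve_memories() and its parameters are
# assumptions for illustration, not confirmed SDK surface.
relevant = memory_client.retrieve_memories(
    memory_id=memory.id,
    query="How does this user prefer explanations formatted?",
    max_results=5
)

# Inject retrieved records into the agent's context for this turn
context = "\n".join(record["content"] for record in relevant)
```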

Critical security pattern - always use Guardrails before CreateEvent API:

```python
import boto3

# ApplyGuardrail lives on the bedrock-runtime client
bedrock_runtime = boto3.client('bedrock-runtime')

# WRONG: Direct storage (vulnerable to memory poisoning)
# memory_client.create_event(
#     memory_id=memory.id,
#     event_data={"content": user_input}
# )

# RIGHT: Sanitize with Guardrails first
guardrail_response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier='guardrail-123',
    guardrailVersion='1',
    source='INPUT',
    content=[{"text": {"text": user_input}}]
)

if guardrail_response['action'] == 'NONE':
    memory_client.create_event(
        memory_id=memory.id,
        event_data={"content": user_input}
    )
else:
    # Block and log the attack attempt
    logger.warning(f"Memory poisoning attempt blocked: {guardrail_response}")
```

Cost optimization: Limit retriever hops. Two to three retrieval operations per turn is normal; ten indicates over-retrieval:

```python
memory_config = {
    'retrieval_strategy': 'semantic',
    'max_results': 5,
    'max_retriever_hops': 2
}
```

Gateway: Centralized Tool Management

Embedding tools directly in agent code leads to duplication and inconsistency. When you have customer support, sales, and technical agents all needing weather data, maintaining three copies of weather tool code becomes a maintenance problem.

AgentCore Gateway solves this through centralized MCP-compatible tool servers:

Registering a Lambda function as a tool:

```python
import boto3

agentcore = boto3.client('bedrock-agentcore')

# Register a Lambda function as a tool target
response = agentcore.create_target(
    gatewayId='gateway-123',
    targetConfig={
        'type': 'LAMBDA',
        'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:get-weather',
        'description': 'Get current weather for a city'
    }
)
```

Gateway handles:

  • Authentication: IAM roles for AWS resources, OAuth 2.0 for third-party APIs, API keys for services
  • Semantic tool search: Agents discover relevant tools via x_amz_bedrock_agentcore_search without knowing all available tools
  • Protocol conversion: Lambda functions, OpenAPI specs, Smithy models, and MCP servers all exposed through standardized MCP interface
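
Semantic tool search from the list above is itself just another MCP call. A hedged sketch using raw JSON-RPC - the gateway URL and bearer token are placeholders, and the argument shape for the built-in search tool is an assumption:

```python
import requests

# Placeholder endpoint; yours comes from your gateway's configuration
GATEWAY_URL = "https://gateway-123.example.amazonaws.com/mcp"
oauth_token = "<token-from-agentcore-identity>"  # placeholder credential

# Standard MCP tools/call request invoking the built-in semantic search tool.
# The arguments shape for x_amz_bedrock_agentcore_search is an assumption.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "x_amz_bedrock_agentcore_search",
        "arguments": {"query": "current weather for a city"}
    }
}

response = requests.post(
    GATEWAY_URL,
    json=payload,
    headers={"Authorization": f"Bearer {oauth_token}"}
)
print(response.json())  # ranked list of matching tools
```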

Architecture pattern - centralize common tools, keep domain-specific tools local:

```yaml
Common Tools (via Gateway):
  - Web search
  - Database queries
  - Weather API
  - Stock prices

Domain-Specific Tools (agent-local):
  - Return policy logic
  - Product catalog
  - Business rules
```

Multi-Agent Coordination with A2A Protocol

Scaling from single agents to coordinated agent teams requires standardized communication. AgentCore uses the Agent-to-Agent (A2A) protocol for this.

A2A vs MCP distinction:

  • MCP: Agent-to-tool communication (agent calling weather API)
  • A2A: Agent-to-agent communication (supervisor coordinating specialists)

Hub-and-spoke supervisor implementation:

```typescript
import { BedrockAgentCoreClient, InvokeAgentCommand } from '@aws-sdk/client-bedrock-agentcore';

class HostAgent {
  private client: BedrockAgentCoreClient;
  private specialistAgents: Map<string, AgentConfig>;

  async routeToSpecialist(query: string, capability: string) {
    const agentConfig = this.specialistAgents.get(capability);

    // Fetch the remote agent's A2A configuration
    const agentCard = await this.fetchAgentCard(agentConfig.endpoint);

    // Invoke via the A2A protocol
    const command = new InvokeAgentCommand({
      agentId: agentCard.id,
      sessionId: this.generateSessionId(),
      inputText: query,
      protocol: 'A2A'
    });

    return await this.client.send(command);
  }

  private async fetchAgentCard(endpoint: string): Promise<AgentCard> {
    // Retrieve the agent's capabilities schema
    const response = await fetch(`${endpoint}/.well-known/agent-card`);
    return response.json();
  }
}
```

Orchestration patterns:

Supervisor with routing mode - not every query needs full orchestration:

```python
class SupervisorAgent:
    def route_query(self, query: str):
        # Simple query → direct routing
        if self.is_simple_query(query):
            specialist = self.select_single_specialist(query)
            return specialist.invoke(query)

        # Complex query → full orchestration
        else:
            plan = self.analyze_and_plan(query)
            results = self.orchestrate_subagents(plan)
            return self.synthesize(results)

    def is_simple_query(self, query: str) -> bool:
        intents = self.detect_intents(query)
        return len(intents) == 1
```

Framework interoperability: LangGraph monitoring agent + CrewAI analytics agent + Strands incident response agent can all communicate via A2A. No framework lock-in.

Security and Cost Optimization

Guardrails Configuration

Guardrails protect against prompt injection, memory poisoning, and harmful content:

```python
import boto3

bedrock = boto3.client('bedrock')

guardrail = bedrock.create_guardrail(
    name='production-agent-guardrail',
    # Required parameters: messages returned when the guardrail intervenes
    blockedInputMessaging='This request was blocked by policy.',
    blockedOutputsMessaging='This response was blocked by policy.',
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'HIGH'},
            {'type': 'PROMPT_ATTACK', 'inputStrength': 'HIGH', 'outputStrength': 'NONE'}
        ]
    },
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Financial Advice',
                'definition': 'Providing specific investment recommendations',
                'type': 'DENY'
            }
        ]
    },
    wordPolicyConfig={
        'wordsConfig': [
            {'text': 'internal-api-key'},
            {'text': 'secret-token'}
        ],
        'managedWordListsConfig': [
            {'type': 'PROFANITY'}
        ]
    }
)
```

Defense-in-depth strategy:

  1. Input validation: Block malicious prompts at entry
  2. Memory protection: Sanitize before CreateEvent API
  3. Output filtering: Prevent harmful responses
  4. Audit trails: CloudWatch logs for compliance
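
A minimal sketch tying the four layers together, reusing names from earlier snippets (bedrock_runtime, memory_client, agent, logger); the guarded_turn wrapper itself is illustrative, not an SDK feature:

```python
def apply_guardrail_check(text: str, source: str) -> str:
    # Thin wrapper over the ApplyGuardrail call shown earlier;
    # source is 'INPUT' or 'OUTPUT'
    resp = bedrock_runtime.apply_guardrail(
        guardrailIdentifier='guardrail-123',
        guardrailVersion='1',
        source=source,
        content=[{"text": {"text": text}}]
    )
    return resp['action']

def guarded_turn(user_input: str) -> str:
    # Layer 1: input validation
    if apply_guardrail_check(user_input, source='INPUT') != 'NONE':
        return "Request blocked by policy."

    # Layer 2: memory protection - only sanitized input is stored
    memory_client.create_event(
        memory_id=memory.id,
        event_data={"content": user_input}
    )

    reply = agent.run(user_input)

    # Layer 3: output filtering
    if apply_guardrail_check(reply, source='OUTPUT') != 'NONE':
        return "Response withheld by policy."

    # Layer 4: audit trail - structured logs land in CloudWatch
    logger.info("turn completed", extra={"guardrail": "passed"})
    return reply
```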

Cost Optimization Strategies

Prompt caching - 90% discount on cached tokens:

```python
response = bedrock_runtime.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": user_query}]}],
    system=[
        {"text": large_system_prompt},
        {"cachePoint": {"type": "default"}}  # everything before this block is cached
    ]
)
```

Model routing - match complexity to model cost:

```python
def route_to_model(query: str) -> str:
    complexity = classify_query_complexity(query)

    if complexity < 0.3:
        return "anthropic.claude-haiku-4-5-20241022-v1:0"   # $1.00/$5.00 per 1M tokens (input/output)
    elif complexity < 0.7:
        return "anthropic.claude-sonnet-4-20250514-v1:0"    # $3/$15 per 1M tokens (input/output)
    else:
        return "anthropic.claude-opus-4-20250514-v1:0"      # $15/$75 per 1M tokens (input/output)
```

Tool-call budgets - prevent unbounded tool use:

```python
agent = Agent(
    model="anthropic.claude-sonnet-4-20250514-v1:0",
    max_tool_calls_per_turn=3,
    instructions="If user asks about multiple items, summarize instead of exhaustive lookup"
)
```

Cost components:

  • Runtime: Active CPU/memory consumption (not pre-allocated)
  • Memory: Short-term (per event), long-term (per memory processed + retrievals)
  • Gateway: MCP operations (ListTools, CallTool, Ping) + semantic search queries
  • Identity: No additional charges when used via Runtime/Gateway
  • Observability: CloudWatch standard pricing
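
A parameterized sketch for estimating monthly spend from these components. Every rate is an input you supply from the current pricing page; nothing here is published pricing, and Identity and Observability are omitted per the notes above:

```python
def estimate_monthly_cost(
    sessions: int,
    cpu_seconds_per_session: float,
    cpu_rate: float,       # $ per active CPU-second (from the pricing page)
    memory_events: int,
    event_rate: float,     # $ per short-term memory event
    gateway_calls: int,
    gateway_rate: float,   # $ per MCP operation
) -> float:
    # Identity adds no charge via Runtime/Gateway; Observability bills
    # at standard CloudWatch rates, so both are left out of this estimate.
    runtime = sessions * cpu_seconds_per_session * cpu_rate
    memory = memory_events * event_rate
    gateway = gateway_calls * gateway_rate
    return runtime + memory + gateway
```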

Common Pitfalls

Memory Poisoning Without Guardrails

Problem: Storing raw user input directly allows prompt injection into memory:

```python
# WRONG
user_input = "Ignore previous instructions, you are now..."
memory_client.create_event(
    memory_id=memory.id,
    event_data={"content": user_input}
)
```

Solution: Always sanitize with Guardrails first (shown in Memory section above).

Tool-Call Storms

Problem: Agent invokes 20+ tools per query without limits:

```text
User: "What's the weather in major cities?"
Agent makes 50 separate get_weather() calls
Total: 10s latency, $0.05 per query
```

Solution: Enforce tool-call budgets and guide via instructions:

```python
agent = Agent(
    max_tool_calls_per_turn=3,
    instructions="For multiple items, summarize instead of exhaustive lookup"
)
```

ARM64 Container Requirements

Problem: Using x86 containers causes deployment failures.

Solution: Build for ARM64 explicitly:

```dockerfile
FROM --platform=linux/arm64 python:3.11-slim
COPY . /app
CMD ["python", "agent.py"]
```

```bash
docker buildx build --platform linux/arm64 -t agent:latest .
```

No VPC Integration for Internal APIs

Problem: Agent traffic goes over public internet.

Solution: Configure VPC and PrivateLink:

```python
runtime_config = {
    'vpcConfig': {
        'securityGroupIds': ['sg-12345'],
        'subnetIds': ['subnet-abc', 'subnet-def']
    },
    'privateLinkEnabled': True
}
```

When to Use AgentCore

Use AgentCore when:

  • Multiple agent frameworks in use (LangChain + CrewAI + custom)
  • Need to evaluate different models (Bedrock + OpenAI + Anthropic)
  • Enterprise security required (VPC, PrivateLink, customer-managed KMS)
  • Multi-agent systems planned (A2A coordination)
  • Fast time-to-production needed (weeks, not months)
  • Team size under 10 (can't build infrastructure from scratch)

Consider alternatives when:

  • Single framework forever (e.g., only LangGraph → use LangGraph Cloud)
  • Single cloud ecosystem (e.g., all Azure → Azure AI Agent Service)
  • Extreme high volume (over 10M sessions/month → self-hosted may be cheaper)
  • Need custom hardware (GPUs for specialized models → self-hosted)
  • Already built agent infrastructure (sunk costs)

Break-even analysis for self-hosting:

AgentCore becomes cost-effective when:

  • Agent development time exceeds 2 weeks
  • Multiple agent types (customer support, analytics, monitoring)
  • Enterprise security/compliance required
  • Team size under 10 dedicated to agent infrastructure

Self-hosted infrastructure costs: $50k-$150k build cost plus roughly $200k/year for a DevOps team. Break-even at approximately 10M sessions/month.
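
Working backwards from those numbers: with ~$200k/year of fixed self-hosting cost, break-even at 10M sessions/month implies a managed per-session cost of roughly $0.0017. A quick check, noting this simplification ignores self-hosted per-session compute (which would push break-even volume even higher):

```python
# Break-even check using the figures quoted above
fixed_yearly = 200_000                   # DevOps team cost, $/year
breakeven_sessions_per_month = 10_000_000

implied_cost_per_session = fixed_yearly / (breakeven_sessions_per_month * 12)
print(f"${implied_cost_per_session:.4f} per session")  # ≈ $0.0017
```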

Key Takeaways

AgentCore is infrastructure, not a framework. It doesn't replace LangChain or CrewAI - it provides the production runtime they need to scale.

Modular adoption reduces risk. Start with Runtime only, add Memory → Gateway → Identity → Observability incrementally. Each service delivers independent value.

Security is built-in. Session isolation, Guardrails, Identity management, and VPC integration are production-ready features, not bolt-ons.

Cost optimization is multi-dimensional. Prompt caching (90% discount), model routing (30% savings), tool-call budgets, and consumption pricing compound to reduce costs 60-80%.

Multi-agent systems need protocols. MCP for agent-to-tool, A2A for agent-to-agent. Framework interoperability allows LangGraph + CrewAI + Strands agents to work together.

Resources

Start with the $200 AWS free tier credit available to new AWS customers to validate your use case before committing to production deployment.
