mech.app

Amazon Bedrock AgentCore Memory: Managed State for Conversational Agents

How AWS handles agent memory persistence, retrieval, and session boundaries through MCP servers, plus cost and latency trade-offs versus self-hosted state.

Source: aws.amazon.com

Most agentic IDEs lose context between sessions. You spend Monday explaining your codebase architecture, Tuesday repeating the same details, and Wednesday wondering why you pay for an agent that forgets everything overnight. Amazon Bedrock AgentCore Memory is AWS’s answer: a fully managed service that persists conversational state across agent sessions, exposed through Model Context Protocol (MCP) servers.

The recent Kiro CLI integration shows the plumbing. Kiro is an agentic IDE that runs in your terminal. With a custom MCP server sitting in front of AgentCore Memory, Kiro can store conversation history, retrieve past context, and monitor memory usage without building its own state layer. This pattern separates agent orchestration from memory persistence, letting you swap backends or scale storage independently.

Architecture: MCP Server as State Boundary

The stack has three layers:

AgentCore Memory (AWS managed service)

  • Stores conversational context with semantic indexing
  • Provides short-term working memory and long-term intelligent memory
  • Handles retrieval via built-in vector search

Custom MCP Server (your code)

  • Exposes memory operations as MCP tools
  • Translates between MCP protocol and AgentCore Memory API
  • Enforces access control and session boundaries

Kiro CLI (agent client)

  • Connects to MCP server via STDIO
  • Calls memory tools during agent execution
  • Receives context from previous sessions

The MCP server acts as a protocol adapter. Kiro sends tool calls over STDIO, the server translates them into AgentCore Memory API requests, and responses flow back through the same pipe. This keeps memory logic out of the agent runtime.
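To make the adapter role concrete, here is a minimal sketch of handling one newline-delimited JSON-RPC 2.0 message, the framing MCP uses over STDIO. The names `handleLine` and `tools` are hypothetical. A real server would typically use an MCP SDK, would also answer `initialize` and `tools/list`, and would call AgentCore Memory inside the tool handlers.

```javascript
// Sketch only: dispatch a single JSON-RPC 2.0 line to a registered tool.
// `tools` is a hypothetical Map of tool name -> async function(arguments).
function rpcError(id, code, message) {
  return JSON.stringify({ jsonrpc: "2.0", id, error: { code, message } });
}

async function handleLine(line, tools) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return rpcError(null, -32700, "Parse error");
  }
  const id = msg.id ?? null;
  if (msg.method !== "tools/call") {
    return rpcError(id, -32601, "Method not found");
  }
  const tool = tools.get(msg.params?.name);
  if (!tool) {
    return rpcError(id, -32602, "Unknown tool");
  }
  try {
    const output = await tool(msg.params.arguments ?? {});
    const content = [{ type: "text", text: JSON.stringify(output) }];
    return JSON.stringify({ jsonrpc: "2.0", id, result: { content } });
  } catch (err) {
    // Tool failures travel inside the result, flagged with isError
    const content = [{ type: "text", text: String(err.message) }];
    return JSON.stringify({ jsonrpc: "2.0", id, result: { content, isError: true } });
  }
}
```

Keeping the dispatcher a pure function of (line, registry) makes it easy to test without spawning a process or touching AWS.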

Memory Operations Exposed as Tools

The MCP server exposes four tool categories:

Store operations

  • store_context: Write new conversational turns
  • update_context: Modify existing memory entries
  • Serialization happens server-side before hitting AgentCore

Retrieval operations

  • retrieve_context: Fetch relevant history by semantic similarity
  • list_sessions: Enumerate past conversations
  • Returns ranked results based on vector distance

Monitoring operations

  • get_memory_usage: Check storage consumption
  • get_quota_limits: Retrieve account-level caps
  • Useful for triggering eviction or alerting

Infrastructure operations

  • create_memory_store: Provision new memory backends
  • delete_memory_store: Clean up unused stores
  • Lets agents manage their own persistence layer

Each tool maps to an AgentCore Memory API call. The server handles authentication, retries, and error translation.
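The retry and error-translation duties might look something like the sketch below. It assumes exponential backoff and a caller-supplied `isRetryable` predicate; both are illustrative choices, since the source does not say which errors the server retries or how.

```javascript
// Sketch: retry an async call with exponential backoff.
// `isRetryable` and `sleep` are hypothetical hooks; `sleep` is injectable for testing.
async function withRetries(fn, options = {}) {
  const {
    attempts = 3,
    baseDelayMs = 100,
    isRetryable = () => true,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
  } = options;
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Stop on the final attempt or on errors that retrying cannot fix
      if (i === attempts - 1 || !isRetryable(err)) break;
      await sleep(baseDelayMs * 2 ** i); // 100ms, 200ms, 400ms, ...
    }
  }
  throw lastError;
}

// Sketch: turn a backend error into the shape of an MCP tool error result.
function toToolError(err) {
  const text = `Memory backend error: ${err.name}: ${err.message}`;
  return { isError: true, content: [{ type: "text", text }] };
}
```

A tool handler would wrap its backend call in `withRetries` and return `toToolError(err)` from its catch block, so the agent sees a readable failure rather than a dropped connection.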

State Persistence and Retrieval Flow

When Kiro stores context:

  1. Agent generates a conversational turn (user message + assistant response)
  2. Kiro calls store_context tool via MCP
  3. MCP server serializes turn into AgentCore Memory format
  4. AgentCore indexes content for semantic search
  5. Server returns confirmation or error to Kiro

When Kiro retrieves context:

  1. Agent starts new session and needs prior context
  2. Kiro calls retrieve_context with query or session ID
  3. MCP server queries AgentCore Memory vector index
  4. AgentCore ranks results by relevance
  5. Server returns top N results to Kiro
  6. Agent uses context to inform next response
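AgentCore performs the ranking step inside the service, so you never write it yourself. Purely to illustrate what "rank by relevance, return top N" means, here is a sketch that assumes cosine similarity over embedding vectors. The metric and the `embedding` field are assumptions; the source does not say which distance AgentCore uses.

```javascript
// Sketch: cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Sketch: score every entry against the query and keep the n best.
// Each entry is assumed to look like { id, embedding }.
function topN(queryVector, entries, n) {
  return entries
    .map((e) => ({ id: e.id, score: cosineSimilarity(queryVector, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, n);
}
```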

The retrieval step is where latency matters. AgentCore Memory runs semantic search on every query, which adds 50-200ms compared to in-memory lookup. For agents that need context on every turn, this compounds.

Cost and Latency Trade-offs

| Approach | Latency (p95) | Cost Model | Operational Overhead | Context Leakage Risk |
|---|---|---|---|---|
| AgentCore Memory | 100-200ms | Per-request + storage | Zero (fully managed) | Low (AWS IAM boundaries) |
| Self-hosted Redis | 5-10ms | Instance cost | Medium (patching, backups) | Medium (network segmentation) |
| In-process state | <1ms | Compute only | Low (no separate service) | High (shared process memory) |
| Postgres with pgvector | 20-50ms | Instance + storage | High (indexing, tuning) | Medium (row-level security) |

AgentCore Memory trades latency for operational simplicity. If your agent makes one memory call per session, the 100ms hit is negligible. If your agent retrieves context on every turn in a 50-turn conversation, you add 5 seconds of cumulative latency at 100ms per call, and 10 seconds at 200ms.
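The cumulative figure is plain multiplication, sketched here so you can plug in your own turn counts. The per-call latencies are the estimates from the table, not measurements.

```javascript
// Sketch: total added latency from sequential memory calls in one conversation.
function cumulativeLatencyMs(turns, callsPerTurn, perCallMs) {
  return turns * callsPerTurn * perCallMs;
}

// 50 turns, one retrieval per turn, at both ends of the estimated p95 range
const atLowEnd = cumulativeLatencyMs(50, 1, 100);  // 5000 ms
const atHighEnd = cumulativeLatencyMs(50, 1, 200); // 10000 ms
```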

Cost scales with request volume and storage size. AWS charges per API call and per GB stored. For low-volume prototypes, this is cheaper than running a dedicated Redis instance. For high-throughput production agents, the per-request cost can exceed self-hosted infrastructure.
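One way to find where that crossover sits is to compute the break-even request volume. The sketch below takes every price as a parameter. The field names are hypothetical, and you would need to fill in real numbers from the current AWS pricing page and your own instance costs.

```javascript
// Sketch: monthly cost of a per-request + storage pricing model.
function managedMonthlyCost({ requests, pricePerRequest, storageGb, pricePerGbMonth }) {
  return requests * pricePerRequest + storageGb * pricePerGbMonth;
}

// Sketch: request volume at which the managed bill equals a fixed self-hosted bill.
// Returns 0 when storage alone already costs more than self-hosting.
function breakEvenRequests({ selfHostedMonthly, pricePerRequest, storageGb, pricePerGbMonth }) {
  const remaining = selfHostedMonthly - storageGb * pricePerGbMonth;
  return remaining <= 0 ? 0 : remaining / pricePerRequest;
}
```

Below the break-even volume the managed service is cheaper on paper. The comparison ignores the engineering time spent operating the self-hosted option, which is the cost the managed service is meant to remove.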

Session Boundaries and Access Control

The MCP server enforces memory isolation. Each Kiro CLI instance connects to the MCP server with credentials tied to a specific user or workspace. The server uses these credentials to scope AgentCore Memory queries.

Isolation mechanisms:

  • Session IDs: Each conversation gets a unique identifier. Retrieval queries filter by session ID to prevent cross-session leakage.
  • IAM policies: AgentCore Memory uses AWS IAM for access control. The MCP server assumes a role with read/write permissions scoped to specific memory stores.
  • Namespace prefixes: Memory stores can be partitioned by tenant or user. The server prefixes all keys with the authenticated identity.
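A rough sketch of the namespace-prefix idea follows. It assumes a `tenant/user/session` key layout and an `identity` object derived from the caller's credentials. Both are illustrative, since the source does not describe the actual key format.

```javascript
// Sketch: build a storage key scoped to the authenticated identity.
// Rejects components containing the separator so a crafted ID cannot escape its namespace.
function scopedKey(identity, sessionId) {
  for (const part of [identity.tenantId, identity.userId, sessionId]) {
    if (typeof part !== "string" || part === "" || part.includes("/")) {
      throw new Error("Invalid namespace component");
    }
  }
  return `${identity.tenantId}/${identity.userId}/${sessionId}`;
}

// Sketch: defensive second check that drops any record outside the caller's prefix.
function filterToCaller(records, identity) {
  const prefix = `${identity.tenantId}/${identity.userId}/`;
  return records.filter((r) => r.key.startsWith(prefix));
}
```

The trailing slash in the prefix matters: without it, user `u1` would also match keys belonging to `u10`.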

If the MCP server crashes or restarts, session state persists in AgentCore Memory. The agent reconnects, retrieves context by session ID, and continues. This differs from in-process state, which evaporates on restart.

Quota Limits and Eviction Behavior

AgentCore Memory enforces account-level quotas:

  • Maximum memory stores per account
  • Maximum storage per memory store
  • Maximum requests per second

When you hit a quota, the service returns a throttling error. The MCP server can catch this and fall back to evicting stored context to free space.

Eviction strategies:

  • Drop oldest context first (FIFO)
  • Drop least-recently-used context (LRU)
  • Drop lowest-relevance context (semantic pruning)

The AWS blog post does not specify whether AgentCore Memory auto-evicts or fails writes. Based on typical AWS service behavior, it likely throttles requests and requires client-side eviction. This means your MCP server needs logic to monitor usage and trigger cleanup before hitting limits.
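If eviction does fall to the client, the three strategies differ only in how entries are ordered before dropping. The sketch below assumes each entry carries `createdAt`, `lastAccessedAt`, `relevance`, and `sizeBytes`. These are hypothetical fields your MCP server would have to track itself.

```javascript
// Sketch: each strategy sorts entries so the first ones are evicted first.
const evictionOrder = {
  fifo: (a, b) => a.createdAt - b.createdAt,           // oldest first
  lru: (a, b) => a.lastAccessedAt - b.lastAccessedAt,  // least recently used first
  relevance: (a, b) => a.relevance - b.relevance       // lowest relevance first
};

// Sketch: pick entry IDs to delete until at least `bytesToFree` is reclaimed.
function selectForEviction(entries, strategy, bytesToFree) {
  const compare = evictionOrder[strategy];
  if (!compare) throw new Error(`Unknown eviction strategy: ${strategy}`);
  const victims = [];
  let freed = 0;
  for (const entry of [...entries].sort(compare)) {
    if (freed >= bytesToFree) break;
    victims.push(entry.id);
    freed += entry.sizeBytes;
  }
  return victims;
}
```

The function only selects candidates. Deleting them would go through whatever delete operation the memory backend exposes.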

Code Snippet: MCP Tool Definition

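Here’s a sketch of how the MCP server might expose a memory tool. The agentCoreClient object and the getMemoryStoreForSession helper are placeholders for code defined elsewhere in the server: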

// MCP server tool definition for storing context
const storeContextTool = {
  name: "store_context",
  description: "Store conversational context in AgentCore Memory",
  inputSchema: {
    type: "object",
    properties: {
      sessionId: { type: "string" },
      userMessage: { type: "string" },
      assistantResponse: { type: "string" },
      metadata: { type: "object" }
    },
    required: ["sessionId", "userMessage", "assistantResponse"]
  }
};

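// Handler that translates an MCP tool call into an AgentCore Memory request.
// agentCoreClient and getMemoryStoreForSession are placeholders defined elsewhere in the server.
async function handleStoreTool(params) {
  // metadata is optional in the schema, so default it to an empty object
  const { sessionId, userMessage, assistantResponse, metadata = {} } = params;

  // One entry per conversational turn, tagged with its session
  const memoryEntry = {
    sessionId,
    timestamp: Date.now(),
    content: `User: ${userMessage}\nAssistant: ${assistantResponse}`,
    metadata
  };

  try {
    const response = await agentCoreClient.putMemory({
      memoryStoreId: getMemoryStoreForSession(sessionId),
      entry: memoryEntry
    });
    return { success: true, entryId: response.entryId };
  } catch (err) {
    // Report the failure to the agent instead of crashing the server
    return { success: false, error: err.message };
  }
}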

The MCP server registers the tool, listens for calls over STDIO, and translates them into AgentCore Memory API requests. Error handling, retries, and credential management happen in the server layer.

Failure Modes

MCP server crashes

  • Agent loses connection to memory backend
  • Retrieval calls fail, agent proceeds without context
  • Mitigation: Run MCP server as supervised process with auto-restart

AgentCore Memory throttling

  • Service returns 429 errors when quota exceeded
  • Agent cannot store new context
  • Mitigation: Implement client-side rate limiting and eviction
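Client-side rate limiting is often done with a token bucket. The sketch below is one such limiter. The capacity and refill rate are values you would tune against your account's actual request quota, which the source does not state.

```javascript
// Sketch: token bucket that allows short bursts up to `capacity`
// and a sustained rate of `refillPerSecond` calls.
class TokenBucket {
  // `now` is injectable so tests can supply a fake clock
  constructor(capacity, refillPerSecond, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true if the call may proceed, false if it should wait or be dropped
  tryRemove() {
    const t = this.now();
    const refill = ((t - this.lastRefill) / 1000) * this.refillPerSecond;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The MCP server would call `tryRemove()` before each AgentCore request and queue or reject the tool call when it returns false, which keeps the throttling decision local instead of waiting for the service to refuse.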

Stale context retrieval

  • Semantic search returns irrelevant results
  • Agent uses outdated or wrong context
  • Mitigation: Add recency weighting to retrieval queries
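Recency weighting can be applied in the MCP server after results come back. The sketch below assumes exponential decay with a configurable half-life, and assumes each result carries a `similarity` score and a `timestamp`. This is one common choice, not something the source prescribes.

```javascript
// Sketch: halve a result's score for every `halfLifeMs` of age.
function recencyWeightedScore(similarity, ageMs, halfLifeMs) {
  return similarity * Math.pow(0.5, ageMs / halfLifeMs);
}

// Sketch: re-rank results so fresh context can outrank older, slightly closer matches.
function rerankByRecency(results, nowMs, halfLifeMs) {
  return results
    .map((r) => ({
      ...r,
      score: recencyWeightedScore(r.similarity, nowMs - r.timestamp, halfLifeMs)
    }))
    .sort((a, b) => b.score - a.score);
}
```

A short half-life favors the current working session. A long one keeps architectural decisions from weeks ago in play.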

Cross-session leakage

  • Bug in session ID filtering exposes other users’ context
  • Privacy and security violation
  • Mitigation: Audit IAM policies and namespace prefixing logic

Network latency spikes

  • AgentCore Memory queries time out
  • Agent blocks waiting for context
  • Mitigation: Set aggressive timeouts and fall back to no-context mode
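The timeout-and-fallback mitigation can be sketched with `Promise.race`. The helper name `withTimeout` and the empty-array fallback are illustrative. Note that the underlying request keeps running after the timeout fires unless the client supports cancellation.

```javascript
// Sketch: resolve with `fallback` if `promise` is too slow or rejects.
async function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Treat backend errors the same as a timeout: proceed without context
    return await Promise.race([promise.catch(() => fallback), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

A retrieval call wrapped as `withTimeout(retrieve(query), 300, [])` would hand the agent an empty context list instead of blocking the turn. Here `retrieve` and the 300ms budget are placeholders.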

When to Use AgentCore Memory

Good fit:

  • Prototyping agentic workflows without infrastructure overhead
  • Low-to-medium request volume (hundreds of calls per minute)
  • Multi-session agents that need persistent context
  • Teams without ops capacity to run Redis or Postgres

Poor fit:

  • High-frequency agents that retrieve context on every turn
  • Latency-sensitive applications (sub-50ms response targets)
  • Cost-sensitive production workloads with high request volume
  • Agents that need complex query patterns beyond semantic search

Technical Verdict

AgentCore Memory is a managed state layer for agents that need conversational persistence without running their own database. The MCP server pattern cleanly separates orchestration from storage, making it easy to swap backends or scale independently.

The trade-off is latency and cost. If your agent makes one memory call per session, the 100ms overhead is fine. If your agent retrieves context on every turn, you will notice the delay and the bill. For production systems with tight latency budgets, self-hosted Redis or in-process state will be faster, and at high request volume likely cheaper as well.

Use AgentCore Memory when you want to ship quickly and can tolerate 100-200ms per memory operation. Avoid it when you need sub-50ms retrieval or when per-request costs exceed the operational cost of running your own state layer.