Amazon Bedrock AgentCore Memory: Managed State for Conversational Agents

Most agentic IDEs lose context between sessions. You spend Monday explaining your codebase architecture, Tuesday repeating the same details, and Wednesday wondering why you pay for an agent that forgets everything overnight. Amazon Bedrock AgentCore Memory is AWS’s answer: a fully managed service that persists conversational state across agent sessions. In the Kiro CLI integration, the agent reaches it through a custom Model Context Protocol (MCP) server.

The recent Kiro CLI integration shows the plumbing. Kiro is an agentic IDE, and Kiro CLI brings its agent to the terminal. Through a custom MCP server, Kiro CLI can store conversation history, retrieve past context, and monitor memory usage without building its own state layer. This pattern separates agent orchestration from memory persistence, so you can swap backends or scale storage independently.

Architecture: MCP Server as State Boundary

The stack has three layers:

AgentCore Memory (AWS managed service)

Stores conversational context with semantic indexing
Provides short-term working memory and long-term intelligent memory
Handles retrieval via built-in vector search

Custom MCP Server (your code)

Exposes memory operations as MCP tools
Translates between MCP protocol and AgentCore Memory API
Enforces access control and session boundaries

Kiro CLI (agent client)

Connects to MCP server via STDIO
Calls memory tools during agent execution
Receives context from previous sessions

The MCP server acts as a protocol adapter. Kiro sends tool calls over STDIO, the server translates them into AgentCore Memory API requests, and responses flow back through the same pipe. This keeps memory logic out of the agent runtime.
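The adapter role fits in a few lines. This sketch assumes MCP's STDIO transport, which exchanges newline-delimited JSON-RPC 2.0 messages. `tools`, `echo_context`, `handleLine`, and `startStdioLoop` are hypothetical names. A production server would normally use an MCP SDK and would also implement `initialize` and `tools/list`, which this sketch omits.

```javascript
// Hypothetical tool registry: tool name -> async handler returning JSON-serializable data.
const tools = {
  echo_context: async (args) => ({ echoed: args.text }),
};

const reply = (id, body) => JSON.stringify({ jsonrpc: "2.0", id, ...body });

// Translate one newline-delimited JSON-RPC 2.0 message into a response line.
// Returns null for notifications (no id), which expect no reply.
async function handleLine(line) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return reply(null, { error: { code: -32700, message: "Parse error" } });
  }
  if (msg === null || typeof msg !== "object") {
    return reply(null, { error: { code: -32600, message: "Invalid request" } });
  }
  if (msg.id === undefined) return null;
  if (msg.method !== "tools/call") {
    return reply(msg.id, { error: { code: -32601, message: `Method not found: ${msg.method}` } });
  }
  const { name, arguments: args = {} } = msg.params ?? {};
  if (!Object.prototype.hasOwnProperty.call(tools, name)) {
    return reply(msg.id, { error: { code: -32602, message: `Unknown tool: ${name}` } });
  }
  try {
    const data = await tools[name](args);
    // MCP tool results carry a content array; plain text is the simplest content type.
    return reply(msg.id, { result: { content: [{ type: "text", text: JSON.stringify(data) }] } });
  } catch (err) {
    // Tool failures are reported in-band (isError) so the agent can see and react to them.
    return reply(msg.id, { result: { content: [{ type: "text", text: String(err.message) }], isError: true } });
  }
}

// Wiring to STDIO: the client spawns this process and talks over stdin/stdout.
function startStdioLoop() {
  const readline = require("node:readline");
  const rl = readline.createInterface({ input: process.stdin });
  rl.on("line", async (line) => {
    const out = await handleLine(line);
    if (out !== null) process.stdout.write(out + "\n");
  });
}
```

Keeping `handleLine` free of I/O means you can test the translation logic without spawning a process.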

Memory Operations Exposed as Tools

The MCP server exposes four tool categories:

Store operations

store_context: Write new conversational turns
update_context: Modify existing memory entries
Serialization happens server-side before hitting AgentCore

Retrieval operations

retrieve_context: Fetch relevant history by semantic similarity
list_sessions: Enumerate past conversations
Returns ranked results based on vector distance

Monitoring operations

get_memory_usage: Check storage consumption
get_quota_limits: Retrieve account-level caps
Useful for triggering eviction or alerting

Infrastructure operations

create_memory_store: Provision new memory backends
delete_memory_store: Clean up unused stores
Lets agents manage their own persistence layer

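Each tool maps to one or more AWS API calls, most of them against AgentCore Memory (account-level quota lookups may instead go through AWS Service Quotas). The server handles authentication, retries, and error translation.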
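Here is a minimal sketch of the error-translation step, which maps AWS error names to messages an agent can act on. It assumes errors carry the exception name on `err.name`, as the AWS SDK for JavaScript v3 reports service exceptions, and that AgentCore Memory uses the standard AWS exception names listed here. `ERROR_MAP` and `translateError` are hypothetical.

```javascript
// Hypothetical mapping from AWS exception names to agent-facing results.
// retryable tells the caller whether backing off and retrying makes sense.
const ERROR_MAP = {
  ThrottlingException: { retryable: true, message: "Memory service is throttling requests; retry later." },
  ServiceQuotaExceededException: { retryable: false, message: "Memory quota reached; evict old context before storing more." },
  ValidationException: { retryable: false, message: "Invalid memory request; check the tool arguments." },
  ResourceNotFoundException: { retryable: false, message: "Memory store or session not found." },
  AccessDeniedException: { retryable: false, message: "Not authorized for this memory store." },
};

function translateError(err) {
  const name = err && err.name;
  if (Object.prototype.hasOwnProperty.call(ERROR_MAP, name)) {
    return { code: name, ...ERROR_MAP[name] };
  }
  // Unknown failures: assume they may be transient, and keep the raw message for logs.
  return { code: "InternalError", retryable: true, message: `Unexpected memory error: ${err && err.message}` };
}
```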

State Persistence and Retrieval Flow

When Kiro stores context:

Agent generates a conversational turn (user message + assistant response)
Kiro calls store_context tool via MCP
MCP server serializes turn into AgentCore Memory format
AgentCore indexes content for semantic search
Server returns confirmation or error to Kiro
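The serialization step can be sketched as a pure function that validates a turn and shapes the request body. The field names assume AgentCore Memory's CreateEvent request (`memoryId`, `actorId`, `sessionId`, `eventTimestamp`, and a `payload` of role-tagged conversational messages). Treat that shape as an assumption to check against the current API reference. `serializeTurn` is a hypothetical helper.

```javascript
// Hypothetical helper: turn one user/assistant exchange into a request body.
// The request shape is an assumption modeled on AgentCore Memory's CreateEvent.
function serializeTurn({ memoryId, actorId, sessionId, userMessage, assistantResponse, now = new Date() }) {
  // Reject empty or non-string fields before making a network call.
  for (const [field, value] of Object.entries({ memoryId, actorId, sessionId, userMessage, assistantResponse })) {
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`serializeTurn: ${field} must be a non-empty string`);
    }
  }
  return {
    memoryId,
    actorId,   // the user or workspace the memory belongs to
    sessionId, // the conversation this turn belongs to
    eventTimestamp: now,
    // Keep roles separate instead of concatenating, so retrieval can tell who said what.
    payload: [
      { conversational: { role: "USER", content: { text: userMessage } } },
      { conversational: { role: "ASSISTANT", content: { text: assistantResponse } } },
    ],
  };
}
```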

When Kiro retrieves context:

Agent starts new session and needs prior context
Kiro calls retrieve_context with query or session ID
MCP server queries AgentCore Memory vector index
AgentCore ranks results by relevance
Server returns top N results to Kiro
Agent uses context to inform next response
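AgentCore performs the ranking itself, but an in-memory stand-in shows the idea behind the query, rank, and return steps: score each stored entry by cosine similarity to the query embedding and keep the top N. This assumes embeddings are already computed. `cosine` and `topN` are illustrative names, not AgentCore APIs.

```javascript
// Cosine similarity between two equal-length numeric vectors (0 if either is all zeros).
function cosine(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every entry against the query, sort by descending similarity, keep the best n.
function topN(queryVec, entries, n) {
  return entries
    .map((e) => ({ ...e, score: cosine(queryVec, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, n);
}
```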

The retrieval step is where latency matters. AgentCore Memory runs a semantic search on every query, which adds roughly 50-200ms per call compared with an in-memory lookup (rough estimates, not measured service figures). For agents that need context on every turn, this cost compounds.

Cost and Latency Trade-offs

Approach	Latency (p95, estimated)	Cost Model	Operational Overhead	Context Leakage Risk
AgentCore Memory	100-200ms	Per-request + storage	Zero (fully managed)	Low (AWS IAM boundaries)
Self-hosted Redis	5-10ms	Instance cost	Medium (patching, backups)	Medium (network segmentation)
In-process state	<1ms	Compute only	Low (no separate service)	High (shared process memory)
Postgres with pgvector	20-50ms	Instance + storage	High (indexing, tuning)	Medium (row-level security)

AgentCore Memory trades latency for operational simplicity. If your agent makes one memory call per session, the 100-200ms hit is negligible. If your agent retrieves context on every turn of a 50-turn conversation, you add 5-10 seconds of cumulative latency (50 calls at 100-200ms each).
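The cumulative cost is simple arithmetic. Per-turn retrieval scales with conversation length, while retrieving once at session start and caching locally costs a single call. The per-call figures are the rough estimates used in this section, not measurements, and `addedLatencyMs` is a hypothetical helper.

```javascript
// Memory latency added to one conversation under two retrieval strategies.
//   "every-turn":       one retrieval per turn
//   "once-per-session": one retrieval at session start, then reuse a local copy
function addedLatencyMs({ turns, perCallMs, strategy }) {
  if (strategy === "every-turn") return turns * perCallMs;
  if (strategy === "once-per-session") return perCallMs;
  throw new Error(`unknown strategy: ${strategy}`);
}

// 50 turns at the low and high ends of the estimated range, versus a single call.
console.log(addedLatencyMs({ turns: 50, perCallMs: 100, strategy: "every-turn" }));       // 5000
console.log(addedLatencyMs({ turns: 50, perCallMs: 200, strategy: "every-turn" }));       // 10000
console.log(addedLatencyMs({ turns: 50, perCallMs: 100, strategy: "once-per-session" })); // 100
```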

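Cost scales with request volume and the amount of stored memory. AWS bills AgentCore Memory by usage (events written, memory records stored, and retrieval calls) rather than by provisioned capacity; check the current pricing page for rates. For low-volume prototypes, this is usually cheaper than running a dedicated Redis instance. For high-throughput production agents, the per-request charges can exceed the cost of self-hosted infrastructure.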
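A break-even calculation makes the crossover concrete: below some monthly request volume, pay-per-request memory costs less than a fixed-cost instance. Every price here is a placeholder parameter, not an AWS list price, so substitute current rates from the relevant pricing pages. The function names are hypothetical.

```javascript
// Monthly cost of a pay-per-request memory service (placeholder pricing model).
function monthlyManagedCost({ requests, pricePerThousandRequests, storageCost }) {
  return (requests / 1000) * pricePerThousandRequests + storageCost;
}

// Monthly request volume at which the managed service costs as much as a fixed instance.
function breakEvenRequests({ pricePerThousandRequests, managedStorageCost, fixedMonthlyCost }) {
  const headroom = fixedMonthlyCost - managedStorageCost;
  // If managed storage alone already costs more, the fixed option wins at any volume.
  if (headroom <= 0) return 0;
  return (headroom / pricePerThousandRequests) * 1000;
}
```

For example, with placeholder figures of $0.50 per 1,000 requests, $10/month of managed storage, and a $110/month instance, break-even lands at 200,000 requests per month. Above that volume, the fixed instance is cheaper on this simple model, which ignores operational labor.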

Session Boundaries and Access Control

The MCP server enforces memory isolation. Kiro CLI launches the MCP server as a local STDIO process, and the server runs with AWS credentials tied to a specific user or workspace. The server uses these credentials to scope AgentCore Memory queries.

Isolation mechanisms:

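Session IDs: Each conversation gets a unique identifier. Queries scoped to the current conversation filter by session ID; queries across a user's past sessions must still filter by user, or one user's context can leak into another's results.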
IAM policies: AgentCore Memory uses AWS IAM for access control. The MCP server assumes a role with read/write permissions scoped to specific memory stores.
Namespace prefixes: Memory stores can be partitioned by tenant or user. The server prefixes all keys with the authenticated identity.
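Namespace prefixing can be sketched with two hypothetical helpers. The first derives a per-user prefix from the authenticated identity. The second drops any record outside that prefix before results reach the agent, as a second check on top of IAM. The `/users/<id>/` layout and the `namespace` field on records are assumptions for illustration.

```javascript
// Hypothetical: derive a per-user namespace prefix from the authenticated identity.
// Only letters, digits, underscore, and hyphen are allowed, so "/" and "." cannot sneak in.
function namespaceFor(identity) {
  if (typeof identity !== "string" || !/^[A-Za-z0-9_-]+$/.test(identity)) {
    throw new Error("identity must be a non-empty string of letters, digits, _ or -");
  }
  return `/users/${identity}/`;
}

// Keep only records whose namespace sits under the caller's prefix.
function scopeResults(identity, records) {
  const prefix = namespaceFor(identity);
  return records.filter((r) => typeof r.namespace === "string" && r.namespace.startsWith(prefix));
}
```

The trailing slash matters: without it, the prefix for `alice` would also match `alice2`. Rejecting `/` and `.` in identities blocks path-style tricks such as `../bob`.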

If the MCP server crashes or restarts, session state persists in AgentCore Memory. The agent reconnects, retrieves context by session ID, and continues. This differs from in-process state, which evaporates on restart.

Quota Limits and Eviction Behavior

AgentCore Memory enforces account-level quotas:

Maximum memory stores per account
Maximum storage per memory store
Maximum requests per second

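Exceeding the request-rate limit returns a throttling error; exceeding a resource or storage limit typically returns a quota-exceeded error. In either case, the MCP server can catch the error and implement fallback logic: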

Eviction strategies:

Drop oldest context first (FIFO)
Drop least-recently-used context (LRU)
Drop lowest-relevance context (semantic pruning)

The blog post does not specify whether AgentCore Memory auto-evicts or fails writes. Based on typical AWS service behavior, it likely throttles requests and requires client-side eviction. This means your MCP server needs logic to monitor usage and trigger cleanup before hitting limits.
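The three eviction strategies differ only in sort order. This minimal sketch assumes the MCP server does its own bookkeeping of each entry's size, creation time, last-access time, and a relevance score. These fields are hypothetical, not AgentCore Memory attributes. `selectForEviction` returns the IDs to delete until usage fits a budget.

```javascript
// Sort orders for each strategy: the first entries in the sorted list are evicted first.
const ORDER = {
  fifo: (a, b) => a.createdAt - b.createdAt,          // oldest first
  lru: (a, b) => a.lastAccessedAt - b.lastAccessedAt, // least recently used first
  semantic: (a, b) => a.relevance - b.relevance,      // lowest relevance first
};

// entries: [{ id, sizeBytes, createdAt, lastAccessedAt, relevance }] (hypothetical bookkeeping)
function selectForEviction(entries, budgetBytes, strategy) {
  const order = ORDER[strategy];
  if (!order) throw new Error(`unknown strategy: ${strategy}`);
  let total = entries.reduce((sum, e) => sum + e.sizeBytes, 0);
  const victims = [];
  // Sort a copy so the caller's array is untouched, then evict until under budget.
  for (const e of [...entries].sort(order)) {
    if (total <= budgetBytes) break;
    victims.push(e.id);
    total -= e.sizeBytes;
  }
  return victims;
}
```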

Code Snippet: MCP Tool Definition

Here’s a sketch of how the MCP server might expose a memory tool:

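// MCP server tool definition for storing context.
// The shape follows MCP's tool format: name, description, and a JSON Schema inputSchema.
const storeContextTool = {
  name: "store_context",
  description: "Store one conversational turn in AgentCore Memory",
  inputSchema: {
    type: "object",
    properties: {
      sessionId: { type: "string" },
      userMessage: { type: "string" },
      assistantResponse: { type: "string" },
      metadata: { type: "object" }
    },
    required: ["sessionId", "userMessage", "assistantResponse"]
  }
};

// Handler that translates an MCP tool call into an AgentCore Memory request.
// agentCoreClient.putMemory and getMemoryStoreForSession are illustrative placeholders
// passed in by the caller, not real SDK calls; a real server wraps the AgentCore API here.
async function handleStoreTool(params, { agentCoreClient, getMemoryStoreForSession }) {
  const { sessionId, userMessage, assistantResponse, metadata = {} } = params;

  // Clients should honor inputSchema, but validate required fields anyway.
  for (const field of storeContextTool.inputSchema.required) {
    if (typeof params[field] !== "string" || params[field].length === 0) {
      return { success: false, error: `${field} must be a non-empty string` };
    }
  }

  const memoryEntry = {
    sessionId,
    timestamp: Date.now(),
    content: `User: ${userMessage}\nAssistant: ${assistantResponse}`,
    metadata
  };

  try {
    const response = await agentCoreClient.putMemory({
      memoryStoreId: getMemoryStoreForSession(sessionId),
      entry: memoryEntry
    });
    return { success: true, entryId: response.entryId };
  } catch (err) {
    // Report the failure to the agent instead of crashing the server process.
    return { success: false, error: err.message };
  }
}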

The MCP server registers the tool, listens for calls over STDIO, and translates them into AgentCore Memory API requests. Error handling, retries, and credential management happen in the server layer.
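For retries, a common approach is exponential backoff with full jitter that retries only errors that look transient, such as throttling. This is a generic sketch, not AgentCore-specific guidance, and `withRetries`, `backoffDelayMs`, and the default limits are illustrative. The AWS SDKs already retry throttled calls by default, so an explicit wrapper like this mainly matters for custom policies or non-SDK calls.

```javascript
// Full jitter: wait a random time in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt, baseMs = 100, capMs = 5000, random = Math.random) {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Call fn, retrying transient failures up to maxAttempts total attempts.
async function withRetries(fn, { maxAttempts = 4, isRetryable = (e) => Boolean(e) && e.name === "ThrottlingException" } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```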

Failure Modes

MCP server crashes

Agent loses connection to memory backend
Retrieval calls fail, agent proceeds without context
Mitigation: Run MCP server as supervised process with auto-restart

AgentCore Memory throttling

Service returns throttling (HTTP 429) or quota-exceeded errors when limits are hit
Agent cannot store new context
Mitigation: Implement client-side rate limiting and eviction

Stale context retrieval

Semantic search returns irrelevant results
Agent uses outdated or wrong context
Mitigation: Add recency weighting to retrieval queries

Cross-session leakage

Bug in session ID filtering exposes other users’ context
Privacy and security violation
Mitigation: Audit IAM policies and namespace prefixing logic

Network latency spikes

AgentCore Memory queries timeout
Agent blocks waiting for context
Mitigation: Set aggressive timeouts and fall back to no-context mode
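The timeout-and-fallback mitigation can be sketched by racing the retrieval against a timer and resolving to `null` (no context) on timeout or error. `retrieveWithFallback` is a hypothetical helper. It does not cancel the underlying request; if your client accepts an `AbortSignal`, aborting on timeout would also free the connection.

```javascript
// Resolve to the retrieval result, or to null if it fails or exceeds timeoutMs.
// null means "proceed without prior context" rather than blocking the agent.
function retrieveWithFallback(retrieve, timeoutMs) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  const attempt = Promise.resolve()
    .then(retrieve)
    .catch(() => null); // treat errors like timeouts
  // Whichever settles first wins; always clear the timer so it cannot keep the process alive.
  return Promise.race([attempt, timeout]).finally(() => clearTimeout(timer));
}
```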

When to Use AgentCore Memory

Good fit:

Prototyping agentic workflows without infrastructure overhead
Low-to-medium request volume (hundreds of calls per minute)
Multi-session agents that need persistent context
Teams without ops capacity to run Redis or Postgres

Poor fit:

High-frequency agents that retrieve context on every turn
Latency-sensitive applications (sub-50ms response targets)
Cost-sensitive production workloads with high request volume
Agents that need complex query patterns beyond semantic search

Technical Verdict

AgentCore Memory is a managed state layer for agents that need conversational persistence without running their own database. The MCP server pattern cleanly separates orchestration from storage, making it easy to swap backends or scale independently.

The trade-off is latency and cost. If your agent makes one memory call per session, the 100ms overhead is fine. If your agent retrieves context on every turn, you will notice both the delay and the bill. For production systems with tight latency budgets, self-hosted Redis or in-process state will be faster and, at high request volume, often cheaper.

Use AgentCore Memory when you want to ship quickly and can tolerate 100-200ms per memory operation. Avoid it when you need sub-50ms retrieval or when per-request costs exceed the operational cost of running your own state layer.

Source Links

AWS Blog: Extending conversational memory in Kiro CLI using Amazon Bedrock AgentCore Memory