Most agentic IDEs lose context between sessions. You spend Monday explaining your codebase architecture, Tuesday repeating the same details, and Wednesday wondering why you pay for an agent that forgets everything overnight. Amazon Bedrock AgentCore Memory is AWS’s answer: a fully managed service that persists conversational state across agent sessions, exposed through Model Context Protocol (MCP) servers.
The recent Kiro CLI integration shows the plumbing. Kiro CLI brings Kiro, AWS’s agentic IDE, to the terminal. With a custom MCP server in front of AgentCore Memory, Kiro can store conversation history, retrieve past context, and monitor memory usage without building its own state layer. This pattern separates agent orchestration from memory persistence, letting you swap backends or scale storage independently.
Architecture: MCP Server as State Boundary
The stack has three layers:
AgentCore Memory (AWS managed service)
- Stores conversational context with semantic indexing
- Provides short-term working memory and long-term intelligent memory
- Handles retrieval via built-in vector search
Custom MCP Server (your code)
- Exposes memory operations as MCP tools
- Translates between MCP protocol and AgentCore Memory API
- Enforces access control and session boundaries
Kiro CLI (agent client)
- Connects to MCP server via STDIO
- Calls memory tools during agent execution
- Receives context from previous sessions
The MCP server acts as a protocol adapter. Kiro sends tool calls over STDIO, the server translates them into AgentCore Memory API requests, and responses flow back through the same pipe. This keeps memory logic out of the agent runtime.
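To make the adapter role concrete, here is a minimal sketch of the dispatch step, assuming the newline-delimited JSON-RPC 2.0 framing that MCP uses over STDIO. The `handleLine` function and the in-memory `toolHandlers` map are hypothetical names for illustration; a real server would also answer `initialize` and `tools/list`, and would most likely be built on an MCP SDK rather than hand-rolled.

```javascript
// Hypothetical tool handlers. Real ones would call AgentCore Memory and be async.
const toolHandlers = new Map([
  ["store_context", (args) => `stored turn for session ${args.sessionId}`],
]);

// Build a JSON-RPC 2.0 error response.
function rpcError(id, code, message) {
  return JSON.stringify({ jsonrpc: "2.0", id, error: { code, message } });
}

// Turn one line read from stdin into the line to write to stdout.
function handleLine(line, handlers = toolHandlers) {
  let request;
  try {
    request = JSON.parse(line);
  } catch {
    return rpcError(null, -32700, "Parse error");
  }
  if (request === null || typeof request !== "object") {
    return rpcError(null, -32600, "Invalid Request");
  }

  const { id = null, method, params = {} } = request;
  if (method !== "tools/call") {
    return rpcError(id, -32601, `Method not found: ${method}`);
  }

  const handler = handlers.get(params.name);
  if (!handler) {
    return rpcError(id, -32602, `Unknown tool: ${params.name}`);
  }

  try {
    const text = handler(params.arguments ?? {});
    return JSON.stringify({
      jsonrpc: "2.0",
      id,
      result: { content: [{ type: "text", text }] },
    });
  } catch (err) {
    // Tool failures travel inside the result so the agent can react to them.
    return JSON.stringify({
      jsonrpc: "2.0",
      id,
      result: { content: [{ type: "text", text: String(err) }], isError: true },
    });
  }
}
```

Wiring this up means reading stdin line by line and writing each returned string, followed by a newline, to stdout. Keeping the dispatch a pure function makes it easy to test without a running agent.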
Memory Operations Exposed as Tools
The MCP server exposes four tool categories:
Store operations
- store_context: Write new conversational turns
- update_context: Modify existing memory entries
- Serialization happens server-side before hitting AgentCore
Retrieval operations
- retrieve_context: Fetch relevant history by semantic similarity
- list_sessions: Enumerate past conversations
- Returns ranked results based on vector distance
Monitoring operations
- get_memory_usage: Check storage consumption
- get_quota_limits: Retrieve account-level caps
- Useful for triggering eviction or alerting
Infrastructure operations
- create_memory_store: Provision new memory backends
- delete_memory_store: Clean up unused stores
- Lets agents manage their own persistence layer
Each tool maps to an AgentCore Memory API call. The server handles authentication, retries, and error translation.
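The retry and error-translation duties might look like the sketch below. It assumes throttling surfaces as an error named `ThrottlingException` or carrying an HTTP 429 status in `$metadata`, which is the general shape of AWS SDK for JavaScript v3 errors. `withRetry`, `isThrottle`, and `toToolError` are hypothetical helper names, not part of any SDK.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: throttling is identifiable by error name or HTTP status.
function isThrottle(err) {
  return err?.name === "ThrottlingException" || err?.$metadata?.httpStatusCode === 429;
}

// Retry a throttled operation with exponential backoff; other errors fail fast.
async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 100, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (!isThrottle(err) || attempt === maxAttempts) break;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100ms, 200ms, 400ms, ...
    }
  }
  throw lastError;
}

// Translate a backend failure into an MCP tool result the agent can read.
function toToolError(err) {
  const text = isThrottle(err)
    ? "Memory service is throttling requests; try again later."
    : `Memory operation failed: ${err.message}`;
  return { content: [{ type: "text", text }], isError: true };
}
```

Passing `sleep` as an option keeps the backoff testable without real delays. Adding jitter to the delay is a common refinement when many clients retry at once.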
State Persistence and Retrieval Flow
When Kiro stores context:
- Agent generates a conversational turn (user message + assistant response)
- Kiro calls the store_context tool via MCP
- MCP server serializes turn into AgentCore Memory format
- AgentCore indexes content for semantic search
- Server returns confirmation or error to Kiro
When Kiro retrieves context:
- Agent starts new session and needs prior context
- Kiro calls retrieve_context with query or session ID
- MCP server queries AgentCore Memory vector index
- AgentCore ranks results by relevance
- Server returns top N results to Kiro
- Agent uses context to inform next response
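The last two server-side steps of retrieval can be sketched as a pure function. The record shape (`sessionId`, `score`, `content`) is an assumption for illustration, not the AgentCore Memory response format, and the sort is a defensive step in case results do not arrive already ranked.

```javascript
// Keep the top N records for a session, highest relevance score first.
function selectTopContext(records, { sessionId, topN = 3 } = {}) {
  return records
    .filter((r) => sessionId === undefined || r.sessionId === sessionId)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so sorting is safe
    .slice(0, topN);
}

// Join the selected records into a single block the agent can prepend to its prompt.
function formatContext(records) {
  return records.map((r) => r.content).join("\n---\n");
}
```

For example, `formatContext(selectTopContext(records, { sessionId: "s1", topN: 2 }))` yields the two most relevant entries for session `s1`, separated by a divider line.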
The retrieval step is where latency matters. AgentCore Memory runs semantic search on every query, which adds 50-200ms compared to in-memory lookup. For agents that need context on every turn, this compounds.
Cost and Latency Trade-offs
| Approach | Latency (p95) | Cost Model | Operational Overhead | Context Leakage Risk |
|---|---|---|---|---|
| AgentCore Memory | 100-200ms | Per-request + storage | Zero (fully managed) | Low (AWS IAM boundaries) |
| Self-hosted Redis | 5-10ms | Instance cost | Medium (patching, backups) | Medium (network segmentation) |
| In-process state | <1ms | Compute only | Low (no separate service) | High (shared process memory) |
| Postgres with pgvector | 20-50ms | Instance + storage | High (indexing, tuning) | Medium (row-level security) |
AgentCore Memory trades latency for operational simplicity. If your agent makes one memory call per session, the 100ms hit is negligible. If your agent retrieves context on every turn in a 50-turn conversation, you add 5 to 10 seconds of cumulative latency (50 calls at 100-200ms each).
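The cumulative figure is simple multiplication, which a small helper makes easy to rerun with your own numbers. The function name and the `callsPerTurn` parameter are illustrative additions.

```javascript
// Total time spent waiting on memory calls over a conversation, in seconds.
function cumulativeLatencySeconds(turns, perCallMs, callsPerTurn = 1) {
  return (turns * callsPerTurn * perCallMs) / 1000;
}

// Using the p95 range from the table above:
const bestCase = cumulativeLatencySeconds(50, 100);  // 5 seconds
const worstCase = cumulativeLatencySeconds(50, 200); // 10 seconds
```

An agent that both stores and retrieves on each turn would pass `callsPerTurn = 2`, doubling both figures.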
Cost scales with request volume and storage size. AWS charges per API call and per GB stored. For low-volume prototypes, this is cheaper than running a dedicated Redis instance. For high-throughput production agents, the per-request cost can exceed self-hosted infrastructure.
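A rough way to find the crossover is to divide a fixed monthly infrastructure cost by the per-request price. The sketch below ignores storage charges and engineering time, and the function name is hypothetical. No prices are built in, so supply current figures from the AWS pricing page and your own instance costs.

```javascript
// Monthly request volume at which per-request billing matches a fixed monthly cost.
// Ignores storage charges and operational labor.
function breakEvenRequestsPerMonth(fixedMonthlyCost, perRequestCost) {
  if (!(perRequestCost > 0)) {
    throw new RangeError("perRequestCost must be a positive number");
  }
  return Math.ceil(fixedMonthlyCost / perRequestCost);
}
```

Below the returned volume the managed service is cheaper on request charges alone. Above it, self-hosting starts to win, provided you account for the operational overhead listed in the table.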
Session Boundaries and Access Control
The MCP server enforces memory isolation. Each Kiro CLI instance connects to the MCP server with credentials tied to a specific user or workspace. The server uses these credentials to scope AgentCore Memory queries.
Isolation mechanisms:
- Session IDs: Each conversation gets a unique identifier. Retrieval queries filter by session ID to prevent cross-session leakage.
- IAM policies: AgentCore Memory uses AWS IAM for access control. The MCP server assumes a role with read/write permissions scoped to specific memory stores.
- Namespace prefixes: Memory stores can be partitioned by tenant or user. The server prefixes all keys with the authenticated identity.
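Namespace prefixing and session filtering can be sketched as two small helpers. The key layout (`identity/sessionId/entryId`) and the function names are assumptions for illustration. The source does not describe the actual key format.

```javascript
const SEPARATOR = "/";

// Build a storage key scoped to the authenticated identity and session.
// Parts containing the separator are rejected so one tenant cannot forge another's prefix.
function scopedKey(identity, sessionId, entryId) {
  for (const part of [identity, sessionId, entryId]) {
    if (typeof part !== "string" || part === "" || part.includes(SEPARATOR)) {
      throw new Error(`Invalid key part: ${JSON.stringify(part)}`);
    }
  }
  return [identity, sessionId, entryId].join(SEPARATOR);
}

// Check that a key falls inside the caller's identity and session before returning it.
function belongsTo(key, identity, sessionId) {
  return key.startsWith(`${identity}${SEPARATOR}${sessionId}${SEPARATOR}`);
}
```

Including the trailing separator in the prefix check matters: without it, a key owned by `alice2` would pass a check for `alice`.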
If the MCP server crashes or restarts, session state persists in AgentCore Memory. The agent reconnects, retrieves context by session ID, and continues. This differs from in-process state, which evaporates on restart.
Quota Limits and Eviction Behavior
AgentCore Memory enforces account-level quotas:
- Maximum memory stores per account
- Maximum storage per memory store
- Maximum requests per second
When you hit a quota, the service returns a throttling error. The MCP server can catch this and implement fallback logic, such as backing off on rate limits or evicting stored context when storage is full.
Eviction strategies:
- Drop oldest context first (FIFO)
- Drop least-recently-used context (LRU)
- Drop lowest-relevance context (semantic pruning)
The blog post on the Kiro integration does not specify whether AgentCore Memory auto-evicts or fails writes when a store is full. Based on typical AWS service behavior, it likely throttles requests and requires client-side eviction. This means your MCP server needs logic to monitor usage and trigger cleanup before hitting limits.
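If eviction does fall to the client, the three strategies above differ only in sort order. This is a minimal sketch that assumes each entry carries `id`, `sizeBytes`, `createdAt`, `lastAccessedAt`, and `relevance` fields. Those field names, and `selectForEviction` itself, are hypothetical.

```javascript
// Each strategy sorts entries so the first ones are evicted first.
const evictionOrder = new Map([
  ["fifo", (a, b) => a.createdAt - b.createdAt],           // oldest first
  ["lru", (a, b) => a.lastAccessedAt - b.lastAccessedAt],  // least recently used first
  ["relevance", (a, b) => a.relevance - b.relevance],      // lowest relevance first
]);

// Return the ids to delete, in order, until at least bytesToFree is reclaimed.
function selectForEviction(entries, strategy, bytesToFree) {
  const compare = evictionOrder.get(strategy);
  if (!compare) throw new Error(`Unknown eviction strategy: ${strategy}`);

  const victims = [];
  let freed = 0;
  for (const entry of [...entries].sort(compare)) {
    if (freed >= bytesToFree) break;
    victims.push(entry.id);
    freed += entry.sizeBytes;
  }
  return victims;
}
```

The server would call this when a usage check crosses a threshold, then issue deletes for the returned ids.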
Code Snippet: MCP Tool Definition
Here’s how the MCP server exposes a memory tool:
// MCP server tool definition for storing context
const storeContextTool = {
  name: "store_context",
  description: "Store conversational context in AgentCore Memory",
  inputSchema: {
    type: "object",
    properties: {
      sessionId: { type: "string" },
      userMessage: { type: "string" },
      assistantResponse: { type: "string" },
      metadata: { type: "object" }
    },
    required: ["sessionId", "userMessage", "assistantResponse"]
  }
};

// Handler that translates an MCP call into an AgentCore API request.
// agentCoreClient, putMemory, and getMemoryStoreForSession are illustrative
// placeholders defined elsewhere in the server; map them onto the SDK
// operations you actually use.
async function handleStoreTool(params) {
  // metadata is optional in the schema, so default it to an empty object
  const { sessionId, userMessage, assistantResponse, metadata = {} } = params;

  const memoryEntry = {
    sessionId,
    timestamp: Date.now(),
    content: `User: ${userMessage}\nAssistant: ${assistantResponse}`,
    metadata
  };

  const response = await agentCoreClient.putMemory({
    memoryStoreId: getMemoryStoreForSession(sessionId),
    entry: memoryEntry
  });

  return { success: true, entryId: response.entryId };
}
The MCP server registers the tool, listens for calls over STDIO, and translates them into AgentCore Memory API requests. Error handling, retries, and credential management happen in the server layer.
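Before a call reaches the backend, the server can check the arguments against the tool's input schema. The sketch below hand-rolls a check for required fields and primitive types, using a local copy of the schema from the tool definition above. `validateArgs` is a hypothetical helper, and a production server would more likely use a full JSON Schema validator.

```javascript
// Local copy of the store_context input schema.
const storeContextSchema = {
  type: "object",
  properties: {
    sessionId: { type: "string" },
    userMessage: { type: "string" },
    assistantResponse: { type: "string" },
    metadata: { type: "object" },
  },
  required: ["sessionId", "userMessage", "assistantResponse"],
};

// Return a list of human-readable problems; an empty list means the arguments pass.
function validateArgs(schema, args) {
  const problems = [];
  for (const name of schema.required ?? []) {
    if (args[name] === undefined) problems.push(`missing required field: ${name}`);
  }
  for (const [name, value] of Object.entries(args)) {
    const expected = schema.properties?.[name]?.type;
    if (expected === undefined || value === undefined) continue; // unknown or absent fields are skipped
    const actual = value === null ? "null" : Array.isArray(value) ? "array" : typeof value;
    if (actual !== expected) problems.push(`${name} should be ${expected}, got ${actual}`);
  }
  return problems;
}
```

A handler would return the problems as a tool error when the list is non-empty, so a malformed call never costs a billed request.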
Failure Modes
MCP server crashes
- Agent loses connection to memory backend
- Retrieval calls fail, agent proceeds without context
- Mitigation: Run MCP server as supervised process with auto-restart
AgentCore Memory throttling
- Service returns 429 errors when quota exceeded
- Agent cannot store new context
- Mitigation: Implement client-side rate limiting and eviction
Stale context retrieval
- Semantic search returns irrelevant results
- Agent uses outdated or wrong context
- Mitigation: Add recency weighting to retrieval queries
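One way to apply recency weighting is to multiply each similarity score by an exponential decay on the entry's age, then re-rank in the MCP server. This is an illustrative technique, not something the source attributes to AgentCore Memory. The `similarity` and `timestamp` fields and the half-life parameter are assumptions.

```javascript
// Halve an entry's score for every halfLifeMs of age.
function recencyWeightedScore(similarity, ageMs, halfLifeMs) {
  return similarity * Math.pow(0.5, Math.max(0, ageMs) / halfLifeMs);
}

// Re-rank results so that fresh, moderately similar entries can beat stale, highly similar ones.
function rerankByRecency(results, nowMs, halfLifeMs) {
  return results
    .map((r) => ({
      ...r,
      weighted: recencyWeightedScore(r.similarity, nowMs - r.timestamp, halfLifeMs),
    }))
    .sort((a, b) => b.weighted - a.weighted);
}
```

A short half-life suits fast-moving codebases where last week's context is already suspect. A long one keeps architectural decisions retrievable for months.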
Cross-session leakage
- Bug in session ID filtering exposes other users’ context
- Privacy and security violation
- Mitigation: Audit IAM policies and namespace prefixing logic
Network latency spikes
- AgentCore Memory queries time out
- Agent blocks waiting for context
- Mitigation: Set aggressive timeouts and fall back to no-context mode
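The timeout-and-fallback mitigation can be sketched with `Promise.race`. The names `withTimeout` and `retrieveOrEmpty` and the 300ms default are illustrative choices. Note that this only stops waiting; it does not cancel the underlying request, which would need an abort signal passed to the client.

```javascript
// Resolve with `fallback` if `promise` has not settled within timeoutMs.
function withTimeout(promise, timeoutMs, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fetch context, degrading to "no context" on timeout or error.
async function retrieveOrEmpty(retrieve, timeoutMs = 300) {
  try {
    return await withTimeout(retrieve(), timeoutMs, []);
  } catch {
    return []; // a failed lookup should not block the agent's response
  }
}
```

The agent then treats an empty array as a cold start, which is no worse than running without a memory backend at all.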
When to Use AgentCore Memory
Good fit:
- Prototyping agentic workflows without infrastructure overhead
- Low-to-medium request volume (hundreds of calls per minute)
- Multi-session agents that need persistent context
- Teams without ops capacity to run Redis or Postgres
Poor fit:
- High-frequency agents that retrieve context on every turn
- Latency-sensitive applications (sub-50ms response targets)
- Cost-sensitive production workloads with high request volume
- Agents that need complex query patterns beyond semantic search
Technical Verdict
AgentCore Memory is a managed state layer for agents that need conversational persistence without running their own database. The MCP server pattern cleanly separates orchestration from storage, making it easy to swap backends or scale independently.
The trade-off is latency and cost. If your agent makes one memory call per session, the 100ms overhead is fine. If your agent retrieves context on every turn, you will notice the delay and the bill. For production systems with tight latency budgets, self-hosted Redis or in-process state will be faster and cheaper.
Use AgentCore Memory when you want to ship quickly and can tolerate 100-200ms per memory operation. Avoid it when you need sub-50ms retrieval or when per-request costs exceed the operational cost of running your own state layer.