Agent-cache: Multi-Tier LLM Caching for Valkey and Redis

Agent-cache is a multi-tier caching library that sits between your agentic workflow and Valkey or Redis. It caches three distinct layers: LLM responses, tool call results, and session state. The goal is to cut token spend and latency without forcing you into framework-specific storage patterns or Redis modules.

LangChain’s built-in cache only handles LLM responses. LangGraph’s checkpoint-redis only handles state and requires Redis 8 with modules. Agent-cache consolidates all three tiers behind one connection and ships with OpenTelemetry and Prometheus instrumentation at the cache layer.

Three-Tier Architecture

Agent-cache maps cache tiers to agent execution phases:

Prompt cache: Stores LLM responses keyed by exact prompt text and model parameters. If your agent calls gpt-4o with the same prompt twice, the second call is served from Valkey, typically with a sub-millisecond lookup, instead of making another model request.
Tool cache: Stores tool call results keyed by function name and serialized arguments. If your agent calls get_weather("Sofia") twice with identical arguments, the second call returns the cached result without re-running the tool.
Session cache: Stores agent state (current step, user intent, LangGraph checkpoints) with per-field TTL. This persists context across requests without re-running the entire graph.

Each tier uses exact-match keying. There is no semantic similarity layer. If your prompt changes by one character, you get a cache miss.
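To make the exact-match behavior concrete, here is a minimal sketch of how such a key could be derived. The function name promptCacheKey, the parameter shape, and the key layout are assumptions for illustration; the article does not describe how agent-cache builds its keys internally.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: the parameter values that influence a model response.
type ParamValue = string | number | boolean | null;

// Derive a deterministic key from model, prompt text, and parameters.
// Parameters are sorted by name so that property order does not matter.
function promptCacheKey(
  model: string,
  prompt: string,
  params: Record<string, ParamValue>
): string {
  const sortedParams = Object.keys(params)
    .sort()
    .map((name) => [name, params[name]]);
  const payload = JSON.stringify([model, prompt, sortedParams]);
  const digest = createHash("sha256").update(payload).digest("hex");
  return `prompt:${model}:${digest}`;
}
```

With a scheme like this, reordering parameters yields the same key, while changing a single character of the prompt yields a different digest and therefore a cache miss.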

Invalidation Strategy

The library does not automatically invalidate caches when tool calls mutate external state. You must handle invalidation manually:

Prompt cache: Set a global TTL (e.g., 1 hour) or invalidate by key prefix when you deploy a new prompt template.
Tool cache: Tag tool results with a version or timestamp. Invalidate by tag when the underlying data source changes (e.g., file writes, API updates).
Session cache: Use per-field TTL to expire stale checkpoints. If your agent writes to a database, invalidate the session cache for that user immediately after the write.

Agent-cache does not track dependencies between cache tiers. If a tool call result changes, you must manually invalidate any LLM responses that depend on it.
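Because cross-tier dependencies are left to you, one option is to keep a small application-level index of which prompt entries were built from which tool results. The sketch below is an assumption about how that could look: DependencyIndex is a hypothetical helper, not part of agent-cache, and a plain Map stands in for the Valkey or Redis store.

```typescript
// Hypothetical application-level index; agent-cache itself does not track this.
class DependencyIndex {
  private dependents = new Map<string, Set<string>>();

  // Record that the cached LLM response at promptKey was built from toolKey.
  link(toolKey: string, promptKey: string): void {
    const set = this.dependents.get(toolKey) ?? new Set<string>();
    set.add(promptKey);
    this.dependents.set(toolKey, set);
  }

  // Delete the tool result and every prompt entry linked to it.
  // Returns the keys that were actually removed from the store.
  invalidate(toolKey: string, store: Map<string, string>): string[] {
    const linked = this.dependents.get(toolKey);
    const keys = [toolKey, ...(linked ? Array.from(linked) : [])];
    this.dependents.delete(toolKey);
    return keys.filter((key) => store.delete(key));
  }
}
```

In a multi-process deployment the index itself would need to live in shared storage, which adds its own consistency questions; the in-memory version only illustrates the bookkeeping.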

Serialization and Memory

LangGraph checkpoints are serialized as JSON and stored in Redis strings. The library does not compress or deduplicate checkpoint data. If your agent runs a 50-step graph, you will store 50 checkpoints in Redis.

To avoid bloating memory:

Set aggressive TTLs on session state (e.g., 15 minutes).
Use Redis cluster mode to shard checkpoints across nodes. This spreads memory across the cluster; it does not reduce the total amount stored.
Prune old checkpoints with a background job that scans by timestamp.
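The pruning job mentioned above could select its targets along these lines. This is a sketch under stated assumptions: CheckpointMeta and selectCheckpointsToPrune are hypothetical names, and the article does not say how agent-cache records checkpoint timestamps, so the metadata is passed in as a plain array.

```typescript
// Hypothetical metadata for one stored checkpoint.
interface CheckpointMeta {
  id: string;
  createdAtMs: number;
}

// Return the ids of checkpoints older than maxAgeMs, always sparing the
// newest `keepLatest` checkpoints so a session can still resume.
function selectCheckpointsToPrune(
  checkpoints: CheckpointMeta[],
  nowMs: number,
  maxAgeMs: number,
  keepLatest = 1
): string[] {
  const newestFirst = [...checkpoints].sort(
    (a, b) => b.createdAtMs - a.createdAtMs
  );
  return newestFirst
    .slice(keepLatest)
    .filter((c) => nowMs - c.createdAtMs > maxAgeMs)
    .map((c) => c.id);
}
```

If checkpoint ids were indexed in a sorted set scored by timestamp, the Redis command ZREMRANGEBYSCORE could remove an age range from that index in one call; whether that fits depends on how the checkpoints are actually stored.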

The library supports both Valkey 7+ and Redis 6.2+ without modules. It does not use RedisJSON, RedisGraph, or RediSearch. All data is stored as strings, hashes, or sorted sets.

Code Example: Tool Cache with Manual Invalidation

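import { AgentCache } from '@betterdb/agent-cache';
import { Redis } from 'ioredis';

const redis = new Redis({ host: 'localhost', port: 6379 });
const cache = new AgentCache({ client: redis });

// Build the key in one place so reads and invalidation cannot drift apart
const weatherKey = (city: string) => `tool:get_weather:${city}`;

// Cache a tool call result
async function getWeather(city: string): Promise<string> {
  const cached = await cache.get(weatherKey(city));
  if (cached != null) return cached; // Only null/undefined counts as a miss

  const response = await fetch(`https://api.weather.com/${encodeURIComponent(city)}`);
  if (!response.ok) throw new Error(`Weather request failed: ${response.status}`); // Never cache an error body
  const result = await response.text();
  await cache.set(weatherKey(city), result, { ttl: 3600 }); // 1 hour TTL
  return result;
}

// Invalidate when external state changes
async function updateWeatherData(city: string): Promise<void> {
  const response = await fetch(`https://api.weather.com/${encodeURIComponent(city)}`, { method: 'POST', body: '...' });
  if (!response.ok) throw new Error(`Weather update failed: ${response.status}`);
  await cache.delete(weatherKey(city)); // Manual invalidation, only after the write succeeds
}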

Observability

Agent-cache ships with OpenTelemetry and Prometheus instrumentation. Each cache operation emits:

Span: Cache hit/miss, latency, key prefix.
Metric: Hit rate, miss rate, eviction count, memory usage.

You can plug these into Grafana or Datadog without writing custom exporters. The library does not log cache keys by default (to avoid leaking sensitive prompts), but you can enable key logging in development.
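To illustrate what these signals mean, the sketch below computes a hit rate and extracts a key prefix that is safe to attach to a span. It is not the library's instrumentation: CacheStats and keyPrefix are hypothetical names, and the assumption is that keys follow the tier:rest layout used elsewhere in this article.

```typescript
// Hypothetical counter illustrating how hit rate is derived.
class CacheStats {
  private hits = 0;
  private misses = 0;

  record(hit: boolean): void {
    if (hit) this.hits += 1;
    else this.misses += 1;
  }

  // Fraction of lookups served from cache; 0 when nothing was recorded.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

// Keep only the tier prefix (e.g. "prompt") so prompt text never reaches logs.
// Keys without a ":" separator are reported as "unknown" rather than leaked.
function keyPrefix(key: string): string {
  const separator = key.indexOf(":");
  return separator === -1 ? "unknown" : key.slice(0, separator);
}
```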

Trade-offs: Exact Match vs. Semantic Similarity

Dimension	Exact Match (agent-cache)	Semantic Similarity
Cache hit rate	Low if prompts vary slightly	High if prompts are semantically similar
Latency	Sub-millisecond lookup	10-50ms for embedding + vector search
Complexity	Simple key-value store	Requires vector database (Pinecone, Weaviate)
False positives	Zero	High if similarity threshold is too loose
Invalidation	Exact key or prefix match	Must re-embed and re-index

Exact-match caching works when your agent uses templated prompts with fixed parameters. It breaks down when users rephrase questions or when your agent iterates on the same task with slight variations.

If you need semantic similarity, you will need a vector database and an embedding model. Agent-cache does not provide this. You would layer it on top (e.g., embed the prompt, search for similar embeddings, fall back to exact match if no semantic hit).
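The layering described above can be sketched as follows. Everything here is an assumption for illustration: the embeddings are supplied by the caller (a real setup would call an embedding model), a linear scan stands in for the vector database, a Map stands in for the exact-match cache, and the 0.95 threshold is arbitrary.

```typescript
// Hypothetical entry in a semantic index.
interface SemanticEntry {
  embedding: number[];
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // treat missing dimensions as zero
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Try the semantic index first; fall back to the exact-match cache.
function lookup(
  prompt: string,
  queryEmbedding: number[],
  semanticIndex: SemanticEntry[],
  exactCache: Map<string, string>,
  threshold = 0.95
): string | undefined {
  let best: SemanticEntry | undefined;
  let bestScore = threshold;
  for (const entry of semanticIndex) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.response : exactCache.get(prompt);
}
```

Lowering the threshold raises the hit rate at the cost of more false positives, which is the trade-off the table above describes.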

Failure Modes

Stale tool results: If your agent caches get_stock_price("AAPL") and the price changes, the cache will serve outdated data until TTL expires. Solution: Set short TTLs (e.g., 60 seconds) or invalidate on every write.
Checkpoint bloat: If your agent runs long graphs, session cache memory usage will grow linearly with graph depth. Solution: Prune old checkpoints to reduce the total, or use Redis cluster mode to spread it across nodes.
Cache stampede: If 100 agents request the same uncached prompt simultaneously, all 100 will hit the LLM API. Solution: Use a distributed lock (e.g., Redlock) to serialize the first request and let others wait for the cache.
Key collision: If two agents use the same prompt with different context, they will share a cache entry. Solution: Include user ID or session ID in the cache key.
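For the stampede case, a lighter complement to a distributed lock is to coalesce concurrent requests inside a single process. The sketch below shows that technique (often called single-flight), not Redlock: it only deduplicates callers within one Node.js process, and singleFlight plus the Map-based store are hypothetical stand-ins rather than agent-cache APIs.

```typescript
// Promises for loads that are currently running, keyed by cache key.
const inFlight = new Map<string, Promise<string>>();

// Return the cached value, join a load already in progress, or start one.
async function singleFlight(
  key: string,
  store: Map<string, string>,
  load: () => Promise<string>
): Promise<string> {
  const cached = store.get(key);
  if (cached !== undefined) return cached;

  const pending = inFlight.get(key);
  if (pending) return pending; // another caller is already loading this key

  const started = load()
    .then((value) => {
      store.set(key, value); // populate the cache for later callers
      return value;
    })
    .finally(() => inFlight.delete(key)); // clear the slot on success or failure
  inFlight.set(key, started);
  return started;
}
```

For the collision case, the same key argument is where a user or session id would go, for example `prompt:${userId}:${digest}`, so that two agents with different context never share an entry.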

Deployment Shape

Agent-cache is a library, not a service. You deploy it as a dependency in your Node.js application. It connects to an existing Valkey or Redis instance. You do not need to run a separate cache server.

Recommended topology:

Development: Single Redis instance on localhost.
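Production: Redis cluster with 3+ nodes. Redis Cluster assigns keys to nodes by hash slot, not by key prefix, so prompt:*, tool:*, and session:* keys are spread across all nodes unless you use hash tags to co-locate related keys.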
Observability: Export OpenTelemetry spans to Jaeger or Honeycomb. Export Prometheus metrics to Grafana.

The library does not handle Redis failover or replication. You must configure Redis Sentinel or Redis Cluster yourself.
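When you run Redis Cluster, key placement follows the cluster's hash-slot rule: the slot is computed from the whole key, or from the substring inside the first non-empty {...} pair (the hash tag) if one is present. The sketch below shows how a hash tag could keep one user's session keys on the same node. The sessionKey layout is an assumption, not an agent-cache convention; hashSlotInput only mirrors the tag-extraction rule and does not compute the slot number.

```typescript
// Hypothetical key builder: the braces mark the user id as the hash tag.
function sessionKey(userId: string, field: string): string {
  return `session:{${userId}}:${field}`;
}

// Return the part of the key that Redis Cluster hashes to pick a slot:
// the text between the first "{" and the next "}", if non-empty,
// otherwise the whole key.
function hashSlotInput(key: string): string {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}
```

Two keys with the same hash tag land in the same slot, which also allows multi-key operations on them; the cost is that one very active user cannot be spread across nodes.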

Technical Verdict

Use agent-cache when:

You run agentic workflows with templated prompts and deterministic tool calls.
You want to consolidate LLM, tool, and session caching behind one connection.
You already operate Valkey or Redis and do not want to add a vector database.
You need observability at the cache layer without writing custom instrumentation.

Avoid agent-cache when:

Your agent prompts vary significantly between requests (low exact-match hit rate).
You need semantic similarity caching (requires vector search).
Your tool calls mutate state frequently (invalidation overhead outweighs cache benefits).
You run agents that iterate on the same task with slight variations (exact match will miss).

Agent-cache is infrastructure for cost control, not intelligence. It will not make your agent smarter. It will make repeated work cheaper and faster, as long as you can tolerate exact-match semantics and manual invalidation.