Most agent frameworks treat memory as an append-only log. Every observation, tool result, and conversation turn gets serialized into a growing context window until you hit token limits or performance cliffs. YourMemory introduces a decay mechanism inspired by biological forgetting: memories age, relevance scores drop, and the system prunes without calling an LLM.
A customer support bot that remembers every interaction from six months ago will burn tokens on irrelevant context. A research agent that never forgets intermediate search results will drown in noise. Forgetting is not a bug; it is a feature you need to engineer.
The Decay Algorithm
YourMemory applies a time-based decay curve to stored memories. Each memory gets a relevance score that decreases as time passes. The system does not ask an LLM to decide what matters. It uses a deterministic function of three inputs:
- Initial score: Set when the memory is created, based on source type (user input, tool output, agent reasoning).
- Decay rate: Configurable per memory type. User preferences decay slower than transient search results.
- Access boost: When an agent retrieves a memory, the score gets a bump. Frequently accessed memories stay alive longer.
The pruning logic runs on a schedule or when context size exceeds a threshold. Memories below a cutoff score get archived or deleted. This keeps the active working set small without manual intervention.
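The two triggers described above (a schedule and a size threshold) can be sketched as a single check. This is a minimal illustration, not YourMemory's API: `PruneTrigger`, `interval_seconds`, and `max_active_tokens` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PruneTrigger:
    """Hypothetical trigger settings; names are illustrative only."""
    interval_seconds: float   # prune at least this often
    max_active_tokens: int    # or sooner, if the active set exceeds this budget

    def should_prune(self, now: float, last_run: float, active_tokens: int) -> bool:
        # Schedule trigger: the interval has elapsed since the last run.
        interval_elapsed = (now - last_run) >= self.interval_seconds
        # Size trigger: the active context is over its token budget.
        over_budget = active_tokens > self.max_active_tokens
        return interval_elapsed or over_budget


trigger = PruneTrigger(interval_seconds=300, max_active_tokens=8000)
# Only 100 seconds since the last run, but the context is over budget.
run_now = trigger.should_prune(now=1000.0, last_run=900.0, active_tokens=9500)
```

Keeping the check free of side effects makes it easy to call from either a timer loop or the write path.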
Temporal Reasoning Without LLM Calls
The CLI inference command queries the memory layer using temporal predicates. You can ask “what happened last Tuesday” or “show me memories from the past hour” without embedding the question in a prompt. The system parses time expressions, filters by timestamp, and returns matching memories.
This is rule-based retrieval, not semantic search. The advantage is speed and cost. No embedding model, no vector database query, no LLM token spend. The tradeoff is coverage: you get exact time matches, but nothing is ranked by conceptual similarity.
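To make the rule-based idea concrete, here is a small sketch of parsing a relative time expression and filtering by timestamp. It is not the YourMemory parser: `parse_window` and `filter_by_time` are hypothetical helpers, memories are assumed to be dicts with a `created_at` datetime, and only "past/last N unit" phrases are handled (weekday phrases such as "last Tuesday" are out of scope here).

```python
import re
from datetime import datetime, timedelta

# Seconds per unit for the few expressions this sketch understands.
UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}
PATTERN = re.compile(r"(?:past|last)\s+(?:(\d+)\s+)?(minute|hour|day|week)s?")


def parse_window(expression: str, now: datetime) -> tuple[datetime, datetime]:
    """Turn 'past hour' or 'last 3 days' into a (start, end) range ending at now."""
    match = PATTERN.search(expression.lower())
    if match is None:
        raise ValueError(f"unsupported time expression: {expression!r}")
    count = int(match.group(1) or 1)  # a missing number means one unit
    span = timedelta(seconds=count * UNIT_SECONDS[match.group(2)])
    return now - span, now


def filter_by_time(memories: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep memories whose 'created_at' falls inside the inclusive range."""
    return [m for m in memories if start <= m["created_at"] <= end]


now = datetime(2024, 1, 10, 12, 0, 0)
memories = [
    {"text": "searched docs", "created_at": now - timedelta(minutes=20)},
    {"text": "user set timezone", "created_at": now - timedelta(days=2)},
]
start, end = parse_window("show me memories from the past hour", now)
recent = filter_by_time(memories, start, end)
```

No model is involved at any step: the whole query is a regular expression match plus a timestamp comparison.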
For hybrid retrieval, you can combine temporal filters with embedding similarity. Run a time-bounded query first to narrow the candidate set, then rank by semantic relevance. This two-stage approach keeps vector search costs low while preserving temporal context.
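A minimal sketch of that two-stage approach follows, assuming each memory is a dict with a numeric `created_at` timestamp and a precomputed `embedding` list. `hybrid_search` and `cosine` are hypothetical helpers, and a real deployment would likely delegate the second stage to a vector index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(memories, query_vec, start, end, top_k=3):
    # Stage 1: cheap temporal filter narrows the candidate set.
    candidates = [m for m in memories if start <= m["created_at"] <= end]
    # Stage 2: rank only the survivors by semantic similarity.
    ranked = sorted(
        candidates,
        key=lambda m: cosine(m["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


memories = [
    {"id": "a", "created_at": 100.0, "embedding": [1.0, 0.0]},
    {"id": "b", "created_at": 200.0, "embedding": [0.0, 1.0]},
    {"id": "c", "created_at": 900.0, "embedding": [1.0, 0.0]},
]
hits = hybrid_search(memories, [0.9, 0.1], start=0.0, end=500.0, top_k=2)
```

Memory `c` is a strong semantic match but falls outside the time window, so it never reaches the similarity stage.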
Memory Dashboard as Audit Trail
The dashboard exposes the full memory lifecycle:
- Creation timestamp: When the memory entered the system.
- Last access time: When an agent last retrieved it.
- Current relevance score: The decayed value after time and access adjustments.
- Source metadata: Which agent, tool, or user generated the memory.
For debugging, this is critical. If an agent makes a bad decision, you can trace which memories it consulted and see their scores at decision time. If a memory decayed too fast, you can adjust the rate. If an agent is ignoring recent context, you can check whether retrieval logic is broken or the score function is miscalibrated.
The dashboard also functions as an audit log. In regulated environments, you need to show why an agent took an action. A memory trail with timestamps and scores provides that evidence.
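One way to capture that evidence is to snapshot the consulted memories, with their scores, at the moment of each decision. The sketch below assumes a hypothetical schema built from the dashboard fields listed above; `ConsultedMemory` and `audit_entry` are illustrative names, not part of YourMemory.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ConsultedMemory:
    """One memory an agent read while making a decision (hypothetical schema)."""
    memory_id: str
    created_at: float
    last_accessed: Optional[float]
    score_at_decision: float
    source: str


def audit_entry(decision_id: str, decided_at: float, consulted: list[ConsultedMemory]) -> str:
    """Serialize a decision and the memories behind it as one JSON line."""
    record = {
        "decision_id": decision_id,
        "decided_at": decided_at,
        "consulted": [asdict(m) for m in consulted],
    }
    return json.dumps(record, sort_keys=True)


line = audit_entry(
    "refund-123",
    1700000600.0,
    [ConsultedMemory("m1", 1700000000.0, 1700000500.0, 0.42, "tool:order_lookup")],
)
```

Storing the score as it was at decision time matters because the live score keeps decaying and cannot be used to reconstruct the past state.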
Resurrection vs. Permanent Deletion
When a memory decays below the threshold, you have two options:
- Archive: Move it to cold storage. The memory is no longer in the active context window but can be resurrected if an agent explicitly requests it.
- Delete: Remove it permanently to comply with data retention policies or reduce storage costs.
Resurrection requires a secondary retrieval path. The agent must know to check the archive, which adds latency. For most use cases, archiving is safer. You avoid losing context that might become relevant later. For privacy-sensitive data, deletion is mandatory.
The system does not automatically resurrect memories. If an agent needs archived context, it must issue a separate query. This prevents accidental context pollution from stale data.
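The archive, delete, and explicit-resurrection behavior described in this section can be sketched with two in-memory dicts. `TwoTierStore` is a hypothetical stand-in for illustration; a real archive tier would sit on slower storage.

```python
class TwoTierStore:
    """Illustrative active/archive store; not YourMemory's storage API."""

    def __init__(self):
        self.active = {}    # memories eligible for the context window
        self.archive = {}   # decayed memories kept in cold storage

    def archive_memory(self, memory_id):
        # Move out of the active set but keep the content recoverable.
        self.archive[memory_id] = self.active.pop(memory_id)

    def delete_memory(self, memory_id):
        # Permanent removal from both tiers.
        self.active.pop(memory_id, None)
        self.archive.pop(memory_id, None)

    def get(self, memory_id):
        # Normal retrieval never falls through to the archive.
        return self.active.get(memory_id)

    def resurrect(self, memory_id):
        # Separate, explicit path: the caller must ask for archived context.
        memory = self.archive.pop(memory_id, None)
        if memory is not None:
            self.active[memory_id] = memory
        return memory


store = TwoTierStore()
store.active["m1"] = "user prefers email follow-ups"
store.archive_memory("m1")
missing = store.get("m1")          # None: archived memories stay out of the way
restored = store.resurrect("m1")   # explicit request brings it back
```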
Architecture: Decay Pipeline
Here is how the components fit together:
| Component | Responsibility | Failure Mode |
|---|---|---|
| Memory Store | Persist memories with timestamps and scores | Unbounded growth if pruning halts; context bloat within 6-12 hours for typical workloads |
| Decay Scheduler | Run pruning logic on interval or size trigger | Missed runs if process dies; memory accumulation outpaces pruning if interval too long |
| Retrieval Engine | Query by time, score, or embedding similarity | Slow queries on large datasets; index corruption under concurrent writes |
| Dashboard API | Expose memory state for debugging and audit | Stale data if cache not invalidated; missing entries if sync lags store writes |
| Archive Layer | Store decayed memories for potential resurrection | High latency on cold storage reads; resurrection failures if archive corrupted |
The decay scheduler is the critical path. If it stops running, the memory store grows unbounded. You need monitoring on scheduler health and pruning latency. If pruning takes longer than the interval, you will fall behind.
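A rough sketch of the two health checks mentioned above: time each pruning run against the interval, and flag the scheduler as stale when no run has succeeded recently. `timed_prune`, `scheduler_is_stale`, and the `grace` multiplier are assumptions made for this example.

```python
import time


def timed_prune(prune_fn, interval_seconds, clock=time.monotonic):
    """Run one pruning pass and report whether it overran the interval."""
    started = clock()
    prune_fn()
    duration = clock() - started
    return {
        "duration_seconds": duration,
        # If a pass takes longer than the interval, the next one is already late.
        "falling_behind": duration > interval_seconds,
    }


def scheduler_is_stale(last_success, now, interval_seconds, grace=2.0):
    """True when no pass has succeeded within `grace` intervals."""
    return (now - last_success) > grace * interval_seconds


# A fake clock makes the example deterministic: the pass "takes" 45 seconds.
fake_clock = iter([0.0, 45.0]).__next__
report = timed_prune(lambda: None, interval_seconds=30.0, clock=fake_clock)
stale = scheduler_is_stale(last_success=0.0, now=100.0, interval_seconds=30.0)
```

Both values are cheap to export as metrics, which gives alerting something concrete to fire on.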
Code Snippet: Decay Function
import math
import time


def calculate_relevance(memory, current_time, config):
    """
    Compute relevance score with time decay and access boost.
    """
    age_seconds = current_time - memory.created_at
    # Per-second decay rate for this memory type; 0.01 is the fallback.
    decay_rate = config.decay_rates.get(memory.type, 0.01)

    # Exponential decay
    time_factor = math.exp(-decay_rate * age_seconds)

    # Boost for recent access: up to 1.5x right after a retrieval, fading toward 1.0.
    # Compare against None so a timestamp of 0 is not treated as "never accessed".
    if memory.last_accessed is not None:
        access_age = current_time - memory.last_accessed
        access_boost = 1.0 + (0.5 * math.exp(-0.001 * access_age))
    else:
        access_boost = 1.0

    return memory.initial_score * time_factor * access_boost


def prune_memories(store, config):
    """
    Archive or delete memories below the relevance threshold.
    """
    current_time = time.time()
    threshold = config.prune_threshold

    # Snapshot the listing so archiving or deleting does not disturb the iteration.
    for memory in list(store.list_all()):
        score = calculate_relevance(memory, current_time, config)
        if score < threshold:
            if config.archive_enabled:
                store.archive(memory)
            else:
                store.delete(memory)
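The exponential decay function makes old memories fade gradually rather than disappear at a fixed age. Because age is measured in seconds, decay_rate is a per-second rate: the fallback of 0.01 halves a score roughly every 69 seconds (ln 2 / 0.01), so memory types meant to last hours or days need much smaller values. The access boost multiplies the score by up to 1.5 right after a retrieval and fades as the last access ages, which keeps regularly used context alive longer. Adjust decay_rate per memory type to control retention.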
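One way to choose rates is to start from a half-life per memory type and convert it, since for exponential decay the rate equals ln 2 divided by the half-life. The half-life values and type names below are hypothetical examples, not recommended settings.

```python
import math


def decay_rate_from_half_life(half_life_seconds: float) -> float:
    """Per-second rate at which a score halves every `half_life_seconds`."""
    return math.log(2) / half_life_seconds


# Hypothetical half-lives: preferences last weeks, search results about an hour.
HALF_LIVES = {
    "user_preference": 30 * 86400,
    "agent_reasoning": 86400,
    "search_result": 3600,
}
decay_rates = {name: decay_rate_from_half_life(h) for name, h in HALF_LIVES.items()}

# Sanity check: after one half-life the time factor is 0.5.
factor = math.exp(-decay_rates["search_result"] * 3600)
```

Thinking in half-lives tends to be easier to review than raw rates, because "search results fade in about an hour" is a statement a teammate can argue with.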
Benchmarking Decay Performance
You need three metrics:
- Retrieval latency: How long does it take to query active memories after pruning?
- Context window savings: How many tokens do you save by removing decayed memories?
- Task accuracy: Does the agent still perform well after forgetting?
Retrieval latency should drop as the active set shrinks. If it does not, your pruning logic is not aggressive enough or your indexing is broken.
Context window savings are easy to measure: count tokens before and after pruning. For a long-running agent, the size of the reduction depends on the decay configuration and memory access patterns, so measure it on your own workload rather than assuming a figure.
Task accuracy is harder. You need a benchmark suite that tests whether the agent can still answer questions or complete tasks after memories decay. If accuracy drops sharply, your decay rates are too aggressive or your retrieval logic is not surfacing the right context.
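A small harness for the last two metrics might look like the sketch below. It makes two loud assumptions: `count_tokens` uses a whitespace split as a crude stand-in for your model's real tokenizer, and task results are represented as a list of pass/fail booleans from a benchmark suite you supply.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace split approximates token count; swap in a real tokenizer.
    return len(text.split())


def context_savings(before: list[str], after: list[str]) -> dict:
    """Compare total tokens in the active set before and after pruning."""
    tokens_before = sum(count_tokens(t) for t in before)
    tokens_after = sum(count_tokens(t) for t in after)
    saved = tokens_before - tokens_after
    return {
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "saved_fraction": saved / tokens_before if tokens_before else 0.0,
    }


def accuracy(results: list[bool]) -> float:
    """Fraction of benchmark tasks that passed."""
    return sum(results) / len(results) if results else 0.0


before = ["user prefers dark mode", "search result one two three", "tool returned 200 ok"]
after = ["user prefers dark mode"]
savings = context_savings(before, after)

# Run the same task suite with and without decay, then compare.
accuracy_drop = accuracy([True, True, True, False]) - accuracy([True, True, False, False])
```

Reporting savings and accuracy drop side by side shows whether a given decay configuration is buying tokens at an acceptable cost.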
Technical Verdict
Use YourMemory’s decay approach if:
- Your agents run continuously for more than 4 hours AND context window is under 32k tokens AND you need deterministic pruning without LLM overhead.
- You have audit or compliance requirements that demand visibility into what the agent remembered at decision time.
- Token costs are a bottleneck AND you can tolerate information loss from aggressive decay.
- You need temporal queries (last hour, past week) more than semantic similarity searches.
Avoid this approach if:
- Your agents run for minutes, not hours. The scheduler overhead and tuning effort are not justified.
- You need perfect recall or semantic relevance ranking. Decay curves optimize for recency, not conceptual similarity.
- You have more than 1 million memories and need sub-second retrieval. The pruning scan becomes a bottleneck without sharding.
- Your workload requires resurrecting archived memories frequently. The two-tier retrieval adds latency you cannot afford.
YourMemory solves context bloat in long-running agents with a simple, deterministic decay mechanism. The dashboard provides visibility that most frameworks lack. The temporal reasoning CLI is useful for time-bounded queries without embedding costs. But you lose information, you need to tune decay rates per memory type, and you need monitoring on the pruning pipeline. For production agents that run continuously, forgetting is not optional. You either build it or you hit token limits and performance cliffs.