Agents hit the same errors in isolation. They burn tokens retrying the same API call with the wrong header. They parse the same malformed JSON. They rediscover the same environment variable quirk. Mozilla.ai’s Cq project proposes a standard schema for capturing these gotchas and sharing them across any agent, any model, any runtime.
The mental model is Stack Overflow for agents. Instead of humans posting questions and answers, agents propose Knowledge Units (KUs) when they encounter a failure, encode the context and resolution, and make those units queryable by other agents. The goal is to stop every agent from learning the same lesson from scratch.
What Is a Knowledge Unit?
A KU is a structured record of a problem, its context, and a validated resolution. The schema includes:
- Trigger: The condition or error that surfaced the issue (API response code, exception type, tool call failure).
- Context: Environment details, tool version, input parameters, state snapshot.
- Resolution: The corrective action that worked (parameter change, retry logic, alternate tool).
- Validation: Evidence that the resolution succeeded (test output, subsequent successful execution).
- Scope: Model-agnostic or model-specific, tool-specific or general.
The schema is JSON-based and designed to be lightweight. Agents can propose KUs during execution without blocking their primary task. The KU is submitted to a shared repository, validated by other agents or human reviewers, and indexed for retrieval.
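As a rough illustration of the five fields above, a KU could be modeled in memory like this. The class name `KnowledgeUnit` and its methods are hypothetical, not part of any published Cq API; only the field names come from the list above.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class KnowledgeUnit:
    """Hypothetical in-memory form of a KU (not an official Cq type)."""
    ku_id: str
    trigger: dict[str, Any]      # condition or error that surfaced the issue
    context: dict[str, Any]      # environment, tool version, input parameters
    resolution: dict[str, Any]   # corrective action that worked
    validation: dict[str, Any] = field(default_factory=dict)  # evidence of success
    scope: dict[str, bool] = field(default_factory=dict)      # model/tool specificity

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable across agents.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "KnowledgeUnit":
        return cls(**json.loads(raw))
```

Keeping the record a plain JSON round-trip means any runtime that can parse JSON can read or write a KU.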
Architecture: Proposal, Validation, Retrieval
Cq avoids a centralized bottleneck by separating proposal, validation, and retrieval into distinct flows.
Proposal Flow
- Agent encounters a failure (tool call returns 500, parsing fails, timeout).
- Agent captures context: input, state, error message, tool version.
- Agent generates a candidate KU and submits it to the KU repository (local or remote).
- Submission is asynchronous. The agent continues execution.
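One way to keep submission off the agent's critical path is a background worker fed by a queue. This is a minimal sketch under assumptions: `KUSubmitter` and the `send` callable (whatever delivers a KU to the local or remote repository) are hypothetical names for illustration.

```python
import queue
import threading


class KUSubmitter:
    """Hypothetical helper that hands candidate KUs to a background worker."""

    def __init__(self, send):
        self._send = send                  # assumed callable that delivers one KU
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def propose(self, ku: dict) -> None:
        # Returns immediately; the agent continues its primary task.
        self._pending.put(ku)

    def _drain(self) -> None:
        while True:
            ku = self._pending.get()
            try:
                self._send(ku)
            except Exception:
                pass  # a failed submission should never break the agent
            finally:
                self._pending.task_done()

    def flush(self) -> None:
        # Block until every queued KU has been handed to `send`.
        self._pending.join()
```

The agent calls `propose()` and moves on; `flush()` is only needed at shutdown or in tests.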
Validation Flow
- Candidate KUs enter a validation queue.
- Validators (other agents, human reviewers, or automated test harnesses) attempt to reproduce the failure and confirm the resolution.
- Validated KUs are promoted to the active index. Invalid KUs are flagged or discarded.
- Validation can be distributed. Multiple validators can vote on a KU’s correctness.
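The voting step could be tallied along these lines. The class name, the status strings, and the rule that a KU is flagged once rejections reach the same threshold are assumptions made for this sketch, not behavior confirmed by the project.

```python
from collections import defaultdict


class ValidationQueue:
    """Hypothetical vote tally for candidate KUs."""

    def __init__(self, required_confirmations: int = 3):
        self.required = required_confirmations
        self.votes = defaultdict(dict)  # ku_id -> {validator_id: bool}

    def vote(self, ku_id: str, validator_id: str, confirmed: bool) -> str:
        # One ballot per validator; a repeat vote overwrites the earlier one.
        self.votes[ku_id][validator_id] = confirmed
        ballots = list(self.votes[ku_id].values())
        confirms = ballots.count(True)
        rejects = ballots.count(False)
        if confirms >= self.required and confirms > rejects:
            return "active"    # promoted to the active index
        if rejects >= self.required:
            return "flagged"   # invalid, held back from the index
        return "pending"
```

Keying ballots by validator identity stops a single validator from promoting a KU by voting repeatedly.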
Retrieval Flow
- Agent encounters a new failure.
- Agent queries the KU index with the error signature, tool name, and context hash.
- The index returns KUs ranked by relevance (exact match, partial match, similar context).
- Agent applies the resolution from the top-ranked KU.
- If the resolution works, the agent upvotes the KU. If it fails, the agent downvotes and optionally proposes a new KU.
The retrieval layer can be a vector database (embedding-based similarity search) or a traditional index (keyword and tag-based). The choice depends on scale and latency requirements.
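For the traditional-index case, ranking might look like the sketch below. The three tiers (signature, tool, and context hash all match; signature and tool match; signature only) are one assumed reading of "exact, partial, similar", and the record keys `error_signature`, `context_hash`, and `votes` are hypothetical.

```python
import hashlib
import json


def context_hash(context: dict) -> str:
    # Hash a canonical JSON form so key order does not change the result.
    canonical = json.dumps(context, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def rank_kus(index: list, error_signature: str, tool: str, ctx_hash: str) -> list:
    """Return KU ids, best match first (tiers are an assumption of this sketch)."""
    scored = []
    for ku in index:
        if ku["error_signature"] != error_signature:
            continue                      # different failure, not a candidate
        score = 1                         # same error, different tool
        if ku["tool"] == tool:
            score = 2                     # partial match: same error and tool
            if ku["context_hash"] == ctx_hash:
                score = 3                 # exact match
        scored.append((score, ku.get("votes", 0), ku["ku_id"]))
    # Higher tier first; votes break ties within a tier.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [ku_id for _, _, ku_id in scored]
```

An embedding-based index would replace the tier logic with a similarity score but could keep the same calling shape.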
Preventing KU Pollution
Three failure modes threaten the KU repository:
| Risk | Mitigation |
|---|---|
| Bad advice | Validation queue with multi-agent voting. KUs require N confirmations before promotion. |
| Adversarial submissions | Rate limiting per agent identity. Cryptographic signing of KUs. Reputation scoring for proposers. |
| Model-specific quirks | KUs tagged with model family and version. Retrieval filters by agent’s model. Generalization score based on cross-model validation. |
The validation queue is the critical choke point. If validation is too slow, agents ignore the KU system. If validation is too permissive, bad KUs pollute the index. Mozilla.ai’s design allows tunable thresholds, for example 3 validations for general KUs, 1 for model-specific KUs, and 5 for high-impact resolutions (security, data integrity).
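The thresholds could be expressed as a small policy function. The function name and the precedence rule (high-impact always wins over model-specific) are assumptions of this sketch; the default numbers are the ones quoted above.

```python
def required_validations(
    high_impact: bool,
    model_specific: bool,
    general: int = 3,
    specific: int = 1,
    critical: int = 5,
) -> int:
    """Hypothetical policy: how many confirmations a candidate KU needs."""
    if high_impact:
        # Assumed precedence: security and data-integrity KUs get the strictest bar.
        return critical
    return specific if model_specific else general
```

Exposing the three numbers as parameters is what makes the trade-off between validation speed and index quality tunable per deployment.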
Example: API Rate Limit KU
An agent calls a third-party API and receives a 429 response. It proposes a KU:
{
  "ku_id": "ku-2026-03-23-001",
  "trigger": {
    "error_type": "HTTPError",
    "status_code": 429,
    "tool": "weather_api_v2",
    "message": "Rate limit exceeded"
  },
  "context": {
    "request_rate": "10 req/sec",
    "headers": {
      "X-RateLimit-Remaining": "0",
      "Retry-After": "60"
    }
  },
  "resolution": {
    "action": "exponential_backoff",
    "parameters": {
      "initial_delay": 60,
      "max_retries": 3,
      "backoff_multiplier": 2
    }
  },
  "validation": {
    "confirmed_by": ["agent-b", "agent-c"],
    "success_count": 12,
    "failure_count": 0
  },
  "scope": {
    "model_agnostic": true,
    "tool_specific": true
  }
}
Another agent queries the KU index when it hits the same 429 error. It retrieves this KU, applies the exponential backoff, and succeeds. It upvotes the KU, increasing its rank.
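Applying that resolution might look like the following sketch. It assumes the delays are in seconds and that `call` is a hypothetical function returning an HTTP status code; neither detail is specified by the KU itself.

```python
import time


def backoff_delays(initial_delay, max_retries, backoff_multiplier):
    # Delay before each retry: initial, initial * m, initial * m^2, ...
    return [initial_delay * backoff_multiplier ** attempt
            for attempt in range(max_retries)]


def call_with_backoff(call, parameters, sleep=time.sleep):
    """Retry `call` while it returns 429, waiting longer before each retry."""
    for delay in backoff_delays(**parameters):
        status = call()
        if status != 429:
            return status
        sleep(delay)          # assumed unit: seconds
    return call()             # final attempt after the last wait


ku_parameters = {"initial_delay": 60, "max_retries": 3, "backoff_multiplier": 2}
print(backoff_delays(**ku_parameters))  # [60, 120, 240]
```

Passing `sleep` as an argument lets a validator replay the resolution in a test harness without actually waiting.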
Deployment Shape
Cq can run in three modes:
- Local repository: Each agent maintains its own KU store. No sharing. Useful for single-agent systems or air-gapped environments.
- Team repository: Agents within an organization share a KU store. Validation is internal. Useful for enterprise deployments.
- Public repository: Open KU index. Any agent can propose and retrieve. Validation is crowdsourced. Useful for open-source agent frameworks.
The public repository introduces trust and moderation challenges. Mozilla.ai’s design includes a reputation system: agents earn trust by proposing validated KUs and lose trust by submitting bad KUs. High-reputation agents can fast-track validation.
Observability and Failure Modes
Key metrics for a Cq deployment:
- KU proposal rate: How often agents encounter novel failures.
- Validation latency: Time from proposal to promotion.
- Retrieval hit rate: Percentage of failures resolved by existing KUs.
- False positive rate: Share of applied KUs that claim to resolve a failure but don’t.
- Upvote/downvote ratio: Community signal on KU quality.
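Two of these metrics could be derived from a log of failure events, as sketched here. The event shape (`ku_applied`, `resolved`) and the exact definitions are assumptions for illustration, not definitions taken from the project.

```python
def deployment_metrics(failures: list) -> dict:
    """Compute hit rate and false positive rate from hypothetical failure events.

    Each event is a dict with two booleans:
      ku_applied: a retrieved KU was applied to this failure
      resolved:   the failure was resolved after applying it
    """
    total = len(failures)
    applied = [f for f in failures if f["ku_applied"]]
    resolved = [f for f in applied if f["resolved"]]
    return {
        # Share of all failures that an existing KU resolved.
        "retrieval_hit_rate": len(resolved) / total if total else 0.0,
        # Share of applied KUs that did not resolve the failure.
        "false_positive_rate": (
            (len(applied) - len(resolved)) / len(applied) if applied else 0.0
        ),
    }
```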
Failure modes:
- Validation backlog: Too many proposals, not enough validators. Agents ignore the system.
- Stale KUs: Tool versions change and KUs become obsolete. Requires TTL or versioning.
- Overfitting: KUs too specific to one agent’s environment. Retrieval returns irrelevant results.
The system needs a garbage collection layer: prune KUs with low upvotes, expire KUs older than N days, archive KUs for deprecated tools.
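A pruning pass along those lines could be sketched as follows. The record keys, the 90-day default, and the net-score rule (upvotes minus downvotes) are hypothetical choices for this example.

```python
from datetime import datetime, timedelta


def prune(kus, now: datetime, max_age_days=90, min_score=0,
          deprecated_tools=frozenset()):
    """Split KUs into (keep, archive, drop) lists; thresholds are assumed defaults."""
    keep, archive, drop = [], [], []
    for ku in kus:
        if ku["tool"] in deprecated_tools:
            archive.append(ku)                       # tool is gone, keep for history
        elif now - ku["created_at"] > timedelta(days=max_age_days):
            drop.append(ku)                          # expired by age
        elif ku["upvotes"] - ku["downvotes"] < min_score:
            drop.append(ku)                          # community signal is negative
        else:
            keep.append(ku)
    return keep, archive, drop
```

Passing `now` explicitly keeps the pass deterministic, which makes it easy to test and to run as a scheduled job.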
Security Boundaries
KUs can leak sensitive information. An agent might propose a KU that includes API keys, internal hostnames, or proprietary logic. Mozilla.ai’s design requires:
- Sanitization: Strip secrets before submission. Use regex patterns, secret detection libraries, or LLM-based redaction.
- Access control: KUs tagged as internal are only visible to agents within the same organization.
- Audit log: Every KU proposal, validation, and retrieval is logged. Useful for forensics and compliance.
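The regex option for sanitization could start from something like this sketch. The patterns are illustrative assumptions (an AWS-style access key ID shape, bearer tokens, `key=value` secrets, and a hypothetical `.internal` hostname suffix) and would miss many real secrets; a dedicated secret scanner would be a stronger choice in production.

```python
import re

# Illustrative patterns only; real deployments need a far broader set.
SECRET_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._\-]+"), r"\1[REDACTED_TOKEN]"),
    (re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    (re.compile(r"\b[\w.-]+\.internal\b"), "[REDACTED_HOST]"),  # assumed internal suffix
]


def sanitize(value):
    """Recursively redact secret-looking strings in a KU before submission."""
    if isinstance(value, str):
        for pattern, replacement in SECRET_PATTERNS:
            value = pattern.sub(replacement, value)
        return value
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Running the redaction over the whole KU, not just the error message, matters because secrets tend to hide in headers and input parameters.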
Adversarial agents can poison the KU index by submitting plausible but incorrect resolutions. The validation queue and reputation system mitigate this, but they don’t eliminate it. High-stakes deployments (financial, medical) should require human-in-the-loop validation for critical KUs. KUs about market data APIs or trading logic need stricter validation to prevent market-moving errors. For example, a bad KU that tells an agent to ignore a rate limit on a trading API can cascade into compliance violations or execution failures during high-volume periods.
Technical Verdict
Use Cq when:
- You run multiple agents that hit the same external APIs or tools.
- You want to reduce token waste from repeated failures.
- You have a validation pipeline (automated tests, human reviewers, or multi-agent voting).
- You can tolerate eventual consistency (KUs take time to validate and propagate, typically 30 seconds to 5 minutes depending on validator availability).
Avoid Cq when:
- You run a single agent with unique, non-repeating tasks.
- Your agents operate in air-gapped environments with no shared state (a local repository still works there, but nothing is shared across agents).
- You need sub-100ms failure resolution (KU retrieval adds 50-200ms overhead depending on index size and network latency).
- Your failure modes are too context-specific to generalize (one-off bugs, transient network issues).
The schema is the real contribution. Even if you don’t adopt Mozilla.ai’s full stack, the KU structure is a useful pattern for encoding agent failures. You can store KUs in a local SQLite database, a Redis cache, or a vector store. The key is separating the failure signature (trigger plus context) from the resolution and making both machine-readable.
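As a minimal sketch of the SQLite option, the failure signature can live in indexed columns while the resolution is stored as JSON beside it. The table layout and function names here are hypothetical, chosen only to show the separation.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS kus (
            ku_id           TEXT PRIMARY KEY,
            error_type      TEXT NOT NULL,   -- signature columns, indexed below
            tool            TEXT NOT NULL,
            trigger_json    TEXT NOT NULL,
            context_json    TEXT NOT NULL,
            resolution_json TEXT NOT NULL    -- resolution kept separate
        )""")
    db.execute("CREATE INDEX IF NOT EXISTS kus_signature ON kus (error_type, tool)")
    return db


def save(db: sqlite3.Connection, ku: dict) -> None:
    db.execute(
        "INSERT OR REPLACE INTO kus VALUES (?, ?, ?, ?, ?, ?)",
        (ku["ku_id"], ku["trigger"]["error_type"], ku["trigger"]["tool"],
         json.dumps(ku["trigger"]), json.dumps(ku["context"]),
         json.dumps(ku["resolution"])),
    )
    db.commit()


def find_resolutions(db: sqlite3.Connection, error_type: str, tool: str) -> list:
    rows = db.execute(
        "SELECT resolution_json FROM kus "
        "WHERE error_type = ? AND tool = ? ORDER BY ku_id",
        (error_type, tool),
    )
    return [json.loads(row[0]) for row in rows]
```

Lookups touch only the signature columns, so the resolution payload can change shape without a schema migration.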