Git was built for merge conflicts in text files. It has no opinion about prompt templates, tool schemas, or the execution traces that agents produce. When you version an agent, you’re not just tracking code. You’re tracking configuration (system prompts, temperature, model ID), tool bindings (function schemas, API keys, retry policies), and runtime state (conversation history, tool call results, branching decisions).
Cognato AI is building a version control system specifically for agents. The pitch: treat agent memory like a Git repository. Push, pull, clone, and branch agent sessions. Swap models mid-task. Audit every action with cryptographic proof.
This is not a Git wrapper. It’s a protocol for serializing, versioning, and verifying agent memory across models and platforms.
What Does a Commit Mean for an Agent?
In Git, a commit is a snapshot of files. In agent version control, a commit is a snapshot of:
- Configuration: System prompt, model ID, temperature, max tokens, tool registry.
- Execution state: Conversation history, tool call results, intermediate outputs.
- Metadata: Timestamp, user ID, cost, latency, model version.
A single agent run might produce dozens of commits. Each tool call is a state transition. Each model response is a new snapshot. The commit graph is not a linear history of code changes. It’s a tree of execution paths.
Cognato’s Mach protocol wraps agents in a recording environment. Every step is hashed and linked, creating an immutable ledger. The result is a Git-like commit history for agent behavior, not just code.
Why Git Doesn’t Work
Git assumes text files and line-based diffs. Agents produce structured data: JSON tool calls, nested conversation turns, embeddings, and non-deterministic outputs. Here’s what breaks:
| Git Primitive | Agent Reality | Why It Fails |
|---|---|---|
| Line-based diff | Prompt templates with variable substitution | Changing a variable name looks like a full rewrite |
| Merge conflict | Two versions of a tool schema | No semantic merge: you can’t combine function signatures |
| Branch | A/B testing agent behavior | Branches diverge on runtime behavior, not code |
| Rollback | Reverting a bad prompt | You need to replay execution, not just restore text |
| Blame | Who changed this line? | Agents generate outputs; no human author to blame |
Git tracks intent through commit messages. Agents track intent through execution traces. You need to version the trace, not just the config.
Architecture: Mach Protocol
Mach is a protocol, not a hosted service. It defines how to serialize agent memory, hash execution steps, and push/pull sessions to a remote registry.
Core Components
-
Wrapper: Agents run inside a Mach-wrapped environment. The wrapper intercepts tool calls, model responses, and state transitions. It serializes each step as a commit.
-
Registry: A remote store for agent sessions. Think Docker Hub for agent memory. You push sessions, pull them, and branch from specific commits.
-
Cryptographic Ledger: Every commit is hashed and linked to its parent. The chain is immutable. You can verify that an agent’s execution matches its claimed history.
-
Model Swapping: Pull a session, change the model ID, and resume. The conversation history and tool bindings stay intact. The new model picks up where the old one left off.
Example Workflow
# Clone an existing agent session
$ mach clone origin/session_29a
Cloning into 'working-dir'...
remote: Enumerating agent states: 45, done.
Receiving state: 100% (45/45), done.
✓ Switched to branch 'gemini-handoff'
# Inspect the commit history
$ mach log
commit a3f8b2c (HEAD -> gemini-handoff)
Author: claude-3.5-sonnet
Date: 2026-06-07 10:41:00
Message: Generated UI mockup, hit rate limit
commit 7d4e1a9
Author: claude-3.5-sonnet
Date: 2026-06-07 10:38:15
Message: Analyzed requirements, called search_docs tool
# Swap to a different model
$ mach checkout -b feature/auth-gen
$ mach config set model gemini-1.5-pro
$ mach resume
Resuming agent task from commit a3f8b2c...
Model: gemini-1.5-pro
The session state includes conversation history, tool call results, and intermediate outputs. The new model sees the same context window as the old one.
Diff Semantics for Prompts
Git diffs show line changes. Agent diffs need to show semantic changes: variable substitutions, tool schema updates, and behavior shifts.
Mach doesn’t expose a diff command yet, but the protocol supports it. A semantic diff for prompts would compare:
- Variable bindings: Did the user ID change? Did the context window shrink?
- Tool availability: Was a tool added, removed, or updated?
- Model config: Did temperature or max tokens change?
This is closer to infrastructure-as-code diffing (Terraform, Pulumi) than Git. You’re comparing configuration, not text.
Rollback and Replay
Rollback in Git restores files. Rollback in agent version control restores execution state and replays from a checkpoint.
If an agent hallucinates and deletes critical code, you:
- Checkout the session state before the error.
- Resume execution with a different prompt or model.
- Verify the new output before committing.
This requires deterministic replay. The agent must produce the same tool calls and outputs given the same inputs. LLMs are non-deterministic, so Mach relies on:
- Fixed seeds: Set temperature to 0 and seed the random number generator.
- Cached responses: Store model outputs in the commit. Replay uses cached responses instead of calling the model again.
Replay is not perfect. If you change the model or prompt, behavior diverges. But you can replay the tool calls and inspect the state at each step.
Branching for A/B Testing
Git branches let you work on features in parallel. Agent branches let you test different behaviors in parallel.
Use case: You have a research agent that generates a session tree. You want to test two different summarization strategies. You:
- Branch from the research commit.
- Run one branch with Claude, one with Gemini.
- Compare the outputs and merge the winner.
Merging is not automatic. You can’t merge two prompt templates or tool sets without runtime testing. But you can compare execution traces and pick the branch that performed better.
Multi-Agent Orchestration
Mach supports multi-agent workflows. A specialized research agent generates a session tree. Independent coder agents branch from specific commits and build implementations in parallel.
Each agent runs in its own wrapper. Each agent pushes its session to the registry. The orchestrator pulls sessions, inspects commit histories, and decides which branches to merge.
This is closer to CI/CD than Git. You’re orchestrating execution, not just tracking code.
Observability and Audit
Every agent step is hashed and linked. The chain is immutable. You can verify that an agent’s execution matches its claimed history.
This is critical for enterprise deployments. If an agent makes a mistake, you need to audit its reasoning. Mach provides:
- Execution trace: Every tool call, model response, and state transition.
- Cryptographic proof: Hash chains verify that the trace is unmodified.
- Blame: Track which model and prompt produced each output.
This overlaps with observability tools (LangSmith, Helicone, Agent Trace RFC), but Mach focuses on versioning and audit, not just logging.
Deployment Shape
Mach is a protocol, not a hosted service. You can run it locally or deploy a registry to your own infrastructure. The architecture is:
- Client: CLI tool that wraps agents and serializes state.
- Registry: Remote store for sessions (S3, GCS, or a custom backend).
- Verifier: Optional service that validates hash chains and enforces access control.
The client is open source. The registry is bring-your-own-storage. The verifier is optional (useful for enterprises that need audit trails).
Failure Modes
| Risk | Impact | Mitigation |
|---|---|---|
| Non-deterministic replay | Rollback produces different outputs | Use temperature=0, cache responses, or accept divergence |
| Large session size | Pushing/pulling is slow | Compress conversation history, prune old commits |
| Model API changes | Tool schemas break across versions | Version tool schemas separately, test before swapping |
| Merge conflicts | Two branches modify the same tool | No automatic merge; require manual testing |
| Registry downtime | Can’t push/pull sessions | Cache locally, retry with exponential backoff |
The biggest risk is non-determinism. LLMs are probabilistic. Even with temperature=0, outputs can vary across API versions or hardware. Mach mitigates this by caching responses, but replay is not guaranteed.
Technical Verdict
Use Mach when:
- You’re deploying multiple agent versions in production and need rollback.
- You’re A/B testing agent behavior and need to compare execution traces.
- You’re building multi-agent systems and need to hand off context between models.
- You need audit trails for enterprise compliance.
Avoid Mach when:
- You’re prototyping a single agent and don’t need version control yet.
- You’re using a hosted agent platform (OpenAI Assistants, Anthropic Claude) that doesn’t expose execution state.
- You need real-time collaboration (Mach is async; sessions are pushed/pulled, not live-edited).
- You’re building deterministic workflows (traditional CI/CD is simpler).
Mach is early. The protocol is defined, but tooling is minimal. The open-source CLI is available, but the registry and verifier are not yet public. This is infrastructure for the agentic era, but it’s still being built.