Better models don’t mean looser workflows. When coding agents move from single-file edits to multi-repo orchestration, capability increases expose new failure modes: context leakage, token waste, and ownership boundary violations. The solution is not more autonomy. It’s stricter workflow primitives.
The Context Explosion Problem
A coding agent working across multiple repositories has no natural stopping point for context. It can read every file in every repo, follow every import, and trace every API boundary. This sounds powerful until you measure the cost.
What happens without boundaries:
- A single task balloons from 10K tokens to 200K tokens because the agent pulls in tangentially related files
- Cross-repo changes violate architectural boundaries (backend agent modifies frontend state management)
- Token costs become unpredictable and unattributable
- Long sessions drift from the original task scope
The problem is not the model. The problem is the absence of workflow constraints that enforce context discipline.
Workflow Primitives for Multi-Repo Agents
Production use of coding agents across repositories requires explicit primitives that limit scope, enforce ownership, and track resource consumption.
1. Context Scopes
Define what the agent can see before it starts work.
# handover.yaml
task: "Add rate limiting to user API"
context_scope:
  repos:
    - name: "backend-api"
      paths: ["src/routes/user.ts", "src/middleware/"]
      exclude: ["src/routes/admin.ts"]
  max_files: 15
  max_tokens: 50000
ownership:
  primary_repo: "backend-api"
  allowed_cross_repo_reads: ["shared-types"]
  allowed_cross_repo_writes: []
This scope declaration prevents the agent from wandering into unrelated code. It also makes token budgets predictable.
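One way an orchestrator might apply such a scope is to filter candidate files before anything reaches the model. The sketch below is illustrative only: `RepoScope`, `in_scope`, and `select_files` are hypothetical names, and it assumes that a pattern ending in `/` covers a whole directory and that exceeding `max_files` should fail the task rather than silently truncate the file list.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class RepoScope:
    """Hypothetical in-memory form of one `repos` entry from handover.yaml."""
    name: str
    paths: list[str]
    exclude: list[str] = field(default_factory=list)


def _matches(file_path: str, pattern: str) -> bool:
    # Assumption: a pattern ending in "/" covers everything under that
    # directory; anything else is an exact path or an fnmatch-style glob.
    if pattern.endswith("/"):
        return file_path.startswith(pattern)
    return fnmatch(file_path, pattern)


def in_scope(scope: RepoScope, repo: str, file_path: str) -> bool:
    """Return True if the agent may read file_path under this scope."""
    if repo != scope.name:
        return False
    # Exclusions win over inclusions.
    if any(_matches(file_path, pattern) for pattern in scope.exclude):
        return False
    return any(_matches(file_path, pattern) for pattern in scope.paths)


def select_files(scope: RepoScope, repo: str, candidates: list[str],
                 max_files: int) -> list[str]:
    """Keep only in-scope files and fail loudly if the scope is too wide."""
    allowed = [path for path in candidates if in_scope(scope, repo, path)]
    if len(allowed) > max_files:
        raise ValueError(
            f"{len(allowed)} files in scope, limit is {max_files}; narrow the scope"
        )
    return allowed
```

With the scope from the YAML example, `src/middleware/rate_limit.ts` would pass the check, while `src/routes/admin.ts` and any file in another repo would be filtered out before the agent sees them.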
2. Token Budgets Per Task
Enforce hard limits on token consumption for each discrete task. Track both input tokens (context) and output tokens (generated code).
| Budget Type | Limit | Enforcement Point | Failure Behavior |
|---|---|---|---|
| Context read | 50K tokens | Pre-execution | Reject task, request narrower scope |
| Code generation | 10K tokens | Mid-execution | Stop generation, return partial result |
| Tool calls | 20K tokens | Per-tool invocation | Block additional tool use |
| Session total | 100K tokens | Cumulative | Force handover to new session |
When an agent hits a budget limit, the workflow stops at that enforcement point instead of continuing on a best-effort basis. A human reviews the partial result and either approves continuation with a higher budget or reshapes the task.
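The budget table can be sketched as a small accounting object that refuses any charge that would cross a limit. This is a minimal sketch, not a reference implementation: `TokenBudget` and `BudgetExceeded` are hypothetical names, and it assumes every limit is cumulative per task, which is only one possible reading of the "Per-tool invocation" row.

```python
from dataclasses import dataclass, field

# Limits copied from the table, in tokens.
DEFAULT_LIMITS = {
    "context_read": 50_000,
    "code_generation": 10_000,
    "tool_calls": 20_000,
}
SESSION_LIMIT = 100_000


class BudgetExceeded(Exception):
    """Raised when a charge would push a budget past its limit."""

    def __init__(self, budget: str, used: int, limit: int):
        super().__init__(f"{budget} budget exceeded: {used} > {limit} tokens")
        self.budget = budget


@dataclass
class TokenBudget:
    limits: dict = field(default_factory=lambda: dict(DEFAULT_LIMITS))
    session_limit: int = SESSION_LIMIT
    used: dict = field(default_factory=dict)

    def total(self) -> int:
        """Tokens charged so far across all budget types."""
        return sum(self.used.values())

    def charge(self, budget: str, tokens: int) -> None:
        """Record usage, or raise BudgetExceeded without recording it."""
        new_used = self.used.get(budget, 0) + tokens
        if new_used > self.limits[budget]:
            raise BudgetExceeded(budget, new_used, self.limits[budget])
        new_total = self.total() + tokens
        if new_total > self.session_limit:
            raise BudgetExceeded("session_total", new_total, self.session_limit)
        self.used[budget] = new_used
```

Because a rejected charge is never recorded, the caller can catch `BudgetExceeded`, hand the partial result to a reviewer, and retry with a fresh `TokenBudget` if a higher limit is approved.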
3. Approval Gates for Cross-Repo Changes
Any change outside the task's primary repository requires explicit approval before execution.
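# workflow_engine.py
class MultiRepoGuard:
    """Queues any change outside the primary repo for human approval."""

    def __init__(self, primary_repo: str):
        self.primary_repo = primary_repo
        self.pending_changes: list[dict] = []

    def propose_change(self, repo: str, file_path: str, diff: str) -> bool:
        """Return True if the change may be applied immediately."""
        if repo != self.primary_repo:
            # Cross-repo change: record it and block until a human approves.
            self.pending_changes.append({
                "repo": repo,
                "file": file_path,
                "diff": diff,
                "status": "pending_approval",
            })
            return False  # Block execution
        return True  # Allow same-repo change

    def require_human_approval(self) -> bool:
        """True while at least one cross-repo change is waiting for review."""
        return len(self.pending_changes) > 0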
This gate prevents agents from making architectural decisions that span ownership boundaries. A backend agent should not modify frontend routing logic without explicit approval.
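The guard queues cross-repo changes but leaves open what happens once a reviewer has decided. One possible follow-up step, sketched here under the assumption that approval is granted per repository, splits the queue into approved and rejected records. `resolve_pending` is a hypothetical helper that works on change records of the same shape as `pending_changes`.

```python
def resolve_pending(
    pending_changes: list[dict], approved_repos: set[str]
) -> tuple[list[dict], list[dict]]:
    """Split pending cross-repo changes into approved and rejected lists."""
    approved, rejected = [], []
    for change in pending_changes:
        decided = dict(change)  # copy so the original queue is left untouched
        if change["repo"] in approved_repos:
            decided["status"] = "approved"
            approved.append(decided)
        else:
            decided["status"] = "rejected"
            rejected.append(decided)
    return approved, rejected
```

Returning copies rather than mutating the queue keeps the original proposals available as an audit trail of what the agent asked for.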
4. Handover Documents
When a task completes, the agent produces a structured handover document that the next agent (or human) consumes. This replaces the anti-pattern of one endless chat session.
# Handover: Rate Limiting Implementation
## What was done
- Added rate limit middleware to user API routes
- Configured Redis-backed rate limiter (100 req/min per user)
- Updated route registration in `src/routes/index.ts`
## Token consumption
- Context read: 42,000 tokens
- Code generation: 8,500 tokens
- Tool calls (tests): 3,200 tokens
- Total: 53,700 tokens
## Cross-repo impact
- None. All changes in `backend-api` repo.
## Next task dependencies
- Frontend needs to handle 429 responses
- Monitoring dashboard should track rate limit hits
The handover makes token costs visible and forces explicit task boundaries.
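If handovers are generated rather than written by hand, the token total can be computed instead of typed, which keeps the numbers honest. The following is a rough sketch with hypothetical names (`Handover`, `to_markdown`); the section headings mirror the example document, and the field layout is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Handover:
    """Hypothetical structured form of the handover document."""
    title: str
    done: list[str]
    tokens: dict[str, int]  # label -> tokens consumed
    cross_repo_impact: list[str] = field(default_factory=list)
    next_tasks: list[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Sum of all recorded token categories."""
        return sum(self.tokens.values())

    def to_markdown(self) -> str:
        """Render the handover using the same headings as the example."""
        lines = [f"# Handover: {self.title}", "## What was done"]
        lines += [f"- {item}" for item in self.done]
        lines.append("## Token consumption")
        lines += [f"- {name}: {count:,} tokens" for name, count in self.tokens.items()]
        lines.append(f"- Total: {self.total_tokens():,} tokens")
        lines.append("## Cross-repo impact")
        # An empty impact list is stated explicitly rather than omitted.
        lines += [f"- {item}" for item in self.cross_repo_impact] or ["- None."]
        lines.append("## Next task dependencies")
        lines += [f"- {item}" for item in self.next_tasks]
        return "\n".join(lines)
```

Feeding in the three token figures from the example (42,000, 8,500, and 3,200) yields the same 53,700 total, derived rather than hand-entered.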
Measuring and Attributing Token Costs
When a single agent task spans multiple repos and model calls, you need instrumentation that attributes costs to the right task and repo.
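# token_tracker.py
from dataclasses import dataclass

@dataclass
class TokenSpan:
    # Assumed minimal definition; the caller fills in the token counts.
    task_id: str
    repo: str
    operation: str
    input_tokens: int = 0
    output_tokens: int = 0

class TokenTracker:
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.spans: list[TokenSpan] = []

    def start_span(self, repo: str, operation: str) -> TokenSpan:
        # Register the span so report() can aggregate it later.
        span = TokenSpan(self.task_id, repo, operation)
        self.spans.append(span)
        return span

    def report(self) -> dict:
        # Sum input and output tokens per repository.
        by_repo = {}
        for span in self.spans:
            if span.repo not in by_repo:
                by_repo[span.repo] = {"input": 0, "output": 0}
            by_repo[span.repo]["input"] += span.input_tokens
            by_repo[span.repo]["output"] += span.output_tokens
        return {
            "task_id": self.task_id,
            "total_tokens": sum(r["input"] + r["output"] for r in by_repo.values()),
            "by_repo": by_repo,
        }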
This tracking reveals which repositories consume the most tokens and which tasks are inefficient. It also enables cost allocation when multiple teams share agent infrastructure.
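Cost allocation can then be a thin layer over the report. The sketch below assumes a report dictionary with the same `by_repo` shape as the one returned by `report()`; `allocate_cost` is a hypothetical helper, and the per-1K-token prices are parameters supplied by the caller, not real pricing.

```python
def allocate_cost(
    report: dict, input_price_per_1k: float, output_price_per_1k: float
) -> dict:
    """Convert a per-repo token report into a per-repo cost."""
    costs = {}
    for repo, usage in report["by_repo"].items():
        cost = (
            usage["input"] / 1000 * input_price_per_1k
            + usage["output"] / 1000 * output_price_per_1k
        )
        # Round to limit floating-point noise in the reported figure.
        costs[repo] = round(cost, 4)
    return costs
```

Input and output tokens are priced separately because providers commonly charge different rates for each, so a repo that is read heavily but rarely written shows a different cost profile from one where most of the spend is generated code.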
Failure Modes Without Discipline
| Failure Mode | Symptom | Root Cause |
|---|---|---|
| Context drift | Agent rewrites unrelated code | No scope boundaries |
| Token explosion | Single task costs $50+ | No budget enforcement |
| Architectural violations | Backend agent modifies frontend | No ownership gates |
| Unattributable costs | Cannot trace spend to tasks | No instrumentation |
| Session bloat | 500K token conversations | No handover forcing |
Each of these failures becomes more likely as model capability increases. Smarter models can follow more connections, read more files, and generate more code. Without workflow constraints, they will.
Architecture: Agentic OS Pattern
The Agentic OS pattern structures multi-repo agent work around lanes, specs, tickets, and handovers.
Key components:
- Lanes: Separate planning, execution, and verification work
- Specs: Define task scope before agent execution
- Tickets: Decompose large work into budget-constrained units
- Handovers: Force explicit context boundaries between tasks
- Project memory: Store decisions and context outside of chat sessions
This structure treats the agent as a stateless executor that consumes a handover and produces a result. It does not accumulate context across tasks.
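To make the stateless-executor idea concrete, here is a minimal sketch in which each ticket receives only its spec and the previous handover, and only a summary crosses the boundary to the next ticket. All names (`Ticket`, `Result`, `run_ticket`, `run_tickets`) are hypothetical, and the `agent` callable stands in for whatever actually invokes the model.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed agent interface: (spec, handover) -> (summary, tokens_used).
Agent = Callable[[str, str], tuple[str, int]]


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    spec: str
    token_budget: int


@dataclass(frozen=True)
class Result:
    ticket_id: str
    summary: str
    tokens_used: int


def run_ticket(ticket: Ticket, handover: str, agent: Agent) -> Result:
    """Run one ticket using only its spec and the incoming handover."""
    summary, tokens_used = agent(ticket.spec, handover)
    if tokens_used > ticket.token_budget:
        raise RuntimeError(
            f"{ticket.ticket_id} used {tokens_used} tokens, "
            f"budget is {ticket.token_budget}"
        )
    return Result(ticket.ticket_id, summary, tokens_used)


def run_tickets(tickets: list[Ticket], agent: Agent,
                initial_handover: str = "") -> list[Result]:
    """Run tickets in order; no state is shared apart from the handover."""
    handover = initial_handover
    results = []
    for ticket in tickets:
        result = run_ticket(ticket, handover, agent)
        results.append(result)
        handover = result.summary  # only the summary crosses the boundary
    return results
```

Since `run_ticket` keeps nothing between calls, any ticket can be retried or handed to a different agent given the same spec and handover.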
When Stricter Workflows Matter
You need workflow discipline when:
- Agents operate across multiple repositories with different owners
- Token costs are significant enough to require attribution
- Architectural boundaries must be enforced programmatically
- Multiple agents or humans collaborate on the same product
- Tasks span days or weeks (long-running context is expensive)
You can skip this overhead when:
- Working in a single repository with clear boundaries
- Prototyping or exploratory work where cost is not a concern
- Tasks are small enough to fit in a single session (under 20K tokens)
- One human owns all the code and makes all decisions
Technical Verdict
Use stricter workflows when:
- You run coding agents in production across multiple repositories
- Token costs exceed $100/month per engineer
- Multiple teams share agent infrastructure
- You need to enforce architectural boundaries programmatically
Skip the overhead when:
- Prototyping in a single repo with one owner
- Token costs are negligible
- Tasks are small and self-contained
- You are exploring agent capabilities, not shipping production code
Smarter models do not eliminate the need for engineering discipline. They expose new failure modes that require explicit workflow primitives: context scopes, token budgets, approval gates, and handover documents. The goal is not to limit the agent. The goal is to make its behavior predictable and its costs attributable.