Agentic coding tools make hundreds of micro-decisions per session. One of the most consequential is whether to import a library or write a function from scratch. A new study protocol from researchers at multiple institutions formalizes this problem space and proposes controlled experiments to measure how configuration mechanisms influence build-versus-buy behavior in Claude Code and OpenAI Codex. The protocol has not yet been executed. It defines the experimental framework for future measurement.
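Note: The name OpenAI Codex is ambiguous. The original Codex code-completion models were deprecated in March 2023, and OpenAI has since reused the name for its agentic coding tool. The paper abstract lists Codex as a target tool, so teams should verify the actual tools tested when the study executes or consult the full PDF for clarification.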
The build-versus-buy choice determines your dependency graph, supply chain exposure, bundle size, and the shape of technical debt you inherit. The protocol exposes the plumbing layer where agent autonomy meets infrastructure risk.
The Decision Boundary
When an agent encounters a task like “parse this CSV” or “validate this email address,” it evaluates two paths:
- Import a library (e.g., csv, email-validator)
- Write a function inline
The choice depends on:
- Context window contents (existing imports, project structure)
- Configuration files (.cursorrules, .clinerules, system prompts)
- Tool access (MCP servers, package registries, documentation)
- Implicit heuristics (token cost, perceived complexity, training data bias)
The study protocol identifies five configuration layers:
| Configuration Layer | Mechanism | Influence Scope |
|---|---|---|
| No configuration | Default agent behavior | Baseline measurement |
| Context files | Soft preferences in markdown | Project-level guidance |
| Explicit prohibitions | Hard rules in config | Dependency blocklist |
| Skills | Discoverable instructions | Reusable patterns |
| MCP-enabled discovery | Live package search tools | Real-time library evaluation |
Each layer changes the agent’s decision surface. The protocol aims to measure how much.
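To make the experimental grid concrete, here is a minimal sketch that crosses the five layers from the table with the two target tools and a pair of tasks. The task identifiers are hypothetical, borrowed from the CSV and email examples above; the protocol's actual task list and grid size may differ.

```python
from enum import Enum
from itertools import product


class ConfigLayer(Enum):
    """The five configuration layers named in the table above."""
    NONE = "no configuration"
    CONTEXT_FILES = "context files"
    EXPLICIT_PROHIBITIONS = "explicit prohibitions"
    SKILLS = "skills"
    MCP_DISCOVERY = "MCP-enabled discovery"


TOOLS = ["Claude Code", "OpenAI Codex"]   # target tools per the abstract
TASKS = ["parse_csv", "validate_email"]   # hypothetical task ids


def experiment_grid(tools=TOOLS, layers=ConfigLayer, tasks=TASKS):
    """Return one condition per (tool, layer, task) combination."""
    return [
        {"tool": tool, "layer": layer.value, "task": task}
        for tool, layer, task in product(tools, layers, tasks)
    ]


grid = experiment_grid()
print(len(grid))  # 2 tools x 5 layers x 2 tasks = 20
```

Holding the task fixed while varying only the layer is what lets a difference in import behavior be attributed to configuration rather than to the task itself.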
Why This Matters Now
Coding agents already ship production code at scale. Teams discover the consequences of agent-driven dependency choices only after deployment:
- Supply chain risk: Agents may import unmaintained or malicious packages because they appear in training data or search results.
- Licensing violations: Agents typically do not check license terms. They choose packages on functional match alone.
- Bundle bloat: The cost of importing a 500 KB library to use one function is invisible to the agent.
- Phantom dependencies: Agents may hallucinate imports that do not exist, then implement them inline when the import fails.
The study protocol treats these outcomes as observable, measurable phenomena tied to configuration state.
Configuration Mechanisms in Practice
The protocol is designed to test five configuration strategies across staged programming tasks. Each task includes an identifiable build-versus-buy decision point.
1. No Configuration
The agent operates with default behavior. This establishes a baseline for how often it imports versus implements.
2. Context Files
A markdown file in the project root contains preferences:
# Project Guidelines
Prefer standard library functions over third-party dependencies.
Avoid packages with fewer than 1,000 weekly downloads.
This is a soft constraint. The agent can ignore it.
3. Explicit Prohibitions
A structured config file enforces hard rules:
{
  "dependencies": {
    "blocklist": ["lodash", "moment"],
    "allowlist": ["date-fns", "ramda"]
  }
}
The agent must respect this or fail the task.
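A check against a config of this shape could look like the sketch below. The function name and the "unlisted" outcome are assumptions for illustration: the protocol does not say how a package that appears in neither list should be treated, so a real policy would need to decide that explicitly.

```python
import json

# Same structure as the example config above.
POLICY_JSON = """
{
  "dependencies": {
    "blocklist": ["lodash", "moment"],
    "allowlist": ["date-fns", "ramda"]
  }
}
"""


def check_dependency(name, policy):
    """Classify a package name as 'blocked', 'allowed', or 'unlisted'."""
    deps = policy.get("dependencies", {})
    if name in deps.get("blocklist", []):
        return "blocked"
    if name in deps.get("allowlist", []):
        return "allowed"
    # Assumption: packages in neither list are reported, not rejected.
    return "unlisted"


policy = json.loads(POLICY_JSON)
for package in ["moment", "date-fns", "axios"]:
    print(package, check_dependency(package, policy))
```

Checking the blocklist first means a package that lands in both lists by mistake is still rejected, which is the safer default for a hard rule.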
4. Skills
Skills are reusable instructions the agent can discover and apply. A skill might encode:
# Skill: Minimal Dependency Strategy
When implementing date manipulation:
1. Check if native Date methods suffice
2. If not, prefer date-fns over moment
3. Import only the specific function, not the entire package
Skills sit between soft preferences and hard rules. They provide reasoning scaffolding.
5. MCP-Enabled Library Discovery
The Model Context Protocol (MCP) is a standard for connecting AI systems to external data sources and tools. In this configuration layer, the agent has access to an MCP server that queries package registries in real time. It can:
- Search npm, PyPI, or crates.io
- Retrieve download stats, last update date, and license
- Compare multiple packages before choosing
This configuration layer gives the agent the most information but also the most autonomy.
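The comparison step can be sketched without any network access. The example below assumes the MCP server has already returned metadata records; the `PackageInfo` fields, the thresholds, and the placeholder packages are all hypothetical and do not describe any real registry entry or MCP server API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PackageInfo:
    """Hypothetical metadata record a discovery tool might return."""
    name: str
    weekly_downloads: int
    last_release: date
    license: str


def shortlist(candidates, allowed_licenses, min_downloads, max_age_days, today):
    """Drop packages that fail any filter, then rank by weekly downloads."""
    kept = [
        p for p in candidates
        if p.license in allowed_licenses
        and p.weekly_downloads >= min_downloads
        and (today - p.last_release).days <= max_age_days
    ]
    return sorted(kept, key=lambda p: p.weekly_downloads, reverse=True)


# Placeholder data for illustration only, not real registry statistics.
candidates = [
    PackageInfo("pkg-a", 50_000, date(2024, 5, 1), "MIT"),
    PackageInfo("pkg-b", 900_000, date(2024, 6, 1), "GPL-3.0"),
    PackageInfo("pkg-c", 400, date(2024, 6, 10), "MIT"),
    PackageInfo("pkg-d", 120_000, date(2021, 1, 1), "Apache-2.0"),
]

result = shortlist(
    candidates,
    allowed_licenses={"MIT", "Apache-2.0"},
    min_downloads=1_000,
    max_age_days=365,
    today=date(2024, 7, 1),
)
print([p.name for p in result])
```

Passing `today` as a parameter keeps the filter deterministic, which matters if the same selection has to be replayed during an audit.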
Observability and Audit Trail
The protocol specifies three measurement categories:
- Decision frequency: How often does the agent import versus implement?
- Configuration adherence: Does the agent follow the rules you set?
- Outcome quality: Does the code work, pass tests, and meet non-functional requirements (security, performance)?
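As a rough sketch, the three categories reduce to simple rates over a set of recorded decisions. The record keys (`choice`, `followed_config`, `tests_passed`) are hypothetical, and the protocol may define its metrics differently, for example by weighting tasks or scoring non-functional requirements separately.

```python
def summarize(decisions):
    """Compute import, adherence, and pass rates over decision records."""
    total = len(decisions)
    if total == 0:
        return None  # no decisions recorded, so no rates to report

    def rate(key, expected=True):
        return sum(1 for d in decisions if d[key] == expected) / total

    return {
        "import_rate": rate("choice", "import"),   # decision frequency
        "adherence_rate": rate("followed_config"), # configuration adherence
        "pass_rate": rate("tests_passed"),         # outcome quality (tests only)
    }


# Illustrative records, not study data.
decisions = [
    {"choice": "import", "followed_config": True, "tests_passed": True},
    {"choice": "implement", "followed_config": True, "tests_passed": True},
    {"choice": "import", "followed_config": False, "tests_passed": True},
    {"choice": "import", "followed_config": False, "tests_passed": False},
]
print(summarize(decisions))
```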
Each task will generate:
- A commit with the agent’s code
- A decision log (JSON with timestamp, tool name, considered packages, selection rationale, and configuration state)
- A dependency manifest diff
The decision log exposes the agent’s reasoning chain, which is otherwise opaque. The commit and the manifest diff show what changed. Only the log records why, so without it configuration drift is hard to detect.
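One possible shape for a decision log entry, using the five fields listed above, is sketched here. The field names, types, and example values are assumptions; the protocol names the fields but does not publish a schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DecisionLogEntry:
    """One build-versus-buy decision; field names are illustrative."""
    timestamp: str              # ISO 8601, UTC
    tool_name: str              # which agent made the decision
    considered_packages: list   # candidates the agent evaluated
    selection_rationale: str    # the agent's stated reason
    configuration_state: str    # which configuration layer was active


def to_json_line(entry):
    """Serialize an entry as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


# Example values are made up for illustration.
entry = DecisionLogEntry(
    timestamp="2025-01-01T12:00:00+00:00",
    tool_name="Claude Code",
    considered_packages=["date-fns", "moment"],
    selection_rationale="moment is on the blocklist; date-fns is allowed",
    configuration_state="explicit prohibitions",
)
print(to_json_line(entry))
```

Writing one JSON object per line keeps the log appendable and easy to filter with standard tools.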
Failure Modes
The study protocol anticipates several failure patterns. Each maps to specific configuration layers that might prevent it:
- Configuration drift: The agent initially respects rules, then ignores them as context grows. (Prevented by explicit prohibitions and validation layers)
- Hallucinated constraints: The agent invents restrictions that do not exist in the config. (Detected through decision log analysis)
- Over-implementation: The agent writes complex code to avoid a simple import. (Addressed by context files and skills that define acceptable dependencies)
- Under-implementation: The agent imports a library for trivial functionality. (Prevented by MCP-enabled discovery that surfaces implementation complexity)
- License-agnostic import behavior: The agent imports GPL code into a proprietary project. (Blocked by explicit prohibitions or MCP servers that filter by license)
These patterns represent testable hypotheses within the experimental design.
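Configuration drift, for instance, could be tested by comparing adherence early and late in a session. The split-half comparison below is an assumed heuristic, not a metric taken from the protocol, and the input is simply an ordered list of per-decision adherence flags.

```python
def adherence_rate(flags):
    """Fraction of decisions that followed the configuration."""
    return sum(flags) / len(flags) if flags else 0.0


def adherence_drop(flags):
    """Adherence in the first half of a session minus the second half.

    A positive value suggests the agent followed the rules early on and
    stopped later. With an odd count, the extra decision falls in the
    second half.
    """
    mid = len(flags) // 2
    return adherence_rate(flags[:mid]) - adherence_rate(flags[mid:])


# Illustrative session: rules followed at first, mostly ignored later.
session = [True, True, True, True, False, True, False, False]
print(adherence_drop(session))  # 1.0 - 0.25 = 0.75
```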
Architecture: Configuration Injection Points
To control build-versus-buy decisions, you need to inject configuration at the right layer. Here is the conceptual flow (not a specific tool implementation):
User Task
|
v
Agent Orchestrator
|
v
Context Assembly (project files, config, skills)
|
v
LLM Inference (with tool access)
|
v
Tool Call: Import or Implement?
|
v
Dependency Resolution (MCP, package manager)
|
v
Code Generation
|
v
Validation (tests, linters, security scan)
|
v
Commit
Configuration can enter at three points:
- Context Assembly: Static files, rules, and preferences
- Tool Access: MCP servers that filter or rank packages
- Validation: Post-generation checks that reject unwanted dependencies
The protocol will test all three.
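The validation point is the easiest of the three to sketch, because it only needs the dependency manifest before and after the agent's change. The function names and the result shape below are hypothetical, and the manifests are plain name-to-version mappings rather than a parsed `package.json`.

```python
def added_dependencies(before, after):
    """Names present in the new manifest but not in the old one."""
    return sorted(set(after) - set(before))


def validate_manifest(before, after, blocklist):
    """Reject a change if it adds any blocklisted dependency."""
    added = added_dependencies(before, after)
    violations = [name for name in added if name in blocklist]
    return {"added": added, "violations": violations, "ok": not violations}


# Illustrative manifests; version ranges are placeholders.
before = {"date-fns": "^3.0.0"}
after = {"date-fns": "^3.0.0", "moment": "^2.30.0", "ramda": "^0.30.0"}

print(validate_manifest(before, after, blocklist={"lodash", "moment"}))
```

Because this check runs after generation, it catches violations even when the agent ignored the rules in its context, at the cost of wasted work that an earlier injection point would have avoided.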
When to Intervene
Not every build-versus-buy decision needs human oversight. The protocol suggests intervention thresholds:
- High-risk dependencies: Cryptography, authentication, payment processing
- License-sensitive projects: GPL, AGPL, or proprietary restrictions
- Performance-critical paths: Bundle size, cold start time, memory footprint
- Long-lived codebases: Maintenance burden of custom implementations
For low-risk, short-lived scripts, agent autonomy is fine. For production infrastructure, you need guardrails.
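The four criteria above can be encoded as a simple routing rule. This is a sketch under stated assumptions: the `DecisionContext` fields and the domain labels are hypothetical, and the protocol does not prescribe how the criteria should be combined.

```python
from dataclasses import dataclass

# Domain labels taken from the high-risk examples above.
HIGH_RISK_DOMAINS = {"cryptography", "authentication", "payment processing"}


@dataclass(frozen=True)
class DecisionContext:
    """Hypothetical description of where a dependency decision occurs."""
    domain: str
    license_sensitive: bool = False
    performance_critical: bool = False
    long_lived: bool = False


def review_reasons(ctx):
    """Return the reasons a human should review this decision, if any."""
    reasons = []
    if ctx.domain in HIGH_RISK_DOMAINS:
        reasons.append("high-risk dependency")
    if ctx.license_sensitive:
        reasons.append("license-sensitive project")
    if ctx.performance_critical:
        reasons.append("performance-critical path")
    if ctx.long_lived:
        reasons.append("long-lived codebase")
    return reasons


print(review_reasons(DecisionContext("logging")))
print(review_reasons(DecisionContext("cryptography", long_lived=True)))
```

An empty list means the agent can proceed on its own. Returning reasons rather than a boolean gives the reviewer context for why a decision was escalated.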
Technical Verdict
Relevant when:
- You deploy agentic coding tools in production environments
- You need to audit or explain dependency choices
- You manage supply chain risk or licensing compliance
- You want to measure the impact of configuration on agent behavior
Less relevant when:
- You are prototyping or exploring ideas
- You trust the agent’s default behavior for your use case
- You lack the infrastructure to log and review agent decisions
- You do not have a dependency policy to encode
The study protocol is a research design, not a deployable product. But it formalizes a problem that every team using agentic coding tools will encounter: how do you control what the agent imports? The answer is configuration, observability, and validation at multiple layers. The protocol shows you where to instrument.