Build vs. Buy in Agentic Code: A Study Protocol for Configuration-Driven Dependency Decisions

Agentic coding tools make hundreds of micro-decisions per session. One of the most consequential is whether to import a library or write a function from scratch. A new study protocol from researchers at multiple institutions formalizes this problem space and proposes controlled experiments to measure how configuration mechanisms influence build-versus-buy behavior in Claude Code and OpenAI Codex. The protocol has not yet been executed. It defines the experimental framework for future measurement.

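Note: the name "OpenAI Codex" covers two different products. The original Codex code-completion models were retired from OpenAI's API in 2023. In 2025, OpenAI reused the name for its agentic coding tool (the Codex CLI and cloud-based Codex agent). Paired with Claude Code, the protocol most plausibly targets the 2025 agentic tool. Teams should confirm which tools and versions are actually tested when the study executes, or consult the full paper.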

The import-or-implement choice shapes your dependency graph, supply chain exposure, bundle size, and the kind of technical debt you inherit. The protocol focuses on the layer where agent autonomy meets infrastructure risk.

The Decision Boundary

When an agent encounters a task like “parse this CSV” or “validate this email address,” it evaluates two paths:

Import a library (e.g., Python's built-in csv module or the third-party email-validator package)
Write a function inline
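To make the two paths concrete, here is a hypothetical "parse this CSV" task solved both ways in Python. It is an illustration, not a task taken from the protocol: a hand-rolled split versus the standard-library csv module. The inline version looks adequate until a field contains a quoted comma.

```python
import csv
import io


def parse_inline(text: str) -> list[list[str]]:
    # "Build" path: a naive hand-rolled parser that splits on every comma.
    return [line.split(",") for line in text.splitlines() if line]


def parse_with_library(text: str) -> list[list[str]]:
    # "Buy" path: csv.reader handles quoting and embedded commas.
    return list(csv.reader(io.StringIO(text)))


sample = 'name,city\n"Doe, Jane",Berlin\n'
print(parse_inline(sample))        # second row is split into three fields
print(parse_with_library(sample))  # second row keeps "Doe, Jane" intact
```

Neither path is always right; the point is that the decision has observable consequences in the generated code.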

The choice depends on:

Context window contents (existing imports, project structure)
Configuration files (CLAUDE.md for Claude Code, AGENTS.md for Codex, .cursorrules, .clinerules, system prompts)
Tool access (MCP servers, package registries, documentation)
Implicit heuristics (token cost, perceived complexity, training data bias)

The study protocol identifies five configuration layers:

Configuration Layer	Mechanism	Influence Scope
No configuration	Default agent behavior	Baseline measurement
Context files	Soft preferences in markdown	Project-level guidance
Explicit prohibitions	Hard rules in config	Dependency blocklist
Skills	Discoverable instructions	Reusable patterns
MCP-enabled discovery	Live package search tools	Real-time library evaluation

Each layer is expected to shift the agent’s decision surface; the protocol aims to measure by how much.

Why This Matters Now

Coding agents already ship production code at scale, and teams often discover the consequences of agent-driven dependency choices only after deployment:

Supply chain risk: Agents may import unmaintained or malicious packages because they appear in training data or search results.
Licensing violations: Unless instructed otherwise, agents rarely check license terms. They import based on functionality match.
Bundle bloat: Importing a 500 KB library to use one function is invisible to the agent unless something measures bundle size.
Phantom dependencies: Agents may hallucinate imports that do not exist, then implement them inline when the import fails.

The study protocol treats these outcomes as observable, measurable phenomena tied to configuration state.
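As one example of making an outcome observable, phantom dependencies can be flagged by extracting imports from generated Python code and checking whether each top-level module resolves in the current environment. This is a minimal sketch of one possible check, not a tool from the protocol; the module name totally_made_up_pkg is a deliberately fake placeholder.

```python
import ast
import importlib.util


def imported_top_level_modules(source: str) -> set[str]:
    """Collect top-level module names from import statements in generated code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) point into the project itself; skip them.
            names.add(node.module.split(".")[0])
    return names


def unresolved_imports(source: str) -> set[str]:
    """Return imported top-level modules that cannot be found in this environment."""
    return {
        name
        for name in imported_top_level_modules(source)
        if importlib.util.find_spec(name) is None
    }


generated = "import json\nimport csv\nfrom totally_made_up_pkg import magic\n"
print(unresolved_imports(generated))
```

An unresolved name only means the module is not installed here. It may still exist on a registry, so a real check would also query the registry before concluding the import was hallucinated.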

Configuration Mechanisms in Practice

The protocol is designed to test five configuration strategies across staged programming tasks. Each task includes an identifiable build-versus-buy decision point.
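One way to picture the design is as a grid: every tool runs every task under every configuration layer, repeated several times because model output is nondeterministic. The sketch below enumerates such a grid. The task IDs, tool labels, and repetition count are hypothetical placeholders; the protocol's actual values are not given here.

```python
from enum import Enum
from itertools import product


class ConfigLayer(Enum):
    # The five conditions from the configuration table above.
    NONE = "no_configuration"
    CONTEXT_FILE = "context_files"
    PROHIBITIONS = "explicit_prohibitions"
    SKILLS = "skills"
    MCP_DISCOVERY = "mcp_discovery"


def build_run_matrix(tasks: list[str], tools: list[str], repetitions: int) -> list[dict]:
    """Enumerate every (tool, task, layer, repetition) combination as one run."""
    return [
        {"tool": tool, "task": task, "layer": layer.value, "rep": rep}
        for tool, task, layer, rep in product(tools, tasks, ConfigLayer, range(repetitions))
    ]


runs = build_run_matrix(
    tasks=["parse-csv", "validate-email"],  # illustrative task IDs
    tools=["claude-code", "codex"],         # illustrative tool labels
    repetitions=3,
)
print(len(runs))  # 2 tools x 2 tasks x 5 layers x 3 reps = 60
```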

1. No Configuration

The agent operates with default behavior. This establishes a baseline for how often it imports versus implements.

2. Context Files

A markdown file in the project root contains preferences:

# Project Guidelines

Prefer standard library functions over third-party dependencies.
Avoid packages with fewer than 1,000 weekly downloads.

This is a soft constraint. The agent can ignore it.
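The constraint is soft because, mechanically, a context file usually becomes plain prompt text. The sketch below shows that step under a simplifying assumption: the guidelines are prepended to the task, and nothing afterwards checks compliance. The filename GUIDELINES.md is hypothetical; real tools use their own conventions.

```python
import tempfile
from pathlib import Path


def assemble_context(task: str, guidelines_path: Path) -> str:
    """Prepend project guidelines to the task prompt, if the file exists.

    The guidelines become ordinary prompt text: nothing here verifies
    that the generated code actually follows them.
    """
    parts = []
    if guidelines_path.is_file():
        parts.append(guidelines_path.read_text(encoding="utf-8"))
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "GUIDELINES.md"  # hypothetical filename
    path.write_text(
        "Prefer standard library functions over third-party dependencies.\n",
        encoding="utf-8",
    )
    prompt = assemble_context("Parse the uploaded CSV file.", path)

print(prompt)
```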

3. Explicit Prohibitions

A structured config file enforces hard rules:

{
  "dependencies": {
    "blocklist": ["lodash", "moment"],
    "allowlist": ["date-fns", "ramda"]
  }
}

The agent must respect these rules; a run that violates them fails the task.
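Hard rules only bite if something checks them. Here is a minimal sketch of a validator that reads the JSON config above and sorts a list of declared dependencies. The semantics are an assumption, since the config alone does not define them: blocklisted packages fail the run, allowlisted ones pass, and anything on neither list is flagged for review.

```python
import json

CONFIG = json.loads("""
{
  "dependencies": {
    "blocklist": ["lodash", "moment"],
    "allowlist": ["date-fns", "ramda"]
  }
}
""")


def check_dependencies(declared: list[str], config: dict) -> dict[str, list[str]]:
    """Split declared dependencies into blocked, approved, and needs-review."""
    rules = config.get("dependencies", {})
    blocked = set(rules.get("blocklist", []))
    allowed = set(rules.get("allowlist", []))
    result = {"blocked": [], "approved": [], "review": []}
    for name in declared:
        if name in blocked:
            result["blocked"].append(name)
        elif name in allowed:
            result["approved"].append(name)
        else:
            result["review"].append(name)
    return result


report = check_dependencies(["date-fns", "moment", "zod"], CONFIG)
task_failed = bool(report["blocked"])  # any blocklisted package fails the run
print(report)
# {'blocked': ['moment'], 'approved': ['date-fns'], 'review': ['zod']}
```

In a CI step, a non-empty "blocked" list would fail the build, which is what turns the config from a preference into a rule.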

4. Skills

Skills are reusable instructions the agent can discover and apply. A skill might encode:

# Skill: Minimal Dependency Strategy

When implementing date manipulation:
1. Check if native Date methods suffice
2. If not, prefer date-fns over moment
3. Import only the specific function, not the entire package

Skills sit between soft preferences and hard rules. They provide reasoning scaffolding.
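"Discoverable" means the agent decides which skills apply to a task. In tools that support skills, that choice is typically made by the model from a short skill description. The sketch below substitutes a crude, hypothetical keyword matcher for that step, just to show the shape of the mechanism. The Skill fields and the trigger keywords are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    keywords: frozenset[str]  # hypothetical trigger terms
    instructions: str


def select_skills(task: str, skills: list[Skill]) -> list[Skill]:
    """Return skills whose trigger keywords appear as words in the task text."""
    words = set(re.findall(r"[a-z0-9-]+", task.lower()))
    return [skill for skill in skills if skill.keywords & words]


minimal_deps = Skill(
    name="minimal-dependency-strategy",
    description="Date manipulation with as few dependencies as possible",
    keywords=frozenset({"date", "dates", "timestamp"}),
    instructions=(
        "1. Check if native Date methods suffice\n"
        "2. If not, prefer date-fns over moment\n"
        "3. Import only the specific function, not the entire package"
    ),
)

print([s.name for s in select_skills("Format the order date as ISO 8601.", [minimal_deps])])
print([s.name for s in select_skills("Validate the email field.", [minimal_deps])])
```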

5. MCP-Enabled Library Discovery

The Model Context Protocol (MCP) is a standard for connecting AI systems to external data sources and tools. In this configuration layer, the agent has access to an MCP server that queries package registries in real time. It can:

Search npm, PyPI, or crates.io
Retrieve download stats, last update date, and license
Compare multiple packages before choosing

This configuration layer gives the agent the most information but also the most autonomy.
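The registry query and the MCP wiring are out of scope here, but the ranking step that such a server might perform can be sketched. Assume the server has already fetched download counts, last release date, and license for each candidate. The field names, thresholds, and package records below are hypothetical, and the data is made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageInfo:
    # Fields mirror the metadata listed above; the example values are invented.
    name: str
    weekly_downloads: int
    last_release: date
    license: str


def rank_candidates(
    candidates: list[PackageInfo],
    allowed_licenses: set[str],
    today: date,
    max_age_days: int = 365,
    min_downloads: int = 1000,
) -> list[PackageInfo]:
    """Drop packages with disallowed licenses, stale releases, or low usage,
    then sort the rest by weekly downloads, highest first."""
    eligible = [
        p
        for p in candidates
        if p.license in allowed_licenses
        and (today - p.last_release).days <= max_age_days
        and p.weekly_downloads >= min_downloads
    ]
    return sorted(eligible, key=lambda p: p.weekly_downloads, reverse=True)


candidates = [
    PackageInfo("pkg-a", 250_000, date(2024, 11, 2), "MIT"),
    PackageInfo("pkg-b", 900_000, date(2021, 3, 14), "MIT"),      # stale
    PackageInfo("pkg-c", 40_000, date(2024, 12, 20), "GPL-3.0"),  # license filtered
    PackageInfo("pkg-d", 12_000, date(2024, 9, 30), "Apache-2.0"),
]
ranked = rank_candidates(candidates, {"MIT", "Apache-2.0"}, today=date(2025, 1, 15))
print([p.name for p in ranked])  # ['pkg-a', 'pkg-d']
```

Filtering before the model sees the list is what makes this layer a control point: the agent keeps its autonomy, but only over pre-screened options.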

Observability and Audit Trail

The protocol specifies three measurement categories:

Decision frequency: How often does the agent import versus implement?
Configuration adherence: Does the agent follow the rules you set?
Outcome quality: Does the code work, pass tests, and meet non-functional requirements (security, performance)?
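The first two categories reduce to simple rates over decision records. Here is a sketch, assuming each record carries an "action" field ("import" or "implement") and a "violations" list of broken rule IDs; both field names are hypothetical. Outcome quality comes from running the task's tests and is not computed here.

```python
def summarize(decisions: list[dict]) -> dict[str, float]:
    """Compute decision frequency (import rate) and configuration adherence."""
    if not decisions:
        raise ValueError("no decisions to summarize")
    total = len(decisions)
    imports = sum(1 for d in decisions if d["action"] == "import")
    compliant = sum(1 for d in decisions if not d["violations"])
    return {"import_rate": imports / total, "adherence_rate": compliant / total}


log = [
    {"action": "import", "violations": []},
    {"action": "implement", "violations": []},
    {"action": "import", "violations": ["blocklist:moment"]},
    {"action": "implement", "violations": []},
]
print(summarize(log))  # {'import_rate': 0.5, 'adherence_rate': 0.75}
```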

Each task will generate:

A commit with the agent’s code
A decision log (JSON with timestamp, tool name, considered packages, selection rationale, and configuration state)
A dependency manifest diff
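A decision log entry could be modeled as a small record serialized to one JSON object per line. The protocol lists what the log contains, not a schema, so the field names below are assumptions that map onto the listed contents (timestamp, tool name, considered packages, selection rationale, configuration state), plus a hypothetical "selected" field.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionLogEntry:
    # Hypothetical field names; the protocol specifies content, not a schema.
    tool_name: str
    considered_packages: list[str]
    selected: Optional[str]   # package name, or None if implemented inline
    rationale: str
    configuration_state: str  # e.g. "explicit_prohibitions"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


entry = DecisionLogEntry(
    tool_name="claude-code",
    considered_packages=["date-fns", "moment"],
    selected="date-fns",
    rationale="moment is on the blocklist; date-fns is allowlisted.",
    configuration_state="explicit_prohibitions",
)
line = json.dumps(asdict(entry))  # one JSON object per line (JSON Lines)
print(line)
```

Appending one line per decision keeps the log easy to diff and to load for the metrics described above.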

The decision log exposes the agent’s reasoning chain, which is otherwise opaque. Without it, configuration drift is hard to see: commits and manifest diffs record what changed, but not why.

Failure Modes

The study protocol anticipates several failure patterns. Each maps to specific configuration layers that might prevent it:

Configuration drift: The agent starts respecting rules, then ignores them as context grows. (Candidate mitigations: explicit prohibitions and validation layers)
Hallucinated constraints: The agent invents restrictions that do not exist in the config. (Detectable through decision log analysis)
Over-implementation: The agent writes complex code to avoid a simple import. (Candidate mitigations: context files and skills that define acceptable dependencies)
Under-implementation: The agent imports a library for trivial functionality. (Candidate mitigation: MCP-enabled discovery that surfaces implementation complexity)
License-agnostic import behavior: The agent imports GPL code into a proprietary project. (Candidate mitigations: explicit prohibitions or MCP servers that filter by license)

These patterns represent testable hypotheses within the experimental design.
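Configuration drift, for instance, becomes testable if decisions are logged in order: compare the violation rate early in a session with the rate late in it. The sketch below does exactly that. The window size and tolerance are arbitrary illustrative defaults, not values from the protocol, and records are assumed to carry a "violations" list.

```python
def violation_rate(records: list[dict]) -> float:
    """Fraction of decisions that broke at least one rule."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["violations"]) / len(records)


def detect_drift(records: list[dict], window: int = 10, tolerance: float = 0.2) -> bool:
    """Flag drift if the last `window` decisions violate rules noticeably more
    often than the first `window`. Records must be in chronological order."""
    if len(records) < 2 * window:
        return False  # too few decisions to compare
    early = violation_rate(records[:window])
    late = violation_rate(records[-window:])
    return late - early > tolerance


session = (
    [{"violations": []}] * 16
    + [{"violations": ["blocklist:moment"]}] * 4
)
print(detect_drift(session))  # True: early rate 0.0, late rate 0.4
```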

Architecture: Configuration Injection Points

To control build-versus-buy decisions, you need to inject configuration at the right layer. Here is the conceptual flow (not a specific tool implementation):

User Task
    |
    v
Agent Orchestrator
    |
    v
Context Assembly (project files, config, skills)
    |
    v
LLM Inference (with tool access)
    |
    v
Tool Call: Import or Implement?
    |
    v
Dependency Resolution (MCP, package manager)
    |
    v
Code Generation
    |
    v
Validation (tests, linters, security scan)
    |
    v
Commit

Configuration can enter at three points:

Context Assembly: Static files, rules, and preferences
Tool Access: MCP servers that filter or rank packages
Validation: Post-generation checks that reject unwanted dependencies

The protocol will test all three.
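The three injection points can be shown as hooks in a simplified, linear pipeline. This is a hypothetical sketch: real agents call tools iteratively inside the model loop, and the model call here is a stub. It only shows where each kind of configuration plugs in.

```python
from typing import Callable


def run_pipeline(
    task: str,
    context_files: list[str],
    search_packages: Callable[[str], list[str]],
    package_filter: Callable[[list[str]], list[str]],
    generate: Callable[[str, list[str]], tuple[str, list[str]]],
    validate: Callable[[list[str]], list[str]],
) -> str:
    # 1. Context assembly: static rules and preferences become prompt text.
    prompt = "\n\n".join(context_files + [f"Task: {task}"])
    # 2. Tool access: the model only sees packages that survive the filter.
    candidates = package_filter(search_packages(task))
    # Model call (stubbed): returns generated code and the dependencies it added.
    code, new_deps = generate(prompt, candidates)
    # 3. Validation: reject the change if any added dependency fails policy.
    problems = validate(new_deps)
    if problems:
        raise ValueError(f"Rejected dependencies: {problems}")
    return code


BLOCKLIST = {"moment"}
result = run_pipeline(
    task="Format a date",
    context_files=["Prefer standard library functions."],
    search_packages=lambda task: ["moment", "date-fns"],
    package_filter=lambda pkgs: [p for p in pkgs if p not in BLOCKLIST],
    generate=lambda prompt, cands: ("// code using date-fns", cands[:1]),
    validate=lambda deps: [d for d in deps if d in BLOCKLIST],
)
print(result)  # // code using date-fns
```

Validation is the only hook that still holds if the model ignores both the prompt and the filtered tool results, which is why it acts as the backstop.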

When to Intervene

Not every build-versus-buy decision needs human oversight. The protocol suggests intervention thresholds:

High-risk dependencies: Cryptography, authentication, payment processing
License-sensitive projects: GPL, AGPL, or proprietary restrictions
Performance-critical paths: Bundle size, cold start time, memory footprint
Long-lived codebases: Maintenance burden of custom implementations

For low-risk, short-lived scripts, agent autonomy is fine. For production infrastructure, you need guardrails.
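Two of these triggers, high-risk domains and license sensitivity, can be approximated automatically to decide when a dependency needs human review. The sketch below is deliberately crude and entirely hypothetical: the keyword map is a placeholder to tune per organization, matching on package names will miss things, and the performance and maintenance triggers are not covered.

```python
HIGH_RISK_DOMAINS = {  # hypothetical keyword map; tune per organization
    "cryptography": {"crypto", "cipher", "hash", "tls"},
    "authentication": {"auth", "oauth", "jwt", "login", "session"},
    "payments": {"payment", "billing", "checkout"},
}
COPYLEFT = {  # SPDX identifiers for common copyleft licenses
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
}


def review_reasons(package: str, license_id: str, proprietary_project: bool) -> list[str]:
    """Return reasons a proposed dependency should go to a human; empty means none."""
    reasons = []
    tokens = set(package.lower().replace("_", "-").split("-"))
    for domain, keywords in HIGH_RISK_DOMAINS.items():
        if tokens & keywords:
            reasons.append(f"high-risk domain: {domain}")
    if proprietary_project and license_id in COPYLEFT:
        reasons.append(f"copyleft license in proprietary project: {license_id}")
    return reasons


print(review_reasons("example-jwt-helper", "MIT", proprietary_project=True))
print(review_reasons("example-csv-tools", "GPL-3.0-only", proprietary_project=True))
print(review_reasons("example-csv-tools", "MIT", proprietary_project=True))
```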

Technical Verdict

Relevant when:

You deploy agentic coding tools in production environments
You need to audit or explain dependency choices
You manage supply chain risk or licensing compliance
You want to measure the impact of configuration on agent behavior

Less relevant when:

You are prototyping or exploring ideas
You trust the agent’s default behavior for your use case
You lack the infrastructure to log and review agent decisions
You do not have a dependency policy to encode

The study protocol is a research design, not a deployable product. But it formalizes a problem that every team using agentic coding tools will encounter: how do you control what the agent imports? The answer is configuration, observability, and validation at multiple layers. The protocol shows you where to instrument.