The current tooling mood says reach for Claude Code when you need to touch hundreds of files. The practical answer is more boring: if the change is truly mechanical, you want a script, not an agent making judgment calls.
This is not about agent capability. It is about what makes a diff reviewable, what makes a rollback safe, and what happens when non-deterministic tooling meets a 4,000-file changeset.
The Wrong Decision Variable
Most teams pick tools based on scale. Small change, maybe manual. Large change, definitely use an agent. That logic inverts the actual risk surface.
A change across 4,000 files can be safer with a script than a change across 40 files if the large one follows one deterministic rule and the small one requires understanding local semantics.
The deciding variable is not file count. It is uniformity.
Script-shaped changes
These are transformations you can express like compiler passes:
- Rename one namespace to another everywhere
- Replace one config key with a new key
- Rewrite import paths from one package entrypoint to another
- Swap a deprecated helper for its replacement
- Update a function signature with a new parameter order (best done with an AST-based codemod, since a regex struggles with nested or multi-line arguments)
The pattern: one rule, applied mechanically, with no context-dependent interpretation.
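As a minimal sketch of what "one rule, applied mechanically" can look like, here is an import-path rewrite written as a pure function. The package names `@acme/ui/legacy` and `@acme/ui` are hypothetical and stand in for whatever entrypoints your migration involves.

```python
import re

# Hypothetical package names, used only for illustration.
OLD_ENTRYPOINT = "@acme/ui/legacy"
NEW_ENTRYPOINT = "@acme/ui"

# Match the old entrypoint only when it is the whole quoted module specifier,
# so a longer path such as "@acme/ui/legacy-extras" is left alone.
IMPORT_RE = re.compile(r"""(from\s+)(['"])""" + re.escape(OLD_ENTRYPOINT) + r"\2")


def rewrite_imports(source: str) -> str:
    """Apply the single rule: old entrypoint -> new entrypoint."""
    def replace(match: re.Match) -> str:
        keyword, quote = match.group(1), match.group(2)
        return f"{keyword}{quote}{NEW_ENTRYPOINT}{quote}"

    return IMPORT_RE.sub(replace, source)


before = (
    'import { Button } from "@acme/ui/legacy";\n'
    'import extras from "@acme/ui/legacy-extras";\n'
)
after = rewrite_imports(before)
print(after)
```

The function takes text and returns text, with no knowledge of which file it came from. That property is what makes the change script-shaped.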
Agent-shaped changes
These require local reasoning:
- Refactor a component API where each call site has different prop combinations
- Migrate from one state management pattern to another where local logic varies
- Update error handling where each catch block does something different
- Rewrite tests where assertions depend on what the test is actually checking
The pattern: the transformation rule changes based on what the code is doing.
What Makes a Diff Reviewable
When you generate a 300-file diff with Claude Code, the reviewer faces a hidden cost: they must verify that the agent understood the context correctly in every single location.
With a script-generated diff, the reviewer verifies the script once, then scans for exceptions. The cognitive load is inverted.
Determinism as a review primitive
A script gives you:
- Identical output on re-run: same input, same diff, every time
- Line-by-line predictability: once you have seen what the script does to a matching line in file A, you know what it will do to the same pattern in file B
- Auditable logic: the transformation rule is in 50 lines of bash or Python, not buried in a prompt chain
An agent gives you:
- Contextual adaptation: it can handle edge cases you did not anticipate
- Semantic understanding: it knows when a rename would break something
- Non-deterministic output: same prompt, different run, potentially different diff
The second list sounds better until you are reviewing 800 changed files and cannot tell if a deviation is intelligent adaptation or a hallucination.
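One way to make "identical output on re-run" checkable is to hash the transformed output and compare digests across runs. The sketch below assumes a hypothetical namespace rename (`Legacy.Billing` to `Core.Billing`) and an in-memory mapping of paths to contents. A real script would read the files from disk.

```python
import hashlib


def rename_namespace(source: str) -> str:
    # Hypothetical rule: one namespace renamed to another.
    return source.replace("Legacy.Billing", "Core.Billing")


def run_digest(files: dict) -> str:
    """Hash every transformed file, visiting paths in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(rename_namespace(files[path]).encode("utf-8") + b"\0")
    return digest.hexdigest()


files = {
    "src/invoice.ts": "import { Tax } from 'Legacy.Billing';\n",
    "src/cart.ts": "const total = 0;\n",
}

first = run_digest(files)
# Same input in a different traversal order still yields the same digest.
second = run_digest(dict(reversed(list(files.items()))))
print(first == second)
```

If the digest recorded in the pull request matches the digest a reviewer gets locally, the diff under review is the diff the script produces.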
Blast Radius and Rollback Guarantees
Scripts are boring. That is the feature.
If a script-based migration breaks production, you know exactly what it did. For a reversible rule such as a rename, the inverse script is usually only a few lines. You can also re-run the original script with a filter to exclude problem files.
If an agent-based migration breaks production, you are debugging 300 judgment calls. Which files did it interpret differently? Where did it make a semantic leap? What was the prompt state when it processed the broken file?
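The script-side rollback described above can be sketched as a forward rule, its inverse, and an exclusion filter. The `src/legacy/*` pattern is a hypothetical example of problem files. The inverse is only valid under the assumption that nothing was named `newApiMethod` before the migration ran.

```python
import re
from fnmatch import fnmatchcase


def forward(source: str) -> str:
    return re.sub(r"\boldApiMethod\(", "newApiMethod(", source)


def inverse(source: str) -> str:
    # Valid only if no newApiMethod( call existed before the migration.
    return re.sub(r"\bnewApiMethod\(", "oldApiMethod(", source)


# Hypothetical exclusion list: paths the migration should skip.
EXCLUDE = ["src/legacy/*"]


def should_process(path: str) -> bool:
    return not any(fnmatchcase(path, pattern) for pattern in EXCLUDE)


files = {
    "src/app.ts": "oldApiMethod(1);\n",
    "src/legacy/shim.ts": "oldApiMethod(2);\n",
}

migrated = {p: forward(s) if should_process(p) else s for p, s in files.items()}
restored = {p: inverse(s) if should_process(p) else s for p, s in migrated.items()}
print(migrated)
print(restored == files)
```

In practice a version-control revert is the first rollback tool. The inverse rule matters when later commits have landed on top of the migration and a plain revert no longer applies cleanly.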
Idempotency matters
A good migration script is idempotent. You can run it twice. You can run it on a subset. You can run it, revert, fix one file manually, and run it again.
An agent-driven change is harder to make idempotent because the agent is optimizing for “do the right thing,” not “do the same thing twice.”
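Idempotency is easy to lose by accident when the replacement text still contains the search text. The sketch below contrasts a rename that is not idempotent with one that is, using a hypothetical `Client` to `HttpClient` rename, and adds a small check that compares one application of a rule with two.

```python
import re


def naive_rename(source: str) -> str:
    # Not idempotent: the replacement "HttpClient" still contains "Client".
    return re.sub(r"Client", "HttpClient", source)


def safe_rename(source: str) -> str:
    # Idempotent: \b stops the pattern from matching inside "HttpClient".
    return re.sub(r"\bClient\b", "HttpClient", source)


def is_idempotent(rule, source: str) -> bool:
    """True if applying the rule twice gives the same text as applying it once."""
    once = rule(source)
    return rule(once) == once


sample = "const c = new Client();\n"
print(is_idempotent(naive_rename, sample))
print(is_idempotent(safe_rename, sample))
```

A check like this on a handful of sample inputs is evidence rather than proof, but it catches the common mistake before the script runs on the whole tree.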
Decision Framework
| Property | Script | Agent |
|---|---|---|
| Transformation rule | Single deterministic pattern | Context-dependent interpretation |
| Review cost | Verify script once, scan for exceptions | Verify every file individually |
| Re-run guarantee | Identical output | Potentially different output |
| Rollback | Write inverse script | Manual or partial revert |
| Failure mode | Breaks uniformly (easy to spot) | Breaks inconsistently (hard to spot) |
| Edge case handling | Explicit (you write the logic) | Implicit (agent infers) |
| Audit trail | Script source code | Prompt + model version + run context |
When Agents Earn Their Place
Agents are not wrong for large changes. They are wrong for uniform large changes.
Use Claude Code when:
- The transformation rule depends on what the code does, not just what it says
- You need to handle 15 different edge cases and writing explicit logic for each is worse than letting the agent infer
- The change is exploratory and you are still figuring out the pattern
- You are willing to trade determinism for semantic correctness
Use a script when:
- You can describe the change as a single find-and-replace rule (even if it is a complex regex)
- The diff needs to be reviewable by someone who did not write the transformation
- You need to re-run the change on a subset of files
- Rollback needs to be trivial
Implementation: Script-First, Agent-Second
The practical pattern is to start with a script and escalate to an agent only when you hit semantic complexity.
Step 1: Write the dumbest script that could work
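#!/bin/bash
# Rename old API to new API across all TypeScript files.
# -i.bak works with both GNU and BSD/macOS sed (a bare -i '' is BSD-only).
# Delete the *.bak backups once the diff has been reviewed.
find src -name "*.ts" -type f -exec sed -i.bak \
  's/oldApiMethod(/newApiMethod(/g' {} +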
Step 2: Run it on a sample
Pick 10 representative files. Run the script. Review the diff. If it works, you are done. If it breaks, you learned where the uniformity assumption fails.
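The sample run can be done as a dry run that prints a unified diff and writes nothing. This is a sketch under assumptions: `preview` and its `sample_size` and `seed` parameters are illustrative names, the rule is the rename from Step 1, and the demo uses a temporary directory in place of a real `src` tree.

```python
import difflib
import random
import re
import tempfile
from pathlib import Path


def transform(source: str) -> str:
    return re.sub(r"\boldApiMethod\(", "newApiMethod(", source)


def preview(paths, sample_size=10, seed=0):
    """Return a unified diff for a reproducible sample; nothing is written."""
    candidates = sorted(paths)
    count = min(sample_size, len(candidates))
    sample = random.Random(seed).sample(candidates, count)
    chunks = []
    for path in sorted(sample):
        before = path.read_text(encoding="utf-8")
        after = transform(before)
        chunks.extend(difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=str(path),
            tofile=str(path),
        ))
    return "".join(chunks)


# Demo on a throwaway directory standing in for the real source tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.ts").write_text("const r = oldApiMethod(1);\n", encoding="utf-8")
    (root / "b.ts").write_text("const s = unrelated(2);\n", encoding="utf-8")
    diff_text = preview(root.rglob("*.ts"))
    on_disk = (root / "a.ts").read_text(encoding="utf-8")

print(diff_text)
```

Fixing the seed means a colleague who runs the same preview sees the same sample, which keeps the review conversation about the rule and not about which files happened to be picked.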
Step 3: Add edge case handling to the script
import re
from pathlib import Path
def transform_file(path):
    content = path.read_text()
    # Handle the edge case first: once the common-case rename has run,
    # no oldApiMethod( is left for this pattern to match.
    content = re.sub(
        r'oldApiMethod\(\{\s*([^}]+?)\s*\}\)',
        r'newApiMethod({ \1, newFlag: true })',
        content
    )
    # Handle the common case
    content = re.sub(
        r'oldApiMethod\(',
        r'newApiMethod(',
        content
    )
    path.write_text(content)
for ts_file in Path('src').rglob('*.ts'):
    transform_file(ts_file)
Step 4: Escalate to agent only if script logic becomes unmaintainable
If you are writing 200 lines of conditional logic to handle semantic variations, that is the signal to switch to Claude Code. Not file count. Not diff size. Semantic complexity.
Observability for Script-Based Migrations
Scripts give you free observability because they are just code.
Log every file touched:
import logging
logging.basicConfig(level=logging.INFO)
for ts_file in Path('src').rglob('*.ts'):
    logging.info(f"Processing {ts_file}")
    transform_file(ts_file)
    logging.info(f"Completed {ts_file}")
Track exceptions:
failed_files = []
for ts_file in Path('src').rglob('*.ts'):
    try:
        transform_file(ts_file)
    except Exception as e:
        logging.error(f"Failed on {ts_file}: {e}")
        failed_files.append(ts_file)
if failed_files:
    logging.error(f"Failed files: {failed_files}")
Generate a summary report:
total = len(list(Path('src').rglob('*.ts')))
print(f"Processed {total} files")
print(f"Failed: {len(failed_files)}")
if total:  # avoid dividing by zero when no files matched
    print(f"Success rate: {(1 - len(failed_files) / total) * 100:.1f}%")
With an agent, you get a chat transcript. With a script, you get structured logs, error counts, and a deterministic audit trail.
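To make the structured part concrete, the same run can emit a machine-readable report that separates changed, unchanged, and failed files. This is a sketch: `run_migration` and the report keys are illustrative names, and the demo builds a temporary directory that includes one file with invalid UTF-8 to force a failure.

```python
import json
import re
import tempfile
from pathlib import Path


def transform(source: str) -> str:
    return re.sub(r"\boldApiMethod\(", "newApiMethod(", source)


def run_migration(root: Path) -> dict:
    """Apply the rule under root and report what happened to each file."""
    report = {"changed": [], "unchanged": [], "failed": []}
    for path in sorted(root.rglob("*.ts")):
        rel = path.relative_to(root).as_posix()
        try:
            before = path.read_text(encoding="utf-8")
            after = transform(before)
        except (OSError, UnicodeDecodeError) as exc:
            report["failed"].append({"file": rel, "error": type(exc).__name__})
            continue
        if after != before:
            path.write_text(after, encoding="utf-8")
            report["changed"].append(rel)
        else:
            report["unchanged"].append(rel)
    return report


# Demo on a throwaway directory standing in for the real source tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.ts").write_text("oldApiMethod(1);\n", encoding="utf-8")
    (root / "b.ts").write_text("unrelated(2);\n", encoding="utf-8")
    (root / "c.ts").write_bytes(b"\xff\xfe oldApiMethod(3);\n")  # not valid UTF-8
    report = run_migration(root)

print(json.dumps(report, indent=2))
```

Writing only when the content changed keeps file timestamps stable for untouched files, and the unchanged list doubles as a quick check that a second run is a no-op.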
The Hidden Cost of Agent Flexibility
Agent-driven refactors feel faster because you skip the “write the script” step. But you pay for that speed in review time and debugging time.
As a rough illustration, a 10-line script might take 20 minutes to write and 2 minutes to review, while an agent prompt might take 2 minutes to write and 60 minutes to review, because every file is a potential judgment call.
The flexibility is real. The cost is also real. Choose based on whether you need that flexibility or whether uniformity is the actual goal.
Technical Verdict
Use scripts when:
- The transformation is uniform across all files
- You need deterministic output for review and rollback
- The change is idempotent and you might need to re-run it
- Blast radius is high and you want predictable failure modes
Use agents when:
- The transformation depends on local code semantics
- You are exploring the pattern and do not know the edge cases yet
- Writing explicit logic for every case is harder than letting the agent infer
- You are willing to trade determinism for contextual correctness
Avoid agents when:
- The diff will be reviewed by someone other than you
- Rollback needs to be trivial
- You need to run the change on subsets of files
- The transformation rule is simple enough to express as a regex or AST visitor
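The last bullet mentions an AST visitor. As a sketch of that technique, the example below uses Python's standard `ast` module on Python source as a stand-in, since the TypeScript equivalent would need a TypeScript-aware parser. The `resize` function and its argument order are hypothetical.

```python
import ast


class SwapResizeArgs(ast.NodeTransformer):
    """Swap the two positional arguments of every call to resize()."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # handle nested calls first
        is_target = isinstance(node.func, ast.Name) and node.func.id == "resize"
        if is_target and len(node.args) == 2 and not node.keywords:
            node.args = [node.args[1], node.args[0]]
        return node


source = (
    "thumb = resize(width, height)\n"
    "label = 'resize(width, height)'\n"
)
tree = SwapResizeArgs().visit(ast.parse(source))
result = ast.unparse(tree)  # requires Python 3.9 or newer
print(result)
```

Unlike a regex, the visitor leaves the string literal alone because it only sees real call nodes. Two caveats apply: `ast.unparse` discards comments and original formatting, and an argument swap is its own inverse, so this particular rule is not idempotent and should run exactly once from a clean checkout.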
The boring answer is usually right: if it is mechanical, write a script. If it requires understanding, use an agent. The mistake is using an agent because the change is large, not because it is complex.