Claude Code vs. Shell Scripts: When Mechanical Edits Beat Agent-Driven Refactors

The current tooling mood says reach for Claude Code when you need to touch hundreds of files. The practical answer is more boring: if the change is truly mechanical, you want a script, not an agent making judgment calls.

This is not about agent capability. It is about what makes a diff reviewable, what makes a rollback safe, and what happens when non-deterministic tooling meets a 4,000-file changeset.

The Wrong Decision Variable

Most teams pick tools based on scale. Small change, maybe manual. Large change, definitely use an agent. That logic inverts the actual risk surface.

A 4,000-file change that follows one deterministic rule can be safer to script than a 40-file change where every edit requires understanding local semantics.

The deciding variable is not file count. It is uniformity.

Script-shaped changes

These are transformations you can express like compiler passes:

Rename one namespace to another everywhere
Replace one config key with a new key
Rewrite import paths from one package entrypoint to another
Swap a deprecated helper for its replacement
Update a function signature with a new parameter order

The pattern: one rule, applied mechanically, with no context-dependent interpretation.
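As a rough sketch of what "one rule, applied mechanically" can look like, the import-path rewrite from the list above fits in a single pure function. The package names `@acme/legacy-ui` and `@acme/ui` are hypothetical and stand in for whatever entrypoints a real migration would target.

```python
import re

# Hypothetical package names, chosen only for illustration.
# Group 1 keeps the "from " keyword, group 2 keeps the quote style.
OLD_IMPORT = re.compile(r"""(from\s+)(['"])@acme/legacy-ui\2""")


def rewrite_imports(source: str) -> str:
    """Point every import of the old entrypoint at the new one."""
    return OLD_IMPORT.sub(r"\1\2@acme/ui\2", source)


sample = 'import { Button } from "@acme/legacy-ui";\nimport x from "./local";\n'
print(rewrite_imports(sample))
```

The function takes text and returns text, with no knowledge of what the surrounding code does. That is the property that makes the change script-shaped.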

Agent-shaped changes

These require local reasoning:

Refactor a component API where each call site has different prop combinations
Migrate from one state management pattern to another where local logic varies
Update error handling where each catch block does something different
Rewrite tests where assertions depend on what the test is actually checking

The pattern: the transformation rule changes based on what the code is doing.

What Makes a Diff Reviewable

When you generate a 300-file diff with Claude Code, the reviewer faces a hidden cost: they must verify that the agent understood the context correctly in every single location.

With a script-generated diff, the reviewer verifies the script once, then scans for exceptions. The cognitive load is inverted.

Determinism as a review primitive

A script gives you:

Identical output on re-run: same input, same diff, every time
Pattern-level predictability: if the script rewrites a match in file A, it rewrites the same match the same way in file B
Auditable logic: the transformation rule is in 50 lines of bash or Python, not buried in a prompt chain

An agent gives you:

Contextual adaptation: it can handle edge cases you did not anticipate
Semantic understanding: it knows when a rename would break something
Non-deterministic output: same prompt, different run, potentially different diff

The second list sounds better until you are reviewing 800 changed files and cannot tell if a deviation is intelligent adaptation or a hallucination.
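One way to make "scan for exceptions" concrete is to replay the intended rule against each file's original text and flag any file whose actual result differs. The sketch below assumes the before and after contents are already loaded into a dictionary; the helper name `find_deviations` and the sample data are illustrative, not part of any tool.

```python
from typing import Callable, Dict, List, Tuple


def find_deviations(
    changes: Dict[str, Tuple[str, str]],
    rule: Callable[[str], str],
) -> List[str]:
    """Return the paths whose 'after' text is not exactly rule(before)."""
    return sorted(
        path
        for path, (before, after) in changes.items()
        if rule(before) != after
    )


def rename(source: str) -> str:
    # The single mechanical rule the reviewer has already verified.
    return source.replace("oldApiMethod(", "newApiMethod(")


changes = {
    "a.ts": ("oldApiMethod(1);", "newApiMethod(1);"),
    # This file gained an argument that the rule does not explain.
    "b.ts": ("oldApiMethod(2);", "newApiMethod(2, {});"),
}
print(find_deviations(changes, rename))
```

Files that come back from this check are the only ones that need a human to decide whether the deviation is deliberate or a mistake.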

Blast Radius and Rollback Guarantees

Scripts are boring. That is the feature.

If a script-based migration breaks production, you know exactly what it did. For a simple rename you can usually write the inverse script in minutes, or re-run the original script with a filter to exclude problem files.

If an agent-based migration breaks production, you are debugging 300 judgment calls. Which files did it interpret differently? Where did it make a semantic leap? What was the prompt state when it processed the broken file?
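For a plain rename, the forward migration, the filtered re-run, and the inverse can all be the same function called with different arguments. This is a minimal sketch under two assumptions: the files are UTF-8, and the new name did not appear anywhere before the migration (otherwise the inverse would also rewrite those pre-existing occurrences). The helper `apply_rename` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def apply_rename(root: Path, old: str, new: str,
                 exclude: Iterable[str] = ()) -> List[str]:
    """Replace `old` with `new` in every .ts file under root, skipping
    excluded relative paths. Returns the relative paths that changed."""
    skipped = set(exclude)
    changed = []
    for path in sorted(root.rglob("*.ts")):
        rel = path.relative_to(root).as_posix()
        if rel in skipped:
            continue
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(rel)
    return changed


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.ts").write_text("oldApiMethod(1);", encoding="utf-8")
    (root / "b.ts").write_text("oldApiMethod(2);", encoding="utf-8")

    # Forward migration, excluding a file known to cause problems.
    forward = apply_rename(root, "oldApiMethod(", "newApiMethod(", exclude=["b.ts"])
    # Inverse migration: the same rule with the arguments swapped.
    inverse = apply_rename(root, "newApiMethod(", "oldApiMethod(")
    restored = (root / "a.ts").read_text(encoding="utf-8")

print(forward, inverse, restored)
```

In a repository under version control, reverting the commit is the simpler rollback. The inverse script matters mostly when other changes have landed on top of the migration.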

Idempotency matters

A good migration script is idempotent. You can run it twice. You can run it on a subset. You can run it, revert, fix one file manually, and run it again.

An agent-driven change is harder to make idempotent because the agent is optimizing for “do the right thing,” not “do the same thing twice.”
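Idempotency can be checked mechanically before a migration runs: apply the rule twice to a sample and confirm the second pass changes nothing. The sketch below contrasts a rename, which is idempotent, with a hypothetical `append_flag` rule that matches its own output and therefore is not.

```python
import re
from typing import Callable


def rename(source: str) -> str:
    # The word boundary keeps names such as myoldApiMethod untouched.
    return re.sub(r"\boldApiMethod\(", "newApiMethod(", source)


def append_flag(source: str) -> str:
    # Deliberately non-idempotent: the pattern still matches after rewriting,
    # so every run inserts another newFlag entry.
    return re.sub(r"newApiMethod\(\{", "newApiMethod({ newFlag: true, ", source)


def is_idempotent(rule: Callable[[str], str], sample: str) -> bool:
    """True if applying the rule a second time changes nothing."""
    once = rule(sample)
    return rule(once) == once


print(is_idempotent(rename, "oldApiMethod(x);"))
print(is_idempotent(append_flag, "newApiMethod({ a: 1 });"))
```

A rule that fails this check can often be repaired by making the pattern exclude text that has already been transformed.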

Decision Framework

Property	Script	Agent
Transformation rule	Single deterministic pattern	Context-dependent interpretation
Review cost	Verify script once, scan for exceptions	Verify every file individually
Re-run guarantee	Identical output	Potentially different output
Rollback	Write inverse script	Manual or partial revert
Failure mode	Breaks uniformly (easy to spot)	Breaks inconsistently (hard to spot)
Edge case handling	Explicit (you write the logic)	Implicit (agent infers)
Audit trail	Script source code	Prompt + model version + run context

When Agents Earn Their Place

Agents are not wrong for large changes. They are wrong for uniform large changes.

Use Claude Code when:

The transformation rule depends on what the code does, not just what it says
You need to handle 15 different edge cases and writing explicit logic for each is worse than letting the agent infer
The change is exploratory and you are still figuring out the pattern
You are willing to trade determinism for semantic correctness

Use a script when:

You can describe the change as a single find-and-replace rule (even if it is a complex regex)
The diff needs to be reviewable by someone who did not write the transformation
You need to re-run the change on a subset of files
Rollback needs to be trivial

Implementation: Script-First, Agent-Second

The practical pattern is to start with a script and escalate to an agent only when you hit semantic complexity.

Step 1: Write the dumbest script that could work

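#!/bin/bash
# Rename old API to new API across all TypeScript files.
# -i.bak works with both GNU and BSD/macOS sed (a bare -i '' is BSD-only).
# Delete the .bak backups once the diff has been reviewed.
set -euo pipefail

find src -name "*.ts" -type f -exec sed -i.bak \
  's/oldApiMethod(/newApiMethod(/g' {} +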

Step 2: Run it on a sample

Pick 10 representative files. Run the script. Review the diff. If it works, you are done. If it breaks, you learned where the uniformity assumption fails.
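A sample run is easier to review as a dry run that prints a unified diff without writing anything. The sketch below uses the standard library's `difflib`; the `preview` helper and the sample content are illustrative, and in practice the before text would come from reading each sampled file.

```python
import difflib


def rename(source: str) -> str:
    return source.replace("oldApiMethod(", "newApiMethod(")


def preview(path: str, before: str, after: str) -> str:
    """Return a unified diff for one file, or an empty string if unchanged."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


before = "const x = oldApiMethod(1);\nconst y = other(2);\n"
print(preview("src/example.ts", before, rename(before)))
```

Because nothing is written to disk, the same preview can be repeated after every tweak to the rule until the sample diff looks right.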

Step 3: Add edge case handling to the script

import re
from pathlib import Path

def transform_file(path):
    original = path.read_text()

    # Handle the options-object edge case first. If the common-case rename
    # ran first, no oldApiMethod( would be left for this pattern to match.
    content = re.sub(
        r'oldApiMethod\(\{\s*([^}]+?)\s*\}\)',
        r'newApiMethod({ \1, newFlag: true })',
        original
    )

    # Handle the common case: every remaining call is a plain rename
    content = re.sub(
        r'oldApiMethod\(',
        r'newApiMethod(',
        content
    )

    # Only write files that actually changed
    if content != original:
        path.write_text(content)

for ts_file in Path('src').rglob('*.ts'):
    transform_file(ts_file)

Step 4: Escalate to agent only if script logic becomes unmaintainable

If you are writing 200 lines of conditional logic to handle semantic variations, that is the signal to switch to Claude Code. Not file count. Not diff size. Semantic complexity.

Observability for Script-Based Migrations

Scripts give you free observability because they are just code.

Log every file touched:

import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)

# transform_file is the function defined in Step 3
for ts_file in Path('src').rglob('*.ts'):
    logging.info("Processing %s", ts_file)
    transform_file(ts_file)
    logging.info("Completed %s", ts_file)

Track exceptions:

failed_files = []

for ts_file in Path('src').rglob('*.ts'):
    try:
        transform_file(ts_file)
    except Exception as e:
        logging.error(f"Failed on {ts_file}: {e}")
        failed_files.append(ts_file)

if failed_files:
    logging.error(f"Failed files: {failed_files}")

Generate a summary report:

total = len(list(Path('src').rglob('*.ts')))
print(f"Processed {total} files")
print(f"Failed: {len(failed_files)}")
if total:  # avoid dividing by zero when no files matched
    print(f"Success rate: {(1 - len(failed_files) / total) * 100:.1f}%")

With an agent, you get a chat transcript. With a script, you get structured logs, error counts, and a deterministic audit trail.
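The structured logs mentioned above can go one step further and record how many replacements each file received, which gives reviewers a quick sanity check against the diff. This is a sketch that works on in-memory text; the `build_report` helper and the report field names are assumptions made for illustration.

```python
import json
import re
from typing import Dict

PATTERN = re.compile(r"\boldApiMethod\(")


def build_report(sources: Dict[str, str]) -> dict:
    """Count replacements per file without modifying anything."""
    per_file = {}
    for path, text in sorted(sources.items()):
        # subn returns (new_text, number_of_substitutions)
        _, count = PATTERN.subn("newApiMethod(", text)
        if count:
            per_file[path] = count
    return {
        "files_scanned": len(sources),
        "files_changed": len(per_file),
        "replacements": sum(per_file.values()),
        "per_file": per_file,
    }


sources = {
    "a.ts": "oldApiMethod(1); oldApiMethod(2);",
    "b.ts": "unrelated();",
}
report = build_report(sources)
print(json.dumps(report, indent=2))
```

A JSON report like this can be attached to the pull request, so the reviewer can compare the replacement count against what the diff shows.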

The Hidden Cost of Agent Flexibility

Agent-driven refactors feel faster because you skip the “write the script” step. But you pay for that speed in review time and debugging time.

As a rough illustration: a 10-line script might take 20 minutes to write and 2 minutes to review, while an agent prompt might take 2 minutes to write and 60 minutes to review, because every file is a potential judgment call.

The flexibility is real. The cost is also real. Choose based on whether you need that flexibility or whether uniformity is the actual goal.

Technical Verdict

Use scripts when:

The transformation is uniform across all files
You need deterministic output for review and rollback
The change is idempotent and you might need to re-run it
Blast radius is high and you want predictable failure modes

Use agents when:

The transformation depends on local code semantics
You are exploring the pattern and do not know the edge cases yet
Writing explicit logic for every case is harder than letting the agent infer
You are willing to trade determinism for contextual correctness

Avoid agents when:

The diff will be reviewed by someone other than you
Rollback needs to be trivial
You need to run the change on subsets of files
The transformation rule is simple enough to express as a regex or AST visitor
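To illustrate the AST visitor option from the list above, the sketch below uses Python's built-in `ast` module on Python source, purely so that the example is self-contained. A TypeScript codebase would need a TypeScript-aware parser instead. The function name `old_api_method` is hypothetical.

```python
import ast


class CallFinder(ast.NodeVisitor):
    """Collect (line, column) for every direct call to a given function name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sites = []

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id == self.name:
            self.sites.append((node.lineno, node.col_offset))
        # Keep walking so nested calls are found too.
        self.generic_visit(node)


source = (
    'result = old_api_method(1)\n'
    'note = "old_api_method(2) is mentioned in a string"\n'
    '# old_api_method(3) is mentioned in a comment\n'
    'total = wrap(old_api_method(4))\n'
)

finder = CallFinder("old_api_method")
finder.visit(ast.parse(source))
print(finder.sites)
```

Unlike a regex, the visitor ignores the mentions inside the string and the comment and reports only the two real call sites. It is still a deterministic rule, so the output is identical on every run.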

The boring answer is usually right: if it is mechanical, write a script. If it requires understanding, use an agent. The mistake is using an agent because the change is large, not because it is complex.
