Karajan v3: Orchestrating AI CLIs as Subprocesses Instead of API Calls

Most multi-agent orchestrators call /v1/messages or similar HTTP endpoints. Karajan v3 takes a different path: it treats existing AI CLI tools (Claude Code, Aider, Gemini, Codex, OpenCode) as subprocesses and coordinates them through stdio, exit codes, and a TDD-first pipeline. No vendor API lock-in. No separate API rate limits for the orchestrator itself. Just process management and careful contract enforcement between tools that were never designed to work together.

Why Subprocess Orchestration

The typical multi-agent setup involves a central runtime that calls model APIs directly. You build abstractions for tool calling, state management, and retry logic. You also pay for two streams of tokens: the orchestrator’s own planning calls and the calls that do the actual work.

Karajan inverts this. You already have CLI tools that wrap model APIs. Claude Code knows how to edit files and run tests. Aider knows how to apply diffs. Gemini has its own context management. Instead of reimplementing those capabilities, Karajan spawns them as child processes and coordinates their outputs.

Benefits:

Use existing subscriptions: If you pay for Claude Pro or Gemini Advanced, those quotas apply to the CLI tools you already use.
No API key sprawl: The orchestrator doesn’t need its own keys. It delegates to tools that already have them.
Failure isolation: A crashed subprocess doesn’t take down the orchestrator. You get an exit code and move on.
Local-first: The orchestrator runs on your machine. No intermediate cloud service sits between you and the model providers your CLI tools already talk to.

Trade-offs:

Stdout parsing fragility: CLI tools change their output formats. You’re parsing unstructured text, not JSON schemas.
No shared memory: Agents can’t directly inspect each other’s state. Coordination happens through files and process signals.
Timeout complexity: Some tools hang. You need watchdog timers and kill signals.
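A minimal sketch of defensive stdout parsing, assuming a tool's output may contain a whole JSON document, a JSON summary on its last line, or a fenced diff block. These formats and the `parseToolOutput` name are hypothetical, not documented behavior of any particular CLI. The parser tries the strictest format first and returns `null` when nothing matches, so the caller can fail loudly instead of guessing.

```javascript
// Hypothetical helper: extract a result from a CLI's stdout without assuming
// one fixed format. The formats tried below are illustrative assumptions.
function parseToolOutput(stdout) {
  const text = stdout.trim();

  // 1. The whole output is a JSON document.
  try {
    return { kind: 'json', value: JSON.parse(text) };
  } catch (_) { /* fall through to the next pattern */ }

  // 2. The last non-empty line is JSON (e.g. a summary after progress logs).
  const lines = text.split('\n').map((l) => l.trim()).filter(Boolean);
  if (lines.length > 0) {
    try {
      return { kind: 'json-line', value: JSON.parse(lines[lines.length - 1]) };
    } catch (_) { /* fall through to the next pattern */ }
  }

  // 3. A fenced diff block somewhere in the text (three backticks + "diff").
  const diff = text.match(/`{3}diff\n([\s\S]*?)`{3}/);
  if (diff) {
    return { kind: 'diff', value: diff[1] };
  }

  // 4. Nothing recognizable: signal failure rather than guess.
  return null;
}
```

When a tool update changes its output, this fails at step 4 with a clear `null` instead of silently feeding garbage to the next role.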

Architecture: Role-Based Pipeline

Karajan runs a fixed pipeline of roles. Each role is a subprocess invocation with a specific responsibility:

Triage: Classifies task complexity and decides which route to take (simple fix, feature, refactor).
Researcher: Reads existing code, extracts context, writes a summary.
Architect: Designs the change based on the research summary.
Planner: Breaks the design into discrete steps.
Coder: Implements each step by spawning a coding CLI (Claude Code, Aider, etc.).
Reviewer: Checks the diff for obvious mistakes.
Tester: Runs the test suite.
Quality Gate: Runs SonarQube or similar static analysis.
Security Auditor: Scans for known vulnerabilities.

Each role writes its output to a file. The next role reads that file as input. If a role fails (non-zero exit code), the pipeline halts or retries depending on configuration.
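The halt-or-retry behavior can be sketched as a sequential loop. This is an illustration, not Karajan's code: each role is assumed to be an object with a `name` and an async `run()` that rejects on a non-zero exit, and `maxRetries` is a hypothetical option standing in for whatever the real configuration exposes.

```javascript
// Illustrative sequential pipeline: run roles in order, retry a failing role
// up to maxRetries times, and halt the whole pipeline if it still fails.
// The role shape ({ name, run }) and maxRetries are hypothetical.
async function runPipeline(roles, { maxRetries = 0 } = {}) {
  const completed = [];
  for (const role of roles) {
    for (let attempt = 0; ; attempt++) {
      try {
        await role.run(); // assumed to reject on a non-zero exit code
        completed.push(role.name);
        break; // role succeeded: move on to the next one
      } catch (err) {
        if (attempt >= maxRetries) {
          // Halt: later roles never run, so none reads a missing or bad input.
          return { ok: false, failedRole: role.name, completed, error: err.message };
        }
        // Otherwise loop again and retry the same role.
      }
    }
  }
  return { ok: true, completed };
}
```

In practice `run` would wrap a subprocess call, for example `() => runRole('researcher', '.karajan/task.md', '.karajan/research-summary.md')` using the `runRole` function from the Subprocess Invocation section.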

State Synchronization

State lives in the filesystem. Each role writes to a known path:

.karajan/
  task.md
  triage-result.json
  research-summary.md
  architecture.md
  plan.json
  coder-output.diff
  review-notes.md
  test-results.json
  quality-gate.json
  security-report.json

The orchestrator doesn’t hold state in memory. It reads the previous role’s output file, spawns the next subprocess, waits for it to exit, then reads the file that role wrote. This makes the pipeline restartable. If the coder subprocess crashes, you can rerun just that step without losing the research or architecture work.
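Restarting from the right step can be sketched as a scan over the artifacts in pipeline order. A minimal sketch, assuming the role-to-file mapping from the tree above: `firstIncompleteStep` (a hypothetical helper) returns the index of the first role whose output file is missing, empty, or (for `.json` files) unparseable, so a rerun can start there and reuse everything before it.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: given ordered { role, artifact } steps, return the
// index of the first step whose artifact is missing or unusable. Returns
// steps.length when every artifact looks complete.
function firstIncompleteStep(dir, steps) {
  for (let i = 0; i < steps.length; i++) {
    const file = path.join(dir, steps[i].artifact);
    if (!fs.existsSync(file)) return i;

    if (file.endsWith('.json')) {
      // A zero exit code is not proof of success: JSON artifacts must parse.
      try {
        JSON.parse(fs.readFileSync(file, 'utf8'));
      } catch (_) {
        return i;
      }
    } else if (fs.statSync(file).size === 0) {
      return i; // an empty Markdown or diff artifact counts as incomplete
    }
  }
  return steps.length;
}
```

Checking structure, not just existence, also covers the "tool exits 0 but didn’t actually succeed" case: a truncated `plan.json` sends the restart back to the planner.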

Subprocess Invocation

Karajan uses Node’s child_process.spawn with explicit stdio handling:

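const { spawn } = require('child_process');

// Run one pipeline role as a child process. The command name and the
// --input/--output flags are illustrative placeholders; each real CLI takes
// its own arguments, so pass the right command per role.
function runRole(roleName, inputFile, outputFile, command = 'claude-code') {
  return new Promise((resolve, reject) => {
    const proc = spawn(command, ['--input', inputFile, '--output', outputFile], {
      stdio: ['ignore', 'pipe', 'pipe'],
      timeout: 300000 // 5 minutes; Node then sends killSignal (SIGTERM by default)
    });

    let stdout = '';
    let stderr = '';

    // Decode as UTF-8 so multi-byte characters split across chunks survive.
    proc.stdout.setEncoding('utf8');
    proc.stderr.setEncoding('utf8');
    proc.stdout.on('data', (data) => { stdout += data; });
    proc.stderr.on('data', (data) => { stderr += data; });

    // 'close' fires once the process has exited and its stdio streams have
    // closed. A process killed by the timeout reports code === null plus the
    // signal name, so report that case explicitly.
    proc.on('close', (code, signal) => {
      if (code === 0) {
        resolve({ stdout, stderr });
      } else if (signal) {
        reject(new Error(`Role ${roleName} killed by ${signal}: ${stderr}`));
      } else {
        reject(new Error(`Role ${roleName} exited with code ${code}: ${stderr}`));
      }
    });

    // Emitted when the process cannot be spawned at all (e.g. ENOENT).
    proc.on('error', (err) => {
      reject(new Error(`Failed to spawn ${roleName}: ${err.message}`));
    });
  });
}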

Key details:

Timeout: Each role gets a hard timeout. If it hangs past that limit, Node sends it a kill signal (SIGTERM by default).
Exit code contract: Zero means success. Anything else is a failure.
Stderr capture: Useful for debugging when a tool fails silently.
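Node’s `timeout` option sends a single signal, and a tool that traps SIGTERM can keep running. A sketch of a watchdog that escalates to SIGKILL after a grace period; `spawnWithWatchdog` and the `graceMs` default are assumptions for illustration, and output capture is left out to keep the focus on the timers.

```javascript
const { spawn } = require('child_process');

// Watchdog sketch: after timeoutMs send SIGTERM; if the process is still
// running graceMs later, send SIGKILL (which cannot be trapped). Output
// capture is omitted here; see runRole for stdout/stderr handling.
function spawnWithWatchdog(command, args, { timeoutMs, graceMs = 5000 }) {
  return new Promise((resolve, reject) => {
    const proc = spawn(command, args, { stdio: 'ignore' });
    let timedOut = false;
    let killTimer = null;

    const termTimer = setTimeout(() => {
      timedOut = true;
      proc.kill('SIGTERM');
      killTimer = setTimeout(() => proc.kill('SIGKILL'), graceMs);
    }, timeoutMs);

    const clearTimers = () => {
      clearTimeout(termTimer);
      clearTimeout(killTimer);
    };

    // Spawn failure, e.g. ENOENT for a missing binary.
    proc.on('error', (err) => {
      clearTimers();
      reject(err);
    });

    // A normal exit reports a code; a kill reports code === null and a signal.
    proc.on('close', (code, signal) => {
      clearTimers();
      resolve({ code, signal, timedOut });
    });
  });
}
```

The returned `timedOut` flag lets the caller distinguish a watchdog kill from a tool that crashed on its own.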

TDD-First Pipeline

Karajan enforces test-driven development at the orchestration level. The pipeline won’t advance to the quality gate and security audit until tests pass. This is not a suggestion. It’s a hard gate.

The tester role runs your test suite (Jest, pytest, whatever) and writes results to test-results.json. If any test fails, the pipeline stops advancing and instead invokes the reviewer role with the test output as context. The reviewer writes notes, the coder reruns with those notes, and the loop repeats.

This prevents the common failure mode where an AI agent writes code that looks plausible but breaks existing functionality. The test suite is the source of truth, not the model’s confidence score.
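The gate-and-retry loop described above can be sketched with the three roles injected as async functions. This is a hypothetical shape, not Karajan’s interface: `test()` is assumed to resolve to an object with a `failed` count and raw `output`, and `maxIterations` is an added bound so a task that never goes green cannot loop forever.

```javascript
// Sketch of the coder -> tester -> reviewer loop. In a real pipeline each
// injected function would wrap a subprocess call; here they are plain async
// functions so the control flow is visible. maxIterations is hypothetical.
async function tddLoop({ code, test, review }, { maxIterations = 3 } = {}) {
  let notes = null;
  for (let i = 1; i <= maxIterations; i++) {
    await code(notes);           // coder implements the plan plus any notes
    const result = await test(); // e.g. parsed from test-results.json
    if (result.failed === 0) {
      return { passed: true, iterations: i }; // gate opens
    }
    // Gate stays closed: hand the failing output to the reviewer.
    notes = await review(result.output);
  }
  return { passed: false, iterations: maxIterations };
}
```

A `passed: false` result is the point where the orchestrator halts and surfaces the last test output to a human.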

Failure Modes and Mitigations

Failure Mode	Symptom	Mitigation
Stdout format change	Parser breaks when CLI tool updates	Version-pin CLI tools (e.g. in `package.json` for npm-distributed ones); write parsers defensively with fallback patterns
Subprocess hang	Pipeline stalls indefinitely	Hard timeout per role; kill signal after timeout; log stderr for post-mortem
Exit code ambiguity	Tool exits 0 but didn’t actually succeed	Parse output file for expected structure; fail if missing
File race condition	Next role starts before previous role finishes writing	Use atomic file writes (write to a temp file, fsync it, then rename it over the target); spawn the next role only after the previous process has exited
Disk space exhaustion	Large diffs or logs fill `.karajan/`	Rotate logs after each run; compress old artifacts; set max artifact size
Tool not installed	`spawn` fails with ENOENT	Pre-flight check for required binaries; fail fast with clear error message
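The atomic-write mitigation can be sketched as follows. In this ordering the fsync happens before the rename, so once the target name points at the new file its contents are already flushed. `writeFileAtomic` is a hypothetical helper name; the temp file lives in the same directory because `rename` is only atomic within one filesystem.

```javascript
const fs = require('fs');
const path = require('path');

// Readers of `target` see either the old content or the complete new
// content, never a partial write: write a temp file next to the target,
// flush it to disk, then rename it over the target.
function writeFileAtomic(target, data) {
  const dir = path.dirname(target);
  const tmp = path.join(dir, `.${path.basename(target)}.${process.pid}.tmp`);
  const fd = fs.openSync(tmp, 'w');
  try {
    fs.writeFileSync(fd, data); // writes the whole buffer/string via the fd
    fs.fsyncSync(fd);           // flush contents before the rename
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, target);   // atomic replace on the same filesystem
}
```

For full crash durability of the rename itself, some setups also fsync the containing directory; the sketch leaves that out.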

Observability

Because each role is a subprocess, traditional process monitoring works. You can:

Tail logs: Each role writes to its own log file. tail -f .karajan/coder.log shows real-time progress.
Inspect exit codes: The orchestrator logs every exit code. Grep for non-zero codes to find failures.
Profile resource usage: Use ps or top to see which role is CPU or memory-bound.
Replay pipelines: Since state is on disk, you can re-run a failed pipeline from any step.
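A sketch of the logging side, assuming one append-mode log file per role and a JSON-lines file of exit records. The `runLogged` helper and the `exit-codes.jsonl` name are hypothetical, not part of the `.karajan/` layout described above. Streaming output straight to disk is what makes `tail -f` useful while a role is still running.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: stream a role's stdout/stderr into <dir>/<role>.log
// and append one JSON line per exit to <dir>/exit-codes.jsonl.
function runLogged(dir, role, command, args) {
  return new Promise((resolve, reject) => {
    fs.mkdirSync(dir, { recursive: true });
    const log = fs.createWriteStream(path.join(dir, `${role}.log`), { flags: 'a' });
    const proc = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let settled = false;

    // end: false keeps the log open until both streams and the exit are done.
    proc.stdout.pipe(log, { end: false });
    proc.stderr.pipe(log, { end: false });

    proc.on('error', (err) => {
      if (settled) return;
      settled = true;
      log.end();
      reject(err);
    });

    proc.on('close', (code, signal) => {
      if (settled) return;
      settled = true;
      const entry = { role, code, signal, at: new Date().toISOString() };
      fs.appendFileSync(path.join(dir, 'exit-codes.jsonl'), JSON.stringify(entry) + '\n');
      log.end(() => resolve(entry)); // resolve once the log is flushed
    });
  });
}
```

With this layout, `grep -v '"code":0' .karajan/exit-codes.jsonl` lists every run that did not exit cleanly, including signal kills (which record `"code":null`).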

Karajan doesn’t include a web UI or dashboard. It’s a CLI tool. Observability is through standard Unix tools and log files.

Deployment Shape

Karajan runs on a single machine. It’s not a distributed system. Install a current Node.js runtime (here via nvm), then install Karajan globally via npm:

nvm install 22.22.1
npm i -g karajan-code@3

Then you run it in your project directory:

karajan run --task "Add user authentication"

It spawns subprocesses, writes to .karajan/, and exits when done. No daemon. No server. No cloud dependency.

For CI/CD integration, you can run Karajan in a GitHub Actions workflow or Jenkins job. It behaves like any other CLI tool. Just make sure the AI CLI tools (Claude Code, Aider, etc.) are installed in the CI environment and have valid credentials.
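For CI in particular, the pre-flight check from the failure-mode table is cheap insurance. A minimal sketch, assuming each tool can be launched with `--version`; the probe only looks for ENOENT, so a tool that rejects the flag or exits non-zero still counts as installed. `missingBinaries` is a hypothetical helper.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical pre-flight check: return the subset of `binaries` that cannot
// be launched at all, so the run fails fast with a clear message instead of
// hitting ENOENT halfway through the pipeline.
function missingBinaries(binaries) {
  return binaries.filter((bin) => {
    const result = spawnSync(bin, ['--version'], {
      stdio: 'ignore',
      timeout: 10000, // a hanging probe yields ETIMEDOUT, not ENOENT
    });
    return result.error?.code === 'ENOENT';
  });
}
```

Call it at startup with the binaries your configuration references and, if the result is non-empty, print the list and exit non-zero before any role runs.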

Technical Verdict

Use Karajan when:

You already pay for multiple AI coding tools and want to coordinate them without additional API costs.
You need reproducible, local-first orchestration that doesn’t depend on a third-party service.
You want TDD enforcement at the orchestration level, not just within individual agents.
You’re comfortable with subprocess management and stdout parsing.

Avoid Karajan when:

You need real-time collaboration between agents (shared memory, not file-based state).
You require sub-second latency (each role pays process startup on top of model round trips that already take seconds).
You want a web UI or visual pipeline editor.
Your AI tools don’t expose stable CLI interfaces.

Karajan is plumbing for people who already have the pipes. If you’re starting from scratch and don’t have existing CLI tools, you’ll spend more time integrating them than you’ll save on orchestration. But if you’re already running Claude Code and Aider manually, Karajan gives you a way to automate the handoffs and enforce quality gates without rewriting your workflow.

Source Links

Karajan v3.0.0 Announcement