mech.app

Dead Light Framework: A 3-Minute Test for When Agent Projects Outgrow HANDOFF + LOG Files

Practical decision framework for migrating from file-based agent state to structured orchestration when HANDOFF.md and LOG.md hit concurrency limits.

Source: dev.to

Most agent projects start with two Markdown files: HANDOFF.md for current state and LOG.md for append-only history. This works until it doesn’t. The Dead Light Framework offers a three-question test to identify when file-based coordination becomes the bottleneck and what tier of infrastructure you actually need.

The File-Based State Pattern

The minimal setup looks like this:

  • HANDOFF.md: Current snapshot. Agent reads this first on session start. Contains active tasks, decisions, blockers.
  • LOG.md: Append-only history. Every meaningful action gets logged. HANDOFF is derived from LOG.
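A minimal Python sketch of the append-then-derive loop follows. The LOG line format (`- <UTC timestamp> | <task> | <status>`), the `done` status, and the function names are assumptions for illustration and are not part of the framework.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_log(log_path: Path, task: str, status: str) -> None:
    """Append one entry to LOG.md; earlier lines are never rewritten."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} | {task} | {status}\n")


def regenerate_handoff(log_path: Path, handoff_path: Path) -> dict:
    """Replay LOG.md in order so the latest status per task wins, then rewrite HANDOFF.md."""
    latest = {}
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.startswith("- "):
            continue  # skip headings and free-form notes
        _stamp, task, status = (part.strip() for part in line[2:].split("|"))
        latest[task] = status
    active = {task: status for task, status in latest.items() if status != "done"}
    lines = ["# HANDOFF", "", "## Active tasks"]
    lines += [f"- {task}: {status}" for task, status in active.items()]
    handoff_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return active
```

Because HANDOFF is rebuilt from LOG, a damaged HANDOFF.md can be recovered by calling `regenerate_handoff` again.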

This pattern works because agents reset to zero memory each session. Files provide durable state without databases, message queues, or orchestration layers.

The failure mode is predictable: file I/O becomes a coordination bottleneck when multiple agents or sessions need concurrent access.

The Three-Question Test

Run this in three minutes to find your infrastructure tier:

1. How many concurrent sessions touch the same state?

  • One session, one repo: Plain README or single HANDOFF file works.
  • Multiple sessions, sequential: Two files (HANDOFF + LOG) handle it.
  • Multiple sessions, concurrent: File locks break down. You need structured state.

2. How often does state change?

  • Hourly or daily: Files are fine. Append to LOG, regenerate HANDOFF.
  • Every few minutes: File writes start causing conflicts.
  • Sub-minute: Files are already the bottleneck. Move to in-memory or database.

3. What happens if state gets corrupted?

  • Rebuild from scratch in under 10 minutes: Files are acceptable.
  • Rebuild takes hours or requires manual intervention: You need transactional storage.
  • Can’t rebuild at all: You’ve waited too long to run this test.
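The three questions can be encoded as a small decision function. This is one reading of the test, not the author's code. The enum names, the `multiple_domains` flag (borrowed from the tier table that follows), and the rule that a concurrent-session answer, any update frequency faster than hourly, or any rebuild slower than about ten minutes is enough to trigger migration (taken from the technical verdict) are all assumptions.

```python
from enum import Enum


class Sessions(Enum):
    SINGLE = 1      # one session, one repo
    SEQUENTIAL = 2  # multiple sessions, one at a time
    CONCURRENT = 3  # multiple sessions at once


class Frequency(Enum):
    HOURLY_OR_SLOWER = 1
    EVERY_FEW_MINUTES = 2
    SUB_MINUTE = 3


class Rebuild(Enum):
    UNDER_10_MINUTES = 1
    HOURS_OR_MANUAL = 2
    IMPOSSIBLE = 3


def recommend_tier(sessions: Sessions, frequency: Frequency, rebuild: Rebuild,
                   multiple_domains: bool = False) -> str:
    # Any one answer that points past files is enough to leave files behind.
    if (sessions is Sessions.CONCURRENT
            or frequency is not Frequency.HOURLY_OR_SLOWER
            or rebuild is not Rebuild.UNDER_10_MINUTES):
        return "Running service"
    if sessions is Sessions.SINGLE:
        return "Plain README (or a single HANDOFF file)"
    # Sequential sessions: split state per domain only when agents own different areas.
    return "Multi-unit paperwork" if multiple_domains else "HANDOFF + LOG"
```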

Infrastructure Tiers

| Tier | State storage | Coordination | When to use |
| --- | --- | --- | --- |
| Plain README | Single Markdown file | None | Solo prototypes, documentation-only projects |
| HANDOFF + LOG | Two Markdown files | File-based, sequential | Single-session agents, low-frequency updates |
| Multi-unit paperwork | Multiple Markdown files per domain | Directory structure, file-naming conventions | Multiple agents in different domains, still sequential |
| Running service | Database, message queue, or state store | API, locks, transactions | Concurrent agents, high-frequency updates, production workloads |
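For the multi-unit paperwork tier, one possible convention gives each domain its own HANDOFF/LOG pair under `state/<domain>/`, so sequential agents working in different domains never write the same file. The directory layout, the lowercase-slug rule for domain names, and the `domain_files` helper are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical naming rule: lowercase slugs only, which also rejects paths like "../x".
DOMAIN_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def domain_files(root: Path, domain: str) -> tuple[Path, Path]:
    """Return (HANDOFF, LOG) paths for one domain, creating them if missing."""
    if not DOMAIN_NAME.fullmatch(domain):
        raise ValueError(f"domain names must be lowercase slugs, got {domain!r}")
    unit = root / "state" / domain
    unit.mkdir(parents=True, exist_ok=True)
    handoff, log = unit / "HANDOFF.md", unit / "LOG.md"
    for path in (handoff, log):
        path.touch(exist_ok=True)
    return handoff, log
```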

Failure Modes of File-Based State

File coordination breaks in predictable ways:

Race conditions: Two agents read HANDOFF, both append to LOG, and both write conflicting HANDOFF updates. The last write wins, and the first agent’s work disappears.
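The lost update is easy to reproduce without real concurrency by interleaving two read-modify-write cycles by hand. This sketch assumes HANDOFF.md stores its state as a JSON object; the helper and task names are illustrative.

```python
import json
from pathlib import Path


def read_handoff(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def write_handoff(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state), encoding="utf-8")


def simulate_lost_update(path: Path) -> dict:
    """Interleave two agents' read-modify-write cycles on HANDOFF.md by hand."""
    write_handoff(path, {"tasks": {}})
    a_view = read_handoff(path)  # Agent A reads
    b_view = read_handoff(path)  # Agent B reads the same snapshot
    a_view["tasks"]["refactor-auth"] = "in_progress"
    b_view["tasks"]["update-docs"] = "in_progress"
    write_handoff(path, a_view)  # A writes first...
    write_handoff(path, b_view)  # ...then B overwrites it with a view that never saw A's change
    return read_handoff(path)
```

The returned state contains only `update-docs`; Agent A's `refactor-auth` entry is gone even though both agents "succeeded".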

Lock contention: File locks serialize access. If Agent A holds the lock for 30 seconds while calling an LLM, Agent B waits. With three agents, you get a queue.
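A sketch of lock-based serialization using POSIX advisory locks through Python's `fcntl.flock` (not available on Windows). It assumes a separate lock file next to HANDOFF.md and a JSON-formatted HANDOFF; `time.sleep` stands in for an LLM call made while the lock is held, which is what turns the lock into a queue.

```python
import fcntl
import json
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(lock_path: Path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with lock_path.open("a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def update_status(handoff: Path, lock_path: Path, task: str, status: str,
                  work_seconds: float = 0.0) -> None:
    """Read-modify-write HANDOFF.md while holding the lock."""
    with exclusive_lock(lock_path):
        state = json.loads(handoff.read_text(encoding="utf-8")) if handoff.exists() else {}
        time.sleep(work_seconds)  # stand-in for a slow LLM call made under the lock
        state[task] = status
        handoff.write_text(json.dumps(state), encoding="utf-8")
```

The lock prevents the lost update, but the waits add up: two agents each holding it for a 30-second call finish about 60 seconds after the first one starts, and a third waits behind both.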

Merge conflicts: Git-based coordination (committing HANDOFF and LOG) creates merge conflicts when branches diverge. Resolving conflicts manually defeats the automation.

State size: An append-only LOG grows without bound. Reading a 50 MB Markdown file on every session start adds latency, and regenerating HANDOFF from LOG means replaying all of it.
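One common mitigation is to rotate LOG.md once it passes a size threshold, archiving the old file under a timestamped name. The 1 MB default and the naming scheme are assumptions, and timestamps have one-second resolution, so this assumes rotations happen less often than that. Regenerate HANDOFF before rotating, since archived entries no longer feed it.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def rotate_log_if_large(log_path: Path, max_bytes: int = 1_000_000) -> Optional[Path]:
    """Archive LOG.md under a timestamped name once it exceeds max_bytes."""
    if not log_path.exists() or log_path.stat().st_size <= max_bytes:
        return None  # small enough: keep appending
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = log_path.with_name(f"{log_path.stem}-{stamp}{log_path.suffix}")
    log_path.rename(archive)
    # Start a fresh LOG.md that points readers at the archive.
    log_path.write_text(f"# LOG (earlier entries: {archive.name})\n", encoding="utf-8")
    return archive
```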

No rollback: If an agent writes garbage to HANDOFF, you need manual intervention or a separate backup strategy. Files don’t give you transactions.
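Files can approximate part of what transactions provide. This sketch writes HANDOFF.md through a temporary file and `os.replace`, which is atomic on POSIX when both paths are on the same filesystem, so readers never see a half-written file. It also keeps the previous version as `HANDOFF.md.bak` for a one-step rollback. The `.bak` naming is an assumption, and nothing here stops two agents from overwriting each other.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_handoff_atomically(path: Path, text: str) -> None:
    """Replace HANDOFF.md in one step, keeping the previous version as a .bak file."""
    if path.exists():
        shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit disk before the swap
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise


def rollback_handoff(path: Path) -> None:
    """Restore the previous HANDOFF.md from its .bak copy (one level of undo)."""
    os.replace(path.with_suffix(path.suffix + ".bak"), path)
```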

Migration Path: Files to Structured State

When the test says you need more than files, here’s the transition:

Step 1: Add a State Abstraction Layer

Wrap file access behind functions:

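from abc import ABC, abstractmethod

class StateManager(ABC):
    """Storage-agnostic interface that agent code depends on."""

    @abstractmethod
    def read_handoff(self) -> dict:
        """Return the current snapshot (today: parse HANDOFF.md; later: query a database)."""

    @abstractmethod
    def append_log(self, entry: dict) -> None:
        """Record one action (today: append to LOG.md; later: insert into a time-series table)."""

    @abstractmethod
    def update_handoff(self, state: dict) -> None:
        """Replace the snapshot (today: overwrite HANDOFF.md; later: a transactional update)."""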

This lets you swap storage without rewriting agent logic.
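A file-backed implementation of the three methods might look like the following. It assumes HANDOFF.md holds a single JSON object and LOG.md holds one JSON object per line, which keeps parsing trivial at the cost of less readable Markdown. The class name is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class FileStateManager:
    """HANDOFF.md holds one JSON object; LOG.md holds one JSON object per line."""

    def __init__(self, root: Path):
        self.handoff_path = root / "HANDOFF.md"
        self.log_path = root / "LOG.md"

    def read_handoff(self) -> dict:
        if not self.handoff_path.exists():
            return {}  # first session: no snapshot yet
        return json.loads(self.handoff_path.read_text(encoding="utf-8"))

    def append_log(self, entry: dict) -> None:
        record = {"ts": datetime.now(timezone.utc).isoformat(), **entry}
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def update_handoff(self, state: dict) -> None:
        # Plain overwrite with no locking: only safe with one writer at a time.
        self.handoff_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```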

Step 2: Choose Your State Store

SQLite: Single-file database. Transactions, concurrent reads, write serialization. Good for local agents or single-machine deployments.

PostgreSQL: Full ACID transactions, row-level locking, replication. Use when multiple machines run agents.

Redis: In-memory state with persistence. Fast reads, pub/sub for coordination. Good for high-frequency updates.

Message queue (RabbitMQ, SQS): Append-only log becomes a queue. Agents consume messages, update state, publish results. Decouples producers and consumers.
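As a concrete instance of the SQLite option, the same three methods can sit on two tables using only Python's standard-library `sqlite3` module. The schema (a single-row `handoff` table holding JSON plus an append-only `log` table), the default `agent_state.db` path, and the class name are assumptions. `with self.conn:` commits each write as its own transaction, or rolls it back on error.

```python
import json
import sqlite3
from datetime import datetime, timezone


class SQLiteStateManager:
    """Single-row handoff table plus an append-only log table."""

    def __init__(self, db_path: str = "agent_state.db"):
        self.conn = sqlite3.connect(db_path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS handoff ("
                "id INTEGER PRIMARY KEY CHECK (id = 1), state TEXT NOT NULL)"
            )
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS log ("
                "seq INTEGER PRIMARY KEY AUTOINCREMENT, ts TEXT NOT NULL, entry TEXT NOT NULL)"
            )

    def read_handoff(self) -> dict:
        row = self.conn.execute("SELECT state FROM handoff WHERE id = 1").fetchone()
        return json.loads(row[0]) if row else {}

    def append_log(self, entry: dict) -> None:
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT INTO log (ts, entry) VALUES (?, ?)",
                (datetime.now(timezone.utc).isoformat(), json.dumps(entry)),
            )

    def update_handoff(self, state: dict) -> None:
        with self.conn:  # readers see the old snapshot or the new one, never a partial write
            self.conn.execute(
                "INSERT OR REPLACE INTO handoff (id, state) VALUES (1, ?)",
                (json.dumps(state),),
            )
```

Agent code that only calls `read_handoff`, `append_log`, and `update_handoff` does not change when this class replaces a file-backed one.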

Step 3: Handle Concurrency

File locks become database transactions:

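# File-based (breaks under concurrency: another agent can write between the read and the write)
handoff = read_handoff()
handoff['status'] = 'in_progress'
write_handoff(handoff)

# Database (PostgreSQL via psycopg 3): the row lock is held until the transaction commits.
# DSN and handoff_id are placeholders; the handoff table is assumed to exist.
import psycopg

with psycopg.connect(DSN) as conn:
    with conn.transaction():
        row = conn.execute(
            "SELECT status FROM handoff WHERE id = %s FOR UPDATE", (handoff_id,)
        ).fetchone()
        if row is not None and row[0] != 'in_progress':
            conn.execute(
                "UPDATE handoff SET status = %s WHERE id = %s", ('in_progress', handoff_id)
            )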

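The FOR UPDATE clause locks the selected row until the transaction commits. A second agent running the same SELECT ... FOR UPDATE blocks until then and reads the committed status, so it cannot act on a stale copy. Plain SELECTs without FOR UPDATE are not blocked and still return the last committed version.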
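SQLite has no `SELECT ... FOR UPDATE`. A rough equivalent is to open the transaction with `BEGIN IMMEDIATE`, which takes SQLite's write lock before the read. A second agent's `BEGIN IMMEDIATE` then waits (up to the connection's busy timeout, 5 seconds by default in Python's `sqlite3`) and afterwards sees the committed status. The lock covers the whole database rather than one row. The `tasks` table, its columns, and `claim_task` are hypothetical, and the connection must be opened with `isolation_level=None` so Python does not issue its own `BEGIN`.

```python
import sqlite3


def claim_task(conn: sqlite3.Connection, task_id: int, agent_id: str) -> bool:
    """Move a task from 'pending' to 'in_progress'; return False if it was already taken."""
    conn.execute("BEGIN IMMEDIATE")  # acquire SQLite's write lock before reading
    try:
        row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
        if row is None or row[0] != "pending":
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE tasks SET status = 'in_progress', owner = ? WHERE id = ?",
            (agent_id, task_id),
        )
        conn.execute("COMMIT")
        return True
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL, owner TEXT)")
    conn.execute("INSERT INTO tasks (id, status) VALUES (1, 'pending')")
    print(claim_task(conn, 1, "agent-a"))  # True
    print(claim_task(conn, 1, "agent-b"))  # False: already in_progress
```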

Step 4: Add Observability

Files give you grep and tail. Databases need structured logging:

  • State transitions: Log every HANDOFF update with timestamp, agent ID, previous state, new state.
  • Query patterns: Track which agents read which state, how often, and latency.
  • Conflict detection: Count transaction retries, lock wait times, deadlocks.
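A minimal sketch of one structured record per HANDOFF transition, written as a JSON line. The field names, including `retries` and `lock_wait_ms` for conflict detection, are assumptions rather than a standard schema.

```python
import json
from datetime import datetime, timezone
from typing import Any, TextIO


def log_transition(stream: TextIO, agent_id: str, key: str, previous: Any, new: Any,
                   retries: int = 0, lock_wait_ms: float = 0.0) -> dict:
    """Write one JSON line describing a HANDOFF change and return the record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": "handoff_transition",
        "agent_id": agent_id,
        "key": key,
        "previous": previous,
        "new": new,
        "retries": retries,            # transaction retries before this write succeeded
        "lock_wait_ms": lock_wait_ms,  # time spent waiting for the lock
    }
    stream.write(json.dumps(record) + "\n")
    return record
```

Because each line is self-contained JSON, the grep-and-tail workflow from the file era still works on the output.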

When to Stay on Files

Files remain the right choice when:

  • You run one agent session at a time
  • State updates happen infrequently (hourly or slower)
  • The entire state fits in a few hundred KB
  • You can rebuild from scratch in minutes
  • You want zero infrastructure dependencies

The Dead Light Framework’s core insight is that stateless agents need durable state, but durable doesn’t mean complex. Files work until concurrency or frequency forces your hand.

Technical Verdict

Use HANDOFF + LOG files when: You have a single-session agent, state updates are infrequent, and you can tolerate sequential access. This covers most prototypes and solo side projects.

Migrate to structured state when: You need concurrent agent sessions, state updates happen every few minutes, or you can’t rebuild from scratch quickly. The three-question test identifies this transition point before file coordination becomes a production incident.

Avoid premature migration: Starting with a database adds complexity you don’t need until the test says otherwise. Files are simpler to debug, version control, and reason about. The framework’s value is knowing when simplicity stops being sufficient.

Tags

agentic-ai orchestration infrastructure

Primary Source

dev.to