Enju: How a Unified Workflow Graph Treats Humans, AI Agents, and Compute as Interchangeable Peers

Enju is a GitHub project that claims to coordinate humans, AI agents, and compute jobs as equivalent nodes in a single workflow graph. The Show HN post (1 point, 1 comment as of May 27, 2026) positions it as a solution for orchestrating heterogeneous tasks without special-casing any participant type. The repository title states the goal is treating all three as “peers on a shared workflow graph.”

The source material available for this article contained only the repository’s GitHub navigation menu, not the actual README or documentation. This article therefore examines the architectural patterns any unified workflow graph would need to implement, the orchestration problems it would solve, and the open questions such a system must address. The technical details below are hypothetical requirements for the pattern, not a review of Enju’s specific implementation.

What the Concept Proposes

Based on the Show HN title and repository description, Enju appears to propose a unified workflow engine where:

Human approval tasks are nodes in a graph
AI agent operations are nodes in the same graph
Compute jobs are nodes in the same graph
All three share a common execution model

The pitch is that this eliminates the need for separate orchestration systems or glue code when workflows span multiple participant types. A compliance workflow that needs an agent to draft a response, a human to approve it, another agent to file paperwork, and a compute job to archive the result would use a single graph representation.

The Architectural Problem This Addresses

Most workflow engines treat human approvals as special cases. You get a dedicated approval node type, a separate UI for task assignment, and custom retry logic when someone is unavailable. Agent nodes get their own abstractions (tool calls, LLM sessions, token budgets). Compute nodes get yet another set of primitives (job scheduling, resource allocation, batch processing).

This fragmentation makes orchestration harder when workflows span all three. If each step uses a different orchestration primitive, you end up with glue code that translates between subsystems.

What a Unified Node Abstraction Must Solve

Any system attempting to treat humans, agents, and compute as interchangeable nodes would need a common interface that works for all three. That interface would likely include:

State tracking: pending, in-progress, completed, failed, timed-out
Input/output contracts: typed parameters and results
Execution policies: timeout, retry count, failure behavior
Dependency resolution: which nodes must complete before this one starts

The challenge is handling node-specific behavior without breaking the abstraction. Human nodes need escalation policies if the assignee doesn’t respond. Agent nodes need token budget tracking and tool call management. Compute nodes need resource allocation and job scheduler integration.

One possible approach is pluggable node handlers that implement the common interface and keep their type-specific logic internal. The workflow engine sees only the external state transitions; the handler manages the internal details.
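
The handler idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names NodeHandler, ExecutionPolicy, NodeResult, and execute are invented for this article and are not taken from Enju. The two handlers are stubs standing in for a job scheduler and a notification system.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any


class NodeState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed-out"


@dataclass
class ExecutionPolicy:
    timeout_seconds: float
    max_retries: int = 0


@dataclass
class NodeResult:
    state: NodeState
    output: dict[str, Any] | None = None
    error: str | None = None


class NodeHandler(ABC):
    """The only surface the engine sees; type-specific logic stays inside."""

    @abstractmethod
    def run(self, inputs: dict[str, Any], policy: ExecutionPolicy) -> NodeResult:
        ...


class ComputeHandler(NodeHandler):
    def run(self, inputs, policy):
        # A real handler would submit to a job scheduler; this stub just sums.
        return NodeResult(NodeState.COMPLETED, {"total": sum(inputs["values"])})


class HumanHandler(NodeHandler):
    def run(self, inputs, policy):
        # A real handler would send a notification and complete later;
        # this stub only reports that it is waiting on the assignee.
        return NodeResult(NodeState.PENDING)


def execute(handler: NodeHandler, inputs: dict[str, Any],
            policy: ExecutionPolicy) -> NodeResult:
    """Engine-side call, identical for every node type."""
    try:
        return handler.run(inputs, policy)
    except Exception as exc:
        # A handler error becomes a FAILED state instead of crashing the engine.
        return NodeResult(NodeState.FAILED, error=str(exc))
```

The engine calls execute the same way for a compute node and a human node; only the returned state differs.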

Node Type Comparison

A unified workflow graph would need to handle fundamentally different execution models. Here is how the three node types compare:

Dimension	Human Node	Agent Node	Compute Node
State Model	Pending until assignee responds	In-progress during LLM/tool calls	Queued, running, completed
Timeout Handling	Escalation to backup assignee	Retry with exponential backoff	Job scheduler requeue
Failure Recovery	Manual override or alternate path	Automatic retry with context	Checkpoint and resume
Observability	Notification delivery + response time	Token usage + tool call logs	Resource utilization + exit code
Blocking Behavior	Can block indefinitely	Fails fast on rate limit	Blocks on resource exhaustion

Any workflow engine implementing this pattern must abstract over these differences while still exposing enough control to handle each type’s failure modes.

State Transitions and Partial Completion

When a node fails mid-execution, a hypothetical workflow engine would need to decide what to do. A possible state machine per node might look like this:

Example state transition model (hypothetical pattern, not Enju’s actual implementation):

pending -> in-progress -> completed
                       -> failed -> retrying -> in-progress
                       -> timed-out -> escalated

The graph tracks which nodes have completed and which are blocked. If a human approval times out, the workflow could route to an escalation node (another human, an override agent, or a fallback path). If an agent fails, the retry policy kicks in. If a compute job crashes, the node transitions to failed and the graph decides whether to retry or abort.
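
The transition model above can be enforced with a small lookup table. The following is a sketch, not Enju’s implementation; ALLOWED_TRANSITIONS, transition, and after_failure are hypothetical names, and the retry-or-abort rule is one simple assumption about how the graph might decide.

```python
# Allowed transitions, transcribed from the example state model above.
ALLOWED_TRANSITIONS = {
    "pending": {"in-progress"},
    "in-progress": {"completed", "failed", "timed-out"},
    "failed": {"retrying"},
    "retrying": {"in-progress"},
    "timed-out": {"escalated"},
    "completed": set(),
    "escalated": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def after_failure(attempts: int, max_retries: int) -> str:
    """Retry while budget remains; otherwise the node stays failed (abort)."""
    if attempts < max_retries:
        return transition("failed", "retrying")
    return "failed"
```

Rejecting illegal moves in one place keeps every handler type honest about the states it reports.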

Partial completion is trickier. If an agent node makes three tool calls and the third one fails, does the node restart from scratch or resume from the last successful call? The workflow engine cannot answer this. The node handler needs to manage internal checkpoints.

Either strategy works well for stateless operations (LLM calls, idempotent API requests). It is harder for stateful operations (database writes, file uploads). If a node writes half a file and crashes, the handler needs to clean up or resume. Without automatic rollback, you build that logic into the node handler or rely on external transaction boundaries.
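
To make the checkpoint idea concrete, here is a minimal sketch of a handler that resumes after the last successful tool call. The names run_steps and make_step are hypothetical, the checkpoint is an in-memory dict where a real handler would persist it, and the sketch only suits steps that are safe to skip once recorded.

```python
from __future__ import annotations

from typing import Callable


def run_steps(steps: list[Callable[[], str]], checkpoint: dict) -> list[str]:
    """Run steps in order, skipping those already recorded in the checkpoint."""
    results = checkpoint.setdefault("results", [])
    for index in range(len(results), len(steps)):
        # If the step raises, earlier results remain in the checkpoint.
        results.append(steps[index]())
    return results


executed = []  # records which steps actually ran, for illustration


def make_step(name: str, fail_times: int = 0) -> Callable[[], str]:
    remaining = {"n": fail_times}

    def step() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError(f"{name} failed")
        executed.append(name)
        return name.upper()

    return step


steps = [make_step("search"), make_step("fetch"), make_step("summarize", fail_times=1)]
checkpoint: dict = {}
try:
    run_steps(steps, checkpoint)  # the third call fails on the first attempt
except RuntimeError:
    pass
final = run_steps(steps, checkpoint)  # the retry resumes at the third call
```

On the retry, the first two calls are not repeated; only the failed third call runs again.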

Workflow Serialization and Versioning

A workflow definition is a directed acyclic graph (DAG) of nodes and edges. A hypothetical JSON representation for a unified workflow graph might look like this:

Example workflow definition (illustrative format for the pattern, not Enju’s actual schema):

{
  "version": "1.0",
  "nodes": [
    {
      "id": "draft_response",
      "type": "agent",
      "handler": "llm_agent_v2",
      "input": {"prompt": "Draft a compliance response"},
      "timeout": "5m"
    },
    {
      "id": "human_approval",
      "type": "human",
      "handler": "slack_approval",
      "input": {"assignee": "legal_team"},
      "timeout": "24h",
      "escalation": "manager_override"
    },
    {
      "id": "file_paperwork",
      "type": "agent",
      "handler": "filing_agent_v1",
      "input": {"document": "$draft_response.output"},
      "timeout": "10m"
    }
  ],
  "edges": [
    {"from": "draft_response", "to": "human_approval"},
    {"from": "human_approval", "to": "file_paperwork"}
  ]
}
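
A definition in this shape can be checked for cycles and turned into an execution order with Python’s standard graphlib module. This is a sketch under the assumption that edges use the "from" and "to" keys shown above; execution_order is a hypothetical helper name, and the definition here is trimmed to IDs and edges.

```python
import json
from graphlib import CycleError, TopologicalSorter


def execution_order(definition: dict) -> list:
    """Return node IDs in dependency order; raise on unknown nodes or cycles."""
    node_ids = {node["id"] for node in definition["nodes"]}
    sorter = TopologicalSorter({node_id: set() for node_id in node_ids})
    for edge in definition["edges"]:
        if edge["from"] not in node_ids or edge["to"] not in node_ids:
            raise ValueError(f"edge references unknown node: {edge}")
        sorter.add(edge["to"], edge["from"])  # "to" depends on "from"
    # static_order() raises CycleError if the graph is not acyclic.
    return list(sorter.static_order())


definition = json.loads("""
{
  "nodes": [{"id": "draft_response"}, {"id": "human_approval"}, {"id": "file_paperwork"}],
  "edges": [
    {"from": "draft_response", "to": "human_approval"},
    {"from": "human_approval", "to": "file_paperwork"}
  ]
}
""")
order = execution_order(definition)
```

Validating at load time means a malformed definition is rejected before any human is notified or any job is scheduled.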

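The handler field would point to a registered node implementation, which lets you swap agent implementations without restructuring the workflow. Because the handler name in this example carries a version, upgrading from llm_agent_v2 to llm_agent_v3 means registering the new handler and pointing new definitions at it (or mapping an unversioned alias in the registry). If the engine pins handlers when a run starts, running workflows continue with the old handler and new workflows use the new one.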
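
One way to get that behavior is a registry keyed by handler name, plus a snapshot of the definition taken when a run starts. The sketch below assumes exactly that; HandlerRegistry, start_run, and run_node are hypothetical names, and the lambdas stand in for real agent implementations.

```python
import copy


class HandlerRegistry:
    """Maps handler names from the workflow definition to callables."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        if name in self._handlers:
            raise ValueError(f"handler already registered: {name}")
        self._handlers[name] = handler

    def resolve(self, name):
        return self._handlers[name]


registry = HandlerRegistry()
registry.register("llm_agent_v2", lambda inputs: "v2 draft: " + inputs["prompt"])
registry.register("llm_agent_v3", lambda inputs: "v3 draft: " + inputs["prompt"])


def start_run(definition):
    """Snapshot the definition so later edits do not affect this run."""
    return copy.deepcopy(definition)


def run_node(run, node_id):
    node = next(n for n in run["nodes"] if n["id"] == node_id)
    return registry.resolve(node["handler"])(node["input"])


definition = {"nodes": [{"id": "draft_response", "handler": "llm_agent_v2",
                         "input": {"prompt": "Draft a compliance response"}}]}
old_run = start_run(definition)                    # started before the upgrade
definition["nodes"][0]["handler"] = "llm_agent_v3"
new_run = start_run(definition)                    # started after the upgrade
```

Here old_run still resolves to the v2 handler while new_run resolves to v3, because each run carries its own copy of the definition.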

Versioning gets messy when you change the workflow structure. If you add a new node between human_approval and file_paperwork, running workflows do not see it. You either let them finish with the old definition or manually intervene.

Open Questions for Unified Workflow Graphs

The unified node abstraction raises several questions that any implementation would need to address:

How does the orchestrator dispatch work? A hypothetical system might poll a database for pending nodes, or it might use an event-driven model where nodes signal completion. Polling is simple, but its query load grows with the number of pending nodes and it adds up to one polling interval of latency per step. An event-driven model is more complex but handles high concurrency better.
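
The difference between the two dispatch styles shows up in how much of the graph each one inspects. In this sketch (all function names are hypothetical, and an in-memory dict stands in for the database), polling scans every node on each tick, while the event-driven path looks only at the successors of the node that just finished.

```python
from collections import defaultdict


def build_predecessors(edges):
    """Map each node ID to the set of node IDs it depends on."""
    preds = defaultdict(set)
    for edge in edges:
        preds[edge["to"]].add(edge["from"])
    return preds


def is_ready(node, states, preds):
    return states[node] == "pending" and all(
        states[p] == "completed" for p in preds.get(node, ())
    )


def poll_ready(states, preds):
    """Polling: scan every node on each tick."""
    return sorted(n for n in states if is_ready(n, states, preds))


def on_completed(node, states, edges, preds):
    """Event-driven: check only the successors of the finished node."""
    states[node] = "completed"
    successors = [e["to"] for e in edges if e["from"] == node]
    return sorted(s for s in successors if is_ready(s, states, preds))


edges = [
    {"from": "draft_response", "to": "human_approval"},
    {"from": "human_approval", "to": "file_paperwork"},
]
states = {"draft_response": "pending", "human_approval": "pending",
          "file_paperwork": "pending"}
preds = build_predecessors(edges)
first_tick = poll_ready(states, preds)
unlocked = on_completed("draft_response", states, edges, preds)
```

Both paths agree on which nodes are runnable; they differ in cost as the number of nodes and workflows grows.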

Where does workflow state live? If the orchestrator is stateless and the database is the source of truth, you get easy horizontal scaling but potential database contention. If the orchestrator holds state in memory, you get better performance but harder failover.
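
With the database as the source of truth, stateless orchestrators need a way to avoid running the same node twice. One common technique is a compare-and-set update, sketched here with SQLite from the Python standard library. The nodes table, its columns, and claim_node are assumptions for illustration, not Enju’s schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (workflow_id TEXT, node_id TEXT, state TEXT, owner TEXT, "
    "PRIMARY KEY (workflow_id, node_id))"
)
conn.execute("INSERT INTO nodes VALUES ('wf-1', 'draft_response', 'pending', NULL)")
conn.commit()


def claim_node(conn, workflow_id, node_id, worker):
    """Atomically move a node from pending to in-progress for one worker.

    The WHERE clause only matches while the node is still pending, so a
    second orchestrator running the same statement updates zero rows.
    """
    cursor = conn.execute(
        "UPDATE nodes SET state = 'in-progress', owner = ? "
        "WHERE workflow_id = ? AND node_id = ? AND state = 'pending'",
        (worker, workflow_id, node_id),
    )
    conn.commit()
    return cursor.rowcount == 1


first_claim = claim_node(conn, "wf-1", "draft_response", "orchestrator-a")
second_claim = claim_node(conn, "wf-1", "draft_response", "orchestrator-b")
```

The first claim succeeds and the second does not. Every claim is a write to the same table, which is where the contention mentioned above comes from.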

How do you handle long-running human approvals? A node that waits 24 hours for a human response ties up resources if the engine keeps the workflow in memory or holds a worker for it. A production system would need to checkpoint the workflow and unload it from memory, or use a separate task queue for human nodes.

What happens when a handler crashes? If an agent node handler dies mid-execution, the workflow engine must detect it and retry. Otherwise the node stays in the in-progress state until a timeout fires, or forever if no timeout is configured.
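
A common way to detect a dead handler is a heartbeat with a lease: the handler periodically records a timestamp, and the engine treats an in-progress node with an old timestamp as crashed. The sketch below assumes that scheme; find_stale_nodes and the last_heartbeat field are hypothetical, and the current time is passed in explicitly to keep the function easy to test.

```python
def find_stale_nodes(nodes, now, lease_seconds):
    """Return IDs of in-progress nodes whose handler has stopped heartbeating."""
    return [
        node["id"]
        for node in nodes
        if node["state"] == "in-progress"
        and now - node["last_heartbeat"] > lease_seconds
    ]


nodes = [
    {"id": "draft_response", "state": "in-progress", "last_heartbeat": 100.0},
    {"id": "file_paperwork", "state": "in-progress", "last_heartbeat": 170.0},
    {"id": "human_approval", "state": "pending", "last_heartbeat": 0.0},
]
# At time 180 with a 30-second lease, only draft_response has gone quiet.
stale = find_stale_nodes(nodes, now=180.0, lease_seconds=30.0)
```

Nodes returned here could be moved to failed and handed to the retry policy instead of waiting for the full node timeout.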

How do you correlate logs across node types? If an agent node fails because the LLM returned invalid JSON, you need to look at the agent’s internal logs. If a compute node fails because the job scheduler rejected it, you need to look at the scheduler’s logs. The workflow engine can provide a correlation ID, but aggregating logs from multiple systems is still manual.
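
On the handler side, attaching the correlation fields to every log line is straightforward with the standard logging module. This sketch uses logging.LoggerAdapter; node_logger and the log format are assumptions, and it should be called once per node because each call attaches a new output handler.

```python
import io
import logging


def node_logger(workflow_id, node_id, stream):
    """Return a logger that stamps every record with workflow and node IDs."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(workflow_id)s %(node_id)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger(f"workflow.{workflow_id}.{node_id}")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example output in one place
    # LoggerAdapter injects the extra fields into each record it emits.
    return logging.LoggerAdapter(
        logger, {"workflow_id": workflow_id, "node_id": node_id}
    )


buffer = io.StringIO()
log = node_logger("wf-42", "draft_response", buffer)
log.error("LLM returned invalid JSON")
```

If every handler and external system logs the same two fields, a search for one workflow ID returns the full story across node types, even though the aggregation itself still happens in a separate log system.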

When Humans Block the Graph

A likely common failure mode in a unified workflow graph is a human approval node that never completes. The assignee is unavailable, the notification is missed, or the person leaves the company. Timeout and escalation policies help, but only if someone configures them for each human node.

Any production implementation would need:

Default timeouts: every human node has a maximum wait time
Escalation chains: if the primary assignee does not respond, route to a backup
Override nodes: let managers or admins force-complete a node
Parallel approvals: require N of M approvers, so one person cannot block the workflow
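
Two of these requirements, escalation chains and N-of-M approvals, reduce to small pure functions. The sketch below is illustrative only; quorum_met and current_assignee are hypothetical names, and it assumes every step in the chain gets the same timeout.

```python
HOUR = 3600


def quorum_met(responses, required):
    """True once at least `required` approvers have approved."""
    return sum(1 for approved in responses.values() if approved) >= required


def current_assignee(chain, elapsed_seconds, step_timeout):
    """Pick who should hold the task now; None means the chain is exhausted."""
    index = int(elapsed_seconds // step_timeout)
    return chain[index] if index < len(chain) else None


chain = ["legal_team", "manager_override"]
at_start = current_assignee(chain, 0, 24 * HOUR)
after_timeout = current_assignee(chain, 25 * HOUR, 24 * HOUR)
exhausted = current_assignee(chain, 49 * HOUR, 24 * HOUR)
approved = quorum_met({"alice": True, "bob": False, "carol": True}, required=2)
```

When the chain is exhausted, the workflow needs an explicit fallback such as an override node or a failed state.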

Agent nodes face similar issues (rate limits, API downtime), as do compute nodes (resource exhaustion, job scheduler failures). Human unpredictability is the distinguishing factor. An agent might fail fast. A human might just ignore the notification.

Observability Needs

When a workflow spans humans, agents, and compute, you need visibility into where things are stuck. Minimum requirements for any implementation:

Node execution logs: what each node did, how long it took, what it returned
State snapshots: the full workflow state at any point in time
Dependency traces: which nodes are waiting on which other nodes

This is enough to answer “why is this workflow stuck?” but not always “why did this node fail?” For the second question you still need the node’s own logs, such as the agent’s record of a malformed LLM response. The workflow engine cannot aggregate those automatically. You correlate them by workflow ID and node ID.

The graph structure makes it easy to visualize bottlenecks. If 50 workflows are blocked on the same human approval node, you see it immediately. If an agent node has a 30% failure rate, you see it in the aggregate metrics.
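
Both aggregate views can be computed from per-node execution records. The following sketch assumes each record has a node_id and a state; blocked_by_node and failure_rates are hypothetical helpers, and the sample data is invented to mirror the numbers in the paragraph above.

```python
from collections import Counter


def blocked_by_node(executions):
    """Count records still waiting on each node."""
    return Counter(
        e["node_id"] for e in executions if e["state"] in ("pending", "in-progress")
    )


def failure_rates(executions):
    """Failed runs divided by finished runs, per node."""
    totals, failures = Counter(), Counter()
    for e in executions:
        if e["state"] in ("completed", "failed"):
            totals[e["node_id"]] += 1
            if e["state"] == "failed":
                failures[e["node_id"]] += 1
    return {node: failures[node] / totals[node] for node in totals}


# Invented sample data: 50 workflows waiting on one approval node,
# and an agent node that failed 3 times out of 10 finished runs.
executions = (
    [{"node_id": "human_approval", "state": "pending"}] * 50
    + [{"node_id": "draft_response", "state": "failed"}] * 3
    + [{"node_id": "draft_response", "state": "completed"}] * 7
)
blocked = blocked_by_node(executions)
rates = failure_rates(executions)
```

Because humans, agents, and compute jobs share one record shape, the same two queries cover all three node types.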

Technical Verdict

The unified workflow graph pattern addresses a real problem: coordinating heterogeneous tasks (human approvals, agent actions, compute jobs) in a single workflow without special-casing any participant type. The unified node abstraction is conceptually clean and could simplify orchestration logic.

However, Enju as a project has minimal community validation (1 point, 1 comment on Show HN as of May 27, 2026). This assessment is based on the Show HN post title and repository description, not a review of the implementation.

Use this pattern when you need to coordinate multi-step workflows that genuinely span humans, agents, and compute, and you want a single execution model for all three. Avoid it if your workflows are purely agent-to-agent or purely compute-to-compute, where the overhead of the unified abstraction does not pay off. Also avoid it if you need automatic rollback or distributed transactions, which require external systems or custom node handlers.

The architecture is promising but unproven. If you are evaluating Enju, read the actual repository README and source code to see how it handles state transitions, partial completion, and failure modes. The concept is sound. The implementation determines whether it is production-ready.

Source Links

Primary: Enju on GitHub
Discussion: Show HN