Trigger.dev started as a webhook-driven Zapier alternative in February 2023 (745 HN points). By October 2023, the team shipped V2 as a Temporal competitor for TypeScript developers (172 points). The pivot exposes a gap in the durable execution market: most agent builders need retries, timeouts, and state persistence without running Go or Java infrastructure.
The shift from webhook automation to durable execution primitives reveals architectural choices that matter when you build multi-step agent workflows. Here is what the plumbing looks like.
Why Webhook Automation Is Not Durable Execution
Trigger.dev V1 connected APIs via webhooks. You defined triggers (GitHub push, Stripe payment) and actions (send Slack message, update database). The runtime handled HTTP retries and basic error handling.
V2 introduced task primitives that survive process crashes:
- Sleep and delay: pause execution for hours or days without holding a connection
- Automatic retries: exponential backoff with configurable limits
- Timeouts: per-step and global execution boundaries
- Fan-out/fan-in: parallel task execution with result aggregation
- Human-in-the-loop pauses: wait for external approval before continuing
These primitives require state checkpointing. A webhook handler restarts from scratch on failure. A durable task resumes from the last checkpoint.
State Persistence Model
Trigger.dev uses a database-backed state machine instead of Temporal’s event sourcing. Each task execution writes state snapshots to Postgres at defined checkpoints:
- Before and after each await boundary
- Before and after tool calls in agent loops
- At explicit checkpoint() calls
On failure, the runtime resumes from the last snapshot. This avoids Temporal’s full event log replay but requires developers to understand where checkpoint boundaries fall.
Key difference: Temporal re-executes your workflow code from the beginning and substitutes recorded results from its event history, which requires the code to be deterministic. Trigger.dev restores state from the database and continues forward. This makes debugging easier (you see actual state, not replayed history) but requires idempotent operations between checkpoints.
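The restore-and-continue model can be sketched in a few lines. This is an illustration only, not Trigger.dev's implementation: the `snapshots` map stands in for the Postgres table, and `runDurable`, `Snapshot`, and `Step` are hypothetical names.

```typescript
// Hypothetical in-memory stand-in for the Postgres snapshot table.
type Snapshot = { nextStep: number; state: number[] };
const snapshots = new Map<string, Snapshot>();

type Step = (state: number[]) => number[];

// Runs steps in order and saves a snapshot after each completed step.
// A later invocation continues from the saved step instead of step 0.
function runDurable(runId: string, steps: Step[]): number[] {
  const saved = snapshots.get(runId) ?? { nextStep: 0, state: [] };
  let state = saved.state;
  for (let i = saved.nextStep; i < steps.length; i++) {
    state = steps[i](state); // may throw (simulated crash)
    snapshots.set(runId, { nextStep: i + 1, state });
  }
  return state;
}

// Demo: step 2 crashes once, then succeeds on the second invocation.
let crashed = false;
const executed: number[] = [];
const demoSteps: Step[] = [0, 1, 2].map((n) => (state: number[]) => {
  if (n === 2 && !crashed) {
    crashed = true;
    throw new Error("simulated crash");
  }
  executed.push(n);
  return [...state, n];
});

let firstError: unknown;
try {
  runDurable("run-1", demoSteps);
} catch (err) {
  firstError = err;
}
const finalState = runDurable("run-1", demoSteps); // resumes at step 2
```

Steps 0 and 1 run exactly once; only the step that was in flight when the crash happened runs again, which is why that step needs to be idempotent.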
Retry Semantics and Failure Recovery
Trigger.dev exposes retry policies at the task level:
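import { task } from "@trigger.dev/sdk/v3";

export const researchAgent = task({
  id: "research-agent",
  retry: {
    maxAttempts: 3,        // give up after 3 attempts
    factor: 2,             // multiply the delay by 2 after each failure
    minTimeoutInMs: 1000,  // delay before the first retry
    maxTimeoutInMs: 10000, // upper bound on any retry delay
  },
  run: async ({ topic }: { topic: string }) => {
    // Task logic with automatic retries
  },
});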
Failures trigger retries with exponential backoff. The runtime tracks attempt count in the database. If all retries fail, the task moves to a dead-letter queue.
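To make the backoff arithmetic concrete, here is a minimal sketch of how a capped exponential schedule is commonly computed. The field names `minDelayMs` and `maxDelayMs` and both functions are hypothetical, and the sketch omits jitter, which real runtimes often add.

```typescript
// Hypothetical policy shape mirroring the retry settings discussed above.
interface RetryPolicy {
  maxAttempts: number;
  factor: number;
  minDelayMs: number;
  maxDelayMs: number;
}

// Delay in ms to wait before retry number `retry` (1 = first retry).
function backoffDelay(policy: RetryPolicy, retry: number): number {
  const raw = policy.minDelayMs * Math.pow(policy.factor, retry - 1);
  return Math.min(raw, policy.maxDelayMs);
}

// Every delay a run waits through before it gives up.
// maxAttempts counts the first run, so there are maxAttempts - 1 retries.
function retrySchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    delays.push(backoffDelay(policy, retry));
  }
  return delays;
}

const threeAttempts = retrySchedule({ maxAttempts: 3, factor: 2, minDelayMs: 1000, maxDelayMs: 10000 });
const sixAttempts = retrySchedule({ maxAttempts: 6, factor: 2, minDelayMs: 1000, maxDelayMs: 10000 });
```

Under these assumptions, three attempts wait 1000 ms and then 2000 ms. With six attempts the fifth delay would be 16000 ms uncapped, so the cap holds it at 10000 ms.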
Replay behavior: On retry, the runtime loads the last checkpoint and re-executes from that point. Side effects before the checkpoint do not repeat. Side effects after the checkpoint may repeat unless you add idempotency keys.
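One common way to make a post-checkpoint side effect safe to repeat is to key it by a stable identifier. The sketch below is an assumption-laden illustration: `completed`, `chargeCustomer`, and `chargeOnce` are hypothetical, and in practice the key is usually sent to the external service so that it can deduplicate on its side.

```typescript
// Hypothetical store of completed side effects, keyed by idempotency key.
const completed = new Map<string, string>();
let chargesSent = 0;

// Stand-in for a non-idempotent external call (for example, a payment API).
function chargeCustomer(amountCents: number): string {
  chargesSent++;
  return `receipt-${chargesSent}-${amountCents}`;
}

// Performs the charge at most once per key and returns the stored receipt
// when the same key is seen again.
function chargeOnce(idempotencyKey: string, amountCents: number): string {
  const previous = completed.get(idempotencyKey);
  if (previous !== undefined) return previous;
  const receipt = chargeCustomer(amountCents);
  completed.set(idempotencyKey, receipt);
  return receipt;
}

// Derive the key from the run and step so that a retry reuses it.
const key = "run-42:step-charge";
const first = chargeOnce(key, 500);
const second = chargeOnce(key, 500); // simulated re-execution after a retry
```

Both calls return the same receipt and the external call happens once, so re-running the code after the checkpoint does not charge the customer twice.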
Agent Loop Architecture
The example below runs an agent loop of up to 10 iterations with tool calls. Each iteration checkpoints before and after generateText():
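// Assumes generateText from "ai" and anthropic from "@ai-sdk/anthropic";
// messages, the tools, and executeTool are defined elsewhere in the task.
for (let i = 0; i < 10; i++) {
  const { text, toolCalls, steps } = await generateText({
    model: anthropic("claude-opus-4-20250514"),
    messages, // pass the running conversation so tool results reach the model
    tools: { search, browse, analyze },
    maxSteps: 5,
  });
  // No tool calls means the model has produced its final answer
  if (!toolCalls.length) {
    return { summary: text, stepsUsed: steps.length };
  }
  for (const call of toolCalls) {
    const result = await executeTool(call);
    // Simplified message shape for illustration
    messages.push({ role: "tool", content: result });
  }
}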
If the task crashes during iteration 7, the runtime resumes at iteration 7 with the messages array restored from the checkpoint. The LLM call repeats, which may produce different tool calls due to non-determinism.
Observability: The dashboard shows each checkpoint, tool call, and retry. You can inspect the messages array at any point in the loop. This beats reading Temporal event logs for debugging agent behavior.
Deployment Models and Worker Architecture
Trigger.dev offers three deployment options:
| Model | Worker Location | State Storage | Use Case |
|---|---|---|---|
| Cloud | Managed containers | Trigger.dev Postgres | Fast onboarding, no ops |
| Hybrid | Your infrastructure | Trigger.dev Postgres | Data locality, custom compute |
| Self-hosted | Your infrastructure | Your Postgres | Full control, air-gapped environments |
Cloud deployment runs workers in ephemeral containers with automatic scaling. Hybrid keeps workers in your VPC but uses Trigger.dev for state and orchestration. Self-hosted gives you the full stack.
Worker lifecycle: Workers poll the Trigger.dev API for tasks. When a task arrives, the worker loads the checkpoint, executes the next step, writes a new checkpoint, and returns the result. This differs from Temporal’s long-lived worker pools.
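The poll, load, execute, write cycle can be sketched as a single function. Everything here is hypothetical scaffolding rather than the real worker protocol: `pending` stands in for the task queue, `checkpoints` for Postgres, and each run is assumed to have at least one step.

```typescript
// Hypothetical shapes; the real API and storage are not shown in this article.
interface Run {
  id: string;
  steps: Array<(acc: number) => number>;
}
interface Checkpoint {
  nextStep: number;
  acc: number;
}

const pending: Run[] = []; // stand-in for the task queue
const checkpoints = new Map<string, Checkpoint>(); // stand-in for Postgres

// One poll cycle: take a run, load its checkpoint, execute a single step,
// write the new checkpoint, and requeue the run if steps remain.
function pollOnce(): string | null {
  const run = pending.shift();
  if (!run) return null; // nothing to do
  const cp = checkpoints.get(run.id) ?? { nextStep: 0, acc: 0 };
  const acc = run.steps[cp.nextStep](cp.acc);
  const next: Checkpoint = { nextStep: cp.nextStep + 1, acc };
  checkpoints.set(run.id, next);
  if (next.nextStep < run.steps.length) pending.push(run);
  return run.id;
}

pending.push({ id: "a", steps: [(n) => n + 1, (n) => n * 10] });
pending.push({ id: "b", steps: [(n) => n + 5] });

const order: string[] = [];
let picked: string | null;
while ((picked = pollOnce()) !== null) order.push(picked);
```

Because the worker holds no state between cycles, run "a" is picked up twice with run "b" in between, and any worker could have handled either pickup.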
Concurrency and Queue Control
Trigger.dev exposes concurrency limits at the task and queue level:
- Task-level: limit concurrent executions of a single task
- Queue-level: limit concurrent executions across multiple tasks in a queue
- Global: account-wide concurrency caps
Queues use priority ordering. High-priority tasks jump the queue. This matters for agent workflows where user-facing tasks need faster response than background research.
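A minimal sketch of how a concurrency limit and priority ordering interact is shown below. The class and its method names are hypothetical, and "higher number runs first" is an assumption made for the example, not a statement about Trigger.dev's priority scale.

```typescript
interface QueuedTask {
  id: string;
  priority: number; // assumption: a higher number runs first
}

class PriorityQueueWithLimit {
  private waiting: QueuedTask[] = [];
  private running = new Set<string>();
  private concurrencyLimit: number;

  constructor(concurrencyLimit: number) {
    this.concurrencyLimit = concurrencyLimit;
  }

  enqueue(task: QueuedTask): void {
    this.waiting.push(task);
  }

  // Starts as many waiting tasks as the limit allows, highest priority first.
  // Array.prototype.sort is stable, so equal priorities keep arrival order.
  dispatch(): string[] {
    this.waiting.sort((a, b) => b.priority - a.priority);
    const started: string[] = [];
    while (this.running.size < this.concurrencyLimit && this.waiting.length > 0) {
      const task = this.waiting.shift()!;
      this.running.add(task.id);
      started.push(task.id);
    }
    return started;
  }

  complete(id: string): void {
    this.running.delete(id);
  }
}

const queue = new PriorityQueueWithLimit(2);
queue.enqueue({ id: "background-research", priority: 0 });
queue.enqueue({ id: "user-chat", priority: 10 });
queue.enqueue({ id: "nightly-report", priority: 0 });

const firstBatch = queue.dispatch();
queue.complete("user-chat");
const secondBatch = queue.dispatch();
```

The user-facing task starts first even though it arrived second, and the third task waits until a slot frees up.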
Failure mode: If workers crash, tasks stay in the queue. New workers pick them up and resume from the last checkpoint. If the database goes down, all execution stops. There is no local state fallback.
Comparison to LangGraph and Inngest
| Feature | Trigger.dev | LangGraph | Inngest |
|---|---|---|---|
| State model | Database snapshots | In-memory graph | Event log |
| Retry logic | Exponential backoff | Manual checkpoints | Automatic with steps |
| Observability | Dashboard + traces | Graph visualization | Event timeline |
| Deployment | Managed or self-hosted | Bring your own runtime | Managed cloud |
| TypeScript-native | Yes | Yes (via LangGraph.js; Python-first) | Yes |
LangGraph gives you graph-based state machines with explicit edges between nodes. You control checkpointing and branching. Trigger.dev abstracts the graph into task primitives with automatic checkpointing.
Inngest uses event-driven steps. Each step is a separate function invocation. Trigger.dev keeps the entire task in one function with checkpoints at await boundaries.
When to use which: Use LangGraph for complex agent graphs with conditional branching. Use Inngest for event-driven microservices. Use Trigger.dev for long-running tasks with retries and timeouts in a single function.
Security Boundaries and Secrets Management
Trigger.dev stores secrets in environment variables encrypted at rest. Workers decrypt secrets at runtime. There is no secret rotation API yet.
Isolation: Each task runs in a separate container (cloud) or process (self-hosted). Tasks cannot access each other’s memory or file systems. Database-level isolation prevents tasks from reading other tasks’ checkpoints.
Audit trail: The dashboard logs every task execution, checkpoint, and retry. You can trace which secrets were accessed during each step. This helps with compliance audits.
Observability and Debugging
The dashboard shows:
- Real-time task execution status
- Checkpoint history with state snapshots
- Retry attempts and failure reasons
- Tool call inputs and outputs
- Execution timeline with step durations
You can replay a failed task from any checkpoint. This beats reading logs or event streams. The tracing view shows parent-child relationships for fan-out tasks.
Missing features: No distributed tracing integration (OpenTelemetry support is on the roadmap). No custom metrics API. You cannot export traces to Datadog or Honeycomb yet.
Likely Failure Modes
Database bottleneck: Every checkpoint writes to Postgres. High-frequency tasks (sub-second iterations) can saturate the database. The team recommends batching checkpoints or using longer sleep intervals.
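Batching checkpoints can be as simple as persisting every N iterations instead of every iteration. The sketch below assumes a hypothetical `saveCheckpoint` function that stands in for one database write.

```typescript
let checkpointWrites = 0;
let lastSaved = { iteration: 0, total: 0 };

// Stand-in for a database write; counts how often it is called.
function saveCheckpoint(state: { iteration: number; total: number }): void {
  checkpointWrites++;
  lastSaved = { ...state };
}

// Processes `count` fast iterations but persists only every `batchSize`
// iterations, plus once at the end so the final state is never lost.
function runBatched(count: number, batchSize: number): number {
  let total = 0;
  for (let i = 1; i <= count; i++) {
    total += i;
    if (i % batchSize === 0 || i === count) {
      saveCheckpoint({ iteration: i, total });
    }
  }
  return total;
}

const grandTotal = runBatched(1000, 100);
```

Here 1000 iterations produce 10 writes instead of 1000. The trade-off is that a crash can lose up to `batchSize - 1` iterations of work, all of which must be safe to redo.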
Non-deterministic replays: If your task reads from an external API without caching, retries may produce different results. Add idempotency keys or cache responses between checkpoints.
Worker starvation: If all workers are busy with long-running tasks, new tasks queue up. Set concurrency limits and use priority queues to prevent starvation.
Checkpoint bloat: Large state objects (multi-megabyte messages arrays) slow down checkpoint writes and reads. Compress or paginate state between checkpoints.
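One way to paginate state is to keep only the most recent messages in the checkpointed object and move older ones to a separate page referenced by key. This is a sketch under assumptions: `archivedPages`, `compactMessages`, and the page-key format are hypothetical, and an agent that needs the older context would also have to carry a summary of it.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical cold store for archived pages of old messages.
const archivedPages = new Map<string, ChatMessage[]>();

// Keeps the newest `keep` messages in the checkpointed state and moves the
// rest to a separate page, leaving only a reference in the hot state.
function compactMessages(
  runId: string,
  messages: ChatMessage[],
  keep: number,
  pageRefs: string[],
): { messages: ChatMessage[]; pageRefs: string[] } {
  if (messages.length <= keep) return { messages, pageRefs };
  const cut = messages.length - keep;
  const ref = `${runId}:page-${pageRefs.length}`;
  archivedPages.set(ref, messages.slice(0, cut));
  return { messages: messages.slice(cut), pageRefs: [...pageRefs, ref] };
}

const fullHistory: ChatMessage[] = Array.from({ length: 25 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i}`,
}));
const compacted = compactMessages("run-7", fullHistory, 10, []);
```

With 25 messages and `keep` set to 10, the checkpointed state holds messages 15 through 24 plus one page reference, and the 15 older messages live outside the hot path.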
Technical Verdict
Use Trigger.dev when:
- You need durable execution in TypeScript without running Temporal infrastructure
- Your agent workflows have 10-1000 steps with retries and timeouts
- You want a managed platform with automatic scaling and observability
- You can tolerate database-backed state instead of in-memory execution
Avoid Trigger.dev when:
- You need sub-second task latency (checkpoints add 50-200ms overhead)
- Your workflows require complex branching logic better suited to graph-based orchestration
- You already run Temporal and need cross-language workflow support
- You need air-gapped deployment without internet access to the control plane (unless you self-host)
The V2 pivot shows what happens when you listen to users building real agent workflows. The shift from webhooks to durable execution primitives exposes the plumbing that matters: state checkpointing, retry semantics, and deployment flexibility. For TypeScript teams building multi-step agent tasks, Trigger.dev fills a gap between simple job queues and heavyweight workflow engines.