Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 points on Hacker News. Eight months later, the team shipped V2 with a completely different pitch: a Temporal alternative for TypeScript developers. That pivot exposes a real infrastructure gap. Developers building agent workflows don’t need another webhook router. They need durable execution primitives that survive crashes, handle retries, and manage state without forcing them to learn a distributed systems framework.
The V2 architecture reveals what happens when you strip durable execution down to the primitives TypeScript developers actually use: tasks, retries, queues, and observability. No workflow DSL. No separate Temporal server cluster. Just functions that don’t die when your process does.
Why the Pivot Happened
V1 focused on event-driven integrations. Webhook comes in, trigger fires, action runs. The pattern works for simple automations but breaks down for long-running agent tasks. When an LLM call takes 45 seconds and your serverless function times out at 30, you need execution that survives process boundaries.
Early users kept asking for:
- Tasks that resume after crashes
- Retry logic that doesn’t require manual queue management
- State persistence without standing up Redis
- Observability into multi-step workflows
These are the same problems Temporal solves, but Temporal’s learning curve is steep. You write workflows as code under strict determinism constraints, run a separate server cluster, and think in terms of activities and signals. For a team shipping an AI agent that calls three APIs and waits for user input, that’s too much infrastructure.
The V2 Execution Model
Trigger.dev V2 treats every task as a durable function. You write normal TypeScript. The runtime handles persistence, replay, and failure recovery.
```typescript
// Assumption: task() as exported by the published SDK entry point.
import { task } from "@trigger.dev/sdk/v3";

// fetchDocument, llm, waitForApproval, and publishResults are
// application-defined helpers, not part of the SDK.
export const processDocument = task({
  id: "process-document",
  run: async (payload: { documentId: string }) => {
    // This runs durably: it survives crashes and retries
    const doc = await fetchDocument(payload.documentId);

    // Long-running LLM call, no timeout panic
    const analysis = await llm.analyze(doc.content);

    // Wait for an external event: the task suspends, it doesn't block
    const approval = await waitForApproval(doc.id);
    if (approval.approved) {
      await publishResults(analysis);
    }

    return { status: "complete", analysisId: analysis.id };
  },
});
```
The runtime checkpoints state after each await. If the process dies, the task resumes from the last checkpoint. If an API call fails, the retry policy kicks in without you writing queue consumer logic.
State Persistence Without the Ceremony
Temporal persists state by recording every workflow decision in an event log. Replay reconstructs state by re-executing the workflow code against that log. This works but requires deterministic execution. You can’t call Math.random() or Date.now() directly because replay must produce identical results.
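To make the determinism constraint concrete, here is a minimal sketch of event-log replay in plain TypeScript. It is not Temporal’s API: `EventLog`, `recordOrReplay`, and `workflow` are hypothetical names chosen for illustration. The point is only that nondeterministic values must be recorded once and read back on replay.

```typescript
// Hypothetical, simplified model of event-sourced replay. Not Temporal's API.
type LoggedEvent = { seq: number; result: unknown };

class EventLog {
  private events: LoggedEvent[] = [];
  private cursor = 0;

  // Start a fresh pass over the log (a new replay).
  rewind(): void {
    this.cursor = 0;
  }

  // First execution: run the side effect and record its result.
  // Replay: return the recorded result instead of re-running the effect.
  recordOrReplay<T>(effect: () => T): T {
    const seq = this.cursor++;
    const existing = this.events[seq];
    if (existing !== undefined) {
      return existing.result as T;
    }
    const result = effect();
    this.events.push({ seq, result });
    return result;
  }
}

// Workflow code routes every nondeterministic value through the log.
function workflow(log: EventLog): string {
  const startedAt = log.recordOrReplay(() => Date.now());
  const token = log.recordOrReplay(() => Math.random().toString(36).slice(2));
  return `${startedAt}:${token}`;
}

const log = new EventLog();
const firstRun = workflow(log);
log.rewind();
const replayed = workflow(log); // identical to firstRun, no new side effects
```

If `workflow` called `Date.now()` directly instead of going through the log, the second pass would produce a different string and the replay would no longer match the recorded history.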
Trigger.dev takes a simpler approach: checkpoint serialization. After each async boundary, the runtime serializes local variables and stores them. On resume, it deserializes and continues. This trades replay flexibility for developer ergonomics.
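The checkpoint idea can be sketched without any SDK. The following is an illustrative model, not Trigger.dev’s internals: `runWithCheckpoints`, `memoryStore`, and the step functions are hypothetical, and state is persisted as JSON after each completed async step so a rerun skips work that already finished.

```typescript
// Hypothetical sketch of checkpoint-based resume. Not Trigger.dev's internals.
type Checkpoint = { step: number; state: string }; // state is serialized JSON
type Step<S> = (state: S) => Promise<S>;
type CheckpointStore = {
  load(): Checkpoint | undefined;
  save(c: Checkpoint): void;
};

async function runWithCheckpoints<S>(
  steps: Step<S>[],
  initial: S,
  store: CheckpointStore,
): Promise<S> {
  const saved = store.load();
  // Resume from the saved step, or start from the beginning.
  let index = saved ? saved.step : 0;
  let state: S = saved ? (JSON.parse(saved.state) as S) : initial;

  while (index < steps.length) {
    state = await steps[index](state);
    index += 1;
    // Persist after every completed async step.
    store.save({ step: index, state: JSON.stringify(state) });
  }
  return state;
}

// In-memory stand-in for durable storage.
function memoryStore(): CheckpointStore {
  let current: Checkpoint | undefined;
  return {
    load: () => current,
    save: (c: Checkpoint) => {
      current = c;
    },
  };
}

type DocState = { fetched: boolean; analyzed: boolean };
const calls = { fetch: 0, analyze: 0 };
let crashOnce = true;

const docSteps: Step<DocState>[] = [
  async (s) => {
    calls.fetch += 1;
    return { ...s, fetched: true };
  },
  async (s) => {
    calls.analyze += 1;
    if (crashOnce) {
      crashOnce = false;
      throw new Error("simulated crash");
    }
    return { ...s, analyzed: true };
  },
];

async function demo(): Promise<DocState> {
  const store = memoryStore();
  const start: DocState = { fetched: false, analyzed: false };
  try {
    await runWithCheckpoints(docSteps, start, store);
  } catch {
    // The "process" died during step 2; the checkpoint after step 1 survives.
  }
  // Second run resumes at step 2 without repeating the fetch.
  return runWithCheckpoints(docSteps, start, store);
}
```

Note that nothing here requires the step functions to be deterministic. The cost is that only the latest serialized state is available, not the history that produced it.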
| Aspect | Temporal | Trigger.dev V2 |
|---|---|---|
| State model | Event sourcing + replay | Checkpoint serialization |
| Determinism | Required (no random, no Date.now) | Not required |
| Code constraints | Deterministic workflow code, side effects in activities | Normal TypeScript |
| Failure recovery | Replay from event log | Resume from checkpoint |
| Infrastructure | Separate server cluster | Managed runtime |
| Learning curve | ~2 weeks (workflows, activities, signals) | ~2 days (task primitives) |
The checkpoint model has limits. You can’t time-travel through execution history or replay with different code. But for agent workflows that need “run this until it finishes, even if stuff crashes,” it’s enough.
Retry and Timeout Plumbing
Durable execution without retry logic is just expensive logging. V2 exposes retry configuration at the task level:
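```typescript
// Assumption: task() as exported by the published SDK entry point.
import { task } from "@trigger.dev/sdk/v3";

// unreliableApi is an application-defined client, not part of the SDK.
export const flakeyApiCall = task({
  id: "flakey-api",
  retry: {
    maxAttempts: 5,
    factor: 2,
    // Delay bounds in milliseconds; the published SDK names these
    // minTimeoutInMs and maxTimeoutInMs.
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 30000,
    randomize: true,
  },
  run: async (payload) => {
    // Automatic exponential backoff with jitter
    return await unreliableApi.call(payload);
  },
});
```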
The runtime tracks attempt count, calculates backoff, and reschedules failed tasks. You don’t write queue consumers or dead-letter handlers. The failure mode is built into the execution primitive.
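For intuition about what the runtime computes between attempts, here is a sketch of a common exponential-backoff formula. The `BackoffPolicy` type and `backoffDelay` function are hypothetical, the field names `minDelayMs` and `maxDelayMs` stand in for the minimum and maximum timeout options, and the jitter scheme is one reasonable choice rather than the one Trigger.dev necessarily uses.

```typescript
// Hypothetical helper: delay before the retry that follows a failed attempt.
type BackoffPolicy = {
  maxAttempts: number;
  factor: number;
  minDelayMs: number;
  maxDelayMs: number;
  randomize: boolean;
};

function backoffDelay(
  policy: BackoffPolicy,
  attempt: number, // 1-based index of the attempt that just failed
  random: () => number = Math.random,
): number | undefined {
  // No attempts left: the caller should mark the run as failed.
  if (attempt >= policy.maxAttempts) return undefined;

  const exponential = policy.minDelayMs * Math.pow(policy.factor, attempt - 1);
  const capped = Math.min(exponential, policy.maxDelayMs);
  if (!policy.randomize) return capped;

  // Jitter: scale by a factor in [1, 2), then re-apply the cap.
  return Math.min(capped * (1 + random()), policy.maxDelayMs);
}

const examplePolicy: BackoffPolicy = {
  maxAttempts: 5,
  factor: 2,
  minDelayMs: 1000,
  maxDelayMs: 30000,
  randomize: false,
};

// With jitter disabled: 1000, 2000, 4000, 8000 ms, then undefined.
const schedule = [1, 2, 3, 4, 5].map((attempt) =>
  backoffDelay(examplePolicy, attempt),
);
```

Jitter matters when many tasks fail at once, for example during an upstream outage: without it, every retry lands on the recovering service at the same instant.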
Timeouts work differently than on serverless platforms. Instead of killing the process, V2 lets tasks declare an expected duration and warns when they exceed it. Long-running agent loops can run for hours without hitting artificial limits.
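The difference between a hard timeout and a warning can be shown in a few lines. This is a sketch under assumptions: `withSoftDeadline` and `sleep` are hypothetical helpers, not SDK functions. The wrapper observes the work and reports an overrun, but never cancels it.

```typescript
// Hypothetical helper: warn when work exceeds its expected duration,
// but let it run to completion.
async function withSoftDeadline<T>(
  work: Promise<T>,
  expectedMs: number,
  onOverrun: (expectedMs: number) => void,
): Promise<T> {
  const timer = setTimeout(() => onOverrun(expectedMs), expectedMs);
  try {
    return await work; // never cancelled, only observed
  } finally {
    clearTimeout(timer);
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Usage: the slow call still returns its result; the callback only reports.
async function slowAgentStep(): Promise<string> {
  await sleep(60);
  return "done";
}

const overruns: number[] = [];
const softResult = withSoftDeadline(slowAgentStep(), 5, (ms) => {
  overruns.push(ms);
});
```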
Concurrency and Queue Semantics
Agent workflows often need controlled parallelism. You want to process 100 documents but only hit the LLM API 5 times concurrently. V2 exposes this as queue configuration:
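```typescript
// Assumption: task() as exported by the published SDK entry point.
import { task } from "@trigger.dev/sdk/v3";

// expensiveLLMCall is an application-defined helper, not part of the SDK.
export const analyzeDocument = task({
  id: "analyze-document",
  queue: {
    name: "document-analysis",
    concurrencyLimit: 5,
  },
  run: async (payload) => {
    // Only 5 of these run at once; the rest wait in the queue
    return await expensiveLLMCall(payload);
  },
});
```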
The queue is virtual. You don’t provision RabbitMQ or manage SQS consumers. The runtime schedules tasks and enforces limits. This matters for agent orchestration because you often need to throttle external API calls or limit concurrent database connections.
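What a concurrency limit enforces can be modeled in-process with a small semaphore. This sketch is illustrative only: `ConcurrencyLimiter` and `fakeLLMCall` are hypothetical, and a real runtime enforces the limit across machines rather than inside one process.

```typescript
// Hypothetical in-process model of a concurrency-limited queue.
class ConcurrencyLimiter {
  private readonly limit: number;
  private active = 0;
  private waiting: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(job: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing job hands its slot to us.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await job();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // hand the slot over directly, so the limit is never exceeded
      } else {
        this.active -= 1;
      }
    }
  }
}

// Run 20 jobs with a limit of 5 and record the peak parallelism.
async function demoLimiter(): Promise<number> {
  const limiter = new ConcurrencyLimiter(5);
  let running = 0;
  let peak = 0;
  const fakeLLMCall = async (): Promise<void> => {
    running += 1;
    peak = Math.max(peak, running);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    running -= 1;
  };

  const pending: Promise<void>[] = [];
  for (let i = 0; i < 20; i += 1) {
    pending.push(limiter.run(fakeLLMCall));
  }
  await Promise.all(pending);
  return peak;
}
```

Handing the slot directly to the next waiter, rather than decrementing and letting waiters race for it, keeps the active count from briefly exceeding the limit.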
Observability Without Instrumentation Tax
Temporal gives you detailed execution history but requires you to understand its event model. V2 takes a different approach: automatic tracing with minimal setup.
Every task execution generates:
- Start and end timestamps
- Retry attempts and outcomes
- Checkpoint boundaries
- Child task relationships
- Error stack traces
The dashboard shows execution graphs without requiring OpenTelemetry configuration or custom spans. For debugging agent workflows, this is the right default. You can see which LLM call failed, how many times it retried, and what state it had when it crashed.
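As a rough mental model of the data behind such a view, the fields listed above could be represented as follows. The `RunTrace` and `AttemptRecord` shapes and the `failedAttempts` helper are hypothetical and do not reflect Trigger.dev’s actual trace schema.

```typescript
// Hypothetical trace shape: one record per attempt, nested child runs.
type AttemptRecord = {
  attempt: number;
  startedAt: number; // epoch milliseconds
  endedAt: number;
  error?: string;
};

type RunTrace = {
  taskId: string;
  attempts: AttemptRecord[];
  children: RunTrace[];
};

type Failure = { taskId: string; attempt: number; error: string };

// Walk the run tree and collect every failed attempt, parent first.
function failedAttempts(trace: RunTrace): Failure[] {
  const own: Failure[] = [];
  for (const record of trace.attempts) {
    if (record.error !== undefined) {
      own.push({ taskId: trace.taskId, attempt: record.attempt, error: record.error });
    }
  }
  let all = own;
  for (const child of trace.children) {
    all = all.concat(failedAttempts(child));
  }
  return all;
}

const sampleTrace: RunTrace = {
  taskId: "analyze-document",
  attempts: [
    { attempt: 1, startedAt: 0, endedAt: 1200, error: "LLM request timed out" },
    { attempt: 2, startedAt: 2200, endedAt: 3100 },
  ],
  children: [
    {
      taskId: "publish-results",
      attempts: [{ attempt: 1, startedAt: 3100, endedAt: 3150, error: "HTTP 503" }],
      children: [],
    },
  ],
};

const failures = failedAttempts(sampleTrace);
```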
Deployment Shape
Trigger.dev V2 runs as a managed service. You write tasks in your codebase, deploy them to Trigger’s infrastructure, and trigger them via SDK or API. This is simpler than self-hosting Temporal but creates vendor lock-in.
The open-source repo (github.com/triggerdotdev/trigger.dev) includes the task runtime and SDK. You can run it locally for development. Production deployment currently requires their hosted platform, though self-hosting is on the roadmap.
For teams building agent workflows, this trade-off often makes sense. Standing up Temporal requires Kubernetes expertise and ongoing operational overhead. Trigger’s managed approach gets you durable execution without the infrastructure tax.
Failure Modes and Boundaries
Durable execution systems fail in specific ways. Most of the failure modes below stem from the checkpoint model: it serializes local variables but not live connections or external state references.
Checkpoint bloat: If your task accumulates large objects in local variables, checkpoint size grows. V2 doesn’t currently expose checkpoint size limits, so runaway state can trigger a serialization error at checkpoint time and abort the task.
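One defensive habit is to keep only references in task state and to check the serialized size yourself. The sketch below is an assumption-laden illustration: `MAX_CHECKPOINT_CHARS` is an arbitrary budget, not a documented platform limit, and `assertCheckpointFits` is a hypothetical helper that uses JSON length as a rough size proxy.

```typescript
// Hypothetical budget; the platform's real limit, if any, is not documented here.
const MAX_CHECKPOINT_CHARS = 256 * 1024;

// Throws if the JSON form of the state exceeds the budget; returns its size.
function assertCheckpointFits(
  state: object,
  maxChars: number = MAX_CHECKPOINT_CHARS,
): number {
  const size = JSON.stringify(state).length;
  if (size > maxChars) {
    throw new Error(`checkpoint state is ${size} chars, budget is ${maxChars}`);
  }
  return size;
}

// Lean: keep an ID and re-fetch the document when it is needed.
const leanState = { documentId: "doc-123", analysisId: "an-456" };

// Bloated: the whole document body lives in a local variable.
const bloatedState = { documentId: "doc-123", content: "x".repeat(1000000) };

const leanSize = assertCheckpointFits(leanState);
```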
Non-serializable state: WebSocket connections, file handles, and other stateful objects don’t survive checkpoints. If a checkpoint captures a WebSocket connection and the process restarts, resume will fail because WebSocket objects aren’t serializable. You must reconnect after resume, which means storing connection parameters (host, port, credentials) separately from the connection object itself.
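The reconnect-after-resume pattern looks roughly like this. Everything here is hypothetical: `FakeSocket` is a stand-in for a real WebSocket client, and `credentialRef` is assumed to be the name of a secret looked up at connect time, so the secret itself never enters the checkpoint.

```typescript
// Only plain data goes into the checkpoint.
type ConnectionParams = { host: string; port: number; credentialRef: string };
type ResumableState = { connection: ConnectionParams; lastMessageId: number };

// Stand-in for a live connection object, which cannot be serialized.
class FakeSocket {
  readonly url: string;

  constructor(params: ConnectionParams) {
    this.url = `wss://${params.host}:${params.port}`;
  }
}

function toCheckpoint(state: ResumableState): string {
  return JSON.stringify(state);
}

// On resume, rebuild the connection from the stored parameters.
function resumeFromCheckpoint(serialized: string): {
  state: ResumableState;
  socket: FakeSocket;
} {
  const state = JSON.parse(serialized) as ResumableState;
  return { state, socket: new FakeSocket(state.connection) };
}

const beforeCrash: ResumableState = {
  connection: { host: "example.test", port: 443, credentialRef: "feed-api-key" },
  lastMessageId: 42,
};
const resumed = resumeFromCheckpoint(toCheckpoint(beforeCrash));
```

Storing `lastMessageId` alongside the parameters lets the task ask the remote side to continue from where it left off, assuming the remote protocol supports that.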
Replay divergence: Unlike Temporal’s deterministic replay, checkpoint-based resume can diverge if external state changes between attempts. If you checkpoint after reading a database row, then the row gets deleted, resume will fail.
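A common mitigation is to treat checkpointed reads as possibly stale and re-validate them on resume. The sketch below assumes rows carry a `version` counter; `revalidateOnResume` and the `fetchRow` callback are hypothetical names, not SDK functions.

```typescript
type Row = { id: string; version: number };
type ResumeDecision = "continue" | "refresh" | "abort";

// Compare the checkpointed row with the current one before continuing.
async function revalidateOnResume(
  checkpointed: Row,
  fetchRow: (id: string) => Promise<Row | undefined>,
): Promise<ResumeDecision> {
  const current = await fetchRow(checkpointed.id);
  if (current === undefined) return "abort"; // row was deleted
  if (current.version !== checkpointed.version) return "refresh"; // row changed
  return "continue"; // checkpointed data is still valid
}

// In-memory stand-in for a database table.
const table: Record<string, Row> = {
  "doc-1": { id: "doc-1", version: 3 },
  "doc-2": { id: "doc-2", version: 8 },
};
const fetchFromTable = async (id: string): Promise<Row | undefined> => table[id];

async function demoRevalidation(): Promise<ResumeDecision[]> {
  return [
    await revalidateOnResume({ id: "doc-1", version: 3 }, fetchFromTable),
    await revalidateOnResume({ id: "doc-2", version: 7 }, fetchFromTable),
    await revalidateOnResume({ id: "doc-9", version: 1 }, fetchFromTable),
  ];
}
```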
Concurrency limits: Queue-based concurrency is eventually consistent. If you change the limit from 5 to 10, in-flight tasks won’t immediately scale up.
Technical Verdict
Trigger.dev V2 trades Temporal’s replay debugging for zero-config durability. This exchange makes sense for TypeScript teams shipping agent workflows without distributed systems expertise. The checkpoint model sacrifices time-travel inspection for developer ergonomics.
Use it when:
- You’re building agent workflows in TypeScript and need tasks that survive crashes without manual retry logic
- Your execution patterns are linear (call API, wait, call another API) rather than complex state machines
- You’d rather pay for managed infrastructure than hire someone to operate Temporal
- You need observability out of the box without configuring tracing pipelines
- Your team doesn’t have distributed systems expertise and won’t acquire it
Avoid it when:
- You need full replay and time-travel debugging to understand production failures
- You’re already running Temporal and have the operational expertise (switching costs aren’t worth it)
- You require on-premise deployment today (self-hosting isn’t production-ready yet)
- Your workflows need strict deterministic execution or complex compensation logic
- Checkpoint serialization limits (non-serializable state, potential bloat) conflict with your architecture
The managed deployment model reduces operational overhead but creates vendor dependency. The open-source runtime provides an exit path, but self-hosting production workloads isn’t viable yet. If you’re building AI agents that orchestrate multiple API calls, wait for human input, or process long-running tasks, V2 gives you the primitives you need without forcing you into activities and signals. The infrastructure gap it fills is real: most TypeScript developers need durable execution, but few want to operate Temporal.
Source Links
- Trigger.dev V2 Announcement (172 points, 39 comments)
- Trigger.dev V1 Launch (745 points, 190 comments)
- Trigger.dev GitHub Repository
- Official Documentation