Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 Hacker News points. Eight months later, the team shipped V2 as a “Temporal alternative for TypeScript” (172 points). That pivot exposes the architectural gap between event-driven automation and durable execution, and it matters for anyone building agent systems that need guaranteed completion across retries, crashes, and long-running operations.
The shift was driven by user feedback. Developers wanted workflows that survive failures, not just webhook routing. That requirement changes everything about state management, retry semantics, and execution guarantees.
What Changed Between V1 and V2
V1 architecture:
- Event triggers (webhooks, schedules, API calls)
- Stateless handlers
- No built-in retry or durability
- Zapier-style connector model
V2 architecture:
- Durable task execution
- Persistent state across retries
- Execution guarantees (at-least-once, idempotency boundaries)
- TypeScript-native workflow definitions
The V2 model positions Trigger.dev against Temporal, whose server is written in Go, models workflow state through event sourcing, and exposes language-specific SDKs. Trigger.dev bets on TypeScript end-to-end, which simplifies the stack but constrains runtime flexibility.
Durable Execution Plumbing
Durable execution means a task can pause, crash, retry, and resume without losing progress. The system must:
- Persist state at checkpoints so retries start from the last known good position
- Guarantee idempotency so duplicate executions don’t corrupt state
- Handle partial failures where some steps succeed and others fail
- Provide observability into execution history and current state
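The first requirement, resuming from the last good checkpoint, can be sketched in a few lines. This is a minimal illustration, not Trigger.dev's implementation: the `Checkpoints` map, the `Step` type, and `runWithCheckpoints` are hypothetical names, and the store is in memory where a real system would persist it durably.

```typescript
// Hypothetical in-memory checkpoint store; a real system would persist this durably.
type Checkpoints = Map<string, unknown>;

type Step = { name: string; run: (input: unknown) => unknown };

// Runs steps in order, skipping any step whose output is already checkpointed.
function runWithCheckpoints(steps: Step[], input: unknown, checkpoints: Checkpoints): unknown {
  let value = input;
  for (const step of steps) {
    if (checkpoints.has(step.name)) {
      value = checkpoints.get(step.name); // resume from the saved result
      continue;
    }
    value = step.run(value); // may throw; earlier checkpoints survive
    checkpoints.set(step.name, value);
  }
  return value;
}

// Demo: "parse" fails once, then succeeds on retry without re-running "download".
const calls: string[] = [];
let parseFailures = 1;
const steps: Step[] = [
  {
    name: "download",
    run: (url) => {
      calls.push("download");
      return `bytes(${String(url)})`;
    },
  },
  {
    name: "parse",
    run: (bytes) => {
      calls.push("parse");
      if (parseFailures-- > 0) throw new Error("transient parse error");
      return `parsed(${String(bytes)})`;
    },
  },
];

const store: Checkpoints = new Map();
let result: unknown;
for (let attempt = 1; attempt <= 3; attempt++) {
  try {
    result = runWithCheckpoints(steps, "https://example.com/doc.pdf", store);
    break;
  } catch {
    // Retry: completed steps stay in `store`, so they are not executed again.
  }
}
```

After the loop, `calls` records one download and two parse attempts, which is the behavior the checkpoint requirement describes.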
Trigger.dev implements this with:
- Task definitions that declare retry policies, timeouts, and concurrency limits
- Checkpointing at tool call boundaries (for AI agents) or explicit await points
- Queue-based execution with visibility into pending, running, and failed tasks
- Real-time connections for frontend apps to subscribe to task progress
Here’s a minimal task with retry semantics:
```typescript
import { task } from "@trigger.dev/sdk/v3";

export const processDocument = task({
  id: "process-document",
  retry: {
    maxAttempts: 5,
    factor: 2,
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 60000,
  },
  run: async (payload: { url: string }) => {
    // Step 1: Download (checkpointed)
    const file = await downloadFile(payload.url);
    // Step 2: Parse (checkpointed)
    const parsed = await parseDocument(file);
    // Step 3: Store (checkpointed)
    return await storeResult(parsed);
  },
});
```
If parseDocument fails on attempt 2, the system retries from that checkpoint without re-downloading. The execution history is stored, so you can inspect which step failed and why.
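To make the retry settings concrete, the sketch below computes the wait before each retry for the configuration above. It assumes plain exponential backoff with no jitter; the platform may randomize delays, so treat the numbers as an approximation. `backoffDelays` is a hypothetical helper, not part of the SDK.

```typescript
// Retry options mirroring the fields used in the task above.
interface RetryOptions {
  maxAttempts: number;
  factor: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
}

// Delay before each retry, assuming exponential backoff without jitter:
// delay(n) = min(maxTimeoutInMs, minTimeoutInMs * factor^(n - 1)) for retry n = 1, 2, ...
function backoffDelays(opts: RetryOptions): number[] {
  const delays: number[] = [];
  // maxAttempts counts the first run, so there are maxAttempts - 1 retries.
  for (let retry = 1; retry < opts.maxAttempts; retry++) {
    const raw = opts.minTimeoutInMs * Math.pow(opts.factor, retry - 1);
    delays.push(Math.min(opts.maxTimeoutInMs, raw));
  }
  return delays;
}

const retryDelays = backoffDelays({
  maxAttempts: 5,
  factor: 2,
  minTimeoutInMs: 1000,
  maxTimeoutInMs: 60000,
});
console.log(retryDelays); // [1000, 2000, 4000, 8000]
```

Under these assumptions the task gives up after roughly 15 seconds of cumulative waiting, which is worth comparing against the recovery time of whatever dependency is failing.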
State Persistence vs. Event Sourcing
Temporal uses event sourcing: every state transition is an immutable event. Replay the event log, and you reconstruct the workflow state. This gives strong consistency and audit trails but requires running the Temporal server (written in Go) and external storage (Cassandra, PostgreSQL, or MySQL).
Trigger.dev uses checkpoint-based persistence: the system snapshots state at defined points and stores it in a managed database. This is simpler to operate but less granular. You lose inter-checkpoint visibility unless you add explicit logging.
| Aspect | Temporal | Trigger.dev |
|---|---|---|
| State recovery | Replay full event log | Restore from last checkpoint |
| Audit granularity | Every state transition | Explicit checkpoint boundaries |
| Storage model | External (Cassandra, Postgres) | Managed (platform-provided) |
| Language runtime | Go core + SDKs | TypeScript end-to-end |
| Operational complexity | High (cluster, storage, workers) | Low (managed platform) |
| Execution visibility | Complete history | Checkpoint + logs |
| Cost model | Self-hosted infrastructure + ops team | Managed SaaS (see pricing page for current rates) |
For agent systems, the trade-off is between operational overhead and execution transparency. If you need to debug why an LLM made a specific tool call, event sourcing gives you the full conversation history. Checkpoints give you “before” and “after” snapshots.
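The difference between the two recovery paths can be sketched as follows. The event shapes and function names are hypothetical and do not reflect either product's storage format; the point is only that replay folds over a log while a checkpoint restores stored state directly.

```typescript
// Hypothetical workflow events; names are illustrative only.
type WorkflowEvent =
  | { type: "StepCompleted"; step: string; output: string }
  | { type: "StepFailed"; step: string; error: string };

interface WorkflowState {
  completed: Record<string, string>; // step name -> output
  failures: number;
}

// Event sourcing: state is derived by folding over the full, immutable log.
function replay(events: WorkflowEvent[]): WorkflowState {
  const state: WorkflowState = { completed: {}, failures: 0 };
  for (const e of events) {
    if (e.type === "StepCompleted") state.completed[e.step] = e.output;
    else state.failures += 1;
  }
  return state;
}

// Checkpointing: state is stored directly and restored without replay.
function restore(snapshotJson: string): WorkflowState {
  return JSON.parse(snapshotJson) as WorkflowState;
}

const eventLog: WorkflowEvent[] = [
  { type: "StepCompleted", step: "download", output: "s3://bucket/doc" },
  { type: "StepFailed", step: "parse", error: "timeout" },
  { type: "StepCompleted", step: "parse", output: "42 pages" },
];

const replayed = replay(eventLog);
const snapshot = JSON.stringify(replayed); // what a checkpoint would persist
const restored = restore(snapshot);

// Both paths reach the same state, but only the log still records why "parse" failed.
const failureReasons: string[] = [];
for (const e of eventLog) {
  if (e.type === "StepFailed") failureReasons.push(e.error);
}
```

The snapshot knows that one failure occurred; the reason (`"timeout"`) is recoverable only from the log, which is the visibility gap described above.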
Execution Guarantees and Idempotency
Trigger.dev provides at-least-once execution: a task will run to completion or fail permanently, but it may run multiple times if retries occur. This requires idempotent operations.
Idempotency boundaries are where the system guarantees no duplicate side effects:
- Tool calls in AI agents: Each tool invocation gets a unique ID. If the agent crashes mid-execution, replaying the workflow skips already-completed tool calls.
- External API calls: Attach idempotency keys to requests (e.g., Stripe’s Idempotency-Key header).
- Database writes: Use upserts or conditional inserts based on task run IDs.
Example with idempotent payment processing:
```typescript
import { task } from "@trigger.dev/sdk/v3";
// `stripe` and `db` are assumed to be already-configured clients (Stripe SDK, ORM).

export const processPayment = task({
  id: "process-payment",
  run: async ({ orderId, amount }: { orderId: string; amount: number }, { ctx }) => {
    // Combine the order ID with the run ID; retries of the same run reuse this key
    const idempotencyKey = `${orderId}-${ctx.run.id}`;
    // Stripe won't charge twice with the same key (passed as a request option)
    const charge = await stripe.charges.create(
      { amount, currency: "usd", source: "tok_visa" },
      { idempotencyKey }
    );
    // Database write uses orderId as unique constraint
    await db.orders.upsert({
      where: { id: orderId },
      update: { chargeId: charge.id, status: "paid" },
      create: { id: orderId, chargeId: charge.id, status: "paid" },
    });
    return { chargeId: charge.id };
  },
});
```
If the task crashes after the Stripe charge but before the database write, the retry resends the same idempotency key, so Stripe returns the original charge instead of creating a second one, and the database update then completes.
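The deduplication that an idempotency key buys can be sketched with an in-memory store. This mimics, in simplified form, what a provider does on its side; `IdempotentExecutor` and the key format are hypothetical, and a production store would be durable and would expire keys.

```typescript
// Hypothetical in-memory idempotency store; production systems persist keys durably.
class IdempotentExecutor<T> {
  private results = new Map<string, T>();

  // Runs `effect` at most once per key; repeat calls return the first result.
  execute(key: string, effect: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const value = effect();
    this.results.set(key, value);
    return value;
  }
}

let chargesCreated = 0;
const executor = new IdempotentExecutor<{ chargeId: string }>();
const createCharge = () => ({ chargeId: `ch_${++chargesCreated}` });

// First attempt and a retry share the same key, so the side effect runs once.
const firstAttempt = executor.execute("order-123-run-abc", createCharge);
const retryAttempt = executor.execute("order-123-run-abc", createCharge);
// A different order gets a different key and a new charge.
const otherOrder = executor.execute("order-456-run-def", createCharge);
```

Note that the key decides the deduplication scope: a key that includes the run ID protects retries of one run, while a key built from the order ID alone would also protect against the same order being submitted in a second run.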
Queue vs. Workflow Orchestration
Trigger.dev routes tasks through queues with concurrency limits. This is different from Temporal’s workflow orchestration model, where workflows spawn child workflows and activities.
Trigger.dev’s queue model:
- Tasks are independent units
- Concurrency controlled per queue
- No parent-child workflow hierarchy
- Simpler mental model for isolated jobs
Temporal’s workflow model:
- Workflows compose into trees
- Parent workflows wait for children
- Activities are leaf nodes (actual work)
- More expressive for complex orchestration
For AI agents, the queue model works when each agent run is independent. If you need hierarchical orchestration (e.g., a coordinator agent spawning specialist agents), you either implement it in application code or use Temporal’s native workflow composition.
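A per-queue concurrency limit, as described for the queue model, can be sketched as a small state machine. The class and method names are hypothetical and the example tracks run IDs only; it illustrates the scheduling rule, not Trigger.dev's queue implementation.

```typescript
// Minimal sketch of a per-queue concurrency limit; names are hypothetical.
class LimitedQueue {
  private readonly concurrencyLimit: number;
  private pending: string[] = [];
  private running = new Set<string>();

  constructor(concurrencyLimit: number) {
    this.concurrencyLimit = concurrencyLimit;
  }

  // Enqueue a task run; it starts immediately only if a slot is free.
  enqueue(runId: string): void {
    this.pending.push(runId);
    this.fillSlots();
  }

  // Mark a run as finished and hand its slot to the next pending run.
  complete(runId: string): void {
    if (this.running.delete(runId)) this.fillSlots();
  }

  snapshot(): { running: string[]; pending: string[] } {
    return { running: Array.from(this.running), pending: this.pending.slice() };
  }

  private fillSlots(): void {
    while (this.running.size < this.concurrencyLimit && this.pending.length > 0) {
      this.running.add(this.pending.shift() as string);
    }
  }
}

const agentQueue = new LimitedQueue(2);
["run-1", "run-2", "run-3"].forEach((id) => agentQueue.enqueue(id));
const beforeCompletion = agentQueue.snapshot(); // run-1 and run-2 running, run-3 pending
agentQueue.complete("run-1");
const afterCompletion = agentQueue.snapshot(); // run-2 and run-3 running, nothing pending
```

Each run here is independent of the others, which is why the model stays simple. A coordinator that must wait on specialist runs would need extra bookkeeping on top of this.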
Observability and Failure Modes
Trigger.dev provides:
- Real-time task monitoring: See running, queued, and failed tasks
- Execution traces: Step-by-step logs with timestamps
- Retry history: Which attempts failed and why
- Real-time subscriptions: Frontend apps can listen to task progress via WebSockets
Common failure modes:
- Non-idempotent side effects: Retries cause duplicate charges, emails, or API calls. Fix: Add idempotency keys.
- Checkpoint bloat: Storing large objects at every checkpoint slows recovery. Fix: Store references, not full payloads.
- Timeout mismatches: Task timeout shorter than LLM response time. Fix: Set timeouts above worst-case latency.
- Concurrency limits: Queue backs up when tasks run slower than arrival rate. Fix: Scale workers or increase concurrency.
To mitigate checkpoint bloat, store references instead of full payloads:
```typescript
export const processLargeFile = task({
  id: "process-large-file",
  run: async (payload: { fileUrl: string }) => {
    // Download and upload to S3, store reference only
    const file = await downloadFile(payload.fileUrl);
    const s3Url = await uploadToS3(file);
    // Checkpoint stores URL, not file buffer
    const parsed = await parseDocument(s3Url);
    return { resultUrl: s3Url, parsed };
  },
});
```
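For the concurrency-limit failure mode listed above, a rough capacity check helps decide whether a queue will back up. The sketch below applies Little's law (average runs in flight equals arrival rate times average duration); the workload numbers are hypothetical and real traffic is burstier than this steady-state estimate.

```typescript
// Little's law: average runs in flight = arrival rate * average duration.
// If that exceeds the concurrency limit, the queue grows without bound.
function requiredConcurrency(arrivalsPerSecond: number, avgDurationSeconds: number): number {
  return Math.ceil(arrivalsPerSecond * avgDurationSeconds);
}

// Backlog growth per second when the limit is too low (0 when capacity is sufficient).
function backlogGrowthPerSecond(
  arrivalsPerSecond: number,
  avgDurationSeconds: number,
  concurrencyLimit: number
): number {
  const throughput = concurrencyLimit / avgDurationSeconds; // max completions per second
  return Math.max(0, arrivalsPerSecond - throughput);
}

// Hypothetical workload: 2 agent runs arrive per second, each taking about 15 s.
const neededSlots = requiredConcurrency(2, 15); // 30
const growthAtLimit20 = backlogGrowthPerSecond(2, 15, 20); // about 0.67 runs/s
const growthAtLimit30 = backlogGrowthPerSecond(2, 15, 30); // 0
```

With a limit of 20, the backlog in this example grows by roughly 40 runs per minute, so the fix is either more concurrency or shorter tasks.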
Deployment Shape
Trigger.dev is a managed platform. You deploy tasks by pushing code to the platform, which handles:
- Worker provisioning
- Queue management
- State storage
- Retry scheduling
Self-hosting is possible but requires running the full stack (API server, workers, database, queue). The managed option removes operational overhead but locks you into the platform. For self-hosting, expect to provision at minimum: a PostgreSQL instance, Redis for queues, and worker nodes with auto-scaling. The operational cost resembles running a small Kubernetes cluster with stateful services.
Temporal requires running a cluster (server, workers, storage). You control the infrastructure but pay the operational cost. A production Temporal deployment typically needs a three-node server cluster, Cassandra or PostgreSQL with replication, and separate worker pools per task queue.
Technical Verdict
Use Trigger.dev if:
- You’re TypeScript-native and don’t need polyglot workflows
- Tasks are mostly independent without deep parent-child orchestration
- You want managed infrastructure and can accept platform lock-in
- Checkpoint-level visibility is sufficient for debugging and compliance. As shown in the state persistence comparison table, Trigger.dev gives you before/after snapshots at explicit checkpoint boundaries rather than every state transition. For most AI agent workflows (tool calling, multi-step reasoning), this granularity is adequate.
- You’re building AI agents with tool calling and need fast iteration
Use Temporal if:
- You need multi-language support (Go, Java, Python, .NET) in the same workflow
- Workflows require complex hierarchies with parent-child coordination
- Full event sourcing is critical for audit trails or deterministic replay. The event log model (every state transition recorded) provides complete execution history, unlike checkpoint-based recovery which only captures snapshots at defined boundaries.
- You have the team to operate distributed systems and want infrastructure control
- You need sub-checkpoint visibility into every state transition for forensic debugging or regulatory compliance
Avoid Trigger.dev when:
- Workflows span multiple services in different languages
- You require on-premise deployment with zero external dependencies
- Compliance mandates complete execution history at sub-checkpoint granularity
- You need long-running processes (weeks or months) that must survive code deployments, platform updates, and schema migrations without losing in-flight state or requiring manual intervention
Trigger.dev V2 proves that durable execution doesn’t require event sourcing or a polyglot runtime. The TypeScript-native approach lowers the barrier for teams building agent systems that need retry guarantees without Temporal’s operational complexity. The checkpoint model trades granular execution history for simplicity. For most AI agent workflows (tool calling, multi-step reasoning, human-in-the-loop approvals), that’s an acceptable trade.
The real insight is that durable execution is a spectrum. Trigger.dev sits between stateless webhooks (Zapier) and full workflow orchestration (Temporal), establishing a distinct position for TypeScript teams that need guaranteed completion without running a distributed systems cluster.