Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and got 745 points on Hacker News. Eight months later, the team shipped V2 as a “Temporal alternative for TypeScript” and repositioned entirely. That pivot exposes the infrastructure gap between event-driven automation (webhooks, triggers, chains) and durable execution (long-running jobs, retries, state persistence).
The shift matters for agent builders. Agents need resumable, fault-tolerant task execution, not just webhook chains. When your LLM tool call fails halfway through a 20-step research workflow, you need replay semantics and state snapshots, not a dead webhook.
What Changed Between V1 and V2
V1 focused on event triggers. You connected APIs, defined workflows in TypeScript code, and chained actions together. The model assumed short-lived executions: a webhook arrives, the task runs, and a response returns.
V2 targets long-running background jobs. The primitive is a task function that can run for hours, survive process restarts, and retry individual steps without re-executing the entire workflow. The execution model shifted from stateless event handling to durable state machines.
Key differences:
- Execution boundary: V1 ran tasks in response to external events. V2 runs tasks as persistent background jobs with their own lifecycle.
- State management: V1 passed data between steps in memory. V2 persists state snapshots to disk and replays from checkpoints.
- Retry semantics: V1 retried the entire workflow. V2 retries individual steps and resumes from the last successful checkpoint.
- Deployment shape: V1 required a web server to receive webhooks. V2 requires worker processes that poll a task queue and execute jobs.
Durable Execution vs. Event-Driven Automation
Temporal popularized durable execution for distributed systems. The core idea: your code looks synchronous, but the runtime persists state after every step. If a worker crashes, another worker picks up the workflow from the last checkpoint.
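Here is a minimal sketch of the checkpoint-and-resume idea. It assumes an in-memory `Map` stands in for durable storage and a thrown error stands in for a worker crash. `DurableRun`, `step`, and the journal layout are hypothetical names, not Temporal's or Trigger.dev's API. Recording each step's result is closer to step memoization than to Temporal's actual mechanism, which re-executes workflow code against its event history.

```typescript
// Hypothetical stand-in for durable storage: runId -> (step name -> recorded result).
type Journal = Map<string, Map<string, unknown>>;

// Hypothetical durable-run helper: a named step executes once per run unless it
// throws, in which case nothing is recorded and it runs again on resume.
class DurableRun {
  private readonly recorded: Map<string, unknown>;

  constructor(journal: Journal, runId: string) {
    let entry = journal.get(runId);
    if (!entry) {
      entry = new Map<string, unknown>();
      journal.set(runId, entry);
    }
    this.recorded = entry;
  }

  step<T>(name: string, fn: () => T): T {
    if (this.recorded.has(name)) return this.recorded.get(name) as T; // resume: skip
    const result = fn();
    this.recorded.set(name, result); // checkpoint before the next step starts
    return result;
  }
}

const executions = { fetch: 0, analyze: 0 };
let crashOnAnalyze = true; // simulate one crash during the second step

function workflow(run: DurableRun): string {
  const doc = run.step("fetch", () => {
    executions.fetch++;
    return "raw document";
  });
  return run.step("analyze", () => {
    executions.analyze++;
    if (crashOnAnalyze) {
      crashOnAnalyze = false;
      throw new Error("worker crashed");
    }
    return `analysis of ${doc}`;
  });
}

const journal: Journal = new Map();
try {
  workflow(new DurableRun(journal, "run_1")); // first worker crashes in "analyze"
} catch {
  // A second worker picks up the same run ID.
}
const output = workflow(new DurableRun(journal, "run_1"));
console.log(output, executions); // analysis of raw document { fetch: 1, analyze: 2 }
```

The property that matters is ordering. A step's result is written before the next step starts, so a crash can lose at most the step that was in flight.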
Trigger.dev brings that model to TypeScript without Temporal’s operational weight. Instead of running a separate Temporal cluster (a persistence store such as Cassandra, PostgreSQL, or MySQL for workflow event histories, gRPC between the Temporal services and your workers, and optionally Elasticsearch for advanced visibility queries), you write TypeScript tasks and deploy them to Trigger.dev’s managed infrastructure.
The trade-off table:
| Dimension | Temporal | Trigger.dev V2 | Zapier/n8n |
|---|---|---|---|
| State persistence | Event sourcing (append-only log) | Checkpoint snapshots | In-memory or webhook retry |
| Replay model | Full deterministic replay | Step-level resume from checkpoint | Entire workflow retry |
| Language support | Go, Java, TypeScript, Python, .NET, PHP (SDKs) | TypeScript-native | Low-code + JavaScript |
| Deployment boundary | Self-hosted cluster or Temporal Cloud | Managed workers or self-hosted | Managed SaaS or Docker |
| Observability | Temporal UI, workflow history queries | Dashboard + OpenTelemetry hooks | Execution logs, webhook history |
| Retry primitives | Activity retry policies, exponential backoff | Task-level and step-level retries | Webhook retry with fixed delays |
Architecture: How Tasks Become Durable
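A Trigger.dev task is a TypeScript function passed to `task()` together with configuration such as an `id` and a retry policy. The runtime executes it on a worker, persists its state at checkpoints, and handles retries. The examples in this article use the v3 SDK (`@trigger.dev/sdk/v3`).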
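```typescript
import { task } from "@trigger.dev/sdk/v3";

// Hypothetical helpers; replace with your own implementations.
declare function fetchDocument(documentId: string): Promise<unknown>;
declare function extractText(doc: unknown): Promise<string>;
declare function analyzeLLM(text: string): Promise<unknown>;
declare function storeResults(documentId: string, analysis: unknown): Promise<void>;

export const processDocument = task({
  id: "process-document",
  retry: {
    maxAttempts: 3,
    factor: 2,
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 10000,
  },
  run: async (payload: { documentId: string }) => {
    // If any step below throws, the task is retried under the policy above.
    // Step 1: Fetch the document
    const doc = await fetchDocument(payload.documentId);
    // Step 2: Extract text
    const text = await extractText(doc);
    // Step 3: Analyze with an LLM
    const analysis = await analyzeLLM(text);
    // Step 4: Store results
    await storeResults(payload.documentId, analysis);
    return { status: "complete", analysis };
  },
});
```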
Under the hood:
- Task registration: The `task()` function registers the task with the Trigger.dev runtime. The `id` field is the unique identifier for this task type.
- Execution context: When you trigger a task (via API, schedule, or another task), the runtime creates a run with a unique run ID.
- Checkpoint persistence: In the v3 runtime, checkpoints are taken at wait points such as `wait.for()` and `triggerAndWait()`, not after every `await`. The paused process is snapshotted to storage, and the compute is released until the wait completes.
- Failure recovery: A run paused at a wait point is restored from its checkpoint and continues from there. A failed attempt is not resumed mid-function; the next attempt re-runs `run` from the beginning.
- Retry logic: If `run` throws, the runtime applies the retry policy (exponential backoff, max attempts) before marking the run as failed.
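The retry policy fields translate into a schedule of waits between attempts. The sketch below assumes the common formula `min(maxTimeoutInMs, minTimeoutInMs * factor^(n - 1))` for the delay before retry n, with no jitter, and assumes `maxAttempts` counts the first attempt. Trigger.dev's exact computation, including any randomization, may differ. The field names follow the v3 SDK's retry options; `backoffSchedule` is a hypothetical helper.

```typescript
// Field names mirror the v3 SDK retry options; the formula below is an assumption.
interface RetryPolicy {
  maxAttempts: number; // assumed: total attempts, including the first
  factor: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
}

// Hypothetical helper: delay (ms) before each retry, without jitter.
function backoffSchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  // One delay per retry, i.e. before attempts 2..maxAttempts.
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    const raw = policy.minTimeoutInMs * Math.pow(policy.factor, retry - 1);
    delays.push(Math.min(policy.maxTimeoutInMs, raw)); // cap at the maximum timeout
  }
  return delays;
}

const schedule = backoffSchedule({ maxAttempts: 3, factor: 2, minTimeoutInMs: 1000, maxTimeoutInMs: 10000 });
console.log(schedule); // [ 1000, 2000 ]

const longer = backoffSchedule({ maxAttempts: 6, factor: 2, minTimeoutInMs: 1000, maxTimeoutInMs: 10000 });
console.log(longer); // [ 1000, 2000, 4000, 8000, 10000 ]
```

The cap matters for long retry chains. Without `maxTimeoutInMs`, the fifth delay in the second example would be 16 seconds instead of 10.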
State Persistence: Snapshots vs. Event Sourcing
Temporal uses event sourcing. Every workflow decision (start task, complete activity, schedule timer) appends an event to an immutable log. Replay reconstructs state by re-executing the workflow code against the event history.
Trigger.dev uses checkpoint snapshots. When a run reaches a wait point (such as `wait.for()` or `triggerAndWait()`), the v3 runtime snapshots the running process, including the JavaScript heap (variables, closures, async state), writes it to storage, and releases the compute. Resuming loads the snapshot and continues execution without re-running earlier code.
Trade-offs:
- Snapshot pros: Faster recovery (nothing is re-executed), and no determinism constraints on task code, because a restored snapshot continues from where it paused instead of replaying.
- Snapshot cons: Checkpoints exist only at wait points, so a failure between them means a new attempt that re-runs `run` from the beginning and repeats any side effects. Process snapshots also don’t record individual decisions the way an event log does.
- Event sourcing pros: Deterministic replay guarantees correctness, full audit trail of every decision.
- Event sourcing cons: Slower replay for long workflows, larger storage footprint, requires careful versioning of workflow code.
Trigger.dev therefore trades Temporal’s determinism rules for an idempotency requirement. Calling `Math.random()` or `Date.now()` between steps is safe because nothing is replayed, but any step that can run again on retry (charging a card, sending an email) should be safe to execute twice.
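The two recovery strategies can be compared with a toy model that assumes workflow state is a small application-level object. `rebuildFromEvents` folds the full append-only history, as an event-sourced system does on replay. `restoreFromSnapshot` deserializes a saved copy and applies nothing. All names are hypothetical. This is a generic illustration of the two patterns, not how Temporal or Trigger.dev store data; Trigger.dev's checkpoints capture a whole process rather than a state object like this.

```typescript
// Hypothetical event and state shapes for a two-step workflow.
type WorkflowEvent =
  | { type: "StepCompleted"; step: string; output: string }
  | { type: "TimerFired"; timer: string };

interface WorkflowState {
  outputs: Record<string, string>;
  timersFired: number;
}

// Pure reducer: applying the same events in the same order always yields the same state.
function applyEvent(state: WorkflowState, event: WorkflowEvent): WorkflowState {
  switch (event.type) {
    case "StepCompleted":
      return { ...state, outputs: { ...state.outputs, [event.step]: event.output } };
    case "TimerFired":
      return { ...state, timersFired: state.timersFired + 1 };
  }
}

const initial: WorkflowState = { outputs: {}, timersFired: 0 };

// Event sourcing: rebuild state by folding the entire history.
function rebuildFromEvents(log: WorkflowEvent[]): { state: WorkflowState; applied: number } {
  return { state: log.reduce(applyEvent, initial), applied: log.length };
}

// Snapshotting: deserialize a saved copy; no events are re-applied.
function restoreFromSnapshot(saved: string): { state: WorkflowState; applied: number } {
  return { state: JSON.parse(saved) as WorkflowState, applied: 0 };
}

const eventLog: WorkflowEvent[] = [
  { type: "StepCompleted", step: "fetch", output: "doc" },
  { type: "TimerFired", timer: "cooldown" },
  { type: "StepCompleted", step: "analyze", output: "summary" },
];

const replayed = rebuildFromEvents(eventLog);
const snapshot = JSON.stringify(replayed.state); // what a checkpoint would persist
const restored = restoreFromSnapshot(snapshot);

console.log(replayed.applied, restored.applied); // 3 0
```

Replay cost grows with history length, while restore cost grows with snapshot size. That is the same trade-off the list above describes.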
Deployment Boundary: Workers vs. Web Servers
Zapier-style automation runs in the same process as the web server. A webhook arrives, the server executes the workflow, and returns a response. If the workflow takes too long, the HTTP connection times out.
Trigger.dev separates task execution from the web server. You deploy worker processes that poll a task queue, execute jobs, and report results. The web server only handles API requests (trigger task, query status, cancel job).
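The division of labor between the API server and workers can be sketched as a producer and a polling consumer over a shared queue. This is a single-process simulation that assumes an in-memory array in place of the platform's queue. `triggerRun`, `getRun`, `pollOnce`, and the `RunRecord` shape are hypothetical; a real worker would poll over the network and execute jobs asynchronously.

```typescript
type RunStatus = "QUEUED" | "EXECUTING" | "COMPLETED" | "FAILED";

// Hypothetical run record kept by the platform.
interface RunRecord {
  id: string;
  taskId: string;
  payload: unknown;
  runStatus: RunStatus;
  output?: unknown;
}

const runs = new Map<string, RunRecord>();
const pending: string[] = []; // stand-in for the platform's task queue
let nextId = 1;

// API server side: record the run and return its ID immediately.
function triggerRun(taskId: string, payload: unknown): string {
  const id = `run_${nextId++}`;
  runs.set(id, { id, taskId, payload, runStatus: "QUEUED" });
  pending.push(id);
  return id;
}

// API server side: status queries never execute task code.
function getRun(id: string): RunRecord | undefined {
  return runs.get(id);
}

// Worker side: claim one queued run, execute it, and report the result.
function pollOnce(handlers: Record<string, (payload: unknown) => unknown>): boolean {
  const id = pending.shift();
  if (id === undefined) return false; // queue is empty
  const run = runs.get(id)!;
  run.runStatus = "EXECUTING";
  try {
    run.output = handlers[run.taskId](run.payload);
    run.runStatus = "COMPLETED";
  } catch {
    run.runStatus = "FAILED"; // includes a missing handler
  }
  return true;
}

const runId = triggerRun("double", 21);
console.log(getRun(runId)?.runStatus); // QUEUED
while (pollOnce({ double: (n) => (n as number) * 2 })) {
  // Keep polling until the queue is drained.
}
console.log(getRun(runId)?.runStatus, getRun(runId)?.output); // COMPLETED 42
```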
Deployment options:
- Managed workers: Trigger.dev hosts the workers. You push code, and the platform scales workers based on queue depth.
- Self-hosted workers: You run workers in your own infrastructure (Docker, Kubernetes, serverless functions). The platform provides the task queue and observability layer.
The worker model enables elastic scaling. If you have 1,000 tasks in the queue, the platform spins up more workers. If the queue is empty, workers scale to zero.
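Scaling on queue depth, as described above, reduces to a small function. The sketch assumes a fixed number of concurrent runs per worker and a hard cap on fleet size. Both parameters and the `desiredWorkers` name are hypothetical; this does not describe Trigger.dev's actual autoscaling policy.

```typescript
// Hypothetical scaling parameters.
interface ScalingConfig {
  runsPerWorker: number; // concurrent runs one worker can handle
  maxWorkers: number; // upper bound on fleet size
}

// Workers needed to drain the queue, capped; zero when nothing is queued.
function desiredWorkers(queueDepth: number, cfg: ScalingConfig): number {
  if (queueDepth <= 0) return 0; // scale to zero
  return Math.min(cfg.maxWorkers, Math.ceil(queueDepth / cfg.runsPerWorker));
}

const cfg: ScalingConfig = { runsPerWorker: 10, maxWorkers: 50 };
console.log(desiredWorkers(0, cfg)); // 0
console.log(desiredWorkers(95, cfg)); // 10
console.log(desiredWorkers(1000, cfg)); // 50 (capped)
```

A production autoscaler would typically also smooth this signal, for example by scaling down only after the queue has stayed empty for a while, to avoid starting and stopping workers in rapid succession.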
Retry Semantics: Task-Level vs. Step-Level
Zapier retries the entire workflow if any step fails. If step 5 of a 10-step workflow throws an error, the retry starts from step 1.
Trigger.dev retries at two levels:
- Task-level retries: If the entire task fails (unhandled exception, worker crash), the runtime retries the task from the beginning.
- Step-level retries: If a specific step fails (API timeout, rate limit), you can retry just that step without re-executing previous steps by wrapping it in a retry helper, such as the SDK’s `retry.onThrow`, or in your own retry loop.
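The difference between the two levels shows up in how many step executions a single transient failure costs. The simulation below assumes a 10-step workflow whose step 5 fails exactly once. `makeRunner`, `runWholeWorkflowRetry`, and `runStepLevelRetry` are hypothetical helpers, not SDK functions. Backoff delays are omitted to keep it synchronous.

```typescript
interface StepRunner {
  runStep(i: number): void;
  executions(): number;
}

// Hypothetical workflow whose `flakyStep` throws on its first execution only.
function makeRunner(flakyStep: number): StepRunner {
  let failedOnce = false;
  let count = 0;
  return {
    runStep(i: number) {
      count++;
      if (i === flakyStep && !failedOnce) {
        failedOnce = true;
        throw new Error(`step ${i} timed out`);
      }
    },
    executions: () => count,
  };
}

// Whole-workflow retry: any failure restarts from step 1.
function runWholeWorkflowRetry(steps: number, runner: StepRunner, maxAttempts = 3): void {
  for (let attempt = 1; ; attempt++) {
    try {
      for (let i = 1; i <= steps; i++) runner.runStep(i);
      return;
    } catch (error) {
      if (attempt >= maxAttempts) throw error;
    }
  }
}

// Step-level retry: only the failing step runs again; earlier steps are kept.
function runStepLevelRetry(steps: number, runner: StepRunner, maxAttempts = 3): void {
  for (let i = 1; i <= steps; i++) {
    for (let attempt = 1; ; attempt++) {
      try {
        runner.runStep(i);
        break;
      } catch (error) {
        if (attempt >= maxAttempts) throw error;
      }
    }
  }
}

const whole = makeRunner(5);
runWholeWorkflowRetry(10, whole);
const perStep = makeRunner(5);
runStepLevelRetry(10, perStep);
console.log(whole.executions(), perStep.executions()); // 15 11
```

The four extra executions under whole-workflow retry are steps 1 through 4 running a second time. If those steps call paid APIs or send messages, the duplicate work is a real cost on top of latency.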
You configure retries per task:
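```typescript
import { task } from "@trigger.dev/sdk/v3";

// Hypothetical helpers; replace with your own implementations.
declare function fetchData(url: string): Promise<unknown>;
declare function processData(data: unknown): Promise<unknown>;

export const fetchAndProcess = task({
  id: "fetch-and-process",
  retry: {
    maxAttempts: 5,
    factor: 2,
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 30000,
  },
  run: async (payload: { url: string }) => {
    // If either step throws, the whole run function is retried from the top,
    // up to maxAttempts, with exponential backoff between attempts.
    const data = await fetchData(payload.url);
    const result = await processData(data);
    return result;
  },
});
```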
For finer control, you can wrap individual steps in try-catch blocks and implement custom retry logic:
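```typescript
import { task } from "@trigger.dev/sdk/v3";

// Hypothetical helpers; replace with your own implementations.
declare function fetchData(url: string): Promise<unknown>;
declare function processData(data: unknown): Promise<unknown>;

export const resilientTask = task({
  id: "resilient-task",
  run: async (payload: { url: string }) => {
    const maxAttempts = 3;
    let data: unknown;
    // Retry only the fetch step; processData runs once, after a successful fetch.
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        data = await fetchData(payload.url);
        break;
      } catch (error) {
        if (attempt === maxAttempts) throw error;
        // Exponential backoff: wait 2 s after the first failure, 4 s after the second.
        await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
      }
    }
    return await processData(data);
  },
});
```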
Observability: Tracking Long-Running Jobs
Trigger.dev exposes execution traces through a dashboard and OpenTelemetry hooks. Each task run generates:
- Execution timeline: Start time, end time, duration for each step.
- State snapshots: Serialized state at each checkpoint (local variables, function arguments).
- Error logs: Stack traces, retry attempts, failure reasons.
- Resource usage: CPU, memory, network I/O per step.
The dashboard shows:
- Active tasks (currently running)
- Queued tasks (waiting for workers)
- Failed tasks (exhausted retries)
- Completed tasks (success or manual cancellation)
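The four dashboard buckets correspond to a small run lifecycle. The sketch below models it as a transition table using hypothetical status names. Trigger.dev's actual run statuses are likely more granular than these four, so treat this as a conceptual model only. It assumes a run returns to the queue when a retry is scheduled, and that cancellation counts as completion, as the list above states.

```typescript
// Hypothetical statuses matching the four dashboard buckets.
type DashboardStatus = "QUEUED" | "ACTIVE" | "COMPLETED" | "FAILED";

// Allowed transitions; COMPLETED and FAILED are terminal.
const transitions: Record<DashboardStatus, DashboardStatus[]> = {
  QUEUED: ["ACTIVE", "COMPLETED"], // COMPLETED here means cancelled before starting
  ACTIVE: ["QUEUED", "COMPLETED", "FAILED"], // QUEUED again when a retry is scheduled
  COMPLETED: [],
  FAILED: [],
};

function canTransition(from: DashboardStatus, to: DashboardStatus): boolean {
  return transitions[from].includes(to);
}

// Checks a recorded sequence of statuses for illegal jumps.
function validateHistory(statuses: DashboardStatus[]): boolean {
  for (let i = 1; i < statuses.length; i++) {
    if (!canTransition(statuses[i - 1], statuses[i])) return false;
  }
  return true;
}

const retried = validateHistory(["QUEUED", "ACTIVE", "QUEUED", "ACTIVE", "COMPLETED"]);
const illegal = validateHistory(["QUEUED", "ACTIVE", "FAILED", "ACTIVE"]);
console.log(retried, illegal); // true false
```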
You can integrate with existing APM tools (Datadog, New Relic, Honeycomb) by exporting OpenTelemetry spans:
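```typescript
import { task } from "@trigger.dev/sdk/v3";
import { trace, SpanStatusCode } from "@opentelemetry/api";

// Hypothetical helper standing in for your document pipeline.
declare function handleDocument(payload: { documentId: string }): Promise<unknown>;

export const tracedTask = task({
  id: "traced-task",
  run: async (payload: { documentId: string }) => {
    const span = trace.getTracer("my-app").startSpan("process-document");
    try {
      const result = await handleDocument(payload);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`.
      const message = error instanceof Error ? error.message : String(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message });
      throw error;
    } finally {
      span.end();
    }
  },
});
```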
Queue and Scheduling Layer
Trigger.dev uses a task queue (similar to Temporal’s task queue architecture) to distribute work across workers. Each task type maps to a queue. Workers poll queues, claim tasks, execute them, and report results.
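Safe claiming is what lets another worker take over when one crashes. A common pattern is a lease (also called a visibility timeout): a claim hides the item for a fixed period, and if the worker does not acknowledge completion before the lease expires, the item becomes claimable again. This is a generic sketch with hypothetical names (`LeasedQueue`, `claim`, `ack`) and an explicit clock argument for determinism. It is not Trigger.dev's or Temporal's internal implementation.

```typescript
// Hypothetical queue item with a lease deadline.
interface QueueItem<T> {
  id: string;
  payload: T;
  leasedUntil: number; // epoch ms; 0 means never claimed
  done: boolean;
}

class LeasedQueue<T> {
  private items: QueueItem<T>[] = [];
  private readonly leaseMs: number;

  constructor(leaseMs: number) {
    this.leaseMs = leaseMs;
  }

  push(id: string, payload: T): void {
    this.items.push({ id, payload, leasedUntil: 0, done: false });
  }

  // Claim the first unfinished item whose lease (if any) has expired.
  claim(now: number): QueueItem<T> | undefined {
    const item = this.items.find((it) => !it.done && it.leasedUntil <= now);
    if (item) item.leasedUntil = now + this.leaseMs;
    return item;
  }

  // Acknowledge completion so the item is never handed out again.
  ack(id: string): void {
    const item = this.items.find((it) => it.id === id);
    if (item) item.done = true;
  }
}

const q = new LeasedQueue<string>(30_000);
q.push("run_1", "summarize report");

const first = q.claim(0); // worker A claims at t=0, then crashes without ack
const duringLease = q.claim(10_000); // worker B sees nothing: the lease is still held
const afterLease = q.claim(30_000); // lease expired: worker B takes over
q.ack("run_1");
const afterAck = q.claim(60_000);

console.log(first?.id, duringLease, afterLease?.id, afterAck); // run_1 undefined run_1 undefined
```

The lease length is a trade-off. Too short and a slow but healthy worker loses its job to a second worker; too long and a crashed worker's job sits idle before anyone retries it.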
Scheduling primitives:
- Immediate execution: Trigger a task and return a run ID.
- Delayed execution: Schedule a task to run after a delay (e.g., 5 minutes from now).
- Cron schedules: Run a task on a recurring schedule (e.g., every hour, every day at 3 AM).
- Conditional triggers: Start a task when another task completes (chaining).
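Immediate and delayed execution differ only in when a run becomes eligible for a worker. The sketch assumes each run stores a `runAt` timestamp and that a worker removes runs once they are due. `scheduleRun` and `takeDueRuns` are hypothetical names, and times are plain millisecond numbers for determinism.

```typescript
// Hypothetical scheduled-run record.
interface ScheduledRun {
  runId: string;
  taskId: string;
  runAt: number; // epoch ms when the run becomes eligible
}

const scheduled: ScheduledRun[] = [];
let counter = 0;

// Immediate execution is a delay of zero; both return a run ID right away.
function scheduleRun(taskId: string, now: number, delayMs = 0): string {
  const runId = `run_${++counter}`;
  scheduled.push({ runId, taskId, runAt: now + delayMs });
  return runId;
}

// Remove and return the runs a worker may start at `now`, earliest first.
function takeDueRuns(now: number): ScheduledRun[] {
  const due = scheduled.filter((r) => r.runAt <= now).sort((a, b) => a.runAt - b.runAt);
  for (const r of due) scheduled.splice(scheduled.indexOf(r), 1);
  return due;
}

const t0 = 0;
scheduleRun("send-digest", t0, 5 * 60_000); // 5 minutes from now
scheduleRun("resize-image", t0); // immediate

const atStart = takeDueRuns(t0).map((r) => r.taskId);
const later = takeDueRuns(t0 + 5 * 60_000).map((r) => r.taskId);
console.log(atStart, later); // [ 'resize-image' ] [ 'send-digest' ]
```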
Example cron schedule:
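```typescript
import { schedules } from "@trigger.dev/sdk/v3";

// Hypothetical helpers; replace with your own implementations.
declare function generateReport(): Promise<string>;
declare function sendEmail(report: string): Promise<void>;

export const dailyReport = schedules.task({
  id: "daily-report",
  cron: "0 3 * * *", // 03:00 every day
  run: async () => {
    const report = await generateReport();
    await sendEmail(report);
  },
});
```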
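For a fixed daily pattern like `0 3 * * *` (minute 0, hour 3), the next fire time is simple date arithmetic. The helper below handles only this daily hour/minute shape and assumes the schedule is evaluated in UTC. `nextDailyRunUtc` is a hypothetical illustration, not a cron parser, and it does not reflect how Trigger.dev evaluates schedules or time zones.

```typescript
// Hypothetical helper: next occurrence of HH:MM UTC strictly after `from`.
function nextDailyRunUtc(from: Date, hour: number, minute: number): Date {
  const candidate = new Date(
    Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate(), hour, minute, 0, 0),
  );
  if (candidate.getTime() <= from.getTime()) {
    // Today's slot has already passed; Date handles month and year rollover.
    candidate.setUTCDate(candidate.getUTCDate() + 1);
  }
  return candidate;
}

// "0 3 * * *" -> minute 0, hour 3
const beforeRun = nextDailyRunUtc(new Date("2024-05-10T01:15:00Z"), 3, 0);
const afterRun = nextDailyRunUtc(new Date("2024-05-10T03:00:00Z"), 3, 0);
const monthEnd = nextDailyRunUtc(new Date("2024-05-31T22:00:00Z"), 3, 0);

console.log(beforeRun.toISOString()); // 2024-05-10T03:00:00.000Z
console.log(afterRun.toISOString()); // 2024-05-11T03:00:00.000Z
console.log(monthEnd.toISOString()); // 2024-06-01T03:00:00.000Z
```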
The queue layer handles:
- Concurrency limits: Prevent too many tasks from running simultaneously.
- Rate limiting: Throttle task execution to avoid overwhelming downstream APIs.
- Priority queues: Execute high-priority tasks before low-priority tasks.
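Concurrency limits and priorities interact at dispatch time. A worker should start the highest-priority waiting runs, but never more than the queue's limit at once. The sketch assumes a numeric priority (higher starts first) and first-in, first-out order among equal priorities. `pickRunsToStart` is a hypothetical name, and rate limiting is left out.

```typescript
// Hypothetical waiting-run record.
interface WaitingRun {
  runId: string;
  priority: number; // higher starts first
  enqueuedAt: number; // tie-breaker: earlier starts first
}

// Choose which waiting runs to start without exceeding `concurrencyLimit`.
function pickRunsToStart(
  waiting: WaitingRun[],
  runningCount: number,
  concurrencyLimit: number,
): WaitingRun[] {
  const slots = Math.max(0, concurrencyLimit - runningCount);
  return [...waiting]
    .sort((a, b) => b.priority - a.priority || a.enqueuedAt - b.enqueuedAt)
    .slice(0, slots);
}

const waiting: WaitingRun[] = [
  { runId: "run_a", priority: 0, enqueuedAt: 1 },
  { runId: "run_b", priority: 5, enqueuedAt: 2 },
  { runId: "run_c", priority: 5, enqueuedAt: 3 },
  { runId: "run_d", priority: 1, enqueuedAt: 4 },
];

const started = pickRunsToStart(waiting, 1, 3).map((r) => r.runId);
const none = pickRunsToStart(waiting, 3, 3);
console.log(started, none.length); // [ 'run_b', 'run_c' ] 0
```

Strict priority ordering can starve low-priority runs such as `run_a` when high-priority work keeps arriving. Real schedulers often add aging or reserved capacity to prevent that.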
When to Use Trigger.dev vs. Temporal
Use Trigger.dev if:
- Your team writes TypeScript and wants a native developer experience.
- You need durable execution without operating a distributed system.
- Your workflows involve LLM tool calls, API integrations, or long-running data processing.
- You want managed infrastructure with elastic scaling.
Use Temporal if:
- You need multi-language support (Go, Java, Python, PHP).
- You require strict deterministic replay guarantees for compliance or auditing.
- You already run a Temporal cluster and have operational expertise.
- Your workflows involve complex distributed transactions across multiple services.
Avoid both if:
- Your tasks complete in under 30 seconds and don’t need retries (use a simple job queue like BullMQ or Faktory).
- You need real-time streaming or sub-second latency (use a message broker like Kafka or NATS).
- Your workflows are purely event-driven with no long-running state (use a webhook router like Hookdeck or Svix).
Technical Verdict
Trigger.dev V2 solves the TypeScript durable execution problem without Temporal’s operational overhead. The checkpoint-based replay model trades strict determinism for faster recovery and simpler debugging. The managed worker model eliminates infrastructure toil for teams that don’t want to run their own task queue.
The pivot from event triggers to durable execution reflects a real gap in the market. Zapier-style automation works for simple workflows, but breaks down when you need fault tolerance, long-running jobs, and step-level retries. Temporal works for complex distributed systems, but requires significant operational investment.
For agent builders, Trigger.dev provides the plumbing you need: resumable LLM tool calls, retry semantics for flaky APIs, and observability for debugging multi-step workflows. The TypeScript-native API means you can write agents in the same language as your frontend and backend, without context-switching to Go or Java.
The main risk is vendor lock-in. If you build on Trigger.dev’s managed platform, migrating to another provider requires rewriting task definitions and retry logic against a different API. The open-source, self-hostable version mitigates this, but then you operate the task queue and worker infrastructure yourself.
Use it when you need durable execution for TypeScript workflows and don’t want to run Temporal. Avoid it if you need strict determinism, multi-language support, or already have Temporal expertise in-house.