Trigger.dev V2: Durable Execution Plumbing for TypeScript Workflows

Trigger.dev started as a TypeScript-native Zapier alternative (745 points on Hacker News, Feb 2023), then pivoted to durable execution after user feedback revealed what developers actually needed: workflows that survive crashes, retry intelligently, and orchestrate long-running tasks without manual checkpointing. The V2 announcement (172 points, Oct 2023) positioned it explicitly as a Temporal alternative for TypeScript teams.

The shift exposes a fundamental split in workflow orchestration. Event-driven systems like Zapier chain discrete steps with webhook glue. Durable execution systems like Temporal and Trigger.dev persist workflow state across process restarts, handle retries at the framework level, and let you write imperative code that looks synchronous but survives infrastructure failures.

State Persistence Without Checkpointing Code

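Trigger.dev persists workflow state by recording the operations a run performs through its durable APIs, together with their results, in a durable event log. In the V2 SDK these are calls made through the job's `io` object, such as `io.runTask` with a cache key. When a run is interrupted, the runtime re-executes the run function on resume and returns the logged results for completed operations, which reconstructs in-memory state up to the failure point.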

How it works:

Each recorded operation becomes a checkpoint; a plain await that bypasses the durable API does not
The runtime logs each recorded call (HTTP request, database query, tool invocation) with its result
On replay, logged results are returned immediately without re-executing side effects
Only recorded operations after the last checkpoint actually run; unrecorded code between them runs again on every replay

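This means you write mostly ordinary TypeScript async/await code, and the framework handles durability for the operations it records. For readability, the examples in this article use the later `task()` syntax and treat each awaited call as a recorded step: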

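import { task } from "@trigger.dev/sdk/v3";
import Stripe from "stripe";
import { db } from "./db"; // hypothetical module exporting a Prisma client

const stripe = new Stripe(process.env.STRIPE_KEY ?? "");

export const processOrder = task({
  id: "process-order",
  run: async ({ orderId }: { orderId: string }) => {
    // Checkpoint: load the order (assumes an orders model with total in cents and stripeCustomerId)
    const order = await db.orders.findUnique({ where: { id: orderId } });
    if (!order) throw new Error(`Order ${orderId} not found`);

    // Checkpoint: if the process crashes after this call, replay returns the logged charge
    const payment = await stripe.charges.create(
      {
        amount: order.total, // smallest currency unit (cents)
        currency: "usd",
        customer: order.stripeCustomerId, // a charge needs a customer or a source
      },
      // Stripe idempotency key: repeating the request with the same key returns the original charge
      { idempotencyKey: `charge-${orderId}` },
    );

    // Checkpoint: persist the payment on the order
    await db.orders.update({
      where: { id: orderId },
      data: { paymentId: payment.id, status: "paid" },
    });

    return { orderId, paymentId: payment.id };
  },
});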

If the process dies after the Stripe charge but before the database update, replay will skip the Stripe call (using the logged result) and proceed directly to the update.
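The replay model described in this section can be reduced to a small in-memory sketch. `ReplayLog` and the `processOrder` function below are hypothetical stand-ins, not Trigger.dev APIs. Steps are synchronous here for brevity, whereas real steps would be awaited. A step whose key already has a logged result returns that result instead of running again.

```typescript
// Hypothetical stand-in for a durable event log: step key -> recorded result.
class ReplayLog {
  private results = new Map<string, unknown>();
  // Keys of steps whose function actually executed (not served from the log).
  readonly executed: string[] = [];

  // Return the logged result for `key` if present; otherwise run `fn` and log its result.
  step<T>(key: string, fn: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const result = fn();
    this.executed.push(key);
    this.results.set(key, result);
    return result;
  }
}

// Simulated external side effect: counts how many charges were created.
let chargesCreated = 0;

// Hypothetical order workflow; `crashAfterCharge` simulates the process dying mid-run.
function processOrder(log: ReplayLog, orderId: string, crashAfterCharge: boolean): string {
  const total = log.step("load-order", () => 4200); // pretend DB read returning cents
  const paymentId = log.step("charge", () => {
    chargesCreated += 1;
    return `pay_${orderId}_${total}`;
  });
  if (crashAfterCharge) {
    throw new Error("process crashed before the database update");
  }
  log.step("mark-paid", () => `${orderId}:paid:${paymentId}`);
  return paymentId;
}

const log = new ReplayLog();
try {
  processOrder(log, "ord_1", true); // first attempt dies after the charge
} catch {
  // A durable runtime would restart the run here, reusing the same log.
}
const paymentId = processOrder(log, "ord_1", false); // replay
```

After the replay, `chargesCreated` is still 1. The `load-order` and `charge` steps were served from the log, and only `mark-paid` executed for the first time.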

Retry Primitives vs. Temporal’s Activity Model

Temporal separates workflows (orchestration logic) from activities (side-effecting operations). Activities get explicit retry policies, heartbeats for long-running work, and timeout controls. Workflows are deterministic replay machines.

Trigger.dev collapses this distinction. Everything is a task. Retries happen at the task level with exponential backoff:

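import { task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const flakyScrape = task({
  id: "scrape-with-retries",
  retry: {
    maxAttempts: 5,        // initial attempt plus up to 4 retries
    factor: 2,             // each delay doubles
    minTimeoutInMs: 1000,  // first retry waits about 1 s
    maxTimeoutInMs: 60000, // delays are capped at 60 s
  },
  run: async ({ url }: { url: string }) => {
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url); // throwing here fails the attempt and triggers a retry
      return await page.content(); // page.goto returns an HTTP response, not the HTML
    } finally {
      await browser.close(); // don't leak a browser process on failed attempts
    }
  },
});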
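The retry options in this example describe exponential backoff. The helper below sketches how such a schedule could be computed. It assumes delay = minTimeoutInMs * factor^(retry - 1), capped at maxTimeoutInMs, with no jitter. Trigger.dev's actual calculation, including any randomization, may differ.

```typescript
// Mirrors the shape of the retry options above; field names copied from the example.
interface RetryOptions {
  maxAttempts: number;
  factor: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
}

// Delays (ms) before each retry. Attempt 1 is the initial run, so there are
// maxAttempts - 1 retries. Assumed formula: min * factor^(retry - 1), capped at max.
function backoffSchedule(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < opts.maxAttempts; retry++) {
    const raw = opts.minTimeoutInMs * Math.pow(opts.factor, retry - 1);
    delays.push(Math.min(raw, opts.maxTimeoutInMs));
  }
  return delays;
}

const schedule = backoffSchedule({ maxAttempts: 5, factor: 2, minTimeoutInMs: 1000, maxTimeoutInMs: 60000 });
// [1000, 2000, 4000, 8000]
```

With these settings the cap never applies: the four retries wait 1 s, 2 s, 4 s and 8 s, about 15 s in total. The 60 s cap only matters for longer chains. For example, a seventh retry (1000 * 2^6 = 64000 ms) would be clamped to 60000 ms.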

Key differences:

Feature	Temporal	Trigger.dev
Retry scope	Per-activity, configurable	Per-task, exponential backoff
Heartbeats	Explicit activity heartbeats for progress tracking	No native heartbeat mechanism
Timeouts	Separate schedule-to-close, start-to-close, heartbeat	Single task timeout
Determinism	Strict: workflows must be deterministic	Relaxed: tasks can have side effects
Versioning	Workflow versioning with compatibility checks	Task versioning via `id` changes

Temporal’s activity heartbeats let long-running work signal progress and detect worker failures. Trigger.dev expects tasks to complete or fail within their timeout window. For multi-hour operations, you’d split into smaller tasks or poll external state.
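Splitting a multi-hour job into smaller tasks mostly comes down to choosing a chunk size that fits comfortably inside the task timeout. The function below is a hypothetical planning helper and not part of any SDK. `msPerItem` is an estimate you would measure yourself, and `safetyFactor` leaves headroom for slow items. Each returned range could become the payload of one child task.

```typescript
interface ChunkRange {
  start: number; // inclusive index
  end: number;   // exclusive index
}

// Split `totalItems` into ranges small enough to finish within `timeoutMs`,
// using only `safetyFactor` (0 < f <= 1) of the timeout budget.
function planChunks(totalItems: number, msPerItem: number, timeoutMs: number, safetyFactor = 0.5): ChunkRange[] {
  const budgetMs = timeoutMs * safetyFactor;
  const perChunk = Math.max(1, Math.floor(budgetMs / msPerItem));
  const ranges: ChunkRange[] = [];
  for (let start = 0; start < totalItems; start += perChunk) {
    ranges.push({ start, end: Math.min(start + perChunk, totalItems) });
  }
  return ranges;
}

// Example: 10,000 items at roughly 2 s each with a 1-hour timeout.
const chunks = planChunks(10_000, 2_000, 3_600_000);
```

With these numbers, each chunk holds 900 items (about 30 minutes of work), which gives 12 chunks. The last chunk covers items 9,900 to 9,999.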

The relaxed determinism in Trigger.dev is intentional for agent workflows, where tool calls (external APIs, LLM invocations, database writes) naturally have side effects and cannot be made purely deterministic. Trigger.dev optimizes for developer velocity here and accepts that a replay can diverge from the original execution when unrecorded non-deterministic code is involved. That flexibility speeds up development but demands discipline in financial or compliance contexts, where strict determinism is what prevents subtle replay bugs.

TypeScript Async/Await Mapping to Durable Semantics

When a Promise rejects mid-workflow and the error is not caught, Trigger.dev logs the error and triggers retry logic. The replay mechanism avoids repeating operations that were recorded as successful before the failure.

Promise rejection handling:

Task throws or rejects a Promise
Runtime logs the failure with stack trace
Retry policy determines if another attempt happens
On retry, replay returns logged results for successful checkpoints and re-executes from the failure point

Concrete example of replay behavior:

import { task } from "@trigger.dev/sdk/v3";
import Stripe from "stripe";
import { db } from "./db"; // hypothetical module exporting a Prisma client with a users model

const stripe = new Stripe(process.env.STRIPE_KEY ?? "");

export const multiStepTask = task({
  id: "multi-step-example",
  retry: { maxAttempts: 3 },
  run: async ({ userId }: { userId: string }) => {
    // Step 1: Fetch user (succeeds)
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);
    console.log("Fetched user:", user.email);

    // Step 2: Call external API (throws on first attempt in this walkthrough)
    const charge = await stripe.charges.create({
      amount: 1000, // $10.00 in cents
      currency: "usd",
      source: user.stripeToken,
    });
    console.log("Charge result:", charge.id);

    // Step 3: Update database
    await db.users.update({
      where: { id: userId },
      data: { processed: true },
    });

    return { success: true };
  },
});

What happens on failure and replay:

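First execution: Step 1 succeeds and its result is logged. Step 2 throws a transient error (for example, a StripeConnectionError; a card decline would simply fail again on retry). The runtime logs the error and schedules a retry.
Second execution (replay): Step 1 returns the cached user from the log without hitting the database. The first console.log is ordinary code rather than a recorded step, so it prints again. Step 2 re-executes the Stripe call, because the failed attempt left no result to replay. If it succeeds, Step 3 runs fresh.
Key insight: Recorded operations that completed before the failure point (here, the database read) are not re-executed, but unrecorded code such as console.log runs on every replay. Among recorded operations, only the failed one and those after it execute again.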
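The walkthrough above can be reproduced with a small simulation. It makes the same assumptions as the replay model: a log keyed by step name persists across attempts, and each attempt re-invokes the whole task function. `step`, `multiStep` and `runWithRetries` are hypothetical names, not SDK functions. Recorded steps are served from the log, while the unrecorded log line runs on every attempt.

```typescript
// Step results recorded across attempts (hypothetical durable log).
const recorded = new Map<string, unknown>();
const realExecutions: string[] = []; // steps that actually hit the "external" system
const logLines: string[] = [];       // unrecorded side effect, like console.log

function step<T>(key: string, fn: () => T): T {
  if (recorded.has(key)) return recorded.get(key) as T;
  const value = fn(); // if fn throws, nothing is recorded for this key
  realExecutions.push(key);
  recorded.set(key, value);
  return value;
}

let stripeCalls = 0;

// Mirrors multiStepTask: fetch user, charge (fails on the first call), update.
function multiStep(userId: string): { success: boolean } {
  const email = step("fetch-user", () => `${userId}@example.com`);
  logLines.push(`Fetched user: ${email}`); // not a recorded step, so it re-runs
  const chargeId = step("charge", () => {
    stripeCalls += 1;
    if (stripeCalls === 1) throw new Error("simulated transient Stripe error");
    return "ch_123";
  });
  logLines.push(`Charge result: ${chargeId}`);
  step("update-user", () => true);
  return { success: true };
}

// Re-run the whole function until it succeeds or attempts run out.
function runWithRetries<T>(fn: () => T, maxAttempts: number): { result?: T; attempts: number; errors: string[] } {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: fn(), attempts: attempt, errors };
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return { attempts: maxAttempts, errors };
}

const outcome = runWithRetries(() => multiStep("u1"), 3);
```

The run succeeds on attempt 2. The user fetch executes once, yet its log line appears twice. The charge is attempted twice because the first call failed before anything was recorded.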

Non-determinism risks:

Trigger.dev allows side effects in task code (database writes, API calls). This differs from Temporal’s strict determinism requirement. If you call Math.random() or Date.now() in a task, replay will produce different values.

Concrete failure scenario: you generate an order ID with Math.random(), insert it into the database, and the process then crashes. Replay generates a different ID. If the insert was not recorded as a completed step, replay inserts a second row, leaving two orders for the same user action. If it was recorded, the logged insert is skipped, but later steps use an ID that was never stored.
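The duplicate-row case is easy to reproduce. In this sketch, a `Map` stands in for the orders table and the insert is an upsert keyed by ID. `runOnce` represents one execution of the same run, either the original attempt or its replay. With a random ID, every execution creates a new key. With an ID derived from a stable run identifier (a hypothetical `runId` string here), the second execution overwrites the first row instead of duplicating it.

```typescript
type IdFactory = (runId: string) => string;

// Executes the same "insert order" logic twice, as an original attempt plus a replay would.
function simulateTwoExecutions(makeId: IdFactory): number {
  const orders = new Map<string, { runId: string }>(); // stand-in for the orders table
  const runId = "run_abc123"; // stable across the original attempt and its replay
  const runOnce = () => {
    const orderId = makeId(runId);
    orders.set(orderId, { runId }); // upsert: same key overwrites, new key adds a row
  };
  runOnce(); // original execution, crashes after the insert
  runOnce(); // replay re-executes the insert
  return orders.size;
}

// Non-deterministic: a fresh value on every execution.
const randomId: IdFactory = () => `ord_${Math.random().toString(36).slice(2)}`;
// Deterministic: derived only from data that is identical on replay.
const runDerivedId: IdFactory = (runId) => `ord_${runId}`;

const rowsWithRandom = simulateTwoExecutions(randomId);    // 2 rows (duplicate order)
const rowsWithRunId = simulateTwoExecutions(runDerivedId); // 1 row
```

With a real database, the same protection needs a unique constraint or an upsert on the derived ID. A plain insert with a deterministic ID would fail on replay instead of silently duplicating.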

Workaround: use the task’s context for deterministic values:

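import { task } from "@trigger.dev/sdk/v3";

export const scheduledReport = task({
  id: "daily-report",
  run: async (payload, { ctx }) => {
    // ctx.run.startedAt is fixed for the run, so every replay or retry sees the same value
    const reportDate = ctx.run.startedAt;

    // Don't use Date.now() or new Date() here: they change on every execution.
    // fetchData and generateReport are application helpers, not SDK functions.
    const data = await fetchData(reportDate);
    return generateReport(data);
  },
});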

Deployment and Scaling Shape

Trigger.dev workers:

Run as long-lived Node.js processes
Poll the Trigger.dev platform for task assignments
Auto-scale based on queue depth (managed service) or manual scaling (self-hosted)
Each worker can handle multiple concurrent tasks up to memory limits

Temporal workers:

Run in any language with a Temporal SDK (Go, Java, TypeScript, Python, and others)
Poll the Temporal server for workflow and activity tasks
Can host workflows and activities in the same worker, or in separate worker pools when they need different scaling
Horizontal scaling via worker count, vertical via task slots per worker
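Scaling on queue depth, as the managed Trigger.dev service is described as doing above, generally reduces to a clamp over a target ratio. The function below is a hypothetical policy for illustration and is not the actual scaler of either product. It sizes the pool so that each worker holds at most `concurrencyPerWorker` queued runs, within configured bounds.

```typescript
interface ScalingPolicy {
  concurrencyPerWorker: number; // runs one worker can execute at once
  minWorkers: number;
  maxWorkers: number;
}

// Desired worker count for the current queue depth, clamped to [minWorkers, maxWorkers].
function desiredWorkers(queueDepth: number, policy: ScalingPolicy): number {
  const needed = Math.ceil(queueDepth / policy.concurrencyPerWorker);
  return Math.min(policy.maxWorkers, Math.max(policy.minWorkers, needed));
}

const policy: ScalingPolicy = { concurrencyPerWorker: 10, minWorkers: 1, maxWorkers: 20 };
desiredWorkers(0, policy);   // 1 (the floor keeps one warm worker)
desiredWorkers(95, policy);  // 10
desiredWorkers(500, policy); // 20 (capped)
```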

Infrastructure comparison:

Component	Trigger.dev	Temporal
Orchestration server	Managed SaaS or self-hosted API	Self-hosted Temporal server cluster (requires 3+ nodes for HA) or Temporal Cloud
State storage	PostgreSQL (managed) or your DB	Cassandra, PostgreSQL, MySQL
Worker runtime	Node.js/Bun	Go, Java, TypeScript, Python
Observability	Built-in dashboard	Temporal Web UI + custom metrics
Deployment	Docker container or serverless	Kubernetes, VMs, or containers
Operational burden	Zero cluster ops (managed) or single-service deployment (self-hosted)	Requires cluster management, monitoring, and database tuning

Trigger.dev’s managed service handles the orchestration layer. You deploy workers as containers or serverless functions. Temporal requires running the server cluster yourself (or using Temporal Cloud), plus managing worker deployments. For teams that already run Kubernetes, Temporal’s container-native design keeps the added operational overhead of self-hosting modest. Trigger.dev’s managed service eliminates cluster management but adds vendor lock-in risk.

Versioning Long-Running Workflows

When workflow code changes while instances are still executing, you need a versioning strategy.

Trigger.dev approach:

Each deployment creates a new version, and the task id identifies the task across versions
Changing the task code without changing id applies to new runs only
Running tasks continue with the code version they started with
To force migration, create a new task with a new id and trigger it from the old one

Temporal approach:

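Workflow type name, with per-change version markers recorded in workflow history
Supports patching (patched() in the TypeScript SDK, GetVersion in the Go SDK): conditional logic based on whether a workflow started before or after a code change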
Worker can run multiple versions simultaneously
More complex but handles gradual migration
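Both approaches rest on the same idea: a run remembers which code version it started on. The sketch below models that with hypothetical names (`deploy`, `startRun`, `execute`, `isPatched`) and is not the Trigger.dev or Temporal API. Trigger.dev-style pinning picks the handler by the run's recorded version. A Temporal-style patch check instead lets a single code path branch on whether the run started before or after a change.

```typescript
type Handler = (userId: string) => string;

interface Run {
  id: string;
  version: number; // code version recorded when the run started
}

// Registry of deployed versions; old versions stay available for in-flight runs.
const deployments = new Map<number, Handler>();
let currentVersion = 0;

function deploy(handler: Handler): number {
  currentVersion += 1;
  deployments.set(currentVersion, handler);
  return currentVersion;
}

function startRun(id: string): Run {
  return { id, version: currentVersion }; // pinned at start
}

// Pinning: execute with the handler of the version the run started on.
function execute(run: Run, userId: string): string {
  const handler = deployments.get(run.version);
  if (!handler) throw new Error(`version ${run.version} is no longer deployed`);
  return handler(userId);
}

// Patch-style check: true only for runs started at or after `introducedIn`.
function isPatched(run: Run, introducedIn: number): boolean {
  return run.version >= introducedIn;
}

deploy((userId) => `welcome:${userId}`);
const oldRun = startRun("run_1");
const v2 = deploy((userId) => `validate+welcome:${userId}`);
const newRun = startRun("run_2");
```

Removing version 1 from `deployments` while `run_1` is still in flight would make `execute` throw, which is exactly the situation both approaches are designed to avoid.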

Example of Trigger.dev task evolution:

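import { task } from "@trigger.dev/sdk/v3";
// Hypothetical application helpers: a Prisma client and an email sender.
import { db, sendEmail } from "./lib";

// V1: original task
export const processUserV1 = task({
  id: "process-user-v1",
  run: async ({ userId }: { userId: string }) => {
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);
    await sendEmail(user.email, "Welcome!");
  },
});

// V2: added validation step, registered under a new id
export const processUserV2 = task({
  id: "process-user-v2",
  run: async ({ userId }: { userId: string }) => {
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);

    // New validation logic
    if (!user.emailVerified) {
      throw new Error("Email not verified");
    }

    await sendEmail(user.email, "Welcome!");
  },
});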

In-flight V1 runs complete with the old logic. New work runs the V2 logic only after callers switch to triggering process-user-v2. There is no automatic migration path.

Observability and Failure Modes

Trigger.dev provides a dashboard showing:

Task execution timeline with checkpoint visibility
Retry attempts and backoff progression
Logs from each task run
Queue depth and worker utilization

Dashboard comparison:

Feature	Trigger.dev Dashboard	Temporal Web UI
Checkpoint visibility	Shows each recorded step with its timestamp	Shows activity boundaries and decisions
Retry timeline	Visual timeline with backoff intervals	Activity retry history with attempt details
State inspection	JSON view of logged results per checkpoint	Workflow history events with payloads
Live logs	Streamed console output from tasks	Requires custom logging integration
Queue metrics	Built-in queue depth and worker utilization	Requires Prometheus + custom dashboards

Trigger.dev’s dashboard is opinionated and batteries-included. Temporal Web UI is powerful but requires more instrumentation for production observability. Use Trigger.dev’s dashboard for quick debugging and real-time monitoring. Use Temporal’s event history for forensic replay analysis and deep inspection of workflow decision paths.

Common failure modes:

Worker crashes mid-task: Replay from last checkpoint when worker restarts
Database unavailable during checkpoint: Task retries with backoff
Non-deterministic code on replay: Different execution path, potential state corruption
Memory exhaustion from large state: Task fails, requires splitting into smaller tasks
Version mismatch during replay: Running task continues with original code version

Monitoring strategy:

Track retry rates per task type
Alert on tasks exceeding max attempts
Monitor checkpoint frequency (too many = performance hit, too few = large replay windows)
Watch for non-determinism by comparing replay logs to original execution
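The first two monitoring items can be computed from run records exported from either system. The `RunRecord` shape below is a hypothetical export format, and the 20% alert threshold is an arbitrary example value.

```typescript
interface RunRecord {
  taskId: string;
  attempts: number;    // total attempts made, including the first
  maxAttempts: number; // configured limit for this task
  succeeded: boolean;
}

interface TaskStats {
  runs: number;
  retriedRuns: number;   // runs that needed more than one attempt
  exhaustedRuns: number; // runs that failed after using every attempt
  retryRate: number;     // retriedRuns / runs
}

// Aggregate retry statistics per task type.
function retryStats(records: RunRecord[]): Map<string, TaskStats> {
  const stats = new Map<string, TaskStats>();
  for (const r of records) {
    const s = stats.get(r.taskId) ?? { runs: 0, retriedRuns: 0, exhaustedRuns: 0, retryRate: 0 };
    s.runs += 1;
    if (r.attempts > 1) s.retriedRuns += 1;
    if (!r.succeeded && r.attempts >= r.maxAttempts) s.exhaustedRuns += 1;
    s.retryRate = s.retriedRuns / s.runs;
    stats.set(r.taskId, s);
  }
  return stats;
}

// Alert when a task has exhausted runs, or otherwise retries more often than the threshold.
function alerts(stats: Map<string, TaskStats>, maxRetryRate = 0.2): string[] {
  const out: string[] = [];
  stats.forEach((s, taskId) => {
    if (s.exhaustedRuns > 0) out.push(`${taskId}: ${s.exhaustedRuns} run(s) exhausted retries`);
    else if (s.retryRate > maxRetryRate) out.push(`${taskId}: retry rate ${(s.retryRate * 100).toFixed(0)}%`);
  });
  return out;
}

const stats = retryStats([
  { taskId: "scrape-with-retries", attempts: 1, maxAttempts: 5, succeeded: true },
  { taskId: "scrape-with-retries", attempts: 3, maxAttempts: 5, succeeded: true },
  { taskId: "process-order", attempts: 3, maxAttempts: 3, succeeded: false },
  { taskId: "process-order", attempts: 1, maxAttempts: 3, succeeded: true },
]);
const fired = alerts(stats);
```

Here both tasks alert, but for different reasons. The scraper succeeded every time but needed retries on half its runs. `process-order` had a run that used all of its attempts.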

Agent Orchestration Use Case

Durable execution shines for multi-step agent workflows where tool calls can fail, external APIs timeout, or human approval gates block progress.

Example: error recovery with fallback tools:

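import { task, logger } from "@trigger.dev/sdk/v3";
// Hypothetical application modules; none of these are Trigger.dev APIs.
import { db } from "./db";
import { primaryExtractor, ocrService, llm, waitForApproval, type Extraction } from "./agents";

export const dataExtractionAgent = task({
  id: "extract-with-fallback",
  retry: {
    maxAttempts: 3,
    factor: 2,
  },
  run: async ({ documentUrl }: { documentUrl: string }) => {
    let extractedData: Extraction; // structured result that carries a confidence score

    // Try primary extraction service
    try {
      extractedData = await primaryExtractor.extract(documentUrl);
    } catch (error) {
      // Log the failure before falling back (a log line, not a checkpoint)
      logger.warn("Primary extractor failed, trying fallback", { error: String(error) });

      // Fallback to OCR + LLM pipeline; llm.extractStructured is assumed to
      // return an Extraction with a confidence score, not plain text
      const ocrText = await ocrService.process(documentUrl);
      extractedData = await llm.extractStructured({
        model: "gpt-4",
        prompt: `Extract structured data from: ${ocrText}`,
      });
    }

    // Human approval gate for low-confidence extractions
    // (waitForApproval suspends until a reviewer responds or the timeout elapses)
    if (extractedData.confidence < 0.8) {
      const approval = await waitForApproval({
        data: extractedData,
        timeout: "24h",
      });

      if (!approval.approved) {
        throw new Error("Human rejected extraction");
      }

      extractedData = approval.correctedData;
    }

    // Assumes url is a unique field on the documents model
    await db.documents.update({
      where: { url: documentUrl },
      data: { extractedData, status: "processed" },
    });

    return extractedData;
  },
});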

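If the OCR service times out, the attempt fails and the retry policy schedules another one. Whether the primary extractor is called again on that retry depends on whether its caught failure was recorded as a completed step; if it was not, the replay tries it again before reaching the fallback. The human approval gate can pause execution for hours without losing state.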
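Stripped of I/O, the control flow of the agent example is a fallback chain followed by a confidence gate. The sketch below uses hypothetical synchronous extractors and a hypothetical `Extraction` shape with a `confidence` field, because the original's `primaryExtractor`, `ocrService` and `llm` objects are not specified. It returns which path produced the result so that the behavior can be checked.

```typescript
interface Extraction {
  fields: Record<string, string>;
  confidence: number; // 0..1, assumed to be reported by each extractor
}

type Extractor = (documentUrl: string) => Extraction;

type Outcome =
  | { status: "accepted"; source: string; data: Extraction }
  | { status: "needs-review"; source: string; data: Extraction };

// Try each extractor in order; the first one that does not throw wins.
// Low-confidence results are routed to review instead of being saved.
function extractWithFallback(
  documentUrl: string,
  extractors: Array<[string, Extractor]>,
  minConfidence = 0.8,
): Outcome {
  const failures: string[] = [];
  for (const [source, extract] of extractors) {
    try {
      const data = extract(documentUrl);
      return data.confidence >= minConfidence
        ? { status: "accepted", source, data }
        : { status: "needs-review", source, data };
    } catch (err) {
      failures.push(`${source}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all extractors failed (${failures.join("; ")})`);
}

const primary: Extractor = () => {
  throw new Error("primary service unavailable");
};
const ocrPlusLlm: Extractor = () => ({ fields: { total: "42.00" }, confidence: 0.65 });

const outcome = extractWithFallback("https://example.com/invoice.pdf", [
  ["primary", primary],
  ["ocr+llm", ocrPlusLlm],
]);
```

In this example, the primary extractor fails and the fallback answers with confidence 0.65. The result is therefore routed to human review rather than written straight to the database.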

Technical Verdict

Use Trigger.dev when:

Your team is TypeScript-first and wants to avoid Go or Java
Task execution time is under 4 hours with per-task state under 50MB
Retry requirements stay below 10 attempts per task run
You need durable execution without operating a Temporal cluster
Agent orchestration requires flexible tool calling with side effects
Managed service is acceptable (or you can self-host the simpler architecture)
You want built-in observability without custom Prometheus instrumentation
Workflow latency tolerance is above 100ms per decision point

Avoid Trigger.dev when:

You need strict determinism guarantees for financial transactions or compliance workflows
Workflows exceed 7-day duration or require complex versioning during execution
You require activity heartbeats for long-running operations (multi-hour video encoding, distributed map-reduce jobs)
Your team already operates Temporal and has Go expertise
You need saga patterns or advanced compensation logic for distributed transactions
Per