mech.app

Trigger.dev V2: Durable Execution Plumbing for TypeScript Workflows

How Trigger.dev's pivot from event automation to durable execution exposes retry semantics, state persistence, and TypeScript workflow ergonomics.

Source: trigger.dev

Trigger.dev started as a TypeScript-native Zapier alternative (745 points, Feb 2023), then pivoted to durable execution after user feedback revealed what developers actually needed: workflows that survive crashes, retry intelligently, and orchestrate long-running tasks without manual checkpointing. The V2 announcement (172 points, Oct 2023) positioned it explicitly as a Temporal alternative for TypeScript teams.

The shift exposes a fundamental split in workflow orchestration. Event-driven systems like Zapier chain discrete steps with webhook glue. Durable execution systems like Temporal and Trigger.dev persist workflow state across process restarts, handle retries at the framework level, and let you write imperative code that looks synchronous but survives infrastructure failures.

State Persistence Without Checkpointing Code

Trigger.dev persists workflow state by intercepting async operations and logging them to a durable event log. When a task crashes mid-execution, the runtime replays the log on restart and reconstructs in-memory state up to the failure point.

How it works:

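  • Each recorded async step becomes a checkpoint (in V2, calls made through io.runTask() or an integration; a bare await on your own client is not recorded)
  • The runtime logs each recorded call (HTTP request, database query, tool invocation) with its result
  • On replay, logged results are returned immediately without re-executing side effects
  • Only operations after the last checkpoint actually run; code outside recorded steps re-executes on every replay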
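The replay loop described in the list above can be modeled as a journal keyed by step name. The sketch below is a minimal, synchronous illustration, not Trigger.dev's implementation: makeStep, the step keys, and the charge counter are hypothetical, and a real runtime would persist the journal durably and await asynchronous calls. It also shows the model's limit: only work wrapped in step() is skipped on replay.

```typescript
// A journal maps step keys to recorded results (durable storage in a real runtime)
type Journal = Map<string, unknown>;

// Returns a step() helper bound to one journal
function makeStep(journal: Journal) {
  return function step<T>(key: string, fn: () => T): T {
    if (journal.has(key)) {
      return journal.get(key) as T; // replay: reuse the recorded result
    }
    const result = fn(); // first execution: perform the work
    journal.set(key, result); // record it before continuing
    return result;
  };
}

// Counts real side effects so we can see what replay skips
let chargesCreated = 0;

function runWorkflow(journal: Journal, crashAfterCharge: boolean): string {
  const step = makeStep(journal);
  const order = step("load-order", () => ({ id: "ord_1", total: 500 }));
  const chargeId = step("charge", () => {
    chargesCreated += 1;
    return `ch_${chargesCreated}`;
  });
  if (crashAfterCharge) throw new Error("simulated crash"); // outside any step
  return step("mark-paid", () => `${order.id} paid with ${chargeId}`);
}

const journal: Journal = new Map();
try {
  runWorkflow(journal, true); // first execution dies after the charge is recorded
} catch {
  // a real runtime would schedule a replay here
}
const result = runWorkflow(journal, false); // replay
console.log(result, chargesCreated); // ord_1 paid with ch_1 1
```

The replay creates no second charge because "charge" is already in the journal, while "mark-paid" runs for the first time.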

You still write ordinary TypeScript async/await code, and the framework handles durability for the steps it records:

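import { task } from "@trigger.dev/sdk/v3";
// Hypothetical project modules: a Prisma client and a configured Stripe client
import { db } from "./db";
import { stripe } from "./stripe";

export const processOrder = task({
  id: "process-order",
  run: async ({ orderId }: { orderId: string }) => {
    // Load the order; findUnique returns null when it does not exist
    const order = await db.orders.findUnique({ where: { id: orderId } });
    if (!order) throw new Error(`Order ${orderId} not found`);

    // Charge the customer. If this call is repeated (for example after a crash
    // before its result was recorded), the idempotency key makes Stripe return
    // the original charge instead of creating a second one.
    const payment = await stripe.charges.create(
      {
        amount: order.total,
        currency: "usd",
        customer: order.stripeCustomerId, // hypothetical column on the order
      },
      { idempotencyKey: `charge-${orderId}` },
    );

    // Record the payment on the order
    await db.orders.update({
      where: { id: orderId },
      data: { paymentId: payment.id, status: "paid" },
    });

    return { orderId, paymentId: payment.id };
  },
});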

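If the process dies after the Stripe charge but before the database update, replay skips the Stripe call and uses the logged result, provided that result was recorded before the crash. If the crash lands in the window between Stripe accepting the charge and the log write, replay calls Stripe again, so side-effecting calls like this should also carry an idempotency key (the Stripe Node SDK accepts one as the idempotencyKey request option).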
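The crash window is easy to reproduce against a fake API. In this sketch FakePaymentApi is hypothetical and only mimics the idempotency-key behavior Stripe documents (a repeated key returns the original result instead of creating a new charge); the second createCharge call stands in for the replayed step whose first result was never recorded.

```typescript
type Charge = { id: string; amount: number };

// Hypothetical in-memory payment API that honors idempotency keys
class FakePaymentApi {
  private byKey = new Map<string, Charge>();
  private nextId = 1;
  chargesCreated = 0;

  createCharge(amount: number, idempotencyKey?: string): Charge {
    // A repeated key returns the original charge instead of creating a new one
    if (idempotencyKey !== undefined) {
      const existing = this.byKey.get(idempotencyKey);
      if (existing) return existing;
    }
    const charge: Charge = { id: `ch_${this.nextId++}`, amount };
    this.chargesCreated += 1;
    if (idempotencyKey !== undefined) this.byKey.set(idempotencyKey, charge);
    return charge;
  }
}

// The charge succeeds but its result is lost, so the replay calls the API again
function chargeTwice(api: FakePaymentApi, orderId: string, useKey: boolean): Charge[] {
  const key = useKey ? `charge-${orderId}` : undefined;
  const first = api.createCharge(500, key); // original attempt (result never recorded)
  const second = api.createCharge(500, key); // replayed attempt
  return [first, second];
}

const withoutKey = new FakePaymentApi();
chargeTwice(withoutKey, "ord_1", false);

const withKey = new FakePaymentApi();
const [a, b] = chargeTwice(withKey, "ord_1", true);
console.log(withoutKey.chargesCreated, withKey.chargesCreated, a.id === b.id); // 2 1 true
```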

Retry Primitives vs. Temporal’s Activity Model

Temporal separates workflows (orchestration logic) from activities (side-effecting operations). Activities get explicit retry policies, heartbeats for long-running work, and timeout controls. Workflows are deterministic replay machines.

Trigger.dev collapses this distinction. Everything is a task. Retries happen at the task level with exponential backoff:

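import { task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const flakyScrape = task({
  id: "scrape-with-retries",
  retry: {
    maxAttempts: 5,        // up to 5 attempts in total
    factor: 2,             // double the delay after each failure
    minTimeoutInMs: 1000,  // shortest delay before a retry
    maxTimeoutInMs: 60000, // never wait longer than 60s
  },
  run: async ({ url }: { url: string }) => {
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      // A thrown error (timeout, DNS failure) fails this attempt and triggers a retry
      await page.goto(url);
      return await page.content(); // serializable HTML string
    } finally {
      await browser.close(); // release the browser even when the attempt fails
    }
  },
});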
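To see what the retry block above implies in wall-clock terms, here is the textbook exponential backoff schedule computed from the same parameters. This assumes delay = minTimeoutInMs * factor^(n - 1), capped at maxTimeoutInMs, with no jitter, and assumes maxAttempts counts the first attempt; Trigger.dev's exact formula, including any randomization, may differ.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  factor: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
}

// Delay before each retry, in milliseconds
function backoffSchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  // maxAttempts includes the first attempt, so there are maxAttempts - 1 retries
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    const raw = policy.minTimeoutInMs * Math.pow(policy.factor, retry - 1);
    delays.push(Math.min(raw, policy.maxTimeoutInMs));
  }
  return delays;
}

const schedule = backoffSchedule({
  maxAttempts: 5,
  factor: 2,
  minTimeoutInMs: 1000,
  maxTimeoutInMs: 60000,
});
console.log(schedule); // [ 1000, 2000, 4000, 8000 ]
```

With this policy the 60-second cap never applies: the final attempt starts after about 15 seconds of cumulative waiting. Raising maxAttempts to 8 would yield 1, 2, 4, 8, 16, 32, and then 60 (capped) seconds.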

Key differences:

| Feature | Temporal | Trigger.dev |
|---|---|---|
| Retry scope | Per-activity, configurable | Per-task, exponential backoff |
| Heartbeats | Explicit activity heartbeats for progress tracking | No native heartbeat mechanism |
| Timeouts | Separate schedule-to-close, start-to-close, heartbeat | Single task timeout |
| Determinism | Strict: workflows must be deterministic | Relaxed: tasks can have side effects |
| Versioning | Workflow versioning with compatibility checks | Task versioning via id changes |

Temporal’s activity heartbeats let long-running work signal progress and detect worker failures. Trigger.dev expects tasks to complete or fail within their timeout window. For multi-hour operations, you’d split into smaller tasks or poll external state.
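Splitting a multi-hour job means sizing chunks so each one fits inside a task's timeout. A hypothetical planner, assuming a roughly constant cost per item and a 50% safety margin; each returned range would become one child task's payload, so a failure retries one chunk instead of the whole job.

```typescript
// One child task's share of the job: items [start, end)
interface ChunkPayload {
  start: number;
  end: number;
}

function planChunks(
  totalItems: number,
  msPerItem: number, // measured average cost per item (assumption)
  taskTimeoutMs: number,
  safetyFactor = 0.5, // budget only half the timeout to absorb slow items
): ChunkPayload[] {
  const itemsPerChunk = Math.max(1, Math.floor((taskTimeoutMs * safetyFactor) / msPerItem));
  const chunks: ChunkPayload[] = [];
  for (let start = 0; start < totalItems; start += itemsPerChunk) {
    chunks.push({ start, end: Math.min(start + itemsPerChunk, totalItems) });
  }
  return chunks;
}

// 10,000 items at ~2 s each is about 5.5 hours of work. With a 1-hour timeout
// and a 50% margin, each chunk gets 900 items (about 30 minutes).
const chunks = planChunks(10_000, 2_000, 60 * 60 * 1000);
console.log(chunks.length, chunks[0], chunks[chunks.length - 1]);
// 12 { start: 0, end: 900 } { start: 9900, end: 10000 }
```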

The relaxed determinism in Trigger.dev is intentional for agent workflows, where tool calls naturally have side effects: agents call external APIs, invoke LLMs, and write to databases in ways that cannot be purely deterministic. Trigger.dev optimizes for developer velocity here and accepts that a replayed run can take a different path than the original if its code reads non-deterministic inputs (the clock, randomness, live API responses) outside recorded steps. That flexibility speeds up development but demands discipline in financial or compliance contexts, where strict determinism prevents subtle replay bugs.

TypeScript Async/Await Mapping to Durable Semantics

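When a Promise rejects mid-workflow, Trigger.dev logs the error and triggers retry logic. Replay avoids re-running operations whose results were recorded before the failure, but it does not make an operation idempotent: a call that completed on the remote side before its result was recorded will run again.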

Promise rejection handling:

  1. Task throws or rejects a Promise
  2. Runtime logs the failure with stack trace
  3. Retry policy determines if another attempt happens
  4. On retry, the run function executes again; recorded steps return their logged results, so real work resumes at the failed step
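Step 3 is where a retry policy earns its keep: transient failures deserve another attempt, but permanent ones (a declined card, a validation error) fail the same way every time. A hypothetical decision function, assuming failures carry a string code; the codes listed are illustrative and not tied to any SDK.

```typescript
// Assumed failure shape: a machine-readable code plus a message
interface Failure {
  code: string;
  message: string;
}

interface RetryDecision {
  retry: boolean;
  reason: string;
}

// Illustrative codes for failures that another attempt cannot fix
const PERMANENT_CODES = new Set(["card_declined", "validation_failed", "not_found"]);

function decideRetry(failure: Failure, attempt: number, maxAttempts: number): RetryDecision {
  if (PERMANENT_CODES.has(failure.code)) {
    return { retry: false, reason: "permanent error" };
  }
  if (attempt >= maxAttempts) {
    return { retry: false, reason: "attempts exhausted" };
  }
  // Anything else is treated as transient (timeouts, connection resets, 5xx)
  return { retry: true, reason: `transient error, scheduling attempt ${attempt + 1}` };
}

const reset: Failure = { code: "connection_reset", message: "socket hang up" };
const declined: Failure = { code: "card_declined", message: "Your card was declined." };
const decisions = [
  decideRetry(reset, 1, 3), // transient, attempts left
  decideRetry(declined, 1, 3), // permanent
  decideRetry(reset, 3, 3), // transient, but out of attempts
];
console.log(decisions.map((d) => d.retry)); // [ true, false, false ]
```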

Concrete example of replay behavior:

import { task } from "@trigger.dev/sdk/v3";
import Stripe from "stripe";
import { db } from "./db"; // hypothetical: your Prisma client

const stripeKey = process.env.STRIPE_KEY;
if (!stripeKey) throw new Error("STRIPE_KEY is not set");
const stripe = new Stripe(stripeKey);

export const multiStepTask = task({
  id: "multi-step-example",
  retry: { maxAttempts: 3 },
  run: async ({ userId }: { userId: string }) => {
    // Step 1: Fetch user (succeeds)
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);
    console.log("Fetched user:", user.email);

    // Step 2: Call external API (throws on first attempt)
    const charge = await stripe.charges.create({
      amount: 1000,
      currency: "usd",
      source: user.stripeToken,
    });
    console.log("Charge result:", charge.id);

    // Step 3: Update database
    await db.users.update({
      where: { id: userId },
      data: { processed: true },
    });

    return { success: true };
  },
});

What happens on failure and replay:

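  • First execution: Step 1 succeeds and its result is recorded. Step 2 fails with a transient error (for example a StripeConnectionError from a network blip). The runtime logs the error and schedules a retry.
  • Second execution (replay): Step 1 returns the recorded user without hitting the database. The console.log line runs again, because it is ordinary code outside any recorded step. Step 2 re-executes the Stripe call; if it succeeds, Step 3 runs fresh.
  • Key insight: Recorded operations before the failure point (here, the database read) are not re-executed, but un-recorded side effects such as console output repeat on every replay. A permanent error such as StripeCardError would fail identically on each attempt, so retrying it only delays the failure.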

Non-determinism risks:

Trigger.dev allows side effects directly in task code (database writes, API calls). Temporal instead requires workflow code to be deterministic and moves side effects into activities. If you call Math.random() or Date.now() in a task, replay will produce different values.

Concrete failure scenario: If you generate a random order ID using Math.random() and insert it into a database, replay will generate a different ID. If the original execution inserted the first ID before crashing, replay inserts a second ID, creating duplicate records. The database now has two orders for the same user action.
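The duplicate-order scenario can be simulated in a few lines, together with one fix. This is a sketch under assumptions: the Map stands in for an orders table with an upsert on the primary key, and run_abc123 is a made-up run ID. Deriving the ID from the run instead of Math.random() makes the replayed write hit the same row.

```typescript
// Stand-in for an orders table keyed by primary key; set() behaves like an upsert
const orders = new Map<string, { userId: string }>();

function createOrder(orderId: string, userId: string): void {
  orders.set(orderId, { userId });
}

// Non-deterministic: a replay generates a different ID
function randomOrderId(): string {
  return "ord_" + Math.random().toString(36).slice(2);
}

// Deterministic: the same run always produces the same ID
function orderIdForRun(runId: string): string {
  return `ord_${runId}`;
}

// Original execution plus one replay, using random IDs
createOrder(randomOrderId(), "user_1");
createOrder(randomOrderId(), "user_1");
const randomCount = orders.size;

orders.clear();

// Original execution plus one replay, using run-derived IDs
createOrder(orderIdForRun("run_abc123"), "user_1");
createOrder(orderIdForRun("run_abc123"), "user_1");
console.log(randomCount, orders.size); // 2 1
```

With a plain INSERT instead of an upsert, the run-derived ID turns the duplicate into a unique-key error rather than a silent second record, which is still easier to handle.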

Workaround: use the task’s context for deterministic values:

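import { task } from "@trigger.dev/sdk/v3";
// Hypothetical helpers from your own codebase
import { fetchData, generateReport } from "./reports";

export const scheduledReport = task({
  id: "daily-report",
  run: async (_payload: unknown, { ctx }) => {
    // ctx.run.startedAt is fixed for the run, so every replay sees the same value
    const reportDate = ctx.run.startedAt;

    // Don't use Date.now() directly: it returns a new value on every replay
    const data = await fetchData(reportDate);
    return generateReport(data);
  },
});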
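When a task genuinely needs randomness (sampling, shuffling), the same principle applies: seed a small PRNG from a value that stays fixed for the run, such as its run ID, instead of calling Math.random(). The FNV-1a hash and mulberry32 generator below are common public snippets, not Trigger.dev APIs, and the run ID is a literal placeholder.

```typescript
// FNV-1a: turn a string (here, a run ID) into a 32-bit seed
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1)
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick sample indices that come out identical on every replay of the same run
function sampleIndices(runId: string, count: number, poolSize: number): number[] {
  const rand = mulberry32(hashString(runId));
  const picks: number[] = [];
  for (let i = 0; i < count; i++) {
    picks.push(Math.floor(rand() * poolSize));
  }
  return picks;
}

const first = sampleIndices("run_abc123", 3, 100);
const replay = sampleIndices("run_abc123", 3, 100);
console.log(JSON.stringify(first) === JSON.stringify(replay)); // true
```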

Deployment and Scaling Shape

Trigger.dev workers:

  • Run as long-lived Node.js processes
  • Poll the Trigger.dev platform for task assignments
  • Auto-scale based on queue depth (managed service) or manual scaling (self-hosted)
  • Each worker can handle multiple concurrent tasks up to memory limits

Temporal workers:

  • Run as processes built with any Temporal SDK (Go, Java, TypeScript, Python, and others)
  • Poll Temporal server for workflow and activity tasks
  • Can host workflows and activities in one worker process or split them into separate pools
  • Horizontal scaling via worker count, vertical via task slots per worker
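Queue-depth autoscaling, listed for Trigger.dev's managed workers above, boils down to a sizing rule like the one below, and the same arithmetic applies when scaling Temporal workers by count. This is a hypothetical policy for illustration, not either product's actual scaler; concurrencyPerWorker stands in for the memory-bound concurrency each worker can sustain.

```typescript
// Hypothetical sizing inputs
interface ScalingConfig {
  concurrencyPerWorker: number; // tasks one worker can run at once (memory-bound)
  minWorkers: number;
  maxWorkers: number;
}

// Enough workers to cover queued plus running tasks, clamped to [min, max]
function desiredWorkers(queued: number, running: number, cfg: ScalingConfig): number {
  const needed = Math.ceil((queued + running) / cfg.concurrencyPerWorker);
  return Math.min(cfg.maxWorkers, Math.max(cfg.minWorkers, needed));
}

const cfg: ScalingConfig = { concurrencyPerWorker: 10, minWorkers: 1, maxWorkers: 20 };
console.log(desiredWorkers(0, 0, cfg)); // 1: idle, held at minWorkers
console.log(desiredWorkers(45, 12, cfg)); // 6: 57 tasks / 10 per worker, rounded up
console.log(desiredWorkers(900, 50, cfg)); // 20: capped at maxWorkers
```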

Infrastructure comparison:

| Component | Trigger.dev | Temporal |
|---|---|---|
| Orchestration server | Managed SaaS or self-hosted API | Self-hosted Temporal server cluster (requires 3+ nodes for HA) or Temporal Cloud |
| State storage | PostgreSQL (managed, or your own instance when self-hosting) | Cassandra, PostgreSQL, MySQL |
| Worker runtime | Node.js/Bun | Go, Java, TypeScript, Python |
| Observability | Built-in dashboard | Temporal Web UI + custom metrics |
| Deployment | Docker container or serverless | Kubernetes, VMs, or containers |
| Operational burden | Zero cluster ops (managed) or single-service deployment (self-hosted) | Requires cluster management, monitoring, and database tuning |

Trigger.dev’s managed service handles the orchestration layer. You deploy workers as containers or serverless functions. Temporal requires running the server cluster yourself (or using Temporal Cloud), plus managing worker deployments. For teams with existing Kubernetes infrastructure, Temporal’s container-native design reduces operational overhead. Trigger.dev’s managed service eliminates cluster management but adds vendor lock-in risk.

Versioning Long-Running Workflows

When workflow code changes while instances are still executing, you need a versioning strategy.

Trigger.dev approach:

  • Task id acts as the version identifier
  • Changing the task code without changing id applies to new runs only
  • Running tasks continue with the code version they started with
  • To force migration, create a new task with a new id and trigger it from the old one

Temporal approach:

  • Workflow type name + version number
  • Supports patching: conditional logic based on whether a workflow started before or after a code change
  • Worker can run multiple versions simultaneously
  • More complex but handles gradual migration
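Temporal's patching approach can be illustrated with a simplified simulation. In the real TypeScript SDK, patched() is imported from @temporalio/workflow and records a marker in workflow history; here the history is a plain object and the replaying flag is a simplification, so treat this as a model of the idea rather than the SDK's behavior. The workflow body reuses the email-validation change from the Trigger.dev example.

```typescript
// Simplified model of workflow history: recorded patch markers, and whether
// we are replaying a run that started before the deploy
interface History {
  markers: string[];
  replaying: boolean;
}

function makePatched(history: History): (patchId: string) => boolean {
  return (patchId) => {
    if (history.markers.indexOf(patchId) >= 0) return true; // run already took the new path
    if (history.replaying) return false; // old run: no marker in history, keep old code
    history.markers.push(patchId); // new run: record the marker, take the new path
    return true;
  };
}

// Workflow body with one patched change
function processUser(history: History, emailVerified: boolean): string {
  const patched = makePatched(history);
  if (patched("add-email-validation") && !emailVerified) {
    return "rejected: email not verified";
  }
  return "welcome email sent";
}

const oldRun: History = { markers: [], replaying: true }; // started before the change
const newRun: History = { markers: [], replaying: false }; // started after the deploy

console.log(processUser(oldRun, false)); // welcome email sent
console.log(processUser(newRun, false)); // rejected: email not verified
console.log(newRun.markers); // [ 'add-email-validation' ]
```

Once no pre-change runs remain, the old branch can be removed; the TypeScript SDK provides deprecatePatch() for that cleanup step.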

Example of Trigger.dev task evolution:

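import { task } from "@trigger.dev/sdk/v3";
// Hypothetical project modules
import { db } from "./db";
import { sendEmail } from "./email";

// V1: original task
export const processUserV1 = task({
  id: "process-user-v1",
  run: async ({ userId }: { userId: string }) => {
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);
    await sendEmail(user.email, "Welcome!");
  },
});

// V2: added validation step, registered under a new id
export const processUserV2 = task({
  id: "process-user-v2",
  run: async ({ userId }: { userId: string }) => {
    const user = await db.users.findUnique({ where: { id: userId } });
    if (!user) throw new Error(`User ${userId} not found`);

    // New validation logic
    if (!user.emailVerified) {
      throw new Error("Email not verified");
    }

    await sendEmail(user.email, "Welcome!");
  },
});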

Runs already started on V1 finish with the old logic. The new behavior applies only once callers trigger process-user-v2 instead of process-user-v1; there is no automatic migration path.

Observability and Failure Modes

Trigger.dev provides a dashboard showing:

  • Task execution timeline with checkpoint visibility
  • Retry attempts and backoff progression
  • Logs from each task run
  • Queue depth and worker utilization

Dashboard comparison:

| Feature | Trigger.dev Dashboard | Temporal Web UI |
|---|---|---|
| Checkpoint visibility | Shows each await boundary with timestamp | Shows activity boundaries and decisions |
| Retry timeline | Visual timeline with backoff intervals | Activity retry history with attempt details |
| State inspection | JSON view of logged results per checkpoint | Workflow history events with payloads |
| Live logs | Streamed console output from tasks | Requires custom logging integration |
| Queue metrics | Built-in queue depth and worker utilization | Requires Prometheus + custom dashboards |

Trigger.dev’s dashboard is opinionated and batteries-included. Temporal Web UI is powerful but requires more instrumentation for production observability. Use Trigger.dev’s dashboard for quick debugging and real-time monitoring. Use Temporal’s event history for forensic replay analysis and deep inspection of workflow decision paths.

Common failure modes:

  1. Worker crashes mid-task: Replay from last checkpoint when worker restarts
  2. Database unavailable during checkpoint: Task retries with backoff
  3. Non-deterministic code on replay: Different execution path, potential state corruption
  4. Memory exhaustion from large state: Task fails, requires splitting into smaller tasks
  5. Version mismatch during replay: Running task continues with original code version

Monitoring strategy:

  • Track retry rates per task type
  • Alert on tasks exceeding max attempts
  • Monitor checkpoint frequency (too many = performance hit, too few = large replay windows)
  • Watch for non-determinism by comparing replay logs to original execution
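The first two monitoring items reduce to a per-task aggregation over run records. The record shape below is an assumption (populated from a dashboard export or your own logs), and the 25% threshold is an arbitrary example.

```typescript
// Assumed shape of one finished run
interface RunRecord {
  taskId: string;
  attempts: number; // attempts used, including the first
  maxAttempts: number;
  succeeded: boolean;
}

interface TaskStats {
  runs: number;
  retried: number; // runs that needed more than one attempt
  exhausted: number; // runs that failed after using every attempt
}

function retryStats(records: RunRecord[]): Map<string, TaskStats> {
  const stats = new Map<string, TaskStats>();
  for (const r of records) {
    const s = stats.get(r.taskId) ?? { runs: 0, retried: 0, exhausted: 0 };
    s.runs += 1;
    if (r.attempts > 1) s.retried += 1;
    if (!r.succeeded && r.attempts >= r.maxAttempts) s.exhausted += 1;
    stats.set(r.taskId, s);
  }
  return stats;
}

// Alert when a task retries too often or any run exhausted its attempts
function tasksToAlert(stats: Map<string, TaskStats>, maxRetryRate: number): string[] {
  const alerts: string[] = [];
  stats.forEach((s, taskId) => {
    if (s.retried / s.runs > maxRetryRate || s.exhausted > 0) alerts.push(taskId);
  });
  return alerts;
}

const records: RunRecord[] = [
  { taskId: "scrape-with-retries", attempts: 1, maxAttempts: 5, succeeded: true },
  { taskId: "scrape-with-retries", attempts: 3, maxAttempts: 5, succeeded: true },
  { taskId: "scrape-with-retries", attempts: 5, maxAttempts: 5, succeeded: false },
  { taskId: "process-order", attempts: 1, maxAttempts: 3, succeeded: true },
];
console.log(tasksToAlert(retryStats(records), 0.25)); // [ 'scrape-with-retries' ]
```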

Agent Orchestration Use Case

Durable execution shines for multi-step agent workflows where tool calls can fail, external APIs timeout, or human approval gates block progress.

Example: error recovery with fallback tools:

import { task, logger } from "@trigger.dev/sdk/v3";
// Hypothetical service clients and approval helper; none of these are Trigger.dev APIs
import { db, primaryExtractor, ocrService, llm, waitForApproval } from "./services";

// Assumed result shape shared by both extraction paths
type Extraction = { fields: Record<string, string>; confidence: number };

export const dataExtractionAgent = task({
  id: "extract-with-fallback",
  retry: {
    maxAttempts: 3,
    factor: 2,
  },
  run: async ({ documentUrl }: { documentUrl: string }) => {
    let extractedData: Extraction;

    // Try primary extraction service
    try {
      extractedData = await primaryExtractor.extract(documentUrl);
    } catch (error) {
      // Record the failure before falling back
      logger.warn("Primary extractor failed, trying fallback", { error });

      // Fallback to OCR + LLM pipeline; llm.generateText is assumed to be a
      // wrapper that returns the model's reply parsed into an Extraction
      const ocrText = await ocrService.process(documentUrl);
      extractedData = await llm.generateText({
        model: "gpt-4",
        prompt: `Extract structured data from: ${ocrText}`,
      });
    }

    // Human approval gate for low-confidence extractions
    if (extractedData.confidence < 0.8) {
      const approval = await waitForApproval({
        data: extractedData,
        timeout: "24h",
      });

      if (!approval.approved) {
        throw new Error("Human rejected extraction");
      }

      extractedData = approval.correctedData;
    }

    await db.documents.update({
      where: { url: documentUrl },
      data: { extractedData, status: "processed" },
    });

    return extractedData;
  },
});

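If the OCR service times out, the attempt fails and the retry policy schedules another one. Recorded steps are reused on the next attempt, but the primary extractor never produced a recorded result, so expect it to be called again unless the runtime also recorded its failure. The human approval gate can pause execution for hours without losing state.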
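The fallback-and-gate control flow above can be isolated from the I/O and tested on its own. This synchronous sketch assumes each extractor returns a { fields, confidence } object, as in the task above; the extractor names and the 0.8 threshold mirror that example, and everything else is hypothetical.

```typescript
// Assumed result shape, matching the task above
interface Extraction {
  fields: Record<string, string>;
  confidence: number;
}

interface Extractor {
  name: string;
  run: (url: string) => Extraction;
}

type Outcome =
  | { status: "accepted" | "needs-review"; source: string; data: Extraction }
  | { status: "failed"; errors: string[] };

// Try each extractor in order; route low-confidence results to human review
function extractWithFallback(url: string, extractors: Extractor[], threshold = 0.8): Outcome {
  const errors: string[] = [];
  for (const extractor of extractors) {
    try {
      const data = extractor.run(url);
      if (data.confidence >= threshold) {
        return { status: "accepted", source: extractor.name, data };
      }
      return { status: "needs-review", source: extractor.name, data };
    } catch (err) {
      // Record why this extractor failed, then fall through to the next one
      errors.push(`${extractor.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { status: "failed", errors };
}

const primary: Extractor = {
  name: "primary",
  run: () => {
    throw new Error("timeout");
  },
};
const ocrLlm: Extractor = {
  name: "ocr+llm",
  run: () => ({ fields: { total: "42.00" }, confidence: 0.65 }),
};

const outcome = extractWithFallback("https://example.com/invoice.pdf", [primary, ocrLlm]);
console.log(outcome.status); // needs-review
```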

Technical Verdict

Use Trigger.dev when:

  • Your team is TypeScript-first and wants to avoid Go or Java
  • Task execution time is under 4 hours with per-task state under 50MB
  • Retry requirements stay below 10 attempts per task run
  • You need durable execution without operating a Temporal cluster
  • Agent orchestration requires flexible tool calling with side effects
  • Managed service is acceptable (or you can self-host the simpler architecture)
  • You want built-in observability without custom Prometheus instrumentation
  • Workflow latency tolerance is above 100ms per decision point

Avoid Trigger.dev when:

  • You need strict determinism guarantees for financial transactions or compliance workflows
  • Workflows exceed 7-day duration or require complex versioning during execution
  • You require activity heartbeats for long-running operations (multi-hour video encoding, distributed map-reduce jobs)
  • Your team already operates Temporal and has Go expertise
  • You need saga patterns or advanced compensation logic for distributed transactions
  • Per