mech.app

Trigger.dev V2: What a Temporal Alternative Reveals About Durable Execution for TypeScript Agents

How Trigger.dev's pivot from Zapier clone to Temporal alternative exposes the infrastructure gap between simple webhooks and long-running agent workflows.

Source: trigger.dev

Trigger.dev launched as a Zapier alternative for developers and scored 745 points on Hacker News. The V2 pivot to a Temporal alternative for TypeScript scored 172 points with 39 comments. The shift reflects a specific developer demand: durable execution without adopting the Go or Java ecosystems. Developers want tasks that survive restarts, retry intelligently, and maintain state across hours or days, but they don’t want to learn Temporal’s workflow determinism model or leave TypeScript.

The V2 architecture reveals what happens when you strip workflow orchestration down to the primitives TypeScript developers actually use for long-running tasks.

Why This Matters

The V2 launch drew fewer points than V1 (172 versus 745), but the drop masks a shift in what developers actually wanted. User feedback drove the architectural change: developers wanted durable execution, not webhook orchestration. V2 explicitly targets the Temporal use case while keeping TypeScript as the execution environment.

The pivot shows where most agent workflows actually live: in the gap between simple automation (webhooks, cron jobs) and heavyweight workflow engines (Temporal, Cadence). They need state persistence and retry logic but don’t require Temporal’s event sourcing guarantees or multi-language support.

The Pivot: From Webhooks to Durable Execution

V1 focused on webhook-triggered automation. Users wanted something different: tasks that survive restarts, retry intelligently, and maintain state across hours or days. That’s the durable execution problem Temporal solves, but Temporal asks you to learn a new mental model built around workflow determinism and the workflow/activity split, even when you use its TypeScript SDK.

Trigger.dev V2 keeps TypeScript as the execution environment and exposes three core primitives:

  • Task definitions with automatic retry and timeout configuration
  • State checkpointing that survives process crashes
  • Execution observability without manual instrumentation

The platform handles deployment, worker orchestration, and state storage. You write tasks as async functions.

Architecture: How State Persists

Trigger.dev runs tasks in isolated worker processes. Each task execution gets:

  1. A unique run ID tied to persistent storage
  2. Automatic checkpointing at await boundaries
  3. Retry logic that resumes from the last successful checkpoint

When a task calls an external API or waits on a long operation, the platform serializes execution state to Postgres. If the worker dies, a new worker picks up the run and resumes from the checkpoint.

The serialization mechanism uses Postgres JSONB columns for schema flexibility. At each await boundary, the runtime captures the execution context: local variables, function arguments, and the current position in the call stack. This gets serialized as JSON and written to a task_runs table with the run ID as the key. When a worker resumes, it reads the JSONB blob, deserializes the context, and continues from the next line after the checkpoint.

This snapshot model breaks when tasks hold non-serializable state. If your task context includes circular references, open file handles, or non-serializable objects (class instances with methods, WeakMaps), serialization fails and the task crashes. The platform doesn’t attempt to serialize these automatically. You must structure your tasks to hold only plain data between checkpoints.
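To make the constraint concrete, here is a minimal sketch of a plain JSON round trip, which is what a JSONB-backed checkpoint implies. The helpers `roundTrip` and `canCheckpoint` and the `Counter` class are hypothetical names for illustration, not Trigger.dev APIs.

```typescript
// Hypothetical helpers for illustration; not part of any SDK.
function roundTrip<T>(value: T): unknown {
  return JSON.parse(JSON.stringify(value));
}

function canCheckpoint(value: unknown): boolean {
  try {
    JSON.stringify(value);
    return true;
  } catch {
    return false; // e.g. circular references make JSON.stringify throw
  }
}

class Counter {
  constructor(public count: number) {}
  increment(): number {
    return ++this.count;
  }
}

// Plain data survives unchanged.
const plain = roundTrip({ documentId: "doc-1", pages: [1, 2, 3] });

// A class instance comes back as a plain object: data kept, methods lost.
const revived = roundTrip(new Counter(2)) as { count: number; increment?: unknown };

// A circular structure cannot be serialized at all.
const node: { name: string; self?: unknown } = { name: "a" };
node.self = node;

console.log(plain, revived, canCheckpoint(node));
```

Note that with plain JSON the class-instance case does not throw; it silently loses behavior, which can be harder to spot than a crash. Keeping only plain data between checkpoints avoids both outcomes.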

This differs from Temporal’s event sourcing model. Temporal replays the entire workflow history to reconstruct state. Trigger.dev snapshots state at specific points, trading replay flexibility for simpler mental models.
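The difference between the two state models can be sketched in a few lines. This is a toy model under stated assumptions, not either system's storage format; `WorkflowEvent`, `RunState`, `replay`, `saveSnapshot`, and `loadSnapshot` are hypothetical names.

```typescript
// Illustrative model only; neither system's real storage format.
type WorkflowEvent =
  | { type: "deposited"; amount: number }
  | { type: "withdrawn"; amount: number };

interface RunState {
  balance: number;
  eventsApplied: number;
}

const initial: RunState = { balance: 0, eventsApplied: 0 };

function apply(state: RunState, event: WorkflowEvent): RunState {
  const delta = event.type === "deposited" ? event.amount : -event.amount;
  return { balance: state.balance + delta, eventsApplied: state.eventsApplied + 1 };
}

// Event sourcing: rebuild state by replaying the full history.
function replay(log: WorkflowEvent[]): RunState {
  return log.reduce(apply, initial);
}

// Snapshotting: persist only the latest state and load it directly.
function saveSnapshot(state: RunState): string {
  return JSON.stringify(state);
}
function loadSnapshot(blob: string): RunState {
  return JSON.parse(blob) as RunState;
}

const eventLog: WorkflowEvent[] = [
  { type: "deposited", amount: 100 },
  { type: "withdrawn", amount: 30 },
  { type: "deposited", amount: 5 },
];

const fromReplay = replay(eventLog);
const fromSnapshot = loadSnapshot(saveSnapshot(fromReplay));

// Replay can also answer "what was the state after two events?"; a snapshot cannot.
const afterTwo = replay(eventLog.slice(0, 2));
```

Both paths arrive at the same current state. Only the event log can reconstruct intermediate states, which is the replay flexibility the snapshot model gives up.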

// Example based on Trigger.dev SDK patterns (assumes the v3-style `task` API).
// fetchDocument, extractText, generateEmbedding, and storeEmbedding are
// application-defined helpers (not shown).
import { task } from "@trigger.dev/sdk/v3";

export const processDocument = task({
  id: "process-document",
  retry: {
    maxAttempts: 3,
    factor: 2,
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 10000,
  },
  run: async (payload: { documentId: string }) => {
    // Checkpoint after fetch: if worker crashes here, retry resumes with cached document
    const doc = await fetchDocument(payload.documentId);

    // Checkpoint after extraction: text extraction may time out on large files
    const text = await extractText(doc);

    // Checkpoint after embedding: API rate limits trigger retries from this point
    const embedding = await generateEmbedding(text);

    // Checkpoint after storage: database writes can fail transiently
    await storeEmbedding(payload.documentId, embedding);

    return { success: true, documentId: payload.documentId };
  },
});

Each await creates a checkpoint. If the worker crashes after extractText but before generateEmbedding, the retry starts at the embedding step.
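One way to get "resume at the embedding step" behavior is to store each step's result under the run ID and skip steps that already have a stored result. The sketch below uses that technique as a simplified, synchronous stand-in for the snapshotting described above; it is not Trigger.dev's implementation, and `step`, `checkpoints`, and the fake pipeline are hypothetical.

```typescript
// Hypothetical in-memory checkpoint store keyed by run ID, then step name.
const checkpoints = new Map<string, Map<string, string>>();
const executed: string[] = []; // records which steps actually ran

function step<T>(runId: string, stepName: string, fn: () => T): T {
  const saved = checkpoints.get(runId) ?? new Map<string, string>();
  checkpoints.set(runId, saved);
  const cached = saved.get(stepName);
  if (cached !== undefined) {
    return JSON.parse(cached) as T; // completed earlier: reuse the stored result
  }
  const result = fn();
  executed.push(stepName);
  saved.set(stepName, JSON.stringify(result)); // persist before moving on
  return result;
}

let crashOnce = true;

function processDocument(runId: string, documentId: string): number[] {
  const doc = step(runId, "fetch", () => `contents of ${documentId}`);
  const text = step(runId, "extract", () => doc.toUpperCase());
  return step(runId, "embed", () => {
    if (crashOnce) {
      crashOnce = false;
      throw new Error("simulated worker crash");
    }
    return [text.length, 0.5];
  });
}

// First attempt crashes during "embed"; the retry skips "fetch" and "extract".
let output: number[] = [];
try {
  output = processDocument("run-1", "doc-42");
} catch {
  output = processDocument("run-1", "doc-42");
}
console.log(executed, output);
```

Across both attempts, each step body runs to completion exactly once. The same idea only holds in a real system if step results are serializable and the steps themselves are safe to repeat when a crash lands between the side effect and the write.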

Retry Semantics and Failure Modes

The retry configuration exposes exponential backoff with jitter. You control:

  • Maximum attempts before permanent failure
  • Backoff multiplier between attempts
  • Min and max timeout boundaries
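How those three settings combine into a delay can be sketched as follows. The jitter formula here (scaling the capped delay by a random factor between 0.5 and 1) is one common variant chosen as an assumption; the platform's exact formula is not stated in this article. `RetryOptions` and `backoffDelayMs` are hypothetical names.

```typescript
interface RetryOptions {
  factor: number;       // backoff multiplier between attempts
  minTimeoutMs: number; // base delay before the first retry
  maxTimeoutMs: number; // upper bound on any single delay
}

// Delay before retry number `attempt` (1-based).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  opts: RetryOptions,
  random: () => number = Math.random,
): number {
  const exponential = opts.minTimeoutMs * Math.pow(opts.factor, attempt - 1);
  const capped = Math.min(exponential, opts.maxTimeoutMs);
  // Assumed jitter variant: keep between 50% and 100% of the capped delay.
  const jitterFactor = 0.5 + 0.5 * random();
  return Math.round(capped * jitterFactor);
}

const opts: RetryOptions = { factor: 2, minTimeoutMs: 1000, maxTimeoutMs: 10000 };
const delays = [1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt, opts));
console.log(delays);
```

Jitter spreads retries from many failing runs over time so they don't hit a rate-limited API in lockstep.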

Failures fall into three categories:

| Failure Type | Behavior | Use Case |
| --- | --- | --- |
| Transient error | Automatic retry with backoff | OpenAI API rate limits, network blips |
| Permanent error | Immediate failure, no retry | Invalid input, auth failure |
| Timeout | Retry from last checkpoint | Long-running external calls (video processing) |

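In this pattern, the task marks errors as retryable explicitly instead of assuming every thrown error is retried. Default retry behavior for uncaught errors depends on SDK version and project configuration, so verify it before relying on either assumption:

// Illustrative sketch: `retry.error` stands in for whatever retryable-error
// helper your SDK version exposes; verify the exact name and options.
import { retry } from "@trigger.dev/sdk";

// Transient error: tell the platform to retry after 60 seconds
throw retry.error("Rate limited", {
  retryAt: new Date(Date.now() + 60000),
});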

This gives you control over which failures justify a retry and when to schedule it. Permanent errors (authentication failures, invalid input) should throw standard Error objects without the retry wrapper.
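The split between retryable and permanent failures can be modeled without any SDK. The sketch below is a synchronous toy runner under stated assumptions: `RetryableError`, `isRetryable`, and `runWithRetries` are hypothetical names, and waiting is simulated by summing delays instead of sleeping.

```typescript
// Hypothetical error type marking a failure as worth retrying.
class RetryableError extends Error {
  readonly retryable = true;
  constructor(message: string, public readonly retryAfterMs: number) {
    super(message);
    this.name = "RetryableError";
  }
}

function isRetryable(err: unknown): err is RetryableError {
  return err instanceof Error && (err as Partial<RetryableError>).retryable === true;
}

interface RunResult<T> {
  ok: boolean;
  value?: T;
  attempts: number;
  waitedMs: number; // total simulated wait between attempts
  error?: string;
}

function runWithRetries<T>(fn: (attempt: number) => T, maxAttempts: number): RunResult<T> {
  let waitedMs = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: fn(attempt), attempts: attempt, waitedMs };
    } catch (err) {
      // Permanent errors, and the final attempt, end the run immediately.
      if (!isRetryable(err) || attempt === maxAttempts) {
        return { ok: false, attempts: attempt, waitedMs, error: String(err) };
      }
      waitedMs += err.retryAfterMs; // a real scheduler would sleep or re-enqueue here
    }
  }
  return { ok: false, attempts: 0, waitedMs, error: "maxAttempts must be at least 1" };
}

// Rate limited twice, then succeeds on the third attempt.
const transient = runWithRetries((attempt) => {
  if (attempt < 3) throw new RetryableError("Rate limited", 1000);
  return "done";
}, 5);

// A standard Error is treated as permanent: one attempt, no retry.
const permanent = runWithRetries<string>(() => {
  throw new Error("Invalid input");
}, 5);

console.log(transient, permanent);
```

Making retryability opt-in keeps an invalid payload from burning through the whole attempt budget before it fails.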

Deployment Model and Worker Isolation

Trigger.dev offers two deployment paths:

Managed cloud: Workers run in isolated containers on Trigger.dev infrastructure. You push code via CLI, the platform handles scaling and state storage. Secrets live in environment variables scoped per project.

Self-hosted: You run the orchestrator and workers in your infrastructure. State goes to your Postgres instance. This mode requires managing worker scaling and observability yourself.

Both modes use the same task API. The difference is who operates the control plane.

Worker isolation happens at the container level. Each task execution runs in a separate process with its own memory space. Long-running tasks don’t block other work. Concurrency limits prevent resource exhaustion:

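// Example based on Trigger.dev SDK patterns (assumes the v3-style `task` API).
import { task } from "@trigger.dev/sdk/v3";

export const heavyTask = task({
  id: "heavy-task",
  queue: {
    concurrencyLimit: 5,
  },
  run: async (payload: { jobId: string }) => {
    // At most 5 runs of this task execute simultaneously;
    // additional runs wait in the queue
  },
});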
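For intuition, a concurrency limit behaves like a counting semaphore in front of the task. The sketch below is a minimal in-process version under stated assumptions; `createLimiter` and `fakeTask` are hypothetical, and a real platform enforces the limit across workers rather than inside one process.

```typescript
// Hypothetical limiter: at most `limit` tasks run at once; the rest wait in FIFO order.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function runLimited<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await fn();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        active--;
      }
    }
  };
}

let running = 0;
let peak = 0;

async function fakeTask(id: number): Promise<number> {
  running++;
  peak = Math.max(peak, running);
  await Promise.resolve(); // stand-in for real asynchronous work
  await Promise.resolve();
  running--;
  return id;
}

const limited = createLimiter(5);
const allDone = Promise.all(
  Array.from({ length: 12 }, (_, i) => limited(() => fakeTask(i))),
);
allDone.then((results) => console.log(results.length, "tasks finished, peak concurrency", peak));
```

Handing the slot straight to the next waiter, instead of decrementing and letting the waiter re-increment, avoids a window in which a new caller could slip in and push the count past the limit.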

State Management for Multi-Day Workflows

Agent workflows often span days. A research task might wait for human approval, then continue processing. Trigger.dev handles this with scheduled resumes.

The platform exposes a waitFor primitive (or similar pattern depending on SDK version) that suspends execution and stores state. You configure a webhook endpoint or scheduled check that triggers resume. When the condition is met (approval received, external event fires), the platform loads the serialized state from Postgres and resumes the task from the next line.

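// Example based on Trigger.dev SDK patterns (assumes the v3-style `task` API).
// generateReport, sendForApproval, waitForApproval, and publishReport are
// hypothetical application helpers; waitForApproval stands in for the
// platform's wait primitive, whose exact name depends on SDK version.
import { task } from "@trigger.dev/sdk/v3";

export const approvalWorkflow = task({
  id: "approval-workflow",
  run: async (payload: { reportId: string }) => {
    const report = await generateReport(payload.reportId);

    await sendForApproval(report);

    // Suspend execution: platform stores state and waits for external trigger
    const approval = await waitForApproval(payload.reportId, {
      timeout: "7d",
    });

    if (approval.approved) {
      await publishReport(report);
    }

    return { status: approval.approved ? "published" : "rejected" };
  },
});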

The worker doesn’t hold resources during the wait. The task is suspended, state is written to Postgres, and the worker is freed. When the approval webhook fires or the scheduled check detects approval, the orchestrator spawns a new worker, loads the state, and resumes.
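The suspend-and-resume cycle can be sketched from the orchestrator's side. This is a toy model under stated assumptions: an in-memory map stands in for Postgres, time is passed in explicitly, and `suspendRun`, `resumeRun`, and `ResumeOutcome` are hypothetical names.

```typescript
// Hypothetical orchestrator state; a real system would keep this in a database.
interface SuspendedRun {
  runId: string;
  state: string;       // serialized task state (JSON)
  expiresAtMs: number; // deadline after which the wait times out
}

const suspended = new Map<string, SuspendedRun>();

function suspendRun(runId: string, state: object, timeoutMs: number, nowMs: number): void {
  suspended.set(runId, { runId, state: JSON.stringify(state), expiresAtMs: nowMs + timeoutMs });
}

type ResumeOutcome =
  | { kind: "resumed"; state: unknown; approved: boolean }
  | { kind: "timed_out" }
  | { kind: "unknown_run" };

// Called by a webhook handler (or a scheduled check) when a decision arrives.
function resumeRun(runId: string, approved: boolean, nowMs: number): ResumeOutcome {
  const run = suspended.get(runId);
  if (!run) return { kind: "unknown_run" };
  suspended.delete(runId); // each suspended run resumes at most once
  if (nowMs > run.expiresAtMs) return { kind: "timed_out" };
  return { kind: "resumed", state: JSON.parse(run.state), approved };
}

const DAY_MS = 24 * 60 * 60 * 1000;

suspendRun("run-7", { reportId: "r-1", draft: "Q3 summary" }, 7 * DAY_MS, 0);
const onTime = resumeRun("run-7", true, 2 * DAY_MS);    // approval arrives on day 2
const duplicate = resumeRun("run-7", true, 3 * DAY_MS); // a repeated webhook delivery
console.log(onTime, duplicate);
```

Deleting the entry on the first resume makes a duplicated webhook delivery harmless, which matters because most webhook senders retry on their own.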

Because suspended state lives in Postgres JSONB columns, the serialization limits described earlier apply here too: a task waiting on approval cannot carry open file handles, active network connections, or closures over non-serializable objects across the wait.

AI Agent Patterns: Tool Calling and Multi-Step Research

The platform supports AI agent workflows with tool calling. An agent task can invoke multiple tools (search, browse, analyze) across several steps, with each tool call creating a checkpoint.

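// Adapted from Trigger.dev documentation; assumes the v3-style `task` API
// and the Vercel AI SDK v4 message format.
// searchTool, browseTool, analyzeTool, and executeTool are application-defined (not shown).
import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const MAX_ROUNDS = 10;

export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` },
    ];

    for (let round = 1; round <= MAX_ROUNDS; round++) {
      const { text, toolCalls, response } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        system: "You are a research assistant with web access.",
        messages,
        tools: {
          search: searchTool,
          browse: browseTool,
          analyze: analyzeTool,
        },
      });

      if (!toolCalls.length) {
        return { summary: text, roundsUsed: round };
      }

      // Keep the assistant turn (including its tool calls) in the history
      messages.push(...response.messages);

      // Checkpoint after each tool execution
      for (const call of toolCalls) {
        const result = await executeTool(call);
        messages.push({
          role: "tool",
          content: [
            { type: "tool-result", toolCallId: call.toolCallId, toolName: call.toolName, result },
          ],
        });
      }
    }

    // Round budget exhausted without a final answer
    return { summary: null, roundsUsed: MAX_ROUNDS };
  },
});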

Each await executeTool(call) creates a checkpoint. If a tool call times out or the worker crashes, the retry resumes from the last successful tool result. The agent doesn’t re-execute completed tool calls.

This pattern works for multi-step research, data enrichment, and human-in-the-loop workflows where the agent waits for external input between steps.

Comparison: Trigger.dev vs. Temporal

| Dimension | Trigger.dev | Temporal |
| --- | --- | --- |
| Language support | TypeScript/JavaScript | Go, Java, Python, TypeScript (SDK) |
| State model | Checkpoint snapshots | Event sourcing with replay |
| Deployment | Managed cloud or self-hosted | Self-hosted or Temporal Cloud |
| Learning curve | Async functions with retry config | Workflow/activity split, determinism rules |
| Scaling model | Automatic worker scaling (managed) | Automatic (Temporal Cloud) or manual (self-hosted) |
| Observability | Built-in dashboard | Requires separate UI setup (self-hosted) |

Temporal gives you stronger guarantees about execution history and supports more complex workflow patterns (sagas, compensation logic). Trigger.dev trades those guarantees for simpler onboarding and tighter TypeScript integration.

Serialization Constraints

The checkpoint model has limits. If your task holds state that can’t serialize (database connections, file handles, WebSocket clients), you’ll hit runtime errors. The workaround is to reinitialize those resources after each checkpoint:

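// Example based on Trigger.dev SDK patterns (assumes the v3-style `task` API).
// openConnection and processBatch are hypothetical application helpers.
import { task } from "@trigger.dev/sdk/v3";

export const streamingTask = task({
  id: "streaming-task",
  run: async (payload: { batches: unknown[] }) => {
    // Don't persist the connection: treat it as disposable local state
    let connection = await openConnection();

    for (const batch of payload.batches) {
      // The previous iteration may have checkpointed and resumed elsewhere,
      // so verify the connection before using it
      if (!connection.isAlive()) {
        connection = await openConnection();
      }

      // Checkpoint happens here
      await processBatch(connection, batch);
    }
  },
});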

This pattern works but adds boilerplate. Temporal’s deterministic replay model avoids this by separating side effects into activities.
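Some of that boilerplate can be pulled into a small provider that hands out a live connection on demand, so task code never holds one across a checkpoint. A minimal sketch, assuming a hypothetical `Connection` interface with an `isAlive()` check; the fake connections below exist only to exercise it.

```typescript
// Hypothetical connection type for illustration.
interface Connection {
  id: number;
  isAlive(): boolean;
  send(batch: string): string;
}

// Wraps a factory so callers always get a live connection on request.
function createConnectionProvider(openFn: () => Connection): () => Connection {
  let current: Connection | undefined;
  return () => {
    if (current === undefined || !current.isAlive()) {
      current = openFn(); // first use, or the previous connection went stale
    }
    return current;
  };
}

// Fake connections whose liveness can be toggled from outside.
let opened = 0;
const liveness: boolean[] = [];
const getConnection = createConnectionProvider(() => {
  const id = opened++;
  liveness[id] = true;
  return {
    id,
    isAlive: () => liveness[id] === true,
    send: (batch: string) => `conn${id}:${batch}`,
  };
});

const first = getConnection().send("batch-1");
liveness[0] = false; // simulate a resume on a new worker: old connection is dead
const second = getConnection().send("batch-2");
const third = getConnection().send("batch-3"); // reuses the replacement
console.log(first, second, third, opened);
```

The task body then calls `getConnection()` at each use instead of keeping a `connection` variable, which keeps the reconnect check in one place.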

Technical Verdict

Use Trigger.dev when:

  • You’re building agent workflows in TypeScript and want to avoid learning Temporal’s workflow model
  • Your tasks fit the checkpoint pattern (mostly async I/O, no complex state machines)
  • You prefer managed infrastructure over operating your own orchestration cluster
  • Observability out of the box matters more than custom tracing integrations

Avoid it when:

  • You need workflow patterns Temporal excels at (sagas, compensation, complex branching)
  • Your tasks hold non-serializable state (open connections, file handles)
  • You require multi-language support across the same workflow
  • You’re already invested in Temporal and have no strong reason to migrate

The V2 pivot shows what developers building agent infrastructure actually need: durable execution without the operational weight of self-hosted Temporal. The checkpoint model is simpler but less flexible. That trade-off works for most TypeScript-native agent workflows.