Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

How Trigger.dev pivoted from webhook glue to durable execution, exposing the infrastructure differences between workflow orchestration and long-running...

Source: trigger.dev

Trigger.dev launched in February 2023 as a “developer-first open source Zapier alternative” and pulled 745 points on Hacker News. Eight months later, the team shipped V2 with a completely different pitch: a Temporal alternative for TypeScript developers. That pivot, from webhook glue to durable execution primitives, teaches more about what developers building agents actually need than any feature list.

The shift happened because early users kept asking for the same thing: long-running background jobs that survive restarts, retries that don’t lose state, and resumability without manual checkpoint management. They didn’t want another event router. They wanted workflow orchestration that felt native to TypeScript.

What Changed Between V1 and V2

V1 focused on triggering actions from external events. You connected webhooks, scheduled cron jobs, and chained API calls. The architecture assumed short-lived executions and stateless handlers.

V2 rebuilt the core around durable execution. Tasks can run for hours or days. If a worker crashes, the task resumes from the last checkpoint. Retries preserve context. The platform manages state persistence, so you write business logic instead of recovery code.

The technical difference is state management. V1 treated each invocation as independent. V2 tracks execution history, stores intermediate results, and reconstructs runtime context after failures. That’s the gap between event-driven automation and workflow orchestration.

Durable Execution Without Go

Temporal is the reference implementation for durable execution. Its server is written in Go, and although it ships a TypeScript SDK, you still run a Temporal cluster (or pay for Temporal Cloud) plus your own worker processes. Trigger.dev targets TypeScript developers who want similar guarantees without managing that infrastructure.

The core primitives:

  • Task definitions wrap async functions with retry policies, timeout configs, and concurrency limits
  • Checkpointing persists state at the step and wait boundaries that the platform records
  • Resumability reconstructs execution context from persisted history
  • Idempotency keys prevent duplicate work when retries overlap

Here’s what a durable task looks like:

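// `task` is exported by the v3 SDK entry point
import { task } from "@trigger.dev/sdk/v3";

// fetchOrder, chargeCard, and createShipment are application helpers assumed to exist.
export const processOrder = task({
  id: "process-order",
  retry: {
    maxAttempts: 5,
    factor: 2,
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 60000,
  },
  run: async (payload: { orderId: string }) => {
    // Step 1: load the order
    const order = await fetchOrder(payload.orderId);

    // Step 2: charge the card. This must not run twice for the same order,
    // so it has to be a step the platform records, or be idempotent.
    const payment = await chargeCard(order.paymentMethod);

    // Step 3: create the shipment. If this fails, the goal is for the retry
    // to resume here rather than charge the card again.
    const shipment = await createShipment(order.items);

    return { orderId: order.id, trackingNumber: shipment.tracking };
  },
});

The intent is that each completed step is recorded so a later attempt can skip it. A bare await gives the platform nothing to record, though. As far as the public SDK documents, a failed attempt re-runs the run function from the top, and completed work is reused only when a step is something the platform tracks, such as a subtask invoked with triggerAndWait and an idempotency key. Structured that way, if the worker dies after chargeCard completes but before createShipment runs, the next execution reuses the recorded payment result and resumes at shipment creation.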
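
One way to picture skipping completed work is a keyed step cache. The sketch below is a minimal in-memory model under stated assumptions, not Trigger.dev's implementation: `StepLog`, `runStep`, and the step keys are hypothetical names, and a real system would persist the log outside the worker process.

```typescript
// Hypothetical in-memory stand-in for a durable step log.
type StepLog = Map<string, unknown>;

// Run `fn` once per key; on later attempts return the recorded result instead.
function runStep<T>(log: StepLog, key: string, fn: () => T): T {
  if (log.has(key)) return log.get(key) as T;
  const result = fn();
  log.set(key, result); // only successful steps are recorded
  return result;
}

// Counts real side effects so the skip is observable.
const calls = { charge: 0, ship: 0 };

function processOrder(log: StepLog, failShipment: boolean): string {
  const paymentId = runStep(log, "charge-card", () => {
    calls.charge += 1;
    return "pay_1";
  });
  return runStep(log, "create-shipment", () => {
    calls.ship += 1;
    if (failShipment) throw new Error("carrier unavailable");
    return `shipped:${paymentId}`;
  });
}

const log: StepLog = new Map();
let firstAttemptFailed = false;
try {
  processOrder(log, true); // attempt 1: payment succeeds, shipment throws
} catch {
  firstAttemptFailed = true;
}
// Attempt 2: the payment result comes from the log, only the shipment runs again.
const outcome = processOrder(log, false);
```

After both attempts, `calls.charge` is 1 and `calls.ship` is 2: the card was charged once even though the function body ran twice.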

State Persistence and Replay

Trigger.dev uses a Postgres-backed event log. Each task execution generates a sequence of events: TaskStarted, CheckpointCreated, StepCompleted, TaskFailed, TaskSucceeded. The worker reads this log to rebuild state.

When a task resumes:

  1. Load the event log for the execution ID
  2. Replay completed steps without re-executing side effects
  3. Resume from the first incomplete step
  4. Continue appending new events
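
The four steps above can be sketched as a fold over the event log. This is an illustration only: the `RunEvent` shapes are modeled on the event names in this section, while `replay`, `ReplayState`, and the fixed step plan are hypothetical.

```typescript
// Hypothetical event shapes, modeled on the event names in this section.
type RunEvent =
  | { type: "TaskStarted" }
  | { type: "StepCompleted"; step: string; output: unknown }
  | { type: "TaskFailed"; step: string; error: string };

interface ReplayState {
  outputs: Map<string, unknown>; // results of steps that already finished
  resumeAt: string | null;       // first step without a recorded result
}

// Rebuild state by reading the log. No step function is called here,
// so replay itself never repeats a side effect.
function replay(events: RunEvent[], plan: string[]): ReplayState {
  const outputs = new Map<string, unknown>();
  for (const event of events) {
    if (event.type === "StepCompleted") outputs.set(event.step, event.output);
  }
  const resumeAt = plan.find((step) => !outputs.has(step)) ?? null;
  return { outputs, resumeAt };
}

const plan = ["fetchOrder", "chargeCard", "createShipment"];
const history: RunEvent[] = [
  { type: "TaskStarted" },
  { type: "StepCompleted", step: "fetchOrder", output: { id: "ord_1" } },
  { type: "StepCompleted", step: "chargeCard", output: { paymentId: "pay_1" } },
  { type: "TaskFailed", step: "createShipment", error: "worker crashed" },
];

const state = replay(history, plan);
console.log(state.resumeAt); // prints createShipment
```

A fixed plan keeps the sketch short. Real workflows branch, which is why production systems replay the code itself against the log rather than a static step list.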

This is similar to Temporal’s event sourcing model, but implemented in TypeScript with a simpler storage layer. The tradeoff is less flexibility for complex branching workflows, but faster onboarding for developers who don’t need full workflow DSLs.

Retry Logic and Failure Boundaries

Retries in durable execution systems have different semantics than simple HTTP retries. The system needs to distinguish between transient failures (network timeout, rate limit) and permanent failures (invalid input, resource not found).

Trigger.dev’s retry configuration:

| Parameter | Purpose | Default |
| --- | --- | --- |
| maxAttempts | Total attempts, including the initial one | 3 |
| factor | Exponential backoff multiplier | 2 |
| minTimeoutInMs | Initial retry delay in milliseconds | 1000 |
| maxTimeoutInMs | Maximum retry delay in milliseconds | 60000 |
| randomize | Add jitter to prevent thundering herd | true |
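
To make the interaction of these parameters concrete, here is a minimal sketch of an exponential backoff calculation. The formula and the 1x to 2x jitter range are assumptions chosen for illustration, not a confirmed description of the platform's scheduler. `minDelayMs` and `maxDelayMs` stand in for the minimum and maximum timeout parameters in the table, and `nextDelayMs` is a hypothetical helper.

```typescript
interface BackoffOptions {
  maxAttempts: number;
  factor: number;
  minDelayMs: number; // initial retry delay
  maxDelayMs: number; // cap on any single delay
  randomize: boolean;
}

// Delay before the retry that follows failed attempt number `attempt` (1-based).
// Returns undefined when no attempts remain.
function nextDelayMs(
  opts: BackoffOptions,
  attempt: number,
  random: () => number = Math.random,
): number | undefined {
  if (attempt >= opts.maxAttempts) return undefined;
  const jitter = opts.randomize ? 1 + random() : 1; // assumed jitter range: 1x to 2x
  const raw = jitter * opts.minDelayMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(opts.maxDelayMs, Math.round(raw));
}

const opts: BackoffOptions = {
  maxAttempts: 5,
  factor: 2,
  minDelayMs: 1000,
  maxDelayMs: 60000,
  randomize: false,
};

// Delays after attempts 1 through 5: 1000, 2000, 4000, 8000, then undefined (give up).
const schedule = [1, 2, 3, 4, 5].map((attempt) => nextDelayMs(opts, attempt));
```

With a factor of 2 and a 1000 ms floor, the 60000 ms cap only starts to matter from the seventh attempt onward, so it is irrelevant unless maxAttempts is raised well above the default.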

When each step is recorded separately, the platform knows which ones succeeded, so a retry re-runs only the failed operation. If chargeCard succeeds but createShipment fails, the retry skips payment processing entirely.

Developers can also express custom retry logic as a predicate over the error and the attempt number (the shouldRetry name here is illustrative):

retry: {
  maxAttempts: 10,
  factor: 1.5,
  shouldRetry: (error, attempt) => {
    if (error.code === 'INVALID_INPUT') return false;
    if (error.code === 'RATE_LIMIT' && attempt < 5) return true;
    return attempt < 3;
  },
}
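
A predicate like this only matters once something calls it. The sketch below pairs the same decision rules with a minimal synchronous retry driver. `codeOf`, `ShouldRetry`, and `runWithRetry` are hypothetical names, and the sketch omits the backoff delay to stay short.

```typescript
type ShouldRetry = (error: unknown, attempt: number) => boolean;

// Safely read a string `code` property from an unknown thrown value.
function codeOf(error: unknown): string | undefined {
  if (typeof error === "object" && error !== null && "code" in error) {
    const code = (error as { code: unknown }).code;
    return typeof code === "string" ? code : undefined;
  }
  return undefined;
}

// Same decision rules as the predicate above, written against `unknown`.
const shouldRetry: ShouldRetry = (error, attempt) => {
  const code = codeOf(error);
  if (code === "INVALID_INPUT") return false;            // permanent: never retry
  if (code === "RATE_LIMIT" && attempt < 5) return true; // transient: retry longer
  return attempt < 3;                                    // anything else: up to 3 attempts
};

// Run `fn` until it succeeds, the predicate declines, or attempts run out.
function runWithRetry<T>(fn: (attempt: number) => T, maxAttempts: number, decide: ShouldRetry): T {
  for (let attempt = 1; ; attempt++) {
    try {
      return fn(attempt);
    } catch (error) {
      if (attempt >= maxAttempts || !decide(error, attempt)) throw error;
    }
  }
}

// A call that is rate limited three times, then succeeds on the fourth attempt.
let attempts = 0;
const result = runWithRetry((attempt) => {
  attempts = attempt;
  if (attempt < 4) throw Object.assign(new Error("rate limited"), { code: "RATE_LIMIT" });
  return "ok";
}, 10, shouldRetry);
```

Note the interplay between the two limits: maxAttempts is a hard ceiling, while the predicate can stop earlier. An INVALID_INPUT error ends the run after one attempt even with maxAttempts set to 10.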

Execution Isolation and Multi-Tenancy

Running user-submitted code safely requires strong isolation. Trigger.dev uses containerized workers with resource limits and network policies.

Each task runs in a separate container with:

  • CPU and memory quotas
  • Restricted filesystem access
  • Network egress controls
  • Environment variable injection for secrets

Secrets management uses a key-value store encrypted at rest. Workers fetch secrets at runtime using scoped tokens that expire after task completion. This prevents one tenant’s code from accessing another tenant’s credentials.

The platform also enforces concurrency limits per task and per organization. If you set a limit of 5 (in the current SDK, queue: { concurrencyLimit: 5 } on the task), the scheduler queues additional invocations until a slot opens. This prevents runaway costs and resource exhaustion.
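
The queueing behavior can be modeled with a small gate that admits runs up to a limit and holds the rest in FIFO order. This is a simplified sketch, not the platform's scheduler: `ConcurrencyGate` and its methods are hypothetical, and the FIFO policy is an assumption.

```typescript
// Hypothetical scheduler model: at most `limit` runs are active; the rest wait in FIFO order.
class ConcurrencyGate {
  private running = new Set<string>();
  private waiting: string[] = [];
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true if the run starts immediately, false if it was queued.
  submit(runId: string): boolean {
    if (this.running.size < this.limit) {
      this.running.add(runId);
      return true;
    }
    this.waiting.push(runId);
    return false;
  }

  // Frees a slot and promotes the oldest queued run, returning its id if there is one.
  complete(runId: string): string | undefined {
    if (!this.running.delete(runId)) return undefined; // unknown or already finished
    const next = this.waiting.shift();
    if (next !== undefined) this.running.add(next);
    return next;
  }

  get activeCount(): number {
    return this.running.size;
  }

  get queuedCount(): number {
    return this.waiting.length;
  }
}

const gate = new ConcurrencyGate(5);
// Seven submissions against a limit of five: the first five start, the last two queue.
const started = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"].map((id) => gate.submit(id));
// Finishing any active run promotes the oldest queued one, here r6.
const promoted = gate.complete("r2");
```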

Observability and Debugging

Durable execution creates new debugging challenges. A task might fail on the 47th retry after running for six hours. Traditional logs don’t capture enough context.

Trigger.dev’s observability stack:

  • Execution timeline shows each step, checkpoint, and retry with timestamps
  • State snapshots display variable values at each checkpoint
  • Trace propagation links tasks that trigger other tasks
  • Metrics dashboard tracks success rates, retry counts, and duration percentiles

The timeline view is critical. You can see exactly which step failed, what the input was, how many times it retried, and what changed between attempts. This beats grepping logs for correlation IDs.

TypeScript vs. Go Worker Models

Temporal’s worker model offers more control over execution semantics. Workers can be written with any of its SDKs, TypeScript included, and you can implement custom activity heartbeats, signal handlers, and workflow versioning strategies. The cost is operational complexity: you run worker pools, manage deployments, and handle version skew.

Trigger.dev’s TypeScript workers trade flexibility for simplicity. The platform manages worker lifecycle, scaling, and deployment. You write tasks as async functions and let the runtime handle orchestration.

The comparison:

| Aspect | Temporal (Go) | Trigger.dev (TypeScript) |
| --- | --- | --- |
| Worker deployment | Self-managed containers | Fully managed |
| State persistence | Cassandra or Postgres | Postgres event log |
| Retry semantics | Configurable per activity | Configurable per task |
| Language support | Go, Java, TypeScript, Python | TypeScript only |
| Workflow versioning | Manual version management | Automatic based on code hash |
| Execution guarantees | Exactly-once with determinism | At-least-once with idempotency |

Temporal’s determinism requirement means workflow code must produce the same sequence of commands every time it is replayed against its history. Any non-deterministic operation (random number generation, external API calls) must go through the SDK’s deterministic APIs or happen in activities. Trigger.dev relaxes this constraint by relying on idempotency keys and replay detection instead of enforcing determinism.

When Durable Execution Matters for Agents

AI agents need durable execution because they make multiple LLM calls, wait for external tools, and handle unpredictable latencies. A research agent might:

  1. Call an LLM to generate search queries (30 seconds)
  2. Execute web searches in parallel (variable latency)
  3. Call the LLM again to synthesize results (45 seconds)
  4. Repeat for multiple iterations (minutes to hours)

If the worker crashes after step 2, you don’t want to re-run the expensive LLM calls. Durable execution preserves completed work and resumes from the last checkpoint.

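The following example, adapted from the pattern in Trigger.dev’s docs, shows the loop:

import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// search, browse, and analyze are tool definitions without an execute function;
// executeTool is an application helper that runs one tool call. All are assumed to exist.
export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` },
    ];
    let stepsUsed = 0;

    for (let i = 0; i < 10; i++) {
      const { text, toolCalls, steps, response } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        system: "You are a research assistant with web access.",
        messages,
        tools: { search, browse, analyze },
        maxSteps: 5,
      });
      stepsUsed += steps.length;

      if (!toolCalls.length) {
        return { summary: text, stepsUsed };
      }

      // Keep the assistant turn, including its tool calls, in the history
      messages.push(...response.messages);

      for (const call of toolCalls) {
        const result = await executeTool(call);
        // Tool results go back to the model as structured tool-result parts
        messages.push({
          role: "tool",
          content: [{ type: "tool-result", toolCallId: call.toolCallId, toolName: call.toolName, result }],
        });
      }
    }

    // Iteration budget exhausted without a final answer
    return { summary: null, stepsUsed };
  },
});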

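Each generateText call is the natural checkpoint boundary. The messages array lives in worker memory, however, so for a retry to resume with the existing history instead of starting over, that history has to be part of what the platform records, for example by running each iteration as a tracked subtask or by saving the array after every round.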
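
A minimal way to see what resuming with the existing message history involves is to snapshot the loop state after every iteration. The sketch below makes several assumptions: `store` is an in-memory stand-in for durable storage, `fakeModel` replaces the LLM call, and `runAgent` and `Snapshot` are hypothetical names.

```typescript
// Hypothetical message and snapshot shapes; a real store would live outside the worker.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}
interface Snapshot {
  iteration: number;
  messages: Message[];
}

const store = new Map<string, string>(); // runId -> serialized snapshot
let modelCalls = 0;                      // counts the "expensive" calls

// Stand-in for an LLM call: requests tools for two rounds, then answers.
function fakeModel(messages: Message[]): { text: string; wantsTool: boolean } {
  modelCalls += 1;
  const toolRounds = messages.filter((m) => m.role === "tool").length;
  return toolRounds < 2
    ? { text: "searching", wantsTool: true }
    : { text: "final summary", wantsTool: false };
}

function runAgent(runId: string, topic: string, crashAtIteration?: number): string {
  const saved = store.get(runId);
  const snap: Snapshot = saved
    ? JSON.parse(saved)
    : { iteration: 0, messages: [{ role: "user", content: `Research: ${topic}` }] };

  for (let i = snap.iteration; i < 10; i++) {
    if (i === crashAtIteration) throw new Error("worker crashed"); // simulated failure
    const reply = fakeModel(snap.messages);
    snap.messages.push({ role: "assistant", content: reply.text });
    if (!reply.wantsTool) return reply.text;
    snap.messages.push({ role: "tool", content: `result ${i}` });
    snap.iteration = i + 1;
    store.set(runId, JSON.stringify(snap)); // checkpoint after each full iteration
  }
  return "iteration limit reached";
}

let crashed = false;
try {
  runAgent("run_1", "durable execution", 2); // two iterations complete, then a crash
} catch {
  crashed = true;
}
const summary = runAgent("run_1", "durable execution"); // resumes at iteration 2
```

Across both attempts the model is called three times. Without the snapshot, the second attempt would repeat the first two calls and the total would be five.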

Deployment Shape and Scaling

Trigger.dev’s managed platform handles worker scaling automatically. When task volume increases, the system spawns additional containers. When load drops, it scales down.

For self-hosting, the architecture includes:

  • API server (Node.js) handles task submission and status queries
  • Scheduler (Node.js) manages task queues and concurrency limits
  • Workers (containerized Node.js) execute tasks
  • Postgres stores event logs and task metadata
  • Redis (optional) for distributed locking and caching

The self-hosted version requires more operational overhead but gives you control over data residency and resource allocation.

Likely Failure Modes

Durable execution systems fail in specific ways:

Poison messages: A task that always fails can block the queue. Trigger.dev mitigates this with dead-letter queues and configurable max attempts.

State bloat: Long-running tasks with many checkpoints accumulate large event logs. The platform prunes old events after task completion, but you can hit storage limits during execution.

Version skew: Deploying new task code while old executions are in-flight can cause deserialization errors. Trigger.dev uses code hashes to route executions to compatible worker versions.

Checkpoint overhead: Excessive checkpointing (awaiting inside tight loops) degrades performance. The runtime batches checkpoint writes, but you still pay serialization costs.

Idempotency violations: If your task has side effects that aren’t idempotent (incrementing counters, sending emails), retries can cause duplicates. You need to implement your own deduplication logic.
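
Deduplication for a non-idempotent side effect usually comes down to an idempotency key checked before the effect runs. The sketch below is a minimal in-memory version: `sentKeys`, `outbox`, `idempotencyKey`, and `sendEmailOnce` are hypothetical, and a production version would use durable storage with a unique-key constraint.

```typescript
// Hypothetical dedup store; in production this would be a table with a unique key constraint.
const sentKeys = new Set<string>();
const outbox: string[] = []; // stands in for the real email provider

// Derive the key from stable business identifiers, never from attempt numbers or timestamps.
function idempotencyKey(orderId: string, action: string): string {
  return `${action}:${orderId}`;
}

// Returns true if the email was sent, false if it was skipped as a duplicate.
function sendEmailOnce(orderId: string, body: string): boolean {
  const key = idempotencyKey(orderId, "order-confirmation");
  if (sentKeys.has(key)) return false;
  outbox.push(body);
  sentKeys.add(key);
  return true;
}

// Three attempts for the same order: the first sends, the retries are skipped.
const results = [1, 2, 3].map(() => sendEmailOnce("ord_1", "Your order shipped"));
```

The sketch still has a gap: a crash between sending and recording the key would allow a resend. Closing it requires recording the key atomically with the effect, or passing the key to a provider that deduplicates on its side.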

Technical Verdict

Use Trigger.dev when you need durable execution for TypeScript projects and don’t want to manage Temporal infrastructure. It’s a good fit for:

  • AI agent workflows with multiple LLM calls and tool invocations
  • Long-running data pipelines that process batches over hours
  • Background jobs that must survive deployments and restarts
  • Teams that want managed infrastructure with minimal ops overhead

Avoid it when:

  • You need sub-second latency (the checkpoint overhead adds milliseconds per step)
  • Your workflows require complex branching, parallel execution, or saga patterns (Temporal’s workflow DSL is more expressive)
  • You’re already running Temporal and have operational expertise
  • You need multi-language support (Trigger.dev is TypeScript only)

The V1-to-V2 pivot reveals what developers building agents actually need: not more webhook connectors, but primitives for managing state across long-running, failure-prone operations. Trigger.dev delivers that in a TypeScript-native package, trading Temporal’s flexibility for faster onboarding and lower operational complexity.