Trigger.dev V2: TypeScript Workflow Engine Architecture Without Temporal's Go Runtime

Trigger.dev launched in February 2023 as a Zapier alternative, collected 745 points on Hacker News, then pivoted eight months later to durable execution primitives. The V2 architecture (172 points, October 2023) positions itself as a Temporal competitor for TypeScript developers. It offers workflow guarantees without asking teams to operate Temporal's Go-based server or learn event sourcing.

The shift matters because most workflow engines force you into one of two camps: JavaScript task queues with weak durability (BullMQ, Agenda) or strongly consistent systems built around a separate server runtime written in Go (Temporal, Cadence). Trigger.dev tries to split the difference by offering Temporal-style semantics in a TypeScript-native execution model.

What Changed Between V1 and V2

V1 focused on integration connectors and webhook triggers. You wrote event handlers that responded to external services. V2 rebuilt the core around long-running tasks with explicit retry boundaries, state checkpoints, and scheduling primitives.

Key architectural differences:

V1: Event-driven handlers, no built-in state persistence, relied on external queues
V2: Durable execution model, automatic retries, checkpoint-based state recovery
Execution isolation: Moved from shared Node processes to containerized task runners
Observability: Added structured tracing with step-level replay and debugging hooks

The pivot came from user feedback. Developers wanted to run multi-hour jobs, handle transient failures gracefully, and debug workflow state without parsing logs. V1’s event model could not guarantee those properties.

State Persistence Without Event Sourcing

Temporal persists every workflow decision as an immutable event log. Replay reconstructs state by re-executing the workflow code against that log. Trigger.dev skips event sourcing and uses explicit checkpoints instead.

How it works:

Checkpoint API: You call checkpoint(key, data) at explicit boundaries in your task code
Postgres storage: Checkpoints write to a relational table with task ID, step index, and serialized state
Retry recovery: On failure, the runtime reloads the last checkpoint and resumes from that step
No replay: The task does not re-execute prior steps; it resumes from the step after the last checkpoint

This trades Temporal’s deterministic replay for a simpler mental model. You control when state gets persisted. The downside: if you forget to checkpoint after an expensive operation, a crash forces you to redo that work.
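
The recovery path described above can be sketched with an in-memory store standing in for the Postgres table. Every name here (CheckpointStore, runSteps, the step shape) is a hypothetical illustration and not Trigger.dev's actual API. The sketch only shows how resuming from the last checkpoint skips steps that already completed.

```typescript
// Hypothetical sketch: an in-memory stand-in for the checkpoint table.
// None of these names come from the Trigger.dev SDK.
type CheckpointRow = { taskId: string; stepIndex: number; state: string };

class CheckpointStore {
  private latestByTask = new Map<string, CheckpointRow>();

  save(taskId: string, stepIndex: number, state: unknown): void {
    // Serialize so the stored state is a snapshot, as a database row would be.
    this.latestByTask.set(taskId, { taskId, stepIndex, state: JSON.stringify(state) });
  }

  latest(taskId: string): CheckpointRow | undefined {
    return this.latestByTask.get(taskId);
  }
}

type Step<S> = (state: S) => S;

// Runs steps in order and checkpoints after each one. On a retry it reloads
// the last checkpoint and starts at the following step instead of replaying.
function runSteps<S>(taskId: string, steps: Step<S>[], initial: S, store: CheckpointStore): S {
  const last = store.latest(taskId);
  let state: S = last ? (JSON.parse(last.state) as S) : initial;
  const startAt = last ? last.stepIndex + 1 : 0;
  steps.slice(startAt).forEach((step, offset) => {
    state = step(state); // a throw here leaves the previous checkpoint intact
    store.save(taskId, startAt + offset, state);
  });
  return state;
}

// Usage: the second step fails once, then succeeds on the retry.
type OrderState = { total: number; paid: boolean };
const store = new CheckpointStore();
let fetchCalls = 0;
let chargeCalls = 0;
const steps: Step<OrderState>[] = [
  (s) => { fetchCalls++; return { ...s, total: 42 }; },
  (s) => {
    chargeCalls++;
    if (chargeCalls === 1) throw new Error("transient payment failure");
    return { ...s, paid: true };
  },
];

let result: OrderState;
try {
  result = runSteps("order-1", steps, { total: 0, paid: false }, store);
} catch {
  // Retry: the fetch step is skipped because its checkpoint already exists.
  result = runSteps("order-1", steps, { total: 0, paid: false }, store);
}
```

The first step runs once even though the task is attempted twice, which is the behavior the "No replay" point describes. A real store would also need to handle a crash between a step finishing and its checkpoint being written.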

Checkpoint Example

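// Assumes `task` and `checkpoint` are the SDK primitives described above,
// `db` is a Prisma-style client, and `stripe` is an initialized Stripe client.
export const processOrder = task({
  id: "process-order",
  run: async ({ orderId }: { orderId: string }) => {
    // Fetch order data; fail fast if the order does not exist
    const order = await db.order.findUnique({ where: { id: orderId } });
    if (!order) throw new Error(`Order ${orderId} not found`);

    // Checkpoint after fetch so a retry does not repeat the query
    await checkpoint("order-fetched", { order });

    // Call payment API (might fail). The idempotency key keeps a retried
    // request from charging the customer twice.
    const payment = await stripe.charges.create(
      {
        amount: order.total,
        currency: "usd",
        source: order.paymentToken,
      },
      { idempotencyKey: `charge-${orderId}` },
    );

    // Checkpoint after payment so a later failure skips the charge
    await checkpoint("payment-complete", { payment });

    // Update inventory. This write must be idempotent: a crash after the
    // write but before the task returns would run it again on retry.
    await db.inventory.decrement({ productId: order.productId });

    return { orderId, paymentId: payment.id };
  },
});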

If the Stripe call fails, the retry starts from the order-fetched checkpoint. The database fetch does not run again. If inventory decrement fails, you restart from payment-complete and skip the charge.

Concurrency and Scheduling Primitives

Trigger.dev exposes three execution modes:

Mode	Trigger	Concurrency Control	Use Case
Scheduled	Cron expression	Global limit per task	Nightly ETL, report generation
Event-driven	Webhook, SDK call	Queue-based with max workers	User signup flows, API webhooks
Realtime	Frontend SDK	Per-connection stream	Live progress updates, chat agents

Queue Backpressure

Event-driven tasks use a priority queue backed by Postgres. You configure maxConcurrency per task definition. When the number of running instances reaches that limit, or every worker is busy, new tasks wait in PENDING state.

Backpressure handling:

Retry backoff: Exponentially growing delay between attempts for tasks that fail repeatedly
Dead letter queue: After N retries, tasks move to a manual review queue
Priority override: You can bump specific task instances to the front
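
The backoff and dead-letter rules above can be sketched as a single decision function. The policy fields, their values, and the doubling formula are assumptions made for illustration; the article does not state Trigger.dev's actual defaults.

```typescript
// Hypothetical retry policy; field names and values are assumptions.
interface RetryPolicy {
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number;  // cap so delays do not grow without bound
  maxAttempts: number; // the "N" after which a task is dead-lettered
}

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

// Decides what happens after a task has failed `failedAttempts` times.
function decideRetry(failedAttempts: number, policy: RetryPolicy): RetryDecision {
  if (failedAttempts >= policy.maxAttempts) {
    return { action: "dead-letter" };
  }
  // Double the delay on every failure: base, 2x base, 4x base, ...
  const uncapped = policy.baseDelayMs * Math.pow(2, failedAttempts - 1);
  return { action: "retry", delayMs: Math.min(uncapped, policy.maxDelayMs) };
}

// Usage with an example policy: 1s base delay, 30s cap, 5 attempts.
const policy: RetryPolicy = { baseDelayMs: 1000, maxDelayMs: 30000, maxAttempts: 5 };
const afterFirstFailure = decideRetry(1, policy);
const afterThirdFailure = decideRetry(3, policy);
const afterFifthFailure = decideRetry(5, policy);
```

Production systems usually add random jitter to the delay so that many failing tasks do not retry at the same instant; that is left out here to keep the arithmetic easy to follow.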

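The system avoids a separate distributed lock service to reduce coordination overhead. The coordinator polls the queue table with SELECT ... FOR UPDATE SKIP LOCKED, so concurrent pollers skip rows that another transaction has already locked instead of blocking on them. Row locks keep two pollers from claiming the same row at the same moment, but a task can still run twice, for example if a worker stalls past its heartbeat deadline and the task is re-queued. The design accepts that risk in exchange for simpler infrastructure. Workers run in separate containers and report heartbeats every 30 seconds.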
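
A claim query of this kind might look like the sketch below. The table and column names (task_queue, status, priority, created_at, claimed_by) are assumptions, as is the Queryable interface, which stands in for whatever SQL client the coordinator uses. The FOR UPDATE SKIP LOCKED clause itself is standard Postgres.

```typescript
// Assumed schema: task_queue(id, status, priority, created_at, claimed_by).
// The inner SELECT locks one pending row and skips rows that other
// transactions have locked; the outer UPDATE marks it as claimed.
const CLAIM_SQL = `
  UPDATE task_queue
     SET status = 'RUNNING', claimed_by = $1
   WHERE id = (
     SELECT id
       FROM task_queue
      WHERE status = 'PENDING'
      ORDER BY priority DESC, created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id`;

// Minimal client shape for this sketch; a real Postgres client could be
// adapted to it.
interface Queryable {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<{ id: string }> }>;
}

// Claims the highest-priority pending task for a worker, or returns null
// when the queue has nothing claimable.
async function claimNextTask(db: Queryable, workerId: string): Promise<string | null> {
  const { rows } = await db.query(CLAIM_SQL, [workerId]);
  const claimed = rows[0];
  return claimed ? claimed.id : null;
}
```

Because the lock and the status change happen in one statement, two coordinators polling at the same time receive different rows or none at all.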

Scheduled Task Guarantees

Cron schedules persist in the database with next-run timestamps. A background process scans for due tasks every 10 seconds and enqueues them. If the scheduler crashes, the next instance detects missed runs on startup and handles them according to the policy below.

Missed execution behavior:

Default: Skip missed runs, schedule the next occurrence
Catchup mode: Enqueue all missed runs sequentially
Idempotency key: Prevent duplicate execution if scheduler restarts mid-enqueue

There is no distributed cron coordination. If you run multiple scheduler instances, you need external leader election, which is not included.
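
The skip and catchup behaviors can be sketched as one scan function. To stay self-contained, this sketch uses a fixed interval instead of a cron expression, and it treats any run older than a grace window as "missed"; both choices, along with every name below, are assumptions rather than documented Trigger.dev behavior.

```typescript
type MissedRunMode = "skip" | "catchup";

interface ScanResult {
  toEnqueue: number[]; // timestamps (ms) of runs to enqueue now
  nextRunAt: number;   // first occurrence strictly after `now`
}

// One scheduler scan. Runs that became due within `graceMs` are always
// enqueued; older ones count as missed and are enqueued only in catchup mode.
function scanSchedule(
  nextRunAt: number,
  now: number,
  intervalMs: number,
  graceMs: number,
  mode: MissedRunMode,
): ScanResult {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive");
  const due: number[] = [];
  let cursor = nextRunAt;
  while (cursor <= now) {
    due.push(cursor);
    cursor += intervalMs;
  }
  const toEnqueue = mode === "catchup" ? due : due.filter((runAt) => now - runAt <= graceMs);
  return { toEnqueue, nextRunAt: cursor };
}

// Deterministic key per schedule and occurrence, so a scheduler that restarts
// mid-enqueue cannot insert the same run twice.
function runIdempotencyKey(scheduleId: string, runAt: number): string {
  return `${scheduleId}:${runAt}`;
}

// Usage: a one-minute schedule whose scheduler was down for about three minutes.
const skipped = scanSchedule(0, 185000, 60000, 10000, "skip");
const caughtUp = scanSchedule(0, 185000, 60000, 10000, "catchup");
```

In skip mode only the run at 180000 is enqueued, because the three earlier ones fall outside the grace window. Catchup mode enqueues all four in order.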

Execution Isolation Model

V2 runs each task in a dedicated Docker container. The runtime spins up a container from a pre-built image that includes your task code, executes the function, then tears down the container unless it is kept in the warm pool.

Isolation boundaries:

Filesystem: Ephemeral, wiped after task completion
Network: Outbound allowed, inbound blocked except for health checks
Memory: Configurable limit (default 512MB), OOM kills trigger retries
CPU: Shared, no hard limits unless you configure cgroups

The container runtime uses a sidecar proxy to intercept checkpoint calls and forward them to the coordinator API. Your task code never talks directly to Postgres.

Cold Start Mitigation

Container startup adds 2-5 seconds of latency. Trigger.dev keeps a warm pool of containers for frequently executed tasks. When a task completes, the container stays alive for 60 seconds. If another instance of the same task arrives, it reuses the warm container.

Warm pool sizing:

Per-task limit: Max 5 warm containers per task definition
Global limit: Max 50 warm containers across all tasks
Eviction policy: LRU, with priority boost for tasks with high execution frequency

You can disable warm pools for tasks that need strict isolation or have large memory footprints.
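
The pool rules above (60-second idle timeout, per-task and global caps, LRU eviction) can be sketched as a small class. The class and method names are hypothetical, timestamps are passed in explicitly to keep the logic testable, and the frequency-based priority boost is left out for brevity.

```typescript
interface WarmContainer {
  id: string;
  taskId: string;
  idleSince: number; // ms timestamp when the container became idle
}

class WarmPool {
  // Ordered oldest-idle first, so index 0 is the least recently used.
  private idle: WarmContainer[] = [];
  private perTaskLimit: number;
  private globalLimit: number;
  private ttlMs: number;

  constructor(perTaskLimit = 5, globalLimit = 50, ttlMs = 60000) {
    this.perTaskLimit = perTaskLimit;
    this.globalLimit = globalLimit;
    this.ttlMs = ttlMs;
  }

  // Called when a task finishes: keep its container warm if the limits allow.
  release(id: string, taskId: string, now: number): void {
    this.evictExpired(now);
    const sameTask = this.idle.filter((c) => c.taskId === taskId);
    if (sameTask.length >= this.perTaskLimit) {
      // Drop the oldest warm container of this task to make room.
      const oldest = this.idle.findIndex((c) => c.taskId === taskId);
      this.idle.splice(oldest, 1);
    }
    if (this.idle.length >= this.globalLimit) {
      this.idle.shift(); // evict the globally least recently used container
    }
    this.idle.push({ id, taskId, idleSince: now });
  }

  // Returns a warm container for the task, or undefined (meaning cold start).
  acquire(taskId: string, now: number): WarmContainer | undefined {
    this.evictExpired(now);
    const index = this.idle.findIndex((c) => c.taskId === taskId);
    if (index === -1) return undefined;
    return this.idle.splice(index, 1)[0];
  }

  size(): number {
    return this.idle.length;
  }

  private evictExpired(now: number): void {
    this.idle = this.idle.filter((c) => now - c.idleSince < this.ttlMs);
  }
}

// Usage with small limits so the evictions are easy to see.
const pool = new WarmPool(2, 3, 60000);
pool.release("a1", "task-a", 0);
pool.release("a2", "task-a", 1000);
pool.release("a3", "task-a", 2000); // per-task limit reached: a1 is evicted
pool.release("b1", "task-b", 3000);
pool.release("c1", "task-c", 4000); // global limit reached: a2 is evicted
const reused = pool.acquire("task-a", 5000);
const coldStart = pool.acquire("task-a", 5000);
```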

Observability and Debugging

Every task execution generates a trace with step-level spans. The UI shows:

Timeline view: Visual breakdown of checkpoint boundaries and retry attempts
State inspector: JSON view of checkpoint payloads at each step
Log aggregation: Structured logs with correlation IDs across retries
Replay mode: Re-run a failed task from any checkpoint with modified input

Trace IDs propagate through HTTP headers if your task calls external APIs. You can link Trigger.dev traces to OpenTelemetry spans in your own services.
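
Header-based propagation of this kind commonly follows the W3C Trace Context format, which OpenTelemetry uses by default. The sketch below assumes that format; the article does not say which header Trigger.dev actually sends, and the function names are hypothetical.

```typescript
// Builds a W3C Trace Context `traceparent` value:
//   version "00" - 32 hex trace ID - 16 hex parent span ID - 2 hex flags
function buildTraceparent(traceId: string, spanId: string, sampled: boolean): string {
  if (!/^[0-9a-f]{32}$/.test(traceId) || /^0+$/.test(traceId)) {
    throw new Error("traceId must be 32 lowercase hex characters and not all zeros");
  }
  if (!/^[0-9a-f]{16}$/.test(spanId) || /^0+$/.test(spanId)) {
    throw new Error("spanId must be 16 lowercase hex characters and not all zeros");
  }
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

// Returns a copy of the outgoing request headers with trace context attached.
function withTraceContext(
  requestHeaders: Record<string, string>,
  traceId: string,
  spanId: string,
  sampled: boolean,
): Record<string, string> {
  return { ...requestHeaders, traceparent: buildTraceparent(traceId, spanId, sampled) };
}

// Usage: attach the current task's trace to an outbound API call.
const outgoing = withTraceContext(
  { "content-type": "application/json" },
  "4bf92f3577b34da6a3ce929d0e0e4736",
  "00f067aa0ba902b7",
  true,
);
```

A downstream service that reads this header can parent its own spans under the same trace ID, which is what makes the cross-system linking possible.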

Failure Mode Visibility

The dashboard flags common failure patterns:

Timeout loops: Tasks that hit max duration repeatedly
Retry storms: High failure rate across multiple task instances
Checkpoint gaps: Long execution spans without checkpoints (risk of wasted retries)
Memory leaks: Containers that grow memory usage across warm starts

No automatic remediation. The system surfaces the pattern and lets you decide whether to adjust retry limits, add checkpoints, or refactor task logic.
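
As an illustration of how one of these patterns could be detected, the sketch below flags checkpoint gaps from a run's timestamps. The function name, the input shape, and the threshold are all assumptions; the article does not describe the dashboard's actual heuristics.

```typescript
interface CheckpointGap {
  from: number;       // ms timestamp where the uncheckpointed span starts
  to: number;         // ms timestamp where it ends
  durationMs: number;
}

// Flags every stretch between consecutive checkpoints (including run start
// and run end) that lasts longer than `maxGapMs`.
function findCheckpointGaps(
  startedAt: number,
  checkpointTimes: number[],
  endedAt: number,
  maxGapMs: number,
): CheckpointGap[] {
  const sorted = [...checkpointTimes].sort((a, b) => a - b);
  const gaps: CheckpointGap[] = [];
  let previous = startedAt;
  for (const boundary of [...sorted, endedAt]) {
    const durationMs = boundary - previous;
    if (durationMs > maxGapMs) {
      gaps.push({ from: previous, to: boundary, durationMs });
    }
    previous = boundary;
  }
  return gaps;
}

// Usage: a run with checkpoints at 1s and 400s, flagged at a 5-minute threshold.
const gaps = findCheckpointGaps(0, [1000, 400000], 410000, 300000);
```

Here the span between the two checkpoints is the only one flagged, since a failure anywhere inside it would repeat almost 400 seconds of work.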

Comparison: Trigger.dev vs. Temporal

Dimension	Trigger.dev	Temporal
Language	TypeScript only	Polyglot (Go, Java, Python, TypeScript)
State model	Explicit checkpoints	Event sourcing with replay
Execution	Containerized tasks	Worker processes with sticky queues
Scheduling	Cron + event triggers	Timers, signals, child workflows
Observability	Built-in UI with replay	Built-in Web UI with event history; tracing needs external setup
Self-hosting	Docker Compose, Kubernetes	Kubernetes with Cassandra/Postgres
Learning curve	Low (familiar async/await)	High (workflow determinism rules)

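Temporal's deterministic replay gives workflow code effectively-once semantics, although its activities (the steps that perform side effects) are still retried at-least-once by default. Trigger.dev guarantees at-least-once execution and relies on idempotency keys. In either system, if your task has side effects (API calls, database writes), you must handle deduplication yourself.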
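
Deduplication with idempotency keys can be sketched as a wrapper that records the result of a side effect under its key and returns the recorded result on any repeat. The class below is a hypothetical in-memory illustration; a real implementation would need a durable store and an atomic check-and-set so that concurrent retries cannot both run the effect.

```typescript
// Hypothetical in-memory deduplication; not part of any SDK.
class IdempotentEffects<T> {
  private completed = new Map<string, T>();

  // Runs `effect` at most once per key; later calls return the saved result.
  run(key: string, effect: () => T): T {
    if (this.completed.has(key)) {
      return this.completed.get(key) as T;
    }
    const value = effect();
    this.completed.set(key, value);
    return value;
  }
}

// Usage: a retried task calls the same charge twice with the same key.
const effects = new IdempotentEffects<string>();
let chargesMade = 0;
const chargeCustomer = (): string => {
  chargesMade++;
  return `charge-${chargesMade}`;
};

const firstAttempt = effects.run("order-1:charge", chargeCustomer);
const retryAttempt = effects.run("order-1:charge", chargeCustomer);
```

Both attempts observe the same charge and the customer is billed once. If the effect throws, nothing is recorded, so the next attempt runs it again.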

Deployment Shape

Trigger.dev runs as three services:

Coordinator: Handles task scheduling, queue management, checkpoint storage
Worker pool: Spins up containers and executes task code
API gateway: Exposes SDK endpoints for triggering tasks and querying state

Managed cloud deployment uses AWS ECS for workers and RDS Postgres for state. Self-hosted setup provides Docker Compose and Helm charts.

Resource requirements (self-hosted):

Coordinator: 1 CPU, 2GB RAM, scales horizontally
Worker pool: 2 CPU, 4GB RAM per worker node, autoscales based on queue depth
Postgres: 2 CPU, 8GB RAM, replication recommended for production

No external dependencies beyond Postgres and Docker. No Kafka, no Redis, no Elasticsearch.

Technical Verdict

Use Trigger.dev if:

Your entire stack is TypeScript and you want to avoid polyglot workflow engines
Tasks run for minutes to hours, not milliseconds
You need built-in observability and replay without configuring distributed tracing infrastructure
At-least-once execution is acceptable and you can add idempotency keys to side effects
You want simpler mental models than event sourcing and deterministic replay
Cold start latency of 2-5 seconds is tolerable for your use case

Avoid Trigger.dev if:

You need exactly-once guarantees for financial transactions, inventory updates, or other critical state changes
Your workflows require complex branching, parallel execution, or saga compensation patterns
You already run Temporal and have invested in event sourcing patterns across your organization
You need sub-second task latency (container cold starts add unavoidable overhead)
Your team works in multiple languages and needs polyglot workflow support
You require advanced workflow primitives like signals, queries, or child workflow orchestration

Trigger.dev fits AI agent orchestration, ETL pipelines, and async API workflows where tasks have clear boundaries and idempotency is manageable. It struggles with high-throughput event processing or workflows that need strong consistency across distributed transactions without manual coordination.

The V2 pivot from Zapier-style integrations to workflow primitives appears to have found an audience. The 172-point Show HN and open-source traction suggest developers want TypeScript-native orchestration that does not force them into Go or Java ecosystems. The checkpoint-based state model trades Temporal’s replay guarantees for operational simplicity and a gentler learning curve.