Libretto: How Coding Agents Generate Deterministic Browser Automations Instead of Runtime Prompts

Most agentic browser automation tools hand an LLM a prompt at runtime and hope it figures out the DOM. Libretto inverts this approach. It gives coding agents a skill and CLI to generate actual TypeScript automation scripts that you can inspect, version, and debug like any other code.

The shift matters because runtime prompt-driven automation is non-deterministic. The same prompt can produce different actions depending on model temperature, page state, or token budget. Libretto moves the LLM work to code generation time, then runs the resulting script deterministically. Saffron Health built Libretto to address this core pain point in production browser automation.

Architecture: Skill + CLI + Generated Code

Libretto splits into three layers:

Skill definition: A structured prompt template that teaches coding agents (Cursor, Windsurf, Cline) how to write browser automation scripts using Libretto’s API.
CLI tool: A command-line interface that runs the generated scripts, manages browser sessions, and provides debugging hooks.
Generated scripts: TypeScript files that use Playwright under the hood but expose a higher-level API for common automation patterns.

The skill is a Markdown file you drop into your agent’s context. It includes:

API reference for navigation, form filling, data extraction
Examples of common patterns (login flows, pagination, file downloads)
Error handling conventions
Debugging strategies

When you ask your coding agent to “automate extracting product prices from this e-commerce site,” it generates a .ts file that imports Libretto’s API and writes explicit steps. You review the code, commit it, and run it via the CLI.

Determinism Through Explicit Selectors and State

Libretto scripts are deterministic because they use explicit CSS selectors, XPath expressions, or accessibility labels instead of asking an LLM to “find the login button” at runtime. The agent generates these selectors during code generation by inspecting the page structure once.

When the page changes, the script breaks visibly. You get a clear error message pointing to the selector that failed, not a vague “the agent couldn’t complete the task.” You fix the selector in the script and re-run.

State management is explicit. Scripts declare what data they expect to extract and where they expect to navigate, so no hidden context window or token limit can change behavior mid-run.
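A minimal sketch of why explicit selectors fail loudly. The step format, the PageSnapshot stand-in for a page, and SelectorNotFoundError are all hypothetical names invented for illustration; none of them is Libretto's actual API.

```typescript
// Hypothetical model of a generated script: every step names its selector up front.
type Step =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "extract"; selector: string; field: string };

// Error that carries the failing selector and step index, so the break is easy to locate.
class SelectorNotFoundError extends Error {
  constructor(public readonly selector: string, public readonly stepIndex: number) {
    super(`Selector '${selector}' not found at step ${stepIndex}`);
    this.name = "SelectorNotFoundError";
  }
}

// Stand-in for a page: maps each selector that currently matches to its text content.
type PageSnapshot = Record<string, string>;

// Runs the steps in order. The same steps against the same snapshot always give the same result.
function runSteps(steps: Step[], page: PageSnapshot): Record<string, string> {
  const extracted: Record<string, string> = {};
  steps.forEach((step, index) => {
    if (!Object.prototype.hasOwnProperty.call(page, step.selector)) {
      throw new SelectorNotFoundError(step.selector, index);
    }
    if (step.kind === "extract") {
      extracted[step.field] = page[step.selector];
    }
  });
  return extracted;
}

// A login flow expressed as data the reviewer can read before it ever runs.
const loginSteps: Step[] = [
  { kind: "fill", selector: "#email", value: "user@example.com" },
  { kind: "click", selector: "button[type=submit]" },
  { kind: "extract", selector: ".account-name", field: "accountName" },
];
```

If the submit button's selector disappears from the page, the run stops at step 1 with the selector named in the error, rather than continuing with a guess.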

Code Generation Workflow

Here’s the typical flow:

Point your coding agent at a target site and describe the automation goal.
The agent uses the Libretto skill to generate a TypeScript script with explicit steps.
You review the generated code. It looks like normal Playwright automation but uses Libretto’s higher-level helpers.
Run the script via libretto run script.ts.
If it fails, you get a stack trace and can step through the script with standard debugging tools.
Commit the script to version control. It’s now a reusable asset.

The agent can also debug existing scripts. If a script breaks, you ask the agent to fix it. It reads the error message, inspects the current page structure, and updates the selectors or logic.

Comparison: Runtime Prompts vs. Generated Scripts

The following table contrasts the two approaches:

Aspect	Runtime Prompts	Generated Scripts (Libretto)
Determinism	Non-deterministic, varies by model state	Deterministic, same script produces same actions
Debuggability	Black box, hard to trace failures	Standard stack traces, breakpoints work
Versioning	Prompt history only	Git-tracked TypeScript files
Cost	LLM inference on every run	LLM cost during generation, then standard compute
Failure modes	Silent degradation, hallucinated actions	Explicit errors at selector level
Iteration speed	Re-prompt and hope	Edit code, re-run immediately
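
The cost row can be made concrete with a small break-even calculation. This is an illustrative model only: the parameter names and any numbers you feed it are assumptions rather than measurements, and it ignores the ongoing cost of maintaining selectors.

```typescript
// Illustrative cost model. All fields share one unit of your choosing (for example, thousandths of a dollar).
interface CostInputs {
  llmCostPerRun: number;     // runtime prompts: LLM spend on every run
  generationCost: number;    // generated script: one-off LLM spend to write the script
  computeCostPerRun: number; // generated script: plain compute per run
}

function runtimePromptCost(costs: CostInputs, runs: number): number {
  return costs.llmCostPerRun * runs;
}

function generatedScriptCost(costs: CostInputs, runs: number): number {
  return costs.generationCost + costs.computeCostPerRun * runs;
}

// Smallest number of runs at which the generated script costs no more than runtime prompting.
// Returns Infinity when a run of the script is not cheaper than a prompted run.
function breakEvenRuns(costs: CostInputs): number {
  const savingPerRun = costs.llmCostPerRun - costs.computeCostPerRun;
  if (savingPerRun <= 0) return Infinity;
  return Math.ceil(costs.generationCost / savingPerRun);
}
```

The shape of the result matters more than any specific figure: the one-off generation cost is amortized over every later run, which is why the approach favors automations that run many times.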

Implementation: Playwright Wrapper with Agent-Friendly Helpers

Under the hood, Libretto wraps Playwright and adds helpers that coding agents find easier to generate correctly. The following example illustrates Libretto’s agent-friendly API pattern:

Note: Pseudocode for illustration. Verify exact method signatures and API patterns against the Libretto repository for production use.

// Conceptual example of Libretto's agent-friendly API (method names are illustrative)
import { Libretto } from '@libretto/core';

// One extraction spec, reused for every page: a container selector plus
// a map of field names to selectors (the '@href' suffix presumably reads an attribute)
const productList = {
  container: '.product-card',
  fields: {
    name: '.product-title',
    price: '.product-price',
    url: 'a.product-link@href'
  }
};

const browser = await Libretto.launch();
const page = await browser.newPage();

try {
  // Navigate and wait for network idle
  await page.goto('https://example.com/products');

  // Extract structured data with explicit selectors
  const products = await page.extractList(productList);

  // Handle pagination deterministically: follow the "next" link until it disappears
  while (await page.hasElement('.next-page')) {
    await page.click('.next-page');
    await page.waitForNetworkIdle();
    products.push(...(await page.extractList(productList)));
  }

  console.log(JSON.stringify(products, null, 2));
} finally {
  // Close the browser even if a selector fails mid-run
  await browser.close();
}

The extractList helper is designed to be easy for agents to generate. It takes a container selector and a map of field names to selectors. Agents can infer this structure from a single page inspection.
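
To make the container-plus-field-map idea concrete, here is a sketch of how a helper like extractList could work. It is not Libretto's implementation: ElementLike is a hypothetical stand-in for a DOM or Playwright handle, and the reading of the "selector@attribute" suffix is inferred from the example above.

```typescript
// Hypothetical minimal element interface; a real helper would wrap browser handles instead.
interface ElementLike {
  queryAll(selector: string): ElementLike[];
  text(): string;
  attr(attrName: string): string | null;
}

interface ListSpec {
  container: string;
  fields: Record<string, string>;
}

// Splits "a.product-link@href" into a selector and an attribute name.
// Simplification: assumes '@' never appears inside the selector itself.
function parseFieldSpec(spec: string): { selector: string; attribute: string | null } {
  const at = spec.lastIndexOf("@");
  if (at === -1) return { selector: spec, attribute: null };
  return { selector: spec.slice(0, at), attribute: spec.slice(at + 1) };
}

// Builds one record per container match; a field whose selector finds nothing is an explicit error.
function extractList(root: ElementLike, spec: ListSpec): Array<Record<string, string | null>> {
  return root.queryAll(spec.container).map((item, itemIndex) => {
    const row: Record<string, string | null> = {};
    for (const field of Object.keys(spec.fields)) {
      const { selector, attribute } = parseFieldSpec(spec.fields[field]);
      const matches = item.queryAll(selector);
      if (matches.length === 0) {
        throw new Error(`Selector '${selector}' not found in item ${itemIndex} (field '${field}')`);
      }
      row[field] = attribute === null ? matches[0].text().trim() : matches[0].attr(attribute);
    }
    return row;
  });
}
```

Because the spec is plain data, an agent only has to identify one container selector and one selector per field, and a reviewer can check each of them against the page by hand.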

Failure Modes and Recovery

Scripts fail when:

Selectors break: Page structure changes. The script throws an error at the exact line where the selector fails (e.g., Error: Selector '.product-title' not found at line 12). You update the selector in the TypeScript file and re-run. No need to re-prompt the agent unless you want it to suggest a new selector.
Navigation timing: Page loads slower than expected. Fix by adjusting wait conditions in the script.
Authentication state: Session expires. Add re-authentication logic directly in the script.
Rate limiting: Site blocks requests. Add delays or proxy rotation in the script.

Because failures are explicit, you can add retry logic, fallback selectors, or error notifications directly in the script. The agent can help generate these recovery patterns when you describe the failure mode.
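
Two of these recovery patterns, fallback selectors and retry with backoff, can be sketched in a few lines. The helper names (firstMatchingSelector, withRetry) and the options shape are hypothetical, and the existence check is passed in as a synchronous callback to keep the sketch independent of any browser API; in a real script it would be an asynchronous page query.

```typescript
// Returns the first candidate selector that currently matches; throws listing everything that was tried.
function firstMatchingSelector(candidates: string[], exists: (selector: string) => boolean): string {
  for (const selector of candidates) {
    if (exists(selector)) return selector;
  }
  throw new Error(`None of the selectors matched: ${candidates.join(", ")}`);
}

interface RetryOptions {
  attempts: number;                      // total tries, including the first
  baseDelayMs: number;                   // delay before the second try; doubles after each failure
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not have to wait
}

// Retries an async action with exponential backoff and rethrows the last error when attempts run out.
async function withRetry<T>(action: (attempt: number) => Promise<T>, options: RetryOptions): Promise<T> {
  if (options.attempts < 1) throw new RangeError("attempts must be at least 1");
  const sleep = options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await action(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < options.attempts) {
        await sleep(options.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

Both helpers keep the failure explicit: when every fallback selector or every attempt is exhausted, the script still stops with an error that names what was tried.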

Observability and Debugging

Libretto CLI provides:

Screenshot capture: Automatically saves screenshots before and after each action.
Network logs: Records all HTTP requests and responses.
Execution traces: Record a trace through Playwright’s page.context().tracing API and open it in the Playwright trace viewer for timeline inspection.
Step-through debugging: Use VS Code’s debugger to step through the script.

You can run scripts in headless mode for production or headed mode for debugging. The CLI supports watch mode for rapid iteration during development.
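
If you want traces from your own script rather than from the CLI, one option is to wrap the run in Playwright's tracing calls. The sketch below assumes you can reach the underlying Playwright context, which the Libretto docs would need to confirm. TracingContext is a structural subset of Playwright's BrowserContext declared locally so the example stands alone, and withTrace is a hypothetical helper name.

```typescript
// Structural subset of Playwright's BrowserContext; a real context satisfies this shape.
interface TracingContext {
  tracing: {
    start(options: { screenshots?: boolean; snapshots?: boolean }): Promise<void>;
    stop(options?: { path?: string }): Promise<void>;
  };
}

// Records a trace around `body` and always writes it to `tracePath`, even when `body` throws.
async function withTrace<T>(context: TracingContext, tracePath: string, body: () => Promise<T>): Promise<T> {
  await context.tracing.start({ screenshots: true, snapshots: true });
  try {
    return await body();
  } finally {
    await context.tracing.stop({ path: tracePath });
  }
}
```

With a real Playwright page you would call withTrace(page.context(), "trace.zip", ...) and then inspect the result with npx playwright show-trace trace.zip.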

Deployment Shape

Generated scripts are just TypeScript files. You can:

Run them locally via the CLI.
Package them into Docker containers for scheduled execution.
Deploy them to serverless functions (AWS Lambda with Playwright, Google Cloud Functions).
Orchestrate them with Temporal, Prefect, or Airflow.

No special runtime is required beyond Node.js and a browser binary. Scripts can be unit tested with mocked page objects or integration tested against staging environments.
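
Unit testing with a mocked page object might look like the sketch below. PageLike is a hypothetical interface: hasElement and click mirror the earlier conceptual example, while extractTexts, collectTitles, and mockPage are invented here for illustration.

```typescript
// Hypothetical subset of a page API, just wide enough for the logic under test.
interface PageLike {
  hasElement(selector: string): Promise<boolean>;
  click(selector: string): Promise<void>;
  extractTexts(selector: string): Promise<string[]>;
}

// Script logic under test: collect titles from every page of a paginated listing.
// maxPages guards against a "next" link that never goes away.
async function collectTitles(page: PageLike, maxPages = 50): Promise<string[]> {
  const titles = await page.extractTexts(".product-title");
  let visited = 1;
  while (visited < maxPages && (await page.hasElement(".next-page"))) {
    await page.click(".next-page");
    titles.push(...(await page.extractTexts(".product-title")));
    visited++;
  }
  return titles;
}

// In-memory mock: each inner array holds the titles shown on one page of results.
function mockPage(pages: string[][]): PageLike {
  let index = 0;
  return {
    hasElement: async (selector) => selector === ".next-page" && index < pages.length - 1,
    click: async (selector) => {
      if (selector === ".next-page") index++;
    },
    extractTexts: async () => pages[index].slice(),
  };
}
```

Writing the script's logic against a narrow interface is the design choice that makes this possible: the same function accepts the real page in production and the mock in tests, with no browser needed for the latter.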

Security Boundaries

Scripts run with the same permissions as the user executing them. Secrets (API keys, passwords) should be passed via environment variables, not hardcoded.

Libretto doesn’t include a secret management layer. You’re expected to use your existing secret store (AWS Secrets Manager, Vault, 1Password CLI).

Browser sessions are isolated per script execution. No shared state between runs unless you explicitly persist cookies or local storage.
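
A small sketch of the environment-variable convention: read every required secret up front and fail fast if one is missing, naming the variable but never printing a value. The helper name requireSecrets and the variable names in the comment are hypothetical.

```typescript
// Reads required secrets from an environment-like map.
// Throws before any browser work starts if a variable is unset or empty.
function requireSecrets<K extends string>(
  env: Record<string, string | undefined>,
  keys: readonly K[],
): Record<K, string> {
  const missing = keys.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const secrets = {} as Record<K, string>;
  for (const key of keys) {
    secrets[key] = env[key] as string;
  }
  return secrets;
}

// In a Node.js script you would pass process.env, for example:
//   const { PORTAL_USERNAME, PORTAL_PASSWORD } =
//     requireSecrets(process.env, ["PORTAL_USERNAME", "PORTAL_PASSWORD"]);
```

Taking the environment as a parameter keeps the helper testable and makes it obvious in review which secrets a script depends on.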

When to Use Libretto

Use Libretto when:

You need browser automations that run the same way every time.
You want to version and review automation logic before deploying it.
Your team already uses coding agents for development.
You need to debug failures quickly without re-prompting an LLM.
You’re building automations that will run hundreds or thousands of times.

Libretto is less suitable when:

You need one-off automations that won’t be reused.
The target site changes structure daily (though you could use runtime prompts for initial exploration, then generate a Libretto script once the structure stabilizes).
You don’t have coding agents in your workflow and don’t want to adopt them.
Your automation logic is simple enough that a Playwright script without agent assistance is faster.

Technical Verdict

Libretto addresses the determinism and debuggability problems in agentic browser automation by moving LLM work to code generation time. You get inspectable, versionable scripts instead of black-box runtime behavior. The trade-off is that you need to maintain generated code and update selectors when pages change, but you would have to do that anyway with any robust automation system.

The skill-based approach works well with modern coding agents. If you’re already using Cursor or Windsurf for development, adding Libretto to your agent’s context is straightforward. The generated scripts are readable TypeScript, so non-agent developers can maintain them.

The main risk is that coding agents sometimes generate incorrect selectors or miss edge cases. You still need to review and test the generated code. But that’s a better failure mode than silent degradation in production.

The Hacker News discussion (134 points, 56 comments) shows strong community interest in this approach, particularly from teams frustrated with non-deterministic runtime automation. If you’re building browser automations that need to run reliably at scale, Libretto’s approach of generating deterministic scripts is worth evaluating against runtime prompt-based tools.