HyperFrames: HTML-to-Video Rendering for Agent Workflows

How HeyGen's framework turns HTML, GSAP timelines, and browser-runtime Tailwind into video frames, with MCP tools and blocking-hook trade-offs.

Source: github.com

HeyGen open-sourced HyperFrames, a TypeScript framework that renders HTML compositions into video files. The project hit GitHub Trending #11 and ships with first-party skills for Claude Code, Cursor, Gemini CLI, and Codex. It turns declarative markup into video by capturing browser frames from compositions built with GSAP timelines, browser-runtime Tailwind v4, and animation adapters for Anime.js, CSS animations, Lottie, Three.js, and WAAPI (the Web Animations API).

This is not a video editor. It is a rendering pipeline designed for agents that generate code. The framework exposes MCP-compatible tools for TTS, transcription, and background removal so agents can chain preprocessing steps into video workflows.

Rendering Pipeline Architecture

HyperFrames runs a headless browser, evaluates your HTML composition, steps through GSAP timelines frame by frame, and captures each frame as a PNG or video segment. The core loop:

  1. Composition load: Parse HTML, inject browser-runtime Tailwind v4, load GSAP and adapter libraries.
  2. Timeline orchestration: GSAP controls time. The renderer seeks to each frame timestamp, waits for layout stability, captures the viewport.
  3. Frame capture: Puppeteer or Playwright snapshots the DOM. Each frame becomes a file or is piped to FFmpeg for encoding.
  4. Post-processing: Merge frames into video, apply audio tracks, export MP4 or WebM.

The pipeline is serial. The renderer blocks on each frame until the browser signals that layout is complete. This keeps output deterministic, but frames within a single page cannot be captured in parallel.
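
Step 4 of the loop can be sketched as building an FFmpeg command line from the captured frames. This is an illustration only, not HyperFrames' actual encoder invocation: the helper name buildFfmpegArgs and the frame-%d.png naming are assumptions, while the flags themselves are standard FFmpeg options.

```javascript
// Hypothetical helper: builds FFmpeg arguments that merge numbered PNG frames
// into an H.264 MP4. Not part of HyperFrames; names are illustrative only.
function buildFfmpegArgs({ fps, framePattern, output }) {
  return [
    "-y",                       // overwrite the output file if it exists
    "-framerate", String(fps),  // input frame rate of the image sequence
    "-i", framePattern,         // e.g. "frame-%d.png" (frame-0.png, frame-1.png, ...)
    "-c:v", "libx264",          // encode video with H.264
    "-pix_fmt", "yuv420p",      // pixel format that most players can decode
    output,
  ];
}

const ffmpegArgs = buildFfmpegArgs({
  fps: 30,
  framePattern: "frame-%d.png",
  output: "out.mp4",
});
console.log(["ffmpeg", ...ffmpegArgs].join(" "));
```

The resulting array can be handed to a process spawner as-is; keeping it as an array avoids shell-quoting problems with file names.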

GSAP Timeline Mapping

GSAP timelines define animation sequences with precise timing. HyperFrames reads the timeline duration, calculates frame count from the target frame rate (default 30 FPS), and seeks to each frame’s timestamp:

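// Runs in Node. Assumes `page` is a Puppeteer Page and that the composition
// exposes its GSAP timeline as window.timeline (an assumption of this sketch).
const duration = await page.evaluate(() => {
  window.timeline.pause(); // stop real-time playback; the renderer drives time
  return window.timeline.duration();
});
const frameCount = Math.ceil(duration * frameRate);

for (let i = 0; i < frameCount; i++) {
  const time = i / frameRate;
  // suppressEvents = true moves the playhead without firing callbacks.
  await page.evaluate((t) => { window.timeline.seek(t, true); }, time);
  await page.waitForNetworkIdle();
  await page.screenshot({ path: `frame-${i}.png` });
}

The seek call jumps the playhead to the requested time instead of playing the animation forward; passing true for suppressEvents keeps callbacks from firing along the way. The renderer then waits for network idle so that fonts, images, and CSS have finished loading. This blocking wait is the primary performance bottleneck.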
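
The frame arithmetic is easy to check in isolation. The sketch below is a standalone helper (the name frameTimestamps is hypothetical, not a HyperFrames export) that lists the timestamps a renderer would seek to for a given duration and frame rate.

```javascript
// Returns the timestamp (in seconds) of every frame to capture.
function frameTimestamps(durationSec, frameRate) {
  const frameCount = Math.ceil(durationSec * frameRate);
  // Frame i is sampled at i / frameRate, so the last sample falls
  // one frame interval before the end of the timeline.
  return Array.from({ length: frameCount }, (_, i) => i / frameRate);
}

const stamps = frameTimestamps(10, 30);
console.log(stamps.length); // 300
console.log(stamps[0]);     // 0
```

A 10-second timeline at 30 FPS yields 300 seeks, with the final one at 299/30 seconds rather than at exactly 10 seconds.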

Browser-Runtime Tailwind v4

HyperFrames uses Tailwind v4’s browser runtime, which compiles utility classes on the fly inside the browser. No build step. Agents write <div class="bg-blue-500 text-white"> and the runtime generates CSS at page load.

This simplifies agent workflows. The agent does not need to run npx tailwindcss or manage a config file. The trade-off: runtime compilation adds 50-200ms to initial page load, and custom themes require inline configuration.

The runtime scans the DOM for utility classes, generates CSS rules, and injects a <style> tag. For video rendering, this happens once per composition. The latency is acceptable for single-shot rendering but compounds in multi-composition batch jobs.
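
To make the "no build step" idea concrete, here is a minimal sketch that wraps agent-generated markup in a page loading Tailwind v4's browser build from a CDN, with an optional inline theme. The helper name buildComposition is hypothetical, and HyperFrames may inject the runtime differently; the script URL and the text/tailwindcss style block follow Tailwind's documented browser setup.

```javascript
// Hypothetical helper: produces a full HTML page around agent-generated markup.
// Illustrative only; this is not HyperFrames' own injection code.
function buildComposition(bodyHtml, themeCss = "") {
  // Inline theme configuration lives in a <style type="text/tailwindcss"> block.
  const theme = themeCss
    ? `<style type="text/tailwindcss">\n@theme {\n${themeCss}\n}\n</style>`
    : "";
  return [
    "<!doctype html>",
    "<html>",
    "<head>",
    '<script src="https://cdn.jsdelivr.net/npm/@tailwindcss/browser@4"></script>',
    theme,
    "</head>",
    `<body>${bodyHtml}</body>`,
    "</html>",
  ].filter(Boolean).join("\n"); // drop the empty theme slot when unused
}

console.log(
  buildComposition(
    '<div class="bg-brand text-white">Hello</div>',
    "  --color-brand: #1d4ed8;"
  )
);
```

The custom --color-brand variable is an example of the inline configuration mentioned above: it makes a bg-brand utility available without a config file.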

Animation Adapter Pattern

HyperFrames provides adapters for five animation runtimes: Anime.js, CSS animations, Lottie, Three.js, and WAAPI. Each adapter translates its runtime’s API into a GSAP-compatible timeline.

Runtime        | Adapter Role                             | Use Case
---------------|------------------------------------------|--------------------------------
Anime.js       | Wraps Anime timelines in GSAP proxy      | Lightweight property animations
CSS animations | Syncs @keyframes with GSAP time          | Browser-native transitions
Lottie         | Plays JSON animations at GSAP timestamps | After Effects exports
Three.js       | Steps WebGL render loop per frame        | 3D scenes and shaders
WAAPI          | Maps Web Animations API to GSAP          | Standards-based motion

The adapter pattern isolates tool boundaries. An agent can generate a Lottie animation, a Three.js scene, and CSS keyframes in the same composition. HyperFrames synchronizes them by forcing each runtime to respect GSAP’s master timeline.
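
One way to picture the pattern is a shared seek(timeSec) interface that every runtime wrapper implements. The sketch below is an assumption about the shape, not HyperFrames' real adapter code: the adapter and clock names are hypothetical, and a plain number stands in for GSAP's master time. The underlying calls are real: WAAPI's Animation.currentTime is in milliseconds, and lottie-web's goToAndStop(value, isFrame) takes milliseconds when isFrame is false.

```javascript
// Hypothetical adapter shape: each adapter exposes seek(timeSec).

// WAAPI: pause the animation, then set currentTime (milliseconds) directly.
function waapiAdapter(animation) {
  animation.pause();
  return {
    seek(timeSec) {
      animation.currentTime = timeSec * 1000;
    },
  };
}

// lottie-web: goToAndStop(value, isFrame) jumps to a time in milliseconds
// when isFrame is false.
function lottieAdapter(lottieAnimation) {
  return {
    seek(timeSec) {
      lottieAnimation.goToAndStop(timeSec * 1000, false);
    },
  };
}

// Master clock: forwards one timestamp to every registered adapter so that
// all runtimes show the same instant before a frame is captured.
function createMasterClock(adapters) {
  return {
    seek(timeSec) {
      for (const adapter of adapters) adapter.seek(timeSec);
    },
  };
}
```

In a browser, the clock would be built from real objects, for example an Animation returned by element.animate(...) and an instance returned by lottie.loadAnimation(...), and seek would be called once per captured frame.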

Three.js Adapter Example

Three.js normally runs a continuous render loop. The adapter replaces the loop with frame-by-frame stepping:

function createThreeAdapter(scene, camera, renderer) {
  return {
    render(time) {
      // Update scene based on GSAP time
      scene.rotation.y = time * 0.5;
      renderer.render(scene, camera);
    }
  };
}

The adapter’s render method is called once per frame. GSAP passes the current timeline time, and the adapter updates the scene. This breaks Three.js’s assumption of a continuous loop but guarantees deterministic output.

First-Party Skills and Evidence Discipline

HyperFrames ships skills for Claude Code, Cursor, Gemini CLI, and Codex. These are not generic prompts. They are structured instructions that teach agents the framework’s API surface, common patterns, and error recovery.

The /hyperframes skill includes:

  • Composition structure (HTML boilerplate, GSAP timeline setup)
  • Tailwind v4 runtime configuration
  • Asset loading (fonts, images, audio)
  • Common animation patterns (fade in, slide, parallax)
  • Error messages and fixes (missing GSAP import, invalid timeline syntax)

The /hyperframes-media skill exposes preprocessing tools:

  • TTS: Convert text to audio via HeyGen’s API or local engines
  • Transcription: Extract text from audio for subtitle generation
  • Background removal: Isolate subjects in images or video frames

Agents chain these tools. Example workflow: transcribe audio, generate subtitles, render subtitles over video frames, export final composition.
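
The "generate subtitles" step of that workflow can be sketched as a pure function that turns transcript segments into SRT text. The segment shape ({ start, end, text } with times in seconds) is an assumption for illustration; the actual transcription tool output may differ.

```javascript
// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(sec) {
  const ms = Math.round(sec * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Hypothetical transcript shape: [{ start, end, text }], times in seconds.
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text}`)
    .join("\n\n"); // a blank line separates cues
}

console.log(toSrt([
  { start: 0, end: 1.5, text: "Hello" },
  { start: 1.5, end: 3.25, text: "world" },
]));
```

The same segment timings could also drive on-screen subtitle elements in the composition, so the text appears at matching timeline positions.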

Evidence discipline means the skill includes example outputs. The agent sees what a valid composition looks like before generating one. This reduces hallucination and syntax errors.

Blocking Hooks and Multi-Agent Performance

The documentation warns that Claude Code hooks are blocking. When multiple plugins register hooks (on file save, on command run), they execute serially. In a multi-agent team, this creates contention.

Scenario: Three agents collaborate on a video project. Agent A generates the composition, Agent B preprocesses audio, Agent C renders frames. If all three use HyperFrames skills, each hook blocks the others. The team’s effective parallelism drops to one.

The framework does not solve this. It documents the constraint. Workarounds:

  • Dedicated rendering agent: One agent handles all HyperFrames tasks. Others generate code and assets but do not invoke rendering.
  • Async task queue: Agents submit rendering jobs to a queue. A single worker processes them serially.
  • Disable hooks: Use the CLI directly instead of relying on agent hooks.

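This is a tooling limitation, not a HyperFrames bug. Hooks belong to the agent host (here, Claude Code) rather than to MCP, and the MCP specification does not define hook concurrency semantics. Until the host offers a non-blocking option, blocking is the safe default.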
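
The async task queue workaround can be sketched in a few lines: agents submit jobs without waiting on each other, and a single chain of promises runs them one at a time. The name createSerialQueue and the fake render jobs are hypothetical; a real job would invoke the renderer.

```javascript
// Minimal serial job queue: jobs run one at a time, in submission order.
// A job is any function that returns a promise.
function createSerialQueue() {
  let tail = Promise.resolve();
  return {
    submit(job) {
      // Chain the job after whatever is already queued.
      const result = tail.then(() => job());
      // Keep the chain alive even if this job rejects.
      tail = result.catch(() => {});
      return result;
    },
  };
}

// Usage: three "agents" submit render jobs of different lengths.
const renderQueue = createSerialQueue();
const finished = [];
const fakeRender = (name, ms) => () =>
  new Promise((resolve) =>
    setTimeout(() => { finished.push(name); resolve(name); }, ms));

Promise.all([
  renderQueue.submit(fakeRender("A", 30)),
  renderQueue.submit(fakeRender("B", 10)),
  renderQueue.submit(fakeRender("C", 0)),
]).then(() => console.log(finished.join(","))); // A,B,C
```

Submission returns immediately with a promise, so the submitting agent stays free to do other work; only the rendering itself is serialized.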

Deployment Shape

HyperFrames runs in three modes:

  1. Local dev: npx hyperframes preview launches a live-reload server. Edit HTML, see updates in real time.
  2. CLI render: npx hyperframes render composition.html produces a video file. No server required.
  3. Programmatic API: Import HyperFrames as a library, call render() from Node.js scripts or serverless functions.

The programmatic API is the integration point for agent orchestrators. An agent generates a composition, writes it to disk, calls render(), and uploads the output to storage.

Serverless Considerations

Rendering video in AWS Lambda or Google Cloud Functions requires:

  • Headless browser layer: Bundle Chromium or use a prebuilt Lambda-oriented package (for example @sparticuz/chromium for Puppeteer, or playwright-aws-lambda for Playwright).
  • Disk space: Each frame is ~500KB. A 10-second video at 30 FPS needs 150MB temp storage.
  • Timeout: Rendering 300 frames takes 30-60 seconds. Use functions with extended timeouts or offload to ECS/Cloud Run.

HyperFrames does not include a serverless adapter. You write the Lambda handler, manage the browser lifecycle, and handle cleanup.
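
The sizing arithmetic above is worth encoding before picking a function tier. The helper below is a back-of-the-envelope sketch: its name and parameters are hypothetical, the 500 KB default mirrors the per-frame figure quoted above, and the 0.2 s per frame default is an assumption taken from the upper end of the 30-60 second range for 300 frames.

```javascript
// Rough budget for a frame-by-frame render. All inputs are caller-supplied
// estimates, not values measured from HyperFrames.
function estimateRenderBudget({ durationSec, fps = 30, frameKB = 500, secPerFrame = 0.2 }) {
  const frames = Math.ceil(durationSec * fps);
  return {
    frames,
    tempStorageMB: (frames * frameKB) / 1000, // decimal megabytes
    renderSec: frames * secPerFrame,          // wall-clock estimate
  };
}

const budget = estimateRenderBudget({ durationSec: 10 });
console.log(budget.frames);        // 300
console.log(budget.tempStorageMB); // 150
```

Comparing tempStorageMB and renderSec against the platform's ephemeral storage and timeout limits indicates whether a job fits in a function or should be offloaded to a container service.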

Observability and Failure Modes

The renderer logs frame progress to stdout. Example:

Rendering frame 45/300 (15%)
Rendering frame 46/300 (15%)

No structured logs. No metrics export. You pipe stdout to a log aggregator or parse it yourself.
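
Parsing it yourself can be a small function. The sketch below assumes the exact line format shown in the example above and turns each line into a structured record; the name parseProgressLine is hypothetical.

```javascript
// Parses a progress line such as "Rendering frame 45/300 (15%)".
// Returns null for lines that do not match the expected format.
function parseProgressLine(line) {
  const match = /^Rendering frame (\d+)\/(\d+) \((\d+)%\)$/.exec(line.trim());
  if (!match) return null;
  // Skip the full-match entry, then convert the three captures to numbers.
  const [, frame, total, percent] = match.map(Number);
  return { frame, total, percent };
}

console.log(JSON.stringify(parseProgressLine("Rendering frame 45/300 (15%)")));
// {"frame":45,"total":300,"percent":15}
```

Records in this shape can be forwarded to a metrics system or used to detect a stalled render when the frame counter stops advancing.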

Common failure modes:

  • Font loading timeout: Remote fonts fail to load before the frame capture. The renderer uses fallback fonts. Solution: preload fonts or use local files.
  • Layout thrashing: Complex CSS triggers reflows during animation. Frames capture mid-reflow. Solution: simplify layouts or add explicit will-change hints.
  • Memory exhaustion: Long compositions (5+ minutes) accumulate frames in memory. The process OOMs. Solution: render in segments or stream frames to disk.
  • GSAP timeline desync: Nested timelines with relative offsets drift from absolute time. Frames skip or duplicate. Solution: flatten timelines or use absolute timestamps.

The framework does not retry failed frames. If frame 150 fails, the render aborts. You re-run the entire job.
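
Rendering in segments limits how much work a single failure costs. The sketch below plans frame ranges and retries one range at a time; both function names are hypothetical, and renderSegment is a caller-supplied async function, since the framework itself offers no retry hook.

```javascript
// Splits a render into fixed-size frame ranges; `end` is exclusive.
function planSegments(totalFrames, segmentSize) {
  if (segmentSize <= 0) throw new RangeError("segmentSize must be positive");
  const segments = [];
  for (let start = 0; start < totalFrames; start += segmentSize) {
    segments.push({ start, end: Math.min(start + segmentSize, totalFrames) });
  }
  return segments;
}

// Retries a single segment up to maxAttempts times, then rethrows.
async function renderWithRetry(segment, renderSegment, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await renderSegment(segment);
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
    }
  }
}

console.log(JSON.stringify(planSegments(300, 120)));
// [{"start":0,"end":120},{"start":120,"end":240},{"start":240,"end":300}]
```

With this split, a failure at frame 150 costs a re-render of frames 120 to 239 instead of the whole job; the segments still have to be concatenated afterwards.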

Security Boundaries

HyperFrames executes arbitrary HTML and JavaScript in a headless browser. If an agent generates malicious code, it runs. Mitigations:

  • Sandboxed browser: Leave Chromium's sandbox enabled, which is Puppeteer's default; do not pass --no-sandbox. The sandbox isolates renderer processes from the host.
  • Content Security Policy: Inject a CSP header to block external scripts and inline event handlers.
  • Input validation: Parse agent-generated HTML before rendering. Reject <script> tags that load from origins outside an allowlist, eval() calls, and other suspicious patterns.

The framework does not enforce these. You implement them in your orchestration layer.
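
As one example of such a layer, the sketch below inserts a Content Security Policy meta tag into a composition before it is rendered. The helper name injectCsp and the policy itself are assumptions for illustration, not recommendations from HyperFrames; the allowlisted CDN origin in particular would need to match wherever your compositions load their libraries from.

```javascript
// Example policy: scripts only from one allowlisted CDN, no inline scripts
// or inline event handlers, inline styles allowed for runtime-generated CSS.
const EXAMPLE_POLICY = [
  "default-src 'none'",
  "script-src https://cdn.jsdelivr.net",
  "style-src 'unsafe-inline'",
  "img-src 'self' data:",
  "font-src 'self'",
].join("; ");

// Hypothetical pre-render step: adds a CSP <meta> tag right after <head>.
function injectCsp(html, policy = EXAMPLE_POLICY) {
  const headOpen = /<head(\s[^>]*)?>/i; // matches <head>, not <header>
  if (!headOpen.test(html)) {
    throw new Error("composition has no <head> element");
  }
  const meta = `<meta http-equiv="Content-Security-Policy" content="${policy}">`;
  // Insert immediately after the opening tag so the policy is in force
  // before any later script or style element is processed.
  return html.replace(headOpen, (open) => `${open}\n${meta}`);
}

console.log(injectCsp("<html><head><title>demo</title></head><body></body></html>"));
```

A policy this strict also blocks inline timeline code, so compositions would need to load their scripts from the allowlisted origin, or the policy would need to be extended (for instance with a nonce) to permit them.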

Technical Verdict

Use HyperFrames when:

  • You need agents to generate video from code, not edit existing video.
  • Your compositions are short (under 2 minutes) and render time is acceptable.
  • You already use GSAP or want a timeline-based animation model.
  • You need MCP-compatible preprocessing tools (TTS, transcription) in the same workflow.

Avoid HyperFrames when:

  • You need real-time video generation (rendering is too slow).
  • Your agents must collaborate in parallel (blocking hooks degrade performance).
  • You require fine-grained observability (logs are minimal).
  • You want a managed service (this is a self-hosted framework).

HyperFrames is production-grade plumbing from a video AI company. It solves the hard problem of deterministic frame capture from animated HTML. The blocking-hook constraint is a tooling ecosystem issue, not a deal-breaker. If your agents generate code and you need video output, this is the most direct path.