mech.app
Rue: What Building a Programming Language with Claude Reveals About Agentic Coding Workflows

Steve Klabnik built Rue with Claude. Here's what compiler work exposes about the boundary between human architecture and agent code generation.

Source: share.transistor.fm
In a March 2026 Practical AI podcast episode, Steve Klabnik (known for his work on the Rust core team) discussed building Rue, a new programming language developed largely with help from Claude. Hosts Chris Benson and Daniel Whitenack explored what happens when someone previously skeptical of AI tools starts using agentic coding for compiler infrastructure.

This is not a story about AI hype. It’s a case study in where agentic coding works and where it breaks down when you’re building compiler infrastructure instead of web apps. The interesting part is not that an agent can write code. It’s that Klabnik found a workflow boundary that makes sense for language implementation work, and that boundary tells you something about how to integrate agents into your own technical stack.

What Klabnik’s Workflow Reveals

The episode frames Klabnik’s journey as moving from criticism to experimentation. The key question is what “largely built with the help of AI tools like Claude” actually means in practice for compiler work.

The split below is inferred from how compiler projects are typically structured, not reported from the episode. The workflow likely involves:

Human decisions that cannot be delegated:

  • Grammar design and syntax choices
  • Type system semantics and inference rules
  • Memory model and ownership constraints
  • Standard library API surface
  • Error message strategy

Implementation work where agents accelerate delivery:

  • Parser combinator boilerplate
  • AST node definitions and traversal code
  • Test case generation for edge cases
  • Documentation scaffolding
  • Refactoring passes for consistency

This split is not arbitrary. Architectural decisions require domain knowledge and the ability to predict how choices cascade through the system. Implementation work requires precision and the ability to maintain consistency across thousands of lines of code. Agents excel at the second category but struggle with the first because they lack the mental model of how a language will be used in practice.
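To make the mechanical half of that split concrete, here is a minimal sketch of the kind of AST boilerplate an agent can produce quickly. The node types (`Num`, `Var`, `BinOp`) and the `free_vars` traversal are hypothetical and describe a toy expression language, not Rue's actual AST. Deciding which nodes exist is the human's call; writing a case for every node in every traversal is the repetitive part.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


# Hypothetical expression grammar: a number, a variable, or a binary operation.
Expr = Union[Num, Var, BinOp]


def free_vars(expr: Expr) -> set[str]:
    """Collect variable names by handling every node kind explicitly."""
    if isinstance(expr, Num):
        return set()
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, BinOp):
        return free_vars(expr.left) | free_vars(expr.right)
    # Fail loudly if a new node type is added without updating this traversal.
    raise TypeError(f"unhandled node: {expr!r}")
```

The final `raise` is the sort of invariant a reviewer would want in every generated traversal, so that adding a node type cannot silently skip a pass.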

What “Built with Claude” Actually Means

When someone says a project was “largely built with AI tools,” you need to know how the work was actually divided between human and agent:

  • Did the agent write first drafts that a human heavily edited?
  • Did the human write specs that the agent implemented verbatim?
  • Did the agent suggest refactorings that the human accepted or rejected?
  • How many agent-generated lines survived into production unchanged?

For compiler work, the likely pattern is:

  1. Human writes grammar rules and type system constraints in natural language or pseudocode
  2. Agent generates parser code, AST definitions, and test scaffolding
  3. Human reviews for semantic correctness (not just syntactic validity)
  4. Agent refactors based on human feedback about invariants
  5. Human writes integration tests that validate end-to-end behavior

This is not pair programming. It’s more like having a junior engineer who is very fast at typing but needs constant direction about what the code should actually do.

The Parser Challenge in Agentic Workflows

Parsers expose a key challenge for agentic coding: they have clear specifications but subtle failure modes.

Consider the challenge of handling optional trailing commas in function arguments. A human might specify the grammar constraint in natural language, and the agent generates the implementation. But the agent might not handle:

  • Ambiguity between trailing commas and tuple syntax
  • Error recovery when the comma is missing
  • Span tracking for error messages
  • Interaction with macro expansion

A human reviewer needs to check that the generated parser respects the language’s semantic invariants, not just that it compiles.
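As an illustration of what that review looks at, here is a small sketch of an argument-list parser that accepts an optional trailing comma, tracks spans, and reports a missing comma at the gap between two arguments. The grammar, `Token`, `ParseError`, and `parse_args` are all assumptions made for this example and say nothing about Rue's real parser. The sketch deliberately ignores tuple ambiguity, macro expansion, and tokens after the closing parenthesis.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # "ident", "comma", "lparen", "rparen", or "eof"
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


class ParseError(Exception):
    def __init__(self, message: str, start: int, end: int):
        super().__init__(f"{start}..{end}: {message}")
        self.span = (start, end)


TOKEN_RE = re.compile(
    r"(?P<ident>[A-Za-z_]\w*)|(?P<comma>,)|(?P<lparen>\()|(?P<rparen>\))"
)


def tokenize(src: str) -> list[Token]:
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ParseError(f"unexpected character {src[pos]!r}", pos, pos + 1)
        tokens.append(Token(m.lastgroup, m.group(), m.start(), m.end()))
        pos = m.end()
    return tokens


def parse_args(src: str) -> list[Token]:
    """Parse `( ident (, ident)* ,? )` and return the identifier tokens."""
    tokens = tokenize(src)
    eof = Token("eof", "", len(src), len(src))

    def at(i: int) -> Token:
        return tokens[i] if i < len(tokens) else eof

    i = 0
    if at(i).kind != "lparen":
        raise ParseError("expected '('", at(i).start, at(i).end)
    i += 1
    args = []
    while at(i).kind != "rparen":
        tok = at(i)
        if tok.kind != "ident":
            raise ParseError("expected argument name", tok.start, tok.end)
        args.append(tok)
        i += 1
        nxt = at(i)
        if nxt.kind == "comma":
            i += 1  # the comma may be followed directly by ')'
        elif nxt.kind != "rparen":
            # Missing comma: the span covers the gap after the previous argument.
            raise ParseError("expected ',' or ')'", tok.end, nxt.start)
    return args
```

With this sketch, `parse_args("(a, b,)")` and `parse_args("(a, b)")` return the same two identifiers, while `parse_args("(a b)")` raises a `ParseError` whose span is `(2, 3)`. Whether a leading comma, as in `(,)`, should be rejected is a grammar decision; here it is.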

Version Control and Review Strategy

When agents generate significant code, your review process changes:

Review Focus              | Human-Written Code | Agent-Generated Code
--------------------------|--------------------|-------------------------
Syntactic correctness     | Assumed            | Must verify
Semantic correctness      | Primary focus      | Primary focus
Consistency with codebase | Assumed            | Must verify explicitly
Test coverage             | Check for gaps     | Generate missing tests
Documentation             | Check for clarity  | Regenerate if unclear
Edge case handling        | Spot check         | Exhaustive check

The key difference is that you cannot assume consistency. An agent might generate code that works in isolation but violates invariants elsewhere in the codebase. You need to check:

  • Does this parser handle the same edge cases as other parsers?
  • Does this error message follow the project’s style guide?
  • Does this optimization respect the memory model?

Potential Failure Modes in Language Implementation

Watch for these failure modes when agents generate compiler code. They are general principles of agentic compiler work, not specific findings from Klabnik’s Rue development:

Silent semantic violations: The agent generates code that type-checks and passes unit tests but violates a semantic invariant. For example, a type inference function that produces correct types for common cases but fails on recursive types.
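One classic instance of this failure, chosen here purely for illustration, is a unifier that omits the occurs check. It handles everyday cases correctly but builds a cyclic substitution when asked to unify a type variable with a type that contains it. The sketch below uses a hypothetical three-constructor type representation (`TVar`, `TCon`, `TFn`) and includes the check a reviewer would look for.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str


@dataclass(frozen=True)
class TFn:
    arg: "Type"
    ret: "Type"


Type = Union[TVar, TCon, TFn]
Subst = dict[str, Type]


class UnifyError(Exception):
    pass


def resolve(t: Type, subst: Subst) -> Type:
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while isinstance(t, TVar) and t.name in subst:
        t = subst[t.name]
    return t


def occurs(name: str, t: Type, subst: Subst) -> bool:
    """Return True if the variable `name` appears anywhere inside `t`."""
    t = resolve(t, subst)
    if isinstance(t, TVar):
        return t.name == name
    if isinstance(t, TFn):
        return occurs(name, t.arg, subst) or occurs(name, t.ret, subst)
    return False


def unify(a: Type, b: Type, subst: Subst) -> Subst:
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, TVar):
        # Without this check, `a ~ a -> int` would bind `a` to a type containing itself.
        if occurs(a.name, b, subst):
            raise UnifyError(f"infinite type: {a.name} occurs in {b}")
        return {**subst, a.name: b}
    if isinstance(b, TVar):
        return unify(b, a, subst)
    if isinstance(a, TFn) and isinstance(b, TFn):
        subst = unify(a.arg, b.arg, subst)
        return unify(a.ret, b.ret, subst)
    raise UnifyError(f"cannot unify {a} with {b}")
```

A unit test suite built only from non-recursive examples would pass with or without the `occurs` call, which is what makes this kind of omission silent.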

Inconsistent error handling: The agent generates error messages that are technically correct but inconsistent with the rest of the compiler. Users see different error formats depending on which part of the compiler caught the issue.
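One way to make that inconsistency hard to introduce is to route every error through a single rendering path. The sketch below is an assumption about how a project might do this; the `Diagnostic` type, its fields, and the `error[CODE]` format are hypothetical, not a convention taken from Rue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    code: str      # hypothetical stable identifier, e.g. "E0001"
    message: str
    line: int
    column: int

    def render(self, filename: str) -> str:
        """Format every error the same way, whichever compiler phase produced it."""
        return f"{filename}:{self.line}:{self.column}: error[{self.code}]: {self.message}"
```

If the lexer, parser, and type checker can only report errors by constructing a `Diagnostic`, a reviewer checks the format once instead of in every generated function.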

Missed optimization opportunities: The agent generates straightforward code that works but misses domain-specific optimizations a human would apply. For example, a naive AST traversal that recomputes results for shared subtrees instead of memoizing them.

Test coverage gaps: The agent generates tests for happy paths and obvious edge cases but misses subtle interactions between language features. For example, testing closures and testing generics but not testing generic closures.
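Gaps of this kind can be surfaced mechanically if each test declares which language features it exercises. The following sketch assumes such tagging exists; the feature names and the `missing_pairs` helper are hypothetical.

```python
from itertools import combinations


def missing_pairs(features, covered):
    """Return feature pairs that no existing test exercises together.

    `features` is an ordered list of feature names; `covered` is a list of
    sets, one per test, naming the features that test touches.
    """
    covered_pairs = {
        frozenset(pair)
        for tags in covered
        for pair in combinations(tags, 2)
    }
    return [
        pair
        for pair in combinations(features, 2)
        if frozenset(pair) not in covered_pairs
    ]
```

Given tests tagged `{"closures"}`, `{"generics"}`, and `{"closures", "traits"}`, calling `missing_pairs(["closures", "generics", "traits"], ...)` reports the closures/generics and generics/traits pairs as untested. Pairwise coverage is only a starting point, since some bugs need three or more features to interact.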

Observability for Agentic Workflows

If you’re using agents to generate compiler code, you need observability into the generation process. These are general principles for tracking agentic contributions:

  • Track which functions were agent-generated versus human-written
  • Log agent prompts and responses for reproducibility
  • Measure test coverage separately for agent-generated code
  • Flag semantic review checkpoints in commit messages
  • Maintain a decision log for architectural choices

This is not just for debugging. It’s for understanding which parts of your codebase are stable and which parts might need rework as the agent’s capabilities improve.
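A lightweight way to start is to record provenance in commit trailers, the `Key: value` lines conventionally placed in the last paragraph of a Git commit message. The sketch below parses them from a message string. The trailer names `Generated-By` and `Semantic-Review` are hypothetical conventions invented for this example, and the parser is a simplification rather than a reimplementation of Git's own trailer rules.

```python
def parse_trailers(message: str) -> dict[str, str]:
    """Read `Key: value` lines from the final paragraph of a commit message."""
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}  # a lone subject line has no trailer block
    trailers = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if sep and " " not in key:
            trailers[key] = value.strip()
    return trailers


def needs_semantic_review(message: str) -> bool:
    """Flag agent-generated commits whose semantic review is not marked done."""
    trailers = parse_trailers(message)
    return "Generated-By" in trailers and trailers.get("Semantic-Review") != "done"
```

Run over a repository's commit messages, a check like `needs_semantic_review` gives a rough list of agent-generated changes that still need a human pass.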

Technical Verdict

Klabnik’s experience with Rue suggests that agentic coding can accelerate language implementation when you have clear architectural constraints that you can articulate precisely. The workflow requires strong semantic review because agents readily generate syntactically correct code that still violates domain-specific invariants.

Use it if:

  • Your language grammar is stable and you can write formal specs for parser behavior
  • You have a strong type system with documented invariants that can be checked mechanically
  • Your team has deep compiler expertise to review generated code for semantic correctness
  • You need to accelerate boilerplate implementation (AST nodes, visitor patterns, test scaffolding)
  • You can maintain clear separation between architectural decisions and implementation details

Avoid it if:

  • Your type system is still evolving or relies on undocumented semantic invariants
  • Your language design is exploratory and you’re still discovering what the right abstractions are
  • You lack the expertise to distinguish between code that compiles and code that respects language semantics
  • Your project depends on subtle optimizations or domain-specific patterns that are hard to specify
  • You need tight coupling between design exploration and implementation feedback

The fundamental boundary remains: agents accelerate implementation once you know what you’re building, but they cannot replace the architectural judgment that defines what a language should be.