Building a Native Terminal for AI Coding Agents: Rust, GPUI, and the /proc/net/tcp Detection Layer

How Paneflow, a Rust rewrite of the agent-first cmux terminal, detects dev servers via /proc/net/tcp, isolates agent execution contexts, and exposes a JSON-RPC control plane for external tools.

Source: dev.to
Most agent frameworks treat terminals as dumb pipes. You spin up Claude Code or Codex, they spawn subprocesses, and you hope nothing leaks. Paneflow (a Rust rewrite of the macOS-only cmux project) flips that model: the terminal becomes the orchestration layer. It parses /proc/net/tcp to detect dev servers without credentials, maintains an N-ary layout tree to isolate agent workspaces, and exposes a JSON-RPC control plane so external tools can drive panes programmatically.

The interesting part is not the UI. The interesting part is how a native terminal enforces security boundaries that web-based agent interfaces cannot, and where those boundaries break down.

Why Native Terminals Matter for Agent Workspaces

When you run multiple AI coding agents in parallel (Claude Code on one branch, Codex on another, OpenCode on a third), tmux and screen treat them like any other shell. No branch context. No session restore that survives a reboot. No way for an external orchestrator to inject commands or read agent state without brittle tmux send-keys hacks.

A native terminal can:

  • Parse process state directly. Read /proc/net/tcp to detect which ports agents have opened, map those to dev servers, and display live status without polling HTTP endpoints.
  • Enforce process isolation. Each pane runs in its own PTY with separate environment variables, working directories, and file descriptor tables.
  • Expose a control plane. JSON-RPC over a Unix socket lets external tools (CI runners, orchestrators, monitoring agents) query pane state, inject commands, or snapshot sessions.
  • Render untrusted output safely. A native terminal emulator can sandbox ANSI escape sequences and reject malicious sequences that web terminals must parse in JavaScript.

The trade-off: you lose the browser’s sandboxing model. A compromised agent can write to /dev/tty directly, escape the PTY, or exploit terminal emulator bugs. A web terminal parses and renders output in a sandboxed browser process. A native terminal parses agent output in the same address space as its UI.

Architecture: GPUI, Alacritty, and the PTY Boundary

Paneflow is built on three layers:

  1. GPUI (Zed’s UI framework): handles windowing, layout, input events, and GPU-accelerated rendering.
  2. alacritty_terminal: a vendored crate that implements VT100/xterm emulation, ANSI parsing, and the terminal grid state. Glyph rasterization stays on the UI side.
  3. Portable PTY layer: abstracts openpty(3) on Unix and CreatePseudoConsole on Windows.

The critical decision was choosing GPUI over Tauri, Iced, or egui. The constraint: the UI framework must own glyph rasterization. Terminal emulators render thousands of glyphs per frame. If the UI framework forces you to rebuild the glyph atlas on every frame (egui) or serialize glyph data across a webview boundary (Tauri), you lose 60fps.

GPUI gives you direct access to the GPU command buffer. You can upload a glyph atlas once, update dirty rectangles, and issue draw calls without round-tripping through a layout engine.
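The "upload once, then touch only what changed" idea can be sketched without any GPU API. The block below is a minimal, hypothetical piece of CPU-side bookkeeping, not Paneflow's or GPUI's actual code: GlyphAtlas, Slot, lookup, and take_pending are illustrative names, and it assumes fixed-size cells, which suits a monospace grid.

```rust
use std::collections::HashMap;

/// Top-left pixel of a glyph's cell inside the atlas texture.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Slot {
    x: u32,
    y: u32,
}

/// CPU-side bookkeeping for a square atlas made of fixed-size cells.
struct GlyphAtlas {
    cell: u32,
    columns: u32,
    slots: HashMap<char, Slot>,
    pending: Vec<(char, Slot)>, // glyphs that still need rasterizing and uploading
}

impl GlyphAtlas {
    fn new(texture_size: u32, cell: u32) -> Self {
        GlyphAtlas {
            cell,
            columns: texture_size / cell,
            slots: HashMap::new(),
            pending: Vec::new(),
        }
    }

    /// Returns the glyph's slot, allocating one on first use.
    /// Returns None when the atlas is full.
    fn lookup(&mut self, c: char) -> Option<Slot> {
        if let Some(&slot) = self.slots.get(&c) {
            return Some(slot); // cache hit: nothing to upload
        }
        let index = self.slots.len() as u32;
        if index >= self.columns * self.columns {
            return None;
        }
        let slot = Slot {
            x: (index % self.columns) * self.cell,
            y: (index / self.columns) * self.cell,
        };
        self.slots.insert(c, slot);
        self.pending.push((c, slot));
        Some(slot)
    }

    /// Drains the glyphs that must be uploaded before the next draw.
    fn take_pending(&mut self) -> Vec<(char, Slot)> {
        std::mem::take(&mut self.pending)
    }
}
```

In steady state most frames find every glyph already cached, so take_pending returns an empty list and no texture upload is needed.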

The Alacritty Integration

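Alacritty is a standalone terminal emulator. The alacritty_terminal crate packages its VT100 state machine and grid model as a library and leaves drawing to the host application. Paneflow vendors this crate and wires it into GPUI’s event loop:

// Simplified sketch: actual code handles resize, scrollback, selection.
// Method names (advance_bytes, dirty_cells, lookup, draw_glyph) are schematic
// and may not match the exact alacritty_terminal or GPUI APIs.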
struct PaneTerminal {
    term: alacritty_terminal::Term<EventProxy>,
    pty: Box<dyn Pty>,
    grid_renderer: GlyphAtlas,
}

impl PaneTerminal {
    fn handle_pty_output(&mut self, bytes: &[u8]) {
        self.term.advance_bytes(bytes);
        self.grid_renderer.mark_dirty(self.term.dirty_cells());
    }

    fn render(&self, cx: &mut RenderContext) {
        for cell in self.term.renderable_cells() {
            let glyph = self.grid_renderer.lookup(cell.c);
            cx.draw_glyph(glyph, cell.point, cell.fg, cell.bg);
        }
    }
}

The EventProxy is a GPUI handle that lets the terminal state machine wake the UI thread when the PTY has data. This avoids polling.

The security boundary here is thin. The alacritty_terminal crate parses untrusted ANSI sequences. If an agent writes a malicious escape code (e.g., a crafted SGR sequence that overflows a buffer), it runs in the same process as the UI. Alacritty has a good track record, but the attack surface is larger than a web terminal where the renderer runs in a separate process.

Dev-Server Detection via /proc/net/tcp

Most agent frameworks require you to configure dev server URLs manually. Paneflow detects them automatically by parsing /proc/net/tcp.

When an agent spawns a dev server (e.g., npm run dev on port 3000), the listening socket appears as a row in /proc/net/tcp:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0BB8 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345

The port half of the local_address field is 0BB8 in hex (3000 in decimal); the address half, 00000000, is the wildcard address 0.0.0.0. The st field is 0A (TCP_LISTEN).
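A minimal sketch of how one such row could be parsed, assuming whitespace-separated columns in the order of the header shown above. TcpEntry and TcpState are illustrative names chosen for this example, not Paneflow's actual types.

```rust
#[derive(Debug, PartialEq)]
enum TcpState {
    Listen,    // st == 0A
    Other(u8), // any other kernel TCP state
}

/// One row of /proc/net/tcp, reduced to the fields used for detection.
#[derive(Debug, PartialEq)]
struct TcpEntry {
    local_port: u16,
    state: TcpState,
    uid: u32,   // numeric user ID of the socket's owner (not a PID)
    inode: u64, // socket inode, the key for mapping a socket to a process
}

/// Parses one data row; returns None for the header or a malformed row.
fn parse_tcp_line(line: &str) -> Option<TcpEntry> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    // Columns: sl, local, remote, st, tx:rx, tr:when, retrnsmt, uid, timeout, inode
    if fields.len() < 10 {
        return None;
    }
    let (_addr_hex, port_hex) = fields[1].split_once(':')?;
    let st = u8::from_str_radix(fields[3], 16).ok()?;
    Some(TcpEntry {
        local_port: u16::from_str_radix(port_hex, 16).ok()?,
        state: if st == 0x0A { TcpState::Listen } else { TcpState::Other(st) },
        uid: fields[7].parse().ok()?,
        inode: fields[9].parse().ok()?,
    })
}
```

The header row is rejected naturally: its second column, local_address, contains no colon, so split_once returns None.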

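Paneflow polls /proc/net/tcp every 500ms, filters for LISTEN sockets owned by child processes of the current pane, and displays them in the UI. The table has no PID column (uid is the owning user's ID), so ownership has to be resolved by matching each row's inode against the socket links in /proc/<pid>/fd:

fn detect_dev_servers(pane_pid: u32) -> Vec<DevServer> {
    // /proc/net/tcp lists IPv4 sockets only; IPv6 listeners are in /proc/net/tcp6.
    let Ok(tcp_table) = std::fs::read_to_string("/proc/net/tcp") else {
        return Vec::new();
    };
    // socket_inodes_for is a hypothetical helper: it returns a map from socket
    // inode to owning PID, built from the /proc/<pid>/fd links of the children.
    let inode_to_pid = socket_inodes_for(&get_child_pids(pane_pid));

    tcp_table
        .lines()
        .skip(1) // header row
        .filter_map(|line| parse_tcp_line(line))
        .filter(|entry| entry.state == TcpState::Listen)
        .filter_map(|entry| {
            // Keep only sockets that belong to one of the pane's children.
            let pid = *inode_to_pid.get(&entry.inode)?;
            Some(DevServer {
                port: entry.local_port,
                pid,
                url: format!("http://localhost:{}", entry.local_port),
            })
        })
        .collect()
}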
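Because /proc/net/tcp identifies sockets by inode rather than by PID, attributing a listener to one of the pane's child processes takes a second lookup. One common approach on Linux is to read the symlinks under /proc/<pid>/fd, which point at targets of the form socket:[inode]. The sketch below assumes that approach; socket_inode and socket_inodes_for are hypothetical helper names.

```rust
use std::collections::HashMap;
use std::fs;

/// Extracts the inode from an fd symlink target such as "socket:[12345]".
fn socket_inode(link_target: &str) -> Option<u64> {
    link_target
        .strip_prefix("socket:[")?
        .strip_suffix(']')?
        .parse()
        .ok()
}

/// Maps socket inode -> owning PID for the given processes by reading the
/// symlinks in /proc/<pid>/fd. Processes that have exited, or whose fd
/// directory is unreadable, are skipped silently.
fn socket_inodes_for(pids: &[u32]) -> HashMap<u64, u32> {
    let mut owners = HashMap::new();
    for &pid in pids {
        let Ok(entries) = fs::read_dir(format!("/proc/{pid}/fd")) else {
            continue;
        };
        for entry in entries.flatten() {
            let Ok(target) = fs::read_link(entry.path()) else {
                continue;
            };
            if let Some(inode) = target.to_str().and_then(socket_inode) {
                owners.insert(inode, pid);
            }
        }
    }
    owners
}
```

A row from /proc/net/tcp then belongs to the pane if its inode column is a key in this map. The scan races with processes starting and exiting, so a missed socket should simply be picked up on the next poll.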

This works on Linux. On macOS, you parse lsof -iTCP -sTCP:LISTEN -n -P output. On Windows, you call GetExtendedTcpTable with TCP_TABLE_OWNER_PID_LISTENER.

The advantage: no configuration, no credentials, no HTTP polling. The agent spawns a server, the terminal detects it, and you get a clickable link in the UI.

The failure mode: if the agent binds to 127.0.0.1 but the detection logic only checks 0.0.0.0, you miss it. If the server listens only on IPv6, it appears in /proc/net/tcp6 rather than /proc/net/tcp. If the agent uses Unix sockets instead of TCP, you need a separate /proc/net/unix parser. If the agent runs in a container with a separate network namespace, /proc/net/tcp shows the namespace of the process reading it (the terminal’s, typically the host’s), not the container’s.
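The first failure mode comes down to how the address half of local_address is decoded and compared. A minimal sketch, assuming a little-endian host (x86-64 or AArch64), where the kernel prints the address bytes in reversed order so that 127.0.0.1 reads 0100007F. The function names are illustrative.

```rust
use std::net::Ipv4Addr;

/// Decodes the address half of a local_address column, e.g. "0100007F".
/// Assumes a little-endian host, where the hex digits are the network-order
/// bytes read back to front.
fn decode_ipv4(addr_hex: &str) -> Option<Ipv4Addr> {
    let raw = u32::from_str_radix(addr_hex, 16).ok()?;
    Some(Ipv4Addr::from(raw.to_le_bytes()))
}

/// True if a listener bound to this address can be reached via localhost:
/// either the wildcard address or any loopback address.
fn reachable_on_localhost(addr: Ipv4Addr) -> bool {
    addr.is_unspecified() || addr.is_loopback()
}
```

Accepting both cases means a server bound to 127.0.0.1 is listed just like one bound to 0.0.0.0, while a listener bound only to a LAN address is not advertised with a localhost URL that would fail.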

N-ary Layout Tree and State Synchronization

tmux and screen use binary splits. You split horizontally or vertically, and the layout is a binary tree. This breaks down when you have three agents and want to arrange them in a grid.

Paneflow uses an N-ary tree where each node can have an arbitrary number of children and a layout direction (horizontal, vertical, or tabbed):

enum LayoutNode {
    Pane(PaneId),
    Container {
        direction: Direction,
        children: Vec<LayoutNode>,
        weights: Vec<f32>, // proportional sizes
    },
}

enum Direction {
    Horizontal,
    Vertical,
    Tabbed,
}
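To make the weights field concrete, here is a sketch of how a container might divide its rectangle among any number of children. It restates the two enums so the example compiles on its own. Rect, the layout function, the u32 PaneId, and the reading of Horizontal as "children side by side" are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect { x: f32, y: f32, w: f32, h: f32 }

type PaneId = u32; // assumption: the article does not define PaneId

enum Direction { Horizontal, Vertical, Tabbed }

enum LayoutNode {
    Pane(PaneId),
    Container { direction: Direction, children: Vec<LayoutNode>, weights: Vec<f32> },
}

/// Walks the tree and appends one (pane, rectangle) pair per leaf to `out`.
fn layout(node: &LayoutNode, area: Rect, out: &mut Vec<(PaneId, Rect)>) {
    match node {
        LayoutNode::Pane(id) => out.push((*id, area)),
        LayoutNode::Container { direction, children, weights } => {
            let total: f32 = weights.iter().sum();
            let mut offset = 0.0;
            for (child, weight) in children.iter().zip(weights) {
                // Each child's share is its weight relative to the sum.
                let share = if total > 0.0 { weight / total } else { 0.0 };
                let rect = match direction {
                    // Children sit side by side and divide the width.
                    Direction::Horizontal => Rect { x: area.x + offset, w: area.w * share, ..area },
                    // Children are stacked and divide the height.
                    Direction::Vertical => Rect { y: area.y + offset, h: area.h * share, ..area },
                    // Every tab gets the full area; only one is shown at a time.
                    Direction::Tabbed => area,
                };
                offset += match direction {
                    Direction::Horizontal => rect.w,
                    Direction::Vertical => rect.h,
                    Direction::Tabbed => 0.0,
                };
                layout(child, rect, out);
            }
        }
    }
}
```

With three children weighted 1, 1, and 2 in a horizontal container 100 units wide, the panes come out 25, 25, and 50 units wide, with no nested containers needed.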

When you split a pane, the tree is rewritten:

fn split_pane(tree: &mut LayoutNode, pane_id: PaneId, direction: Direction) {
    let node = find_node_mut(tree, pane_id);
    let new_pane = create_pane();
    *node = LayoutNode::Container {
        direction,
        children: vec![
            LayoutNode::Pane(pane_id),
            LayoutNode::Pane(new_pane),
        ],
        weights: vec![0.5, 0.5],
    };
}
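The simplified version above always wraps the target in a new two-child container, which by itself would produce a binary tree. One way to keep the tree N-ary is to add the new pane as a sibling whenever the parent container already runs in the requested direction, and to nest only when the direction differs. The sketch below shows that variant under stated assumptions: the signature, the u32 PaneId, and the weight-halving policy are illustrative, not Paneflow's actual behavior.

```rust
type PaneId = u32; // assumption: the article does not define PaneId

#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction { Horizontal, Vertical, Tabbed }

#[derive(Debug, PartialEq)]
enum LayoutNode {
    Pane(PaneId),
    Container { direction: Direction, children: Vec<LayoutNode>, weights: Vec<f32> },
}

/// Splits `target`, placing `new_pane` right after it.
/// Returns false if `target` is not in the tree.
fn split_pane(node: &mut LayoutNode, target: PaneId, new_pane: PaneId, dir: Direction) -> bool {
    if matches!(*node, LayoutNode::Pane(id) if id == target) {
        // Reached via the root or via a parent with a different direction:
        // wrap the pane in a new two-child container.
        *node = LayoutNode::Container {
            direction: dir,
            children: vec![LayoutNode::Pane(target), LayoutNode::Pane(new_pane)],
            weights: vec![0.5, 0.5],
        };
        return true;
    }
    let LayoutNode::Container { direction, children, weights } = node else {
        return false; // some other pane
    };
    let pos = children
        .iter()
        .position(|c| matches!(*c, LayoutNode::Pane(id) if id == target));
    if let Some(i) = pos {
        if *direction == dir {
            // Same direction: add a sibling and share the target's weight.
            let half = weights[i] / 2.0;
            weights[i] = half;
            children.insert(i + 1, LayoutNode::Pane(new_pane));
            weights.insert(i + 1, half);
            return true;
        }
    }
    // Otherwise keep searching; a direct child with a different direction
    // is wrapped by the first branch on recursion.
    children.iter_mut().any(|c| split_pane(c, target, new_pane, dir))
}
```

Splitting the same row three times then yields one container with four children instead of three levels of nesting.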

The layout tree is serialized to JSON and written to ~/.config/paneflow/session.json on every change. On startup, the tree is deserialized and each pane’s PTY is respawned with the saved working directory and environment.

The synchronization problem: if an agent changes its working directory (cd /tmp), the session file is stale. Paneflow solves this by injecting a shell hook that writes the current directory to a Unix socket on every prompt:

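# zsh: add to .zshrc (precmd runs before each prompt)
precmd() {
    echo "$PWD" | nc -U "/tmp/paneflow-$PANE_ID.sock"
}
# bash: add to .bashrc (bash has no precmd hook; use PROMPT_COMMAND instead)
PROMPT_COMMAND='echo "$PWD" | nc -U "/tmp/paneflow-$PANE_ID.sock"'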

The terminal reads from the socket and updates the session file. This is fragile (breaks if the user changes shells, or if the agent disables the hook), but it works for the 80% case.
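The terminal side of that hook can be sketched with the standard library's Unix socket types. This is a minimal, hypothetical receiver, not Paneflow's code: it assumes the hook opens one connection per prompt and sends a single line, and read_cwd_report is an illustrative name.

```rust
use std::io::{self, BufRead, BufReader};
use std::os::unix::net::UnixListener;
use std::path::PathBuf;

/// Accepts one connection from the shell hook and returns the directory it
/// reported. The hook sends a single line: the value of $PWD plus a newline.
fn read_cwd_report(listener: &UnixListener) -> io::Result<PathBuf> {
    let (stream, _addr) = listener.accept()?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    let path = line.trim_end_matches('\n');
    if path.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "empty cwd report"));
    }
    Ok(PathBuf::from(path))
}
```

A real implementation would run this accept loop on its own thread or event source per pane and should treat the reported path as untrusted input, since anything running in the pane can write to the socket.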

JSON-RPC IPC and the Control Plane

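Paneflow listens on a Unix socket (/tmp/paneflow.sock) and accepts JSON-RPC 2.0 requests. The examples below are abbreviated: a complete JSON-RPC 2.0 request also carries "jsonrpc": "2.0" and, when a response is expected, an "id". External tools can: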

  • List panes: {"method": "list_panes"}
  • Send input: {"method": "send_input", "params": {"pane_id": "abc", "text": "npm test\n"}}
  • Read output: {"method": "get_output", "params": {"pane_id": "abc", "lines": 100}}
  • Snapshot session: {"method": "save_session"}
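As an illustration of what a client has to put on the wire, the sketch below builds a complete send_input request using only the standard library. The method and parameter names come from the list above; the helper names, the hand-rolled escaping, and the single-line framing are assumptions, since the article does not describe how messages are delimited on the socket.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a send_input request as a single line of JSON-RPC 2.0.
fn send_input_request(id: u64, pane_id: &str, text: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"send_input","params":{{"pane_id":"{}","text":"{}"}}}}"#,
        id,
        json_escape(pane_id),
        json_escape(text)
    )
}
```

In a real client a JSON library such as serde_json would replace the manual escaping. The point here is only the shape of the envelope: version tag, request id, method, and params.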

This is the orchestration boundary. A CI runner can spin up Paneflow, create three panes, run agents in each, and collect results without SSH or tmux attach.

The security boundary is the Unix socket permission bits. By default, the socket is 0600 (owner-only). If you want to allow other users or containers to control panes, you set 0666 and accept the risk that any local process can inject commands.

The attack surface: agents normally run as the same user as the terminal, so the 0600 mode does not keep them out. Any agent that can connect to /tmp/paneflow.sock can control other panes and read other agents’ output through get_output. The mitigation is to run each agent in a separate user namespace (Linux) or sandbox (macOS), but Paneflow does not enforce this by default.
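For the owner-only default, a sketch of how a control socket can be bound with restrictive modes using the Rust standard library. The directory layout and the function name are assumptions made for illustration; the article only states that the socket mode is 0600.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Binds a control socket inside a private (0700) directory and restricts
/// the socket file itself to 0600.
fn bind_private_socket(dir: &Path, name: &str) -> io::Result<UnixListener> {
    fs::create_dir_all(dir)?;
    // Lock the directory down first, so there is no window in which another
    // user could reach a freshly bound socket that still has default modes.
    fs::set_permissions(dir, fs::Permissions::from_mode(0o700))?;
    let path = dir.join(name);
    // A stale socket file left by a previous run makes bind fail.
    match fs::remove_file(&path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    let listener = UnixListener::bind(&path)?;
    fs::set_permissions(&path, fs::Permissions::from_mode(0o600))?;
    Ok(listener)
}
```

File modes only separate users. They do nothing against an agent running under the same account as the terminal, which is why the namespace or sandbox mitigation still matters.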

Security Trade-offs: Native vs. Web Terminals

Dimension | Native Terminal | Web Terminal
--- | --- | ---
Process isolation | Panes share the same process. A crash kills all panes. | Each tab runs in a separate renderer process.
ANSI parsing | Runs in the UI process. A malicious escape sequence can corrupt memory. | Runs in a sandboxed JavaScript VM. Exploits are contained.
File system access | Agents can read/write any file the terminal user can access. | Agents must go through a file picker or server-side API.
Network access | Agents can bind to any port, connect to any host. | Agents are subject to CORS, CSP, and browser network policies.
Control plane | Unix socket with file permissions. Any process running as the socket's owner can connect. | WebSocket with origin checks and authentication tokens.
Observability | Direct access to /proc, lsof, strace. | Must instrument the agent or proxy through a server.

The native model gives you more power and more risk. If you trust your agents (or run them in VMs), the direct access to process state and file system is a feature. If you run untrusted agents, the lack of sandboxing is a liability.

Failure Modes and Lessons

Four things that broke during the rewrite:

  1. PTY resize races. If you resize the window while the agent is writing output, the PTY can deliver a SIGWINCH signal mid-write. The agent sees a partial line, panics, and exits. The fix: buffer output until the resize completes, then replay.

  2. Glyph atlas fragmentation. If you allocate a new glyph atlas on every font size change, you leak GPU memory. The fix: reuse the atlas and mark all glyphs dirty.

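  3. Session restore and zombie processes. If you save a session while an agent is running, then restore it, the old agent process is still alive. The new pane spawns a second agent. The fix: write the PID to the session file and kill the old process on restore, after confirming the PID still belongs to it. PIDs are recycled, so a blind kill -9 after a reboot or a long pause can hit an unrelated process.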

  4. Cross-platform PTY differences. On Linux, openpty gives you a master/slave pair. On macOS, openpty is also available (declared in util.h rather than pty.h), alongside posix_openpt with grantpt and unlockpt. On Windows, CreatePseudoConsole requires a separate I/O thread. The abstraction leaked everywhere. The fix: vendor the portable-pty crate and accept that some features (job control, signal forwarding) behave differently on each platform.
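For item 3, killing by PID alone is risky because PIDs are reused. One Linux-specific safeguard is to store the process start time next to the PID and compare it before sending a signal. The sketch below assumes that approach: saved_start_time is a hypothetical session-file field, and the start time is field 22 of /proc/<pid>/stat.

```rust
/// Extracts the start time (field 22, in clock ticks since boot) from the
/// contents of /proc/<pid>/stat. The command name in field 2 may contain
/// spaces and parentheses, so parsing resumes after the last ')'.
fn parse_start_time(stat: &str) -> Option<u64> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    // The remainder starts at field 3 (state), so field 22 is at index 19.
    after_comm.split_whitespace().nth(19)?.parse().ok()
}

/// True if `pid` still refers to the process recorded in the session file.
/// `saved_start_time` is a hypothetical value stored alongside the PID.
fn is_same_process(pid: u32, saved_start_time: u64) -> bool {
    let current = std::fs::read_to_string(format!("/proc/{pid}/stat"))
        .ok()
        .and_then(|stat| parse_start_time(&stat));
    current == Some(saved_start_time)
}
```

Start times are measured from boot, so the check is only meaningful within one boot. After a reboot the old agent is gone anyway, and the safe default on a mismatch is to skip the kill.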

Technical Verdict

Use Paneflow (or build something like it) when:

  • You run multiple AI agents in parallel and need branch-aware workspaces.
  • You want automatic dev-server detection without configuring URLs.
  • You need a programmatic control plane for CI or orchestration tools.
  • You trust your agents (or run them in isolated VMs) and want direct access to process state.

Avoid it when:

  • You run untrusted agents and need browser-level sandboxing.
  • You need cross-platform consistency and cannot tolerate PTY behavior differences.
  • You want a mature, battle-tested terminal (use Alacritty, Kitty, or WezTerm instead).
  • You need session sharing across machines (native terminals are local-only).

The real contribution is not the terminal emulator. It is the pattern: treat the terminal as the orchestration layer, parse process state directly, and expose a control plane so external tools can drive agents programmatically. That pattern works whether you build it in Rust, Go, or Swift.

