mech.app

Runtime Steering Vectors: Controlling LLM Reasoning Without Fine-Tuning

How Mentat's intervention API injects steering vectors into the forward pass to enforce deterministic behavior in financial agents without retraining.

Source: news.ycombinator.com

Fine-tuning is expensive. Prompt engineering is brittle. Mentat (YC F24) provides a third option: runtime intervention that injects steering vectors into the forward pass to control reasoning paths without touching model weights. The API targets financial services, compliance scanning, and other high-stakes domains where probabilistic guarantees are insufficient for regulatory compliance.

The core claim is deterministic control over LLM behavior at inference time. You send a chat completion request with a steering parameter that specifies which features to amplify or suppress. The model then generates text from modified activations, pushing it toward behaviors such as “skeptical reasoning” or “spatial inversion” that prompt engineering cannot reliably trigger.

Unlike constrained decoding (which limits token selection) or tool use (which delegates to external functions), steering vectors operate inside the transformer’s residual stream. This positions them between prompt engineering (which operates at the input boundary) and fine-tuning (which modifies weights permanently).
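To make “inside the residual stream” concrete, here is the standard activation-addition formulation from the interpretability literature. It is an assumption that Mentat uses this exact update rule; its formula is not published. Each transformer block ℓ reads the residual stream h^(ℓ) and adds its output B_ℓ back into it; steering adds a fixed direction v_f^(ℓ) for each targeted feature f, scaled by a strength α_f:

$$
\begin{aligned}
\text{unsteered:}\quad & h^{(\ell+1)} = h^{(\ell)} + B_\ell\big(h^{(\ell)}\big) \\
\text{steered:}\quad & h^{(\ell+1)} = h^{(\ell)} + B_\ell\big(h^{(\ell)}\big) + \sum_{f \in F_\ell} \alpha_f \, v_f^{(\ell)}
\end{aligned}
$$

Here F_ℓ is the set of features steered at layer ℓ (empty at untargeted layers). A positive α_f amplifies a feature and a negative one suppresses it. The weights inside B_ℓ are never modified, which is what separates this from fine-tuning, and the extra term is added to activations rather than to the input text, which is what separates it from prompt engineering.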

How Runtime Intervention Works

Standard guardrails such as RAG and system prompts work at the token level: you give the model context or instructions and hope it integrates them correctly. Mentat's API instead works at the feature level, inside the transformer’s forward pass.

Activation Engineering Pipeline

  1. Feature Extraction: Identify internal activations that correlate with specific behaviors (e.g., “skeptical reasoning,” “spatial reasoning”).
  2. Steering Vector Injection: During inference, add or subtract scaled vectors from the residual stream at targeted layers.
  3. Graph-Based Verification: Run the output through a verification graph that checks whether the steering objective was satisfied.
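Mentat has not published how it extracts or injects features, so the following is a toy sketch of the common activation-addition recipe that steps 1 and 2 resemble: derive a direction as the difference between mean activations on prompts that show a behavior and prompts that don't, then add a scaled copy of it to the residual stream after chosen layers. Everything here is hypothetical: `difference_of_means` and `steered_forward` are our names, the “layers” are stand-in functions on short Python lists rather than transformer blocks, and step 3 (verification) is not modeled.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]  # stand-in for a transformer block's output


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def difference_of_means(with_behavior: Sequence[Vector],
                        without_behavior: Sequence[Vector]) -> Vector:
    """Step 1 (feature extraction): mean activation on prompts that show the
    behavior minus mean activation on contrast prompts."""
    return [a - b for a, b in zip(mean(with_behavior), mean(without_behavior))]


def steered_forward(x: Vector, layers: Sequence[Layer],
                    steering: Dict[int, Vector], strength: float) -> Vector:
    """Step 2 (injection): run the residual stream through each layer and,
    after any layer index listed in `steering`, add strength * direction.
    A negative strength subtracts the direction (suppression).
    The layers themselves are never modified."""
    h = list(x)
    for index, layer in enumerate(layers):
        h = [a + b for a, b in zip(h, layer(h))]  # residual connection
        if index in steering:
            h = [a + strength * v for a, v in zip(h, steering[index])]
    return h
```

For example, `difference_of_means([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]])` gives `[2.0, -1.0]`; with two blocks that contribute nothing (`[lambda h: [0.0] * len(h)] * 2`), steering at layer 1 with strength 0.5 turns the input `[1.0, 1.0]` into `[2.0, 0.5]`, and strength 0.0 leaves it unchanged.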

The API endpoint accepts POST requests to /chat-completions with an OpenAI-compatible structure plus a steering object. Here is an example request body:

{
  "model": "mentat-gpt-4",
  "messages": [
    {"role": "user", "content": "What is 228 miles NW of Lerwick?"}
  ],
  "steering": {
    "spatial_inversion": 0.8,
    "skeptical_reasoning": 0.6
  }
}

The float values control intervention strength. Higher values push the model harder toward the desired behavior. The system applies these vectors at specific transformer layers where those features are most active. Available steering features are documented in the API reference, though the full catalog and feature discovery mechanism are not detailed in public materials.
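A minimal sketch of building that request with Python's standard library. The host is a placeholder (`api.example.com` is not Mentat's endpoint), the bearer-token `Authorization` header is an assumption because the public materials don't describe authentication, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Dict, List

# Placeholder host: the real base URL is not given in the public materials.
BASE_URL = "https://api.example.com"


def build_request(
    messages: List[Dict[str, str]],
    steering: Dict[str, float],
    api_key: str,
    model: str = "mentat-gpt-4",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat-completions with a steering object."""
    body = {"model": model, "messages": messages, "steering": steering}
    return urllib.request.Request(
        f"{BASE_URL}/chat-completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the vendor's API reference.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Calling `build_request([{"role": "user", "content": "What is 228 miles NW of Lerwick?"}], {"spatial_inversion": 0.8, "skeptical_reasoning": 0.6}, api_key=...)` reproduces the body shown earlier; `urllib.request.urlopen(req)` would send it. Only the request shape is documented publicly, so parse the response defensively.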

The RAG Integration Problem

The demo highlights a failure mode that RAG cannot fix: knowledge availability versus knowledge integration. If you inject context like “Lerwick is 228 miles SE of Tórshavn,” a base model with RAG will retrieve that fact but fail to answer “What is 228 miles NW of Lerwick?” because it cannot perform the spatial inversion.

This is not a retrieval problem. The model has the data. It lacks the reasoning path to transform it. Steering vectors that amplify spatial reasoning features let the model execute the inversion without retraining on geography datasets.
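Spelled out as code, the missing step is small, which is what makes the failure telling. A minimal sketch on a 16-point compass rose; the helper names are ours, and this illustrates the reasoning the model must perform, not how steering implements it. Reversing a compass point is a good approximation at distances like these, though on long great-circle routes the return bearing is not exactly 180° off.

```python
from typing import List, Tuple

# 16-point compass rose, clockwise from north; opposite points are 8 steps apart.
COMPASS: List[str] = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                      "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def reverse_bearing(direction: str) -> str:
    """Return the opposite compass point, e.g. SE -> NW."""
    return COMPASS[(COMPASS.index(direction) + 8) % len(COMPASS)]


def invert_relation(place: str, miles: float, direction: str,
                    reference: str) -> Tuple[str, float, str, str]:
    """'place is <miles> <direction> of reference' becomes
    'reference is <miles> <reverse direction> of place'."""
    return reference, miles, reverse_bearing(direction), place
```

Applied to the retrieved fact, `invert_relation("Lerwick", 228, "SE", "Tórshavn")` returns `("Tórshavn", 228, "NW", "Lerwick")`: Tórshavn is the place 228 miles NW of Lerwick.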

Why This Matters for Financial Agents

Financial agents need to enforce compliance rules, not approximate them:

  • Compliance scanning requires the model to flag communications that violate specific regulations, not just those that “sound risky.”
  • Models trained on internet data inherit common misconceptions (e.g., “stocks always go up in the long run”). Steering vectors can suppress those priors without retraining.
  • Regulatory audits require reproducible outputs. Probabilistic guardrails fail when the same input produces different compliance verdicts.
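Reproducibility can be tested from the outside. A sketch of a harness, assuming a hypothetical `classify` callable that wraps whatever produces a compliance verdict for one message (a steered API call, a prompted model, or a rules engine): send the same input repeatedly and measure agreement with the most common verdict.

```python
from collections import Counter
from typing import Callable, Tuple


def verdict_consistency(classify: Callable[[str], str], message: str,
                        runs: int = 20) -> Tuple[str, float]:
    """Classify the same message `runs` times; return the most common verdict
    and the fraction of runs that agreed with it (1.0 = fully consistent)."""
    counts = Counter(classify(message) for _ in range(runs))
    verdict, hits = counts.most_common(1)[0]
    return verdict, hits / runs
```

A classifier that answers FLAG, FLAG, PASS, FLAG in rotation scores `("FLAG", 0.75)` over 20 runs. An agreement rate of 1.0 on a finite sample is evidence of stable behavior, not proof of determinism, so record the sampling settings used alongside the result.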

Graph-Based Verification Mechanism

The verification layer is proprietary. Public documentation describes it as “graph-based verification to fix hallucinations and enforce policies” but does not specify graph structure, node types, or validation logic. For financial applications, this creates a trust boundary problem: you cannot audit what you cannot inspect.

What we know from the Launch HN discussion:

  • Verification runs after generation to check if steering objectives were satisfied.
  • The system can detect when outputs violate specified policies.
  • Failure modes and retry logic are not documented.

What remains unspecified:

  • Does verification block output synchronously or log failures asynchronously?
  • Can verification graphs be customized per request or are they fixed per steering feature?
  • What happens when verification detects a policy violation (retry, error, fallback)?

For production financial deployments, these gaps require vendor clarification. The verification mechanism is the trust boundary that separates “steered output” from “compliant output.”
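Until the vendor answers those questions, a client can at least fail closed. A sketch, assuming a hypothetical response shape: an OpenAI-style `choices[0].message.content` for the text plus a `verification` object with a boolean `passed` field. Neither field is documented publicly; adapt the names to whatever the real response contains.

```python
from typing import Any, Dict


class UnverifiedOutputError(Exception):
    """Raised when a response lacks a positive verification result."""


def require_verified(response: Dict[str, Any]) -> str:
    """Return the completion text only if verification explicitly passed.

    Hypothetical schema: `verification.passed` and the OpenAI-style
    `choices[0].message.content` path are assumptions, not documented fields.
    """
    verification = response.get("verification")
    if not isinstance(verification, dict) or verification.get("passed") is not True:
        raise UnverifiedOutputError(f"no passing verification result: {verification!r}")
    return response["choices"][0]["message"]["content"]
```

Treating a missing verification result as a failure means that, if verification turns out to be asynchronous, the gap shows up as errors during testing rather than as silently unverified verdicts in production.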

Latency and Versioning Trade-Offs

Runtime intervention adds compute overhead: every forward pass includes vector additions at the targeted layers, and every response also passes through a verification step. Neither the Launch HN discussion nor the public documentation includes latency benchmarks, so the actual overhead is unknown. The architecture (steering vector injection plus graph-based verification) suggests measurable overhead, and production deployments should benchmark latency under load before committing to SLAs.
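A minimal standard-library harness for that benchmark. `call` is any zero-argument function that issues one request; running it once with a steered request and once with the same prompt unsteered isolates the steering overhead. This measures sequential latency only; testing under load additionally needs concurrent callers.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(call: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Time `runs` sequential calls and report latency percentiles in milliseconds."""
    if runs < 2:
        raise ValueError("need at least two samples for percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; index 94 is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50_ms": statistics.median(samples), "p95_ms": cuts[94], "max_ms": max(samples)}
```

Tail percentiles matter more than the median here: a compliance pipeline with an SLA is bounded by its p95 and max, not its typical request.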

Versioning becomes critical because steering vectors are decoupled from model weights. If you update the base model, your steering vectors may target activations that no longer exist or behave differently. The API must version both the model and the steering profile together.
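On the client side, one way to respect this coupling is to treat the model identifier and steering settings as a single versioned unit and refuse to run when either drifts. A sketch; `SteeringProfile`, `check_pinned`, and the `model_revision` field are hypothetical, since the public materials don't say whether the API exposes pinned model snapshots.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SteeringProfile:
    """Model and steering settings versioned as one unit."""
    model: str                  # e.g. "mentat-gpt-4"
    model_revision: str         # hypothetical: a pinned base-model snapshot label
    steering: Dict[str, float]  # feature name -> strength

    def fingerprint(self) -> str:
        """Stable ID: changing the model, revision, or any strength changes it."""
        canonical = json.dumps(
            {"model": self.model, "model_revision": self.model_revision,
             "steering": self.steering},
            sort_keys=True,  # key order in the dict does not affect the ID
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def check_pinned(profile: SteeringProfile, expected: str) -> None:
    """Refuse to run if the deployed profile differs from the validated one."""
    actual = profile.fingerprint()
    if actual != expected:
        raise RuntimeError(f"steering profile changed: {actual} != {expected}")
```

A fingerprint only catches drift on your side. It cannot detect the vendor swapping the base model behind an unchanged name, which is why versioning has to be settled with the vendor as well.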

Approach Comparison

| Dimension | Runtime Steering | Fine-Tuning | Prompt Engineering |
|---|---|---|---|
| Latency | Not publicly benchmarked | Standard inference | Standard inference |
| Cost | Per-request API pricing | High upfront compute + storage | Minimal (prompt tokens) |
| Determinism | High (feature-level control) | High (baked into weights) | Low (probabilistic) |
| Auditability | Proprietary verification layer | Full control over weights | Prompt history only |
| Deployment | Hosted API (no self-hosted option mentioned) | Self-hosted or cloud | Any LLM provider |

Security Boundaries

The graph-based verification mitigates some adversarial inputs, but the documentation does not detail threat modeling or penetration testing results. Like RAG, steering vectors improve reasoning but do not guarantee compliance verdict consistency. Adversarial prompts that strongly activate unwanted features early in the residual stream may overwhelm later steering interventions or bypass the verification layer entirely.

Key considerations:

  • Prompt Injection: Can crafted inputs bypass steering by activating conflicting features?
  • Verification Timing: Does verification run synchronously (blocking output) or asynchronously (allowing unverified responses)?
  • Audit Trails: Can you export steering decisions for regulatory review?

These questions matter for financial applications where compliance failures carry legal consequences. The Launch HN discussion does not address these security boundaries, so teams must clarify them directly before production deployment.

Deployment Model and Operational Considerations

The API is hosted, so you do not manage the intervention infrastructure. This simplifies deployment but creates a dependency on uptime and rate limits. For financial applications, you need:

  • SLA Guarantees: What happens if the API is down during market hours?
  • Data Residency: Does the API log prompts? Where are activations stored?
  • Audit Trails: Can you export steering decisions for regulatory review?
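Whatever the vendor can export, a client can keep its own trail of what it requested. A sketch that appends one JSON line per request; the record fields are our own choice, not a regulatory format, and `write_audit_record` is a hypothetical helper. Hashing the prompt and output instead of storing them keeps raw text out of the log, which may help with data residency, but a hash only proves what was sent and received. It says nothing about which layers the service actually steered or what its verification concluded.

```python
import hashlib
import json
import time
from typing import Any, Dict, TextIO


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def write_audit_record(sink: TextIO, request_body: Dict[str, Any],
                       output_text: str, profile_id: str) -> Dict[str, Any]:
    """Append one JSON line describing a steered request and its output."""
    record = {
        "timestamp": time.time(),
        "model": request_body.get("model"),
        "steering": request_body.get("steering", {}),
        "profile_id": profile_id,  # your own version label for the model + steering pair
        # Hash rather than store raw text; retain originals elsewhere if regulation requires.
        "input_sha256": sha256_text(json.dumps(request_body.get("messages", []), sort_keys=True)),
        "output_sha256": sha256_text(output_text),
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

JSON Lines keeps each record independently parseable, so the log can be appended to continuously and filtered later for a regulatory review.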

Self-hosted deployment is not mentioned in the documentation or Launch HN thread. If on-premise licensing becomes available, you gain control but must manage feature extraction, vector storage, and verification graphs yourself.

When to Use Runtime Steering

Runtime steering vectors work best in high-stakes domains (finance, healthcare, legal) where prompt engineering fails audits. Applications that need deterministic behavior without fine-tuning budgets benefit most. Scenarios where reasoning paths matter more than knowledge retrieval are ideal candidates.

The approach is a poor fit for latency-sensitive applications where overhead is unacceptable. Use cases where prompt engineering already works reliably do not need the added complexity. Environments that cannot tolerate third-party API dependencies should avoid hosted-only solutions.

Technical Verdict

Runtime steering vectors solve a real problem: deterministic control over reasoning without retraining. The approach works when prompt engineering is too brittle and fine-tuning is too expensive. Financial agents that scan compliance communications or enforce regulatory policies benefit most.

The missing pieces are latency benchmarks, verification mechanism transparency, security boundary documentation, and self-hosted deployment options. Teams evaluating the platform for production financial applications should treat verification behavior, conflict resolution logic, and latency impact as pre-sales blockers that require vendor clarification before committing.

Use it when you need reproducible reasoning paths and can tolerate unmeasured latency overhead. Avoid it if you need guaranteed sub-100ms responses, require self-hosted deployment for compliance, or cannot accept a proprietary verification layer without audit access.