Stave's Agent-Centric Cloud Security: Why Machine-Verifiable Contracts Beat Traditional Boundary Enforcement

How Stave built machine-verifiable contracts at every pipeline boundary instead of traditional agent-based security enforcement.

Source: dev.to
Stave calls itself the first agent-centric cloud security platform, but it deliberately avoided deploying agents. Instead, every boundary in the pipeline has a machine-verifiable contract. The architecture was built for solo developer productivity. It turned out to be exactly what agents need to build security pipelines without human handholding.

The distinction matters. Most security vendors are adding AI copilots that summarize findings or chatbots that answer questions. Stave lets agents construct the entire pipeline: data export, reasoning engine selection, verdict generation, and remediation. The contracts make this possible without brittle prompt engineering or probabilistic output parsing.

What Machine-Verifiable Contracts Look Like

A machine-verifiable contract at an agent boundary is not an API schema. It’s a formal specification of valid state transitions with deterministic evaluation.

Stave’s contracts use standard JSON Schema for data shape, but add three layers:

  • Exit codes instead of prose output. Every tool returns 0 (success), 1 (violation detected), or 2 (evaluation error). No parsing natural language.
  • Deterministic evaluation rules. Given the same input facts, the reasoning engine must produce the same verdict. No probabilistic scoring.
  • Composable tool boundaries. Each tool reads JSON, performs one operation, writes JSON. No monolithic state machines.
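
The three layers above can be sketched as a single-purpose tool. This is a minimal illustration, not Stave's code: the check itself and the "buckets", "name", and "public" keys are hypothetical, while the exit codes follow the list above.

```python
import io
import json

# Exit codes as listed above: 0 = success, 1 = violation detected, 2 = evaluation error.
EXIT_OK, EXIT_VIOLATION, EXIT_ERROR = 0, 1, 2


def check_public_buckets(facts):
    """Hypothetical single-purpose check: flag buckets marked public.

    The "buckets", "name", and "public" keys are assumptions for this sketch.
    Returns (exit_code, result_dict).
    """
    try:
        # Sorting keeps the output identical for identical input facts.
        violating = sorted(b["name"] for b in facts["buckets"] if b["public"])
    except (KeyError, TypeError):
        # Missing or malformed facts are an evaluation error, never a silent pass.
        return EXIT_ERROR, {"error": "missing or malformed facts"}
    code = EXIT_VIOLATION if violating else EXIT_OK
    return code, {"satisfied": not violating, "violating": violating}


def run(stdin, stdout):
    """Read JSON facts from stdin, write a JSON result to stdout, return the exit code."""
    try:
        facts = json.load(stdin)
    except json.JSONDecodeError:
        json.dump({"error": "input is not valid JSON"}, stdout)
        return EXIT_ERROR
    code, result = check_public_buckets(facts)
    json.dump(result, stdout, sort_keys=True)
    return code


# Demo with in-memory streams; a real tool would end with sys.exit(run(sys.stdin, sys.stdout)).
out = io.StringIO()
exit_code = run(io.StringIO('{"buckets": [{"name": "logs", "public": true}]}'), out)
print(exit_code, out.getvalue())
```

A caller only needs the exit code to decide what happens next; the JSON body is there for the next tool in the chain, not for a human to read.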

The key difference from traditional API contracts: these contracts specify logical correctness, not just data shape. An API contract says “this endpoint returns a list of security findings.” A boundary contract says “given these infrastructure facts, this reasoning engine will produce a verdict that satisfies these logical constraints.”

The Five-Engine Proof

After fourteen months of development, the author ran five trials in which the agents never saw Stave’s implementation. Each agent received:

  • A reasoning specification (formal logic rules for security policy)
  • Stave’s data export (infrastructure facts in JSON)
  • No implementation code
  • No documentation beyond the spec
  • No guidance on reasoning engine selection

The agents produced correct security verdicts using five different reasoning engines:

Engine  | Type                  | What It Proves
Z3      | SMT solver            | Mathematical proof via constraint solving
Soufflé | Datalog               | Blast radius via transitive closure
Clingo  | ASP solver            | Violation detection through answer sets
Prolog  | Logic programming     | Security property derivation proof trees
PRISM   | Probabilistic checker | Risk probability under uncertainty

Two trials were fully blind: fresh agents with zero prior context. Both passed.

This supports the contract model: the agents could swap reasoning engines without breaking the pipeline. The contracts don’t care about implementation; they specify what a valid verdict looks like. The agent picks the engine, writes the glue code, and verifies the output against the contract. The platform itself contains no engine-specific integration logic.
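
To make the swap concrete, here is a small sketch in plain Python rather than any of the five engines. It computes a blast radius by transitive closure in two unrelated ways, a Datalog-style fixpoint and a depth-first search, and both yield the same answer. The edge facts and resource names are invented for illustration.

```python
def blast_radius_fixpoint(edges, start):
    """Naive fixpoint iteration, mirroring the Datalog rules
        reach(X, Y) :- edge(X, Y).
        reach(X, Z) :- reach(X, Y), edge(Y, Z).
    Returns every resource reachable from `start`.
    """
    reached = set()
    frontier = {start}
    while frontier:
        # Resources one hop from the frontier that have not been recorded yet.
        next_hop = {dst for (src, dst) in edges if src in frontier} - reached
        reached |= next_hop
        frontier = next_hop
    return reached


def blast_radius_dfs(edges, start):
    """Same result via an explicit adjacency map and depth-first search."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    reached = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dst in adjacency.get(node, ()):
            if dst not in reached:
                reached.add(dst)
                stack.append(dst)
    return reached


# Hypothetical "can reach" facts between resources.
EDGES = {
    ("role:ci", "bucket:artifacts"),
    ("bucket:artifacts", "lambda:deploy"),
    ("lambda:deploy", "db:prod"),
    ("role:audit", "bucket:logs"),
}

print(sorted(blast_radius_fixpoint(EDGES, "role:ci")))
print(blast_radius_fixpoint(EDGES, "role:ci") == blast_radius_dfs(EDGES, "role:ci"))
```

A contract that only inspects the resulting set cannot tell which function produced it, which is the property that makes engines interchangeable.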

Why Traditional Agent-Based Security Fails Here

Traditional agent-based security deploys monitoring agents into the environment. The agent watches API calls, network traffic, or system calls. It reports anomalies to a central platform.

This model breaks when the agent itself is the untrusted actor. If an AI agent is building your security pipeline, you can’t trust it to monitor itself. You need boundary enforcement at every step.

Stave’s contracts solve this by making verification external to the agent. The contract checker doesn’t trust the agent’s reasoning engine choice or implementation. It only verifies that the output satisfies the logical constraints.

The performance trade-off:

  • Traditional agent monitoring: Low overhead per event, high complexity in anomaly detection, probabilistic alerts.
  • Contract verification: Higher overhead per boundary crossing, simple verification logic, deterministic pass/fail.

For agentic systems, deterministic pass/fail is worth the overhead. You can’t debug probabilistic agent behavior in production.

Architecture: How Boundaries Compose

Stave’s pipeline has four boundaries, each with a contract:

1. Cloud Provider API → Fact Extractor
   Contract: JSON Schema + Rate Limits

2. Fact Extractor → Reasoning Engine
   Contract: SIR Schema + Completeness Check

3. Reasoning Engine → Verdict Output
   Contract: Verdict Schema + Logic Constraints

4. Verdict Output → Remediation Actions
   Contract: Idempotency + Rollback

An agent building this pipeline must satisfy all four contracts. It can swap reasoning engines, change remediation strategies, or optimize data extraction. The contracts don’t care. They only verify correctness at each boundary.
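
One way to picture this composition is a runner that checks every stage's output before the next stage sees it. The sketch below covers two of the four boundaries; the stage functions, check functions, and field names are all hypothetical stand-ins, not Stave's actual contracts.

```python
class ContractViolation(Exception):
    """Raised when a stage's output fails the contract at its boundary."""


def run_pipeline(source, boundaries):
    """Run each stage in order, checking its output before the next stage sees it.

    `boundaries` is a list of (name, stage, check) triples.
    """
    data = source
    for name, stage, check in boundaries:
        data = stage(data)
        if not check(data):
            raise ContractViolation(name)
    return data


# Hypothetical stages and checks, invented for this sketch.
def extract_facts(raw):
    return [{"id": r["id"], "public": r["acl"] == "public"} for r in raw]


def facts_ok(facts):
    return all(isinstance(f.get("id"), str) and isinstance(f.get("public"), bool) for f in facts)


def reason(facts):
    violating = [f for f in facts if f["public"]]
    return {"policy_id": "no-public-resources", "satisfied": not violating, "evidence": violating}


def verdict_ok(verdict):
    # A reported violation must come with at least one violating fact.
    return isinstance(verdict.get("satisfied"), bool) and (
        verdict["satisfied"] or bool(verdict.get("evidence"))
    )


BOUNDARIES = [
    ("extractor -> engine", extract_facts, facts_ok),
    ("engine -> verdict", reason, verdict_ok),
]

RAW = [{"id": "b1", "acl": "public"}, {"id": "b2", "acl": "private"}]
print(run_pipeline(RAW, BOUNDARIES))
```

Replacing `reason` with a different implementation leaves `verdict_ok` untouched, so a swapped-in stage either passes the same check or stops the pipeline at that boundary.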

The SIR (Security Infrastructure Representation) is the critical middle layer. It’s a normalized fact schema that decouples cloud provider APIs from reasoning engines. An agent can target any cloud provider by writing a fact extractor that satisfies the SIR contract.
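
The source does not show the SIR schema, so the following is only a sketch of the decoupling idea. Two made-up provider payloads are normalized into one fact shape, and a completeness check reports missing fields or missing resources. The field names, the provider payload shapes, and the exact meaning of "completeness" are all assumptions.

```python
# Hypothetical required fields; the real SIR schema is not shown in the source.
SIR_REQUIRED = ("resource_id", "kind", "public")


def from_provider_a(raw):
    """Normalize an invented provider-A bucket payload into a SIR-style fact."""
    return {"resource_id": raw["Name"], "kind": "bucket", "public": raw["Acl"] == "public-read"}


def from_provider_b(raw):
    """Normalize an invented provider-B bucket payload into the same shape."""
    return {"resource_id": raw["id"], "kind": "bucket", "public": "allUsers" in raw["members"]}


def completeness_errors(facts, expected_ids):
    """Return a list of problems; an empty list means the export is complete.

    Assumed meaning of "complete": every fact has all required fields, and
    every expected resource id appears in at least one fact.
    """
    errors = []
    for i, fact in enumerate(facts):
        missing = [key for key in SIR_REQUIRED if key not in fact]
        if missing:
            errors.append(f"fact {i} missing {missing}")
    seen = {fact.get("resource_id") for fact in facts}
    for resource_id in sorted(expected_ids - seen):
        errors.append(f"no fact for {resource_id}")
    return errors


facts = [
    from_provider_a({"Name": "logs", "Acl": "public-read"}),
    from_provider_b({"id": "assets", "members": ["allUsers"]}),
]
print(completeness_errors(facts, {"logs", "assets"}))
```

Because both extractors emit the same fact shape, a reasoning engine written against that shape never needs to know which provider the data came from.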

What This Means for Trust Boundaries

Traditional security platforms trust their own code. They deploy proprietary agents, run closed-source analysis, and present findings in a dashboard. The trust boundary is the platform perimeter.

Stave inverts this. The platform doesn’t trust anything, including agents it orchestrates. Every boundary has a contract. Every contract is verifiable by external tools.

This matters for three failure modes:

  1. Agent hallucination: If an agent generates invalid reasoning, the contract rejects it. No hallucinated verdicts reach production.
  2. Supply chain compromise: If a reasoning engine is backdoored, the contract still verifies logical correctness. A compromised engine can’t produce invalid verdicts that pass verification.
  3. Configuration drift: If infrastructure changes break assumptions, the fact extractor contract fails. No stale reasoning reaches the verdict stage.

The cost: you can’t shortcut verification. Every boundary crossing pays the contract overhead. For human-driven pipelines, this is expensive. For agent-driven pipelines, it’s the only way to maintain correctness.

Implementation: What the Contracts Actually Check

A boundary contract at the reasoning engine stage looks like this:

{
  "verdict_schema": {
    "type": "object",
    "required": ["policy_id", "satisfied", "evidence"],
    "properties": {
      "policy_id": {"type": "string"},
      "satisfied": {"type": "boolean"},
      "evidence": {
        "type": "array",
        "items": {"$ref": "#/definitions/fact"}
      }
    }
  },
  "logical_constraints": [
    {
      "rule": "if satisfied=false, evidence must contain violating facts",
      "checker": "constraint_validator"
    },
    {
      "rule": "evidence facts must exist in SIR export",
      "checker": "set_membership"
    },
    {
      "rule": "verdict must be reproducible from evidence alone",
      "checker": "determinism_validator"
    }
  ],
  "exit_codes": {
    "0": "policy satisfied",
    "1": "policy violated",
    "2": "evaluation error (missing facts, logic error, timeout)"
  }
}

The contract checker runs three verifications:

  1. Schema validation: Does the verdict match the JSON schema?
  2. Logical constraint checking: Do the evidence facts support the verdict under the policy rules?
  3. Reproducibility check: Can the verdict be derived from the evidence facts alone, without hidden state?

If any verification fails, the contract rejects the verdict. The agent must then fix the reasoning engine or supply evidence that actually supports the verdict.

The reproducibility check is the hardest to satisfy. It requires the reasoning engine to expose its inference steps, not just the final verdict. This is why Stave supports multiple reasoning engines: different engines expose different proof artifacts (SMT models, Datalog derivations, ASP answer sets, Prolog proof trees).
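
The three verifications can be sketched as one checker function. This is a simplified illustration under stated assumptions: schema validation is hand-rolled for the three required fields rather than run through a JSON Schema library, the policy is a hypothetical "no public resources" rule, and reproducibility is approximated by re-evaluating that policy over the evidence alone rather than by inspecting proof artifacts.

```python
def policy_no_public(facts):
    """Hypothetical policy: satisfied when no supplied fact is public."""
    return not any(fact.get("public") for fact in facts)


def check_verdict(verdict, sir_facts, policy):
    """Return a list of failures; an empty list means the verdict is accepted."""
    # 1. Schema validation for the required fields of the verdict schema.
    if not isinstance(verdict, dict):
        return ["schema: verdict is not an object"]
    failures = []
    if not isinstance(verdict.get("policy_id"), str):
        failures.append("schema: 'policy_id' must be a string")
    if not isinstance(verdict.get("satisfied"), bool):
        failures.append("schema: 'satisfied' must be a boolean")
    if not isinstance(verdict.get("evidence"), list):
        failures.append("schema: 'evidence' must be an array")
    if failures:
        return failures

    # 2. Logical constraints: evidence must come from the SIR export, and a
    #    violation must cite at least one fact.
    evidence = verdict["evidence"]
    if any(fact not in sir_facts for fact in evidence):
        failures.append("constraint: evidence fact not in SIR export")
    if not verdict["satisfied"] and not evidence:
        failures.append("constraint: violation reported without violating facts")
    if failures:
        return failures

    # 3. Reproducibility: re-derive the verdict from the evidence alone.
    if policy(evidence) != verdict["satisfied"]:
        failures.append("reproducibility: evidence does not yield this verdict")
    return failures


SIR = [{"resource_id": "b1", "public": True}, {"resource_id": "b2", "public": False}]
good = {"policy_id": "no-public", "satisfied": False, "evidence": [SIR[0]]}
hallucinated = {"policy_id": "no-public", "satisfied": False,
                "evidence": [{"resource_id": "b9", "public": True}]}
print(check_verdict(good, SIR, policy_no_public))
print(check_verdict(hallucinated, SIR, policy_no_public))
```

In this sketch a "satisfied" verdict is only checked against the evidence the engine chose to include. The source does not describe how Stave guards against omitted facts.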

When to Use This Pattern

Machine-verifiable contracts at every boundary make sense when:

  • Solo developers need to audit agent-built pipelines. One person can’t manually verify every agent decision. Contracts automate correctness checks.
  • Agents must build the pipeline, not just query it. If humans write the integration code, traditional API contracts are simpler.
  • Security policy can be formalized into logical constraints. If policy is ambiguous or political, machine verification is impossible.

Avoid this pattern when:

  • Latency is critical. Contract verification adds overhead at every boundary. For real-time systems, this may be unacceptable.
  • Rapid prototyping matters more than correctness. The contract overhead slows initial development. If you need to ship fast and iterate on policy definitions, simpler tooling will be faster in the short term.
  • Policy evaluation has one correct implementation. If there’s only one valid reasoning engine, a hardcoded engine is more efficient than contract-based swapping.

Technical Verdict

Stave demonstrates that agent-centric doesn’t mean deploying agents everywhere. It means designing boundaries that agents can reason about formally. Machine-verifiable contracts turn security policy into a specification problem, not a monitoring problem.

The architecture works because it decouples three concerns: fact extraction (cloud provider APIs), reasoning (policy evaluation), and remediation (infrastructure changes). Each boundary has a contract. Agents can swap implementations at any boundary without breaking the pipeline.

Use this pattern when agents are building security pipelines and correctness is non-negotiable. Avoid it when latency matters more than formal verification or when rapid iteration on policy definitions is the priority.
