Embedding access control rules in agent system prompts is a maintenance nightmare. You end up with sentences like “only allow database writes if the user is an admin and the table is not production” scattered across prompt templates. When you need to change a permission, you grep through YAML files, update three prompts, and hope the LLM still respects the constraint.
CAST separates authorization from reasoning. It intercepts tool calls before execution, evaluates declarative policies, and returns allow or deny decisions without touching your agent’s prompt. The policy engine sits between your orchestrator and the tool runtime, so you can version, test, and audit access rules independently.
The Prompt Engineering Tax
Most agent frameworks let you register tools as Python functions or API endpoints. The agent decides which tool to call based on its prompt and the user’s request. If you want to restrict who can call what, you have two bad options:
- Prompt-based guards: Add instructions like “never delete files for non-admin users” to the system prompt and hope the model obeys.
- Inline checks: Write `if user.role != "admin": raise PermissionError` inside every tool function.
Prompt-based guards fail silently. The agent might ignore the rule, misinterpret context, or hallucinate exceptions. Inline checks scatter authorization logic across dozens of tool implementations. Neither approach gives you a central audit log or a way to test policies in isolation.
How CAST Intercepts Tool Calls
CAST wraps your tool registry with a policy evaluation layer. When an agent requests a tool call, the framework:
- Captures the tool name, arguments, and execution context (user ID, session metadata, timestamp).
- Loads the relevant policy from a declarative rule set.
- Evaluates the policy against the context.
- Returns an allow decision and executes the tool, or returns a deny decision and logs the attempt.
The agent never sees the policy logic. It just gets a success or failure response. You can swap policies without retraining or re-prompting.
```python
import os

from cast import PolicyEngine, Tool, Context

# Define a tool
@Tool(name="delete_file")
def delete_file(path: str) -> str:
    os.remove(path)
    return f"Deleted {path}"

# Define a policy: every condition must hold for the call to be allowed
policy = """
allow delete_file if:
    user.role == "admin"
    path not in ["/etc/passwd", "/var/log/system.log"]
"""

# Wrap the tool with the policy engine
engine = PolicyEngine(policies=[policy])
context = Context(user={"role": "user"}, session_id="abc123")

# Agent requests a tool call; the policy is evaluated before delete_file runs
result = engine.execute("delete_file", {"path": "/tmp/data.csv"}, context)
# result.allowed == False, result.reason == "user.role != admin"
```
The policy engine evaluates the rule before delete_file runs. If the user lacks admin privileges, the tool never executes. The agent receives a structured denial with a reason.
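The snippet above relies on CAST's own API. The following minimal plain-Python sketch shows the same four interception steps (capture, load, evaluate, allow or deny) without depending on that API. Everything in it is hypothetical and not part of CAST: `PolicyGate`, `Decision`, the predicate-style policies, and the stand-in `fake_delete` tool. It denies by default when a tool has no registered policy.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict

log = logging.getLogger("policy_gate")  # hypothetical logger name

# Hypothetical policy type: a predicate over (tool arguments, execution context).
Policy = Callable[[Dict[str, Any], Dict[str, Any]], bool]


@dataclass
class Decision:
    allowed: bool
    reason: str = ""
    result: Any = None


class PolicyGate:
    """Capture -> load policy -> evaluate -> execute or deny."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], policies: Dict[str, Policy]):
        self.tools = tools
        self.policies = policies

    def execute(self, tool: str, args: Dict[str, Any], context: Dict[str, Any]) -> Decision:
        # Step 1 (capture): tool name, arguments, and context arrive together.
        # Step 2 (load): find the policy for this tool; no policy means deny.
        policy = self.policies.get(tool)
        if policy is None:
            log.warning("deny %s: no policy registered", tool)
            return Decision(False, "no policy for tool")
        # Step 3 (evaluate): the tool has not run yet.
        if not policy(args, context):
            # Step 4a (deny): log the attempt and return a reason; the tool never runs.
            user_id = context.get("user", {}).get("id")
            log.warning("deny %s for user %s", tool, user_id)
            return Decision(False, f"policy denied {tool}")
        # Step 4b (allow): only now is the real tool invoked.
        return Decision(True, "allowed", self.tools[tool](**args))


# --- usage ---
PROTECTED = {"/etc/passwd", "/var/log/system.log"}


def fake_delete(path: str) -> str:
    return f"Deleted {path}"  # stand-in that does not touch the filesystem


gate = PolicyGate(
    tools={"delete_file": fake_delete},
    policies={
        "delete_file": lambda args, ctx: (
            ctx["user"]["role"] == "admin" and args["path"] not in PROTECTED
        ),
    },
)
denied = gate.execute("delete_file", {"path": "/tmp/data.csv"}, {"user": {"id": "u1", "role": "user"}})
print(denied.allowed, denied.reason)
```

A policy here is just a callable, so the gate stays agnostic about how rules are written. In CAST, the declarative rule text plays the role of the lambdas.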
Policy Language and Context Evaluation
CAST policies use a simple conditional syntax. Each rule specifies:
- Tool name: Which function or API endpoint the rule applies to.
- Conditions: Boolean expressions over context attributes.
- Actions: Allow or deny, with optional logging or alerting.
Context attributes come from your orchestrator. You pass in user identity, resource ownership, time windows, rate limits, or any other metadata your tools need to make authorization decisions.
```yaml
policies:
  - name: restrict_database_writes
    tool: execute_sql
    allow_if:
      - user.role in ["admin", "data_engineer"]
      - sql.operation != "DROP"
      - time.hour >= 9 and time.hour <= 17
  - name: limit_api_calls
    tool: call_external_api
    allow_if:
      - user.api_calls_today < 100
    on_deny:
      log: "Rate limit exceeded for user {user.id}"
```
The engine evaluates the conditions in an allow_if list in order. If all of them pass, the tool executes. If any condition fails, the engine logs the denial and returns a structured error naming the condition that failed. You can chain policies with AND or OR logic, and you can define default-deny or default-allow behavior per tool.
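The evaluation semantics just described can be sketched in a few dozen lines:

- Conditions run in order, and all must pass.
- The first failure short-circuits and becomes the denial reason.
- A tool with no matching rule falls back to a configurable default.

This is a minimal sketch under stated assumptions, not CAST's evaluator. The conditions are hand-translated into Python lambdas over a nested context dict instead of being parsed from CAST's expression strings. `Rule`, `Result`, `evaluate`, and `default_allow` are hypothetical names. The log template uses Python's `{user[id]}` format-field syntax in place of the YAML's `{user.id}`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# (label used in the denial reason, predicate over the context)
Condition = Tuple[str, Callable[[Dict[str, Any]], bool]]


@dataclass
class Rule:
    name: str
    tool: str
    allow_if: List[Condition]
    on_deny_log: Optional[str] = None  # str.format template filled from the context


@dataclass
class Result:
    allowed: bool
    reason: str
    logged: List[str] = field(default_factory=list)


def evaluate(rules: List[Rule], tool: str, ctx: Dict[str, Any],
             default_allow: bool = False) -> Result:
    matching = [r for r in rules if r.tool == tool]
    if not matching:
        # No rule targets this tool: fall back to the configured default.
        return Result(default_allow, "default-allow" if default_allow else "default-deny")
    for rule in matching:  # every matching rule must pass (AND)
        for label, predicate in rule.allow_if:  # conditions checked in order
            if not predicate(ctx):
                # First failure short-circuits; later conditions are not evaluated.
                logged = [rule.on_deny_log.format(**ctx)] if rule.on_deny_log else []
                return Result(False, f"{rule.name}: {label} failed", logged)
    return Result(True, "all conditions passed")


# The two YAML rules from above, with conditions hand-translated to lambdas.
RULES = [
    Rule("restrict_database_writes", "execute_sql", [
        ("user.role in [admin, data_engineer]",
         lambda c: c["user"]["role"] in ("admin", "data_engineer")),
        ('sql.operation != "DROP"', lambda c: c["sql"]["operation"] != "DROP"),
        ("9 <= time.hour <= 17", lambda c: 9 <= c["time"]["hour"] <= 17),
    ]),
    Rule("limit_api_calls", "call_external_api", [
        ("user.api_calls_today < 100", lambda c: c["user"]["api_calls_today"] < 100),
    ], on_deny_log="Rate limit exceeded for user {user[id]}"),
]
```

Short-circuiting also means a condition that reads a missing context key is never reached once an earlier condition has already failed. That helps for robustness, but it can hide context bugs that only show up on the allow path.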
Testing and Versioning Policies
Policies live in separate files from your agent code. You can version them in Git, run unit tests against them, and deploy them independently.
```python
# test_policies.py
from cast import PolicyEngine, Context


def test_admin_can_delete():
    engine = PolicyEngine.from_file("policies.yaml")
    context = Context(user={"role": "admin"})
    result = engine.evaluate("delete_file", {"path": "/tmp/test"}, context)
    assert result.allowed


def test_user_cannot_delete():
    engine = PolicyEngine.from_file("policies.yaml")
    context = Context(user={"role": "user"})
    result = engine.evaluate("delete_file", {"path": "/tmp/test"}, context)
    assert not result.allowed
```
You can run these tests in CI before deploying new policies. If a policy change breaks an expected permission, the test fails before it reaches production. You can also simulate different contexts (time of day, user attributes, resource states) without invoking the actual tools.
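Because a policy is a pure function of its inputs, simulated contexts such as a different hour of the day are just data. The sketch below is a minimal table-driven check, assuming the restrict_database_writes rule is expressed as a Python predicate. `sql_write_policy`, `CASES`, and `failing_cases` are hypothetical names, not CAST's test API.

```python
from typing import Any, Dict, List, Tuple


def sql_write_policy(ctx: Dict[str, Any]) -> bool:
    """Hypothetical predicate mirroring restrict_database_writes."""
    return (
        ctx["user"]["role"] in ("admin", "data_engineer")
        and ctx["sql"]["operation"] != "DROP"
        and 9 <= ctx["time"]["hour"] <= 17
    )


# (description, simulated context, expected decision)
CASES: List[Tuple[str, Dict[str, Any], bool]] = [
    ("engineer, UPDATE, 10:00",
     {"user": {"role": "data_engineer"}, "sql": {"operation": "UPDATE"}, "time": {"hour": 10}}, True),
    ("engineer, UPDATE, 08:00",
     {"user": {"role": "data_engineer"}, "sql": {"operation": "UPDATE"}, "time": {"hour": 8}}, False),
    ("admin, DROP, 12:00",
     {"user": {"role": "admin"}, "sql": {"operation": "DROP"}, "time": {"hour": 12}}, False),
    ("analyst, INSERT, 12:00",
     {"user": {"role": "analyst"}, "sql": {"operation": "INSERT"}, "time": {"hour": 12}}, False),
    ("admin, UPDATE, 17:00 (inclusive bound)",
     {"user": {"role": "admin"}, "sql": {"operation": "UPDATE"}, "time": {"hour": 17}}, True),
]


def failing_cases() -> List[str]:
    """Descriptions of cases whose decision differs from the expectation."""
    return [desc for desc, ctx, expected in CASES if sql_write_policy(ctx) != expected]


if __name__ == "__main__":
    failures = failing_cases()
    assert not failures, f"policy regressions: {failures}"
    print(f"{len(CASES)} policy cases pass")
```

Under pytest, the same table can drive `@pytest.mark.parametrize` so each case is reported separately.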
Audit Trails and Observability
Every policy evaluation generates a structured log entry. The engine records:
- Tool name and arguments
- User identity and session metadata
- Policy decision (allow or deny)
- Reason for denial (which condition failed)
- Timestamp and request ID
You can ship these logs to your observability stack (Datadog, Honeycomb, CloudWatch) and build dashboards around access patterns. You can alert on repeated denials, track which users hit rate limits, or audit who called sensitive tools.
```json
{
  "timestamp": "2026-06-03T16:15:42Z",
  "tool": "delete_file",
  "user_id": "user_456",
  "session_id": "abc123",
  "decision": "deny",
  "reason": "user.role != admin",
  "policy": "restrict_file_operations",
  "request_id": "req_789"
}
```
Because each entry is a flat JSON object with a timestamp and a request ID, it can be ingested by standard security information and event management (SIEM) tools. There you can correlate agent activity with application logs, infrastructure metrics, and security alerts.
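The sketch below emits entries in that format as one JSON object per line, using only the standard library. It is a minimal sketch: `audit_entry` and its parameters are hypothetical, and CAST's own logger may differ. It also adds an `arguments` field, because the field list above includes tool arguments even though the sample entry omits them.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def audit_entry(tool: str, args: Dict[str, Any], user_id: str, session_id: str,
                allowed: bool, policy: str, reason: Optional[str] = None,
                request_id: Optional[str] = None) -> str:
    """Build one JSON-lines audit record with the fields listed above."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "tool": tool,
        "arguments": args,
        "user_id": user_id,
        "session_id": session_id,
        "decision": "allow" if allowed else "deny",
        "reason": reason,
        "policy": policy,
        # Generate an ID if the caller did not propagate one.
        "request_id": request_id or f"req_{uuid.uuid4().hex[:8]}",
    }
    # sort_keys gives a stable field order, which keeps diffs and grep simple.
    return json.dumps(record, sort_keys=True)


line = audit_entry("delete_file", {"path": "/tmp/data.csv"}, "user_456", "abc123",
                   allowed=False, policy="restrict_file_operations",
                   reason="user.role != admin", request_id="req_789")
print(line)
```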
Architecture Trade-offs
| Aspect | Prompt-Based Guards | Inline Checks | CAST Policies |
|---|---|---|---|
| Centralization | Scattered across prompts | Scattered across tools | Single policy file |
| Auditability | No structured logs | Requires custom logging | Built-in audit trail |
| Testability | Hard to unit test | Requires mocking tools | Policies tested in isolation |
| Failure Mode | Silent non-compliance | Runtime exceptions | Structured denials |
| Versioning | Tied to prompt versions | Tied to code deploys | Independent policy deploys |
| Context Awareness | Limited to prompt text | Full code access | Declarative context rules |
CAST adds latency (policy evaluation happens before every tool call) and operational complexity (you need to manage policy files and the engine runtime). But it eliminates the fragility of prompt-based guards and the sprawl of inline checks.
Deployment Shape
CAST runs as a sidecar or a shared service. In a sidecar model, each agent instance runs its own policy engine. Policies load from a config file or a remote policy store. The agent calls the local engine before executing tools.
In a shared service model, all agents call a central policy API. The API loads policies from a database or a Git repository, evaluates them, and returns decisions. This centralizes policy management but adds network latency and a new failure point.
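In the shared service model, the caller also has to decide what happens when the policy API is slow or unreachable, since that is the new failure point. The sketch below is a minimal fail-closed client. The transport is injected so the example runs without a network. `RemotePolicyClient`, the payload shape, and the 200 ms budget are all hypothetical, not a CAST API.

```python
from typing import Any, Callable, Dict, Tuple

# A transport takes (payload, timeout_seconds) and returns the service's reply as a
# dict. In production it might wrap an HTTP client; here it is injected for testing.
Transport = Callable[[Dict[str, Any], float], Dict[str, Any]]


class RemotePolicyClient:
    def __init__(self, transport: Transport, timeout_s: float = 0.2):
        self.transport = transport
        self.timeout_s = timeout_s  # hypothetical latency budget per decision

    def is_allowed(self, tool: str, args: Dict[str, Any],
                   context: Dict[str, Any]) -> Tuple[bool, str]:
        payload = {"tool": tool, "args": args, "context": context}
        try:
            reply = self.transport(payload, self.timeout_s)
        except Exception as exc:  # timeout, connection refused, malformed reply, ...
            # Fail closed: an unreachable policy service must never grant access.
            return False, f"policy service unavailable: {type(exc).__name__}"
        if reply.get("decision") == "allow":
            return True, "allowed by remote policy"
        return False, reply.get("reason", "denied by remote policy")


# Two fake transports standing in for the network call.
def down(payload: Dict[str, Any], timeout_s: float) -> Dict[str, Any]:
    raise TimeoutError("no reply within budget")


def allow_admins(payload: Dict[str, Any], timeout_s: float) -> Dict[str, Any]:
    ok = payload["context"]["user"]["role"] == "admin"
    return {"decision": "allow" if ok else "deny", "reason": "user.role != admin"}
```

Failing closed trades availability for safety: while the service is down, every gated tool call is denied. A sidecar avoids the network hop but still needs the same decision for the case where the local engine crashes.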
```
┌─────────────┐       ┌──────────────┐       ┌──────────┐
│    Agent    │──────>│ Policy Engine│──────>│   Tool   │
│ Orchestrator│       │    (CAST)    │       │ Registry │
└─────────────┘       └──────────────┘       └──────────┘
       │                     │
       │                     v
       │              ┌──────────────┐
       └─────────────>│  Audit Log   │
                      └──────────────┘
```
The policy engine sits between the orchestrator and the tool registry. The orchestrator sends tool call requests to the engine, which evaluates policies and forwards allowed calls to the registry. Denied calls return immediately with a reason.
Likely Failure Modes
Policy syntax errors: A typo in a policy file can break all tool calls. You need schema validation and linting in CI to catch these before deploy.
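The CI check can be sketched as a structural validation over the parsed policy file, for example the output of PyYAML's `yaml.safe_load`. The required and allowed keys mirror the YAML example earlier and are assumptions, not CAST's published schema. `validate_policies` is a hypothetical helper. This is a minimal sketch.

```python
from typing import Any, List, Set

REQUIRED_KEYS = {"name", "tool", "allow_if"}
ALLOWED_KEYS = REQUIRED_KEYS | {"on_deny"}  # assumed schema, mirrors the YAML example


def validate_policies(doc: Any) -> List[str]:
    """Return a list of problems; an empty list means the document passes."""
    if not isinstance(doc, dict) or not isinstance(doc.get("policies"), list):
        return ["top level must be a mapping with a 'policies' list"]
    errors: List[str] = []
    seen: Set[str] = set()
    for i, rule in enumerate(doc["policies"]):
        where = f"policies[{i}]"
        if not isinstance(rule, dict):
            errors.append(f"{where}: must be a mapping")
            continue
        keys = set(rule)
        missing = REQUIRED_KEYS - keys
        if missing:
            errors.append(f"{where}: missing {sorted(missing)}")
        unknown = keys - ALLOWED_KEYS
        if unknown:
            errors.append(f"{where}: unknown keys {sorted(unknown)} (typo?)")
        conds = rule.get("allow_if")
        if "allow_if" in rule and not (
            isinstance(conds, list) and conds
            and all(isinstance(c, str) and c.strip() for c in conds)
        ):
            errors.append(f"{where}: allow_if must be a non-empty list of expressions")
        name = rule.get("name")
        if isinstance(name, str):
            if name in seen:
                errors.append(f"{where}: duplicate policy name {name!r}")
            else:
                seen.add(name)
    return errors
```

In CI, fail the build when the returned list is non-empty. This catches structural typos such as a misspelled `allow_if` key. Validating the expression syntax inside each condition would still need CAST's own parser.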
Context mismatch: If the orchestrator doesn’t pass the right context attributes, policies fail open or closed depending on your default. You need integration tests that verify context shape.
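One way to write that integration test is to collect every dotted attribute the policies reference and check that the orchestrator's context actually provides each one. The sketch below assumes conditions are strings like those in the YAML example and that the context is a nested dict. The regex is a rough heuristic, not a parser for CAST's expression language. For example, a quoted path such as "/tmp/data.csv" would produce a false `data.csv` match, so treat the output as a lint signal.

```python
import re
from typing import Any, Dict, Iterable, List, Set

# Matches dotted references such as user.role or time.hour (heuristic, not a parser).
DOTTED = re.compile(r"\b([a-z_][a-z0-9_]*(?:\.[a-z_][a-z0-9_]*)+)\b")


def referenced_attributes(conditions: Iterable[str]) -> Set[str]:
    return {m.group(1) for cond in conditions for m in DOTTED.finditer(cond)}


def has_path(ctx: Dict[str, Any], dotted: str) -> bool:
    """True if every segment of a dotted path exists in the nested context."""
    node: Any = ctx
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def missing_context(conditions: Iterable[str], ctx: Dict[str, Any]) -> List[str]:
    """Attributes the policies read that this context does not supply."""
    return sorted(a for a in referenced_attributes(conditions) if not has_path(ctx, a))
```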
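Performance bottlenecks: Evaluating complex policies on every tool call adds latency. You can cache decisions for repeated calls with identical inputs (for example, idempotent tools called with the same arguments and context), or move to a purpose-built policy language and engine such as Rego (Open Policy Agent) or Cedar.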
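A cached decision is only valid while every input the policy reads is unchanged. The cache key therefore has to cover the tool, its arguments, and the relevant context attributes. The sketch below uses a short time-to-live so that time-window or rate-limit rules are eventually re-evaluated. `DecisionCache` and the 30-second TTL are hypothetical choices in this minimal sketch.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class DecisionCache:
    """Memoize allow/deny decisions keyed on everything the policy reads."""

    def __init__(self, evaluate: Callable[[str, Dict[str, Any], Dict[str, Any]], bool],
                 ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.evaluate = evaluate
        self.ttl_s = ttl_s  # bounds staleness for time- or counter-based rules
        self.clock = clock
        self._store: Dict[str, Tuple[bool, float]] = {}
        self.misses = 0

    @staticmethod
    def key(tool: str, args: Dict[str, Any], ctx: Dict[str, Any]) -> str:
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} share an entry.
        return json.dumps([tool, args, ctx], sort_keys=True, default=str)

    def decide(self, tool: str, args: Dict[str, Any], ctx: Dict[str, Any]) -> bool:
        k = self.key(tool, args, ctx)
        now = self.clock()
        hit: Optional[Tuple[bool, float]] = self._store.get(k)
        if hit is not None and now - hit[1] < self.ttl_s:
            return hit[0]  # fresh cached decision
        self.misses += 1
        decision = self.evaluate(tool, args, ctx)
        self._store[k] = (decision, now)
        return decision
```

Pass only the context attributes the policy actually reads. Per-request fields such as a request ID or timestamp make every key unique and defeat the cache. The store here is also unbounded, so a production version would need eviction.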
Policy drift: If policies and tools evolve separately, you can end up with orphaned rules or missing coverage. You need tooling to detect unused policies and tools without policies.
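At its core, the drift check is a pair of set differences between the tool registry and the tools your policies name. This is a minimal sketch. How you enumerate registered tools and policy targets depends on your framework and is assumed here. `coverage_report` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def coverage_report(registered_tools: Iterable[str],
                    policy_tools: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the tool registry against the tools targeted by policies."""
    tools, covered = set(registered_tools), set(policy_tools)
    return {
        # Tools that fall through to the default-allow or default-deny behavior.
        "tools_without_policies": sorted(tools - covered),
        # Rules that target tools which no longer exist.
        "orphaned_policies": sorted(covered - tools),
    }
```

Running this in CI and failing on a non-empty `tools_without_policies` list forces every new tool to ship with an explicit rule.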
Audit log volume: High-frequency agents generate massive log volumes. You need sampling, aggregation, or a separate audit pipeline to avoid overwhelming your observability stack.
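One common approach is to keep every denial, which is the security-relevant signal, and sample allows deterministically by request ID. Deterministic sampling means every log line for a sampled request survives together. In this minimal sketch, the 1% default rate and the use of SHA-256 are assumptions. `should_log` is a hypothetical helper.

```python
import hashlib


def should_log(decision: str, request_id: str, allow_rate: float = 0.01) -> bool:
    """Keep every denial; keep a stable fraction of allows, chosen by request ID."""
    if decision == "deny":
        return True
    # Map the request ID to an integer in [0, 2**64) via SHA-256. The same ID
    # always maps to the same value, so sampling is consistent across services.
    bucket = int.from_bytes(hashlib.sha256(request_id.encode("utf-8")).digest()[:8], "big")
    return bucket < allow_rate * 2**64
```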
Technical Verdict
Use CAST when you have multiple agents, multiple users, or compliance requirements that demand auditable access control. It makes sense for production systems where authorization logic changes frequently and needs to be tested independently of agent prompts.
Avoid it for single-user prototypes or agents that only call read-only tools. The overhead of policy management and evaluation is not worth it if you have no access control requirements. Stick with inline checks or prompt-based guards until you feel the pain of scattered authorization logic.
CAST shines in multi-tenant environments where different users need different permissions for the same tools. It also works well in regulated industries (healthcare, finance) where you need to prove who accessed what and why.