mech.app

Continue? Y/N: What a 60-Second Game Reveals About Agent Permission Fatigue

A viral Show HN game exposes the real infrastructure problem: how agent systems handle human approval loops, consent boundaries, and permission fatigue...

Source: llmgame.scalex.dev

A 60-second game just hit Show HN with 308 points and 129 comments. The premise is simple: an AI agent asks for permission to execute commands, and you click Y or N. The catch? The requests come fast, the language is deliberately confusing, and by second 45 you are clicking yes to things you should not approve.

This is not a game about reading comprehension. It is a game about the production bottleneck every agent system hits when humans must approve tool calls, API access, and state changes. The fatigue is real, the security implications are worse, and the infrastructure patterns to handle this problem are still immature.

The Permission Fatigue Problem

Many agent frameworks treat human approval as a binary gate: pause execution, show a prompt, wait for yes or no, resume. This works for demos. It breaks in production when:

  • An agent makes 15 tool calls to complete one task
  • Multiple agents run in parallel and queue approval requests
  • The user is in a different timezone or offline for hours
  • The approval prompt shows raw JSON instead of human-readable context
  • There is no way to approve a batch of similar requests at once

The game simulates this by flooding you with requests. By the time you hit “approve deletion of production database” you are on autopilot. The same thing happens in real systems when engineers approve Terraform plans or CI/CD pipelines after the tenth similar request in a row.

State Management for Interrupted Execution

When an agent pauses for approval, it needs to preserve:

  • Current execution context (which step in the plan, what data was fetched)
  • Tool call parameters and expected return types
  • Conversation history and reasoning chain
  • Timeout policies (how long to wait before failing)
  • Rollback state if the user denies permission

Most frameworks serialize the agent state to a database or message queue. The tricky part is resuming execution without re-running expensive LLM calls or losing the reasoning thread.
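
As a rough illustration, the items listed above could be captured in a single serializable record. This is a minimal sketch, assuming the state is JSON-serializable; the `PausedExecution` class and all of its field names are hypothetical and not taken from any particular framework. Rollback state is left out for brevity.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PausedExecution:
    """Hypothetical snapshot of an agent paused at an approval gate."""
    plan_step: int                    # which step in the plan is pending
    fetched_data: Dict[str, Any]      # results of earlier tool calls
    pending_tool: str                 # tool awaiting approval
    pending_params: Dict[str, Any]    # parameters for that tool call
    history: List[Dict[str, str]]     # conversation history and reasoning chain
    timeout_seconds: int = 300        # how long to wait before failing
    created_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "PausedExecution":
        return cls(**json.loads(raw))

    def is_expired(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now - self.created_at > self.timeout_seconds
```

Because the record round-trips through JSON, it can be written to a database row or a queue message and inspected by a human while the agent is paused.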

Common patterns:

| Pattern | How It Works | Trade-off |
|---|---|---|
| Checkpoint snapshots | Serialize full agent state to blob storage | Simple but memory-heavy; hard to inspect or modify paused state |
| Event sourcing | Log every tool call and decision as an event | Replayable and auditable but requires careful event schema design |
| Workflow engines | Use Temporal or Prefect to manage durable execution | Best for long-running agents but adds orchestration complexity |
| Stateless with context injection | Store only conversation history; re-prompt LLM on resume | Cheapest but non-deterministic; agent may change its mind |

The game does not pause. Real systems must, and the state management choice determines whether your agent can survive a 12-hour approval delay or crashes with “context expired.”
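
To make the event-sourcing option more concrete, here is a small sketch of replay on resume: steps whose results are already in the log are served from the log, so nothing expensive runs twice. The `EventLog` class, the `run_plan` function, and the event shape are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, List


class EventLog:
    """Hypothetical append-only log of completed tool calls."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def append(self, step: int, tool: str, result: Any) -> None:
        self.events.append({"step": step, "tool": tool, "result": result})


def run_plan(plan: List[str], tools: Dict[str, Callable[[], Any]], log: EventLog) -> List[Any]:
    """Run each step once; steps already in the log are replayed, not re-executed."""
    recorded = {event["step"]: event["result"] for event in log.events}
    results = []
    for step, tool_name in enumerate(plan):
        if step in recorded:
            results.append(recorded[step])  # replay the stored result
            continue
        result = tools[tool_name]()         # first execution of this step
        log.append(step, tool_name, result)
        results.append(result)
    return results
```

Calling `run_plan` again with the same log after a pause picks up at the first step that has no recorded event.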

Permission Scope Design

The game asks for permission to “delete a file” without specifying which file. Real agent systems face the same problem: how granular should permission scopes be?

Too granular:

  • User must approve every API call individually
  • 50 approval prompts to send one email with attachments
  • Engineers start rubber-stamping requests to get work done

Too coarse:

  • Agent gets blanket permission to “use the filesystem”
  • No visibility into what it actually does
  • One compromised prompt leads to full system access

Better approach: capability-based permissions with preview

# user_approval_service is assumed to be provided elsewhere: any object with a
# prompt(text, timeout=seconds) method that returns the user's decision.

class ToolPermission:
    def __init__(self, tool_name, scope, preview_fn):
        self.tool_name = tool_name
        self.scope = scope  # e.g., "read-only", "write-safe-dir", "delete-with-backup"
        self.preview_fn = preview_fn  # generates human-readable summary

    def request_approval(self, params):
        preview = self.preview_fn(params)
        # Show: "Agent wants to delete 3 files in /tmp/cache (total 45MB)"
        # Instead of: "Agent wants to call os.remove with args=['file1.tmp']"
        return user_approval_service.prompt(preview, timeout=300)

# Usage
file_tool = ToolPermission(
    tool_name="delete_file",
    scope="safe-directories-only",
    preview_fn=lambda p: f"Delete {len(p['files'])} files in {p['dir']} ({p['total_size']})"
)

The preview function is critical. It translates raw parameters into a decision the user can actually evaluate in three seconds.

Batching and Approval Policies

Production systems need policies that reduce interruption frequency without sacrificing control:

Time-boxed batching:

  • Collect all permission requests in a 30-second window
  • Show one approval prompt with a grouped summary
  • User approves or denies the batch
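
One way to sketch time-boxed batching is to group requests into consecutive windows and then by tool inside each window. The `(timestamp, tool_name, target)` tuple shape and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Request = Tuple[float, str, str]  # (timestamp_seconds, tool_name, target)


def batch_requests(requests: List[Request], window: float = 30.0) -> List[Dict[str, List[str]]]:
    """Group requests into consecutive windows, then by tool name within each window."""
    batches: List[Dict[str, List[str]]] = []
    window_start: Optional[float] = None
    current: Dict[str, List[str]] = defaultdict(list)
    for ts, tool, target in sorted(requests):
        if window_start is None or ts - window_start >= window:
            if current:
                batches.append(dict(current))  # close the previous window
            current = defaultdict(list)
            window_start = ts
        current[tool].append(target)
    if current:
        batches.append(dict(current))
    return batches


def summarize(batch: Dict[str, List[str]]) -> str:
    """One short summary line for a single grouped approval prompt."""
    return "; ".join(f"{tool} x{len(targets)}" for tool, targets in sorted(batch.items()))
```

Each returned batch maps to one prompt, and `summarize` produces the grouped line the user sees.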

Risk-based routing:

  • Low-risk actions (read-only API calls) auto-approve
  • Medium-risk actions (write to staging) require one-click approval
  • High-risk actions (production changes, external API calls with cost) require typed confirmation
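
The three tiers above can be expressed as a small routing table. This is a sketch under stated assumptions: the action fields (`environment`, `has_cost`, `writes`) and the classification rules are invented for illustration, and a real classifier would need to be tuned per tool.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ApprovalMode(Enum):
    AUTO = "auto-approve"
    ONE_CLICK = "one-click"
    TYPED = "typed-confirmation"


def classify(action: dict) -> Risk:
    """Hypothetical classifier; the field names are assumptions."""
    if action.get("environment") == "production" or action.get("has_cost", False):
        return Risk.HIGH
    if action.get("writes", False):
        return Risk.MEDIUM
    return Risk.LOW


ROUTES = {
    Risk.LOW: ApprovalMode.AUTO,
    Risk.MEDIUM: ApprovalMode.ONE_CLICK,
    Risk.HIGH: ApprovalMode.TYPED,
}


def route(action: dict) -> ApprovalMode:
    """Map an action to the approval mode for its risk tier."""
    return ROUTES[classify(action)]
```

Keeping the tier-to-mode mapping in one table means the policy can change without touching the classifier.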

Delegation and escalation:

  • Junior agents can only request low-risk permissions
  • Senior agents inherit a trust budget (10 auto-approved actions per hour)
  • Escalate to human only when budget is exhausted or risk threshold exceeded
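
A trust budget such as "10 auto-approved actions per hour" could be tracked with a sliding window. The `TrustBudget` class is hypothetical, and the choice of a sliding window over a fixed hourly reset is an assumption.

```python
from collections import deque
from typing import Deque


class TrustBudget:
    """Sliding-window budget: at most `limit` auto-approvals per `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self._spent: Deque[float] = deque()  # timestamps of auto-approved actions

    def try_spend(self, now: float) -> bool:
        """Return True if the action is auto-approved; False means escalate to a human."""
        while self._spent and now - self._spent[0] >= self.window:
            self._spent.popleft()  # drop approvals that aged out of the window
        if len(self._spent) < self.limit:
            self._spent.append(now)
            return True
        return False
```

When `try_spend` returns `False`, the request falls through to the normal human approval path.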

The game does not implement any of these. It just spams you with requests. Real systems that skip batching and risk tiers end up with the same user experience: click fatigue leading to security incidents.

Observability for Approval Loops

You need telemetry on:

  • Approval latency: How long does the user take to respond? Are requests timing out?
  • Approval rate: What percentage of requests get approved versus denied? A high denial rate means the agent is asking for the wrong things.
  • Fatigue signals: Are users approving requests faster over time (indicating they stopped reading)?
  • Failure recovery: How often do agents crash or lose state while waiting for approval?

Example metrics to track:

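import time
from prometheus_client import Counter, Histogram

# prometheus_client takes (name, documentation, labelnames)
approval_latency_seconds = Histogram(
    "agent_approval_latency", "Seconds spent waiting for a decision", ["agent_id", "tool_name"])
approval_decision = Counter(
    "agent_approval_decision", "Approval decisions by outcome", ["agent_id", "tool_name", "decision"])
approval_timeout = Counter(
    "agent_approval_timeout", "Approval requests that timed out", ["agent_id", "tool_name"])

# In your approval handler (wait_for_approval is assumed to be defined elsewhere)
async def handle_approval(request, agent_id, tool_name):
    start = time.monotonic()
    decision = await wait_for_approval(request, timeout=300)
    approval_latency_seconds.labels(agent_id, tool_name).observe(time.monotonic() - start)
    approval_decision.labels(agent_id, tool_name, decision).inc()
    return decision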

If your median approval latency is 2 seconds but your 95th percentile is 180 seconds, you have a timezone problem or a notification delivery problem. If your approval rate drops below 60%, your agent is asking for things it should not need.
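
The fatigue signal described earlier (users answering faster over time) can be approximated by comparing response times late in a session with those at the start. This is only a heuristic sketch: the function names, the window size, and the 0.5 threshold are assumptions, not measured values.

```python
from statistics import median
from typing import List


def fatigue_ratio(latencies: List[float], window: int = 5) -> float:
    """Ratio of recent median latency to initial median latency within one session.

    Values well below 1.0 suggest the user is answering faster than at the start.
    """
    if len(latencies) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    baseline = median(latencies[:window])
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    recent = median(latencies[-window:])
    return recent / baseline


def looks_fatigued(latencies: List[float], window: int = 5, threshold: float = 0.5) -> bool:
    """Flag a session whose recent responses are much faster than its first ones."""
    return fatigue_ratio(latencies, window) < threshold
```

A flagged session is a prompt to slow the user down or batch more aggressively; it is not proof that they stopped reading.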

Failure Modes

Approval timeout with no fallback:

  • Agent waits forever for a response that never comes
  • Execution hangs, resources leak, downstream tasks pile up
  • Fix: Implement timeout with default-deny or escalation to a backup approver
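
A default-deny timeout can be sketched with `asyncio.wait_for`. The `decide_with_timeout` name and the `"approve"`/`"deny"` string values are assumptions for illustration.

```python
import asyncio
from typing import Awaitable


async def decide_with_timeout(pending: Awaitable[str], timeout: float) -> str:
    """Wait for a decision; treat silence as a denial instead of hanging."""
    try:
        return await asyncio.wait_for(pending, timeout=timeout)
    except asyncio.TimeoutError:
        # Default-deny. Escalation to a backup approver could go here instead.
        return "deny"
```

`asyncio.wait_for` cancels the pending awaitable on timeout, so the waiting task does not linger and leak resources.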

Lost context on resume:

  • Agent re-prompts the LLM after approval but loses the reasoning chain
  • Makes different decisions than it would have if execution had not paused
  • Fix: Store full conversation history and reasoning trace, not just the last message

Approval prompt shows raw JSON:

  • User sees {"action": "delete", "path": "/var/log/app.log", "recursive": false}
  • User has no idea if this is safe or dangerous
  • Fix: Write preview functions that translate parameters into plain English

No audit trail:

  • User approves a request, agent executes it, something breaks
  • No record of what was approved, when, or by whom
  • Fix: Log every approval decision with timestamp, user ID, and full request context
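
An audit entry can be as simple as one JSON line per decision. The `audit_record` function and its field names are hypothetical; where the line is written (file, database, log pipeline) is left open.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def audit_record(user_id: str, decision: str, request: Dict[str, Any],
                 now: Optional[datetime] = None) -> str:
    """Build one JSON line describing an approval decision."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "user_id": user_id,
        "decision": decision,
        "request": request,  # full request context, not just the tool name
    }, sort_keys=True)
```

Storing the full request alongside the decision is what lets a postmortem reconstruct exactly what the user was shown.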

Architecture: Approval Service as a Sidecar

Instead of embedding approval logic in every agent, extract it into a shared service:

┌─────────────┐
│   Agent A   │──┐
└─────────────┘  │
                 ├──► ┌──────────────────┐      ┌──────────────┐
┌─────────────┐  │    │ Approval Service │◄────►│ Notification │
│   Agent B   │──┤    │  - Queue         │      │   Service    │
└─────────────┘  │    │  - Policy Engine │      └──────────────┘
                 │    │  - State Store   │
┌─────────────┐  │    └──────────────────┘
│   Agent C   │──┘            │
└─────────────┘               ▼
                      ┌──────────────┐
                      │ Audit Log DB │
                      └──────────────┘

Approval Service responsibilities:

  • Queue incoming requests from all agents
  • Apply batching and risk-based routing policies
  • Send notifications (Slack, email, mobile push)
  • Store pending requests with TTL
  • Emit metrics and audit logs
  • Resume agent execution after decision

This centralizes policy enforcement and makes it easier to tune batching windows, risk thresholds, and notification channels without modifying agent code.
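
The "store pending requests with TTL" responsibility might look like the following in-memory stand-in. `PendingStore` and its methods are hypothetical; a production service would back this with a durable store and treat expiry according to its own policy.

```python
from typing import Any, Dict, List, Optional


class PendingStore:
    """In-memory stand-in for the service's store of pending requests with a TTL."""

    def __init__(self, ttl: float = 300.0) -> None:
        self.ttl = ttl
        self._pending: Dict[str, Dict[str, Any]] = {}

    def submit(self, request_id: str, agent_id: str, summary: str, now: float) -> None:
        self._pending[request_id] = {
            "agent_id": agent_id,
            "summary": summary,
            "expires_at": now + self.ttl,
        }

    def expire(self, now: float) -> List[str]:
        """Remove requests past their TTL and return their ids."""
        expired = [rid for rid, req in self._pending.items() if req["expires_at"] <= now]
        for rid in expired:
            del self._pending[rid]
        return expired

    def resolve(self, request_id: str, decision: str, now: float) -> Optional[Dict[str, Any]]:
        """Record a decision; returns None if the request is unknown or already expired."""
        self.expire(now)
        request = self._pending.pop(request_id, None)
        if request is None:
            return None
        return {"agent_id": request["agent_id"], "decision": decision}
```

The dictionary returned by `resolve` is what the service would hand back to the waiting agent to resume execution.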

When Permission Fatigue Becomes a Security Incident

The game proves that humans are bad at reading repetitive approval prompts. In production, this leads to:

  • Approving a tool call that deletes production data
  • Granting an agent access to a sensitive API it should not touch
  • Clicking yes on a prompt that was actually a phishing attempt (adversarial prompt injection)

Mitigations:

  • Rate limiting: Cap the number of approval requests per agent per hour
  • Anomaly detection: Flag requests that deviate from the agent’s normal behavior
  • Mandatory delays: Force a 5-second wait before allowing approval on high-risk actions
  • Randomized confirmation: Require the user to type a random word instead of just clicking yes

The game does not implement any of these. It just lets you click yes as fast as you want. Real systems that skip these mitigations end up in incident postmortems.
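
The mandatory delay and randomized confirmation can be combined in one check. This is a sketch only: the `HighRiskChallenge` class and the word list are invented for illustration.

```python
import secrets
from typing import Sequence

WORDS = ("amber", "falcon", "granite", "harbor", "lantern", "orchid")  # illustrative word list


class HighRiskChallenge:
    """Approval is accepted only after a minimum delay and a correctly typed word."""

    def __init__(self, shown_at: float, min_delay: float = 5.0,
                 words: Sequence[str] = WORDS) -> None:
        self.shown_at = shown_at
        self.min_delay = min_delay
        self.word = secrets.choice(words)  # random word the user must type

    def accept(self, typed: str, now: float) -> bool:
        if now - self.shown_at < self.min_delay:
            return False  # answered too quickly to have read the prompt
        return typed.strip().lower() == self.word
```

Because the word changes on every prompt, muscle memory alone cannot approve a high-risk action.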

Technical Verdict

Use human-in-the-loop approval when:

  • Agents interact with production systems or external APIs
  • Compliance requires human oversight for certain actions
  • The cost of a mistake is higher than the cost of interruption

Avoid it when:

  • The agent only reads data or performs idempotent, easily reversible operations
  • Approval latency breaks the user experience (real-time chat, live dashboards)
  • You have not built batching, risk tiers, and observability (you will create permission fatigue and security risk)

The game is a useful reminder that approval prompts are not a security feature if users stop reading them. If your agent system requires more than 3-5 approvals per task, you need better permission scopes, batching policies, or a different trust model.