Continue? Y/N: What a 60-Second Game Reveals About Agent Permission Fatigue

A 60-second game just hit Show HN with 308 points and 129 comments. The premise is simple: an AI agent asks for permission to execute commands, and you click Y or N. The catch? The requests come fast, the language is deliberately confusing, and by second 45 you are clicking yes to things you should not approve.

This is not a game about reading comprehension. It is a game about the production bottleneck every agent system hits when humans must approve tool calls, API access, and state changes. The fatigue is real, the security implications are worse, and the infrastructure patterns to handle this problem are still immature.

The Permission Fatigue Problem

Agent frameworks treat human approval as a binary gate: pause execution, show a prompt, wait for yes or no, resume. This works for demos. It breaks in production when:

An agent makes 15 tool calls to complete one task
Multiple agents run in parallel and queue approval requests
The user is in a different timezone or offline for hours
The approval prompt shows raw JSON instead of human-readable context
There is no way to approve a batch of similar requests at once

The game simulates this by flooding you with requests. By the time you reach “approve deletion of production database,” you are on autopilot. The same thing happens in real systems when engineers approve Terraform plans or CI/CD pipeline runs after the tenth similar request in a row.

State Management for Interrupted Execution

When an agent pauses for approval, it needs to preserve:

Current execution context (which step in the plan, what data was fetched)
Tool call parameters and expected return types
Conversation history and reasoning chain
Timeout policies (how long to wait before failing)
Rollback state if the user denies permission

Most frameworks serialize the agent state to a database or message queue. The tricky part is resuming execution without re-running expensive LLM calls or losing the reasoning thread.
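The items listed above can be captured in a single serializable record. This is a minimal sketch, assuming JSON-serializable state and a hypothetical `PausedRun` class (not any specific framework's API). Keeping the full `messages` trace in the checkpoint is what lets resume continue the reasoning thread instead of re-prompting the model from scratch.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class PausedRun:
    """Hypothetical checkpoint for an agent paused on an approval request."""
    run_id: str
    step_index: int                   # which step of the plan we stopped at
    fetched_data: dict[str, Any]      # results gathered before the pause
    pending_tool: str                 # tool call awaiting approval
    pending_params: dict[str, Any]    # exact parameters shown to the approver
    messages: list[dict[str, str]]    # full conversation and reasoning trace
    # Wall-clock time (not monotonic) so expiry still works after a process restart
    paused_at: float = field(default_factory=time.time)
    timeout_s: float = 12 * 3600      # how long to wait before failing the run
    rollback: dict[str, Any] = field(default_factory=dict)  # undo info if denied

    def expired(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now - self.paused_at > self.timeout_s

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "PausedRun":
        return cls(**json.loads(raw))
```

The JSON string can go to blob storage, a database row, or a queue message; resuming is `PausedRun.from_json(...)` followed by a check of `expired()`.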

Common patterns:

Pattern	How It Works	Trade-off
Checkpoint snapshots	Serialize full agent state to blob storage	Simple but memory-heavy; hard to inspect or modify paused state
Event sourcing	Log every tool call and decision as an event	Replayable and auditable but requires careful event schema design
Workflow engines	Use Temporal or Prefect to manage durable execution	Best for long-running agents but adds orchestration complexity
Stateless with context injection	Store only conversation history; re-prompt LLM on resume	Cheapest but non-deterministic; agent may change its mind
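To make the event-sourcing row concrete, here is a minimal sketch under simple assumptions: an in-memory list stands in for a durable event store, and the callables passed to `record` stand in for your LLM and tool calls (all names are hypothetical). On resume, the agent re-runs its logic but receives recorded results from the log instead of calling the model or tools again, so it cannot "change its mind" the way the stateless pattern can.

```python
from typing import Any, Callable


class EventLog:
    """Append-only log; an in-memory list stands in for a durable event store."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    def append(self, kind: str, key: str, value: Any) -> None:
        self.events.append({"seq": len(self.events), "kind": kind, "key": key, "value": value})


class ReplayingAgent:
    """Serves recorded results during replay; executes and records new calls otherwise."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.cursor = 0  # position in the log during replay

    def record(self, kind: str, key: str, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.log.events):
            event = self.log.events[self.cursor]
            # Replay must follow the same path; a mismatch means nondeterministic agent code
            if (event["kind"], event["key"]) != (kind, key):
                raise RuntimeError(f"replay diverged at seq {event['seq']}")
            self.cursor += 1
            return event["value"]
        value = fn()  # first execution: actually call the LLM or tool
        self.log.append(kind, key, value)
        self.cursor += 1
        return value
```

The divergence check is the "careful event schema design" cost from the table: replay only works if the agent issues the same sequence of calls every time.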

The game does not pause. Real systems must, and the state management choice determines whether your agent survives a 12-hour approval delay or crashes with “context expired.”

Permission Scope Design

The game asks for permission to “delete a file” without specifying which file. Real agent systems face the same problem: how granular should permission scopes be?

Too granular:

User must approve every API call individually
50 approval prompts to send one email with attachments
Engineers start rubber-stamping requests to get work done

Too coarse:

Agent gets blanket permission to “use the filesystem”
No visibility into what it actually does
One compromised prompt leads to full system access

Better approach: capability-based permissions with preview

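from typing import Any, Callable


class ToolPermission:
    def __init__(self, tool_name: str, scope: str,
                 preview_fn: Callable[[dict[str, Any]], str], approval_service):
        self.tool_name = tool_name
        self.scope = scope  # e.g., "read-only", "write-safe-dir", "delete-with-backup"
        self.preview_fn = preview_fn  # generates human-readable summary
        # Any object exposing prompt(text, timeout) -> bool (Slack bot, CLI, web UI)
        self.approval_service = approval_service

    def request_approval(self, params: dict[str, Any], timeout: int = 300) -> bool:
        preview = self.preview_fn(params)
        # Show: "Agent wants to delete 3 files in /tmp/cache (total 45MB)"
        # Instead of: "Agent wants to call os.remove with args=['file1.tmp']"
        return self.approval_service.prompt(preview, timeout=timeout)


class ConsoleApproval:
    """Minimal stand-in for a real approval backend."""
    def prompt(self, text: str, timeout: int) -> bool:
        # input() cannot time out; a real service would enforce the timeout
        return input(f"{text} [y/N] ").strip().lower() == "y"


# Usage
file_tool = ToolPermission(
    tool_name="delete_file",
    scope="safe-directories-only",
    preview_fn=lambda p: f"Delete {len(p['files'])} files in {p['dir']} ({p['total_size']})",
    approval_service=ConsoleApproval(),
)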

The preview function is critical. It translates raw parameters into a decision the user can actually evaluate in three seconds.

Batching and Approval Policies

Production systems need policies that reduce interruption frequency without sacrificing control:

Time-boxed batching:

Collect all permission requests in a 30-second window
Show one approval prompt with a grouped summary
User approves or denies the batch
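A minimal asyncio sketch of time-boxed batching, assuming requests arrive on an `asyncio.Queue` as dicts with a `tool` key; `collect_batch` and `summarize` are illustrative names, not part of any framework. The collector blocks until the first request arrives, then keeps draining the queue until the window closes, so the user sees one prompt per window instead of one per request.

```python
import asyncio
from collections import Counter
from typing import Any


async def collect_batch(queue: asyncio.Queue, window_s: float = 30.0) -> list[Any]:
    """Block until one request arrives, then gather everything else within window_s."""
    batch = [await queue.get()]
    loop = asyncio.get_running_loop()
    deadline = loop.time() + window_s
    while (remaining := deadline - loop.time()) > 0:
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break  # window closed with no further requests
    return batch


def summarize(batch: list[dict[str, str]]) -> str:
    """Group requests by tool so one prompt covers the whole window."""
    counts = Counter(req["tool"] for req in batch)
    parts = [f"{n}x {tool}" for tool, n in sorted(counts.items())]
    return f"Approve {len(batch)} requests? " + ", ".join(parts)
```

The window starts at the first request rather than on a fixed clock, so a quiet agent is never delayed waiting for an empty window to expire.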

Risk-based routing:

Low-risk actions (read-only API calls) auto-approve
Medium-risk actions (write to staging) require one-click approval
High-risk actions (production changes, external API calls with cost) require typed confirmation
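Risk-based routing can start as a plain lookup. The sketch below assumes a hypothetical classification by action, environment, and whether the call costs money; the tier names mirror the list above. Note the default: anything the rules do not recognize falls through to the strictest tier rather than auto-approving.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto-approve"
    ONE_CLICK = "one-click"
    TYPED = "typed-confirmation"


def route(action: str, environment: str, has_cost: bool = False) -> Tier:
    """Map a proposed action to an approval tier (illustrative rules only)."""
    if action == "read" and not has_cost:
        return Tier.AUTO
    if action == "write" and environment == "staging" and not has_cost:
        return Tier.ONE_CLICK
    # Production changes, paid external calls, and anything unrecognized
    return Tier.TYPED
```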

Delegation and escalation:

Junior agents can only request low-risk permissions
Senior agents inherit a trust budget (for example, 10 auto-approved actions per hour)
Escalate to human only when budget is exhausted or risk threshold exceeded
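The trust budget can be modeled as a sliding-window counter. This sketch assumes the 10-per-hour figure from the list above and an integer risk score with a hypothetical threshold; `TrustBudget` and `needs_human` are illustrative names. It escalates when the action is above the risk threshold or the budget is spent.

```python
import time
from collections import deque
from typing import Optional


class TrustBudget:
    """Allow up to `limit` auto-approvals per sliding `window_s` seconds."""

    def __init__(self, limit: int = 10, window_s: float = 3600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self.granted: deque[float] = deque()  # timestamps of recent auto-approvals

    def try_auto_approve(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget approvals that have slid out of the window
        while self.granted and now - self.granted[0] >= self.window_s:
            self.granted.popleft()
        if len(self.granted) < self.limit:
            self.granted.append(now)
            return True
        return False  # budget exhausted


def needs_human(budget: TrustBudget, risk: int, threshold: int = 2,
                now: Optional[float] = None) -> bool:
    """Escalate if the action exceeds the risk threshold or the budget is spent."""
    if risk > threshold:
        return True
    return not budget.try_auto_approve(now)
```

High-risk actions never consume budget, so an agent cannot spend its allowance on exactly the actions that most need a human.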

The game does not implement any of these. It just spams you with requests. Real systems that skip batching and risk tiers end up with the same user experience: click fatigue leading to security incidents.

Observability for Approval Loops

You need telemetry on:

Approval latency: How long does the user take to respond? Are requests timing out?
Approval rate: What percentage of requests get approved versus denied? A high denial rate means the agent is asking for the wrong things.
Fatigue signals: Are users approving requests faster over time (indicating they stopped reading)?
Failure recovery: How often do agents crash or lose state while waiting for approval?

Example metrics to track:

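import asyncio
import time
from prometheus_client import Counter, Histogram

approval_latency_seconds = Histogram("agent_approval_latency_seconds", "Time to decision", ["agent_id", "tool_name"])
approval_decision = Counter("agent_approval_decision", "Decisions by outcome", ["agent_id", "tool_name", "decision"])
approval_timeout = Counter("agent_approval_timeout", "Requests with no answer in time", ["agent_id", "tool_name"])
# In your approval handler; wait_for_approval is your own coroutine returning "approved"/"denied"
async def handle_approval(request, agent_id, tool_name):
    start = time.monotonic()
    try:
        decision = await asyncio.wait_for(wait_for_approval(request), timeout=300)
    except asyncio.TimeoutError:
        approval_timeout.labels(agent_id, tool_name).inc()
        decision = "denied"  # default-deny when nobody answers
    approval_latency_seconds.labels(agent_id, tool_name).observe(time.monotonic() - start)
    approval_decision.labels(agent_id, tool_name, decision).inc()
    return decision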

If your median approval latency is 2 seconds but your 95th percentile is 180 seconds, some requests are not reaching an approver promptly, which usually points to a timezone gap or a notification delivery problem. If your approval rate drops below roughly 60%, your agent is probably asking for permissions it should not need.
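The same numbers are easy to compute offline from logged decisions, for example in a weekly review. This sketch assumes each record is a `(latency_seconds, approved)` tuple in chronological order (at least two records). The fatigue ratio is a rough heuristic of our own, not an established metric: it compares median latency in the second half of the log with the first half, and a value well below 1.0 suggests users are answering faster over time, possibly because they stopped reading.

```python
import statistics


def approval_report(records: list[tuple[float, bool]]) -> dict[str, float]:
    """Summarize (latency_seconds, approved) records; needs at least two records."""
    latencies = [lat for lat, _ in records]
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # "inclusive" keeps the estimate within the observed min and max.
    p95 = statistics.quantiles(latencies, n=20, method="inclusive")[-1]
    half = len(latencies) // 2
    early = statistics.median(latencies[:half])
    late = statistics.median(latencies[half:])
    return {
        "p50": statistics.median(latencies),
        "p95": p95,
        "approval_rate": sum(ok for _, ok in records) / len(records),
        "fatigue_ratio": late / early,  # < 1.0: users answer faster later on
    }
```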

Failure Modes

Approval timeout with no fallback:

Agent waits forever for a response that never comes
Execution hangs, resources leak, downstream tasks pile up
Fix: Implement a timeout that either denies by default or escalates to a backup approver
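A minimal asyncio sketch of that fix, assuming a hypothetical `ask(approver, request)` coroutine that resolves to True or False: ask the primary approver, escalate to a backup on timeout, and deny if nobody answers. The agent never waits longer than two timeouts combined.

```python
import asyncio
from typing import Awaitable, Callable

Ask = Callable[[str, str], Awaitable[bool]]  # (approver, request) -> approved?


async def approve_with_fallback(ask: Ask, request: str, primary: str, backup: str,
                                timeout_s: float = 300.0) -> bool:
    """Try the primary approver, escalate to the backup, then default-deny."""
    for approver in (primary, backup):
        try:
            return await asyncio.wait_for(ask(approver, request), timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # no answer in time: move on to the next approver
    return False  # nobody answered: deny rather than hang forever
```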

Lost context on resume:

Agent re-prompts the LLM after approval but loses the reasoning chain
Makes different decisions than it would have if execution had not paused
Fix: Store full conversation history and reasoning trace, not just the last message

Approval prompt shows raw JSON:

User sees {"action": "delete", "path": "/var/log/app.log", "recursive": false}
User has no idea if this is safe or dangerous
Fix: Write preview functions that translate parameters into plain English
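As a sketch of such a preview function for the JSON above, assuming only the three fields shown (`action`, `path`, `recursive`) and a hypothetical list of sensitive path prefixes, the translator below states what will happen and flags risky locations in words:

```python
import json

# Hypothetical locations worth a warning; tune for your environment
SENSITIVE_PREFIXES = ("/etc", "/home", "/var/log")


def preview(raw: str) -> str:
    """Turn a raw tool-call payload into a one-line, plain-English summary."""
    req = json.loads(raw)
    text = f"{req['action'].capitalize()} {req['path']}"
    text += " and everything under it" if req.get("recursive") else " (this item only)"
    if req["path"].startswith(SENSITIVE_PREFIXES):
        text += ". WARNING: sensitive location"
    return text
```

For the payload above this yields "Delete /var/log/app.log (this item only). WARNING: sensitive location", which a reviewer can judge at a glance.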

No audit trail:

User approves a request, agent executes it, something breaks
No record of what was approved, when, or by whom
Fix: Log every approval decision with timestamp, user ID, and full request context
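One simple way to get that trail is an append-only JSON Lines file with one record per decision. This is a sketch with an assumed record layout (not a standard format); `log_decision` is a hypothetical helper, and in production you would more likely write to a database or log pipeline.

```python
import json
from datetime import datetime, timezone
from typing import Any


def log_decision(path: str, user_id: str, decision: str,
                 request: dict[str, Any]) -> dict[str, Any]:
    """Append one approval decision, with its full request context, as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "decision": decision,  # e.g. "approved", "denied", or "timeout"
        "request": request,    # exactly what the approver was shown
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```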

Architecture: Approval Service as a Sidecar

Instead of embedding approval logic in every agent, extract it into a shared service:

┌─────────────┐
│   Agent A   │──┐
└─────────────┘  │
                 ├──► ┌──────────────────┐      ┌──────────────┐
┌─────────────┐  │    │ Approval Service │◄────►│ Notification │
│   Agent B   │──┤    │  - Queue         │      │   Service    │
└─────────────┘  │    │  - Policy Engine │      └──────────────┘
                 │    │  - State Store   │
┌─────────────┐  │    └──────────────────┘
│   Agent C   │──┘            │
└─────────────┘               │
                              ▼
                      ┌──────────────┐
                      │ Audit Log DB │
                      └──────────────┘

Approval Service responsibilities:

Queue incoming requests from all agents
Apply batching and risk-based routing policies
Send notifications (Slack, email, mobile push)
Store pending requests with TTL
Emit metrics and audit logs
Resume agent execution after decision

This centralizes policy enforcement and makes it easier to tune batching windows, risk thresholds, and notification channels without modifying agent code.
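A compressed, in-memory sketch of the service's core follows, under loud assumptions: asyncio, hypothetical names throughout (`ApprovalService`, `submit`, `decide`, and a `policy` callable returning "allow", "deny", or "ask"), and no real notification channel or durable state store. Agents await `submit`; the approver's UI calls `decide`, which resumes the waiting agent; requests that outlive the TTL are denied; every outcome lands in an audit list.

```python
import asyncio
import itertools
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pending:
    agent_id: str
    request: dict[str, Any]
    future: asyncio.Future


class ApprovalService:
    def __init__(self, policy: Callable[[dict[str, Any]], str], ttl_s: float = 300.0):
        self.policy = policy  # returns "allow", "deny", or "ask"
        self.ttl_s = ttl_s
        self.pending: dict[int, Pending] = {}
        self.audit: list[dict[str, Any]] = []
        self._ids = itertools.count(1)

    def _record(self, req_id: int, agent_id: str, decision: str, by: str) -> None:
        self.audit.append({"id": req_id, "agent": agent_id, "decision": decision,
                           "by": by, "at": time.time()})

    async def submit(self, agent_id: str, request: dict[str, Any]) -> bool:
        """Called by an agent; returns once the policy, a human, or the TTL decides."""
        req_id = next(self._ids)
        verdict = self.policy(request)
        if verdict != "ask":
            self._record(req_id, agent_id, verdict, by="policy")
            return verdict == "allow"
        future = asyncio.get_running_loop().create_future()
        self.pending[req_id] = Pending(agent_id, request, future)
        # A real service would notify approvers here (Slack, email, mobile push)
        try:
            return await asyncio.wait_for(future, timeout=self.ttl_s)
        except asyncio.TimeoutError:
            self._record(req_id, agent_id, "expired", by="ttl")
            return False  # default-deny on expiry
        finally:
            self.pending.pop(req_id, None)

    def decide(self, req_id: int, approved: bool, user_id: str) -> None:
        """Called from the approver's UI; resumes the waiting agent."""
        item = self.pending.get(req_id)
        if item is None or item.future.done():
            return  # already decided or expired
        self._record(req_id, item.agent_id, "allow" if approved else "deny", by=user_id)
        item.future.set_result(approved)
```

Because agents only see `submit`, tuning the policy or TTL changes behavior for every agent without touching agent code.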

When Permission Fatigue Becomes a Security Incident

The game shows how poorly humans read repetitive approval prompts. In production, this leads to:

Approving a tool call that deletes production data
Granting an agent access to a sensitive API it should not touch
Clicking yes on a prompt that was actually a phishing attempt (adversarial prompt injection)

Mitigations:

Rate limiting: Cap the number of approval requests per agent per hour
Anomaly detection: Flag requests that deviate from the agent’s normal behavior
Mandatory delays: Force a 5-second wait before allowing approval on high-risk actions
Randomized confirmation: Require the user to type a random word instead of just clicking yes
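The last two mitigations combine naturally into one gate. This sketch assumes a hypothetical `Challenge` object issued when a high-risk prompt is shown; the word list and the 5-second minimum are illustrative. An answer counts only if it arrives after the delay and matches the random word, so a reflexive click cannot approve the action.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional

WORDS = ("harbor", "lantern", "meadow", "quartz", "saddle", "tundra")  # illustrative


@dataclass
class Challenge:
    """Issued when a high-risk prompt is shown to the user."""
    word: str = field(default_factory=lambda: secrets.choice(WORDS))
    shown_at: float = field(default_factory=time.monotonic)
    min_delay_s: float = 5.0

    def accept(self, typed: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self.shown_at < self.min_delay_s:
            return False  # answered too fast to have read the prompt
        return typed.strip().lower() == self.word
```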

The game does not implement any of these. It just lets you click yes as fast as you want. Real systems that skip these mitigations end up in incident postmortems.

Technical Verdict

Use human-in-the-loop approval when:

Agents interact with production systems or external APIs
Compliance requires human oversight for certain actions
The cost of a mistake is higher than the cost of interruption

Avoid it when:

The agent only reads data or performs idempotent operations
Approval latency breaks the user experience (real-time chat, live dashboards)
You have not built batching, risk tiers, or observability (without them you will create permission fatigue and security risk)

The game is a useful reminder that approval prompts are not a security feature if users stop reading them. If your agent system requires more than 3-5 approvals per task, you need better permission scopes, batching policies, or a different trust model.