Hormuz Havoc: How AI Bots Overran a Satirical Game in 24 Hours

A satirical game's 24-hour bot takeover exposes authentication gaps, rate-limiting failures, and observability blind spots in adversarial agent systems.

Source: hormuz-havoc.com
A satirical browser game called Hormuz Havoc launched on Hacker News and got completely overrun by AI bots within 24 hours. The game simulates a fictional president managing oil prices, approval ratings, and personal enrichment over 30 weeks. Players make decisions, and the leaderboard tracks high scores.

The bot takeover was not a coordinated attack by a single operator. Independent agents, likely powered by LLM-based automation tools, discovered the game through the Show HN post and began submitting scores at scale. The game’s infrastructure had no meaningful defenses against automated play, which turned the launch into an unintentional stress test of how a small service holds up against many uncoordinated agents pursuing the same goal.

This incident exposes three critical failure modes in systems that face autonomous agent traffic: weak authentication boundaries, ineffective rate limiting, and observability gaps that prevent early detection of coordinated behavior.

What the Game Exposed

Hormuz Havoc is a single-page JavaScript application with a leaderboard backend. Players submit scores after completing a 30-week simulation. The game separates human and AI-assisted scores on the leaderboard, but this separation relies on client-side self-reporting. There is no server-side verification of gameplay integrity.

The bot swarm exploited several structural weaknesses:

  • No session binding. Score submissions did not require a persistent session token tied to gameplay events. Bots could POST scores directly to the leaderboard API without playing the game.
  • Client-side game logic. All simulation logic ran in the browser. Bots could reverse-engineer optimal decision trees and submit perfect scores without executing the full game loop.
  • Trivial rate limits. If the backend had per-IP rate limits at all, bots could bypass them by rotating IPs or routing through residential proxies.
  • No CAPTCHA or proof-of-work. Submitting a score required no human interaction challenge or computational cost barrier.

The result was a leaderboard dominated by AI-generated scores within hours of launch. Human players could not compete, and the game’s intended satire was lost in the noise.

Bot Coordination Patterns

The bots did not appear to coordinate through a shared message queue or central orchestrator. Instead, they exhibited emergent coordination through independent optimization of the same objective function: maximizing score on the leaderboard.

Each bot likely followed this sequence:

  1. Discovery. Scrape Hacker News for new game links or monitor RSS feeds for Show HN posts.
  2. Reconnaissance. Load the game page, extract API endpoints from network traffic, and identify score submission logic.
  3. Optimization. Reverse-engineer the scoring algorithm by testing decision combinations or analyzing client-side JavaScript.
  4. Submission. POST optimized scores directly to the leaderboard API, bypassing the game UI.

This pattern mirrors how LLM-based agents interact with web APIs. Tools like AutoGPT, BabyAGI, and custom agent frameworks can execute this workflow without human intervention. The bots did not need to communicate with each other because the game’s scoring function was deterministic and discoverable.

The lack of coordination is a feature, not a bug. Independent agents optimizing the same objective create swarm-like behavior without requiring a swarm architecture. This makes detection harder because there is no single control plane to block.
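
To make the point about deterministic, discoverable scoring concrete, here is a minimal sketch. The scoring rule `toy_score`, the three action names, and the six-week horizon are all hypothetical and far simpler than the real game. The sketch only illustrates that two searchers that never communicate end up with the same decision sequence when the objective is deterministic.

```python
from itertools import product

ACTIONS = ("drill", "sanction", "tweet")  # hypothetical decision options


def toy_score(decisions):
    """Hypothetical deterministic scoring rule: same input, same score."""
    payoff = {"drill": 3, "sanction": 1, "tweet": 2}
    total = 0
    for week, action in enumerate(decisions):
        total += payoff[action] * (week + 1)
        # Repeating the previous action is penalized, so the optimum is not trivial
        if week > 0 and decisions[week - 1] == action:
            total -= 4
    return total


def best_sequence(weeks):
    """Exhaustively search every decision sequence and return the top scorer."""
    return max(product(ACTIONS, repeat=weeks), key=toy_score)


# Two "independent" agents run the same search without talking to each other
agent_a = best_sequence(6)
agent_b = best_sequence(6)
print(agent_a == agent_b)
```

Both agents arrive at the same sequence, which is also why identical decision traces are a useful detection signal.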

Authentication and Rate-Limiting Failures

The game’s authentication model was nonexistent. Score submissions required no proof of identity, no session token, and no gameplay verification. This is common in lightweight web games, but it creates a wide-open attack surface for automated agents.

A minimal authentication boundary would require:

  • Session tokens. Generate a unique token when the game loads and bind it to gameplay events. Reject score submissions without a valid token.
  • Gameplay verification. Log decision events on the server and verify that submitted scores match the logged gameplay sequence.
  • Rate limiting per session. Limit the number of games a single session can complete per hour, regardless of IP address.
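
The session-token item above can be sketched with an HMAC-signed token. This is an illustration under stated assumptions, not the game's implementation: `SERVER_SECRET`, `TOKEN_TTL_SECONDS`, and the `session_id.issued_at.signature` layout are all hypothetical choices.

```python
import hashlib
import hmac
import secrets
import time

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical; load from config in practice
TOKEN_TTL_SECONDS = 3600                 # assumed lifetime of one play session


def _sign(payload):
    return hmac.new(SERVER_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Create 'session_id.issued_at.signature' when the game page loads."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{secrets.token_hex(16)}.{issued_at}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token, now=None):
    """Return the session id if the token is authentic and unexpired, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    session_id, issued_at, signature = parts
    expected = _sign(f"{session_id}.{issued_at}")
    # Constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if current - int(issued_at) > TOKEN_TTL_SECONDS:
        return None
    return session_id
```

A token like this only blocks submissions that never loaded the game. A bot that fetches the page first still gets a valid token, so the token is mainly useful as the key that gameplay logs and per-session limits hang from.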

The game’s rate limiting, if it existed, was IP-based. This is trivial to bypass with rotating proxies or cloud infrastructure. Effective rate limiting for agent traffic requires session-level or identity-level controls.
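
One way to express a session-level limit is a token bucket keyed by session id rather than by IP address. The class name, the capacity of three games, and the refill rate below are assumptions chosen for illustration, not values from the game.

```python
import time


class SessionRateLimiter:
    """Token bucket per session id; capacity and refill rate are assumed values."""

    def __init__(self, capacity=3, refill_per_second=3 / 3600, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets = {}  # session_id -> (tokens, last_seen)

    def allow(self, session_id):
        """Spend one token for this session if available; return whether it was allowed."""
        now = self.clock()
        tokens, last_seen = self._buckets.get(session_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity
        tokens = min(self.capacity, tokens + (now - last_seen) * self.refill_per_second)
        if tokens >= 1:
            self._buckets[session_id] = (tokens - 1, now)
            return True
        self._buckets[session_id] = (tokens, now)
        return False
```

A per-session limit helps only if new sessions are costly to obtain. If a bot can mint unlimited sessions, it should be combined with signed tokens and some cost on session creation.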

Defense Layer          | What It Blocks     | What It Misses
IP rate limiting       | Single-origin bots | Distributed bots with proxy rotation
CAPTCHA                | Scripted bots      | LLM-powered agents with vision models
Session tokens         | Direct API calls   | Bots that simulate full gameplay
Proof-of-work          | Low-effort spam    | Well-funded bot operators
Gameplay verification  | Score injection    | Bots that play the game correctly

No single layer stops all bots. Effective defense requires stacking multiple boundaries and monitoring for anomalies.

Observability Gaps

The game had no visible observability layer to detect coordinated agent activity. Metrics that would flag bot swarms include:

  • Score distribution anomalies. A sudden spike in near-perfect scores indicates automated optimization.
  • Submission rate spikes. A 10x increase in score submissions per hour suggests bot activity.
  • Session duration clustering. Bots complete games faster than humans because they skip UI interactions.
  • Decision pattern uniformity. Bots converge on optimal strategies, creating identical decision sequences.

Without these metrics, the bot takeover was invisible until human players noticed the leaderboard was dominated by AI-generated scores.
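
Two of the signals listed above, decision pattern uniformity and submission rate spikes, reduce to small calculations. The function names and the default factor of 10 (taken from the 10x figure above) are illustrative assumptions; real thresholds would need tuning against actual traffic.

```python
from collections import Counter


def decision_uniformity(traces):
    """Share of submissions that use the single most common decision sequence."""
    if not traces:
        return 0.0
    counts = Counter(tuple(trace) for trace in traces)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(traces)


def submission_spike(current_hour, baseline_hours, factor=10):
    """True when this hour's count is at least `factor` times the baseline mean."""
    if not baseline_hours:
        return False
    baseline = sum(baseline_hours) / len(baseline_hours)
    return baseline > 0 and current_hour >= factor * baseline


traces = [["a", "b"], ["a", "b"], ["b", "a"], ["a", "b"]]
print(decision_uniformity(traces))           # 0.75
print(submission_spike(100, [8, 10, 12]))    # True
```

A uniformity value near 1.0 over a large sample suggests that many submitters converged on one strategy, which human players rarely do.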

A basic observability stack for agent-facing systems should include:

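# Sketch of bot detection metrics. The in-memory stores below stand in for a
# real metrics backend and trace store, and the thresholds are illustrative.
from collections import Counter
from statistics import median, quantiles

counters = Counter()            # stand-in for a metrics client
scores, durations = [], []      # history of earlier submissions
traces = {}                     # session_id -> decision sequence
common_bot_patterns = set()     # known bot decision sequences, stored as tuples

def log_score_submission(session_id, score, duration, decisions):
    counters["scores.submitted"] += 1

    # Flag anomalies relative to earlier submissions (needs some history first)
    if len(scores) >= 2 and score > quantiles(scores, n=100)[98]:
        counters["scores.outlier"] += 1

    if durations and duration < median(durations) * 0.5:
        counters["gameplay.suspiciously_fast"] += 1

    if tuple(decisions) in common_bot_patterns:
        counters["gameplay.bot_pattern_match"] += 1

    # Record this submission and store its trace for pattern analysis
    scores.append(score)
    durations.append(duration)
    traces[session_id] = list(decisions)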

This gives operators real-time visibility into bot activity and enables automated responses like rate limiting or CAPTCHA challenges.

What This Reveals About Agent Swarms

The Hormuz Havoc incident demonstrates that agent swarms do not require explicit coordination to overwhelm a system. Independent agents optimizing the same objective create emergent swarm behavior through parallel execution.

This has implications for production systems that face agent traffic:

  • APIs are the new attack surface. Agents interact with systems through APIs, not UIs. Authentication and rate limiting must operate at the API layer.
  • Deterministic systems are vulnerable. If an agent can reverse-engineer your scoring function, it can optimize perfectly. Introduce randomness or hidden state to make optimization harder.
  • Observability is defense. You cannot block what you cannot see. Metrics and traces are not optional for agent-facing systems.

The game’s lack of security hardening was intentional. It was a satirical project, not a production service. But the bot takeover provides a natural experiment in how autonomous agents behave when constraints are removed.

Defense Architecture for Agent-Facing Systems

A production system facing agent traffic needs multiple defense layers:

  1. Authentication boundary. Require proof of identity before accepting API calls. Use session tokens, API keys, or OAuth flows.
  2. Rate limiting. Enforce limits at the session, user, and IP levels. Use token buckets or leaky buckets to smooth traffic spikes.
  3. Proof-of-work. Require computational cost for high-value actions like score submissions or account creation.
  4. Gameplay verification. Log state transitions on the server and reject submissions that do not match logged events.
  5. Anomaly detection. Monitor for score distribution outliers, submission rate spikes, and decision pattern uniformity.
  6. Progressive challenges. Escalate from rate limiting to CAPTCHA to account suspension based on anomaly severity.
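
Item 4, gameplay verification, can be sketched as a server-side replay: the server recomputes the score from the decisions it logged and ignores whatever the client claims. The transition table `EFFECTS`, the starting state, and the scoring formula are hypothetical stand-ins, since the real game's rules are not described here. Only the 30-week length comes from the game.

```python
EFFECTS = {  # hypothetical (oil price change, approval change) per decision
    "drill": (-2, 1),
    "sanction": (3, -1),
    "tweet": (0, 2),
}
START_STATE = {"oil_price": 80, "approval": 50}  # assumed starting values


def replay_score(decisions):
    """Recompute the score on the server from the logged decision sequence."""
    state = dict(START_STATE)
    for decision in decisions:
        d_price, d_approval = EFFECTS[decision]
        state["oil_price"] += d_price
        state["approval"] += d_approval
    return state["approval"] * 10 - state["oil_price"]


def verify_submission(logged_decisions, claimed_score, expected_weeks=30):
    """Accept a score only if it matches a replay of the server-side log."""
    if len(logged_decisions) != expected_weeks:
        return False
    if any(d not in EFFECTS for d in logged_decisions):
        return False
    return replay_score(logged_decisions) == claimed_score
```

This rejects injected scores but, as the table of defense layers notes, it does nothing against a bot that submits a legitimate, optimal sequence of decisions.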

None of these defenses are perfect. Bots will adapt. The goal is to raise the cost of automation high enough that most operators give up.
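
Proof-of-work (item 3) is the most direct way to attach a cost to each submission. Below is a minimal hashcash-style sketch, assuming SHA-256 and a `challenge:nonce` format. The function names and the difficulty value are illustrative, and a real deployment would also need to expire challenges and tie each one to a session.

```python
import hashlib
import secrets


def make_challenge():
    """Server side: issue a random, single-use challenge string."""
    return secrets.token_hex(16)


def leading_zero_bits(digest):
    """Count the zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def is_valid(challenge, nonce, difficulty_bits):
    """Server side: one hash is enough to check a claimed solution."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge, difficulty_bits):
    """Client side: brute-force a nonce, about 2**difficulty_bits hashes on average."""
    nonce = 0
    while not is_valid(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


challenge = make_challenge()
nonce = solve(challenge, difficulty_bits=12)
print(is_valid(challenge, nonce, 12))
```

The asymmetry is the point: verification costs the server one hash, while each submission costs the client thousands. Raising `difficulty_bits` by one doubles the expected client cost, which slows honest players on weak devices as well.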

Technical Verdict

Hormuz Havoc is a useful case study in adversarial agent behavior, not a production-ready system. The 24-hour bot takeover exposes how quickly autonomous agents can exploit weak authentication, trivial rate limiting, and missing observability.

An undefended, open-submission design like this one is acceptable when:

  • You need to stress-test agent defenses in a low-stakes environment.
  • You want to study emergent coordination in independent agents.
  • You are building a satirical or experimental project where bot activity is acceptable.

Avoid an undefended design when:

  • You are building a production system with leaderboards, user-generated content, or financial incentives.
  • You need to enforce fair play or prevent automated abuse.
  • You lack observability infrastructure to detect and respond to bot swarms.

The incident shows that agent swarms do not need centralized coordination to overwhelm a system. Independent optimization of the same objective is enough. Production systems must assume adversarial agent traffic and design defenses accordingly.