mech.app
Financial

Agent-Driven Trading Infrastructure: What Happens When Retail Investors Deploy Autonomous Crypto Bots

Exchange API limits, position sizing guardrails, state management across outages, and the gap between backtesting and live execution plumbing.

Source: news.ycombinator.com

A recent Hacker News thread about career transitions surfaced a pattern: engineers planning moves into automated crypto trading as AI disrupts traditional dev roles. The conversation exposed a gap between “I’ll build a trading bot” aspirations and the infrastructure needed to prevent catastrophic losses.

Building a trading agent is not hard. Building one that survives exchange outages, rate limits, and your own bugs without liquidating your account is a different problem. Here’s what the plumbing actually looks like.

Exchange Connectivity: Rate Limits and WebSocket Reconnection

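Every exchange enforces API rate limits, and the limits are not uniform. Binance meters REST usage by request weight per minute (historically 1,200 weight on its spot API) rather than by raw request count. Coinbase Exchange caps its public REST endpoints at around 10 requests per second. These numbers change, so read them from the exchange's documentation or rate-limit response headers instead of hardcoding them. If your agent doesn't track its own request budget, the exchange will start rejecting requests. Repeated violations get your IP banned, and Binance's bans escalate from minutes to days for repeat offenders.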
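
A client-side token bucket is one common way to track that budget. The sketch below assumes a single process and a limit expressed in requests per second. For weight-based limits, pass the endpoint's weight as `cost`. `TokenBucket` and its parameters are illustrative and not part of any exchange SDK.

```python
import asyncio
import time


class TokenBucket:
    """Client-side rate limiter: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # Start full
        self.updated = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; never blocks."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    async def acquire(self, cost: float = 1.0) -> None:
        """Wait until `cost` tokens are available, then spend them."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        while not self.try_acquire(cost):
            # Sleep roughly until the deficit has been refilled
            await asyncio.sleep((cost - self.tokens) / self.rate)
```

Call `await bucket.acquire()` before every REST request. If you set `rate` somewhat below the exchange's published limit, you leave headroom for other processes that share the same IP or API key.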

REST vs. WebSocket trade-offs:

  • REST: Stateless, easier to retry, but burns rate limit budget on polling
  • WebSocket: Real-time updates, but requires reconnection logic when the socket drops

WebSocket reconnection is where many bots fail. You need:

  • Exponential backoff with jitter to avoid thundering herd
  • Snapshot reconciliation (did you miss any trades during the disconnect?)
  • Duplicate message detection (some exchanges replay recent messages on reconnect, and a fresh snapshot can overlap updates you already applied)
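
import asyncio
import random

import aiohttp
import websockets


class ExchangeWebSocket:
    def __init__(self, url, snapshot_endpoint):
        self.url = url
        self.snapshot_endpoint = snapshot_endpoint
        self.ws = None
        self.last_sequence = None
        self.reconnect_delay = 1.0

    async def reconnect(self):
        while True:
            # Full jitter: sleep a random fraction of the current backoff window
            await asyncio.sleep(random.uniform(0, self.reconnect_delay))
            try:
                # Connect before fetching the snapshot so updates published
                # in between are buffered rather than lost
                self.ws = await websockets.connect(self.url)
                snapshot = await self.fetch_snapshot()
                self.last_sequence = snapshot['sequence']
                self.reconnect_delay = 1.0  # Reset only after a full resync
                return
            except (OSError, asyncio.TimeoutError, aiohttp.ClientError,
                    websockets.exceptions.WebSocketException):
                if self.ws is not None:
                    await self.ws.close()
                    self.ws = None
                # Double the backoff window for the next attempt, capped at 60 s
                self.reconnect_delay = min(self.reconnect_delay * 2, 60)

    async def fetch_snapshot(self):
        # Use the REST API to get the current order book state
        async with aiohttp.ClientSession() as session:
            async with session.get(self.snapshot_endpoint) as resp:
                resp.raise_for_status()
                return await resp.json()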
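
Duplicate detection and gap detection both come down to comparing each message's sequence number with the last one applied, starting from the snapshot's `last_sequence`. Here is a minimal sketch, assuming every message carries a strictly increasing integer sequence number. Field names and numbering rules differ per exchange, and `SequenceTracker` is a hypothetical helper.

```python
from typing import Optional


class SequenceTracker:
    """Classifies incoming messages by sequence number relative to the last applied one."""

    def __init__(self, last_sequence: Optional[int] = None):
        self.last_sequence = last_sequence

    def classify(self, sequence: int) -> str:
        if self.last_sequence is None or sequence == self.last_sequence + 1:
            self.last_sequence = sequence
            return "apply"       # The next expected message
        if sequence <= self.last_sequence:
            return "duplicate"   # Replayed, or already covered by the snapshot
        return "gap"             # Messages were missed somewhere
```

On `"gap"`, do not apply the message. Refetch the snapshot and reset `last_sequence` from it instead.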

Position Sizing Guardrails: Preventing Runaway Orders

An autonomous agent without position limits will eventually place an order larger than your account can safely support. This is not theoretical. In 2012, Knight Capital lost over $440M in under an hour when a faulty deployment flooded the market with unintended orders. A retail bot can fail the same way, just at smaller scale.

Guardrail layers:

  1. Pre-execution checks: Validate order size against account balance and risk parameters before submitting
  2. Exchange-side limits: Set max order size and daily loss limits in exchange API settings (not all exchanges support this)
  3. Circuit breakers: Halt trading if drawdown exceeds threshold or if order rejection rate spikes

| Guardrail type | Where it lives | Failure mode |
| --- | --- | --- |
| Pre-execution validation | Agent code | Bug in validation logic |
| Exchange API limits | Exchange settings | Not all exchanges support them; requires manual configuration |
| Circuit breaker | Monitoring layer | Reacts after damage is done |
| Kill switch | External service | Requires separate infrastructure; adds latency |
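
The pre-execution layer is the cheapest to build. Here is a minimal sketch, assuming a spot account where a buy costs `size * price` in quote currency. `RiskLimits`, its fields, and `validate_order` are illustrative names, and the limit values you pass in are your own risk choices.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    max_order_notional: float      # Largest single order, in quote currency
    max_position_size: float       # Largest absolute position, in base units
    max_balance_fraction: float    # Share of free balance one buy may use


def validate_order(side: str, size: float, price: float, free_balance: float,
                   position: float, limits: RiskLimits) -> list[str]:
    """Return a list of violations; an empty list means the order may be sent."""
    if size <= 0 or price <= 0:
        return ["size and price must be positive"]
    errors = []
    notional = size * price
    if notional > limits.max_order_notional:
        errors.append(f"notional {notional:.2f} exceeds {limits.max_order_notional:.2f}")
    if side == "buy" and notional > free_balance * limits.max_balance_fraction:
        errors.append("order uses too much of the free balance")
    new_position = position + size if side == "buy" else position - size
    if abs(new_position) > limits.max_position_size:
        errors.append(f"resulting position {new_position} exceeds limit")
    return errors
```

Returning every violation, instead of stopping at the first one, makes the rejection log more useful when you reconstruct an incident.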

The problem: guardrails in your agent code can be bypassed by bugs in your agent code. You need an external kill switch that monitors account state and can halt trading independent of the agent’s decision loop.
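
One way to get that independence is to run a separate process with its own API credentials that watches account equity and trips on drawdown. The sketch below keeps the exchange calls abstract. `get_equity` and `cancel_all_and_halt` are hypothetical callables you would wire to your exchange client, and the threshold is whatever your risk budget allows.

```python
import time
from typing import Callable, Optional


class DrawdownKillSwitch:
    """Trips once equity falls `max_drawdown` below its running peak; stays tripped."""

    def __init__(self, max_drawdown: float, cancel_all_and_halt: Callable[[], None]):
        self.max_drawdown = max_drawdown          # e.g. 0.10 for a 10% drawdown
        self.cancel_all_and_halt = cancel_all_and_halt
        self.peak: Optional[float] = None
        self.tripped = False

    def check(self, equity: float) -> bool:
        if self.tripped:
            return True                           # Latched: a human must reset it
        self.peak = equity if self.peak is None else max(self.peak, equity)
        drawdown = (self.peak - equity) / self.peak if self.peak > 0 else 0.0
        if drawdown >= self.max_drawdown:
            self.tripped = True
            self.cancel_all_and_halt()
        return self.tripped


def run_monitor(switch: DrawdownKillSwitch, get_equity: Callable[[], float],
                interval_s: float = 5.0) -> None:
    # Meant to run in its own process, outside the agent's decision loop
    while not switch.check(get_equity()):
        time.sleep(interval_s)
```

The switch latches on purpose. An agent that resumes trading automatically once equity recovers defeats the point of halting it.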

State Management Across Exchange Outages

Your agent needs to track:

  • Open positions (long/short, size, entry price)
  • Pending orders (limit orders waiting to fill)
  • Recent fills (to reconcile against expected state)
  • Market data (order book, recent trades)

When the exchange goes down, your agent loses visibility into all of this. When it comes back up, you need to reconcile:

  • Did pending orders fill during the outage?
  • Did stop-loss orders trigger?
  • Is your position size what you think it is?

State reconciliation pattern:

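import logging

logger = logging.getLogger(__name__)


async def reconcile_state_after_outage(self):
    # After an outage the exchange is the source of truth.
    # Assumes get_positions() returns {symbol: signed size} and
    # get_open_orders() returns dicts with an 'id' key.
    live_positions = await self.exchange.get_positions()
    live_orders = await self.exchange.get_open_orders()

    # Check every symbol either side knows about, so a position opened
    # during the outage is not missed
    for symbol in set(self.local_positions) | set(live_positions):
        local_pos = self.local_positions.get(symbol, 0)
        live_pos = live_positions.get(symbol, 0)

        if live_pos != local_pos:
            # Log discrepancy and update local state
            logger.warning("Position mismatch for %s: local=%s, live=%s",
                           symbol, local_pos, live_pos)
            self.local_positions[symbol] = live_pos

            # A shrinking position suggests a stop-loss (or other exit) filled
            if abs(live_pos) < abs(local_pos):
                self.handle_stop_loss_fill(symbol, local_pos - live_pos)

    # Orders tracked locally (self.local_orders: {order_id: order}) that are no
    # longer open either filled or were cancelled; return them for a fill lookup
    live_ids = {order['id'] for order in live_orders}
    missing = [oid for oid in self.local_orders if oid not in live_ids]
    for oid in missing:
        logger.warning("Order %s no longer open on exchange", oid)
        del self.local_orders[oid]
    return missing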

You need a persistent state store (Redis, PostgreSQL) separate from the agent’s in-memory state. On restart, the agent loads from the store and reconciles against the exchange.
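
The persistence pattern itself is small. This minimal sketch uses Python's built-in `sqlite3` as a stand-in for Redis or PostgreSQL so that the example is self-contained. The table layout and the `open_store`, `save_positions`, and `load_positions` names are illustrative.

```python
import sqlite3


def open_store(path: str = "agent_state.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS positions ("
                 "symbol TEXT PRIMARY KEY, size REAL NOT NULL, updated_ms INTEGER NOT NULL)")
    return conn


def save_positions(conn: sqlite3.Connection, positions: dict, now_ms: int) -> None:
    # One transaction: either every position is written or none is
    with conn:
        conn.execute("DELETE FROM positions")
        conn.executemany(
            "INSERT INTO positions (symbol, size, updated_ms) VALUES (?, ?, ?)",
            [(symbol, size, now_ms) for symbol, size in positions.items()],
        )


def load_positions(conn: sqlite3.Connection) -> dict:
    # Starting point on restart; reconcile against the exchange before trading
    return {symbol: size for symbol, size in conn.execute("SELECT symbol, size FROM positions")}
```

Writing the whole snapshot in one transaction means a crash mid-save leaves the previous consistent state on disk rather than a half-updated one.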

Backtesting vs. Paper Trading: Different Infrastructure

Backtesting replays historical data. Paper trading uses live market data with simulated execution. These require different infrastructure.

Backtesting plumbing:

  • Load historical OHLCV or tick data from CSV/Parquet
  • Simulate order book state (or use actual historical order book snapshots if available)
  • Assume fills at limit price (optimistic) or worse (realistic slippage model)
  • No network latency, no rate limits, no exchange outages
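
One way to keep limit fills from being too optimistic on OHLCV bars is to count a resting limit order as filled only when the bar trades through the limit price. A bar that merely touches the level says nothing about where you were in the queue. Here is a sketch of that rule, which is a common convention rather than a standard. `bar_low` and `bar_high` are the bar's extremes.

```python
def limit_order_fills(side: str, limit_price: float, bar_low: float, bar_high: float,
                      require_trade_through: bool = True) -> bool:
    """Decide whether a resting limit order fills during one OHLCV bar."""
    if side == "buy":
        # A buy needs price to drop strictly below the limit (or just touch it, if lenient)
        return bar_low < limit_price if require_trade_through else bar_low <= limit_price
    if side == "sell":
        return bar_high > limit_price if require_trade_through else bar_high >= limit_price
    raise ValueError(f"unknown side: {side}")
```

Running the same backtest with `require_trade_through` set to both values gives a rough bound on how much of a strategy's edge depends on optimistic fills.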

Paper trading plumbing:

  • Connect to live WebSocket feeds
  • Simulate order execution based on current order book
  • Track simulated positions separately from real positions
  • Handle all the same failure modes as live trading (reconnection, rate limits, outages)
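
Simulating execution against the current order book usually means walking the book. A market order consumes levels from the best price outward until its size is filled or the visible depth runs out. Here is a sketch, assuming levels arrive as `(price, size)` tuples sorted best-first: asks ascending for a buy, bids descending for a sell. It ignores fees, latency, and hidden liquidity.

```python
def simulate_market_fill(levels: list[tuple[float, float]], size: float) -> tuple[float, float]:
    """Walk the book; return (filled_size, average_price). Partial fill if depth runs out."""
    remaining = size
    filled = 0.0
    cost = 0.0
    for price, available in levels:
        if remaining <= 0:
            break
        take = min(remaining, available)
        filled += take
        cost += take * price
        remaining -= take
    avg_price = cost / filled if filled > 0 else 0.0
    return filled, avg_price
```

For example, buying 2.0 against asks of `[(100.0, 1.0), (101.0, 2.0)]` fills at an average of 100.5, which is 50 bps of slippage against the best ask.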

The gap: backtesting infrastructure gives you false confidence. Your strategy might work on historical data but fail in paper trading because:

  • Slippage is worse than your model assumed
  • Your reconnection logic has a bug
  • You hit rate limits during high-volatility periods
  • Your position sizing logic doesn’t account for partial fills
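
The partial-fill case is easy to get wrong, because an order's filled quantity grows over several updates. This minimal sketch assumes the exchange reports the cumulative filled quantity per order. Many exchanges do, but check yours. `PositionFromFills` is a hypothetical name. Applying only the increase over the last seen cumulative amount keeps the position correct even when the same update arrives twice.

```python
class PositionFromFills:
    """Builds a position from cumulative fill reports, ignoring repeated or stale updates."""

    def __init__(self):
        self.position = 0.0
        self.filled_by_order: dict[str, float] = {}

    def on_order_update(self, order_id: str, side: str, cumulative_filled: float) -> float:
        previous = self.filled_by_order.get(order_id, 0.0)
        delta = cumulative_filled - previous
        if delta <= 0:
            return self.position            # Duplicate or stale update: ignore
        self.filled_by_order[order_id] = cumulative_filled
        self.position += delta if side == "buy" else -delta
        return self.position
```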

Multi-Exchange Agents: Reconciling Inconsistent APIs

If your agent trades across multiple exchanges, you need a unified abstraction layer. Each exchange has:

  • Different timestamp formats (Unix milliseconds, ISO 8601, exchange-specific)
  • Different order book depth (some provide 20 levels, some provide 100)
  • Different fee structures (maker/taker, tiered by volume, flat)
  • Different order types (some support stop-limit, some don’t)

Abstraction layer pattern:

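from datetime import datetime, timezone


class UnifiedExchange:
    def __init__(self, exchange_adapter, depth=20):
        self.adapter = exchange_adapter
        self.depth = depth  # Truncate to a depth every venue can provide

    async def get_order_book(self, symbol):
        raw_book = await self.adapter.fetch_order_book(symbol)
        return self.normalize_order_book(raw_book)

    def normalize_order_book(self, raw_book):
        # Some venues append extra fields per level (e.g. order count); keep
        # price and size only. Floats are fine here; use Decimal for order math.
        return {
            'bids': [(float(price), float(size))
                     for price, size, *_ in raw_book['bids'][:self.depth]],
            'asks': [(float(price), float(size))
                     for price, size, *_ in raw_book['asks'][:self.depth]],
            'timestamp': self.normalize_timestamp(raw_book['timestamp'])
        }

    def normalize_timestamp(self, ts):
        # Convert to Unix milliseconds, UTC
        if isinstance(ts, str):
            # fromisoformat() rejects a trailing 'Z' before Python 3.11
            dt = datetime.fromisoformat(ts.replace('Z', '+00:00'))
            if dt.tzinfo is None:
                # Naive strings would otherwise be read as local time
                dt = dt.replace(tzinfo=timezone.utc)
            return int(dt.timestamp() * 1000)
        ts = float(ts)
        # Heuristic: numbers below 1e11 are Unix seconds, not milliseconds
        return int(ts * 1000) if ts < 1e11 else int(ts)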

The CCXT library provides this abstraction for 100+ exchanges, but you still need to handle exchange-specific quirks (rate limits, order size precision, minimum order values).
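
Order size precision and minimum order values are the quirks most likely to cause rejections. Below is a minimal sketch of pre-submit rounding that uses `Decimal` to avoid float drift. `step_size` and `min_notional` are hypothetical per-market parameters you would load from the exchange's market metadata. CCXT's market structure carries similar information in its `precision` and `limits` fields, but precision modes vary by exchange, so check its documentation.

```python
from decimal import Decimal, ROUND_DOWN


def prepare_order_size(size: str, price: str, step_size: str, min_notional: str) -> Decimal:
    """Round size down to the market's step size; raise if the order is too small."""
    # Pass strings (or Decimals), not floats, so no binary rounding error sneaks in
    step = Decimal(step_size)
    # Round down so the agent never sends more than it intended
    rounded = (Decimal(size) / step).to_integral_value(rounding=ROUND_DOWN) * step
    notional = rounded * Decimal(price)
    if rounded <= 0 or notional < Decimal(min_notional):
        raise ValueError(f"order value {notional} below minimum {min_notional}")
    return rounded
```

Raising instead of silently bumping the size up keeps the final order within the limits the guardrails already approved.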

Observability: What to Log When Things Go Wrong

When your agent loses money, you need to reconstruct what happened. Logs must capture:

  • Every order placement (symbol, side, size, price, timestamp)
  • Every fill (actual execution price, fees paid)
  • Every decision (why did the agent place this order?)
  • Every error (API failures, validation failures, reconnection events)

Structured logging pattern:

import logging

logger = logging.getLogger("trading_agent")

logger.info("order_placed", extra={
    "symbol": "BTC-USD",
    "side": "buy",
    "size": 0.1,
    "price": 45000,
    "order_id": "abc123",
    "reason": "mean_reversion_signal",
    "account_balance": 10000,
    "position_before": 0.0
})
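
With Python's standard `logging`, fields passed via `extra` become attributes on the `LogRecord`, but the default formatter never prints them. The minimal JSON formatter sketch below does print them. It computes the set of built-in record attributes from a blank record instead of hardcoding it. The output field names `ts_ms`, `level`, and `event` are assumptions.

```python
import json
import logging

# Attribute names every LogRecord has; anything else came from `extra`
_RESERVED = set(logging.LogRecord("", 0, "", 0, "", None, None).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts_ms": int(record.created * 1000),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        payload.update({k: v for k, v in record.__dict__.items() if k not in _RESERVED})
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("trading_agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Each call then emits one JSON object per line, and most log pipelines can ingest that without a custom parser.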

Store logs in a queryable format (Elasticsearch, ClickHouse). You need to answer questions like:

  • What was the agent’s position size when it placed this order?
  • How many times did the WebSocket reconnect in the last hour?
  • What percentage of orders were rejected due to insufficient balance?
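
For a small deployment, the same questions can be answered straight from JSON-lines log files. This sketch assumes each line is a JSON object with a `ts_ms` timestamp and an `event` name. It also assumes `order_placed` is logged only for accepted orders and `order_rejected` for rejected ones. These event names and the `reason` value `insufficient_balance` are hypothetical.

```python
import json
from typing import Iterable


def summarize(lines: Iterable[str], now_ms: int, window_ms: int = 3_600_000) -> dict:
    """Count reconnects in the window and the share of orders rejected for insufficient balance."""
    reconnects = placed = rejected = rejected_balance = 0
    for line in lines:
        entry = json.loads(line)
        if entry.get("ts_ms", 0) < now_ms - window_ms:
            continue                                  # Outside the time window
        event = entry.get("event")
        if event == "ws_reconnect":
            reconnects += 1
        elif event == "order_placed":
            placed += 1
        elif event == "order_rejected":
            rejected += 1
            if entry.get("reason") == "insufficient_balance":
                rejected_balance += 1
    attempts = placed + rejected
    return {
        "reconnects_in_window": reconnects,
        "pct_rejected_insufficient_balance":
            100.0 * rejected_balance / attempts if attempts else 0.0,
    }
```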

Technical Verdict

Use autonomous trading agents when:

  • You have a tested strategy with positive expectancy in paper trading (not just backtesting)
  • You have external guardrails (kill switch, position limits) independent of agent code
  • You can afford to lose the entire account balance (seriously)
  • You have observability infrastructure to debug losses post-mortem

Avoid when:

  • You’re moving from backtesting directly to live trading without paper trading
  • Your agent has no external kill switch or position limits
  • You don’t have a plan for handling exchange outages and WebSocket reconnections
  • You’re using this as a primary income source (the failure modes are too unpredictable)

The infrastructure gap between “I built a trading bot” and “I built a trading bot that won’t liquidate my account” is wider than most engineers expect. The plumbing is not glamorous, but it’s the difference between a learning experience and a financial disaster.