mech.app
Financial

Enterprise Agent Pricing: How Anthropic and OpenAI Switched from Seat Licenses to API Metering

April 2026 marked a pricing pivot where both labs moved enterprise customers from flat seat pricing to per-token API costs, exposing the infrastructure...

Source: simonwillison.net
Enterprise Agent Pricing: How Anthropic and OpenAI Switched from Seat Licenses to API Metering

Uber’s budget overrun in Q1 2026 reveals the core problem: seat-based forecasting cannot predict agent token burn, which can exceed annual budgets in months. April 2026 marked a quiet but significant shift in how enterprise customers pay for coding agents. Both Anthropic and OpenAI moved from flat seat-based pricing to per-token API metering, and the results are showing up in corporate budgets. Microsoft started canceling Claude Code licenses before the end of its fiscal year. The pricing model exposes a fundamental mismatch: enterprise procurement systems expect predictable monthly costs, but agents generate variable token consumption that can swing 50x based on activity.

This article examines the metering infrastructure, observability requirements, and billing boundaries that emerge when you move from $20/seat to $1,000+/month per heavy user.

The Pricing Pivot

Anthropic switched its Enterprise plan from “enough usage for a typical workday” to $20/seat/month plus API pricing sometime between August 2025 and April 2026. An Anthropic spokesperson claimed the change occurred in November 2025, but customers only discovered it when renewing contracts in April 2026.

OpenAI made a similar move on April 2, 2026, updating Codex pricing to align with API token usage instead of per-message pricing. By April 23, 2026, all existing ChatGPT Enterprise plans (including Edu, Health, Gov, and ChatGPT for Teachers) were on the new model.

Both companies released new frontier models in April with higher API prices, locking enterprise customers into these rates through year-long contracts.

ModelRelease DatePrice vs. PreviousActual Token CostEnterprise Lock-In
GPT-5.5April 23, 20262x GPT-5.4~$0.20/1K input tokensYear-long contracts at new API rates
Opus 4.7April 16, 20261.4x Opus 4.6 (after tokenizer adjustment)~$0.15/1K input tokensYear-long contracts at new API rates

Enterprise customers who signed year-long deals are now locked into these API prices, not the previous extreme discounts.

The Token Consumption Gap

Simon Willison’s personal usage illustrates the gap between subscription pricing and API-equivalent costs. Over 30 days, he consumed:

  • $1,199.79 in Anthropic Claude Code tokens
  • $980.37 in OpenAI Codex tokens
  • Total: $2,180.16 for $200 in subscriptions

That’s an 11x difference, and Willison describes himself as a “moderately heavy user” who is not running agents every hour of the day and night. Heavy users in enterprise environments can easily exceed $1,000/month per seat in API costs.

The assumption that enterprise customers were getting similar discounts turned out to be wrong. As of April 2026, the “Enterprise” cost for both OpenAI Codex and Anthropic Claude Code/Cowork matches the listed API price.

Metering and Billing Infrastructure

Moving from seat licenses to token metering requires new infrastructure at multiple layers.

Session Boundary Problems

Agents run across multiple sessions, often autonomously overnight. Traditional SaaS billing assumes a user logs in, performs actions, and logs out. Agents do not follow this pattern. Willison’s overnight agent runs demonstrate the problem: token consumption happens when no human is watching. You need:

  • Session persistence that tracks token usage across disconnects and reconnects
  • Attribution logic that ties agent activity back to the initiating user or team
  • Timeout policies that prevent runaway agents from burning unlimited tokens

Real-Time Cost Visibility

When a $20/month seat can generate $1,000+ in API costs, observability becomes a procurement issue. Anthropic’s hybrid model charges $20/seat/month plus API costs, but Willison’s data shows heavy users hit $1,000+/month in API charges alone. Required components:

  • Per-user token counters updated in near real-time
  • Budget threshold alerts before costs spiral
  • Historical usage dashboards that show trends and anomalies
  • Cost attribution by project, team, or department

Billing System Integration

Enterprise procurement systems expect predictable monthly costs. Variable token consumption breaks this model. Anthropic’s $20/seat + API pricing structure creates a forecasting gap: the seat fee is predictable, but the API charges can swing 50x based on agent activity.

Rate Limiting and Quotas

To prevent budget overruns, you need rate limiting at multiple scopes:

# Server-side metering service that runs before agent task dispatch
# Note: In production, use atomic operations or database transactions
# to prevent race conditions on current_usage updates
class AgentTokenBudget:
    def __init__(self, user_id, monthly_limit_usd):
        self.user_id = user_id
        self.monthly_limit = monthly_limit_usd
        self.current_usage = self.get_current_month_usage()
    
    def check_budget(self, estimated_tokens, cost_per_token):
        estimated_cost = estimated_tokens * cost_per_token
        projected_total = self.current_usage + estimated_cost
        
        if projected_total > self.monthly_limit:
            raise BudgetExceededError(
                f"User {self.user_id} would exceed monthly limit: "
                f"${projected_total:.2f} > ${self.monthly_limit:.2f}"
            )
        
        return True
    
    def record_usage(self, actual_tokens, cost_per_token):
        actual_cost = actual_tokens * cost_per_token
        self.current_usage += actual_cost
        self.emit_usage_metric(actual_cost)

This runs on the server-side metering service before dispatching agent tasks. It requires:

  • Pre-flight budget checks before starting long-running agent tasks
  • Incremental usage recording as tokens are consumed
  • Graceful degradation when budgets are exhausted (queue tasks, notify user, or switch to cheaper models)

Observability Requirements

When token costs become material, you need observability that answers:

  1. Which agents are burning tokens? Specific agent workflows or tasks, not just which users.
  2. Why did usage spike? Was it a runaway loop, a legitimate batch job, or a model upgrade that changed token efficiency?
  3. Can we forecast next month? Historical patterns, seasonal trends, and growth rates.

Logging infrastructure must capture:

  • Token counts per API call (input and output separately)
  • Model version and pricing tier
  • Agent task type and duration
  • User attribution and project tags
  • Retry attempts and error-induced token waste

To identify which agents are burning tokens, query your usage logs:

-- Schema: usage_logs (user_id, agent_type, tokens, timestamp, model_version, cost_usd)
SELECT agent_type, SUM(tokens) as total_tokens, COUNT(*) as runs
FROM usage_logs
WHERE user_id = 'user_12345'
  AND timestamp >= NOW() - INTERVAL '30 days'
GROUP BY agent_type
ORDER BY total_tokens DESC
LIMIT 10;

This shows the top 10 agent types by token consumption for a specific user over the past 30 days.

The Inference Spend Floor

Anthropic’s SpaceX S-1 filing revealed they are paying $1.25 billion per month for compute capacity on Colossus and Colossus II through May 2029. This is for inference, not training, and represents just one of their vendors.

If Anthropic is spending $1.25 billion/month on inference from a single provider, the total inference budget across all vendors is likely several times higher. This suggests inference costs are a material portion of their revenue target.

Their rumored $10.9 billion Q2 2026 revenue suggests they are finding enough heavy users to cover these costs.

Enterprise Sales Headcount

Both companies are hiring aggressively for enterprise sales. OpenAI has 229 of 703 jobs (32.6%) in enterprise sales and support. Anthropic has 105 of 390 jobs (26.9%) in enterprise sales and support.

This is a labor-intensive business model. Enterprise contracts do not close themselves, and each customer requires account executives, forward-deployed engineers, and ongoing support.

Procurement Misalignment: Uber and Microsoft Case Studies

The Uber case is instructive. CTO Praveen Neppalli Naga indicated that Uber “maxed out its full year AI budget just a few months into 2026,” mostly due to Claude Code usage. Given that Claude Code only became genuinely useful in November 2025, a budget set in 2025 could not have predicted 2026 demand.

Microsoft’s seat cancellations before the June 30 fiscal year-end suggest similar budget pressure. The Verge reported that sources described the decision as “also a financial one,” not just a dogfooding strategy.

These are not AI failures. They are pricing model mismatches. The best advice on pricing is that your customer should suck air through their teeth and then say yes. Uber’s budget overrun and Microsoft’s seat cancellations look like that effect playing out.

Technical Verdict

Use token metering only if your agents run predictable workloads where you can forecast token consumption within 20% accuracy month-over-month and your finance team can absorb monthly cost swings of 10x to 50x without triggering procurement freezes. You also need real-time usage dashboards (Anthropic’s billing API or OpenAI’s usage dashboard), engineering capacity to implement pre-flight budget checks and graceful degradation, and a billing platform (Stripe Billing or similar) that handles variable monthly costs without manual intervention.

Avoid token metering if your agents run autonomously overnight without clear session boundaries or timeout policies, your procurement system requires fixed monthly costs for annual budget approval cycles (as Uber and Microsoft discovered), you lack observability infrastructure to track per-user token usage in real-time, or your workloads include unpredictable batch jobs or exploratory agent runs that can spike token usage by 100x in a single day. In these cases, negotiate hybrid caps with a fixed monthly ceiling or stick with seat licenses that bundle generous token allowances.

The April 2026 pricing pivot aligns revenue with the actual cost structure of inference at scale. Enterprise customers are now paying the real cost of running agents that burn tokens 24/7. The companies that can build metering, observability, and budget control infrastructure will succeed. The ones that cannot will hit budget limits and cancel seats.