Uber’s budget overrun early in 2026 reveals the core problem: seat-based forecasting cannot predict agent token burn, which can exhaust an annual budget in months. April 2026 marked a quiet but significant shift in how enterprise customers pay for coding agents. Both Anthropic and OpenAI moved from flat seat-based pricing to per-token API metering, and the results are showing up in corporate budgets: Microsoft started canceling Claude Code licenses before its June 30 fiscal year-end. The new pricing exposes a fundamental mismatch. Enterprise procurement systems expect predictable monthly costs, but agents generate variable token consumption that can swing 50x based on activity.
This article examines the metering infrastructure, observability requirements, and billing boundaries that emerge when you move from $20/seat to $1,000+/month per heavy user.
The Pricing Pivot
Anthropic switched its Enterprise plan from “enough usage for a typical workday” to $20/seat/month plus API pricing sometime between August 2025 and April 2026. An Anthropic spokesperson claimed the change occurred in November 2025, but customers only discovered it when renewing contracts in April 2026.
OpenAI made a similar move on April 2, 2026, updating Codex pricing to align with API token usage instead of per-message pricing. By April 23, 2026, all existing ChatGPT Enterprise plans (including Edu, Health, Gov, and ChatGPT for Teachers) were on the new model.
Both companies released new frontier models in April with higher API prices, locking enterprise customers into these rates through year-long contracts.
| Model | Release Date | Price vs. Previous Model | Approx. Input Token Price | Enterprise Lock-In |
|---|---|---|---|---|
| GPT-5.5 | April 23, 2026 | 2x GPT-5.4 | ~$0.20/1K input tokens | Year-long contracts at new API rates |
| Opus 4.7 | April 16, 2026 | 1.4x Opus 4.6 (after tokenizer adjustment) | ~$0.15/1K input tokens | Year-long contracts at new API rates |
Enterprise customers who signed year-long deals are now locked into these API prices rather than the steep effective discounts of the earlier flat-rate plans.
The Token Consumption Gap
Simon Willison’s personal usage illustrates the gap between subscription pricing and API-equivalent costs. Over 30 days, he consumed:
- $1,199.79 in Anthropic Claude Code tokens
- $980.37 in OpenAI Codex tokens
- Total: $2,180.16 for $200 in subscriptions
That’s an 11x difference, and Willison describes himself as a “moderately heavy user” who is not running agents every hour of the day and night. Heavy users in enterprise environments can easily exceed $1,000/month per seat in API costs.
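The arithmetic behind that multiple is easy to reproduce. The snippet below is a minimal sketch that uses only the figures quoted above; the variable names are illustrative.

```python
from decimal import Decimal

# Figures quoted above from Simon Willison's 30-day usage.
claude_code_usd = Decimal("1199.79")
codex_usd = Decimal("980.37")
subscriptions_usd = Decimal("200")

# API-equivalent spend versus what the subscriptions actually cost.
api_equivalent_usd = claude_code_usd + codex_usd
multiple = api_equivalent_usd / subscriptions_usd

print(f"API-equivalent spend: ${api_equivalent_usd:,.2f}")
print(f"Multiple of subscription price: {multiple:.1f}x")
```

The exact ratio is about 10.9x, which rounds to the 11x quoted above.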
The assumption that enterprise customers were getting discounts similar to those built into individual subscriptions turned out to be wrong. As of April 2026, the “Enterprise” cost for both OpenAI Codex and Anthropic Claude Code/Cowork matches the listed API price.
Metering and Billing Infrastructure
Moving from seat licenses to token metering requires new infrastructure at multiple layers.
Session Boundary Problems
Agents run across multiple sessions, often autonomously overnight. Traditional SaaS billing assumes a user logs in, performs actions, and logs out. Agents do not follow this pattern. Overnight agent runs demonstrate the problem: token consumption happens when no human is watching. You need:
- Session persistence that tracks token usage across disconnects and reconnects
- Attribution logic that ties agent activity back to the initiating user or team
- Timeout policies that prevent runaway agents from burning unlimited tokens
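One way to picture these three requirements together is a ledger keyed by agent run rather than by connection. The sketch below is an assumption-laden illustration, not a vendor API: `AgentRun`, `RunLedger`, and all field names are hypothetical, storage is in memory, and the caller supplies the clock value (for example `time.time()`).

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One logical agent task; it may span many connections."""
    run_id: str
    initiated_by: str            # user or team that started the run
    started_at: float            # clock value when the run began
    max_tokens: int              # hard token cap for the whole run
    max_runtime_seconds: float   # wall-clock cap for the whole run
    tokens_used: int = 0


class RunLedger:
    """Hypothetical in-memory ledger keyed by run ID, not by connection."""

    def __init__(self):
        self._runs = {}

    def start(self, run_id, initiated_by, max_tokens, max_runtime_seconds, now):
        self._runs[run_id] = AgentRun(
            run_id, initiated_by, now, max_tokens, max_runtime_seconds
        )

    def record(self, run_id, tokens, now):
        """Always record usage; return False when the run should be stopped."""
        run = self._runs[run_id]
        run.tokens_used += tokens
        within_tokens = run.tokens_used <= run.max_tokens
        within_time = (now - run.started_at) <= run.max_runtime_seconds
        return within_tokens and within_time

    def usage_by_initiator(self):
        """Attribute tokens back to whoever started each run."""
        totals = {}
        for run in self._runs.values():
            totals[run.initiated_by] = totals.get(run.initiated_by, 0) + run.tokens_used
        return totals


ledger = RunLedger()
ledger.start("run-1", "alice", max_tokens=1000, max_runtime_seconds=3600, now=0.0)
first = ledger.record("run-1", 400, now=10.0)      # within both caps
second = ledger.record("run-1", 400, now=5000.0)   # after a reconnect, past the time cap
```

Because usage is keyed by `run_id`, a reconnect does not reset the counter. Tokens are recorded even when a cap is breached, since they have already been spent; the return value only tells the dispatcher whether to let the run continue.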
Real-Time Cost Visibility
When a $20/month seat can generate $1,000+ in API costs, observability becomes a procurement issue. Anthropic’s hybrid model charges $20/seat/month plus API costs, and Willison’s data shows a self-described moderately heavy user running up about $1,200 in a month of Claude Code API-equivalent usage alone. Required components:
- Per-user token counters updated in near real-time
- Budget threshold alerts before costs spiral
- Historical usage dashboards that show trends and anomalies
- Cost attribution by project, team, or department
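A per-user counter with threshold alerts and project attribution can be sketched in a few lines. Everything here is hypothetical: the `SpendMonitor` class, the 50/80/100% thresholds, and the in-memory storage are assumptions for illustration, and a production version would sit on a shared store.

```python
from collections import defaultdict


class SpendMonitor:
    """Hypothetical per-user spend counter with one-time threshold alerts."""

    def __init__(self, monthly_budget_usd, thresholds=(0.5, 0.8, 1.0)):
        self.monthly_budget_usd = monthly_budget_usd
        self.thresholds = sorted(thresholds)
        self.spend = defaultdict(float)    # (user, project) -> USD
        self.alerted = defaultdict(set)    # user -> thresholds already fired

    def user_total(self, user):
        return sum(v for (u, _), v in self.spend.items() if u == user)

    def add(self, user, project, cost_usd):
        """Record spend; return thresholds the user has newly crossed."""
        self.spend[(user, project)] += cost_usd
        total = self.user_total(user)
        crossed = []
        for t in self.thresholds:
            if total >= t * self.monthly_budget_usd and t not in self.alerted[user]:
                self.alerted[user].add(t)
                crossed.append(t)
        return crossed

    def by_project(self):
        """Cost attribution rolled up by project tag."""
        totals = defaultdict(float)
        for (_, project), v in self.spend.items():
            totals[project] += v
        return dict(totals)


monitor = SpendMonitor(monthly_budget_usd=100.0)
a = monitor.add("alice", "api", 40.0)   # 40% of budget, no alert
b = monitor.add("alice", "web", 45.0)   # 85% of budget, crosses 50% and 80%
c = monitor.add("alice", "api", 20.0)   # 105% of budget, crosses 100%
```

Each threshold fires once per user, so the alert channel is not flooded while a user sits just above a limit.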
Billing System Integration
Enterprise procurement systems expect predictable monthly costs. Variable token consumption breaks this model. Anthropic’s $20/seat + API pricing structure creates a forecasting gap: the seat fee is predictable, but the API charges can swing 50x based on agent activity.
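To make the forecasting gap concrete, the sketch below computes a monthly bill range for a team. The $20 seat fee comes from the pricing described above; the seat count and the per-seat API spend bounds ($20 and $1,000, a 50x spread) are assumptions chosen purely for illustration.

```python
SEAT_FEE_USD = 20  # per seat per month, from the pricing described above


def monthly_bill_range(seats, api_low_usd, api_high_usd):
    """Return (low, high) monthly bill given per-seat API spend bounds."""
    fixed = seats * SEAT_FEE_USD
    return fixed + seats * api_low_usd, fixed + seats * api_high_usd


# Assumed scenario: 100 seats, each spending between $20 and $1,000 on API usage.
low, high = monthly_bill_range(seats=100, api_low_usd=20, api_high_usd=1000)
print(f"Fixed seat fees: ${100 * SEAT_FEE_USD:,}")
print(f"Monthly bill range: ${low:,} to ${high:,}")
```

Under these assumptions the fixed portion is $2,000, while the total ranges from $4,000 to $102,000, so the predictable part of the bill tells finance almost nothing about the total.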
Rate Limiting and Quotas
To prevent budget overruns, you need limits at multiple scopes, such as per user, per team, and per organization. The sketch below covers the per-user monthly scope:
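# Server-side metering service that runs before agent task dispatch.
# Note: In production, use atomic operations or database transactions
# to prevent race conditions on current_usage updates.

class BudgetExceededError(Exception):
    """Raised when a task's estimated cost would exceed the monthly limit."""


class AgentTokenBudget:
    def __init__(self, user_id, monthly_limit_usd):
        self.user_id = user_id
        self.monthly_limit = monthly_limit_usd
        self.current_usage = self.get_current_month_usage()

    def get_current_month_usage(self):
        # Placeholder: load month-to-date spend (USD) from your usage store.
        return 0.0

    def emit_usage_metric(self, cost_usd):
        # Placeholder: forward the cost to your metrics pipeline.
        pass

    def check_budget(self, estimated_tokens, cost_per_token):
        # Pre-flight check: reject the task if its estimated cost would
        # push month-to-date spend over the limit.
        estimated_cost = estimated_tokens * cost_per_token
        projected_total = self.current_usage + estimated_cost
        if projected_total > self.monthly_limit:
            raise BudgetExceededError(
                f"User {self.user_id} would exceed monthly limit: "
                f"${projected_total:.2f} > ${self.monthly_limit:.2f}"
            )
        return True

    def record_usage(self, actual_tokens, cost_per_token):
        # Record actual spend once the real token count is known.
        actual_cost = actual_tokens * cost_per_token
        self.current_usage += actual_cost
        self.emit_usage_metric(actual_cost)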
This runs on the server-side metering service before dispatching agent tasks. It requires:
- Pre-flight budget checks before starting long-running agent tasks
- Incremental usage recording as tokens are consumed
- Graceful degradation when budgets are exhausted (queue tasks, notify user, or switch to cheaper models)
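The graceful-degradation step can be sketched as a small dispatch planner that tries models in preference order and queues the task when none fits the remaining budget. The function name, the model labels, and the per-million-token prices below are all hypothetical placeholders, not real price-list values.

```python
def plan_dispatch(remaining_usd, estimated_tokens, usd_per_million_by_model):
    """Pick the first model, in preference order, whose estimated cost fits.

    usd_per_million_by_model is a dict ordered from preferred to cheapest.
    Returns ("run", model, cost) or ("queue", None, None).
    """
    for model, usd_per_million in usd_per_million_by_model.items():
        cost = estimated_tokens / 1_000_000 * usd_per_million
        if cost <= remaining_usd:
            return ("run", model, cost)
    # Nothing affordable: queue the task and notify the user instead of failing.
    return ("queue", None, None)


# Hypothetical model labels and prices, for illustration only.
prices = {"frontier-model": 15.0, "small-model": 1.0}

plenty = plan_dispatch(50.0, 1_000_000, prices)   # preferred model fits
tight = plan_dispatch(10.0, 1_000_000, prices)    # falls back to the cheaper model
empty = plan_dispatch(0.5, 1_000_000, prices)     # nothing fits, so queue
```

Returning a decision rather than raising keeps the policy (queue, notify, or downgrade) in one place, separate from the code that calls the model.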
Observability Requirements
When token costs become material, you need observability that answers:
- Which agents are burning tokens? Specific agent workflows or tasks, not just which users.
- Why did usage spike? Was it a runaway loop, a legitimate batch job, or a model upgrade that changed token efficiency?
- Can we forecast next month? Historical patterns, seasonal trends, and growth rates.
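For the "why did usage spike" question, a simple first pass is to flag days whose usage far exceeds a trailing baseline. The sketch below is one possible heuristic, not a recommended standard: the 7-day window, the 3x factor, and the sample data are assumptions.

```python
from statistics import median


def find_spikes(daily_tokens, window=7, factor=3.0):
    """Return indexes of days whose usage exceeds factor x the trailing median."""
    spikes = []
    for i in range(window, len(daily_tokens)):
        baseline = median(daily_tokens[i - window:i])
        if baseline > 0 and daily_tokens[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up daily token counts with one obvious outlier at index 8.
daily = [100, 110, 90, 105, 95, 100, 102, 98, 900, 101]
spike_days = find_spikes(daily)
```

A median baseline is used here so that a single runaway day does not inflate the baseline for the days that follow it. Flagged days still need the per-call logs to distinguish a runaway loop from a legitimate batch job.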
Logging infrastructure must capture:
- Token counts per API call (input and output separately)
- Model version and pricing tier
- Agent task type and duration
- User attribution and project tags
- Retry attempts and error-induced token waste
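A structured log record covering these fields might look like the following. This is a sketch under assumptions: the `UsageEvent` name, the field names, and the sample values are all illustrative, and real systems would add timestamps and request IDs.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UsageEvent:
    """Hypothetical per-call usage record; field names are illustrative."""
    user_id: str
    project: str
    agent_type: str
    model_version: str
    pricing_tier: str
    input_tokens: int
    output_tokens: int
    duration_seconds: float
    retry_attempt: int = 0   # 0 means the first try
    failed: bool = False     # tokens spent on a failed call are waste

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


def wasted_tokens(events):
    """Total tokens consumed by calls that ended in an error."""
    return sum(e.input_tokens + e.output_tokens for e in events if e.failed)


events = [
    UsageEvent("user_12345", "checkout", "refactor", "model-x", "standard",
               input_tokens=1200, output_tokens=300, duration_seconds=4.2,
               failed=True),
    UsageEvent("user_12345", "checkout", "refactor", "model-x", "standard",
               input_tokens=1200, output_tokens=350, duration_seconds=4.0,
               retry_attempt=1),
]
waste = wasted_tokens(events)
```

Keeping input and output counts separate matters because the two are typically priced differently, and keeping the retry counter on each record makes error-induced waste a simple aggregation.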
To identify which agents are burning tokens, query your usage logs (the interval arithmetic below uses PostgreSQL syntax):
-- Schema: usage_logs (user_id, agent_type, tokens, timestamp, model_version, cost_usd)
SELECT agent_type, SUM(tokens) as total_tokens, COUNT(*) as runs
FROM usage_logs
WHERE user_id = 'user_12345'
AND timestamp >= NOW() - INTERVAL '30 days'
GROUP BY agent_type
ORDER BY total_tokens DESC
LIMIT 10;
This shows the top 10 agent types by token consumption for a specific user over the past 30 days.
The Inference Spend Floor
SpaceX’s S-1 filing revealed that Anthropic is paying $1.25 billion per month for compute capacity on Colossus and Colossus II through May 2029. This is for inference, not training, and represents just one of Anthropic’s vendors.
If Anthropic is spending $1.25 billion/month on inference from a single provider, the total inference budget across all vendors is likely several times higher. This suggests inference costs are a material portion of their revenue target.
Their rumored $10.9 billion Q2 2026 revenue suggests they are finding enough heavy users to cover these costs.
Enterprise Sales Headcount
Both companies are hiring aggressively for enterprise sales. OpenAI lists 229 of its 703 open roles (32.6%) in enterprise sales and support. Anthropic lists 105 of its 390 open roles (26.9%) in enterprise sales and support.
This is a labor-intensive business model. Enterprise contracts do not close themselves, and each customer requires account executives, forward-deployed engineers, and ongoing support.
Procurement Misalignment: Uber and Microsoft Case Studies
The Uber case is instructive. CTO Praveen Neppalli Naga indicated that Uber “maxed out its full year AI budget just a few months into 2026,” mostly due to Claude Code usage. Given that Claude Code only became genuinely useful in November 2025, a budget set in 2025 could not have predicted 2026 demand.
Microsoft’s seat cancellations before the June 30 fiscal year-end suggest similar budget pressure. The Verge reported that sources described the decision as “also a financial one,” not just a dogfooding strategy.
These are not AI failures. They are pricing model mismatches. The best advice on pricing is that your customer should suck air through their teeth and then say yes. Uber’s budget overrun and Microsoft’s seat cancellations look like that effect playing out.
Technical Verdict
Use token metering only if all of the following hold:
- Your agents run predictable workloads where you can forecast token consumption within 20% accuracy month-over-month.
- Your finance team can absorb monthly cost swings of 10x to 50x without triggering procurement freezes.
- You have real-time usage dashboards (Anthropic’s billing API or OpenAI’s usage dashboard).
- You have the engineering capacity to implement pre-flight budget checks and graceful degradation.
- Your billing platform (Stripe Billing or similar) handles variable monthly costs without manual intervention.
Avoid token metering if any of the following apply:
- Your agents run autonomously overnight without clear session boundaries or timeout policies.
- Your procurement system requires fixed monthly costs for annual budget approval cycles (as Uber and Microsoft discovered).
- You lack observability infrastructure to track per-user token usage in real time.
- Your workloads include unpredictable batch jobs or exploratory agent runs that can spike token usage by 100x in a single day.

In these cases, negotiate hybrid caps with a fixed monthly ceiling or stick with seat licenses that bundle generous token allowances.
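The 20% forecasting criterion can be checked against your own billing history before committing to a metered contract. The sketch below assumes you already have paired monthly forecasts and actuals; the function name and the sample figures are hypothetical.

```python
def within_forecast_tolerance(forecast_usd, actual_usd, tolerance=0.20):
    """True if every month's actual spend is within tolerance of its forecast."""
    return all(
        abs(actual - forecast) <= tolerance * forecast
        for forecast, actual in zip(forecast_usd, actual_usd)
    )


# Made-up monthly figures: the second series misses its March forecast by 40%.
steady = within_forecast_tolerance([1000, 1000, 1000], [1100, 900, 1050])
spiky = within_forecast_tolerance([1000, 1000, 1000], [1100, 900, 1400])
```

If a few months of history fail this check, that is a signal to negotiate a cap or a bundled allowance rather than pure metering.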
The April 2026 pricing pivot aligns revenue with the actual cost structure of inference at scale. Enterprise customers are now paying the real cost of running agents that burn tokens 24/7. The companies that can build metering, observability, and budget control infrastructure will succeed. The ones that cannot will hit budget limits and cancel seats.
Source Links
- I think Anthropic and OpenAI have found product-market fit - Simon Willison