Uber capped Claude Code usage after burning through its 2026 AI budget in four months: here's what that means for enterprise rollout
Uber burned through its 2026 AI budget in four months and capped Claude Code access. The story isn't about Uber's failure — it's about what happens when you budget for 2025 usage patterns and ship 2026 agents.
Bloomberg reported on June 2nd that Uber had exhausted its 2026 AI budget in four months and capped access to Claude Code and similar tools. The framing was predictable: wasteful engineers, runaway costs, AI hype colliding with CFO reality.
That framing misses the actual story.
Uber set that budget in late 2025, before Claude Mythos shipped, before OpenAI o5 went multi-turn, before any finance team could model what coding agents would actually cost at scale. They budgeted for 2025 usage patterns — occasional Copilot autocomplete, some manual Claude conversations, maybe a few Cursor sessions per engineer per week. Then 2026 agents shipped and engineers started running multi-hour autonomous coding sessions that burn 500K tokens on a single PR review.
The budget didn't fail. The prediction surface changed underneath it.
What the cap actually looks like
Uber didn't kill the tools. They capped daily usage per engineer and added a request process for extended access. If you need more tokens, you file a ticket explaining the use case. That's not irrational — it's the same pattern we use when a client's voice agent starts processing 10× expected call volume and we need to raise the OpenAI rate limit. You don't shut it down; you add a control layer and re-forecast.
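The control layer described above (a daily cap per engineer plus a ticketed request for more) can be sketched in a few lines. This is a minimal illustration, not Uber's implementation: the class name, the method names, and the 1,000,000-token default cap are all hypothetical, and a real system would also reset usage daily and persist state.

```python
from dataclasses import dataclass, field


@dataclass
class DailyTokenCap:
    """Hypothetical per-engineer daily token cap with approved extensions."""

    default_cap: int
    used: dict[str, int] = field(default_factory=dict)        # engineer -> tokens spent today
    extensions: dict[str, int] = field(default_factory=dict)  # engineer -> extra tokens approved

    def limit_for(self, engineer: str) -> int:
        """Today's limit: the default cap plus any approved extension."""
        return self.default_cap + self.extensions.get(engineer, 0)

    def approve_extension(self, engineer: str, extra_tokens: int) -> None:
        """Record an approved ticket granting extra tokens for today."""
        self.extensions[engineer] = self.extensions.get(engineer, 0) + extra_tokens

    def try_spend(self, engineer: str, tokens: int) -> bool:
        """Record the usage and return True if it fits under today's limit."""
        spent = self.used.get(engineer, 0)
        if spent + tokens > self.limit_for(engineer):
            return False  # over cap: route the engineer to the request process
        self.used[engineer] = spent + tokens
        return True


# Assumed numbers: a 1M-token daily cap and 500K-token PR reviews.
cap = DailyTokenCap(default_cap=1_000_000)
first = cap.try_spend("alice", 500_000)    # fits under the cap
second = cap.try_spend("alice", 600_000)   # would exceed the cap, so it is refused
cap.approve_extension("alice", 500_000)    # ticket approved
third = cap.try_spend("alice", 600_000)    # now fits under the raised limit
```

The point of the design is that a refusal is a routing decision, not a shutdown: the tool stays available and the extension request becomes a data point for the next forecast.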
The interesting part is what happens next. Uber now has four months of real usage data across thousands of engineers. They know which teams are getting ROI from agentic coding and which teams are burning tokens on low-value tasks. They know the unit economics of Claude Code for refactoring versus greenfield work. That data didn't exist in December 2025 when they set the budget.
Every enterprise is going to hit this. We saw it with Bergsify's Jordan agent in March — their call center operators started using the voice transcription for quality assurance, then for training new hires, then for compliance audits. Token usage tripled in six weeks. We didn't cap it; we moved them to a tiered pricing structure where the first 200 calls/day are flat-rate and overage is metered. Same control pattern Uber's applying now.
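As a rough sketch of that tiered structure: the 200 calls/day threshold comes from the deployment described above, while the function name, the flat fee, and the per-call overage rate below are made-up placeholders, not actual pricing.

```python
def daily_charge(calls: int, flat_fee: float, overage_rate: float,
                 included_calls: int = 200) -> float:
    """Flat fee covers the first `included_calls`; each call beyond that is metered."""
    overage_calls = max(0, calls - included_calls)
    return flat_fee + overage_calls * overage_rate


# Hypothetical rates: $50/day flat, $0.50 per overage call.
quiet_day = daily_charge(150, flat_fee=50.0, overage_rate=0.5)  # under the threshold
busy_day = daily_charge(260, flat_fee=50.0, overage_rate=0.5)   # 60 overage calls
```

With these placeholder rates a quiet day costs the flat $50 and a 260-call day costs $80, so a usage spike shows up as a proportional line item instead of a hard stop.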
The real lesson for SMBs
If you're rolling out coding agents or voice agents or any agentic tooling in 2026, budget with a usage collar. Set a baseline forecast, add a 3× buffer for the first 90 days, and bake in a monthly re-forecast checkpoint. Don't try to predict February usage in November — you can't.
We do this on every voice agent deployment now. Golden Plate Events wanted Jordan handling event RSVPs. We scoped it at 150 calls/month based on their historical volume. By month two they were at 380 calls/month because customers started calling to modify orders mid-event, a use case that didn't exist before the agent made it frictionless. We didn't blow the budget because we'd structured it with overage tiers from day one.
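The usage collar can be expressed as a small monthly check. The sketch below applies the 3× buffer from the previous section to the Golden Plate call volumes purely as an illustration; the class, its field names, and the three status labels are assumptions, and the source does not say that a 3× buffer was the exact structure used on that account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageCollar:
    """Hypothetical collar: a baseline forecast plus a buffer multiple."""

    baseline: float               # forecast usage per month (calls, tokens, etc.)
    buffer_multiple: float = 3.0  # headroom for the first 90 days

    @property
    def ceiling(self) -> float:
        return self.baseline * self.buffer_multiple

    def checkpoint(self, observed: float) -> str:
        """Classify one month's observed usage at the re-forecast checkpoint."""
        if observed > self.ceiling:
            return "over_collar"   # budget is at risk: cap or renegotiate
        if observed > self.baseline:
            return "reforecast"    # inside the buffer: update the forecast
        return "on_forecast"


# Golden Plate numbers from the text: scoped at 150 calls/month, 380 by month two.
collar = UsageCollar(baseline=150)
status = collar.checkpoint(380)
growth = 380 / 150  # roughly 2.5x the baseline
```

At 380 calls against a 450-call ceiling, the month-two checkpoint lands in the "reforecast" band: well above the baseline, still inside the buffer.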
Uber's doing the same thing now, just at 100× scale. The cap isn't a failure signal; it's a re-forecast checkpoint. The logical next step is to run usage analysis for 60 days, model the ROI per team, then either raise the budget or tighten the scope. That's what production rollout looks like when you're managing tokens as a line item.
The alternative is what happened to a fintech client we talked to in April. They gave every engineer unrestricted Claude Mythos access, no usage tracking, no cost allocation per team. Three months later they got a $340K bill and killed the whole program. Uber's cap is the sane version of that response.
What we're doing differently
For VioX OS deployments, we now include token budget modeling in the fixed-price scope. If you're deploying a voice agent or a coding assistant or a research agent, we estimate token burn for the first 90 days based on the task profile, then structure pricing with a baseline + metered overage. That way neither side gets surprised when usage scales faster than the initial model predicted.
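A 90-day token estimate of this kind reduces to simple arithmetic over a task profile. The sketch below is one way to lay it out, not the actual VioX OS model: the profile fields, the example tasks, their volumes, and the $10 per million tokens price are all hypothetical (only the 500K-token PR review figure echoes the earlier section).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of one recurring agent task."""

    name: str
    runs_per_day: float
    tokens_per_run: int


def forecast_tokens(profiles: list[TaskProfile], days: int = 90) -> float:
    """Total expected token burn over the forecast window."""
    daily = sum(p.runs_per_day * p.tokens_per_run for p in profiles)
    return daily * days


def forecast_cost(profiles: list[TaskProfile], usd_per_million_tokens: float,
                  days: int = 90) -> float:
    """Convert the token forecast to dollars at a blended per-million price."""
    return forecast_tokens(profiles, days) / 1_000_000 * usd_per_million_tokens


# Assumed workload: two agentic PR reviews and 100 short completions per day.
profiles = [
    TaskProfile("pr_review", runs_per_day=2, tokens_per_run=500_000),
    TaskProfile("autocomplete", runs_per_day=100, tokens_per_run=1_000),
]
tokens_90d = forecast_tokens(profiles)
cost_90d = forecast_cost(profiles, usd_per_million_tokens=10.0)
```

Under these assumptions the two daily PR reviews account for about 91% of the 99M-token forecast, which is why the estimate has to start from the task profile and not from headcount.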
It's the same reason we moved Goldie's ElevenLabs deployment to a tiered pricing structure after the first month. DreamersJoy's customers started asking the agent detailed care questions about orchids and succulents, which turned 15-second order confirmations into 3-minute horticultural consultations. Token costs tripled. We didn't cap it — we re-scoped the pricing and built a knowledge base so the agent could answer plant care questions without burning tokens on every call.
Uber's doing this at enterprise scale with thousands of engineers. We're doing it at SMB scale with dozens of agents. The pattern is identical: real usage always differs from the forecast, so build the pricing structure to handle variance without killing the program.
The budget collision was inevitable. The cap is the right response. Every company deploying agentic tools in 2026 will hit the same checkpoint. The ones that survive it are the ones that planned for re-forecasting from day one.