Uber Shares What Happens When 1,500 AI Agents Hit Production

At the MCP Dev Summit North America earlier this month, I was listening to Meghana Somasundara (Agentic AI Lead, Uber) and Rush Tehrani (Senior Engineering Manager leading the Agentic AI Platform, Uber) talk about what they’re building.
By their account, more than 90% of Uber’s 5,000+ engineers already use AI monthly for agentic workflows. They also have over 1,500 monthly active agents internally, running more than 60,000 executions per week.
What stood out to me was Meghana’s framing of the real risk: not deliberate misuse, but an agent causing serious damage by accident, faster than any human could react:
It takes us humans a lot more effort to break things. But with agents, it’s a lot faster, a lot quicker, and the blast radius is a lot higher.
What problems did Uber face when scaling AI?
Meghana and Rush’s talk focused on three problems that nearly made those numbers impossible to reach. The first was the lack of a shared way of building.
When agent adoption spreads organically across a large engineering organization, teams tend to build independently. At Uber, with over 10,000 internal services, that meant dozens of teams were building MCP servers and custom integrations on their own, without shared standards, central oversight, or any real way to reuse what others had already built.
The result was predictable: duplicated work and a growing stack of systems that only the original team really understood. As Meghana explains:
The simple truth was, if you can’t manage the development lifecycle, you just can’t trust it in production.
When agents start making decisions across systems, inconsistent implementations stop being a minor issue: they become harder to track, harder to debug, and even harder to trust.

The second problem was security. Agents operating across a complex service landscape could unknowingly call endpoints they shouldn’t, expose sensitive data, or trigger operations nobody intended. Add third-party MCP servers into the mix (Uber uses many external systems) and the governance problem scales quickly.
They needed full visibility into call patterns: who was accessing what data, under what conditions, and what happened when things went wrong. Without that, running agents in production at scale becomes a trust problem.
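To make that concrete, here’s a minimal sketch of the kind of audit trail this implies: every tool call records who made it, under what data scope, and how it ended. This is an illustration, not Uber’s implementation; the caller identity, scope string, and the `get_trip` stub are all invented.

```python
import functools
import json
import time

def audited(tool_fn, *, caller: str, data_scope: str):
    """Wrap a tool so every invocation leaves an audit record:
    who called it, which data scope it touched, and how it ended."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        record = {"tool": tool_fn.__name__, "caller": caller,
                  "scope": data_scope, "ts": time.time()}
        try:
            result = tool_fn(*args, **kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            record["status"] = f"error: {exc}"
            raise
        finally:
            # In production this would feed a central log pipeline;
            # printing keeps the sketch self-contained.
            print(json.dumps(record))
    return wrapper

def get_trip(trip_id: str) -> dict:
    # Hypothetical read-only tool; a stub stands in for the real endpoint.
    return {"trip_id": trip_id, "status": "completed"}

get_trip = audited(get_trip, caller="agent-42", data_scope="trips:read")
get_trip("T-123")
```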
Finding the right tool quickly became the third problem. As Rush put it:
How does an agent or the engineer building it actually find the right one?
Not just any MCP server, but one that’s reliable, performs well, and doesn’t quietly degrade everything built on top of it.
When discovery is left unmanaged, agents default to whatever is most visible rather than what actually works best. At smaller scale, that’s an annoyance, but across thousands of services, it becomes a systemic quality problem.
How Uber addressed these challenges
Uber’s answer to all three problems was a centralized MCP gateway and registry.
Meghana describes it as a central control plane that turns Uber’s endpoints into MCP tools, with service owners deciding what gets exposed and how it’s defined.
Every change flows through pull requests, passes security scans before deployment, and is continuously monitored in production, while a central registry (acting as the single source of truth) removes duplication and enforces tighter scrutiny on third-party MCPs.
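For a sense of what “service owners deciding what gets exposed” can look like in practice, here’s a small sketch using the official MCP Python SDK (the `mcp` package). The server name, tool, and stubbed response are invented for illustration, not Uber’s actual definitions.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("trips-service")  # hypothetical internal service name

@mcp.tool()
def get_trip_status(trip_id: str) -> str:
    """Read-only lookup: the service owner exposes exactly this
    operation and nothing else from the underlying service."""
    # A real server would call the internal trips endpoint here;
    # a stub keeps the sketch self-contained.
    return f"trip {trip_id}: completed"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

The point is the shape: one explicit, reviewable definition per exposed operation, which is what makes PR-gated changes and security scans workable in the first place.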

In their no-code Agent Builder, as Rush explained, engineers can pre-select specific tools from an MCP server so the model doesn’t have to decide which one to use, and they can also lock down parameters so the agent doesn’t have to infer them at runtime, reducing both the number of decisions the model makes and the number of things that can go wrong.
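A rough sketch of that idea, with invented tool names and parameters (`build_agent_toolset` is not Uber’s API): the engineer picks the tools and pins some arguments, so the model only fills in what’s left.

```python
import functools

# All tools a hypothetical MCP server exposes.
AVAILABLE_TOOLS = {
    "get_trip_status": lambda trip_id, region: f"{trip_id} in {region}: ok",
    "refund_trip":     lambda trip_id, region: f"{trip_id} refunded",
    "delete_account":  lambda user_id: f"{user_id} deleted",
}

def build_agent_toolset(selected: list[str], locked: dict[str, dict]):
    """Hand the model only the tools an engineer pre-selected, with
    some parameters fixed ahead of time so the agent never has to
    infer them at runtime."""
    toolset = {}
    for name in selected:
        pinned = locked.get(name, {})
        toolset[name] = functools.partial(AVAILABLE_TOOLS[name], **pinned)
    return toolset

# The agent sees one tool, with `region` already decided by a human.
tools = build_agent_toolset(
    selected=["get_trip_status"],
    locked={"get_trip_status": {"region": "us-east"}},
)
print(tools["get_trip_status"](trip_id="T-123"))
```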
Getting the infrastructure right shows up in adoption: their coding agent Minions generates about 1,800 code changes weekly and is used by 95% of Uber engineers, but that’s the output, not the real lesson.
On the roadmap are evaluation metrics in the registry to help teams spot reliable servers before committing, and “skills”, reusable MCP patterns with built-in A/B testing that bake evaluation into how knowledge is shared.
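If that lands, discovery might look something like the sketch below: agents and engineers filter the registry by measured reliability instead of by visibility. Every field and name here is hypothetical; Uber hasn’t shared a schema.

```python
from dataclasses import dataclass

# Hypothetical registry entries; field names are illustrative.
@dataclass
class RegistryEntry:
    name: str
    success_rate: float   # fraction of tool calls that succeeded
    p95_latency_ms: int

REGISTRY = [
    RegistryEntry("trips-mcp",    success_rate=0.999, p95_latency_ms=120),
    RegistryEntry("payments-mcp", success_rate=0.92,  p95_latency_ms=800),
]

def discover(min_success: float = 0.99) -> list[RegistryEntry]:
    """Return servers that clear a reliability bar, best first."""
    ok = [e for e in REGISTRY if e.success_rate >= min_success]
    return sorted(ok, key=lambda e: (-e.success_rate, e.p95_latency_ms))

print([e.name for e in discover()])  # ['trips-mcp']
```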
Does any of this apply if you’re not Uber?
Uber operates at a scale most engineering teams never see, with 10,000+ services in play. But while the complexity is extreme, the underlying failure patterns Meghana and Rush describe aren’t unique to them.
Teams often end up building the same integrations in parallel, with governance only becoming a priority after something breaks, and discovery treated as an afterthought. These problems appear well before reaching 1,500 agents: they surface as soon as multiple teams start using the same MCP infrastructure without a shared layer.
The Uber model won’t translate directly to smaller organizations. But if you’re already running MCP servers across more than two teams and nobody owns discoverability or access control yet, that gap could surface soon.


