MCP Co-Creator Explains Why MCP Needs More Than the Protocol to Scale

If you haven’t heard of MCP (Model Context Protocol), it’s a standardized way for AI models like GPTs and agents to connect to real-world tools – such as APIs, databases, and internal systems – without having to build custom integrations from scratch.
At the MCP Dev Summit North America, I heard from its co-creator David Soria Parra, a Member of Technical Staff at Anthropic, that the question is no longer how to use MCP, but what breaks under load.
There are roughly 110 million SDK downloads per month across tools like OpenAI Agents, LangChain, and other frameworks, and they are all speaking the same protocol. For comparison, it took React years to reach that scale. MCP did all that in under a year and a half.
That demand explains the speed of adoption. Teams weren’t looking for another framework; they needed a practical way to connect models to real systems without rebuilding the same integrations over and over.
MCP gave them a shared interface for that. But standardizing the interface doesn’t remove the underlying complexity.
Don’t blame MCP – blame the implementation
Once you stop focusing on how to connect AI to tools, you’re dealing with everything that comes after: context management, tool selection, authentication, latency, and how it all behaves beyond a demo, when it’s actually running in production.
The first thing that usually breaks is context. In most MCP setups, the simplest approach wins: expose a set of tools, pass them to the model, and let it decide. That works in demos, but in production it quickly becomes inefficient. David called this out:
People continuously complain about context bloat in MCP and end up blaming MCP for it. But the interesting part is that we already know the mechanisms to work around context bloat. This is called progressive discovery.
So, according to David, the issue isn’t the protocol itself but the way it’s implemented.
Tools come with metadata: descriptions, parameters, schemas. Across dozens of integrations, a significant portion of the context window is consumed before the model does any actual reasoning. In some cases, just listing available tools can take more than 20% of the context window.

What makes it worse is how models behave under pressure. With too many tools, selection gets unreliable – overlapping capabilities and weak signals lead to irrelevant or suboptimal tool calls.
More context doesn’t help; it just makes things less reliable. David explained how to “fix” this:
The idea behind progressive discovery is not to take all the 20, 50, 100 tools from an MCP server and naively dump them into the context window, but to use a more modern mechanism like tool search to load tools only when they’re needed.
Progressive discovery changes the model from a static consumer of tools into something closer to a query-driven system. Instead of being aware of everything upfront, it retrieves what it needs at the moment it needs it, similar to how search or retrieval systems work.
MCP supports that pattern, but it doesn’t enforce it. And that’s the gap most teams run into: the protocol scales, but naive implementations don’t.
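As a rough sketch of that retrieval pattern (illustrative only – the `Tool` and `ToolIndex` names here are made up, not MCP’s actual tool-search API), a keyword-scored index that surfaces only matching tools might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str
    description: str
    schema: dict = field(default_factory=dict)  # JSON Schema for parameters

class ToolIndex:
    """Hypothetical tool index: score tools against the user's request
    instead of dumping every tool definition into the context window."""

    def __init__(self, tools):
        self.tools = tools

    def search(self, query, limit=3):
        words = set(query.lower().split())
        scored = []
        for tool in self.tools:
            text = f"{tool.name} {tool.description}".lower()
            score = sum(1 for w in words if w in text)
            if score:
                scored.append((score, tool))
        # Highest-scoring tools first; ties keep registration order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tool for _, tool in scored[:limit]]

index = ToolIndex([
    Tool("get_weather", "Look up the current weather for a city"),
    Tool("create_ticket", "Open a support ticket in the tracker"),
    Tool("query_orders", "Query recent orders for a customer"),
])

# Only the tool relevant to this request reaches the context window.
relevant = index.search("current weather in Berlin", limit=1)
```

A real deployment would use embeddings or full-text search rather than keyword overlap, but the shape is the same: the model queries for tools at the moment it needs them.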
What about infrastructure?
Infrastructure is what most teams underestimate. MCP solves the connection, not how it behaves in a real system. Protocols define interfaces, they don’t handle reliability, scaling, or operations. And that’s exactly what breaks as usage grows.
This shows up first in transport. Streamable HTTP works in controlled setups, but under load it gets hard to manage – connection state, coordination, and throughput all start to matter. That’s why things are moving toward stateless communication. Until then, teams have to build around it.
The deeper issue is that MCP doesn’t define how systems should behave once a connection is established. It doesn’t define:
- How calls are retried.
- How failures are handled.
- How systems recover when something breaks mid-execution.
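Because none of this is specified, clients end up wrapping calls themselves. A minimal sketch of such a wrapper – a generic retry with exponential backoff and jitter, with hypothetical names rather than anything from an MCP SDK:

```python
import random
import time

def call_with_retries(call, attempts=3, base_delay=0.5,
                      retryable=(TimeoutError, ConnectionError)):
    """Retry a zero-argument callable (e.g. a lambda wrapping a tool call)
    on transient errors, backing off exponentially between attempts."""
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff with jitter, so concurrent clients
            # don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Demo: a call that times out twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
```

Which exceptions count as retryable, and whether a call is safe to retry at all (idempotency), are exactly the decisions MCP leaves to the implementer.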
And in production, those are the problems that matter. David points to this indirectly when talking about where MCP is heading:
In my mind, 2025 was about figuring out whether something like MCP is needed in the ecosystem, and the answer is a resounding yes. But 2026 will be about making sure it’s ready to help people productionize agentic systems.

The focus has shifted from whether MCP works to what’s needed around it in real-world conditions. That means things MCP doesn’t include: retries, observability, backpressure, coordination between agents hitting the same services.
Without that, MCP behaves fine on its own, but the system around it doesn’t. And that’s what you see in most early setups.
How Duolingo and Uber use MCP
If there’s one takeaway from the MCP Dev Summit in New York, it’s that I didn’t see anyone using MCP on its own.
Across talks from teams like Duolingo and Uber, the pattern was pretty consistent: MCP solves the integration layer, but everything around it still has to be engineered, constrained, and operated.

At Duolingo, the first issue wasn’t scale or performance, it was adoption. Setting up MCP servers meant manual config, credentials, and environment-specific setup, and the barrier to entry was high enough that few engineers bothered.
Instead of rethinking MCP, they focused on reducing friction. They built a central interface to discover and configure MCP servers – an internal “app store.” Behind the scenes, they standardized hosting, added shared auth, and built tooling to turn services into MCP-compatible endpoints without starting from scratch.
With thousands of engineers, services, and agents interacting through MCP, the lack of structure quickly became a risk, as Meghana Somasundara (Agentic AI Lead) and Rush Tehrani (Senior Engineering Manager) at Uber shared:
Without a central framework or guidance, everybody was trying to solve the same problems in silos.
Their solution was a control layer on top of MCP – a central gateway for all interactions, backed by a registry defining what tools exist, how they’re described, and who can access them. Tool definitions are auto-generated from existing services, but still reviewed, scanned, and governed before exposure.
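A stripped-down sketch of that registry idea (hypothetical names, not Uber’s actual system): each tool carries an approval flag and a team scope, and an agent only ever sees tools that pass both checks.

```python
class ToolRegistry:
    """Hypothetical central registry: every tool must be approved and
    scoped to a team before any agent can discover it."""

    def __init__(self):
        self._tools = {}  # name -> metadata

    def register(self, name, description, allowed_teams, approved=False):
        self._tools[name] = {
            "description": description,
            "allowed_teams": set(allowed_teams),
            "approved": approved,  # flipped only after review/scanning
        }

    def visible_tools(self, team):
        # The gateway answers tool-listing requests from this view,
        # so unreviewed or out-of-scope tools never reach the model.
        return sorted(
            name for name, meta in self._tools.items()
            if meta["approved"] and team in meta["allowed_teams"]
        )

registry = ToolRegistry()
registry.register("refund_order", "Issue a refund", {"payments"}, approved=True)
registry.register("delete_user", "Delete a user account", {"trust-safety"})
registry.register("lookup_trip", "Look up a trip by id",
                  {"payments", "support"}, approved=True)

payments_view = registry.visible_tools("payments")
```

The governance step – review, scanning, approval – happens outside the model entirely; by the time an agent lists tools, the risky decisions have already been made.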

In the end, teams reduce how much decision-making is left to the model. Instead of letting it freely choose between tools, they scope what’s available and pre-set key parameters to reduce ambiguity and improve reliability, especially where mistakes matter.
As Meghana and Rush pointed out, “these things can hallucinate and maybe not pick the right tool.”
So in production, it’s not about trusting the model to make the right choices – it’s about reducing the number of choices it has to make. MCP defines the interface, but reliability comes from everything built around it.
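One simple way to pre-set parameters is plain partial application: fix the values the model should never choose, and leave it only the fields it must fill in. A sketch with a hypothetical refund tool:

```python
import functools

def issue_refund(order_id, amount, currency, region):
    # Hypothetical tool handler; in practice this would call a real service.
    return {"order_id": order_id, "amount": amount,
            "currency": currency, "region": region}

# Pre-bind the parameters that should never be left to the model.
# The agent only supplies order_id and amount, so it cannot pick
# the wrong currency or region no matter what it generates.
scoped_refund = functools.partial(issue_refund,
                                  currency="USD", region="us-east")

result = scoped_refund(order_id="o-123", amount=25.0)
```

The same idea applies at the schema level: the tool definition exposed to the model simply omits the pre-bound fields, shrinking both the context cost and the surface for wrong choices.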


