Event Archives - ShiftMag

When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that

Ivan Pelivanovic — Fri, 22 May 2026 14:13:25 +0000

Production incidents are a context problem. By the time engineers understand what’s happening, they’ve already bounced across several different tools – and the incident is still ongoing. PagerDuty thinks MCP is the fix.

When incidents hit production systems, engineers rarely stay inside one tool for long, jumping from logs to dashboards to runbooks, trying to reconstruct what is actually happening.

When I talked to other builders, almost everybody seemed to face this context-switching problem.

Rocío Bayon (Product Manager) and Sebastian Villanelo (Sr. Forward Deployed Engineer) from PagerDuty think MCP is how you fix it.

PagerDuty built their MCP to cut context switching

Rocío explained that their MCP is solving the issue of context switching:

When an incident hits, the engineer has to go between 5 to 10 different tools to understand what’s happening.

That’s the real problem they’re trying to solve.

PagerDuty’s framing of MCP was interesting: neither Rocío nor Sebastian described MCP as just another integration layer. They framed it as connective tissue that gathers logs, alerts, runbooks, and incident context into a single workflow.

What the MCP does, it brings all that context into one platform where engineers are usually already working.

Most engineering organizations already have enormous amounts of observability data. The real problem is that it is scattered across systems, and engineers end up reconstructing operational context manually during incidents.

Retrieve what you need, nothing more

Sebastian framed the problem as signal retrieval. Rather than feeding the model more information, the goal is pulling the relevant operational state around a specific incident.

If you have the right parameters or the queries and all this stuff, you will retrieve the exact information that you need.

That means narrowing context around the actual incident window. When an incident hits, it retrieves information around that time only, Sebastian explained.
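To make the idea of an incident window concrete, here is a minimal Python sketch. It assumes events are plain records with a timestamp. The `Event` type, the `events_in_window` helper, and the 15-minute padding are hypothetical illustrations, not part of PagerDuty's MCP server.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # illustrative labels such as "logs", "alerts", "deploys"
    timestamp: datetime
    message: str


def events_in_window(events, incident_start, padding=timedelta(minutes=15)):
    """Return only events within `padding` of the incident start, oldest first."""
    lo, hi = incident_start - padding, incident_start + padding
    return sorted(
        (e for e in events if lo <= e.timestamp <= hi),
        key=lambda e: e.timestamp,
    )


# Hypothetical sample data: two events near the incident, one hours earlier.
incident_start = datetime(2026, 5, 1, 12, 0)
events = [
    Event("deploys", datetime(2026, 5, 1, 11, 52), "deploy finished"),
    Event("alerts", datetime(2026, 5, 1, 12, 1), "error rate above threshold"),
    Event("logs", datetime(2026, 5, 1, 9, 30), "nightly job finished"),
]
relevant = events_in_window(events, incident_start)
```

Only the deploy and the alert survive the filter, so whatever consumes `relevant` sees two records instead of the full history.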

That also changes how they think about efficiency: reducing context switching directly affects operational speed, token usage, and cost.

You will see that information only with one call. And that saves a lot of tokens and time. That’s money and time.

Photo: Lea Lobor

AI helps but engineers still decide

Still, both of them were careful not to frame AI as autonomous incident management.

Rocío repeatedly emphasized that MCP and AI systems are primarily helping with context gathering and operational visibility, while engineers remain responsible for the high-risk decisions:

The AI is helping you, but the engineer is the one who is assessing and making decisions where there’s a high risk.

That human layer is intentional. PagerDuty’s broader vision seems less about replacing on-call engineers and more about reducing the operational overhead surrounding incidents. Their MCP systems help gather information, surface relationships between systems, and accelerate investigation workflows, but humans still decide what actually happens next.

Rocío also mentioned that their SRE agent is designed to support larger incident workflows beyond information retrieval:

It can also help you trigger those incident workflows. So it can help you resolve the incident. And it learns as it goes.

“MCP – the connective tissue between tools”

I asked Rocío and Sebastian how MCP fits into the tools they already use without becoming just another silo.

And both of them clearly framed MCP as anti-silo infrastructure since it brings everything to one place. Rocío called MCP “the connective tissue between all these different tools.”

That framing probably captures the broader architectural challenge better than anything else in the interview.

Modern incident response already spans dozens of systems: observability platforms, deployment pipelines, CI/CD tooling, ticketing systems, infrastructure management, and communication layers.

AI systems inherit that fragmentation unless something explicitly connects operational state.

Engineers trust systems that behave predictably

Sebastian mentioned that teams often react very differently to MCP systems. Some embrace them immediately while others remain skeptical, especially around security and predictability. For him, trust improves once systems consistently produce expected outcomes:

When a person or a teammate says “ah, I’m retrieving what I’m expecting to retrieve”, that will help them to trust it.

A lot of AI tooling discussions still focus on model capability, reasoning quality, or benchmark performance. But operational systems are usually adopted much more pragmatically. Engineers trust systems that behave predictably, retrieve the right operational context, and fit into workflows they already rely on.

The post When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that appeared first on ShiftMag.

Teaching AI Agents to Test 1,000 Java Libraries – and Letting Them Run While You Sleep

Marin Pavelić — Tue, 19 May 2026 18:39:50 +0000

When humans maintained the GraalVM native image reflection metadata repository, coverage sat at just 14%. Tests were often stubs that technically compiled but covered nothing meaningful, nobody wanted to write them for someone else’s code, and the results showed.

At Devoxx UK, Vojin Jovanovic (Principal Researcher, Oracle Labs) and Mihailo Markovic (Software Engineer, Oracle) presented how they replaced that process with an autonomous AI agent pipeline.

The result is 90% dynamic access coverage across more than 1,000 JVM libraries, roughly 2 billion tokens spent, and a GitHub repository generating thousands of commits per week – while Vojin was at a hotel the night before the conference.

The problem with GraalVM reflection

GraalVM Native Image takes a Java application, performs static analysis, and compiles it ahead of time (AOT) into a single binary. The benefits are significant: startup roughly 10x faster than a standard JVM and a dramatically lower memory footprint.

But static analysis has a fundamental limitation: when a method calls Class.forName(name) with an argument that is only known at runtime, the analyser cannot determine at build time which class will be needed. Reflection calls like this break the closed-world assumption.

The solution is reachability metadata – a JSON file that tells the native image compiler which classes, methods, and fields need to be accessible at runtime. Writing this metadata requires running tests that exercise all the relevant code paths.

For a library like Hibernate Core, that means covering 264 individual reflection call sites. For Tomcat, 205. Across the JVM ecosystem, the number is enormous, and until recently, it was almost entirely a manual process that humans were not doing well.

Start simple, then add feedback

The first approach was straightforward: give an LLM the library source code, tell it to generate comprehensive Java tests, and collect the metadata via a JVMTI agent.

The results were not impressive – 5.7% coverage for logback, 2.9% for H2. Vojin noted how this doesn’t feel like AGI.

The shift came from adding GraalVM’s static analysis directly to the agent’s context. Instead of asking the LLM to guess which code paths matter, the pipeline runs a static analysis pass that identifies every dynamic access call site (the exact class, method, and line number) and feeds that report directly to the agent. With this addition, logback coverage jumped to 97%, H2 to 84.3%, in five iterations.

The next layer was JaCoCo integration. After each generation round, the pipeline correlates coverage data with the remaining uncovered call sites and feeds only the uncovered ones back into the next iteration. The agent knows exactly what it hit and what it missed. Vojin noted:

We always create a checkpoint in those systems so we can go back to it if something goes wrong. And in these LLM-driven workflows, something is always going wrong.

With this feedback loop, logback reached 100% and H2 reached 96.1%.
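The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's pipeline: `run_feedback_loop` and `fake_agent` are hypothetical names, the call-site strings are made up, and `generate_tests` stands in for the agent plus the coverage run.

```python
def run_feedback_loop(call_sites, generate_tests, max_iterations=5):
    """Iterate until every call site is covered or the iteration budget runs out.

    `generate_tests(targets)` is a stand-in for the agent plus the coverage run:
    it receives the still-uncovered call sites and returns the ones it hit.
    """
    uncovered = set(call_sites)
    history = []  # number of uncovered call sites after each round
    for _ in range(max_iterations):
        if not uncovered:
            break
        # Feed back only what is still missing, never the full list again.
        newly_covered = set(generate_tests(frozenset(uncovered))) & uncovered
        uncovered -= newly_covered
        history.append(len(uncovered))
    coverage = 1 - len(uncovered) / len(call_sites) if call_sites else 1.0
    return coverage, sorted(uncovered), history


def fake_agent(targets):
    # Toy stand-in: covers the two alphabetically first targets per round.
    return sorted(targets)[:2]


sites = ["A.load:10", "B.init:42", "C.find:7", "D.open:3", "E.call:99"]
coverage, remaining, history = run_feedback_loop(sites, fake_agent)
```

With the toy agent, the uncovered count shrinks from five to three, one, and zero over three rounds. A real agent would return fewer hits on hard call sites, which is where the `remaining` list becomes the input for the next technique.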

Coverage sometimes still isn’t enough

For larger, more complex libraries (Guava, Tomcat, MongoDB) even the feedback loop left gaps. The team added a third technique: PGO (Profile-Guided Optimization) profiling from GraalVM’s Graal compiler. The profiler samples execution and produces a call trace, which can be correlated with static analysis to identify exactly where a test nearly reached a reflection call but diverged.

The profiling feedback tells the agent not just what’s uncovered, but where in the call stack a test went in the wrong direction and what it would need to do differently. Results: Guava went from 50% to 72%, Tomcat from 45% to 83%, MongoDB reached 100%.

The feedback also tells the agent (and the engineers) why certain calls cannot be covered: a security service only available on Java 6, a cleaner class incompatible with the current JVM. “If you cannot reach it, tell us why,” the prompt instructs, and the agent does.

Photo: DevoxxUK / Flickr

Cost, agents and model selection

Codex was the first agent framework the team tried. For logback (a library with 33 dynamic access calls) Codex spent $35:

If we’re spending $35 per library for a thousand libraries, we’re not replacing humans.

The alternative was P, a minimal agent that starts with a 200-token context describing basic file operations and bash execution. It delivered the same results at roughly a tenth of the cost, and the lesson is straightforward:

Simple task, use a simple agent. You already give it a lot of rules, a lot of context, and you’ve grounded it enough so it can perform on the level of these big agents.

On model selection, the team compared GPT 5.5 against several open-source alternatives – GLM, Kimi K2, DeepSeek, Gemma. GPT 5.5 consistently outperformed them on coverage. The counterintuitive finding was this: a more expensive model that makes the right decision in one shot can cost less overall than a cheaper model that wastes tokens going in the wrong direction.
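A back-of-the-envelope calculation shows how that can happen. The prices, token counts, and attempt counts below are invented for illustration and are not figures from the talk; `total_cost` is a hypothetical helper.

```python
def total_cost(price_per_million_tokens, tokens_per_attempt, attempts):
    """Total spend in dollars when every attempt consumes the same token count."""
    return price_per_million_tokens * tokens_per_attempt / 1_000_000 * attempts


# Hypothetical numbers: the pricier model succeeds on the first attempt,
# the cheaper one needs eight attempts to reach the same coverage.
expensive_model = total_cost(price_per_million_tokens=10.0,
                             tokens_per_attempt=200_000, attempts=1)
cheap_model = total_cost(price_per_million_tokens=2.0,
                         tokens_per_attempt=200_000, attempts=8)
```

Under these assumptions the model that is five times more expensive per token still ends up cheaper overall, because it needs one attempt instead of eight.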

The architecture that lets it run without you

The pipeline now operates as a third-generation system. When a user opens an issue requesting a library, the agent fetches the issue, runs the generation workflow, verifies the output, creates a pull request, reviews it, and merges or escalates to human review – automatically. The “human intervention” label on GitHub still exists, but its queue has shrunk dramatically.

Documentation, not smarter prompting, was what made the difference.

Vojin outlined what he calls the key context layers:

raison d’être (why does this project exist, in two sentences),
state of direction (where the architecture stands today),
functional specification (how the system behaves),
architectural specification (how it is built),
decision records (what major choices were made and why), and
comprehensive logs that serve as checkpoints for recovery.

When you do all of these things, it takes almost a few days for a very big project. You will reduce your work by 50%, 60%, 70%.

The payoff is that agents with this context can diagnose failures, trace them through logs, and fix the underlying system, not just the immediate problem.

The RAID system (an automated issue-resolution agent) was built in four prompts on a Sunday morning. It sweeps human intervention tickets, classifies them, performs deep analysis using the project logs, and either opens a GitHub issue for humans or attempts a fix in a forked branch with review. Jovanovic added:

Never work on the problem, always work on the system. You never go and fix a ticket. You always go fix the rules.

Where things stand

The repository currently supports 1,021 libraries. Excluding five large Hibernate libraries that predate the automated pipeline, dynamic access coverage across the ecosystem is 90%.

The GitHub repository has accumulated roughly 2,977 branches. In the week before Devoxx, it logged approximately 8,000-9,000 commits, with agents committing every few minutes around the clock.

Total cost for the project: approximately $1,700 in API tokens, plus personal compute on Jovanovic’s home desktop, running around the clock because the Oracle compliance process for cloud infrastructure takes time. The key point is simple:

Start with neural, simplest thing, get results, and then slowly chop off things and put them into algorithms, because they are much cheaper and faster.

Photo: DevoxxUK / Flickr

We caught Vojin Jovanovic for a few more questions!

After the talk, we sat down with Vojin for a few minutes to ask him a couple more questions.

You tested over 1,000 libraries. What broke first when you tried to scale?

Vojin: Basically everything broke. We had mostly infrastructure issues, all kinds of GitHub failures. When you build a system at this scale, you need to assume that everything will fail and needs to recover. We broke GitHub rate limits. My machine was broken because it was running so many things. The key takeaway is that you need to build a system in a way that you can always continue. When things fail, you always checkpoint and continue from a checkpoint. We do work in sizable chunks, and when something fails, you just restart the chunk.
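As a rough illustration of "checkpoint and restart the chunk", here is a minimal Python sketch. The `process_in_chunks` helper, the JSON checkpoint file, and the library names are hypothetical; the talk did not describe the actual checkpoint format.

```python
import json
import tempfile
from pathlib import Path


def process_in_chunks(items, work, checkpoint_path, chunk_size=2):
    """Process items chunk by chunk, recording finished chunks on disk."""
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    for index, chunk in enumerate(chunks):
        if index in done:
            continue  # finished in an earlier run; skip on restart
        for item in chunk:
            work(item)  # may raise; the whole chunk is retried on the next run
        done.add(index)
        path.write_text(json.dumps(sorted(done)))  # checkpoint after each chunk
    return sorted(done)


processed = []
failures = {"lib-c": 1}  # hypothetical: "lib-c" fails once, then succeeds


def flaky_work(item):
    if failures.get(item, 0) > 0:
        failures[item] -= 1
        raise RuntimeError(f"transient failure on {item}")
    processed.append(item)


libraries = ["lib-a", "lib-b", "lib-c", "lib-d", "lib-e"]
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    process_in_chunks(libraries, flaky_work, checkpoint)
except RuntimeError:
    pass  # the first run stops inside the second chunk

completed = process_in_chunks(libraries, flaky_work, checkpoint)
```

The second run skips the first chunk and resumes at the one that failed. Because a failed chunk is rerun from its start, `work` should be safe to repeat for items that already succeeded inside that chunk.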

Is just asking the LLM enough?

Vojin: If you had asked me four weeks ago, I would say no. Now I would say you need to know how to ask it, and it will be enough. I was like, “GitHub is failing with a 504, abstract away all GitHub calls and retry.” It did it in two minutes. With today’s models, it’s a matter of minutes, not hours.
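The kind of wrapper that prompt asks for might look like the following Python sketch. It is not the code the model produced: `TransientError`, `with_retry`, and `flaky_fetch` are hypothetical names, and a real version would map HTTP 504 responses from the GitHub client onto the retryable error.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 504."""


def with_retry(call, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the failure
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}
delays = []  # recorded instead of actually sleeping in this demo


def flaky_fetch():
    # Hypothetical remote call that times out twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("504 Gateway Timeout")
    return "ok"


result = with_retry(flaky_fetch, sleep=delays.append)
```

Routing every remote call through one function like this keeps the retry policy in a single place, which is what "abstract away all GitHub calls" amounts to.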

What did you learn about the trade-off between cost, speed, and coverage?

Vojin: I haven’t seen a situation when doing something with an LLM is more expensive than doing that by a human typing on the keyboard. Build a system that uses the most efficient LLM for the job — you’re going to get far and not cost much money at all.

When does using multiple agents make sense?

Vojin: Where I use it is for decisions and research. I use Claude Opus 4.7, Gemini 3.1, and GPT 5.5. I ask them all, let them discuss, and I discuss together with them. Each brings something to the table. Before, it was always Claude who was the smartest. Now GPT 5.5 is second and close to the first. Things are changing. The most important bit is getting the system designed right. Once you do that, coding, I don’t care who does it.


Uber Shares What Happens When 1,500 AI Agents Hit Production

Ivan Pelivanovic — Mon, 04 May 2026 14:19:55 +0000

At the MCP Dev Summit North America earlier this month, I was listening to Meghana Somasundara (Agentic AI Lead, Uber) and Rush Tehrani (Senior Engineering Manager leading the Agentic AI Platform, Uber) talk about what they’re building.

By their account, more than 90% of Uber’s 5,000+ engineers already use AI monthly for agentic workflows. They also have over 1,500 monthly active agents internally, running more than 60,000 executions per week.

What stood out to me was Meghana’s framing of the real risk: not deliberate misuse, but an agent causing serious damage by accident, faster than any human could react:

It takes us humans a lot more effort to break things. But with agents, it’s a lot faster, a lot quicker, and the blast radius is a lot higher.

What problems did Uber face when scaling AI?

Meghana and Rush’s talk focused on three problems that nearly made those numbers impossible to reach. The first was the lack of a shared way of building.

When agent adoption spreads organically across a large engineering organization, teams tend to build independently. At Uber Technologies, with over 10,000 internal services, that meant dozens of teams were building MCP servers and custom integrations on their own, without shared standards, central oversight, or any real way to reuse what others had already built.

The result was predictable: duplicated work, and a growing stack of systems that only the original team really understood, as Meghana Somasundara explains:

The simple truth was, if you can’t manage the development lifecycle, you just can’t trust it in production.

When agents start making decisions across systems, inconsistent implementations stop being a minor issue: they become harder to track and debug, and even harder to trust.

Photo: Agentic AI Foundation (Flickr) – Meghana Somasundara (Agentic AI Lead, Uber) and Rush Tehrani (Senior Engineering Manager, Uber)

The second problem was security. Agents operating across a complex service landscape could unknowingly call endpoints they shouldn’t, expose sensitive data, or trigger operations nobody intended. Add third-party MCP servers into the mix (Uber uses many external systems) and the governance problem scales quickly.

They needed full visibility into call patterns: who was accessing what data, under what conditions, and what happened when things went wrong. Without that, running agents in production at scale becomes a trust problem.

Finding the right tool quickly became the third problem. As Rush put it:

How does an agent or the engineer building it actually find the right one?

Not just any MCP server, but one that’s reliable, performs well, and doesn’t quietly degrade everything built on top of it.

When discovery is left unmanaged, agents default to whatever is most visible rather than what actually works best. At smaller scale, that’s an annoyance, but across thousands of services, it becomes a systemic quality problem.

How Uber addressed these challenges

Uber’s answer to all three problems was a centralized MCP gateway and registry.

Meghana describes it as a central control plane that turns Uber’s endpoints into MCP tools, with service owners deciding what gets exposed and how it’s defined.

Every change flows through pull requests, passes security scans before deployment, and is continuously monitored in production, while a central registry (acting as the single source of truth) removes duplication and enforces tighter scrutiny on third-party MCPs.

Photo: Agentic AI Foundation (Flickr)

In their no-code Agent Builder, as Rush explained, engineers can pre-select specific tools from an MCP server so the model doesn’t have to decide which one to use, and they can also lock down parameters so the agent doesn’t have to infer them at runtime, ultimately reducing the number of decisions and things that can go wrong.
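The effect of pre-selecting tools and locking parameters can be sketched in Python. This is an illustration only, not Uber's Agent Builder: `build_agent_tools`, `LockedParameterError`, and the two sample tools are hypothetical.

```python
class LockedParameterError(ValueError):
    """Raised when a caller tries to override a locked parameter."""


def build_agent_tools(server_tools, selected, locked_params):
    """Return callables for the selected tools with locked parameters pinned."""
    missing = set(selected) - set(server_tools)
    if missing:
        raise KeyError(f"unknown tools: {sorted(missing)}")

    def bind(name):
        tool = server_tools[name]
        pinned = dict(locked_params.get(name, {}))

        def call(**kwargs):
            overlap = pinned.keys() & kwargs.keys()
            if overlap:
                raise LockedParameterError(f"locked: {sorted(overlap)}")
            return tool(**pinned, **kwargs)

        return call

    # Tools that were not selected are simply absent from the agent's view.
    return {name: bind(name) for name in selected}


# Hypothetical tools exposed by one MCP server.
def search_tickets(project, query, limit=10):
    return f"{project}:{query}:{limit}"


def delete_ticket(ticket_id):
    return f"deleted {ticket_id}"


server_tools = {"search_tickets": search_tickets, "delete_ticket": delete_ticket}
agent_tools = build_agent_tools(
    server_tools,
    selected=["search_tickets"],
    locked_params={"search_tickets": {"project": "PAYMENTS", "limit": 5}},
)
result = agent_tools["search_tickets"](query="timeout")
```

The agent can only supply `query`. The destructive tool never appears in its tool list, and the project and limit are fixed by whoever configured the agent, so there are fewer choices for the model to get wrong.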

Getting the infrastructure right shows up in adoption: their coding agent Minions generates about 1,800 code changes weekly and is used by 95% of Uber engineers, but that’s the output, not the real lesson.

On the roadmap are evaluation metrics in the registry to help teams spot reliable servers before committing, and “skills”, reusable MCP patterns with built-in A/B testing that bake evaluation into how knowledge is shared.

Does any of this apply if you’re not Uber?

Uber operates at a scale most engineering teams never see (10,000+ services in play), but while the complexity is extreme, the underlying failure patterns Meghana and Rush describe aren’t unique to them.

Teams often end up building the same integrations in parallel, with governance only becoming a priority after something breaks, and discovery treated as an afterthought. These problems appear well before reaching 1,500 agents – once multiple teams start using the same MCP infrastructure without a shared layer.

The Uber model won’t translate directly to smaller organisations. But if you’re already running MCP servers across more than two teams and nobody owns discoverability or access control yet, that gap could surface soon.


Tech Conferences Aren’t Dead. But Why We Go Is Changing.

Ivan Pelivanovic — Tue, 31 Mar 2026 14:01:02 +0000

Why would you, as a developer, fly halfway around the world to hear something you could Google in minutes?

“Because there’s more to it than just getting plain information,” says Mark Hazell, organiser of Devoxx UK and co-founder of Voxxed.

Some things just can’t be replicated online

Conferences feel like one of the few places where simply showing up still counts. In a way, they’re a throwback, a reminder that not all value happens behind a screen.

And that’s precisely what makes them stand out: remote work offers undeniable flexibility, but it often fragments our attention. It’s hard to find real focus, especially if you’re trying to keep a healthy work-life balance. At a conference, that changes, as Mark points out.

Simply not being distracted by incoming mail or Slack messages is worth its weight in gold in terms of the knowledge you take away.

Photo: DevoxxUK / Flickr

The person next to you might be facing the same problem, or they might have already solved it. That kind of closeness makes learning immediate, practical, and way faster than online.

Many people tell me they watch a session on-demand from Devoxx UK and wish they could be in the room so they can chat with others who are facing similar challenges or are even further along in finding solutions.

But conferences are expensive…

Let’s face it: conferences aren’t cheap. Between tickets, flights, and hotels, the costs add up fast. And with companies tightening budgets and cutting back on travel, that expense really matters. If you don’t get real value in return, it can quickly feel like a waste of both time and money.

Mark doesn’t deny it. Instead, he reframes the question: if you take your team to a well-chosen conference, you’ll see a strong return.

The keyword here is well-chosen:

I do think it’s key to research up front and find the conference that accelerates learning and problem solving in ways truly relevant to those attending. That way, instead of weeks of trial and error, your team can spend a day or two at the conference and return with practical techniques, ideas, and tooling suggestions that boost productivity and quality.

Picking the right conference is all about fit. How long will your team be out? Is the ticket worth it? Will they meet people facing similar challenges? That’s where the real value is, says Mark. Plan ahead, and early bird tickets, flights, and hotels cost a lot less than last-minute bookings.

Photo: DevoxxUK / Flickr

Big stages or small communities?

It might seem that large flagship conferences have the upper hand with bigger budgets, bigger names, and more production. And in some cases, that’s true, Mark admits: “If a conference is run by a large company with deep pockets, it can be more financially resilient.”

But that’s not the model Devoxx relies on. Its strength comes from the community: a big team of volunteers gives their time to pull together all of the content, shape how the event looks and feels, and execute it on the ground.

In fact, many of today’s most respected conferences began as small, grassroots initiatives, including Devoxx itself, which grew from the London Java Community.

And for Mark, the real distinction isn’t size – it’s about quality and intent:

Whatever the size of the event, the content has to stay balanced and neutral. Without that, scale doesn’t mean much.

When people feel welcome, real connections follow

Modern conferences sit at the intersection of learning, hiring, and business. Sponsorships and recruitment are part of the reality, especially in expensive cities like London. But Mark doesn’t see it as a trade-off between developers and companies:

I prefer the notion of weaving strands together to create a fabric that everyone is part of.

That means creating an environment where attendees benefit from sponsors being present and sponsors benefit from genuine interaction with the community.

Photo: DevoxxUK / Flickr

That same philosophy extends to how Devoxx grows: it creates real opportunities for first-time speakers, helping them gain experience and build confidence. Many return to mentor the next group, creating a self-sustaining cycle that supports the broader developer community.

When there’s no barrier, people talk more freely, ask more questions, and connect naturally, Mark says.

Our philosophy is to create an environment where everyone is equal (sorry speakers, that means no private room out back to go hang out in), everyone is welcome and everyone is respected. This is noticeable and means the event has this really special, open vibe to it.

As Mark puts it, when people feel welcome and respected, they talk, share, and enjoy themselves, and meaningful connections naturally follow. “Sure, we do stuff like hosting evening socials, a party, a pub quiz,” he says, “but it’s really the collective buy-in from everyone to welcome and respect each other that makes all the difference.”

ShiftMag is recognized as a friend of the Devoxx UK conference.


CTOs Face Pressure to Deliver AI Gains, but Productivity Isn’t There Yet

Nikolina Oršulić — Wed, 18 Mar 2026 15:45:53 +0000

How are CTOs feeling about AI?

According to Andy Skipper, founder of CTO Craft, they’re experiencing fear, uncertainty, and doubt.

And if the technical leaders of companies are feeling that way, what can the rest of us expect? Certainly, we dream of productivity boosts and an AI El Dorado – but that’s not the reality.

That’s why we sat down with Skipper to talk about how CTOs should manage expectations for AI, and how to navigate the hype versus reality.

Stakeholders and investors are watching CTOs closely, and the pressure is rising

Many CTOs, Skipper notes, are navigating intense pressure from non-technical stakeholders and investors alike, especially with the massive resources being invested in AI and LLM technologies.

He is cautious on this point:

AI is not going to reduce costs or increase productivity in the way some non-technical people think just yet. It’s getting there, but it’s not there yet.

At the same time, Skipper points out a surprising upside: AI is giving engineering leaders a chance to reconnect with the code and architecture without writing all the code themselves:

One of the things you have to accept as an engineering leader is that you are going to get further away from the code the more senior you become. AI gives people an opportunity to get back to architecture and development work, even if they aren’t coding themselves.

The CTO role can be isolating

When Skipper became a CTO for the first time, he quickly realized just how isolating the role could be. There was nowhere for tech leaders to share challenges, get support, or navigate the non-technical side of the job.

That gap inspired him to start CTO Craft, now a community helping senior engineering leaders navigate team dynamics, strategy, and AI.

When I was a CTO for the first time, I didn’t have somebody who I could talk to about the issues I was seeing or compare notes with people who had similar challenges. That’s what CTO Craft is all about – helping people understand where the challenges come from and understand they’re not alone in having those challenges.

As a coach and mentor, Andy works closely with CTOs around the world, helping them deal with issues like burnout, communication with non-technical stakeholders, and, lately, how to adapt in the AI era.

The most common CTO mistake? Always chasing the newest technologies

Many first-time CTOs struggle with burnout, overextending themselves to shield teams from stress, and balancing hands-on coding with high-level responsibilities. He explains:

A lot of the people that I work with directly are suffering from burnout. First time CTOs commonly miss out self-preservation. And usually that’s a combination of too much expectation of their own energy levels, their own abilities, backlogs…

And after overextending themselves, first-time CTOs often make another common mistake: chasing the newest technologies. While adopting the latest tools and frameworks can seem exciting, Skipper warns that it’s not always the best choice for fast-moving teams trying to scale.

“Using bleeding-edge tech can slow you down, make systems harder to maintain, and even complicate hiring because the talent pool for newer technologies might be limited,” he explains.

As a coach, Skipper says these are just some of the recurring challenges he sees among engineering leaders, alongside a range of other operational and people-related issues.

Engineering skills alone won’t make you a CTO

For aspiring engineering leaders, Skipper highlights that growing into a successful CTO requires more than technical excellence: commercial understanding, communication, coaching, and vision-setting are just as crucial:

The difference between a good engineering manager and a great CTO is understanding how technology drives business success, while still inspiring and guiding your teams.

But technical and business skills are only part of the picture. Motivation and team management are equally critical. Skipper stresses that not everyone is motivated by the same things, and leaders need to understand individual drivers:

Having a vision in the first place is very important. But when it comes to actually bringing individuals along on the journey, they all need to be worked with differently. You can’t just set it and expect everyone to be motivated.

He also warns against a common mistake among CTOs: trying to shield their teams from the challenges of a pivot or rapid change. While the instinct is understandable, it often backfires and drains the leader’s emotional energy. Instead, transparency and realistic communication are key:

Being transparent, being realistic, measuring your words, not being super negative about everything, but still being realistic, I think all these things are really important.

The need for a support network, not another tech stack

Skipper believes resilience and peer support are crucial for engineering leaders navigating the complexity of the CTO role. Sharing experiences and learning from others can help leaders realize they’re not alone when facing difficult decisions.

Looking ahead, however, he admits that the pace of technological change makes it hard to predict what the role will look like in the future.

Five years from now, I honestly have no idea what the role of a CTO will look like. The way we build software is already changing rapidly, especially with AI. But the fundamentals like setting a vision, communicating it clearly, and connecting technology with business outcomes, will always remain essential.

For Skipper, that uncertainty makes peer support crucial: it helps leaders adapt, learn, and navigate a fast-changing profession.

Ultimately, he believes the most important skill for CTOs is the ability to keep learning and tackle challenges without going it alone.

*Infobip, the global communications API leader that launched ShiftMag, was an Event Partner at CTO Craft 2026.


OpenAI Shares How They’re Turning Engineers into AI Team Leads

Ivan Brezak Brkan — Mon, 02 Mar 2026 15:56:59 +0000

Six months ago, if someone had told me that engineers would start naming their AI agents and treating them like teammates, I probably would’ve rolled my eyes.

Honestly, even today, it still sounds a little… absurd.

That is, until I heard directly at the Pragmatic Summit in San Francisco that it’s happening right now inside OpenAI.

Vijaye Raji and Thibaut Sottiaux from OpenAI say AI is shifting development from manual coding to guiding AI teams (setting goals and guardrails) while speeding up work and keeping core roles essential.

Close the laptop. Join the meeting. Come back to finished code.

Raji (CTO, Applications, OpenAI) has been at OpenAI for only six months, and he has already seen Codex go from just a tool, to an extension, to an agent… and now it actually feels like a teammate.

Inside OpenAI, they recently launched something called a Codex Box.

Basically, engineers can grab a dev box on the server, fire off prompts, and let the system run things in parallel while they just work from their laptop. Sounds amazing, right?

Photo by Ivan Brezak Brkan

Some engineers are using hundreds of billions of tokens per week across multiple agents – not for fun, but because that’s just how they build now. Raji said:

Software development inside OpenAI isn’t a single-threaded human loop anymore. It’s parallel. And that is going to become the new normal.

Designers and PMs are writing code. What’s going on?

Sottiaux (Engineering lead for Codex, OpenAI) described how the Codex team works today.

“It changes constantly. Almost week to week,” he said. “We look for bottlenecks, solve them, and then a new one pops up.”

At first, the slowest part was code generation, then it became code review, and now the friction often comes from understanding user needs faster – parsing feedback from Twitter, Reddit, and SDK experiments and turning that into product direction.

Speed up coding, and suddenly reviews become the bottleneck. Fix reviews, and CI/CD slows things down. That rhythm has become normal. Instead of debating every trade-off in design docs and discarding alternatives, teams try multiple implementations in parallel and focus on what actually works.

“Trying things is cheaper,” Sottiaux added. “So we try more things.”

And the roles? They’re blurring. Designers are shipping more code, PMs are writing and testing ideas, and it’s not that roles disappear – everyone’s capabilities are expanding.

Usually the problem is the prompt, not the system

What about long-running, autonomous tasks?

AI coding tools might seem like advanced autocomplete – type a few words, get a few lines back. Helpful, yes, but still reactive. Sottiaux challenged that:

Give the model a meaningful, well-defined objective, and it doesn’t just respond – it runs, for hours.

Inside OpenAI, the model runs on its own for hours, sometimes producing full reports. Engineers review the results, pick what works, and feed it back – this isn’t just suggestions anymore, it’s delegated execution.

There was also an unusually honest anecdote shared during the discussion: a researcher admitted that whenever he thought he was smarter than Codex, it turned out the problem was the prompt, not the system.

The bottleneck isn’t typing speed – it’s defining the goal clearly.

Photo by Ivan Brezak Brkan

AI tools accelerate work and shape AI-native engineers

During weekly analytics reviews, teams don’t assign follow-ups, they just trigger Codex threads. “Twenty minutes later, the answers are ready before the meeting even ends,” one leader said.

In high-severity incidents, Codex gets effectively paged into calls to help figure out what went wrong and suggest the fastest recovery. “It’s like having small consultants working quietly in parallel,” they added.

So what does this mean for junior engineers?

OpenAI is hiring new grads and running a strong internship program, believing the next generation will be AI-native and comfortable with these tools from day one.

At the same time, strong foundations, guardrails, and code reviews remain essential. As they put it, “Foundations will never go out of fashion.”

Engineers will guide AI teams, speeding up code without touching every line

Vijaye has spent more than two decades in the industry. He has lived through the rise of developer tools, the shift to higher-level abstractions, the mobile wave, and the social platform era. In his view, none of those transitions felt quite like this one.

What makes the current moment different isn’t just what the technology can do, it’s how quickly it is evolving. The speed of change, he suggested, is on another level entirely.

And Sottiaux expects that pace to accelerate even further.

In the near term, I anticipate another order-of-magnitude jump in development speed, enabled by networks of agents collaborating toward large, shared goals. Instead of a single assistant responding to prompts, entire clusters could work together on complex builds.

As systems get more complex, engineers stop checking every line of code and start setting constraints and guardrails and validating outputs. It’s less about manual control and more about guiding the system, and working through a single assistant that coordinates all the agents behind the scenes.

Whether this ends up being the smartest leap in the industry or a step we rushed into too quickly, only time will tell.

The post OpenAI Shares How They’re Turning Engineers into AI Team Leads appeared first on ShiftMag.

Chip Huyen: To Build or Not to Build – When AI Can Do It All?

Ivan Brezak Brkan — Fri, 27 Feb 2026 14:00:19 +0000

“If AI can replicate almost anything quickly and cheaply, what’s the point of building anything at all?” Chip Huyen asked at the start of her talk at the Pragmatic Summit.

And that question carries weight because she isn’t a casual AI observer: she’s an ex-Netflix researcher, former NVIDIA core developer, and an author who explores AI engineering.

She told us a personal story: after building a product, someone recreated it with AI almost immediately.

That moment forced her to confront hard questions – if anything can be copied, where’s the moat, the incentive, or the point of the effort?

“I built a product – and someone copied it with AI”

After she built a product, someone emailed her a clone generated with AI. The message read: “I love what you’ve built. So I used AI to recreate exactly that. And here’s the link.”

She described her reaction bluntly: “I’m flattered. But also, why the f**k?”

That moment crystallized a new reality: if replication requires minimal effort, traditional defensibility weakens. Technical execution no longer guarantees leverage. She framed the shift clearly:

If you can describe a software, then AI can build it for you. The constraint moves upstream. The critical question no longer asks how to build, but what to build.

The real advantage comes from context

But Chip pushed back against the idea that AI erases all opportunities.

Common problems are quickly handled by AI, but challenges with nuance and context remain – and those are where real value lies.

She illustrated this with chatbots: U.S. users expect instant replies, while in parts of Asia, waiting signals respect. These nuances matter. As AI handles common solutions, advantage goes to those who master context (cultural, behavioral, or domain-specific) where generic automation fails.

Chip spoke at this year’s Pragmatic Summit in San Francisco.

Engineering culture is changing

Workflows built around humans writing code (pull requests, line-by-line reviews, mentorship) don’t work the same when AI generates large chunks.

Junior developers may disengage, and even seniors wonder “How do I give feedback to my AI?”

The focus moves from polishing code to designing instructions and systems. Mentorship now teaches structured thinking in a human–AI–human loop.

Chip didn’t claim to have answers about job displacement or copyright; she acknowledged the uncertainty.

I do think it’s a bit scary and I don’t really know what the futures look like but builders still shape tools that affect labor markets, creative industries, and institutions.

When AI acts, who’s accountable?

As AI systems move beyond code editors, the risks grow. Chip drew a hard line: if AI acts in the real world (like a car hitting a pedestrian), mistakes can’t be undone.

The question isn’t if AI can act, but whether it should without strict limits.

Engineers now must build guardrails, monitoring, and escalation paths from the start – autonomy demands containment.

Enjoy building, but choose wisely what to build

Chip closed on a personal note:

Fundamentally, I enjoy building. It just brings me joy.

In an environment where execution becomes cheap, intrinsic motivation gains weight. She compared building to music that creates tension and resolution, and to assembling Lego sets for friends. Not every project requires a moat. Not every product needs revenue logic.

Her final reframing carried strategic weight. If replication becomes trivial, the advantage may belong to those who decide what deserves to exist. Vision, context, and responsibility define the new frontier. Execution follows.

The post Chip Huyen: To Build or Not to Build – When AI Can Do It All? appeared first on ShiftMag.

This CTO Says 93% of Developers Use AI, but Productivity Is Still 10%

Ivan Brezak Brkan — Wed, 18 Feb 2026 15:13:58 +0000

I had the chance to attend this year’s Pragmatic Summit and catch Laura Tacho – CTO at DX, executive advisor, and Austrian Innovator of the Year – in her keynote.

She presented her latest research, Measuring Developer Productivity & AI Impact, based on three months of data collected through February 1.

The research surveyed 121,000 developers across 450+ companies. A striking 92.6% of them use an AI coding assistant at least once a month, and roughly 75% use one weekly. Clearly, AI isn’t just a side experiment anymore; it’s part of the workflow.

Here are the top takeaways I found most compelling from Laura’s research.

The 10% productivity plateau

The first thing most people think of with AI assistance is saving time. According to the research, developers say they’re saving about 4 hours a week – pretty much the same as Q2 2025, with Q4 2025 numbers sitting around 3.6-3.7 hours.

It looks like the time-saving boost has leveled off. Productivity shows the same pattern: it jumped around 10% when AI first took off, and since then, it’s stayed steady at that level.

What’s really shifting is the amount of “AI-authored code” – that is, code that gets merged into the main repository or production environment with little to no human intervention. Laura breaks this down using the latest data:

Looking at about 4.2 million developers between November 2025 and February 2026, AI-authored code now makes up 26.9% of all production code – up from 22% last quarter. Daily AI users are also hitting a milestone: nearly a third of the code they merge, which passes review and goes into production, is written by AI.

One example Laura loves to highlight is how AI is speeding up the onboarding process:

Looking at the data quarter by quarter, from Q1 2024 through Q4 2025, onboarding time has been cut in half. Specifically, we’re measuring it by the “time to the 10th Pull Request (PR).”

This metric (widely seen as a key sign of successful onboarding) has now been cut in half. Because of that, Laura sees AI as a powerful tool for getting people up to speed, whether it’s new hires, engineers switching projects, or even non-engineers stepping into technical workflows.
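As a rough illustration of how a "time to the 10th PR" figure could be computed, here is a minimal sketch. The function name `time_to_nth_pr`, the input shape (a start date plus a list of merged-PR timestamps), and the sample dates are all assumptions made for illustration, not the methodology used in the research.

```python
from datetime import datetime, timedelta
from typing import Optional


def time_to_nth_pr(start: datetime, merged_at: list[datetime], n: int = 10) -> Optional[timedelta]:
    """Return the time from `start` to the n-th merged PR, or None if fewer than n exist."""
    # Only count PRs merged on or after the start date, in chronological order.
    merged = sorted(ts for ts in merged_at if ts >= start)
    if len(merged) < n:
        return None
    return merged[n - 1] - start


# Hypothetical developer who merges one PR every 3 days after joining.
start = datetime(2025, 1, 6)
prs = [start + timedelta(days=3 * i) for i in range(1, 13)]

result = time_to_nth_pr(start, prs)
print(result.days)  # 30
```

Comparing this value across cohorts (for example, hires from different quarters) is one way a "cut in half" trend could show up in the data.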

The faster someone gets up to speed, the longer the productivity boost lasts, usually for at least two years. This points to a bigger trend: AI is helping developers get up to speed faster, reducing mental load, and making it easier to onboard into complex codebases.

In struggling organizations, AI exposes flaws instead of fixing them

Laura also pointed out a part of the research that looks at how AI impacts company performance. This segment analyzed data from 67,000 developers between November 2025 and February 2026, and the findings are strikingly divided.

Some companies are dealing with twice as many customer-facing incidents, while others see a 50% drop.

The difference comes down to how AI is used: in well-structured organizations, AI acts as a “force multiplier,” helping teams move faster, scale with higher quality, and boost reliability. In struggling organizations, AI tends to highlight existing flaws rather than fix them. Based on this, Laura concludes:

Transformation is uncomfortable. Organizations that were ready to quit their cloud or agile transformations are now giving up on AI transformation, too. It’s difficult to look at an entire organization and realize that something fundamental must change to see a real impact on the bottom line.

According to her, adoption alone doesn’t guarantee results; simply using the tools doesn’t automatically improve an organization:

This is really a management problem. The hype made it sound like just trying AI would automatically pay off. But so far, most tools have been used for individual coding tasks. To see real impact, we need to use AI at the organizational level, not just for single tasks.

Laura also touched on the most popular AI tools among developers, specifically highlighting Codex:

The Codex desktop app launched on February 2 and has already topped one million downloads, with a 60% growth rate just last week. They recently rolled out GPT-5.3 Codex. Inside OpenAI, 95% of developers use Codex, and those users submit roughly 60% more Pull Requests each week.

As a real-world example, Laura highlights Cisco, where 18,000 engineers use Codex daily for complex migrations and code reviews. This has cut their code review time in half. But Laura cautions that AI won’t fix deeper organizational issues unless you tackle those problems head-on, and that starts with acknowledging they exist.

Since organizations remain constrained by human and systemic friction, Laura notes:

I am skeptical of any technology’s promise to improve performance without addressing those underlying constraints. If we don’t solve our systemic issues, we’ll just “carry them into space with us.” The real question isn’t how to colonize Mars, but how to achieve actual organizational impact.

DevEx is more important than ever

To wrap things up, Laura revealed the secret to success for those who are “winning” with AI:

1. They set clear goals and measure results.

2. They recognize that Developer Experience (DevEx) matters more than ever.

3. They put the basics in place: AI succeeds when fast Continuous Integration (CI), clear documentation, and well-defined services are there.

At the end of the day, getting real organizational results means treating AI as a company-wide challenge. The research shows the barriers aren’t technical, they come down to change management and leadership support. Laura sums it up:

Successful organizations experiment by tackling real customer problems. Exploring Mars sounds exciting, but it’s not sustainable – it’s expensive and distracts from the core business. Focus your experiments on the customer to drive meaningful results. After all, somewhere, something incredible is waiting to be discovered.

Laura, I really enjoyed your talk and appreciated our chat afterward!

The productivity gap isn’t unique. Read why 84% of developers use AI, but most don’t trust it. The trust issue is part of the equation.

The post This CTO Says 93% of Developers Use AI, but Productivity Is Still 10% appeared first on ShiftMag.

Forget the Model, It’s Workflows That Make LLM Products Run

Marko Crnjanski — Thu, 05 Feb 2026 14:18:39 +0000

From his experience leading AI product teams, Andrew Mende (Senior Product Manager, Machine Learning at Booking.com) explained what it truly takes to ship LLM-based products in production.

Making AI products reliable requires new workflows

For Mende, the buzz around AI is a rare shift, like the rise of smartphones. But what does it mean for product teams?

This moment unlocks new ways of solving customer problems that were previously impossible due to technical constraints.

He was clear: traditional product management approaches often fail with AI-driven products.

LLM-based systems behave differently, demand new workflows, and bring new types of risk.

Unlike deterministic software, LLMs are probabilistic (identical inputs can produce different outputs), making experimentation easy but production readiness challenging, and forcing teams to rethink how they test, evaluate, and monitor features.

One of the biggest traps, Mende explained, is confusing a successful prototype with a scalable solution:

It’s easy to paste a prompt into ChatGPT and see results; much harder to make it reliable across thousands of real customer inputs.

Teams need structured datasets, big tables of real customer examples, to track accuracy, spot regressions, and see if changes actually work. Without them, it’s all guesswork.
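To make the idea of a structured dataset concrete, here is a minimal sketch of an evaluation loop. Everything in it is an assumption for illustration: the four labeled examples, the intent labels, and the two keyword-based stand-in functions `version_a` and `version_b`, which take the place of real LLM calls with different prompts or models.

```python
from typing import Callable

# Hypothetical labeled examples: (customer input, expected label).
DATASET = [
    ("Where is my order?", "order_status"),
    ("I want my money back", "refund"),
    ("Cancel my booking", "cancellation"),
    ("Can I get a refund?", "refund"),
]


def accuracy(predict: Callable[[str], str], dataset: list[tuple[str, str]]) -> float:
    """Fraction of examples where the prediction matches the expected label."""
    correct = sum(1 for text, expected in dataset if predict(text) == expected)
    return correct / len(dataset)


def regressions(old: Callable[[str], str], new: Callable[[str], str],
                dataset: list[tuple[str, str]]) -> list[str]:
    """Inputs the old version got right and the new version gets wrong."""
    return [text for text, expected in dataset
            if old(text) == expected and new(text) != expected]


# Stand-ins for two prompt/model versions; a real system would call an LLM here.
def version_a(text: str) -> str:
    t = text.lower()
    if "refund" in t or "money back" in t:
        return "refund"
    if "cancel" in t:
        return "cancellation"
    return "order_status"


def version_b(text: str) -> str:
    t = text.lower()
    if "refund" in t:
        return "refund"
    if "cancel" in t:
        return "cancellation"
    return "order_status"


print(accuracy(version_a, DATASET))                # 1.0
print(accuracy(version_b, DATASET))                # 0.75
print(regressions(version_a, version_b, DATASET))  # ['I want my money back']
```

The point of the sketch is the workflow rather than the classifier: every change is scored against the same table of examples, so a drop in accuracy and the exact inputs that regressed are visible before anything ships.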

Focus on accuracy, cost, and speed

Mende’s practical approach to model selection focuses on accuracy, cost, and latency: start with the most capable model to see if the problem can be solved, then move to smaller or faster models to optimize performance.
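One way to picture this trade-off is as a filter followed by a cost comparison. The sketch below is illustrative only: the `Candidate` fields, the thresholds, the model names, and all the numbers are made up, and in practice the accuracy figures would come from running each configuration against an evaluation dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    accuracy: float      # measured on the evaluation dataset
    cost_per_1k: float   # cost per 1,000 requests, arbitrary units
    latency_ms: float    # median latency in milliseconds


def pick_model(candidates: list[Candidate], min_accuracy: float,
               max_latency_ms: float) -> Optional[Candidate]:
    """Return the cheapest candidate that meets both thresholds, or None."""
    viable = [c for c in candidates
              if c.accuracy >= min_accuracy and c.latency_ms <= max_latency_ms]
    return min(viable, key=lambda c: c.cost_per_1k, default=None)


# Made-up measurements for three hypothetical models.
candidates = [
    Candidate("large", accuracy=0.95, cost_per_1k=40.0, latency_ms=1800),
    Candidate("medium", accuracy=0.92, cost_per_1k=12.0, latency_ms=900),
    Candidate("small", accuracy=0.81, cost_per_1k=2.0, latency_ms=300),
]

choice = pick_model(candidates, min_accuracy=0.90, max_latency_ms=2000)
print(choice.name)  # medium
```

Here the largest model proves the problem is solvable, and the medium one is chosen because it still clears the accuracy bar at lower cost and latency.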

This requires testing multiple configurations (context size, prompts, and parameters) since even small changes affect results. Beyond the model, context selection, prompt instructions, and external tools are critical:

For example, when a customer asks about a specific order, the system should fetch real-time data instead of relying on static knowledge. This combination of LLMs and tools turns simple prompts into full systems, but also increases complexity and maintenance costs.
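The order example can be sketched as a small lookup step that runs before the model is called. This is a simplified illustration, not a description of any production system: the `ORDERS` dictionary stands in for a live order service, and the order ID format (one letter plus four digits) and the `[tool]` annotation are assumptions.

```python
import re
from typing import Optional

# Hypothetical in-memory stand-in for a live order service.
ORDERS = {"A1001": "shipped", "A1002": "processing"}


def fetch_order_status(order_id: str) -> Optional[str]:
    """Tool: look up the current order status (a real system would call an API)."""
    return ORDERS.get(order_id)


def build_context(question: str) -> str:
    """Attach live order data to the prompt when the question names an order."""
    match = re.search(r"\b[A-Z]\d{4}\b", question)
    if match is None:
        return question  # nothing to look up; pass the question through unchanged
    order_id = match.group()
    status = fetch_order_status(order_id)
    if status is None:
        return f"{question}\n[tool] order {order_id} not found"
    return f"{question}\n[tool] order {order_id} status: {status}"


print(build_context("Where is order A1001?"))
```

Even in this toy form, the extra moving parts are visible: an ID pattern to maintain, a service to call, and a not-found case to handle, which is where the added complexity and maintenance cost come from.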

LLMs can transform how users interact – if teams build the right infrastructure

Mende concluded his How to Web lecture by saying LLMs shine by transforming user interaction: for the first time, digital products can understand plain language, turning customer requests directly into actions.

This shift brings digital experiences closer to human conversations and enables new product patterns that were out of reach just a few years ago.

The challenge now, Mende explained, is not whether LLMs work, but whether teams are willing to build the evaluation, monitoring, and infrastructure required to make them truly useful.

The post Forget the Model, It’s Workflows That Make LLM Products Run appeared first on ShiftMag.

Want to speak at Infobip Shift 2026? Here’s your chance!

ShiftMag — Wed, 14 Jan 2026 14:55:04 +0000

As one of the largest and most popular developer conferences in Europe, Infobip Shift brings together thousands of developers from around the world on the Croatian coast. It offers hands-on sessions, live coding, and valuable networking opportunities in a vibrant, global setting.

A call for speakers is now open, welcoming creative thinkers and experienced developers to share their insights. Here’s what our participants are most excited about:

Practical solutions and actual use-case scenarios.
Fresh and beneficial insights (coding techniques, methodologies, perspectives).
We’re all about live coding!
Engage, entertain, and surprise us.
Present your thoughts, challenge our beliefs, or motivate us.
English is the medium of communication for the event.

The call for speakers is open until 22 May 2026, and submissions will be reviewed by a committee from the Developer Ecosystem of Shift. Speakers will have the chance to present practical solutions and fresh coding techniques, and to challenge the audience with new perspectives.

For more information and to submit a proposal, visit the official Infobip Shift Conference website.

The post Want to speak at Infobip Shift 2026? Here’s your chance! appeared first on ShiftMag.