<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Artificial Intelligence Archives - ShiftMag</title>
	<atom:link href="https://shiftmag.dev/category/artificial-intelligence/feed/" rel="self" type="application/rss+xml" />
	<link>https://shiftmag.dev/category/artificial-intelligence/</link>
	<description>Insightful engineering content &#38; community</description>
	<lastBuildDate>Wed, 27 May 2026 13:47:52 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://shiftmag.dev/wp-content/uploads/2024/08/cropped-ShiftMag-favicon-32x32.png</url>
	<title>Artificial Intelligence Archives - ShiftMag</title>
	<link>https://shiftmag.dev/category/artificial-intelligence/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Trisha Gee: AI Won&#8217;t Fix Your Broken Pipeline &#8211; It Will Break It Faster</title>
		<link>https://shiftmag.dev/trisha-gee-ai-wont-fix-your-broken-pipeline-it-will-break-it-faster-9785/</link>
		
		<dc:creator><![CDATA[Ivan Pelivanovic]]></dc:creator>
		<pubDate>Wed, 27 May 2026 13:36:39 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Developer Productivity]]></category>
		<category><![CDATA[development]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9785</guid>

					<description><![CDATA[<p>At Devoxx UK, I spoke with Trisha Gee - author and one of the most recognized voices in the Java space - about what really happens when teams lean heavily on AI. Her take was far darker than the conference hype.</p>
<p>The post <a href="https://shiftmag.dev/trisha-gee-ai-wont-fix-your-broken-pipeline-it-will-break-it-faster-9785/">Trisha Gee: AI Won&#8217;t Fix Your Broken Pipeline &#8211; It Will Break It Faster</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img fetchpriority="high" decoding="async" width="1200" height="720" src="https://shiftmag.dev/wp-content/uploads/2026/05/tirsha-devoxx.jpg?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/tirsha-devoxx.jpg 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/tirsha-devoxx-300x180.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/tirsha-devoxx-1024x614.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/tirsha-devoxx-768x461.jpg 768w" sizes="(max-width: 1200px) 100vw, 1200px" /></figure>

<div class="wp-block-post-excerpt wp-block-hidden-desktop wp-block-hidden-mobile wp-block-hidden-tablet"><p class="wp-block-post-excerpt__excerpt">At Devoxx UK, I spoke with Trisha Gee &#8211; author and one of the most recognized voices in the Java space &#8211; about what really happens when teams lean heavily on AI. Her take was far darker than the conference hype. </p></div>


<p class="wp-block-paragraph"><strong>Trisha Gee</strong> has spent over two decades in software development, from startups to global enterprises &#8211; as comfortable discussing DORA metrics and the SPACE framework as business outcomes and organizational design.</p>



<p class="wp-block-paragraph">At <a href="https://www.devoxx.co.uk/" target="_blank" rel="noreferrer noopener">Devoxx UK</a>, she gave a talk about how <strong>software engineering principles stay the same regardless of what tooling era you are in</strong>. </p>



<p class="wp-block-paragraph">I wanted to understand what that means right now, when AI is writing a significant portion of the code.</p>



<h2 class="wp-block-heading"><span id="ai-exposes-the-weakest-link-not-just-the-fastest-path">AI exposes the weakest link, not just the fastest path</span></h2>



<p class="wp-block-paragraph">Trisha frames AI as an amplifier, not a solution. When I asked what that looks like beyond demos, she put it simply: <strong>it exposes the problems that were already there</strong>, the ones you didn&#8217;t know you had.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The most common thing I saw (I was working at Gradle, so we dealt with a lot of build tooling) was more code, more tests, and tests taking longer. The continuous delivery pipeline took a lot of pressure.</p>
</blockquote>



<p class="wp-block-paragraph">The broader pattern she describes is straightforward but easy to miss when you are excited about shipping faster. &#8220;Whichever part of your system is the weakest, it&#8217;s going to expose that part,&#8221; she said.</p>



<p class="wp-block-paragraph">Put another way, while most conversations about AI adoption focus on what gets faster, Trisha highlights what deteriorates first.</p>



<h2 class="wp-block-heading"><span id="when-code-gets-cheap-everything-else-gets-expensive">When code gets cheap, everything else gets expensive</span></h2>



<p class="wp-block-paragraph">When I asked Trisha where teams should focus once code generation becomes cheap, her answer was <em>everywhere</em>.</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="614" src="https://shiftmag.dev/wp-content/uploads/2026/05/crowd-devoxx-1024x614.jpg?x94846" alt="" class="wp-image-9877" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/crowd-devoxx-1024x614.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/crowd-devoxx-300x180.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/crowd-devoxx-768x461.jpg 768w, https://shiftmag.dev/wp-content/uploads/2026/05/crowd-devoxx.jpg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: DevoxxUK / Flickr</figcaption></figure>



<p class="wp-block-paragraph">What she means is that optimizing the <strong>writing of code without understanding the surrounding system does not move the needle</strong>.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">It&#8217;s not about one thing which is going to fix one problem, it&#8217;s about really understanding the whole system, it&#8217;s about understanding even the whole organization, the whole enterprise. Where does IT and technology and software fit into that? What are you really trying to deliver? What is the business benefit?</p>
</blockquote>



<p class="wp-block-paragraph">She described this as <strong>working across two ends of the process</strong>. On the input side, teams need to get better at questioning requirements before writing anything. On the output side, they need to look at build pipelines, test parallelism, flaky tests, and DORA metrics.</p>



<p class="wp-block-paragraph">&#8220;If you can measure those things (your DORA metrics, build times, whether delivered requirements actually give users value) you can start to see which parts of the process are working and which need attention,&#8221; Trisha explained.</p>



<h2 class="wp-block-heading"><span id="measuring-the-wrong-things-optimizes-the-wrong-things">Measuring the wrong things optimizes the wrong things</span></h2>



<p class="wp-block-paragraph">She also makes a sharp point about measurement and optimization.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">If you measure lines of code for productivity, you&#8217;ll get more lines of code. But really productivity is not just about what we call these activity metrics. It&#8217;s not just lines of code. It&#8217;s not just pull requests, merges, features delivered.</p>
</blockquote>



<p class="wp-block-paragraph">The thing teams consistently miss is the full arc of delivery.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Developer experience and productivity is the whole piece. Did it get out to the user? Did it meet the user&#8217;s needs? Is the user paying for more of our stuff? Is the business getting what they need from what the developers are doing? What you&#8217;re measuring there impacts what you&#8217;re going to optimize.</p>
</blockquote>



<p class="wp-block-paragraph">That last line is worth sitting with. If your productivity metrics stop at pull requests merged, you are optimizing for pull requests merged.</p>



<h2 class="wp-block-heading"><span id="the-space-framework-and-why-three-metrics-beat-one">The SPACE framework and why three metrics beat one</span></h2>



<p class="wp-block-paragraph">When I asked Trisha what teams should measure, she pointed to the SPACE framework. SPACE stands for <strong>satisfaction</strong>, <strong>performance</strong>, <strong>activity</strong>, <strong>communication and collaboration</strong>, and <strong>efficiency and flow</strong>.</p>



<p class="wp-block-paragraph">DORA metrics, which most teams are more familiar with, are a subset of it. Her recommendation is to <strong>pick metrics from three different dimensions</strong> rather than relying on a single category. The reasoning is that single-category metrics tend to be easy to game without improving anything real.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">So yes, you can write more code, but no, you didn&#8217;t do what the business wanted.</p>
</blockquote>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="614" src="https://shiftmag.dev/wp-content/uploads/2026/05/Trisha-Gee-1-1024x614.jpg?x94846" alt="" class="wp-image-9879" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/Trisha-Gee-1-1024x614.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/Trisha-Gee-1-300x180.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/Trisha-Gee-1-768x461.jpg 768w, https://shiftmag.dev/wp-content/uploads/2026/05/Trisha-Gee-1.jpg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: Marin Pavelić</figcaption></figure>



<p class="wp-block-paragraph">She also brought up Fred Brooks and <strong>communication overhead</strong> as something the industry consistently underweights. The harder metrics to capture, like satisfaction and flow, are often more revealing than the activity metrics that dashboards make easy to track.</p>



<p class="wp-block-paragraph">The business outcomes she keeps returning to are specific: &#8220;You need to measure, did it do what you wanted it to do? Did it get out to the user in time? Did they start spending more money with us? Did it fix your retention problem?&#8221;</p>



<p class="wp-block-paragraph">Those are the things which matter much more to the business.</p>



<h2 class="wp-block-heading"><span id="what-to-fix-before-adopting-ai">What to fix before adopting AI</span></h2>



<p class="wp-block-paragraph">I wondered what teams need to get right before AI tooling can actually help them. Trisha&#8217;s first answer was essentially: stop adopting AI the way you have adopted everything else. </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">We generally get requirements, write the code, chuck it out there, and then you&#8217;re kind of done. That&#8217;s not how it should work.</p>
</blockquote>



<p class="wp-block-paragraph">What she advocates for instead is <strong>applying the scientific method to engineering decisions</strong>, which sounds obvious but rarely happens in real life.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Have a hypothesis, do your investigation, measure the results, have a conclusion. Generally speaking, we have not been great at that in our industry.</p>
</blockquote>



<p class="wp-block-paragraph">Applied to AI adoption specifically, that means being precise about what you are actually trying to achieve. &#8220;What are we trying to achieve with AI? Do we want to deliver more features more quickly to the customer, or do we want to perhaps deliver higher-quality features? Because those two things are not necessarily the same thing,&#8221; Trisha concluded.</p>



<p class="wp-block-paragraph">The practical instruction she gives, therefore, is to <strong>run short experiments, measure one change at a time, and iterate</strong>: have a hypothesis, figure out how to measure it, measure it, get feedback, and iterate on that.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="AI Won&amp;apos;t Fix What You Can&amp;apos;t Measure" width="500" height="281" src="https://www.youtube.com/embed/iH4UnTskOSM?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Killing PRs was the easy part. Now Technical Death Keeps the CTO Up.</title>
		<link>https://shiftmag.dev/killing-prs-was-the-easy-part-now-technical-death-keeps-the-cto-up-9910/</link>
		
		<dc:creator><![CDATA[Marin Pavelić]]></dc:creator>
		<pubDate>Tue, 26 May 2026 14:39:07 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[iBOOD]]></category>
		<category><![CDATA[Sander Hoogendoorn]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9910</guid>

					<description><![CDATA[<p>Everything you think is non-negotiable in software development, Sander Hoogendoorn's team quietly dropped - and nothing broke.</p>
<p>The post <a href="https://shiftmag.dev/killing-prs-was-the-easy-part-now-technical-death-keeps-the-cto-up-9910/">Killing PRs was the easy part. Now Technical Death Keeps the CTO Up.</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/iBOOD-sander-.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/iBOOD-sander-.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/iBOOD-sander--300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/iBOOD-sander--1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/iBOOD-sander--768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph"><strong>Sander Hoogendoorn</strong> has been writing code for over 40 years and is currently CTO at iBOOD, a Dutch e-commerce company.</p>



<p class="wp-block-paragraph">His talk at <a href="https://www.devoxx.co.uk/" target="_blank" rel="noreferrer noopener">Devoxx</a>, <em>The Last Pull Request</em>, was a live report from a team that quietly dismantled most of what the industry considers non-negotiable, and then kept shipping.</p>



<p class="wp-block-paragraph">Now there&#8217;s a new concern.</p>



<h2 class="wp-block-heading">AI didn&#8217;t change everything. Change didn&#8217;t wait for AI.</h2>



<p class="wp-block-paragraph">Sander opened with a timeline: source control, IDEs, the web, mobile, the cloud, microservices. Each wave reshaped what developers could build and how. AI is just the latest.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">AI is going to change everything? No.&nbsp;Everything already changed everything. And this is not the last step.</p>
</blockquote>



<p class="wp-block-paragraph">The point wasn&#8217;t to diminish AI but to put it in context. Every major shift expanded the tooling, and the problem space alongside it. For most teams, that problem space now sits in what Sander calls <strong>complex territory</strong>: no best practices, only things that might emerge from experimentation. Dave Snowden&#8217;s Cynefin framework is blunt about this: in a complex context, there is no right answer to find. You have to invent one.</p>



<p class="wp-block-paragraph">That&#8217;s the actual job, Sander says. Not typing code. <strong>Solving problems that have never been solved before</strong>.</p>



<h2 class="wp-block-heading"><span id="selfware">Selfware</span></h2>



<p class="wp-block-paragraph">Sander introduced a concept: selfware. Software built by non-developers (marketers, finance teams, executives) using AI to <strong>solve their own problems without involving engineering</strong>. </p>



<p class="wp-block-paragraph">At iBOOD, the content director is already doing it. So is the CMO:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">We as tech are not fast enough. And I’ve seen this before. In the 80s and 90s, everyone started writing Excel spreadsheets.</p>
</blockquote>



<p class="wp-block-paragraph">The difference now is that the output isn’t a pivot table, it’s software. Unmanaged, untested, running on personal accounts with passwords nobody reviews, exporting customer data in ways that would make your compliance team cry. This is happening right now, and <strong>most engineering teams haven&#8217;t figured out what to do about it</strong>.</p>



<h2 class="wp-block-heading">No scrum, sprints, pull requests&#8230;</h2>



<p class="wp-block-paragraph">The list of things they stopped doing is long: no scrum, no sprints, no retros. Fewer standups. No scrum master, no product owner. Minimal estimates. <strong>No pull requests</strong> &#8211; because every branch is a merge waiting to happen, every review costs time, and reviewers rarely know what the code was supposed to do in the first place.</p>



<p class="wp-block-paragraph">What replaced it? <a href="https://shiftmag.dev/pair-programming-benefits-challenges-563/" target="_blank" rel="noreferrer noopener">Pair programming</a>. <a href="https://shiftmag.dev/mob-programming-why-do-it-882/" target="_blank" rel="noreferrer noopener">Mob programming</a>. Smaller changes, checked in faster, continuously. Everyone on the team is an architect. Everyone is accountable for everything, Sander says:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Perfection is achieved not when there’s nothing more to add but when there’s nothing left to take away.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="pair-me-with-claude">Pair me with Claude</span></h2>



<p class="wp-block-paragraph">Today, Sander&#8217;s 13-person team <strong>pairs with AI through most of their working day</strong>. It became the natural way to work. Currently that means Claude, though that could change next week.</p>



<p class="wp-block-paragraph">AI breaks things. Two weeks before the talk, Sander pushed AI-generated changes that silently removed all dependency injections from their web page constructors. None of the pages were serving data. He didn&#8217;t catch it until later:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">I’m not saying not to use AI. I do it every day. But I do think we should check what it’s doing.</p>
</blockquote>



<p class="wp-block-paragraph">What worries him most is what he calls <strong>technical death</strong> &#8211; a state where a team spends all its time keeping existing software alive, with nothing left for anything new. Technical debt compounding under AI-generated code nobody fully reviews. Complexity accumulating faster than it gets cleaned up. That&#8217;s the real risk.</p>



<h2 class="wp-block-heading"><span id="we-asked-sander-a-few-more-things">We asked Sander a few more things</span></h2>



<h3 class="wp-block-heading"><span id="your-team-dropped-pull-requests-was-that-an-ai-decision">Your team dropped pull requests. Was that an AI decision?</span></h3>



<p class="wp-block-paragraph"><strong>Sander</strong>: No, we did that a long time ago, it has nothing to do with AI. The problem with pull requests is that <strong>they slow you down</strong>. The longer you wait with merging back into main, the harder it gets, because other people make changes too. And what you see very often is that people reviewing other people’s code tend not to know or even understand what the code was supposed to do. So they check formatting, linting, naming conventions. Which is pretty stupid, because that you can automate.</p>



<p class="wp-block-paragraph"><strong>Pull requests make sense in open source</strong>, where you have no idea who’s submitting changes or what the quality of their work is. But on your own team? I don’t see any problems with committing code from anybody automatically. We work together every day, we write code together. You just don’t need it.</p>



<h3 class="wp-block-heading"><span id="ai-is-part-of-your-team-now-what-happens-when-something-breaks">AI is part of your team now. What happens when something breaks?</span></h3>



<p class="wp-block-paragraph"><strong>Sander</strong>: We don’t track who broke it. <strong>Everybody on my team is accountable for everything, including me</strong>. If I push something and the pipeline fails and I’m not around, somebody else picks it up. I have no doubt about that. So accountability is… I don’t care too much about it, because it’s distributed. We don’t blame people. We just fix it.</p>



<h3 class="wp-block-heading"><span id="you%e2%80%99ve-been-critical-of-agile-is-ai-exposing-that-teams-never-really-understood-it">You’ve been critical of Agile. Is AI exposing that teams never really understood it?</span></h3>



<p class="wp-block-paragraph"><strong>Sander</strong>: I’m not critical about Agile. I think a lot of people misunderstand what Agile actually means. Agile does not mean Scrum. Actually, to be quite honest, Scrum is not really Agile. The Scrum Guide says Scrum is immutable, which basically means it’s not Agile, because Agile means you can improve on anything.</p>



<p class="wp-block-paragraph">There is nothing in Agile that says you need to do sprints. The key statement in the Agile Manifesto is the one at the top: <strong>we are uncovering better ways of developing software</strong>. Everything else doesn’t really matter. As long as you have that mindset, there’s always something to improve. No default way of working is going to solve the problem for you.</p>



<h3 class="wp-block-heading"><span id="where-does-this-go-in-two-or-three-years">Where does this go in two or three years?</span></h3>



<p class="wp-block-paragraph"><strong>Sander</strong>: I think we will soon realize that the <strong>English language is too ambiguous and not concise enough to specify to an AI what to do</strong>. So what will happen is that we’ll develop better ways of having conversations with AI &#8211; more precise, less ambiguous. And what those languages are called? Programming languages. We will develop programming languages that allow us to talk to an AI in a way that the AI is able to create lower-level code from it.</p>



<p class="wp-block-paragraph">Programming will be programming, except with different tools. As they always have been.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that</title>
		<link>https://shiftmag.dev/when-systems-go-down-devs-still-juggle-10-tabs-pagerduty-says-mcp-fixes-that-9657/</link>
		
		<dc:creator><![CDATA[Ivan Pelivanovic]]></dc:creator>
		<pubDate>Fri, 22 May 2026 14:13:25 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Developer Productivity]]></category>
		<category><![CDATA[MCP]]></category>
		<category><![CDATA[PagerDuty]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9657</guid>

					<description><![CDATA[<p>Production incidents are a context problem. By the time engineers understand what's happening, they've already bounced across several different tools - and the incident is still ongoing. PagerDuty thinks MCP is the fix.</p>
<p>The post <a href="https://shiftmag.dev/when-systems-go-down-devs-still-juggle-10-tabs-pagerduty-says-mcp-fixes-that-9657/">When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/rocio-and-sebastian-2.jpg?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/rocio-and-sebastian-2.jpg 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/rocio-and-sebastian-2-300x158.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/rocio-and-sebastian-2-1024x538.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/rocio-and-sebastian-2-768x403.jpg 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>

<div class="wp-block-post-excerpt wp-block-hidden-desktop wp-block-hidden-mobile wp-block-hidden-tablet"><p class="wp-block-post-excerpt__excerpt">Production incidents are a context problem. By the time engineers understand what&#8217;s happening, they&#8217;ve already bounced across several different tools &#8211; and the incident is still ongoing. PagerDuty thinks MCP is the fix. </p></div>


<p class="wp-block-paragraph">When incidents hit production systems, engineers rarely stay inside one tool for long. They jump from logs to dashboards to runbooks, trying to <strong>reconstruct what is actually happening</strong>.</p>



<p class="wp-block-paragraph">From talking to other builders, it seems almost everybody faces this context-switching problem.</p>



<p class="wp-block-paragraph"><strong>Rocío Bayon</strong> (Product Manager) and <strong>Sebastian Villanelo</strong> (Sr. Forward Deployed Engineer) from PagerDuty think MCP is how you fix it.</p>



<h2 class="wp-block-heading"><span id="pagerduty-built-their-mcp-to-cut-context-switching">PagerDuty built their MCP to cut context switching</span></h2>



<p class="wp-block-paragraph">Rocío explained that their MCP is solving the issue of context switching:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">When an incident hits, the engineer has to go between 5 to 10 different tools to understand what&#8217;s happening.</p>
</blockquote>



<p class="wp-block-paragraph">That&#8217;s the real problem they&#8217;re trying to solve. </p>



<p class="wp-block-paragraph">PagerDuty&#8217;s framing of MCP was interesting: neither Rocío nor Sebastian described MCP as just another integration layer. They framed it as <strong>connective tissue</strong> that gathers logs, alerts, runbooks, and incident context into a single workflow.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">What the MCP does, it brings all that context into one platform where engineers are usually already working.</p>
</blockquote>



<p class="wp-block-paragraph">Most engineering organizations already have enormous amounts of observability data. The real problem is that <strong>it is scattered across systems</strong>, and engineers end up reconstructing operational context manually during incidents.</p>



<h2 class="wp-block-heading"><span id="retrieve-what-you-need-nothing-more">Retrieve what you need, nothing more</span></h2>



<p class="wp-block-paragraph">Sebastian framed the problem as signal retrieval. Rather than feeding the model more information, the goal is <strong>pulling the relevant operational state around a specific incident</strong>.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">If you have the right parameters or the queries and all this stuff, you will retrieve the exact information that you need.</p>
</blockquote>



<p class="wp-block-paragraph">That means <strong>narrowing context around the actual incident window</strong>. When an incident hits, it retrieves information around that time only, Sebastian explained.</p>



<p class="wp-block-paragraph">That also changes how they think about efficiency: reducing context switching directly affects operational speed, token usage, and cost.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">You will see that information only with one call. And that saves a lot of tokens and time. That&#8217;s money and time.</p>
</blockquote>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="614" src="https://shiftmag.dev/wp-content/uploads/2026/05/Editorial-IP-71-1024x614.jpg?x94846" alt="" class="wp-image-9873" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/Editorial-IP-71-1024x614.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/Editorial-IP-71-300x180.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/Editorial-IP-71-768x461.jpg 768w, https://shiftmag.dev/wp-content/uploads/2026/05/Editorial-IP-71.jpg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: Lea Lobor</figcaption></figure>



<h2 class="wp-block-heading"><span id="ai-helps-but-engineers-still-decide">AI helps but engineers still decide</span></h2>



<p class="wp-block-paragraph">Still, both of them were careful not to frame AI as autonomous incident management. </p>



<p class="wp-block-paragraph">Rocío repeatedly emphasized that <strong>MCP and AI systems are primarily helping with context gathering and operational visibility</strong>, while engineers remain responsible for the high-risk decisions:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The AI is helping you, but the engineer is the one who is assessing and making decisions where there&#8217;s a high risk.</p>
</blockquote>



<p class="wp-block-paragraph">That human layer is intentional. PagerDuty&#8217;s broader vision seems less about replacing on-call engineers and more about <strong>reducing the operational overhead surrounding incidents</strong>. Their MCP systems help gather information, surface relationships between systems, and accelerate investigation workflows, but humans still decide what actually happens next.</p>



<p class="wp-block-paragraph">Rocío also mentioned that their <strong>SRE agent</strong> is designed to support larger incident workflows beyond information retrieval:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">It can also help you trigger those incident workflows. So it can help you resolve the incident. And it learns as it goes.</p>
</blockquote>



<h2 class="wp-block-heading">&#8220;MCP &#8211; the connective tissue between tools&#8221;</h2>



<p class="wp-block-paragraph">I asked Rocío and Sebastian how MCP fits into the tools they already use without becoming just another silo.</p>



<p class="wp-block-paragraph">Both of them clearly framed <strong>MCP as anti-silo infrastructure</strong>, since it brings everything into one place. Rocío called MCP &#8220;the connective tissue between all these different tools.&#8221;</p>



<p class="wp-block-paragraph">That framing probably captures the broader architectural challenge better than anything else in the interview. </p>



<p class="wp-block-paragraph">Modern incident response already spans dozens of systems: observability platforms, deployment pipelines, CI/CD tooling, ticketing systems, infrastructure management, and communication layers. </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">AI systems inherit that fragmentation unless something explicitly connects operational state.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="engineers-trust-systems-that-behave-predictably">Engineers trust systems that behave predictably</span></h2>



<p class="wp-block-paragraph">Sebastian mentioned that <strong>teams often react very differently to MCP systems</strong>. Some embrace them immediately while others remain skeptical, especially around security and predictability. For him, trust improves once systems consistently produce expected outcomes:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">When a person or a teammate says &#8220;ah, I&#8217;m retrieving what I&#8217;m expecting to retrieve&#8221;, that will help them to trust it.</p>
</blockquote>



<p class="wp-block-paragraph">A lot of AI tooling discussions still focus on model capability, reasoning quality, or benchmark performance. But <strong>operational systems are usually adopted much more pragmatically</strong>. Engineers trust systems that behave predictably, retrieve the right operational context, and fit into workflows they already rely on.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="10 Tools, One IDE: PagerDuty&#8217;s Incident MCP" width="500" height="281" src="https://www.youtube.com/embed/YRIiKkG7JY0?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>
<p>The post <a href="https://shiftmag.dev/when-systems-go-down-devs-still-juggle-10-tabs-pagerduty-says-mcp-fixes-that-9657/">When systems go down, devs still juggle 10 tabs. PagerDuty says MCP fixes that</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Teaching AI Agents to Test 1,000 Java Libraries – and Letting Them Run While You Sleep</title>
		<link>https://shiftmag.dev/teaching-ai-agents-to-test-1000-java-libraries-and-letting-them-run-while-you-sleep-9802/</link>
		
		<dc:creator><![CDATA[Marin Pavelić]]></dc:creator>
		<pubDate>Tue, 19 May 2026 18:39:50 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Vojin Jovanović]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9802</guid>

					<description><![CDATA[<p>1,000 libraries, 90% coverage, $1,700 in API tokens. Nobody typed a single test by hand.</p>
<p>The post <a href="https://shiftmag.dev/teaching-ai-agents-to-test-1000-java-libraries-and-letting-them-run-while-you-sleep-9802/">Teaching AI Agents to Test 1,000 Java Libraries – and Letting Them Run While You Sleep</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/ai-agents-java-libraries-1200x630-1.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/ai-agents-java-libraries-1200x630-1.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/ai-agents-java-libraries-1200x630-1-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/ai-agents-java-libraries-1200x630-1-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/ai-agents-java-libraries-1200x630-1-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">When humans maintained the GraalVM native image reflection metadata repository, <strong>coverage sat at just 14%</strong>. Tests were often stubs that technically compiled but covered nothing meaningful, nobody wanted to write them for someone else&#8217;s code, and the results showed.</p>



<p class="wp-block-paragraph">At <a href="https://www.devoxx.co.uk/" target="_blank" rel="noreferrer noopener">Devoxx UK</a>, <strong>Vojin Jovanovic</strong> (Principal Researcher, Oracle Labs) and <strong>Mihailo Markovic</strong> (Software Engineer, Oracle) presented how they replaced that process with an autonomous AI agent pipeline.</p>



<p class="wp-block-paragraph">The result is <strong>90% dynamic access coverage across more than 1,000 JVM libraries</strong>, roughly 2 billion tokens spent, and a GitHub repository generating thousands of commits per week &#8211; while Vojin was at a hotel the night before the conference.</p>



<h2 class="wp-block-heading"><span id="the-problem-with-graalvm-reflection">The problem with GraalVM reflection</span></h2>



<p class="wp-block-paragraph">GraalVM Native Image takes a Java application, performs static analysis, and AOT compiles it into a single binary. The benefits are significant: <strong>startup roughly 10x faster than a standard JVM</strong> and a dramatically lower memory footprint.</p>



<p class="wp-block-paragraph">But static analysis has a fundamental limitation: when a method calls Class.forName(name) with an argument that is only computed at runtime, the analyser <strong>cannot determine at compile time what class will be needed</strong>. Such reflection calls break the closed-world assumption.</p>
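
<p class="wp-block-paragraph">To make the limitation concrete, here is a minimal Java sketch (not code from the talk) in which the class name is assembled at runtime. The class DynamicLookup and its create method are hypothetical names chosen for illustration; only Class.forName and the constructor lookup are standard JDK calls.</p>

```java
// Illustrative sketch: the class name is only known at runtime,
// so a closed-world static analysis cannot tell which class must be kept.
final class DynamicLookup {

    // Builds a class name from runtime input and instantiates it reflectively.
    static Object create(String simpleName) {
        String className = "java.util." + simpleName; // value unknown at build time
        try {
            var type = Class.forName(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing no-arg constructor, or a failed instantiation.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        String requested = args.length > 0 ? args[0] : "ArrayList";
        Object instance = create(requested);
        System.out.println(instance.getClass().getName());
    }
}
```

<p class="wp-block-paragraph">On a regular JVM this runs without any extra configuration. A static analyser, however, sees only a string concatenation feeding Class.forName, which is the kind of call site that has to be described separately for a native image.</p>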



<p class="wp-block-paragraph">The solution is <strong>reachability metadata</strong> &#8211; a JSON file that tells the native image compiler which classes, methods, and fields need to be accessible at runtime. Writing this metadata requires running tests that exercise all the relevant code paths. </p>



<p class="wp-block-paragraph">For a library like Hibernate Core, that means covering 264 individual reflection call sites. For Tomcat, 205. Across the JVM ecosystem, the number is enormous, and until recently, it was almost entirely a manual process that humans were not doing well.</p>



<h2 class="wp-block-heading"><span id="start-simple-then-add-feedback">Start simple, then add feedback</span></h2>



<p class="wp-block-paragraph">The first approach was straightforward: give an LLM the library source code, tell it to generate comprehensive Java tests, and collect the metadata via a JVMTI agent.</p>



<p class="wp-block-paragraph">The results were not impressive &#8211; 5.7% coverage for logback, 2.9% for H2. Vojin noted that this doesn’t feel like AGI.</p>



<p class="wp-block-paragraph">The shift came from <strong>adding GraalVM’s static analysis directly to the agent’s context</strong>. Instead of asking the LLM to guess which code paths matter, the pipeline runs a static analysis pass that identifies every dynamic access call site (the exact class, method, and line number) and feeds that report directly to the agent. With this addition, logback coverage jumped to 97%, H2 to 84.3%, in five iterations.</p>



<p class="wp-block-paragraph"><strong>The next layer was JaCoCo integration</strong>. After each generation round, the pipeline correlates coverage data with the remaining uncovered call sites and feeds only the uncovered ones back into the next iteration. The agent knows exactly what it hit and what it missed. Vojin noted:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">We always create a checkpoint in those systems so we can go back to it if something goes wrong. And in these LLM-driven workflows, something is always going wrong.</p>
</blockquote>



<p class="wp-block-paragraph">With this feedback loop: logback reached 100%, H2 reached 96.1%.</p>
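
<p class="wp-block-paragraph">The feedback loop described above can be sketched roughly as follows. This is an assumption-laden illustration, not the team&#8217;s pipeline: CoverageFeedbackLoop, the call-site strings, and the fake round are all hypothetical, and a real round would generate tests, run them, and read the coverage report.</p>

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Minimal sketch of "feed back only what is still uncovered".
// All names are hypothetical stand-ins for the real generation and coverage steps.
final class CoverageFeedbackLoop {

    // Runs up to maxRounds rounds. Each round receives only the uncovered call sites
    // and returns the ones its generated tests managed to reach.
    static Set<String> run(Set<String> allCallSites, UnaryOperator<Set<String>> round, int maxRounds) {
        Set<String> uncovered = new TreeSet<>(allCallSites);
        for (int i = 0; i < maxRounds && !uncovered.isEmpty(); i++) {
            Set<String> reached = round.apply(Set.copyOf(uncovered));
            uncovered.removeAll(reached); // the next round sees only the misses
        }
        return uncovered;
    }

    public static void main(String[] args) {
        Set<String> callSites = Set.of("Loader.load:42", "Factory.create:17", "Codec.lookup:88");
        // Fake round for demonstration: "covers" the alphabetically first remaining site.
        UnaryOperator<Set<String>> fakeRound = u -> Set.of(new TreeSet<>(u).first());
        Set<String> left = run(callSites, fakeRound, 2);
        System.out.println("Still uncovered after 2 rounds: " + left);
    }
}
```

<p class="wp-block-paragraph">The point of the design is that the input to each round shrinks: the agent is never asked again about call sites it has already reached.</p>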



<h2 class="wp-block-heading"><span id="coverage-sometimes-still-isn%e2%80%99t-enough">Coverage sometimes still isn’t enough</span></h2>



<p class="wp-block-paragraph">For larger, more complex libraries (Guava, Tomcat, MongoDB), even the feedback loop left gaps. The team added a third technique: <strong>PGO</strong> (Profile-Guided Optimization) <strong>profiling from GraalVM’s Graal compiler</strong>. The profiler samples execution and produces a call trace, which can be correlated with static analysis to identify exactly where a test nearly reached a reflection call but diverged.</p>



<p class="wp-block-paragraph">The profiling feedback tells the agent not just what’s uncovered, but <strong>where in the call stack a test went in the wrong direction</strong> and what it would need to do differently. Results: Guava went from 50% to 72%, Tomcat from 45% to 83%, MongoDB reached 100%. </p>



<p class="wp-block-paragraph">The feedback also tells the agent (and the engineers) why certain calls cannot be covered: a security service only available on Java 6, a cleaner class incompatible with the current JVM. &#8220;If you cannot reach it, tell us why,&#8221; the prompt instructs, and the agent does.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="622" src="https://shiftmag.dev/wp-content/uploads/2026/05/55258392222_e98c47ff10_k-1024x622.jpg?x94846" alt="" class="wp-image-9853" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/55258392222_e98c47ff10_k-1024x622.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/55258392222_e98c47ff10_k-300x182.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/55258392222_e98c47ff10_k-768x466.jpg 768w, https://shiftmag.dev/wp-content/uploads/2026/05/55258392222_e98c47ff10_k.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: DevoxxUK / Flickr</figcaption></figure>



<h2 class="wp-block-heading"><span id="cost-agents-and-model-selection">Cost, agents and model selection</span></h2>



<p class="wp-block-paragraph"><strong>Codex</strong> was the first agent framework the team tried. For logback (a library with 33 dynamic access calls) Codex spent $35:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">If we’re spending $35 per library for a thousand libraries, we’re not replacing humans.</p>
</blockquote>



<p class="wp-block-paragraph"><strong>The alternative was P</strong>, a minimal agent that starts with a 200-token context describing basic file operations and bash execution. It produced the same results at roughly a tenth of the cost, and the lesson is straightforward:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Simple task, use a simple agent. You already give it a lot of rules, a lot of context, and you’ve grounded it enough so it can perform on the level of these big agents.</p>
</blockquote>



<p class="wp-block-paragraph">On model selection, the team compared GPT 5.5 against several open-source alternatives &#8211; GLM, Kimi K2, DeepSeek, Gemma. <strong>GPT 5.5 consistently outperformed them on coverage</strong>. The counterintuitive finding was this: a more expensive model that makes the right decision in one shot can cost less overall than a cheaper model that wastes tokens going in the wrong direction.</p>



<h2 class="wp-block-heading"><span id="the-architecture-that-lets-it-run-without-you">The architecture that lets it run without you</span></h2>



<p class="wp-block-paragraph">The pipeline now operates as a <strong>third-generation system</strong>. When a user opens an issue requesting a library, the agent fetches the issue, runs the generation workflow, verifies the output, creates a pull request, reviews it, and merges or escalates to human review &#8211; automatically. The &#8220;human intervention&#8221; label on GitHub still exists, but its queue has shrunk dramatically.</p>



<p class="wp-block-paragraph"><strong>Documentation, not smarter prompting, was what made the difference</strong>. </p>



<p class="wp-block-paragraph">Vojin outlined what he calls <em><strong>the key context layers</strong></em>:</p>



<ul class="wp-block-list">
<li>raison d’être (why does this project exist, in two sentences), </li>



<li>state of direction (where the architecture stands today), </li>



<li>functional specification (how the system behaves), </li>



<li>architectural specification (how it is built), </li>



<li>decision records (what major choices were made and why), and </li>



<li>comprehensive logs that serve as checkpoints for recovery.</li>
</ul>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">When you do all of these things, it takes almost a few days for a very big project. You will reduce your work by 50%, 60%, 70%.</p>
</blockquote>



<p class="wp-block-paragraph">The payoff is that agents with this context can diagnose failures, trace them through logs, and fix the underlying system, not just the immediate problem.</p>



<p class="wp-block-paragraph">The RAID system (an automated issue-resolution agent) was built in four prompts on a Sunday morning. It sweeps human intervention tickets, classifies them, performs deep analysis using the project logs, and either opens a GitHub issue for humans or attempts a fix in a forked branch with review. Jovanovic added:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Never work on the problem, always work on the system. You never go and fix a ticket. You always go fix the rules.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="where-things-stand">Where things stand</span></h2>



<p class="wp-block-paragraph">The repository currently supports 1,021 libraries. Without five large Hibernate libraries that predate the automated pipeline, <strong>dynamic access coverage across the ecosystem is 90%</strong>. </p>



<p class="wp-block-paragraph">The GitHub repository has accumulated 2,977 branches. In the week before Devoxx, it logged approximately 8,000 to 9,000 commits, with agents committing every few minutes around the clock.</p>



<p class="wp-block-paragraph">Total cost for the project: approximately $1,700 in API tokens, plus personal compute on Jovanovic’s home desktop, running around the clock because the Oracle compliance process for cloud infrastructure takes time. The key point is simple:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Start with neural, simplest thing, get results, and then slowly chop off things and put them into algorithms, because they are much cheaper and faster.</p>
</blockquote>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="614" src="https://shiftmag.dev/wp-content/uploads/2026/05/55259209226_a368299446_k-1024x614.jpg?x94846" alt="" class="wp-image-9850" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/55259209226_a368299446_k-1024x614.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/55259209226_a368299446_k-300x180.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/55259209226_a368299446_k-768x460.jpg 768w, https://shiftmag.dev/wp-content/uploads/2026/05/55259209226_a368299446_k.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: DevoxxUK / Flickr</figcaption></figure>



<h2 class="wp-block-heading"><span id="we-caught-vojin-jovanovic-for-a-few-more-questions">We caught Vojin Jovanovic for a few more questions!</span></h2>



<p class="wp-block-paragraph">After the talk, we sat down with Vojin for a few minutes to ask him a couple more questions. </p>



<h3 class="wp-block-heading"><span id="you-tested-over-1000-libraries-what-broke-first-when-you-tried-to-scale">You tested over 1,000 libraries. What broke first when you tried to scale?</span></h3>



<p class="wp-block-paragraph"><strong>Vojin</strong>: Basically everything broke. We had mostly infrastructure issues, all kinds of GitHub failures. When you build a system at this scale, you need to assume that everything will fail and needs to recover. We broke GitHub rate limits. My machine was broken because it was running so many things. The key takeaway is that <strong>you need to build a system in a way that you can always continue</strong>. When things fail, you always checkpoint and continue from a checkpoint. We do work in sizable chunks, and when something fails, you just restart the chunk.</p>
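
<p class="wp-block-paragraph">A minimal sketch of that checkpoint-and-continue idea, assuming a plain text file that stores the index of the next chunk. CheckpointedRun, the file layout, and the chunk names are hypothetical; the talk did not show how the real pipeline persists its state.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of chunked work with a checkpoint file, so a crashed run resumes
// from the last completed chunk instead of starting over. Names are hypothetical.
final class CheckpointedRun {

    // Reads the index of the next chunk to process (0 if no checkpoint exists yet).
    static int loadCheckpoint(Path file) throws IOException {
        return Files.exists(file) ? Integer.parseInt(Files.readString(file).trim()) : 0;
    }

    // Processes chunks in order and persists progress after each one.
    // Returns the number of chunks handled in this run.
    static int processAll(List<String> chunks, Path checkpoint) throws IOException {
        int done = 0;
        for (int i = loadCheckpoint(checkpoint); i < chunks.size(); i++) {
            System.out.println("processing " + chunks.get(i)); // real work would go here
            Files.writeString(checkpoint, Integer.toString(i + 1));
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        Path checkpoint = Files.createTempFile("run", ".checkpoint");
        Files.writeString(checkpoint, "1"); // pretend chunk 0 finished before a crash
        int handled = processAll(List.of("logback", "h2", "guava"), checkpoint);
        System.out.println("handled " + handled + " chunks after resume");
        Files.delete(checkpoint);
    }
}
```

<p class="wp-block-paragraph">Because progress is written after every chunk, a failure costs at most one chunk of repeated work, which matches the &#8220;just restart the chunk&#8221; approach Vojin describes.</p>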



<h3 class="wp-block-heading"><span id="is-just-asking-the-llm-enough">Is just asking the LLM enough?</span></h3>



<p class="wp-block-paragraph"><strong>Vojin</strong>: If you had asked me four weeks ago, I would say no. Now I would say <strong>you need to know how to ask it</strong>, and it will be enough. I was like, &#8220;GitHub is failing with a 504, abstract away all GitHub calls and retry.&#8221; It did it in two minutes. With today’s models, it’s a matter of minutes, not hours.</p>
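
<p class="wp-block-paragraph">What &#8220;abstract away all calls and retry&#8221; might look like in its simplest form is sketched below. This is not the code the model produced for Vojin: RetryingCall is a hypothetical wrapper, the status-code supplier stands in for a real HTTP request, and the backoff values are arbitrary.</p>

```java
import java.util.function.IntSupplier;

// Sketch of funnelling remote calls through one retrying wrapper.
// The IntSupplier stands in for a real HTTP call and returns a status code;
// nothing here uses an actual GitHub client.
final class RetryingCall {

    // Invokes the call until it returns something other than 504 or attempts run out.
    // Returns the last status code observed.
    static int withRetry(IntSupplier call, int maxAttempts, long baseDelayMillis)
            throws InterruptedException {
        int status = call.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == 504; attempt++) {
            Thread.sleep(baseDelayMillis << (attempt - 1)); // exponential backoff
            status = call.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] responses = {504, 504, 200}; // simulated gateway timeouts, then success
        int[] calls = {0};
        int status = withRetry(() -> responses[calls[0]++], 5, 10);
        System.out.println("status " + status + " after " + calls[0] + " calls");
    }
}
```

<p class="wp-block-paragraph">Routing every remote call through one wrapper like this means the retry policy lives in a single place, so changing it does not require touching each call site.</p>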



<h3 class="wp-block-heading"><span id="what-did-you-learn-about-the-trade-off-between-cost-speed-and-coverage">What did you learn about the trade-off between cost, speed, and coverage?</span></h3>



<p class="wp-block-paragraph"><strong>Vojin</strong>: I haven’t seen a situation when doing something with an LLM is more expensive than doing that by a human typing on the keyboard. Build a system that uses the most efficient LLM for the job &#8211; you’re going to get far and not cost much money at all.</p>



<h3 class="wp-block-heading"><span id="when-does-using-multiple-agents-make-sense">When does using multiple agents make sense?</span></h3>



<p class="wp-block-paragraph"><strong>Vojin</strong>: Where I use it is for <strong>decisions and research</strong>. I use Claude Opus 4.7, Gemini 3.1, and GPT 5.5. I ask them all, let them discuss, and I discuss together with them. Each brings something to the table. Before, it was always Claude who was the smartest. Now GPT 5.5 is second and close to the first. Things are changing. The most important bit is getting the system designed right. Once you do that, coding, I don’t care who does it.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How Developers Should Build AI Tools &#8211; So The EU Doesn’t Lose IT</title>
		<link>https://shiftmag.dev/how-developers-should-build-ai-tools-so-the-eu-doesnt-lose-it-9482/</link>
		
		<dc:creator><![CDATA[Marin Pavelić]]></dc:creator>
		<pubDate>Fri, 15 May 2026 13:20:37 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Ervin Jagatić]]></category>
		<category><![CDATA[infobip]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9482</guid>

					<description><![CDATA[<p>What happens when regulators ask an AI company to explain exactly how its system made a decision? </p>
<p>The post <a href="https://shiftmag.dev/how-developers-should-build-ai-tools-so-the-eu-doesnt-lose-it-9482/">How Developers Should Build AI Tools &#8211; So The EU Doesn’t Lose IT</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/eu-ai-act-compliance-1200x630-1.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/eu-ai-act-compliance-1200x630-1.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/eu-ai-act-compliance-1200x630-1-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/eu-ai-act-compliance-1200x630-1-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/eu-ai-act-compliance-1200x630-1-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">The August 2026 deadline for the <a href="https://artificialintelligenceact.eu/" target="_blank" rel="noreferrer noopener">EU AI Act</a> is getting close, and companies and developers building AI products are starting to feel it.</p>



<p class="wp-block-paragraph">High-risk AI systems need to be compliant by then, and the ones doing it well aren&#8217;t treating it as a last-minute legal scramble. They&#8217;re <strong>building compliance in from the start</strong>. </p>



<p class="wp-block-paragraph">We sat down with <strong>Ervin Jagatic</strong> (AI Business Unit Director, Infobip) to talk about what that actually looks like at Infobip, and why compliance-by-design is turning into something engineers think about, not just lawyers.</p>



<h2 class="wp-block-heading"><span id="compliance-starts-in-the-design-phase">Compliance starts in the design phase</span></h2>



<p class="wp-block-paragraph">AI Act compliance doesn&#8217;t start at deployment. Ervin is clear on this: <strong>it has to enter during system architecture, before a single line of agent code is written</strong>:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Compliance enters during the design phase &#8211; system architecture, data flow planning. Every layer of our AI Agents product, from planning to memory to tool execution, needs to be designed with traceability and human oversight in mind. We can&#8217;t bolt that on after the orchestrator is already coordinating multiple sub-agents autonomously.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="the-ai-act-is-changing-product-development-in-3-ways">The AI Act is changing product development in 3 ways</span></h2>



<p class="wp-block-paragraph">That shift has already changed how Infobip&#8217;s teams design and ship AI-powered features. Ervin points to three major changes that came directly from the AI Act.</p>



<h3 class="wp-block-heading"><span id="1-transparency-and-auditability">1. Transparency and auditability</span></h3>



<p class="wp-block-paragraph">Transparency is the first. Infobip&#8217;s <strong>AI Agents documentation is explicit</strong>: &#8220;you cannot script exact responses&#8221; &#8211; agents &#8220;generate responses dynamically.&#8221; </p>



<p class="wp-block-paragraph">That unpredictability is exactly why the company expanded its logging and analytics infrastructure, Ervin explains:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The AI Act&#8217;s transparency obligations pushed us to build comprehensive logging into our Insights and Analytics layer. Every agent execution now produces detailed logs &#8211; requests, responses, processing steps. That&#8217;s not just good engineering, it&#8217;s a direct response to auditability requirements.</p>
</blockquote>



<h3 class="wp-block-heading"><span id="2-explicit-guardrails-instead-of-assumptions">2. Explicit guardrails instead of assumptions</span></h3>



<p class="wp-block-paragraph">The second shift relates to behavioral boundaries and guardrails. Infobip now <strong>requires customers to define capability boundaries, mandatory restrictions, and compliance rules directly inside every agent’s system prompt</strong>, Ervin points out:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Our own documentation warns that if you do not explicitly define these constraints, the agent makes assumptions. That design philosophy, forcing explicit guardrails rather than relying on implicit model behavior, comes directly from the Act’s emphasis on risk mitigation by design.</p>
</blockquote>



<h3 class="wp-block-heading"><span id="3-human-oversight-is-a-part-of-the-architecture">3. Human oversight is a part of the architecture</span></h3>



<p class="wp-block-paragraph">The third shift is human oversight &#8211; not as an external policy layer, but <strong>built directly into the product architecture</strong>. Ervin explains:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><a href="https://www.infobip.com/agentos" target="_blank" rel="noreferrer noopener">AgentOS</a> uses a human-in-the-loop model where complex issues are escalated from AI agents to human agents. We are talking about a core architectural decision that applies human oversight requirements while also improving the product.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="why-compliance-by-design-is-becoming-the-standard">Why compliance-by-design is becoming the standard</span></h2>



<p class="wp-block-paragraph">Ervin believes compliance-by-design is quickly becoming <strong>the new industry standard</strong>, particularly for teams building enterprise-grade AI systems:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">For developers and ML engineers at Infobip, compliance-by-design means several practical things. It means every AI agent we build has a defined architecture where an orchestrator coordinates sub-agents, each with explicit scope, tools, and behavioral rules.</p>
</blockquote>



<p class="wp-block-paragraph">It also <strong>changes how engineering teams think about data</strong>. &#8220;It means our engineers think about data lineage and provenance from the moment they design a training pipeline, not because someone from legal asked them to, but because the architecture demands it,&#8221; Ervin points out.</p>



<p class="wp-block-paragraph">To support that approach, Infobip <strong>invested heavily in tooling and analytics infrastructure</strong> that now serves both operational and regulatory purposes, Ervin said:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Our Insights and Analytics platform is our compliance infrastructure. When a regulator asks ‘show me how this AI system made this decision,’ we need to answer that question with structured evidence, not anecdotes.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="risk-assessment-depends-on-the-use-case">Risk assessment depends on the use case</span></h2>



<p class="wp-block-paragraph">Internally, the company approaches risk assessment through a framework closely aligned with the <strong>AI Act’s four-tier classification model</strong>: unacceptable, high, limited, and minimal risk. However, Ervin notes that Infobip applies this framework at the feature level rather than only at the system level:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">This is important because a platform like Infobip’s serves vastly different use cases. An AI gamification tool for lead generation on WhatsApp is a fundamentally different risk profile than an AI agent that handles authentication.</p>
</blockquote>



<p class="wp-block-paragraph">The company <strong>evaluates risk based on several factors</strong>, including the sensitivity of the data involved, the autonomy of the AI component, and the intended use case, Ervin explains:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Our internal process follows a lifecycle approach. During identification, we map known and foreseeable risks, including risks from reasonably foreseeable misuse. During estimation, we assess probability and severity. During mitigation, we implement design controls, testing procedures, and human oversight.</p>
</blockquote>



<p class="wp-block-paragraph"><strong>Monitoring continues after deployment</strong> through analytics infrastructure designed for drift detection, incident investigation, and performance tracking. For enterprise customers, risk assessment also becomes a collaborative process between Infobip and client compliance teams.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">A bank using our AI agents to automate customer support has different risk considerations than a retail brand using the same technology for product recommendations. The platform is the same; the risk profile is not.</p>
</blockquote>



<h2 class="wp-block-heading">August 2026 is approaching&#8230;</h2>



<p class="wp-block-paragraph">As August 2026 closes in, Ervin says the conversation has shifted:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The question is no longer whether to integrate compliance into product development. The question is whether you&#8217;ve built the infrastructure to do it at speed.</p>
</blockquote>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How Agentic AI Foundation and MCP Are Redefining the Infrastructure for AI Agents</title>
		<link>https://shiftmag.dev/how-agentic-ai-foundation-and-mcp-are-redefining-the-infrastructure-for-ai-agents-9663/</link>
		
		<dc:creator><![CDATA[Anastasija Uspenski]]></dc:creator>
		<pubDate>Wed, 13 May 2026 14:01:47 +0000</pubDate>
				<category><![CDATA[API]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[AAIF]]></category>
		<category><![CDATA[agentic AI]]></category>
		<category><![CDATA[Agentic AI Foundation]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[infobip]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[MCP]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9663</guid>

					<description><![CDATA[<p>I spoke with developers from a gold member company who explained why AAIF membership is crucial for navigating the shift toward agentic AI.</p>
<p>The post <a href="https://shiftmag.dev/how-agentic-ai-foundation-and-mcp-are-redefining-the-infrastructure-for-ai-agents-9663/">How Agentic AI Foundation and MCP Are Redefining the Infrastructure for AI Agents</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">As an ICT journalist, I see AI as a force that keeps surpassing its own limits. Engineers refine it every day, while millions of users around the world feed it with real experiences, making it increasingly capable. Only a few years after the global expansion of chatbots, we are now witnessing another major transformation: AI no longer just provides answers; it performs tasks.</p>



<p class="wp-block-paragraph">This shift feels both fascinating and unsettling. Because of that, the need for <strong>clear rules and a neutral authority</strong> has never been greater. Such a framework must ensure balance so that AI develops in a way that remains fair, transparent, and aligned with human needs. </p>



<p class="wp-block-paragraph">That need led to the creation of the <a href="https://aaif.io/" type="link" id="https://aaif.io/" target="_blank" rel="noreferrer noopener">Agentic AI Foundation (AAIF) within the Linux Foundation in December 2025</a>.</p>



<h2 class="wp-block-heading">AAIF builds open, neutral foundations for agentic AI through collaboration &#8211; not control</h2>



<p class="wp-block-paragraph">AAIF’s mission focuses on <strong>neutral governance, open standards, and a collaborative ecosystem</strong>. The goal is to prevent a small number of proprietary companies and platforms from dominating AI.</p>



<p class="wp-block-paragraph">In this context, the Linux Foundation provides reliable infrastructure, much like Linux does for operating systems or Kubernetes does for cloud environments. It ensures that these technologies remain open, secure, and interoperable.</p>



<p class="wp-block-paragraph">The Agentic AI Foundation hosts the <a href="https://shiftmag.dev/tag/mcp/" type="link" id="https://shiftmag.dev/tag/mcp/" target="_blank" rel="noreferrer noopener">Model Context Protocol (MCP)</a>, the emerging open standard that defines how AI agents communicate with external platforms, tools, and services. Companies that collaborate within AAIF will help determine which platforms will shape the infrastructure of the agentic AI era. <strong>Mazin Gilbert</strong>, Executive Director of the Agentic AI Foundation, stated:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The Agentic AI Foundation (AAIF) is the connective tissue, the plumbing behind how agentic systems operate. <strong>No one company can define or own these standards</strong>. We’ve seen this in cloud native with CNCF and in networking with the LFN. At every inflection point, the world moves from experimentation to production, and that shift needs open standards and community collaboration. With 170+ companies already in AAIF, we’re clearly at that inflection point in Agentic AI today.</p>



</blockquote>



<h2 class="wp-block-heading"><span id="what-changes-in-the-infrastructure-when-moving-from-apis-to-ai-agent-based-systems">What changes in the infrastructure when moving from APIs to AI agent-based systems</span></h2>



<p class="wp-block-paragraph">To better understand AAIF’s mission firsthand, I interviewed my colleagues, two developers from <a href="https://www.infobip.com/" type="link" id="https://www.infobip.com/" target="_blank" rel="noreferrer noopener">Infobip</a>, a gold member of the foundation and the publisher of ShiftMag!</p>



<p class="wp-block-paragraph"><strong>Josip Antoliš</strong> and <strong>Filip Srnec</strong> described what the agentic AI transformation looks like from a developer’s perspective, what changes it brings, which challenges arise, and what AAIF membership enables when it comes to participating in a global AI community.</p>



<p class="wp-block-paragraph">We began by discussing what changes at the infrastructure level when moving from traditional APIs to AI agent-based systems. Josip Antoliš explained that <strong>MCP lets developers assign tasks to AI agents</strong> and ensures agents execute them in a standardized way. In practice, service providers who built products through HTTP APIs should now consider exposing the same functionalities through MCP.</p>



<p class="wp-block-paragraph">In some cases, APIs can be converted automatically into MCP servers.</p>



<p class="wp-block-paragraph">As an example, he noted that <a href="https://github.com/infobip/infobip-openapi-mcp/" type="link" id="https://github.com/infobip/infobip-openapi-mcp/" target="_blank" rel="noreferrer noopener">Infobip has open-sourced its own framework for exposing any HTTP API as MCP</a>. He described this as only the first step. He explained that protocols like MCP let different agent systems connect, allowing one AI agent to delegate subtasks to another in a different environment through an MCP call. This makes it easier to build independent agents that collaborate, turning API providers into agent providers.</p>



<p class="wp-block-paragraph">He also noted that <strong>AI agents become more valuable with every new tool they connect to</strong>, creating a positive feedback loop similar to network effects:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">For example, an AI agent connected to an MCP server that tracks the stock market can analyze trends and suggest actions. If connected to a messaging provider like Infobip, it can send proactive SMS alerts when opportunities appear. Adding a trading tool then allows users to reply and instruct the agent to execute trades. Each new tool increases the value of all previous tools.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="api-providers-are-becoming-agent-providers">API providers are becoming agent providers</span></h2>



<p class="wp-block-paragraph">Filip Srnec expanded on this perspective by pointing out that Infobip’s mission to reach users wherever they are, through any available channel, naturally aligns with the agentic world. Their communication capabilities allow agents to interact through channels that users already know:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">As we like to say, by using Infobip, AI agents gain communication superpowers. This applies across industries: agents that manage flight bookings and reminders, agents that run e-commerce processes, or marketing agents that create meaningful campaigns targeted at the right user segments.</p>
</blockquote>



<p class="wp-block-paragraph">He highlighted that <strong>Infobip has developed a range of products in the agent space</strong>, such as <a href="https://www.infobip.com/agentos" target="_blank" rel="noreferrer noopener">AgentOS</a>, along with tools for connecting agents, including MCP servers. These solutions bridge the gap and enable agent-driven communication experiences:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">From setting up communication through channel activation, sending messages, and feeding responses back to agents, Infobip covers the entire process. In addition, our platform offers advanced message optimization, fraud detection, and communication flow design.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="challenges-in-adopting-mcp">Challenges in adopting MCP</span></h2>



<p class="wp-block-paragraph">Early-stage ecosystems often lack structure, and MCP is no exception. I asked my interviewees to identify the <a href="https://shiftmag.dev/mcp-co-creator-explains-why-mcp-needs-more-than-the-protocol-to-scale-9041/" target="_blank" rel="noreferrer noopener">biggest gaps and limitations</a> they encounter when building production-ready agent systems. Filip acknowledged that <strong>the ecosystem still feels unstructured</strong>, especially when it comes to adopting new standards and terminology:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">I work in the MCP value stream, and we experience this firsthand. The biggest issue is that third-party client software, such as MCP clients, varies in maturity. Because of that, we cannot assume that everything behaves exactly according to the specification.</p>
</blockquote>



<p class="wp-block-paragraph">He added that <strong>specifications and terminology evolve quickly in this emerging space</strong>. These changes sometimes break existing behavior, both intentionally and unintentionally. Teams must remain agile and constantly balance product delivery with compatibility.</p>



<p class="wp-block-paragraph">Josip pointed to another challenge. Anthropic originally developed MCP with a focus on coding use cases, particularly for its Claude Code assistant. Some assumptions from that use case remain embedded in the protocol: </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">For example, one of the two available deployment options requires the MCP server to run on the same machine as the AI agent. That works for servers that manipulate or compile local source files, but it becomes impractical when exposing functionality over the internet.</p>
</blockquote>



<p class="wp-block-paragraph">MCP does support remote servers, which enables broader use cases. Even so, <strong>authentication and authorization still require significant effort</strong>:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">MCP adopted the OAuth specification. While this supports adoption, MCP relies on relatively niche parts of OAuth, which makes full compatibility harder to achieve.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="how-aaif-helps-address-these-challenges">How AAIF helps address these challenges</span></h2>



<p class="wp-block-paragraph">Since governance of the MCP specification moved to the AAIF, development and priorities have become more open and better aligned with the broader ecosystem, as Josip observed. <a href="https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/" type="link" id="https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/" target="_blank" rel="noreferrer noopener">The 2026 roadmap</a> highlights key improvements such as scalable remote deployment, support for long-running tasks, and stronger enterprise readiness, including observability and integration with existing authentication systems.</p>



<p class="wp-block-paragraph">These changes should make MCP servers easier to maintain and <strong>open the door to more complex use cases and new markets</strong>. Josip drew attention to the choice of Streamable HTTP as a transport protocol, which remains somewhat controversial: </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Although it limits horizontal scaling, keeping it at this stage helps prevent fragmentation of the ecosystem. Planned improvements in this area will be especially important for DevOps and production environments.</p>
</blockquote>



<p class="wp-block-paragraph">He underlined the importance of <strong>support for long-running tasks</strong>. These tasks allow agents to manage processes that run for hours, opening entirely new categories of use cases. Improvements in enterprise integrations, especially single sign-on, will prove critical for broader adoption, since current complexity creates real barriers in production environments.</p>



<h2 class="wp-block-heading"><span id="what-does-it-mean-to-be-aaif-member">What does it mean to be an AAIF member?</span></h2>



<p class="wp-block-paragraph">When discussing Infobip’s role as a Gold Member of the Agentic AI Foundation, I wanted to understand how this membership influences internal technical decision-making compared to simply adopting external standards.</p>



<p class="wp-block-paragraph">Josip noted that the AI ecosystem evolves rapidly, and new standards seem to appear constantly. However, <strong>standards only create value when people adopt them</strong>. By participating in AAIF working groups, his team gains insight into the direction of key industry players:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">We contribute by sharing our use cases and drawing attention to the challenges we encounter in our own implementations.</p>
</blockquote>



<p class="wp-block-paragraph">This involvement allows them to <strong>align new features and even entire products</strong> with the direction in which technology is moving:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Choosing the wrong technological direction can become expensive and create significant technical debt. By participating in AAIF activities, we ensure that we move in the right direction instead of following ideas that lead nowhere.</p>
</blockquote>



<p class="wp-block-paragraph">Josip stressed the importance of bringing real-world use cases into AAIF’s technical discussions from the very beginning. Standards that fail to address real user needs rarely succeed. Early input helps embed key concepts from the start instead of adding them later.</p>



<p class="wp-block-paragraph">Filip described AAIF membership as a <strong>source of confidence and stability</strong> in the emerging agentic AI landscape. Open standards like MCP ensure that development does not rely solely on commercial interests. The community develops, maintains, and governs the technology together:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">From the perspective of a developer building agent-based applications today, open standards provide strong foundations, best practices, and proven design patterns. This ensures that solutions remain robust and independent of any single vendor.</p>
</blockquote>



<p class="wp-block-paragraph">He pointed out that MCP acts as a universal connector for external tools and data sources. Building on open technologies allows individual engineers to become part of a global community and even influence the future direction of technology. </p>



<p class="wp-block-paragraph">Filip concluded by noting that global collaboration remains essential at this stage, <strong>especially when it comes to reliability and security</strong>. The era of agentic AI has already begun. Many agents already operate in production. Now is the time to build a stable ecosystem that allows everyone to develop and use this technology safely.</p>


<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/naslovna.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/naslovna.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/naslovna-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/naslovna-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/naslovna-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure><p>The post <a href="https://shiftmag.dev/how-agentic-ai-foundation-and-mcp-are-redefining-the-infrastructure-for-ai-agents-9663/">How Agentic AI Foundation and MCP Are Redefining the Infrastructure for AI Agents</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Claude Mythos Opens The Cybersecurity Pandora&#8217;s box</title>
		<link>https://shiftmag.dev/claude-mythos-opens-the-cybersecurity-pandoras-box-9622/</link>
		
		<dc:creator><![CDATA[Senko Rasic]]></dc:creator>
		<pubDate>Mon, 11 May 2026 13:39:21 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[security]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9622</guid>

					<description><![CDATA[<p>What would you do if you had an AI model so powerful that it can hack into multiple major operating systems and browsers?</p>
<p>The post <a href="https://shiftmag.dev/claude-mythos-opens-the-cybersecurity-pandoras-box-9622/">Claude Mythos Opens The Cybersecurity Pandora&#8217;s box</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/claude-mythos.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/claude-mythos.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/claude-mythos-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/claude-mythos-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/claude-mythos-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">This is exactly what Anthropic claimed to have achieved with <a href="https://red.anthropic.com/2026/mythos-preview/" target="_blank" rel="noreferrer noopener">Claude Mythos</a>, its newest and most powerful model which, according to Anthropic, is <strong>too powerful to be released to the public</strong>.</p>



<p class="wp-block-paragraph">In its announcement, Anthropic said its new model identified security problems in several operating systems (Linux, OpenBSD, FreeBSD), browsers (Firefox), and widely used software libraries (FFmpeg).</p>



<p class="wp-block-paragraph">Making such a powerful tool available to anyone (including bad actors) would be irresponsible, so Anthropic only <strong>gave access to a small group of &#8220;launch partners&#8221;</strong> (among them AWS, Apple, Google, Microsoft, and the Linux Foundation) under <a href="https://www.anthropic.com/glasswing" target="_blank" rel="noreferrer noopener">Project Glasswing</a>. The idea is to give important organizations and open source projects advance warning and tools to find more security problems, while Anthropic decides what to do with the wider release of Mythos.</p>



<h2 class="wp-block-heading"><span id="the-fine-art-of-doom-marketing">The fine art of Doom Marketing</span></h2>



<p class="wp-block-paragraph">Of course, the idea is also to hype up the capabilities of the new model.<br><br>OpenAI already played the &#8220;Our new AI is so powerful, we can&#8217;t give it to you&#8221; card with <a href="https://openai.com/index/better-language-models/" target="_blank" rel="noreferrer noopener">GPT-2</a>, a model that today <a href="https://x.com/karpathy/status/2017703360393318587" target="_blank" rel="noreferrer noopener">anyone can train for under $100</a>.</p>



<p class="wp-block-paragraph">The tactic still works: <a href="https://www.bbc.com/news/articles/crk1py1jgzko" target="_blank" rel="noreferrer noopener">the media</a> (<a href="https://www.nytimes.com/2026/04/07/technology/anthropic-claims-its-new-ai-model-mythos-is-a-cybersecurity-reckoning.html" target="_blank" rel="noreferrer noopener">another example</a>) and the wider <a href="https://www.youtube.com/watch?v=SQhfkWdxVvE" target="_blank" rel="noreferrer noopener">public</a> have bought Anthropic&#8217;s doom marketing wholesale. Fear sells, and an AI that can hack anyone is as bad as it gets (or as good as it gets, if you&#8217;re in marketing).</p>



<h2 class="wp-block-heading">Where there&#8217;s smoke&#8230;</h2>



<p class="wp-block-paragraph">Just because it&#8217;s marketing doesn&#8217;t mean it&#8217;s not true.</p>



<p class="wp-block-paragraph">For a while now, many security researchers <a href="https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/" target="_blank" rel="noreferrer noopener">have been increasingly impressed with AI cybersecurity capabilities</a>.</p>



<p class="wp-block-paragraph">In their testing of Mythos, the AI Security Institute (part of the UK government) &#8220;<a href="https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities" target="_blank" rel="noreferrer noopener">found significant improvement on cyber-attack simulations</a>&#8221;.<br><br>Open source developers have seen an increasing number of security reports, too: Linux kernel developers (participants in Project Glasswing) said &#8220;<a href="https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_kernel/" target="_blank" rel="noreferrer noopener">All open source projects have real reports that are made with AI, but they&#8217;re good, and they&#8217;re real</a>&#8221;. In a similar vein, the developer of the popular open source utility &#8220;curl&#8221;, who was very vocal about bad AI bug reports in the past, recently <a href="https://etn.se/index.php/72494" target="_blank" rel="noreferrer noopener">used AI to find 50 real bugs in the project</a>.<br><br>Even the NSA, the feared U.S. cybersecurity agency, is reportedly <a href="https://www.axios.com/2026/04/19/nsa-anthropic-mythos-pentagon" target="_blank" rel="noreferrer noopener">using Mythos</a> despite Anthropic being banned from U.S. government use just weeks before.</p>



<h2 class="wp-block-heading"><span id="the-scariest-ai-of-them-all">The scariest AI of them all?</span></h2>



<p class="wp-block-paragraph">Based on all the reports, there seems to be some substance to Anthropic&#8217;s doom marketing. But let&#8217;s stop panicking, breathe for a bit, and try to rationally unpack what might be happening.<br><br>The new model is certainly very capable, but it&#8217;s not obvious that it&#8217;s miles ahead of what&#8217;s already there. In fact, the researchers at Aisle <a href="https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier" target="_blank" rel="noreferrer noopener">tasked small local models with finding the same bugs</a> with (limited) success, concluding that <strong>the most important part is the approach, not model capability</strong>.<br><br>Basically, you can ask the model to carefully review every single part of the codebase and find security bugs. The AI never gets tired of the tedious grind and is happy to spend a lot of time and burn a lot of tokens (and money) in the effort. And if there is something suspicious, there&#8217;s a high likelihood it&#8217;ll find it.<br><br>The researchers point out that more capable models will do better, but <strong>you don&#8217;t need an out-of-this-world capability to achieve these impressive results</strong>.<br><br>So, on one hand, we don&#8217;t need to be scared of Mythos. It&#8217;s likely an incremental improvement over previous models. On the other hand, this means <em>everyone can already do this</em>, and probably already is.<br><br><em>Now</em>, you can panic.</p>



<h2 class="wp-block-heading"><span id="gpt-enters-the-chat">GPT enters the Chat</span></h2>



<p class="wp-block-paragraph">As further proof, just a week after the Mythos announcement, OpenAI released <a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/" target="_blank" rel="noreferrer noopener">GPT-5.4-Cyber</a>, a dedicated AI model for cyber defense.</p>



<p class="wp-block-paragraph">Available only to &#8220;<strong>verified individual defenders</strong> and <strong>teams responsible for defending critical software</strong>&#8221;, the new model shows that no great leap forward is required for such a tool.<br><br>In fact, both OpenAI and Anthropic have since released newer versions of their flagship models, GPT-5.5 and Claude Opus 4.7, respectively.</p>



<p class="wp-block-paragraph">The AI Security Institute tested GPT-5.5 as well, and noted that &#8220;<a href="https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities" target="_blank" rel="noreferrer noopener">GPT-5.5 shows that rapid improvement on cyber tasks may be part of a more general trend</a>&#8221;.</p>



<p class="wp-block-paragraph">These models have been trained to <strong>refuse cybersecurity-related requests</strong> (unless you&#8217;re in the program), but the Chinese models are just a few months behind in general coding capabilities, and have no such guardrails.</p>



<h2 class="wp-block-heading"><span id="where-do-we-go-now">Where do we go now?</span></h2>



<p class="wp-block-paragraph">To quote one of the security researchers, &#8220;<strong>vulnerability research is cooked</strong>&#8221;. There&#8217;s no going back; motivated actors can already do a lot with the current AI tools, and we&#8217;ll only get increasingly powerful ones in the future.<br><br>In the short run, this can look pretty bad: expect more exploits, hacks and bugs across all kinds of software, from critical infrastructure to supply chain attacks against popular software libraries.<br><br>In the long run, however, I believe this is a good thing: motivated attackers with a lot of money already have stashes of 0-days (unpublicized vulnerabilities). Now, <strong>more people will be able to use AI to find these problems in their own code and patch them</strong>, leading to more secure software overall.<br><br>This is why Anthropic&#8217;s Glasswing and OpenAI&#8217;s &#8220;Trusted Access for Cyber&#8221; programs are a <strong>good first step</strong>, even though they&#8217;re available only to select participants. In the future, using open-weights models in a similar manner will bring these capabilities to everyone, cheaply.<br><br>Buckle up, it&#8217;s gonna be a bumpy ride.</p>
<p>The post <a href="https://shiftmag.dev/claude-mythos-opens-the-cybersecurity-pandoras-box-9622/">Claude Mythos Opens The Cybersecurity Pandora&#8217;s box</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>I Tried to Get OpenClaw to Betray Me. The Model Caught Me on the First Try</title>
		<link>https://shiftmag.dev/openclaw-experiment-security-9304/</link>
		
		<dc:creator><![CDATA[Ivan Mihić]]></dc:creator>
		<pubDate>Wed, 06 May 2026 14:20:13 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[OpenClaw]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9304</guid>

					<description><![CDATA[<p>I spent a rainy weekend trying to trick OpenClaw into leaking my personal email, but the model caught me almost immediately. That’s the problem, not the solution.</p>
<p>The post <a href="https://shiftmag.dev/openclaw-experiment-security-9304/">I Tried to Get OpenClaw to Betray Me. The Model Caught Me on the First Try</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/05/open-claw-betrayal.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/open-claw-betrayal.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/05/open-claw-betrayal-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/open-claw-betrayal-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/open-claw-betrayal-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">I&#8217;m a software engineer who works on domains that represent the messy corner of the internet. </p>



<p class="wp-block-paragraph">In this corner, <strong>there are bad actors doing bad stuff and us trying to make their lives harder</strong>. Hence, I spend a lot of time looking at what people do when they&#8217;re trying to slip something past a system. This has led me to develop a slight paranoia about anything that reads untrusted input and then does something with it.</p>



<p class="wp-block-paragraph">So when half my LinkedIn timeline started <strong>losing its mind over OpenClaw</strong>, I developed a specific kind of curiosity:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">What happens when this thing reads an email that&#8217;s actively trying to manipulate it?</p>
</blockquote>



<p class="wp-block-paragraph">So I tried… and the model caught me on the first try.</p>



<p class="wp-block-paragraph">That&#8217;s the disappointing part. The interesting part is what happened when I tried harder &#8211; and what I realized about where the defense actually lives.</p>



<h2 class="wp-block-heading">The hype isn&#8217;t manufactured, which is the whole point</h2>



<p class="wp-block-paragraph">But first, let me be honest about why this thing went viral. <strong>OpenClaw is genuinely impressive.</strong></p>



<p class="wp-block-paragraph">The first time I asked it to triage my inbox in detail and it actually did, I had the same reaction every other dev on X or LinkedIn has been having: <em>oh now we are talking. This is the thing</em>!</p>



<p class="wp-block-paragraph">That reaction is part of what makes this complicated. Because the same architecture choices that make OpenClaw feel magical are the ones that create some genuinely <strong>hard security questions</strong>. The type of questions the broader industry hasn&#8217;t figured out how to properly answer yet.</p>



<h2 class="wp-block-heading"><span id="15-minutes-from-npm-install-to-ai-reading-your-gmail">15 minutes from <code>npm install</code> to AI reading your Gmail</span></h2>



<p class="wp-block-paragraph">Fifteen minutes. That&#8217;s how long it takes from <code>npm install</code> to having an LLM agent reading your inbox. The installer warns you <strong>this is a hobby project and still in beta</strong> &#8211; which, with 360k GitHub stars and 1,500+ contributors, reads more like a legal disclaimer than a self-description. The warning is the project being honest: security isn&#8217;t the primary concern here.</p>



<p class="wp-block-paragraph">The onboarding wizard asks which channels you want, which model provider to route through, and walks you through the gateway setup. Gmail takes a little more work. OpenClaw doesn&#8217;t ship a &#8220;Connect Google&#8221; button because Google&#8217;s OAuth verification for production Gmail apps is strict, so <strong>every developer rolls their own Google Cloud project</strong>. The flow:</p>



<pre class="wp-block-code"><code># 1. Create a Google Cloud project, enable Gmail API, download credentials JSON
# (console.cloud.google.com → New Project → APIs &amp; Services → Library)

# 2. Install gog — OpenClaw's OAuth bridge for Google Workspace
brew install gog

# 3. Authenticate
gog auth --credentials ~/Downloads/client_secret_xxx.json
gog auth add me@example.com --services gmail,calendar,drive,contacts
</code></pre>



<p class="wp-block-paragraph"><code>gog auth</code> opens your browser and walks you through Google&#8217;s consent screen with a scary &#8220;this app isn&#8217;t verified&#8221; warning (<em>technically correct &#8211; it isn&#8217;t, you just installed it</em>). You grant the scopes. Done.</p>



<p class="wp-block-paragraph">That&#8217;s what the wizard shows you. Four defaults it doesn&#8217;t show matter more.</p>



<p class="wp-block-paragraph"><strong>Gateway auth is off by default.</strong> The gateway runs on localhost, sure. But the moment you expose it, it&#8217;s wide open. Bitsight found <em>over 30,000 OpenClaw instances</em> exposed directly on the open internet in their February report. If you&#8217;re one of them, anyone who can reach your WebSocket can issue commands as you.</p>
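


<p class="wp-block-paragraph">A minimal sketch of the check this default leaves to the operator, assuming a hypothetical pre-flight script that already knows the gateway&#8217;s bind address and whether auth is enabled. The function names and both inputs are illustrative; they are not OpenClaw&#8217;s API or config keys.</p>



<pre class="wp-block-code"><code>import ipaddress

def is_loopback_only(bind_host):
    """Return True if bind_host can only be reached from this machine."""
    if bind_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat it as exposed.
        return False

def gateway_warnings(bind_host, auth_enabled):
    """Hypothetical pre-flight check; not part of OpenClaw."""
    warnings = []
    if not is_loopback_only(bind_host) and not auth_enabled:
        warnings.append("gateway reachable off-host with auth disabled")
    return warnings
</code></pre>



<p class="wp-block-paragraph">Binding to <code>0.0.0.0</code> with auth disabled produces one warning; binding to <code>127.0.0.1</code> produces none.</p>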



<p class="wp-block-paragraph"><strong>Permissions are off by default.</strong> Out of the box, OpenClaw runs with no filesystem restrictions. A skill can reach anything the OpenClaw process can reach &#8211; <code>~/.ssh</code>, browser credential stores, shell history. You configure restrictions yourself in <code>openclaw.json</code>.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Set <code>chmod 600 openclaw.json</code> to restrict file permissions. And if you&#8217;re testing skills from unknown publishers, run OpenClaw inside a Docker sandbox.</p>
</blockquote>



<p class="wp-block-paragraph">That&#8217;s from the project&#8217;s own docs. Read it again. The maintainers know what happens if you don&#8217;t sandbox the agent.</p>
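


<p class="wp-block-paragraph">A small sketch of verifying that the <code>chmod 600</code> advice was actually applied, using only Python&#8217;s standard library on a POSIX system. The function name is hypothetical, and the path to <code>openclaw.json</code> is whatever your install uses.</p>



<pre class="wp-block-code"><code>import os
import stat

def config_mode_problems(path):
    """Return a list of problems with the permission bits on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    problems = []
    if mode != 0o600:
        # Anything other than owner read/write is reported, not fixed.
        problems.append(f"{path} has mode {mode:o}, expected 600")
    return problems
</code></pre>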



<p class="wp-block-paragraph"><strong>Skills are markdown files.</strong> OpenClaw learns new tools by loading a <code>SKILL.md</code> file: a Markdown document with YAML front matter and a body describing, in English, which CLI commands it can run. The model reads the description, decides when the skill is relevant, and runs the commands the markdown tells it are available. Here&#8217;s a trimmed version of the real <code>gog</code> skill:</p>



<pre class="wp-block-code"><code>---
name: gog
description: Google Workspace CLI for Gmail, Calendar, Drive, Contacts.
metadata:
  requires:
    bins: &#91;gog]
---

# gog
Use `gog` for Gmail/Calendar/Drive/Contacts. Requires OAuth setup.

## Common commands
Gmail search: gog gmail search 'newer_than:7d' --max 10
Gmail send:   gog gmail send --to a@b.com --subject "Hi" --body "Hello"
</code></pre>



<p class="wp-block-paragraph">That markdown file is the entire trust boundary. Malicious instructions in a <code>SKILL.md</code> and legitimate ones look identical to the model, because they <em>are</em> identical. The only thing differentiating the &#8220;read my mail&#8221; prompt from &#8220;send mail to a stranger&#8221; is the model&#8217;s judgement about it.</p>
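


<p class="wp-block-paragraph">To make the boundary concrete, here is a rough sketch of reading a skill file like the one above without a YAML library. It assumes the front matter sits between two <code>---</code> lines and that <code>bins</code> uses the inline list form from the trimmed example; both helper names are hypothetical and not taken from OpenClaw.</p>



<pre class="wp-block-code"><code>def split_front_matter(text):
    """Split a SKILL.md-style document into (front_matter, body) strings."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body

def declared_bins(front_matter):
    """Read an inline `bins: [a, b]` entry; a sketch, not a YAML parser."""
    for line in front_matter.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "bins":
            names = value.strip().strip("[]").split(",")
            return [name.strip() for name in names if name.strip()]
    return []
</code></pre>



<p class="wp-block-paragraph">The declared binaries are the only part that can be checked mechanically. The body is free-form English, and that is the part the model follows.</p>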



<p class="wp-block-paragraph"><strong>OAuth scopes are all-or-nothing.</strong> The three scopes <code>gog</code> asks for &#8211; <code>gmail.readonly</code>, <code>gmail.send</code>, <code>gmail.modify</code> &#8211; apply to every email in your account, ever. There is no &#8220;only this label&#8221; or &#8220;only this sender&#8221; variant. That&#8217;s a Google API design decision, not OpenClaw&#8217;s fault, but you inherit it the moment you wire them together.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="704" src="https://shiftmag.dev/wp-content/uploads/2026/05/openclaw-graphic-1-1024x704.png?x94846" alt="" class="wp-image-9566" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/openclaw-graphic-1-1024x704.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/openclaw-graphic-1-300x206.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/openclaw-graphic-1-768x528.png 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading"><span id="the-test-i-came-here-to-run">The test I came here to run</span></h2>



<p class="wp-block-paragraph">So <strong>I sent myself an email from a burner account</strong>. The visible body was a generic delivery confirmation. At the bottom, using an ancient trick of white text on a white background, I embedded a quiet exfiltration request dressed up as a routine maintenance message. These instructions told the agent to forward emails containing password-manager keywords to an address I controlled.</p>
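


<p class="wp-block-paragraph">For readers unfamiliar with the trick, a rough sketch of how white-on-white text can be spotted mechanically before a message reaches an agent. It assumes a white background and inline <code>style</code> attributes, recognises only a few colour spellings, and uses a hypothetical class name. It is a heuristic for illustration, not a defence.</p>



<pre class="wp-block-code"><code>from html.parser import HTMLParser

# Assumption: the message background is white, so these colours are invisible.
WHITE_VALUES = {"#fff", "#ffffff", "white"}
# Elements that never get a closing tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class HiddenTextFinder(HTMLParser):
    """Collect text nested inside any element styled with a white colour."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open element: styled white?
        self.hidden = []  # text fragments found under a white element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").lower().replace(" ", "")
        colours = [d.split(":", 1)[1] for d in style.split(";")
                   if d.startswith("color:")]
        self.stack.append(any(c in WHITE_VALUES for c in colours))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if any(self.stack) and data.strip():
            self.hidden.append(data.strip())
</code></pre>



<p class="wp-block-paragraph">Feeding a message body to an instance with <code>feed()</code> leaves any suspicious fragments in its <code>hidden</code> list. Off-white colours, CSS classes, and tiny font sizes all slip past a check this simple.</p>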



<p class="wp-block-paragraph">Then I opened the chat interface and asked the agent a simple question: <em>Are there any emails today?</em></p>



<h2 class="wp-block-heading"><span id="the-model-saw-through-me"><strong>The model saw through me</strong></span></h2>



<p class="wp-block-paragraph">It flagged <strong>the sender as suspicious</strong> &#8211; a personal Gmail issuing a corporate-sounding directive. It called out the hidden text explicitly. It refused to act on the instruction. It categorized the message alongside the day&#8217;s normal mail, presented its reasoning, and asked whether I wanted to flag the suspicious one as spam.</p>



<p class="wp-block-paragraph">I&#8217;ll be honest, I was kind of disappointed. I&#8217;d sat down expecting a war story. Instead, I got a well-aligned frontier model doing exactly what a well-aligned frontier model is supposed to do.</p>



<h2 class="wp-block-heading"><span id="so-i-tried-harder"><strong>So I tried harder</strong></span></h2>



<p class="wp-block-paragraph">I thought about <strong>what had triggered the defense and iterated</strong>.</p>



<p class="wp-block-paragraph">The first attempt hit at least three trained heuristics at once: suspicious-sender detection, hidden-text detection, and a pattern-match against &#8220;silent operation, don&#8217;t tell the user&#8221; phrasing.</p>



<p class="wp-block-paragraph">I removed the tells one at a time. Visible text instead of hidden. Plausible sender framing instead of a personal Gmail. Configuration-style payloads instead of one-shot exfiltration. Setting up an ongoing workflow rather than asking for something bad right now.</p>



<p class="wp-block-paragraph">Against the frontier model I was routing through, every version I tried got caught. Sometimes immediately, sometimes with a clarifying question, but the model never silently complied.</p>



<p class="wp-block-paragraph"><strong>Against lighter models, that&#8217;s not what happened.</strong></p>



<p class="wp-block-paragraph">Same architecture. Same skill. Same agent. Cheaper model. And the defenses that were reliable at the top of the hierarchy became probabilistic as I moved down. I&#8217;m not going to publish specific payloads. Not because the finding is novel (Cisco, CrowdStrike, and Barracuda have all been saying this for months) but because the payload is not the interesting finding here.</p>



<p class="wp-block-paragraph">The gradient is.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="768" height="768" src="https://shiftmag.dev/wp-content/uploads/2026/05/Anakin-Padme-4-Panel-1.png?x94846" alt="" class="wp-image-9568" style="width:836px;height:auto" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/Anakin-Padme-4-Panel-1.png 768w, https://shiftmag.dev/wp-content/uploads/2026/05/Anakin-Padme-4-Panel-1-300x300.png 300w, https://shiftmag.dev/wp-content/uploads/2026/05/Anakin-Padme-4-Panel-1-150x150.png 150w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure>



<h2 class="wp-block-heading"><strong>The defense isn&#8217;t where you think it is</strong></h2>



<p class="wp-block-paragraph">Here&#8217;s the thing the defensive and offensive communities both already know, and that almost nobody installing OpenClaw on a Friday night has internalized.</p>



<p class="wp-block-paragraph">The security of these agent systems <strong>lives at the model layer, not at the architecture layer.</strong></p>



<p class="wp-block-paragraph">OpenClaw doesn&#8217;t defend against the attack. The model does. The skill doesn&#8217;t defend. The tool framework doesn&#8217;t defend. If the model you&#8217;re routing through has been trained to spot the pattern, the attack gets caught. If it hasn&#8217;t &#8211; or if it was trained to spot last month&#8217;s patterns but not this month&#8217;s &#8211; the attack lands.</p>



<p class="wp-block-paragraph">Which means the security posture of your OpenClaw install <strong>depends almost entirely on which model is sitting behind your API key that day</strong>. And most developers running personal agents are doing one or more of the following:</p>



<ul class="wp-block-list">
<li>Routing through whichever model is cheapest this week</li>



<li>Using a fallback chain that drops to lower-tier models under load or rate limits</li>



<li>Not paying attention to which model they&#8217;re on, because the agent <em>works</em> regardless</li>
</ul>



<p class="wp-block-paragraph"><strong>Every one of those is a security decision</strong>. Most developers don&#8217;t realize they&#8217;re making one.</p>
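


<p class="wp-block-paragraph">One way to make the fallback decision an explicit one, sketched under assumptions: <code>pick_model</code> and its arguments are hypothetical and do not correspond to any OpenClaw setting. The idea is only that a tool-enabled agent refuses to run rather than drop to a model nobody chose.</p>



<pre class="wp-block-code"><code>def pick_model(pinned, available, tools_enabled):
    """Return the model to use, or None to refuse the request.

    `pinned` is the model the operator chose on purpose; `available` is
    whatever the provider can serve right now, in fallback order.
    """
    if pinned in available:
        return pinned
    if tools_enabled:
        # Never let a tool-capable agent drop silently to another tier.
        return None
    return available[0] if available else None
</code></pre>



<p class="wp-block-paragraph">Refusing costs availability. Whether that trade is worth it depends on what the agent&#8217;s tools can reach.</p>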



<h2 class="wp-block-heading"><span id="why-this-is-the-failure-mode-that-matters">Why this is the failure mode that matters</span></h2>



<p class="wp-block-paragraph">The architectural problem doesn&#8217;t go away when the frontier model defends perfectly. <strong>Three facts stay true</strong>:</p>



<ol class="wp-block-list">
<li>The agent reads untrusted external content: inboxes, fetched pages, message bodies.</li>



<li>The agent has tools that can act on what it reads: send email, run shell commands, call APIs.</li>



<li>Skills declare capability in plain English, which means that, at the token level, an instruction in a skill and an instruction in an email are the same thing.</li>
</ol>
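


<p class="wp-block-paragraph">The third fact is easy to see in a sketch. The prompt layout below is invented for illustration, and real agent frameworks assemble context differently, but the shape is the point: skill text, fetched email, and the user&#8217;s request all end up in one string.</p>



<pre class="wp-block-code"><code>def build_prompt(skill_text, email_body, user_request):
    """Join every input into the single string the model receives.

    Hypothetical layout; the section headings are plain text, not a
    privilege boundary.
    """
    return "\n\n".join([
        "## Skill",
        skill_text,
        "## Tool output: latest email",
        email_body,
        "## User",
        user_request,
    ])
</code></pre>



<p class="wp-block-paragraph">Nothing in the resulting string makes the email section less authoritative than the skill section, other than headings the model may or may not respect.</p>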



<p class="wp-block-paragraph">The model is what <strong>stands between those three facts and an exploit</strong>. For the frontier model I tested, that was enough. For the lighter ones, less so. And the model is a training artifact. This means the defense you have today is not necessarily the defense you have tomorrow, and the defense at the top of the model stack is not the defense at the bottom.</p>



<p class="wp-block-paragraph"><strong>This isn’t just an OpenClaw bug; it’s a universal one</strong>. It&#8217;s the current shape of personal-agent architecture, and it&#8217;ll probably take several generations of isolation patterns, capability frameworks, and signed skill registries before the industry has an honest answer. </p>
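


<p class="wp-block-paragraph">As a rough illustration of what a capability framework could add, here is a sketch of a deterministic check that runs before a send-mail tool call and does not depend on the model&#8217;s judgement. The <code>Capability</code> type, the <code>gmail.send</code> tool name, and the recipient allowlist are all assumptions made for the example and are not taken from OpenClaw.</p>



<pre class="wp-block-code"><code>from dataclasses import dataclass, field

@dataclass(frozen=True)
class Capability:
    """What one skill may do, decided by the operator rather than the model."""
    tool: str
    allowed_recipients: frozenset = field(default_factory=frozenset)

def authorize_send(cap, tool, recipient):
    """Hypothetical gate that runs before a send-mail tool call."""
    if tool != cap.tool:
        return False
    # Allowlist entries are expected to be stored in lower case.
    return recipient.lower() in cap.allowed_recipients
</code></pre>



<p class="wp-block-paragraph">A check like this gives the same answer whichever model is routing the request, which is exactly the property model judgement lacks.</p>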



<p class="wp-block-paragraph">In the meantime, the defense you get is whatever your provider shipped this quarter… and the defense the developer across the room gets is whatever <em>their</em> provider shipped, and those are not the same thing.</p>



<h2 class="wp-block-heading"><span id="where-this-goes-from-here">Where this goes from here</span></h2>



<p class="wp-block-paragraph">What I came away with is that <strong>OpenClaw is the most honest version we have of where personal agents are going</strong> and it&#8217;s exposing a question the whole industry is going to have to answer:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">When the only thing standing between an untrusted email and a privileged action is the model&#8217;s judgement, and model judgement varies by an order of magnitude across the price curve, what is the security posture of the system?</p>
</blockquote>



<p class="wp-block-paragraph">Right now the honest answer is: whichever model you happened to pick. I believe that shouldn’t be the case.</p>



<p class="wp-block-paragraph">If you want to play with OpenClaw, play with it, but do it in a hardened environment with throwaway credentials. Pin your model explicitly in config, <strong>keep it away from your real inbox</strong> until the safety story catches up to the capability story, and read the hardening docs before you read the tutorials.</p>
<p>The post <a href="https://shiftmag.dev/openclaw-experiment-security-9304/">I Tried to Get OpenClaw to Betray Me. The Model Caught Me on the First Try</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Uber Shares What Happens When 1.500 AI Agents Hit Production</title>
		<link>https://shiftmag.dev/uber-shares-what-happens-when-1-500-ai-agents-hit-production-9430/</link>
		
		<dc:creator><![CDATA[Ivan Pelivanovic]]></dc:creator>
		<pubDate>Mon, 04 May 2026 14:19:55 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[MCP]]></category>
		<category><![CDATA[Uber]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9430</guid>

					<description><![CDATA[<p>Hearing how Uber scaled to 1.500 AI agents made me realize just how quickly things can spiral when those agents start acting faster than humans can keep up.</p>
<p>The post <a href="https://shiftmag.dev/uber-shares-what-happens-when-1-500-ai-agents-hit-production-9430/">Uber Shares What Happens When 1.500 AI Agents Hit Production</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="2100" height="1400" src="https://shiftmag.dev/wp-content/uploads/2026/04/55192368682_89b60f358c_o-scaled.jpg?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/04/55192368682_89b60f358c_o-scaled.jpg 2100w, https://shiftmag.dev/wp-content/uploads/2026/04/55192368682_89b60f358c_o-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/04/55192368682_89b60f358c_o-1024x683.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/04/55192368682_89b60f358c_o-768x512.jpg 768w" sizes="auto, (max-width: 2100px) 100vw, 2100px" /></figure>


<p class="wp-block-paragraph">At the <a href="https://events.linuxfoundation.org/mcp-dev-summit-north-america/" target="_blank" rel="noreferrer noopener">MCP Dev Summit North America</a> earlier this month, I listened to <strong>Meghana Somasundara</strong> (Agentic AI Lead, Uber) and <strong>Rush Tehrani</strong> (Senior Engineering Manager leading the Agentic AI Platform, Uber) talk about what they’re building.</p>



<p class="wp-block-paragraph">By their account, <strong>more than 90% of Uber’s 5,000+ engineers already use AI monthly</strong> for agentic workflows. They also have over <a href="https://shiftmag.dev/how-uber-engineers-use-ai-agents-8617/" target="_blank" rel="noreferrer noopener">1,500 monthly active agents internally</a>, running more than 60,000 executions per week.</p>



<p class="wp-block-paragraph">What stood out to me was Meghana’s framing of the real risk: not deliberate misuse, but <strong>an agent causing serious damage by accident</strong>, faster than any human could react:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">It takes us humans a lot more effort to break things. But with agents,&nbsp;it&#8217;s&nbsp;a lot faster, a lot quicker, and the blast radius is a lot higher.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="what-problems-did-uber-face-when-scaling-ai">What problems did Uber face when scaling AI?</span></h2>



<p class="wp-block-paragraph">Meghana and Rush’s talk focused on three problems that nearly made those numbers impossible to reach. The first was <strong>the lack of a shared way of building</strong>.</p>



<p class="wp-block-paragraph">When agent adoption spreads organically across a large engineering organization, teams tend to build independently. At Uber Technologies, with over 10,000 internal services, that meant dozens of teams were building MCP servers and custom integrations on their own, without shared standards, central oversight, or any real way to reuse what others had already built.</p>



<p class="wp-block-paragraph">The result was predictable: <strong>duplicated work, and a growing stack of systems that only the original team really understood</strong>, as Meghana Somasundara explains:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The simple truth was, if you can&#8217;t manage the development lifecycle, you just can&#8217;t trust it in production.</p>
</blockquote>



<p class="wp-block-paragraph">When agents start making decisions across systems, inconsistent implementations stop being a minor issue: they become harder to track and debug, and even harder to trust.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="683" src="https://shiftmag.dev/wp-content/uploads/2026/04/55193657725_819f8b9753_o-1024x683.jpg?x94846" alt="" class="wp-image-9448" srcset="https://shiftmag.dev/wp-content/uploads/2026/04/55193657725_819f8b9753_o-1024x683.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/04/55193657725_819f8b9753_o-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/04/55193657725_819f8b9753_o-768x512.jpg 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: Agentic AI Foundation (Flickr) &#8211; Meghana Somasundara (Agentic AI Lead, Uber) and Rush Tehrani (Senior Engineering Manager, Uber)</figcaption></figure>



<p class="wp-block-paragraph">The second problem was <strong>security</strong>. Agents operating across a complex service landscape could unknowingly call endpoints they shouldn’t, expose sensitive data, or trigger operations nobody intended. Add third-party MCP servers into the mix (Uber uses many external systems) and the governance problem scales quickly.</p>



<p class="wp-block-paragraph">They needed <strong>full visibility into call patterns</strong>: who was accessing what data, under what conditions, and what happened when things went wrong. Without that, running agents in production at scale becomes a trust problem.</p>



<p class="wp-block-paragraph"><strong>Finding the right tool</strong> quickly became the third problem. As Rush asked:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">How does an agent or the engineer building it actually find the right one?</p>
</blockquote>



<p class="wp-block-paragraph">Not just any MCP server, but one that’s reliable, performs well, and doesn’t quietly degrade everything built on top of it.</p>



<p class="wp-block-paragraph">When discovery is left unmanaged, agents default to whatever is most visible rather than what actually works best. At smaller scale, that’s an annoyance, but across thousands of services, it becomes a <strong>systemic quality problem</strong>.</p>



<h2 class="wp-block-heading"><span id="how-uber-addressed-these-challenges">How Uber addressed these challenges</span></h2>



<p class="wp-block-paragraph">Uber&#8217;s answer to all three problems was a <strong>centralized MCP gateway and registry</strong>. </p>



<p class="wp-block-paragraph">Meghana describes it as a central control plane that turns Uber’s endpoints into MCP tools, with service owners deciding what gets exposed and how it’s defined.</p>



<p class="wp-block-paragraph">Every change flows through pull requests, passes security scans before deployment, and is continuously monitored in production, while a central registry (acting as the single source of truth) removes duplication and enforces tighter scrutiny on third-party MCPs.</p>
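


<p class="wp-block-paragraph">A toy sketch of the registry idea, under the assumption that each endpoint may be exposed by exactly one MCP tool and only after a security scan passes. The class and field names are hypothetical and say nothing about how Uber&#8217;s registry is actually implemented.</p>



<pre class="wp-block-code"><code>class Registry:
    """In-memory stand-in for a single source of truth (illustrative only)."""

    def __init__(self):
        self._by_endpoint = {}

    def register(self, name, endpoint, owner, scan_passed):
        """Add a tool, rejecting unscanned entries and duplicate endpoints."""
        if not scan_passed:
            raise ValueError(f"{name}: security scan must pass first")
        existing = self._by_endpoint.get(endpoint)
        if existing is not None:
            raise ValueError(
                f"{endpoint} is already exposed as {existing['name']!r}")
        self._by_endpoint[endpoint] = {"name": name, "owner": owner}

    def find(self, endpoint):
        """Return the registered entry for an endpoint, or None."""
        return self._by_endpoint.get(endpoint)
</code></pre>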



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="683" src="https://shiftmag.dev/wp-content/uploads/2026/05/55193402013_db67d79bd4_k-1-1024x683.jpg?x94846" alt="" class="wp-image-9558" srcset="https://shiftmag.dev/wp-content/uploads/2026/05/55193402013_db67d79bd4_k-1-1024x683.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2026/05/55193402013_db67d79bd4_k-1-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2026/05/55193402013_db67d79bd4_k-1-768x512.jpg 768w, https://shiftmag.dev/wp-content/uploads/2026/05/55193402013_db67d79bd4_k-1.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: Agentic AI Foundation (Flickr)</figcaption></figure>



<p class="wp-block-paragraph">In their <strong>no-code Agent Builder</strong>, as Rush explained, engineers can pre-select specific tools from an MCP server so the model doesn’t have to decide which one to use. They can also lock down parameters so the agent doesn’t have to infer them at runtime. Both reduce the number of decisions the model makes, and with it the number of things that can go wrong.</p>
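


<p class="wp-block-paragraph">A minimal sketch of those two controls, assuming a hypothetical <code>prepare_call</code> helper that sits between the model and the MCP server. Tool names and parameters are invented for the example; this is not a description of Agent Builder&#8217;s internals.</p>



<pre class="wp-block-code"><code>def prepare_call(allowed_tools, locked_params, tool, model_args):
    """Apply an agent's pre-selected tools and pinned parameters to one call.

    Raises PermissionError if the tool was not pre-selected for this agent.
    """
    if tool not in allowed_tools:
        raise PermissionError(f"tool {tool!r} is not enabled for this agent")
    # Locked values win over anything the model supplied at runtime.
    return {**model_args, **locked_params.get(tool, {})}
</code></pre>



<p class="wp-block-paragraph">With <code>region</code> locked for a tool, whatever value the model proposes for it is overwritten, while unlocked arguments pass through unchanged.</p>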



<p class="wp-block-paragraph">Getting the infrastructure right shows up in adoption: <strong>their coding agent <em>Minions</em> generates about 1,800 code changes weekly</strong> and is used by 95% of Uber engineers, but that’s the output, not the real lesson.</p>



<p class="wp-block-paragraph">On the roadmap are evaluation metrics in the registry to help teams spot reliable servers before committing, and &#8220;skills&#8221;: reusable MCP patterns with built-in A/B testing that bake evaluation into how knowledge is shared.</p>



<h2 class="wp-block-heading">Does any of this apply if you&#8217;re not Uber?</h2>



<p class="wp-block-paragraph">Uber operates at a scale most engineering teams never see (10,000+ services in play), but while the complexity is extreme, the underlying failure patterns Meghana and Rush describe aren’t unique to them.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Teams often end up building the same integrations in parallel, with governance only becoming a priority after something breaks, and discovery treated as an afterthought. These problems appear well before reaching 1,500 agents &#8211; once multiple teams start using the same MCP infrastructure without a shared layer.</p>
</blockquote>



<p class="wp-block-paragraph">The Uber model <strong>won&#8217;t translate directly to smaller organisations</strong>. But if you&#8217;re already running MCP servers across more than two teams and nobody owns discoverability or access control yet, that gap could surface soon. </p>
<p>The post <a href="https://shiftmag.dev/uber-shares-what-happens-when-1-500-ai-agents-hit-production-9430/">Uber Shares What Happens When 1.500 AI Agents Hit Production</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>13 CTOs walk into a bar and realize: There is no best AI adoption strategy</title>
		<link>https://shiftmag.dev/cto-ai-adoption-strategy-9477/</link>
		
		<dc:creator><![CDATA[Petar Dučić]]></dc:creator>
		<pubDate>Thu, 30 Apr 2026 13:59:15 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[CTO Craft]]></category>
		<category><![CDATA[Ivan brezak brkan]]></category>
		<category><![CDATA[petar dučić]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=9477</guid>

					<description><![CDATA[<p>This March at CTO Craft Conference in London, I sat down over dinner with 13 senior leaders and CTOs and had the kind of conversation you rarely get at conferences. There were no slides or presentations, just talk about how AI implementation works in different companies. </p>
<p>The post <a href="https://shiftmag.dev/cto-ai-adoption-strategy-9477/">13 CTOs walk into a bar and realize: There is no best AI adoption strategy</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1600" height="1067" src="https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london.png 1600w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-300x200.png 300w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-1024x683.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-768x512.png 768w" sizes="auto, (max-width: 1600px) 100vw, 1600px" /></figure>


<p class="wp-block-paragraph">AI is a blessing for some, but a headache for everyone else.&nbsp;</p>



<p class="wp-block-paragraph">That was one of the clearest takeaways from the CTO dinner in London that my colleague <strong>Ivan Brezak Brkan (IBB)</strong>, Developer Experience Director at Infobip, and I hosted for CTOs from a dozen great engineering organizations.</p>



<p class="wp-block-paragraph">Not that I didn’t suspect it, but hearing it out loud, in black and white, makes your assumptions impossible to ignore.</p>



<p class="wp-block-paragraph">For some, AI is putting the fun back into coding. For others? <strong>Welcome to AI shaming</strong>. Champions are treated like heroes; skeptics get rolled over, dismissed, or quietly frowned upon.</p>



<h2 class="wp-block-heading"><span id="oh-to-finally-build-again">Oh, to finally build again!</span></h2>



<p class="wp-block-paragraph">As our conversation made clear,&nbsp;it’s&nbsp;no surprise that leaders and&nbsp;<strong>C-level execs are more excited about AI than most employees</strong>. But that excitement&nbsp;isn’t&nbsp;always about business &#8211; sometimes&nbsp;it’s&nbsp;just curiosity, fascination, or even fun.&nbsp;</p>



<p class="wp-block-paragraph">And I was&nbsp;struck by how many participants talked about the sheer joy of working with AI,&nbsp;finally getting to build again instead of just managing others.&nbsp;</p>



<p class="wp-block-paragraph">AI has allowed leaders and CTOs to bypass the so-called&nbsp;<strong>“atrophy” of framework-specific knowledge</strong>, letting them focus on problem-solving and architecture.&nbsp;&nbsp;</p>



<p class="wp-block-paragraph">In practice, this means more time is spent creating ideas and prototyping, rather than learning the specific technologies needed to build things. One participant noted:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">I’d say there’s a bunch of things I always wanted to do but never had time for. I’d either have to get someone else to solve the problem or just live with it. </p>



<p class="wp-block-paragraph">Now, if I’ve got an itch I want to scratch, I can build it myself. That freedom to solve my own problems also means I can solve more problems for others. </p>
</blockquote>



<p class="wp-block-paragraph">On the topic of prototyping, multiple participants agreed that tasks that used to take several days now take just a couple of hours. This allows leaders to experiment and prototype their own ideas without overloading their engineering teams.&nbsp;</p>



<p class="wp-block-paragraph">And so far,&nbsp;so&nbsp;good.&nbsp;</p>



<p class="wp-block-paragraph">Using AI across the organization sounds like a no-brainer: ideas flow, everyone’s impressed at the speed of routine work, and it feels like you’re on the right track. But then it hits you &#8211; <strong>you haven’t really thought about the people who are actually writing and reviewing the code</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="683" src="https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-3-1024x683.png?x94846" alt="" class="wp-image-9486" srcset="https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-3-1024x683.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-3-300x200.png 300w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-3-768x512.png 768w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-3.png 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: Marko Mudrinić</figcaption></figure>



<h2 class="wp-block-heading"><span id="aishameshameshame">AI shame, shame, shame</span></h2>



<p class="wp-block-paragraph">While many&nbsp;companies (especially&nbsp;<a href="https://www.infobip.com/news/infobip-devdays-2026" target="_blank" rel="noreferrer noopener">Infobip</a>) actively encourage the use of AI tools in the workplace, an unavoidable&nbsp;“AI stigma”&nbsp;still hangs over the tech space.&nbsp;</p>



<p class="wp-block-paragraph">It often comes from <strong>worrying about being perceived as incompetent</strong> &#8211; or as someone leaning on AI for work they’re “supposed” to do themselves.</p>



<p class="wp-block-paragraph">We concluded that you&nbsp;could&nbsp;approach it in one of two ways:&nbsp;</p>



<ol class="wp-block-list">
<li>Embrace the early Facebook mantra: “Move fast and break things.”</li>



<li>Pause regularly to ensure that the “break things” part isn’t causing too much damage.</li>
</ol>



<p class="wp-block-paragraph">Participants at the dinner echoed this, with one describing a pull request that was rejected in review because “it looked like AI-driven code”:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">One of my engineers was helping an ML engineer make a change in the iOS app. They relied primarily on Claude Code to write it but worked closely with the iOS engineer to test everything thoroughly. They checked and refined the code wherever necessary.</p>



<p class="wp-block-paragraph">However, when the change was&nbsp;submitted&nbsp;for code review to the team that owned this codebase, it was&nbsp;immediately&nbsp;rejected, with the assumption that the authors&nbsp;hadn’t&nbsp;tested anything beforehand.&nbsp;</p>
</blockquote>



<p class="wp-block-paragraph">I believe <strong>there’s no single right or wrong way to approach this</strong>. Being overly zealous about AI has its drawbacks: teams may resist because they feel pressured. On the other hand, being too conservative risks falling behind, arriving late to the AI party, and scrambling while competitors are already there, relaxed and sipping champagne.</p>



<p class="wp-block-paragraph">We all know that in the AI world there’s <strong>no universal playbook</strong>. The same move (rushing to full engineering adoption, say) might be a masterstroke for one organization and a disaster for another.</p>



<p class="wp-block-paragraph">At the dinner, participants pushed back on the very definition of “AI usage.” Is it opening a tool once a week? Using it daily? Or only when it actually changes how work gets done? Turning employees into internal AI Ambassadors, where colleagues help each other, was one of the more promising ideas around the table.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="683" src="https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-2-1024x683.png?x94846" alt="" class="wp-image-9487" srcset="https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-2-1024x683.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-2-300x200.png 300w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-2-768x512.png 768w, https://shiftmag.dev/wp-content/uploads/2026/04/CTO-dinner-london-2.png 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Photo: Marko Mudrinić</figcaption></figure>



<h2 class="wp-block-heading"><span id="the-foundation-is-laid-now-what">The foundation is laid. Now what?&nbsp;&nbsp;</span></h2>



<p class="wp-block-paragraph">For us at Infobip, this conversation was something&nbsp;we’ve&nbsp;been living for a while now.&nbsp;We held hackathons,&nbsp;organized&nbsp;education programs, and invested in infrastructure&nbsp;and&nbsp;security, all to make adopting tools like Claude as easy as possible.&nbsp;We&#8217;re&nbsp;now at a point where&nbsp;<strong>over 80% of the company uses AI tools daily.</strong>&nbsp;</p>



<p class="wp-block-paragraph">But&nbsp;here&#8217;s&nbsp;what the dinner made me think about: the subjective experience and the data&nbsp;don&#8217;t&nbsp;always agree.&nbsp;When we&nbsp;talk with our engineers,&nbsp;they report feeling more productive. But when we look at DORA metrics or business outcomes, the improvement&nbsp;isn’t&nbsp;easy to&nbsp;correlate with AI usage.&nbsp;</p>



<p class="wp-block-paragraph">The&nbsp;funny thing is that everyone feels more productive and energized, but&nbsp;it&#8217;s&nbsp;hard to put a finger on the exact metric.&nbsp;&nbsp;</p>



<p class="wp-block-paragraph">And if the dinner told us anything,&nbsp;it’s&nbsp;that&nbsp;we’re&nbsp;not the only ones thinking about that gap.&nbsp;&nbsp;</p>



<p class="wp-block-paragraph">Which brings me to the real takeaway:&nbsp;we&#8217;re&nbsp;entering a new phase, where&nbsp;<strong>it&#8217;s&nbsp;important to make AI usage count</strong>. That means top-down initiatives that change how teams work, including getting non-technical teams more involved.&nbsp;</p>



<p class="wp-block-paragraph">There might not be a &#8220;best&#8221; AI adoption strategy. But for those of us&nbsp;who&#8217;ve&nbsp;got adoption off the ground, the question is no longer&nbsp;quantity&nbsp;&#8211;&nbsp;it&#8217;s&nbsp;quality.&nbsp;</p>
<p>The post <a href="https://shiftmag.dev/cto-ai-adoption-strategy-9477/">13 CTOs walk into a bar and realize: There is no best AI adoption strategy</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
