<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Copilot Archives - ShiftMag</title>
	<atom:link href="https://shiftmag.dev/tag/copilot/feed/" rel="self" type="application/rss+xml" />
	<link>https://shiftmag.dev/tag/copilot/</link>
	<description>Insightful engineering content &#38; community</description>
	<lastBuildDate>Thu, 02 Apr 2026 11:39:52 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://shiftmag.dev/wp-content/uploads/2024/08/cropped-ShiftMag-favicon-32x32.png</url>
	<title>Copilot Archives - ShiftMag</title>
	<link>https://shiftmag.dev/tag/copilot/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>AI Hasn&#8217;t Made Developers Faster, It&#8217;s Made Their Review Queues Longer!</title>
		<link>https://shiftmag.dev/ai-hasnt-made-developers-faster-its-made-their-review-queues-longer-8935/</link>
		
		<dc:creator><![CDATA[ShiftMag]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 09:53:30 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI coding tools]]></category>
		<category><![CDATA[Copilot]]></category>
		<category><![CDATA[Developer Experience]]></category>
		<category><![CDATA[Developer Productivity]]></category>
		<category><![CDATA[engineering metrics]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=8935</guid>

					<description><![CDATA[<p>Nearly 93% of developers use AI coding tools, but the productivity gain has barely moved, stuck at around 10%. Here’s why using AI doesn’t automatically mean getting more done.</p>
<p>The post <a href="https://shiftmag.dev/ai-hasnt-made-developers-faster-its-made-their-review-queues-longer-8935/">AI Hasn&#8217;t Made Developers Faster, It&#8217;s Made Their Review Queues Longer!</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img fetchpriority="high" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/03/Ai-productivity.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/03/Ai-productivity.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/03/Ai-productivity-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/03/Ai-productivity-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/03/Ai-productivity-768x403.png 768w" sizes="(max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">A developer uses Copilot to write 30 lines of code in 10 minutes, but then spends 45 minutes reviewing it &#8211; checking for bugs, edge cases, and code that doesn’t match team standards. </p>



<p class="wp-block-paragraph">The time saved during writing <strong>gets completely eaten up during validation</strong>. And this is exactly what happens repeatedly across teams trying to adopt AI at scale.</p>



<p class="wp-block-paragraph">At the Pragmatic Summit, <strong>Laura Tacho</strong> (CTO at DX) <a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" target="_blank" rel="noreferrer noopener">shared some interesting research on AI in coding</a>:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Almost 93% of developers use AI assistants every month, and about 27% of production code now comes from AI. Yet, despite all this, overall productivity has barely budged &#8211; staying around a 10% boost since AI tools arrived.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="ai-adoption-is-everywhere%e2%80%a6">AI adoption is everywhere…</span></h2>



<p class="wp-block-paragraph">The numbers are clear:</p>



<ul class="wp-block-list">
<li>92.6% of developers use AI coding assistants monthly</li>



<li>75% use them weekly</li>



<li>26.9% of production code contains AI-authored segments</li>
</ul>



<p class="wp-block-paragraph"><a href="https://shiftmag.dev/stack-overflow-survey-2025-ai-5653/" target="_blank" rel="noreferrer noopener">84% of developers use AI tools, according to Stack Overflow&#8217;s 2025 survey.</a> Adoption is now standard &#8211; and the share has probably grown since the survey was run.</p>



<h2 class="wp-block-heading"><span id="%e2%80%a6yet-work-isn%e2%80%99t-moving-any-quicker">…Yet work isn’t moving any quicker</span></h2>



<p class="wp-block-paragraph">The <strong>gap between adoption and productivity appears first as a trust problem</strong>. </p>



<p class="wp-block-paragraph"><a href="https://shiftmag.dev/stack-overflow-survey-2025-ai-5653/" target="_blank" rel="noreferrer noopener">46% of developers don&#8217;t fully trust the output</a>, and that skepticism has a reason: reviewing AI-generated code frequently requires more effort than reviewing human-written code.</p>



<p class="wp-block-paragraph">The DX AI Measurement Framework (published by vendor DX but structured as an industry standard) identifies this directly: </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Code generated by AI may be less intuitive for human developers to understand, potentially creating bottlenecks when issues arise or modifications are needed.</p>
</blockquote>



<p class="wp-block-paragraph">This is why productivity hasn’t jumped. <strong>Developers might write code faster with AI, but they end up spending the time they saved on checking, fixing, and making sense of what AI produces</strong>. In the end, the overall development cycle doesn’t get any shorter.</p>



<p class="wp-block-paragraph"><a href="https://shiftmag.dev/state-of-code-2025-7978/" target="_blank" rel="noreferrer noopener">Sonar&#8217;s research confirms the pattern at scale: 42% of committed code now includes AI assistance</a>, yet <a href="https://shiftmag.dev/state-of-code-2025-7978/" target="_blank" rel="noreferrer noopener">96% of developers say they don&#8217;t fully trust AI-generated code.</a> And this is exactly what we see: output is everywhere, but the confidence in it is not.</p>



<h2 class="wp-block-heading"><span id="why-productivity-has-stalled">Why has productivity stalled?</span></h2>



<p class="wp-block-paragraph">That 10% productivity bump comes down to a workflow mismatch. </p>



<p class="wp-block-paragraph">Teams started using AI to write code faster, but<strong> didn’t adjust how they review, test, or integrate it</strong>. In other words, writing got quicker, but everything that comes after stayed just as slow.</p>



<p class="wp-block-paragraph">The DX research notes a broader context relevant here: most organizations see their biggest bottlenecks not in code generation, but: </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">In the outer loop, or in human factors like collaboration, alignment, and the ability to do deep, focused work.</p>
</blockquote>



<p class="wp-block-paragraph">AI addresses one specific problem, and that&#8217;s code-writing speed. But, as we can see, the overall development cycle has other constraints.</p>



<p class="wp-block-paragraph">Teams that actually see productivity gains from AI usually do two things: <strong>they figure out exactly where AI adds value</strong>, and <strong>they tweak their workflows to make the most of it</strong>. Teams that just deploy AI without changing how they work? They get adoption, but no real boost in productivity.</p>



<p class="wp-block-paragraph">The 10% productivity ceiling sticks because the time spent validating AI-written code cancels out the speed gains. Most teams focus on writing faster, but few have optimized for faster validation.</p>
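
<p class="wp-block-paragraph">The cancelling-out effect can be illustrated with simple arithmetic. The sketch below reuses the 10-minute writing and 45-minute review figures from the opening example; the manual baseline (40 minutes of writing, 20 minutes of review) is a hypothetical assumption chosen only for illustration, not a measured value.</p>

```python
def cycle_minutes(write_min, review_min):
    """Total time for one change: writing plus validation."""
    return write_min + review_min

# Figures from the article's opening example.
ai_total = cycle_minutes(write_min=10, review_min=45)

# Hypothetical manual baseline (assumed for illustration, not measured).
manual_total = cycle_minutes(write_min=40, review_min=20)

# Fraction of the manual cycle time that the AI-assisted flow saves.
reduction = (manual_total - ai_total) / manual_total

print(f"AI-assisted: {ai_total} min, manual: {manual_total} min")
print(f"Net cycle-time reduction: {reduction:.0%}")
```

<p class="wp-block-paragraph">Under these assumed numbers, writing becomes four times faster, yet the whole cycle shrinks by less than a tenth because review time more than doubles.</p>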



<p class="wp-block-paragraph">It’s an obvious obstacle, but maybe also an opportunity.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Want to build a more accurate Copilot with fewer hallucinations? Move from prompting to fine-tuning.</title>
		<link>https://shiftmag.dev/build-a-more-accurate-copilot-with-fewer-hallucinations-3256/</link>
		
		<dc:creator><![CDATA[Tena Šojer Keser]]></dc:creator>
		<pubDate>Tue, 07 May 2024 13:12:32 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Copilot]]></category>
		<category><![CDATA[Emanuel Lacic]]></category>
		<category><![CDATA[large language models]]></category>
		<category><![CDATA[Shift Conference]]></category>
		<category><![CDATA[Shift Miami 2024]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=3256</guid>

					<description><![CDATA[<p>Is prompting enough? Emanuel shares exploration of his team and what they learned regarding prompting strategies, fine-tuning, and model size.</p>
<p>The post <a href="https://shiftmag.dev/build-a-more-accurate-copilot-with-fewer-hallucinations-3256/">Want to build a more accurate Copilot with fewer hallucinations? Move from prompting to fine-tuning.</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img decoding="async" width="2100" height="1402" src="https://shiftmag.dev/wp-content/uploads/2024/05/53701014927_a82c426152_o-scaled.jpg?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2024/05/53701014927_a82c426152_o-scaled.jpg 2100w, https://shiftmag.dev/wp-content/uploads/2024/05/53701014927_a82c426152_o-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2024/05/53701014927_a82c426152_o-1024x684.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2024/05/53701014927_a82c426152_o-768x513.jpg 768w" sizes="(max-width: 2100px) 100vw, 2100px" /></figure>


<p class="wp-block-paragraph">Is prompting enough? Emanuel Lacić asked this question on the stage of the <a href="https://shift.infobip.com/us/" target="_blank" rel="noreferrer noopener">Shift Conference in Miami </a>as he explored the process of creating a Copilot for a UI-based chatbot builder.  </p>



<p class="wp-block-paragraph">The chatbot builder in question, <a href="https://www.infobip.com/docs/answers/generative-ai/answers-copilot#:~:text=Answers%20copilot%20is%20a%20Generative,an%20outline%20of%20the%20design." target="_blank" rel="noreferrer noopener">Answers Copilot</a>, is a GenAI feature that <strong>enables end users to design a chatbot based on their natural language input</strong>. GenAI creates an outline of how the chatbot should behave, automating the chatbot-building process to a degree, and the end user then customizes it to meet their requirements. &nbsp;</p>



<h2 class="wp-block-heading"><span id="starting-with-prompting">Starting with prompting</span></h2>



<p class="wp-block-paragraph">The initial process relied on prompting: Emanuel and his team described what the underlying code looked like, had OpenAI generate the code blocks representing visual elements, and then plugged them in to have them rendered in the UI. The goal was output with as few hallucinations (i.e., generated code that leads to an error when rendering) and as much predictability as possible. &nbsp;</p>
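
<p class="wp-block-paragraph">As a rough illustration of this kind of prompting, the sketch below assembles a prompt from a format description, optional worked examples, and the user request. With no examples the prompt is zero-shot; with examples it is few-shot. The function name, the instruction text, and the JSON-like block format are all hypothetical and are not taken from the talk.</p>

```python
def build_prompt(instructions, user_request, examples=()):
    """Assemble a prompt; with no examples it is zero-shot, otherwise few-shot."""
    parts = [instructions]
    for request, code_block in examples:
        # Each worked example shows the model a request and the expected code.
        parts.append(f"Request: {request}\nCode: {code_block}")
    # The final request is left open for the model to complete.
    parts.append(f"Request: {user_request}\nCode:")
    return "\n\n".join(parts)

# Hypothetical domain instructions and one hypothetical example pair.
instructions = "Return one JSON code block describing a chatbot element."
examples = [("Greet the user", '{"type": "text", "content": "Hello!"}')]

zero_shot = build_prompt(instructions, "Ask for the user's email")
few_shot = build_prompt(instructions, "Ask for the user's email", examples)

print(few_shot)
```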



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="575" src="https://shiftmag.dev/wp-content/uploads/2024/05/image-2-1024x575.png?x94846" alt="" class="wp-image-3259" srcset="https://shiftmag.dev/wp-content/uploads/2024/05/image-2-1024x575.png 1024w, https://shiftmag.dev/wp-content/uploads/2024/05/image-2-300x169.png 300w, https://shiftmag.dev/wp-content/uploads/2024/05/image-2-768x432.png 768w, https://shiftmag.dev/wp-content/uploads/2024/05/image-2.png 1381w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">They tested different prompt engineering strategies with <strong>Microsoft’s API for GPT-3.5 Turbo</strong>. By testing techniques ranging from zero-shot to few-shot prompting with domain-specific instructions, they managed to <strong>lower the percentage of hallucinations to 12.63%</strong> on average. Accuracy was measured using HitRate &#8211; the share of cases in which the generated code block matched 100% of what was expected &#8211; which peaked at 2.13%.</p>
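
<p class="wp-block-paragraph">The two metrics can be sketched in a few lines. This is a minimal illustration, not the team's evaluation code: it assumes code blocks are JSON strings and uses "parses as JSON" as a stand-in for "renders without error". The function names and sample data are hypothetical.</p>

```python
import json

def renders_ok(code_block):
    """Stand-in validity check: a block 'renders' here if it parses as JSON."""
    try:
        json.loads(code_block)
        return True
    except json.JSONDecodeError:
        return False

def hallucination_rate(generated):
    """Share of generated blocks that fail the rendering check."""
    failures = sum(1 for block in generated if not renders_ok(block))
    return failures / len(generated)

def hit_rate(generated, expected):
    """Share of generated blocks that match the expected block exactly."""
    hits = sum(1 for g, e in zip(generated, expected) if g == e)
    return hits / len(expected)

# Hypothetical sample: the second generated block is malformed,
# the third is valid but differs from what was expected.
generated = ['{"type": "text"}', '{"type": "button"', '{"type": "image"}', '{"type": "text"}']
expected = ['{"type": "text"}', '{"type": "button"}', '{"type": "list"}', '{"type": "text"}']

print(hallucination_rate(generated))
print(hit_rate(generated, expected))
```

<p class="wp-block-paragraph">Note that the two numbers measure different things: a block can render without error and still miss the expected output, which is why a low hallucination rate does not imply a high HitRate.</p>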



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="277" src="https://shiftmag.dev/wp-content/uploads/2024/05/image-1024x277.png?x94846" alt="" class="wp-image-3257" srcset="https://shiftmag.dev/wp-content/uploads/2024/05/image-1024x277.png 1024w, https://shiftmag.dev/wp-content/uploads/2024/05/image-300x81.png 300w, https://shiftmag.dev/wp-content/uploads/2024/05/image-768x208.png 768w, https://shiftmag.dev/wp-content/uploads/2024/05/image.png 1377w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">With the Copilot built using different prompting strategies, it was time to answer Emanuel’s titular question: Is prompting enough? The team decided to test the hypothesis that <strong>LLMs trained on context-specific data might yield a lower percentage of hallucinations and higher accuracy</strong> (i.e., a higher HitRate), and turned to fine-tuning.&nbsp;</p>



<h2 class="wp-block-heading"><span id="bigger-is-not-always-better">Bigger is not always better&nbsp;</span></h2>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="684" src="https://shiftmag.dev/wp-content/uploads/2024/05/53701912011_be5443f9fc_o-1024x684.jpg?x94846" alt="" class="wp-image-3263" srcset="https://shiftmag.dev/wp-content/uploads/2024/05/53701912011_be5443f9fc_o-1024x684.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2024/05/53701912011_be5443f9fc_o-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2024/05/53701912011_be5443f9fc_o-768x513.jpg 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">As end users can task the Answers Copilot with creating a chatbot for a variety of use cases, fine-tuning it required the team to know what input users might provide, as well as what the desired output is. Since real-world data was not available, <strong>GenAI was put to the task of synthetically creating some. &nbsp;</strong></p>



<p class="wp-block-paragraph">The data was then used to fine-tune LLMs of various sizes: <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/fine-tune">OpenAI GPT-3.5 Turbo</a> (large), <a href="https://arxiv.org/pdf/2310.06825.pdf">Mistral 7B Instruct </a>(mid), <a href="https://arxiv.org/pdf/2302.13971.pdf" target="_blank" rel="noreferrer noopener">LLaMa 3B</a> (small), and <a href="https://arxiv.org/pdf/2310.06694.pdf" target="_blank" rel="noreferrer noopener">Sheared LLaMa 1.3B</a> (tiny). In addition to training the models with relevant data, the team used <a href="https://arxiv.org/abs/2106.09685" target="_blank" rel="noreferrer noopener">LoRA</a> to fine-tune visual element generation.  </p>
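
<p class="wp-block-paragraph">For readers unfamiliar with LoRA, the core idea is to keep the pretrained weight matrix frozen and train only a low-rank update, so the effective weight becomes W0 + (alpha / r) * B * A. The pure-Python sketch below illustrates that formula and the resulting parameter savings. The dimensions, rank, and alpha are illustrative assumptions; the talk's actual hyperparameters are not given in this article.</p>

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w0, a, b, alpha, r):
    """Effective weight W0 + (alpha / r) * B @ A; W0 itself stays frozen."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

def lora_param_count(d_out, d_in, r):
    """Trainable parameters in the two low-rank factors B and A."""
    return r * (d_out + d_in)

# Toy dimensions (assumed for illustration only).
d_out, d_in, r, alpha = 4, 6, 2, 4
rng = random.Random(0)
w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
b = [[0.0] * r for _ in range(d_out)]  # d_out x r, initialised to zero

# With B at zero, the adapted weight equals the pretrained weight.
print(lora_weight(w0, a, b, alpha, r) == w0)

# Parameter savings for a hypothetical 4096 x 4096 layer at rank 8.
print(lora_param_count(4096, 4096, 8), "trainable vs", 4096 * 4096, "full")
```

<p class="wp-block-paragraph">Because B starts at zero, training begins from the unmodified pretrained model, and only the small A and B factors need to be stored per fine-tuned variant.</p>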



<p class="wp-block-paragraph">The fine-tuning process did yield the desired results: LLMs trained on relevant data produced significantly fewer hallucinations, with <strong>0.04% as the lowest achieved hallucination rate</strong>. Accuracy also improved significantly, with the <strong>HitRate climbing to 26.72%.&nbsp;</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="254" src="https://shiftmag.dev/wp-content/uploads/2024/05/image-1-1024x254.png?x94846" alt="" class="wp-image-3258" srcset="https://shiftmag.dev/wp-content/uploads/2024/05/image-1-1024x254.png 1024w, https://shiftmag.dev/wp-content/uploads/2024/05/image-1-300x75.png 300w, https://shiftmag.dev/wp-content/uploads/2024/05/image-1-768x191.png 768w, https://shiftmag.dev/wp-content/uploads/2024/05/image-1.png 1377w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">Interestingly, Emanuel notes the best-performing models were Sheared LLaMa (in terms of hallucinations) and Mistral 7B Instruct (when it came to HitRate):&nbsp;</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>Sometimes you don’t need the largest, best performing LLM. But the only way to know which one performs best is to experiment – you can’t know beforehand.</em>&nbsp;</p>
</blockquote>



<h2 class="wp-block-heading"><span id="what%e2%80%99s-next">What’s next?&nbsp;</span></h2>



<p class="wp-block-paragraph">There are always ways to polish Copilots, with <strong>user feedback being the logical next step</strong>. To that end, Emanuel presented the <a href="https://arxiv.org/abs/2402.01306" target="_blank" rel="noreferrer noopener">KTO method</a> (Kahneman-Tversky Optimization). Because it requires only a binary signal (desirable/undesirable outcome), the <strong>user feedback data is more abundant, cheaper, and faster to collect</strong> than data based on user preference between two different outputs, which is used in other popular methods like Reinforcement Learning. KTO is also a good choice when there is a marked imbalance between the number of desirable and undesirable examples.&nbsp;</p>
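
<p class="wp-block-paragraph">To make the data-collection difference concrete, the sketch below turns a log of thumbs-up/thumbs-down events into single-output records with a binary label, which is the kind of signal KTO needs. The field names (<code>prompt</code>, <code>completion</code>, <code>label</code>, <code>thumbs_up</code>) and the sample events are hypothetical; a real training library defines its own schema.</p>

```python
from collections import Counter

def to_binary_records(feedback_log):
    """Turn thumbs-up/down events into (prompt, completion, label) records."""
    return [
        {
            "prompt": event["prompt"],
            "completion": event["completion"],
            "label": event["thumbs_up"],  # True = desirable, False = undesirable
        }
        for event in feedback_log
    ]

# Hypothetical production log: one output per event, one click per output.
feedback_log = [
    {"prompt": "Build a booking bot", "completion": "flow_a", "thumbs_up": True},
    {"prompt": "Build a support bot", "completion": "flow_b", "thumbs_up": False},
    {"prompt": "Build a survey bot", "completion": "flow_c", "thumbs_up": False},
]

records = to_binary_records(feedback_log)

# Label counts show how balanced (or imbalanced) the feedback is.
counts = Counter(record["label"] for record in records)
print(counts[True], "desirable,", counts[False], "undesirable")
```

<p class="wp-block-paragraph">A pairwise preference dataset would instead need two completions for the same prompt plus a judgment of which one is better, which a single thumbs-up or thumbs-down click cannot provide.</p>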



<p class="wp-block-paragraph">To take user feedback a step further, a multi-armed bandit algorithm can be used, as Emanuel demonstrated, to determine <strong>which of the LLMs produces the most favorable results </strong>while running in production and, consequently, <strong>to choose between the LLMs automatically. &nbsp;</strong></p>
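
<p class="wp-block-paragraph">The article does not say which bandit algorithm was demonstrated, so the sketch below uses Thompson sampling with Beta posteriors as one common choice for binary feedback. The class name, the model names, and the simulated approval rates are all made up for illustration.</p>

```python
import random

class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a set of named models."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # One [desirable, undesirable] count pair per model.
        self.stats = {arm: [0, 0] for arm in arms}

    def select(self):
        """Sample a plausible approval rate per model and pick the highest."""
        samples = {
            arm: self.rng.betavariate(good + 1, bad + 1)
            for arm, (good, bad) in self.stats.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, desirable):
        """Record one binary feedback signal for the model that was used."""
        self.stats[arm][0 if desirable else 1] += 1

# Simulated production traffic with made-up approval rates per model.
true_rates = {"model_a": 0.2, "model_b": 0.6, "model_c": 0.4}
bandit = ThompsonBandit(true_rates, seed=0)
users = random.Random(1)
picks = {arm: 0 for arm in true_rates}

for _ in range(2000):
    arm = bandit.select()
    picks[arm] += 1
    # A simulated user approves with the chosen model's (hidden) rate.
    bandit.update(arm, true_rates[arm] > users.random())

print(picks)
```

<p class="wp-block-paragraph">Over time, a scheme like this routes most traffic to the model with the best observed feedback while still occasionally trying the others, so the choice of LLM adapts without manual intervention.</p>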



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Emanuel Lacić: Is Prompting Enough? The Process of Making a Copilot for UI-based Chatbot Builders" width="500" height="281" src="https://www.youtube.com/embed/eEaKR4uatwE?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p class="wp-block-paragraph">You can find <a href="http://elacic.me/documents/talks/2024_04_Shift_Copilot.pdf" target="_blank" rel="noreferrer noopener">Emanuel&#8217;s slides</a> online or find out more about his work on his <a href="http://elacic.me/" target="_blank" rel="noreferrer noopener">personal website</a>.&nbsp;</p>



]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
