<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LLM Archives - ShiftMag</title>
	<atom:link href="https://shiftmag.dev/tag/llm/feed/" rel="self" type="application/rss+xml" />
	<link>https://shiftmag.dev/tag/llm/</link>
	<description>Insightful engineering content &#38; community</description>
	<lastBuildDate>Thu, 05 Feb 2026 14:18:40 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://shiftmag.dev/wp-content/uploads/2024/08/cropped-ShiftMag-favicon-32x32.png</url>
	<title>LLM Archives - ShiftMag</title>
	<link>https://shiftmag.dev/tag/llm/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Forget the Model, It’s Workflows That Make LLM Products Run</title>
		<link>https://shiftmag.dev/llms-can-improve-customer-operations-7716/</link>
		
		<dc:creator><![CDATA[Marko Crnjanski]]></dc:creator>
		<pubDate>Thu, 05 Feb 2026 14:18:39 +0000</pubDate>
				<category><![CDATA[Event]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[How to Web]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[programming]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=7716</guid>

					<description><![CDATA[<p>Building with LLMs is nothing like traditional software. If we want something that actually works in production, we have to test it, monitor it, and keep iterating on real customer data.</p>
<p>The post <a href="https://shiftmag.dev/llms-can-improve-customer-operations-7716/">Forget the Model, It’s Workflows That Make LLM Products Run</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img fetchpriority="high" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2026/02/New-ShiftMag-panel-interview.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2026/02/New-ShiftMag-panel-interview.png 1200w, https://shiftmag.dev/wp-content/uploads/2026/02/New-ShiftMag-panel-interview-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2026/02/New-ShiftMag-panel-interview-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2026/02/New-ShiftMag-panel-interview-768x403.png 768w" sizes="(max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">From his experience leading AI product teams, <strong>Andrew Mende</strong> (Senior Product Manager, Machine Learning at Booking.com) explained what it truly takes to ship LLM-based products in production.</p>



<h2 class="wp-block-heading"><span id="making-ai-products-reliable-requires-new-workflows">Making AI products reliable requires new workflows</span></h2>



<p class="wp-block-paragraph">For Mende, the current wave of AI marks a rare shift, comparable to the rise of smartphones. But what does it mean for product teams?</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">This moment unlocks new ways of solving customer problems that were previously impossible due to technical constraints. </p>
</blockquote>



<p class="wp-block-paragraph">He was clear: <strong>traditional product management approaches often fail with AI-driven products</strong>. </p>



<p class="wp-block-paragraph">LLM-based systems behave differently, demand new workflows, and bring new types of risk. </p>



<p class="wp-block-paragraph">Unlike deterministic software, <strong>LLMs are probabilistic</strong>: identical inputs can produce different outputs. That makes experimentation easy but production readiness hard, and it forces teams to rethink how they test, evaluate, and monitor features.</p>



<p class="wp-block-paragraph">One of the biggest traps, Mende explained, is <strong>confusing a successful prototype with a scalable solution</strong>:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">It’s easy to paste a prompt into ChatGPT and see results; much harder to make it reliable across thousands of real customer inputs.</p>
</blockquote>



<p class="wp-block-paragraph">Teams need <strong>structured datasets</strong>, big tables of real customer examples, to track accuracy, spot regressions, and see if changes actually work. Without them, it’s all guesswork.</p>
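<p class="wp-block-paragraph">As a minimal sketch of what such a dataset makes possible (an illustration, not tooling described in the talk), the snippet below scores two versions of a system against the same table of examples and lists the cases that got worse. The names <code>EvalExample</code>, <code>accuracy</code>, <code>regressions</code>, and the two keyword-based stand-in functions are hypothetical; a real setup would call an LLM in their place and use real customer inputs.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalExample:
    """One row of the evaluation table: a customer input and the expected label."""
    customer_input: str
    expected: str


def accuracy(system: Callable[[str], str], dataset: list[EvalExample]) -> float:
    """Fraction of examples where the system's output matches the expected label."""
    if not dataset:
        raise ValueError("evaluation dataset is empty")
    hits = sum(1 for ex in dataset if system(ex.customer_input) == ex.expected)
    return hits / len(dataset)


def regressions(old, new, dataset):
    """Examples the old version got right but the new version gets wrong."""
    return [
        ex for ex in dataset
        if old(ex.customer_input) == ex.expected
        and new(ex.customer_input) != ex.expected
    ]


# Hypothetical stand-ins for two prompt versions; a real system would call an LLM.
def prompt_v1(text: str) -> str:
    return "refund" if "refund" in text.lower() else "other"


def prompt_v2(text: str) -> str:
    lowered = text.lower()
    if "money back" in lowered:
        return "refund"
    if "cancel" in lowered:
        return "cancellation"
    return "other"


# Invented examples for illustration only.
DATASET = [
    EvalExample("I want a refund for my booking", "refund"),
    EvalExample("Can I get my money back?", "refund"),
    EvalExample("Please cancel my reservation", "cancellation"),
    EvalExample("What time is check-in?", "other"),
]

if __name__ == "__main__":
    print(f"v1 accuracy: {accuracy(prompt_v1, DATASET):.2f}")
    print(f"v2 accuracy: {accuracy(prompt_v2, DATASET):.2f}")
    for ex in regressions(prompt_v1, prompt_v2, DATASET):
        print("regression:", ex.customer_input)
```

<p class="wp-block-paragraph">In this toy run the second version scores higher overall yet breaks an example the first version handled, which is the kind of regression an aggregate number alone would hide.</p>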



<h2 class="wp-block-heading"><span id="focus-on-accuracy-cost-and-speed">Focus on accuracy, cost, and speed</span></h2>



<p class="wp-block-paragraph">Mende’s practical approach to model selection focuses on <strong>accuracy, cost, and latency</strong>: start with the most capable model to see if the problem can be solved, then move to smaller or faster models to optimize performance. </p>



<p class="wp-block-paragraph">This requires testing <strong>multiple configurations</strong> (context size, prompts, and parameters) since even small changes affect results. Beyond the model, context selection, prompt instructions, and external tools are critical:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">For example, when a customer asks about a specific order, the system should fetch real-time data instead of relying on static knowledge. This combination of LLMs and tools turns simple prompts into full systems, but also increases complexity and maintenance costs.</p>
</blockquote>
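<p class="wp-block-paragraph">The selection approach described in this section (confirm the problem is solvable with the most capable model, then look for a cheaper or faster configuration that is still good enough) can be sketched roughly as follows. This is one possible reading, not Mende's implementation: the <code>Candidate</code> fields, the thresholds, and all numbers are made up for illustration and are not measurements of real models.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Measured results for one model configuration on the evaluation dataset."""
    name: str
    accuracy: float           # fraction of evaluation examples answered correctly
    cost_per_1k_calls: float  # any consistent currency unit
    p95_latency_s: float      # 95th-percentile response time in seconds


def pick_configuration(candidates, min_accuracy, max_latency_s):
    """Return the cheapest candidate meeting the accuracy and latency bars.

    Returns None when no candidate qualifies.
    """
    viable = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.p95_latency_s <= max_latency_s
    ]
    return min(viable, key=lambda c: c.cost_per_1k_calls, default=None)


# Made-up numbers for illustration only.
CANDIDATES = [
    Candidate("largest-model", accuracy=0.95, cost_per_1k_calls=30.0, p95_latency_s=4.0),
    Candidate("mid-size-model", accuracy=0.92, cost_per_1k_calls=6.0, p95_latency_s=1.5),
    Candidate("small-model", accuracy=0.81, cost_per_1k_calls=1.0, p95_latency_s=0.6),
]

if __name__ == "__main__":
    choice = pick_configuration(CANDIDATES, min_accuracy=0.90, max_latency_s=2.0)
    print(choice.name if choice else "no configuration meets the bar")
```

<p class="wp-block-paragraph">Each row would come from rerunning the same evaluation dataset with a different model, prompt, or context size, so the comparison stays like for like.</p>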



<h2 class="wp-block-heading">LLMs can transform how users interact &#8211; if teams build the right infrastructure</h2>



<p class="wp-block-paragraph">Mende concluded his How to Web lecture by saying that LLMs shine when they transform user interaction: for the first time, <strong>digital products can understand plain language</strong>, turning customer requests directly into actions.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">This shift brings digital experiences closer to human conversations and enables new product patterns that were out of reach just a few years ago.</p>
</blockquote>



<p class="wp-block-paragraph">The challenge now, Mende explained, is not whether LLMs work, but whether teams are willing to build the evaluation, monitoring, and infrastructure required to make them truly useful.</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Shift is Coming to Asia &#8211; and We’re Giving Away 10 Tickets!</title>
		<link>https://shiftmag.dev/shift-is-coming-to-asia-and-were-giving-away-10-tickets-6597/</link>
		
		<dc:creator><![CDATA[ShiftMag]]></dc:creator>
		<pubDate>Fri, 17 Oct 2025 14:43:35 +0000</pubDate>
				<category><![CDATA[Event]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI agents]]></category>
		<category><![CDATA[Infobip Kuala Lumpur]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Shift Conference]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=6597</guid>

					<description><![CDATA[<p>For the first time ever, the Shift Conference is coming to Asia, landing in Kuala Lumpur in November 2025 - with a full focus on Copilots, Agents, and LLMs.</p>
<p>The post <a href="https://shiftmag.dev/shift-is-coming-to-asia-and-were-giving-away-10-tickets-6597/">Shift is Coming to Asia &#8211; and We’re Giving Away 10 Tickets!</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img decoding="async" width="2047" height="1365" src="https://shiftmag.dev/wp-content/uploads/2025/10/54821194997_fc6a6a90e4_k.jpg?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2025/10/54821194997_fc6a6a90e4_k.jpg 2047w, https://shiftmag.dev/wp-content/uploads/2025/10/54821194997_fc6a6a90e4_k-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2025/10/54821194997_fc6a6a90e4_k-1024x683.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2025/10/54821194997_fc6a6a90e4_k-768x512.jpg 768w" sizes="(max-width: 2047px) 100vw, 2047px" /></figure>


<p class="wp-block-paragraph">Infobip Shift, one of Europe’s leading developer conferences, is making its <a href="https://shift.infobip.com/asia/" target="_blank" rel="noreferrer noopener">Asian debut</a> on <strong>November 4th, 2025</strong>! </p>



<p class="wp-block-paragraph">In partnership with Cradle, Malaysia’s startup ecosystem builder, we’re bringing <strong>developers, founders, and innovators</strong> from across ASEAN together in Kuala Lumpur &#8211; building connections, sparking ideas, and shaping the future of software.</p>



<p class="wp-block-paragraph">&#8220;By bringing Shift to Malaysia, we’re giving Southeast Asian developers the chance to learn, connect, and join <strong>the same conversations shaping tech in Europe and the US,</strong>&#8221; says Stipe Cigic, Head of the Infobip Shift team:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">The city has a fast-growing, diverse tech community, full of talent, curiosity, and energy &#8211; but without many events that truly bring developers together in one place. </p>
</blockquote>



<h2 class="wp-block-heading"><span id="everything-is-ai-are-you-ready">Everything is AI. Are you ready?</span></h2>



<p class="wp-block-paragraph">Whether you’re <strong>working with AI at scale or exploring it for the first time</strong>, Shift KL brings together developers, researchers, and founders to share knowledge and insights. </p>



<p class="wp-block-paragraph">From copilots and LLMs to agentic workflows and ethical challenges, AI is reshaping how software is built &#8211; and Shift KL is where you can see it in action and discuss what it means for your work.</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="683" src="https://shiftmag.dev/wp-content/uploads/2025/10/54822373215_f7032b5099_k-1024x683.jpg?x94846" alt="" class="wp-image-6611" srcset="https://shiftmag.dev/wp-content/uploads/2025/10/54822373215_f7032b5099_k-1024x683.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2025/10/54822373215_f7032b5099_k-300x200.jpg 300w, https://shiftmag.dev/wp-content/uploads/2025/10/54822373215_f7032b5099_k-768x512.jpg 768w, https://shiftmag.dev/wp-content/uploads/2025/10/54822373215_f7032b5099_k.jpg 2047w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">World-class experts including <strong>Tejas Kumar</strong> (Developer Advocate, IBM), <strong>Dugald Morrow</strong> (Principal Developer Advocate, Atlassian), and <strong>Joyce Lin</strong> (Lead Tech Educator, LMArena) will share their perspectives on the latest AI trends and practical ways to apply them.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">For Shift, it’s also a way of staying true to what we’ve always stood for: connecting developers wherever they are, and building a global community that keeps learning and growing together.</p>



</blockquote>



<p class="wp-block-paragraph">Shift KL will also be <strong>the third event on a third continent in the same year</strong>!</p>



<h2 class="wp-block-heading"><span id="get-your-ticket">Get your ticket!</span></h2>



<p class="wp-block-paragraph">Calling all developers, tech aficionados, and industry experts!</p>



<p class="wp-block-paragraph"><strong>The first 10 applicants to complete the form get free tickets, and the next 10 receive a 30% discount for <a href="https://shift.infobip.com/asia/" target="_blank" rel="noreferrer noopener">Shift Kuala Lumpur 2025</a>!</strong></p>






<iframe loading="lazy" class="airtable-embed" src="https://airtable.com/embed/appDKumOxVuEZO1nh/pagyF873fcKGOGrqT/form" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How We Built an AI Learning Assistant &#8211; Approved by Teachers</title>
		<link>https://shiftmag.dev/ai-assistant-enters-the-classroom-but-teachers-arent-going-anywhere-5845/</link>
		
		<dc:creator><![CDATA[Jelena Matecic]]></dc:creator>
		<pubDate>Fri, 29 Aug 2025 11:10:21 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Productivity]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Learning Assistant]]></category>
		<category><![CDATA[LLM]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=5845</guid>

					<description><![CDATA[<p>Textbooks are great, but let’s be honest - sometimes students need a study buddy with a sense of humor and a knack for explaining photosynthesis. This is where our AI assistant enters the scene.</p>
<p>The post <a href="https://shiftmag.dev/ai-assistant-enters-the-classroom-but-teachers-arent-going-anywhere-5845/">How We Built an AI Learning Assistant &#8211; Approved by Teachers</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Textbooks are full of rich knowledge, but let&#8217;s face it: <strong>students often miss the good stuff</strong>. Important facts get buried in small print, side notes, or skipped pages, and curiosity can fade fast.</p>



<p class="wp-block-paragraph">That got us thinking: <em>what if an AI assistant could make learning more interactive, personal, and fun?</em></p>



<p class="wp-block-paragraph">In this article, I’ll share how the AI Base Engineering team at Infobip &#8211; of which I’m a member &#8211; built a <strong>prototype AI tutor for biology</strong>. I’ll explain why we chose this subject, how we tested it, and the key lessons we learned along the way.</p>



<p class="wp-block-paragraph">Spoiler: it’s not about replacing teachers &#8211; it’s about helping students learn in new and meaningful ways.</p>



<h2 class="wp-block-heading"><span id="so%e2%80%a6-why-an-ai-study-buddy">So… Why an AI study buddy?</span></h2>



<p class="wp-block-paragraph">Our challenge was simple but ambitious: <strong>make textbook content more accessible, engaging, and curiosity-driven</strong>.</p>



<p class="wp-block-paragraph">Biology was the perfect testing ground: it’s well-structured, widely taught, and available in digital form. For our prototype, we used official Croatian school textbooks, spanning 7th grade through the second year of high school.</p>



<p class="wp-block-paragraph">The goal? <strong>To support every kind of learner </strong>&#8211; those falling behind, those racing ahead, and everyone in between. The AI assistant acts like a responsive study buddy: highlighting overlooked facts, answering questions from verified sources, and adapting explanations to each student’s level of understanding.</p>



<p class="wp-block-paragraph">And just to be clear: this was never about replacing teachers. Our vision is to help students learn and engage more deeply, while keeping teachers central to the process.</p>



<h2 class="wp-block-heading"><span id="our-ai-tutor-explains-not-just-defines">Our AI tutor explains, not just defines</span></h2>



<p class="wp-block-paragraph">To build a reliable assistant, we grounded everything in the curriculum. Using trusted digital textbook content, we crafted precise prompts to guide the assistant toward clarity, simplicity, and curiosity-driven learning.</p>



<p class="wp-block-paragraph">One of our favorite tactics? <strong>The assistant doesn&#8217;t just dump definitions</strong>. Instead, it might explain photosynthesis like this:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Plants are like little factories; sunlight turns water and air into sugar. Do you want to know more about how that happens?</p>
</blockquote>



<p class="wp-block-paragraph">We also trained the assistant to <strong>ask questions back</strong>, a Socratic approach that encourages critical thinking. It doesn&#8217;t just answer; it engages.</p>



<h2 class="wp-block-heading"><span id="we-taught-our-ai-to-talk-the-talk">We taught our AI to talk the talk</span></h2>



<p class="wp-block-paragraph">Designing the assistant&#8217;s tone in Croatian was no small feat. The language includes formality distinctions and gendered grammar, so we had to strike a delicate balance: <strong>friendly, but not too casual; professional, but not robotic</strong>.</p>



<p class="wp-block-paragraph">We also taught it to respond to tricky situations &#8211; from inappropriate language to sensitive topics like human reproduction &#8211; with calm professionalism and respect. When students pushed boundaries, the assistant didn’t scold; it simply guided them back toward curious, respectful inquiry.</p>



<p class="wp-block-paragraph">To meet students where they already are, <strong>we brought the assistant to WhatsApp</strong>. And with Infobip’s <a href="https://www.infobip.com/voice" target="_blank" rel="noreferrer noopener">Voice API</a>, they can ask questions or get answers as voice messages. The result? A judgment-free, always-available biology buddy &#8211; just a tap (or a voice note) away.</p>



<h2 class="wp-block-heading"><span id="when-ai-gets-creative-and-sometimes-wrong">When AI gets creative (and sometimes WRONG)</span></h2>



<p class="wp-block-paragraph">Let’s address the elephant in the room: the hallucinations.</p>



<p class="wp-block-paragraph">Like all LLMs, ours <strong>sometimes got a bit too creative</strong>. Ask for an example? It might cheerfully invent one from thin air. Say hi? You could end up with a TED Talk on evolution. Ask who&#8217;s stronger, a lion or a wolf? You might get a philosophical journey through mammal diets, fur types, and migration patterns.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">These hallucinations were part of the process &#8211; and even charming at times &#8211; but accuracy is essential in education. We improved prompts and curated the assistant&#8217;s knowledge base more tightly to fix this. Hallucinations might not disappear entirely, but we learned how to keep the assistant on track.</p>
</blockquote>



<p class="wp-block-paragraph">After all, when a student asks about mitosis, they shouldn’t end up hearing about whales.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="614" src="https://shiftmag.dev/wp-content/uploads/2025/08/AI.Assistant2-1024x614.png?x94846" alt="" class="wp-image-5854" srcset="https://shiftmag.dev/wp-content/uploads/2025/08/AI.Assistant2-1024x614.png 1024w, https://shiftmag.dev/wp-content/uploads/2025/08/AI.Assistant2-300x180.png 300w, https://shiftmag.dev/wp-content/uploads/2025/08/AI.Assistant2-768x461.png 768w, https://shiftmag.dev/wp-content/uploads/2025/08/AI.Assistant2.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading"><span id="test-drive-phase-by-phase-first-staff-then-students">Test drive, phase by phase: first staff, then students!</span></h2>



<h3 class="wp-block-heading"><span id="phase-1-internal-pilot">Phase 1: Internal Pilot</span></h3>



<p class="wp-block-paragraph">Our first testers were <strong>internal education and tech staff</strong>. They knew what to look for and how to break things. Their feedback helped iron out glitches and set a strong foundation.</p>



<h3 class="wp-block-heading"><span id="phase-2-teacher-feedback">Phase 2: Teacher Feedback</span></h3>



<p class="wp-block-paragraph">Next, we brought in <strong>real teachers</strong>. They tested the assistant against real student questions. Could it explain clearly? Did it stay age-appropriate? Was it pedagogically sound?</p>



<p class="wp-block-paragraph">The feedback surprised us in a good way. Teachers appreciated the assistant&#8217;s thoroughness. When students asked if they could use the assistant during tests, it responded with integrity:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">That wouldn&#8217;t be correct. But I can help you prepare by giving you 10 questions and evaluating your answers.</p>
</blockquote>



<p class="wp-block-paragraph">Not hardcoded, just good training.</p>



<h3 class="wp-block-heading"><span id="phase-3-student-trials">Phase 3: Student Trials</span></h3>



<p class="wp-block-paragraph">Finally, <strong>students used the assistant in a classroom setting</strong>. They used it like a study buddy, asking it to quiz them or explain tricky terms. The results? Excited engagement.</p>



<p class="wp-block-paragraph">They loved the follow-up questions that kept the conversation going. They liked the longer answers.</p>



<p class="wp-block-paragraph">The only complaint? <strong>Voice messages sounded robotic</strong>! And yes, the assistant sometimes read formatting symbols out loud (literally saying &#8220;star&#8221; instead of treating the asterisks as bold formatting).</p>
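<p class="wp-block-paragraph">One way to avoid spoken formatting symbols, sketched under the assumption that the model&#8217;s replies contain Markdown-style markers such as asterisks, is to strip those markers before the text reaches the speech engine. The function name <code>strip_markdown_for_speech</code> is hypothetical, and this is an illustration rather than the prototype&#8217;s actual code.</p>

```python
import re


def strip_markdown_for_speech(text: str) -> str:
    """Remove common Markdown markers so a speech engine does not read them aloud."""
    # Heading and bullet prefixes at the start of a line ("# ", "- ", "* ", "+ ").
    text = re.sub(r"^[ \t]*(#{1,6}|[-*+])[ \t]+", "", text, flags=re.MULTILINE)
    # Paired emphasis/code markers: **bold**, __bold__, *italic*, `code`.
    text = re.sub(r"(\*\*|__|\*|`)(.+?)\1", r"\2", text)
    # Collapse runs of spaces or tabs left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


if __name__ == "__main__":
    reply = "**Mitosis** is how a cell divides.\n- It has *four* main phases."
    print(strip_markdown_for_speech(reply))
```

<p class="wp-block-paragraph">Single underscores are deliberately left alone so that identifiers or names containing them are not mangled; a production version would likely need more cases, such as links and tables.</p>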



<h2 class="wp-block-heading"><span id="how-ai-can-help-students-learn-and-engage">How AI Can Help Students Learn and Engage</span></h2>



<p class="wp-block-paragraph">Here&#8217;s what we saw, again and again. AI can help students learn and engage by providing:</p>



<ul class="wp-block-list">
<li><strong>Instant help</strong> &#8211; Students can ask questions privately, anytime, without fear of judgment.</li>
<li><strong>Personalized explanations</strong> &#8211; If one metaphor doesn&#8217;t work, the assistant tries another.</li>
<li><strong>Active learning</strong> &#8211; With questions like &#8220;Can you think of household acids?&#8221;, the assistant nudges students to connect concepts to real life.</li>
<li><strong>A safe space</strong> &#8211; For shy students, the AI is a no-pressure place to be curious.</li>
</ul>



<p class="wp-block-paragraph">Notably, the assistant always encouraged students to verify with their teacher and the textbook. Teachers remain the core of the classroom, and the assistant is just that, an assistant.</p>



<h2 class="wp-block-heading"><span id="lessons-learned-and-the-road-ahead">Lessons Learned and the Road Ahead</span></h2>



<p class="wp-block-paragraph">This project started with a simple goal: help students get unstuck. Along the way, it became a deeper exploration of <strong>what AI can do in education</strong>. What we discovered is this: with careful design and clear boundaries, AI can enhance learning and engagement &#8211; complementing, not replacing, human teaching.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Success comes down to the details: prompt phrasing, tone, voice, UX, and content quality. Teacher and student feedback proved invaluable, showing how much students respond when learning feels personal, responsive, and judgment-free.</p>
</blockquote>



<p class="wp-block-paragraph">Next steps? We’ll improve voice UX, expand to new subjects, and keep gathering feedback to make the experience even better.</p>



<p class="wp-block-paragraph">And to educators and tech innovators alike: <strong>building an AI assistant isn’t just a coding exercise</strong>; it’s a collaborative effort between tech and teaching. Done right, it becomes more than a tool &#8211; it becomes a trusted companion in the learning journey.</p>



<p class="wp-block-paragraph">And if one more student walks away thinking, &#8220;Hey, biology is kinda cool,&#8221; then we know we&#8217;ve done something right.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>OpenAI Drops GPT-OSS, But Can It Reclaim the Open LLM Crown?</title>
		<link>https://shiftmag.dev/openai-drops-gpt-oss-but-can-it-reclaim-the-open-llm-crown-5807/</link>
		
		<dc:creator><![CDATA[Senko Rasic]]></dc:creator>
		<pubDate>Mon, 11 Aug 2025 12:08:00 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Chat GPT]]></category>
		<category><![CDATA[DeepSeek]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[OpenAI]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=5807</guid>

					<description><![CDATA[<p>The battle for the best open LLM just got a new challenger - from OpenAI itself.</p>
<p>The post <a href="https://shiftmag.dev/openai-drops-gpt-oss-but-can-it-reclaim-the-open-llm-crown-5807/">OpenAI Drops GPT-OSS, But Can It Reclaim the Open LLM Crown?</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2025/08/open-ai-release-1.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2025/08/open-ai-release-1.png 1200w, https://shiftmag.dev/wp-content/uploads/2025/08/open-ai-release-1-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2025/08/open-ai-release-1-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2025/08/open-ai-release-1-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">A few days ago, <strong>OpenAI released GPT-OSS</strong>, its first open-weights language model since GPT-2 in 2019, in an attempt to take the state-of-the-art crown for open LLMs from its Chinese competitors.</p>



<p class="wp-block-paragraph">You can be excused if that sentence makes you dizzy.</p>



<h2 class="wp-block-heading"><span id="previously-in-the-world-of-llms%e2%80%a6">Previously, in the World of LLMs…</span></h2>



<p class="wp-block-paragraph">OpenAI, the American company behind ChatGPT, was created as a non-profit research lab back in 2015. While it initially published its <a href="https://arxiv.org/abs/2203.02155" target="_blank" rel="noreferrer noopener">research</a> and models openly (<a href="https://github.com/openai/gpt-2" target="_blank" rel="noreferrer noopener">GPT-2</a>, <a href="https://github.com/openai/whisper" target="_blank" rel="noreferrer noopener">Whisper</a>), after striking gold with ChatGPT, <strong>OpenAI stopped publishing its models</strong>, citing safety reasons.</p>



<p class="wp-block-paragraph">The situation changed with an accidental <a href="https://www.deeplearning.ai/the-batch/how-metas-llama-nlp-model-leaked/" target="_blank" rel="noreferrer noopener">leak of the Llama</a> model by Meta (Facebook&#8217;s parent company). Although it was less capable than OpenAI&#8217;s closed models, it was <strong>miles ahead of GPT-2</strong> and the smaller, less-capable open models published by various university labs. Llama unleashed a storm of open-source activity, both in infrastructure (how to run the models) and in research (fine-tuning and customizing the models).</p>



<p class="wp-block-paragraph">To its credit, <strong>Meta encouraged this adoption</strong> instead of trying to stifle it and published later models under an explicit open license.</p>



<p class="wp-block-paragraph">Open models continued to be an interesting side story until January 2025, when a Chinese company called DeepSeek stunned everyone by releasing <a href="https://www.linkedin.com/posts/senkorasic_deepseek-the-quiet-giant-leading-chinas-activity-7280539081342672896-anbR" target="_blank" rel="noreferrer noopener">DeepSeek R1</a>, a competitive open model trained at a fraction of the cost of comparable models from US AI companies. Other Chinese labs followed with similar, also open, models, including Alibaba&#8217;s Qwen and Moonshot AI&#8217;s Kimi.</p>



<p class="wp-block-paragraph">The <a href="https://greylock.com/greymatter/the-deepseek-moment/" target="_blank" rel="noreferrer noopener">DeepSeek moment</a> stunned American AI companies. The quick pace of Chinese AI progress and the massive uptake, due to the models being open, led some to worry that <a href="https://www.deeplearning.ai/the-batch/issue-312/" target="_blank" rel="noreferrer noopener">China is about to surpass the US in AI technology</a> and to argue that US companies should release open models of their own. The recently published <a href="https://www.ai.gov/action-plan" target="_blank" rel="noreferrer noopener">US government AI Action Plan</a> also aims to &#8220;encourage open-source and open-weights AI.&#8221;</p>



<p class="wp-block-paragraph">This brings us to last week, when OpenAI released a long-promised open-weights model of its own, <a href="https://openai.com/index/introducing-gpt-oss/" target="_blank" rel="noreferrer noopener">GPT-OSS</a>. While it is not on par with the best OpenAI, Anthropic, or Google models, its release acknowledges that <strong>open models are here to stay</strong>.</p>



<h2 class="wp-block-heading"><span id="not-so-open-source-after-all">Not so open-source after all</span></h2>



<p class="wp-block-paragraph">What makes a large language model (LLM) open, and why should we care?</p>



<p class="wp-block-paragraph">In contrast to closed models (like GPT, Claude, and Gemini), which can only be used via an official API, <strong>anyone can run open models on their own infrastructure or on third-party infrastructure providers</strong>. The architecture of open models can be analyzed, and researchers from other AI labs can learn from their design choices.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">To draw a parallel with open-source software, an <em>open-source model</em> would publish the training and inference source code (the &#8220;engine&#8221;), the model weights (the result of training, akin to compiled code for desktop or mobile apps), and the <em>training data</em> (the source data the model was trained on) under an open and permissive license, like MIT or Apache.</p>
</blockquote>



<p class="wp-block-paragraph">The source code is the least controversial part: <strong>there are many high-quality open-source LLM tools that support a wide variety of models</strong>, like <a href="https://github.com/ggml-org/llama.cpp" target="_blank" rel="noreferrer noopener">llama.cpp</a>, <a href="http://vllm.ai/" target="_blank" rel="noreferrer noopener">vLLM</a>, <a href="https://huggingface.co/" target="_blank" rel="noreferrer noopener">Hugging Face</a>, and <a href="https://lmstudio.ai/" target="_blank" rel="noreferrer noopener">LM Studio</a>. Support for new open models is usually added within days of their publication.</p>



<p class="wp-block-paragraph">The situation for model weights is a bit trickier. Many labs publish these under licenses that <strong>limit usage by potential competitors</strong>, restrict certain uses, or even ban usage in certain parts of the world. <a href="https://www.linkedin.com/pulse/llama-4-deception-how-meta-hijacked-open-source-label-dion-wiggins-mjjrc" target="_blank" rel="noreferrer noopener">Meta has notoriously claimed its Llama models are &#8220;open-source&#8221; while using such a restrictive license</a>. <a href="https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL" target="_blank" rel="noreferrer noopener">DeepSeek</a> and <a href="https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT" target="_blank" rel="noreferrer noopener">Qwen</a> also add restrictions to their model weights licenses, while OpenAI used the open-source Apache 2.0 license for the <a href="https://openai.com/index/introducing-gpt-oss/" target="_blank" rel="noreferrer noopener">GPT-OSS model weights</a>.</p>



<p class="wp-block-paragraph">Allowing the use of model weights and source code under a permissive license is sufficient for most users, but it doesn&#8217;t go far enough: <strong>you can&#8217;t retrain the model from scratch if you don&#8217;t also have the training data</strong>. The problem here is that the training data for all top models almost certainly contains copyrighted material that may have been illegally obtained and used.</p>



<p class="wp-block-paragraph">There are a number of ongoing court cases in the US to test this, such as those against <a href="https://www.bbc.com/news/articles/c77vr00enzyo">Anthropic</a> and <a href="https://www.bbc.com/news/articles/c77vr00enzyo" target="_blank" rel="noreferrer noopener">Meta</a>, which were partially won by the AI labs. However, the matter is far from settled, and it&#8217;s much safer for any company to <em>not</em> disclose the full dataset used in training, even for open models.</p>



<p class="wp-block-paragraph">This leads us to the distinction between &#8220;open-weights&#8221; (you can use and customize the LLM) and &#8220;open-source&#8221; (you have access to all the source data and can retrain from scratch) models. </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Since very few organizations have the massive computing infrastructure required to train a big model from scratch, the main benefit of having all the source data is being able to inspect how the model was trained and how that affected its performance.</p>
</blockquote>



<p class="wp-block-paragraph">In practice, for most users, the additional restrictions attached to model weights (like Meta&#8217;s &#8220;you can&#8217;t use Llama 4 in the EU&#8221;) are much more problematic.</p>



<h2 class="wp-block-heading"><span id="open-llms-are-now-powerful-accessible-and-adaptable-tools">Open LLMs are now powerful, accessible, and adaptable tools</span></h2>



<p class="wp-block-paragraph">Open LLMs have <strong>historically been less capable</strong> than the best models from OpenAI, Anthropic, and Google, and they also have big hardware requirements. Why would these models be anything more than a geek&#8217;s curiosity?</p>



<p class="wp-block-paragraph">Start with capability. Since the DeepSeek moment, <strong>open models have come very close to the best ones</strong>. There&#8217;s still a gap, but it&#8217;s a much smaller one, and for many tasks &#8211; especially ones that don&#8217;t require state-of-the-art tech &#8211; open models can perform adequately.</p>



<p class="wp-block-paragraph">The hardware capabilities of modern computers are also constantly improving. Macs, with their unified memory (where the GPU has access to all the computer&#8217;s RAM), are ideally suited to running models that require dozens or hundreds of GB of memory. With ongoing improvements in LLM architecture, training, and hardware, you can now run an LLM on your phone (Qwen3 4B) that&#8217;s more powerful than the original ChatGPT!</p>



<p class="wp-block-paragraph">Moreover, there is a <strong>healthy industry of third-party inference providers</strong>, such as <a href="https://groq.com/" target="_blank" rel="noreferrer noopener">Groq</a> and <a href="https://www.cerebras.ai/" target="_blank" rel="noreferrer noopener">Cerebras</a> (which have their own custom chips), OpenRouter, TogetherAI, Replicate, and so on.</p>



<p class="wp-block-paragraph">Running an LLM locally also avoids dependence on another company that could easily revoke your usage for commercial or geopolitical reasons and avoids transferring potentially sensitive data to third parties. A recent court ruling that forced OpenAI to <a href="https://arstechnica.com/tech-policy/2025/06/openai-confronts-user-panic-over-court-ordered-retention-of-chatgpt-logs/" target="_blank" rel="noreferrer noopener">keep all ChatGPT chat data</a> was a stark reminder that these are not theoretical risks.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Finally, open models can be adapted (by fine-tuning or otherwise) for a specific purpose that wasn&#8217;t considered by the original authors. This is much cheaper than training a model from scratch and allows for powerful customization for a specific need.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="are-open-llms-just-hype">Are open LLMs just hype?</span></h2>



<p class="wp-block-paragraph">Is the future of LLMs open? Their recent gains in capability and popularity <strong>might be just temporary</strong>. OpenAI&#8217;s GPT-OSS is less capable than other, private OpenAI models. Meta plans to be &#8220;<a href="https://www.meta.com/superintelligence/" target="_blank" rel="noreferrer noopener">more rigorous with what they open-source</a>,&#8221; citing the same safety reasons OpenAI uses. The Chinese labs may decide to stop publishing theirs.</p>



<p class="wp-block-paragraph">On the other hand, with so many different companies involved in cutting-edge AI research and many opening at least some of their models, there is <strong>plenty of fertile ground for further innovation</strong> and already a lot of open models to choose from. Competition is good!</p>
<p>The post <a href="https://shiftmag.dev/openai-drops-gpt-oss-but-can-it-reclaim-the-open-llm-crown-5807/">OpenAI Drops GPT-OSS, But Can It Reclaim the Open LLM Crown?</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What is Agency, Why AI Agents Lack It, and Why You Should Hire for It</title>
		<link>https://shiftmag.dev/what-is-agency-why-ai-agents-lack-it-5162/</link>
		
		<dc:creator><![CDATA[Rino Čala]]></dc:creator>
		<pubDate>Wed, 23 Apr 2025 12:59:18 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[agentic AI]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI agents]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[LLM]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=5162</guid>

					<description><![CDATA[<p>AI is helping write code and automate tasks, but until it masters true agency (think independent decision-making and goal-setting) it’s still more like a coding assistant than a full-fledged teammate.</p>
<p>The post <a href="https://shiftmag.dev/what-is-agency-why-ai-agents-lack-it-5162/">What is Agency, Why AI Agents Lack It, and Why You Should Hire for It</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2025/04/AI-agency.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2025/04/AI-agency.png 1200w, https://shiftmag.dev/wp-content/uploads/2025/04/AI-agency-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2025/04/AI-agency-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2025/04/AI-agency-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<p class="wp-block-paragraph">Ever since ChatGPT came out, <strong>the whole AI industry has jumped headfirst into LLMs</strong>.</p>



<p class="wp-block-paragraph">Back in the day, early models like GPT-3 were like sentence finishers on autopilot. Handy, but not exactly mind readers. Modern LLMs, though? They&#8217;re built for instructions. <strong>You say the thing, they do the thing</strong>.</p>



<p class="wp-block-paragraph">Need an email, report, or essay? Done. With the right plugins, they can now search docs, generate images, and even poke around your desktop like a helpful little robot assistant.</p>



<p class="wp-block-paragraph">And one area where they also shine? <strong>Writing code</strong>.</p>



<h2 class="wp-block-heading"><span id="copilots-teammates-and-the-road-to-autonomy">Copilots, teammates, and the road to autonomy</span></h2>



<p class="wp-block-paragraph">During pretraining, LLMs are fed enormous amounts of code, allowing them to learn syntax and best practices for producing useful, working code.</p>



<p class="wp-block-paragraph">To assess the real-world usefulness of LLMs on coding tasks (and their potential economic impact), the <a href="https://openai.com/index/swe-lancer/" target="_blank" rel="noreferrer noopener">SWE-Lancer benchmark</a> was introduced this year. It features over <strong>1,400 freelance software engineering tasks sourced from Upwork</strong>, representing a total of $1 million in actual payouts. On this benchmark, the Claude 3.5 Sonnet model managed to &#8220;earn&#8221; $400,000 worth of tasks.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">This kind of performance isn’t just theoretical &#8211; LLMs are already making their way into everyday development. Today, software engineers are using LLMs through tools like <a href="https://shiftmag.dev/build-a-more-accurate-copilot-with-fewer-hallucinations-3256/" target="_blank" rel="noreferrer noopener">Copilots</a> to assist with writing, reviewing, and understanding code.</p>
</blockquote>



<p class="wp-block-paragraph"><strong>Copilots have access to the entire codebase</strong>, allowing them to provide valuable insights and intelligent code completions to engineers.</p>



<p class="wp-block-paragraph">Now that LLMs have shown they’re pretty good at coding, the bar has been raised. Enter <a href="https://shiftmag.dev/meet-devin-the-ai-software-engineer-2949/" target="_blank" rel="noreferrer noopener">Devin AI</a> &#8211; a product of Cognition, a company on a mission to build an AI teammate that doesn’t just help write code, but <strong>does the whole software engineering gig</strong>. We’re talking writing code, fixing its own bugs, Googling docs like a pro, and even testing the app it just built. It’s basically trying to be that one super-productive teammate who never takes coffee breaks.</p>



<p class="wp-block-paragraph">And Devin’s not alone &#8211; these big dreams are starting to catch on with industry leaders everywhere. CEO of Anthropic, Dario Amodei, says AI will write all code for software engineers within a year. Meta CEO Mark Zuckerberg claims <strong>AI will replace mid-level engineers</strong>.</p>



<p class="wp-block-paragraph">These ambitions have not gone unnoticed, and software engineers are beginning to wonder when they will be replaced.</p>



<p class="wp-block-paragraph"><strong>But that day isn’t here yet</strong>. The reason?</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">AI agents still lack some essential qualities that make software engineers &#8211; and people in general &#8211; truly capable, like real agency. Turns out, there’s more to being a good engineer than just writing code.</p>
</blockquote>



<h2 class="wp-block-heading"><span id="what-is-agency">What is agency?</span></h2>



<p class="wp-block-paragraph">Agency is typically defined as the <strong>ability of an individual to make meaningful choices and act on them</strong> in ways that influence their life and environment.</p>



<p class="wp-block-paragraph">Key ingredients of agency? Autonomy, intentionality, capability, and a sprinkle of responsibility!</p>



<p class="wp-block-paragraph">Individuals with high agency are <strong>intrinsically motivated</strong>. They believe they have the capability to take proactive action toward their goals and feel responsible for their success. They don’t rely on outside input or instructions &#8211; they find their own path.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="568" height="1024" src="https://shiftmag.dev/wp-content/uploads/2025/04/a1--568x1024.png?x94846" alt="" class="wp-image-5187" srcset="https://shiftmag.dev/wp-content/uploads/2025/04/a1--568x1024.png 568w, https://shiftmag.dev/wp-content/uploads/2025/04/a1--167x300.png 167w, https://shiftmag.dev/wp-content/uploads/2025/04/a1--768x1384.png 768w, https://shiftmag.dev/wp-content/uploads/2025/04/a1--scaled.png 1166w" sizes="auto, (max-width: 568px) 100vw, 568px" /></figure>



<p class="wp-block-paragraph">On the contrary, individuals with low agency tend to be more passive, <strong>relying on constant external stimuli to take action</strong>. For them, life feels more shaped by fate and luck than by their own decisions.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">Software engineers are selected for their technical skills, soft skills, and cultural fit, but proactiveness and autonomy &#8211; agency &#8211; are just as important. Engineers with high agency are goal-driven, problem solvers who add great value. They stay ahead of tech trends and often shape, rather than just fit into, company culture.</p>
</blockquote>



<p class="wp-block-paragraph">Software engineers are expected to have agency to do their jobs well. So, if AI is going to take over, it better have some agency, too! Let’s see if it has what it takes.</p>



<h2 class="wp-block-heading"><span id="ai-that-thinks-before-it-speaks">AI that thinks before it speaks</span></h2>



<p class="wp-block-paragraph">Most LLMs today are <strong>instruction-based</strong>. You can ask them a question, and they will provide a detailed answer.</p>



<p class="wp-block-paragraph">The first well-known example of this type of LLM was the GPT-3.5-turbo model, more famously known as ChatGPT. Over time, these models <strong>have improved significantly at answering questions</strong>.</p>



<p class="wp-block-paragraph">Today, some of the most capable instruction-based LLMs include GPT-4.5 from OpenAI, Gemini 2.5 Pro from Google, Grok 3 from xAI, and DeepSeek V3 from DeepSeek.</p>



<p class="wp-block-paragraph">These LLMs are good at answering questions, but for harder problems that require multiple steps of reasoning, they are used with <strong>Chain-of-Thought (CoT) prompting</strong>.</p>



<p class="wp-block-paragraph">With CoT prompting, we ask the LLM to think step by step when solving a problem, which encourages gradual progress toward the answer and boosts performance on more difficult problems. Instead of answering in one go, <strong>the LLM breaks its answer into multiple steps</strong>, making it less likely to overlook something and more likely to arrive at a correct answer.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">As this technique has proven beneficial, the industry has also come up with new types of models &#8211; <strong>reasoning models</strong>. These LLMs are natively trained to think before providing a final answer to the user. Once given a problem, the model begins thinking out loud about its reasoning process and, after arriving at a conclusion, presents the final answer to the user. </p>
</blockquote>



<p class="wp-block-paragraph">Examples of these models include o1 from OpenAI, DeepSeek R1 from DeepSeek, and Gemini 2.0 Flash Thinking from Google.</p>



<h2 class="wp-block-heading"><span id="say-hello-to-agent">Say hello to Agent</span></h2>



<p class="wp-block-paragraph">So, CoT prompting and reasoning have enabled LLMs to solve complex problems, but in order for them <strong>to take actions or observe results to solve broader issues</strong>, like checking the weather in your town or placing an order, we need to give them tools.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">A system that can take actions to solve a user&#8217;s problem is considered an <strong>Agent</strong>.</p>
</blockquote>



<p class="wp-block-paragraph">To help LLMs become AI agents, the <a href="https://www.promptingguide.ai/techniques/react" target="_blank">ReAct</a> pattern was introduced along with tools, giving them the ability to think and act more dynamically.</p>



<p class="wp-block-paragraph">LLMs are instructed to think in cycles of <strong>Thought</strong>, <strong>Action</strong>, and <strong>Observation</strong>. When given a problem, the LLM first reasons about what to do (Thought), then outputs an instruction (Action) that an external program can interpret and execute.</p>



<p class="wp-block-paragraph">For example, the action might be an API call or a simple calculation. Once the action is carried out, the result is returned to the LLM (Observation), which it uses to decide on the next step (another Thought). This cycle repeats until the AI agent completes the task.</p>
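


<p class="wp-block-paragraph">The Thought, Action, Observation cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: <code>scripted_llm</code> is a hypothetical stand-in for a real model call, <code>calculator</code> is a toy tool, and the <code>Action: tool[input]</code> text format is one possible convention rather than a fixed standard.</p>

```python
import operator
import re
from typing import Callable, Dict

# A tool takes a text input and returns a text observation.
Tool = Callable[[str], str]


def calculator(expression: str) -> str:
    """Toy tool (hypothetical): evaluates 'a op b', e.g. '19 * 3'."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    return str(ops[op](float(left), float(right)))


TOOLS: Dict[str, Tool] = {"calculator": calculator}

# Assumed text convention for this sketch: "Action: tool_name[tool input]".
ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_PATTERN = re.compile(r"Final Answer:\s*(.*)")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real LLM call; replies are hard-coded for the demo."""
    if "Observation:" not in transcript:
        return ("Thought: I need to multiply the numbers.\n"
                "Action: calculator[19 * 3]")
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"


def run_agent(question: str, llm: Tool, max_steps: int = 5) -> str:
    """Repeat Thought -> Action -> Observation until a final answer appears."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)              # Thought (and maybe an Action)
        transcript += "\n" + reply
        final = FINAL_PATTERN.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_PATTERN.search(reply)
        if action is None:
            return "No action or final answer produced."
        tool_name, tool_input = action.groups()
        tool = TOOLS.get(tool_name)
        # The external program executes the Action and reports back.
        observation = tool(tool_input) if tool else f"Unknown tool: {tool_name}"
        transcript += f"\nObservation: {observation}"
    return "Stopped: step limit reached."


if __name__ == "__main__":
    print(run_agent("What is 19 * 3?", scripted_llm))
```

<p class="wp-block-paragraph">The step limit is a deliberate design choice in this sketch: an agent that never produces a final answer should stop rather than loop forever.</p>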



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">AI agents have been gaining popularity lately, and much of the industry is racing to build practical, helpful versions. One of the most talked-about types is the <strong>desktop-controlling agent</strong>. Given a task (say, booking a flight to Spain), these agents can perform real actions on your desktop that lead to actual results, like a confirmed ticket.</p>
</blockquote>



<p class="wp-block-paragraph">Notable examples include <strong>Operator</strong> from OpenAI and Computer Use from Anthropic.</p>



<p class="wp-block-paragraph">If AI agents can now perform actions on behalf of software engineers &#8211; and even see their desktops &#8211; what’s stopping them from fully replacing engineers in everyday tasks? <strong>The main limitation is agency</strong>.</p>



<h2 class="wp-block-heading"><span id="what%e2%80%99s-stopping-ai-from-being-like-humans">What’s stopping AI from being like humans?</span></h2>



<p class="wp-block-paragraph">AI agents still haven&#8217;t reached the level of agency that human software engineers possess. They continue to lack several key qualities that define true agency in individuals:</p>



<ul class="wp-block-list">
<li><strong>Full autonomy</strong> – AI agents need external instructions and inputs to complete a task. They aren’t yet capable of discovering value on their own or pursuing goals without being explicitly told to. To reach true autonomy, they&#8217;d need the ability to initiate meaningful action independently.</li>



<li><strong>Sensing the full environment</strong> – They still lack the ability to perceive the world like humans do. Full agency would require access to all human senses and the ability to act on them &#8211; through speech, physical actions, and even emotional understanding.</li>



<li><strong>Intentionality</strong> – AI agents only begin acting when a user prompts them. We haven’t yet discovered a way to give them a built-in value system that would guide them toward universal goals and push them to act on their own initiative.</li>



<li><strong>Capability</strong> – To be fully capable, AI agents would need more than just data &#8211; they’d need the full range of human senses and the power to interact with the world in complex ways.</li>



<li><strong>Responsibility</strong> – AI agents can correct their mistakes and even apologize, but they don’t truly carry the weight of responsibility. They still require humans to guide, initiate, and finalize their tasks.</li>
</ul>



<h2 class="wp-block-heading"><span id="agency-agi">Agency = AGI?</span></h2>



<p class="wp-block-paragraph">Until AI agents fully develop the capabilities tied to human agency, they won’t replace software engineers. Instead, they’ll remain invaluable tools, not complete teammates.</p>



<p class="wp-block-paragraph">Although AI agents have made great progress in automating tasks &#8211; clicking buttons, booking tickets, and more &#8211; they still lack true agency. They follow instructions effectively, but <strong>they’re not yet capable of independently setting goals, adapting to new situations, or thinking on their feet</strong>.</p>



<p class="wp-block-paragraph">Real agency would be more than just an upgrade &#8211; it would signal a leap toward Artificial General Intelligence (AGI), and we&#8217;re not quite there&#8230; yet.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Tejas Kumar: The future of AI isn’t LLMs, but affordable small language models</title>
		<link>https://shiftmag.dev/tejas-kumar-the-future-of-ai-isnt-llms-but-affordable-small-language-models-4318/</link>
		
		<dc:creator><![CDATA[Marin Pavelić]]></dc:creator>
		<pubDate>Tue, 08 Oct 2024 13:02:38 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Shift Conference]]></category>
		<category><![CDATA[Shift Zadar 2024]]></category>
		<category><![CDATA[Tejas Kumar]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=4318</guid>

					<description><![CDATA[<p>Are you tired of the AI hype? Let’s see what it can really do.</p>
<p>The post <a href="https://shiftmag.dev/tejas-kumar-the-future-of-ai-isnt-llms-but-affordable-small-language-models-4318/">Tejas Kumar: The future of AI isn’t LLMs, but affordable small language models</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="800" height="480" src="https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202469.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202469.png 800w, https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202469-300x180.png 300w, https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202469-768x461.png 768w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>


<p class="has-text-align-left wp-block-paragraph"><strong>Tejas Kumar</strong>, an AI DevRel Engineer at DataStax, took the stage at the Infobip Shift conference with a no-hype, straight-to-the-point talk on AI.</p>



<p class="has-text-align-left wp-block-paragraph">He broke down what AI engineering looks like today, sharing techniques for cutting costs, avoiding hallucinations, and what’s going to be key for building the next wave of AI systems.</p>



<h1 class="wp-block-heading"><span id="rag-solves-the-top-3-ai-limitations">RAG solves the top 3 AI limitations</span></h1>



<p class="wp-block-paragraph">The main limitations developers face today when working with AI are <strong>hallucinations, knowledge cutoffs, and finite context windows</strong>. Tejas believes that these three &#8220;flies&#8221; can be swatted in one strike using a technique called <strong>Retrieval-Augmented Generation (RAG)</strong>, which combines pre-trained language models with a real-time data retrieval system:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>With RAG, you fetch data from an authoritative source and use it to enhance or alter the generated text from an LLM. This data reaches the LLM through prompt engineering.</em></p>
</blockquote>



<p class="wp-block-paragraph">Tejas demonstrated how RAG works with a simple example that took just a few clicks: <strong>he inputs a webpage into an embedding model, which then numerically encodes the data.</strong></p>



<p class="wp-block-paragraph">The user&#8217;s question is encoded the same way, and a similarity search then <strong>pulls relevant information from the database to answer it.</strong> Because responses are grounded in the most up-to-date information, this process greatly reduces the hallucinations common in LLMs like GPT.</p>
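


<p class="wp-block-paragraph">As a rough illustration of the retrieval step, here is a small Python sketch. It is not the tooling shown in the talk: the <code>embed</code> function is a toy word-count stand-in for a real embedding model, the documents are made up, and the function names are assumptions chosen for this example. The retrieved text reaches the LLM through the prompt, as the quote above describes.</p>

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """Place the retrieved text in the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


# Made-up documents standing in for an authoritative data source.
DOCUMENTS = [
    "Our support desk is open from 9 to 17 on weekdays.",
    "Refunds are processed within 14 days of a return.",
    "The mobile app supports dark mode since version 2.",
]

if __name__ == "__main__":
    question = "When is the support desk open?"
    print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

<p class="wp-block-paragraph">In a production setup, the word counts would be replaced by dense vectors from an embedding model and the sorted list by a vector database query, but the flow stays the same: encode, search, then prompt.</p>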



<h1 class="wp-block-heading">Chatbots are boring &#8211; AI should feel real</h1>



<p class="wp-block-paragraph">AI chatbots are everywhere today, but Tejas believes they&#8217;re mostly boring. <strong>They serve a purpose, but that purpose is very narrowly defined.</strong> That&#8217;s why Tejas offers an example of how a chatbot can be used more broadly &#8211; like searching Netflix.</p>



<p class="wp-block-paragraph">Tejas entered &#8220;movies with a strong female lead&#8221; into Netflix&#8217;s search system, which traditionally might return incorrect or no results. However, if a search system uses semantic AI in the background &#8211; understanding the meaning of the user&#8217;s query rather than just keywords &#8211; <strong>the user experience can be significantly enhanced:</strong></p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>With semantic search, we improve search results and generate interactive user interfaces that understand user intent on demand.</em></p>
</blockquote>



<p class="wp-block-paragraph">Tejas illustrated how DataStax developed a tool for semantic search that not only delivers accurate results for such queries <strong>but can also generate an interactive user interface (UI) on demand</strong>. This means that when a user types &#8220;movies with a strong female lead,&#8221; Netflix could present relevant movie posters and trailers. This kind of interactive UI represents the future of AI, where developers can use tools like Langflow to integrate AI into applications without disrupting the user experience, Tejas emphasized:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph">As developers, we have a responsibility to our users. We must build AI experiences beyond simple chatbots and deliver real, purposeful interactions.</p>
</blockquote>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="800" height="480" src="https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202470-1.png?x94846" alt="" class="wp-image-4330" srcset="https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202470-1.png 800w, https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202470-1-300x180.png 300w, https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202470-1-768x461.png 768w" sizes="auto, (max-width: 800px) 100vw, 800px" /><figcaption class="wp-element-caption">Filip Popović/Infobip Shift</figcaption></figure>



<h1 class="wp-block-heading"><span id="ssms-instead-of-llms">SSMs instead of LLMs?</span></h1>



<p class="wp-block-paragraph">Looking ahead, Tejas sees a <strong>shift from general-purpose LLMs to small specialized models (SSMs)</strong>, which is his (unofficial) term for AI systems tailored to specific tasks:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>What if, instead of models like GPT-4 with 600 billion parameters, we had a smaller model with 7 billion specialized parameters? That&#8217;s the future, and that&#8217;s where we should invest.</em></p>
</blockquote>



<p class="wp-block-paragraph">Tejas believes companies will turn to <strong>smaller models focused on individual needs</strong>. That way developers will drastically cut costs while maintaining good product performance.</p>



<h1 class="wp-block-heading"><span id="building-responsible-ai-must-come-first">Building responsible AI must come first</span></h1>



<p class="wp-block-paragraph">AI must be developed ethically, and one of the key things to watch out for is what Tejas calls &#8220;authority bias&#8221; &#8211; where <strong>users assume that results generated by AI are always correct</strong> simply because they come from an authoritative-sounding source:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>We need to be transparent about the data used to train LLMs. AI should be able to say, &#8220;Hey, this data might be wrong.&#8221;</em></p>
</blockquote>



<p class="wp-block-paragraph">The future of AI is in creating tools that allow models to recognize the limits of their capabilities. When AI can&#8217;t provide an answer, <strong>it should be able to use external tools or APIs</strong> to retrieve the necessary information to ensure accuracy.</p>



<p class="wp-block-paragraph">In conclusion, Tejas encourages developers to think beyond simple chatbots because he believes the future of AI is tied to <strong>combining the power of LLMs with specialized models and dynamic interfaces that enhance user experiences.</strong></p>



<h1 class="wp-block-heading">AI won&#8217;t replace developers, but some skills will disappear</h1>



<p class="wp-block-paragraph">Tejas also spoke on the panel &#8220;AI-Powered Development Tools: Enhancing or Replacing Human Developers?&#8221;, where he was joined by <strong>Simi Olabisi</strong>, an AI expert from Microsoft. The discussion, held on the ShiftMag stage, was moderated by our executive editor <strong>Antonija Bilić Arar</strong>.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="800" height="480" src="https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202468-1.png?x94846" alt="" class="wp-image-4324" srcset="https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202468-1.png 800w, https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202468-1-300x180.png 300w, https://shiftmag.dev/wp-content/uploads/2024/09/Tejas-Kumar-shift-202468-1-768x461.png 768w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>



<p class="wp-block-paragraph">AI will do the opposite of what people expect. <strong>It won&#8217;t replace developers</strong>; it will make them better at their job, says Simi:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>The tools we&#8217;re building at Microsoft are designed to handle repetitive tasks, allowing developers to focus on more complex and creative activities.</em></p>
</blockquote>



<p class="wp-block-paragraph">This brings us to the question of juniors and how they will learn. <strong>Simi believes they won&#8217;t need to spend time mastering basic tasks:</strong></p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p class="wp-block-paragraph"><em>Just like floppy disks became obsolete, some fundamental skills may become less important to master, but that doesn&#8217;t mean they&#8217;ll skip important lessons. They&#8217;ll face challenging tasks early in their careers.</em></p>
</blockquote>



<p class="wp-block-paragraph"><strong>Think about this as an evolution from a paintbrush to a camera. </strong>Tejas pointed out that basic tools of human creativity are still necessary to solve 70 to 80% of coding tasks, but human oversight and creativity remain essential. Simi concluded that we&#8217;re not facing any dramatic change within the next five years. Tools will advance, and AI will continue to enhance our abilities, <strong>but developers remain a key part of the entire process.</strong></p>
<p>The post <a href="https://shiftmag.dev/tejas-kumar-the-future-of-ai-isnt-llms-but-affordable-small-language-models-4318/">Tejas Kumar: The future of AI isn’t LLMs, but affordable small language models</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Engineer Explains: What are LLMs in less than 5 minutes</title>
		<link>https://shiftmag.dev/engineer-explains-what-are-llms-in-less-than-5-minutes-3908/</link>
		
		<dc:creator><![CDATA[Antonija Bilic Arar]]></dc:creator>
		<pubDate>Thu, 08 Aug 2024 10:35:03 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Video]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[Emanuel Lacic]]></category>
		<category><![CDATA[Engineer Explains]]></category>
		<category><![CDATA[LLM]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=3908</guid>

					<description><![CDATA[<p>We've asked experienced engineers to share how they would explain some tech terminology at three levels of experience - from junior developer to CTO.</p>
<p>The post <a href="https://shiftmag.dev/engineer-explains-what-are-llms-in-less-than-5-minutes-3908/">Engineer Explains: What are LLMs in less than 5 minutes</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1280" height="720" src="https://shiftmag.dev/wp-content/uploads/2024/07/emanuel_final.jpg?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2024/07/emanuel_final.jpg 1280w, https://shiftmag.dev/wp-content/uploads/2024/07/emanuel_final-300x169.jpg 300w, https://shiftmag.dev/wp-content/uploads/2024/07/emanuel_final-1024x576.jpg 1024w, https://shiftmag.dev/wp-content/uploads/2024/07/emanuel_final-768x432.jpg 768w" sizes="auto, (max-width: 1280px) 100vw, 1280px" /></figure>


<p class="wp-block-paragraph">We all know about Large Language Models, but do we know what they actually are?<br><br>LLMs are highly sophisticated <strong>deep-learning models</strong> trained on vast amounts of data so they can predict the next word in a sequence. They can process and generate human language.<br><br>But that’s just the surface-level explanation. We&#8217;ve asked <strong>Emanuel Lacic</strong>, senior researcher and principal engineer at Infobip, to explain LLMs as he would to a junior engineer, a senior engineer, and a CTO.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Large Language Models – Explained" width="500" height="281" src="https://www.youtube.com/embed/_E1eJ5riYTY?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>



<p class="wp-block-paragraph">This video is a part of ShiftMag’s <strong>video series, <a href="https://www.youtube.com/@ShiftMag/videos" target="_blank" rel="noreferrer noopener">Engineer Explains</a>.</strong></p>



<p class="wp-block-paragraph">We’ve asked experienced engineers to share how they would explain some basic and some less basic tech terminology to different tech job titles or at three levels of experience — <strong>from junior developer to CTO.</strong><br><br><strong>More:</strong><br>How would you explain <a href="https://www.youtube.com/watch?v=qtxHm09FH_M" target="_blank" rel="noreferrer noopener">APIs</a>, <a href="https://www.youtube.com/watch?v=Rxi3fHEY48c" target="_blank" rel="noreferrer noopener">internal developer platforms</a>, <a href="https://www.youtube.com/watch?v=BqsTQWhyngg&amp;t=9s" target="_blank" rel="noreferrer noopener">software architecture</a>, <a href="https://www.youtube.com/watch?v=5aRuyTIoMys">software testing</a>, <a href="https://www.youtube.com/watch?v=s_Igmd5GpDg&amp;t=5s">scaling infrastructure </a>without breaking the bank,  <a href="https://www.youtube.com/watch?v=VhhkK0zCY7I&amp;t=42s">low-code as a dev tool</a>, <a href="https://www.youtube.com/watch?v=WgysaqzYMU0&amp;t=21s">what is a database</a>, <a href="https://www.youtube.com/watch?v=v2-wsawNurI" target="_blank" rel="noreferrer noopener">Network APIs</a>, <a href="https://www.youtube.com/watch?v=1tqWJwZQnkM">Developer Relations</a> or <a href="https://www.youtube.com/watch?v=z5f4eTaKu04&amp;t=207s">observability </a>at three levels of experience?</p>
<p>The post <a href="https://shiftmag.dev/engineer-explains-what-are-llms-in-less-than-5-minutes-3908/">Engineer Explains: What are LLMs in less than 5 minutes</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Hallucinations, prompts, cost, and other challenges we faced when creating an LLM-powered chatbot</title>
		<link>https://shiftmag.dev/creating-an-llm-powered-chatbot-1728/</link>
		
		<dc:creator><![CDATA[ShiftMag]]></dc:creator>
		<pubDate>Tue, 24 Oct 2023 09:07:12 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[chatbot]]></category>
		<category><![CDATA[large language models]]></category>
		<category><![CDATA[LLM]]></category>
		<guid isPermaLink="false">https://shiftmag.dev/?p=1728</guid>

					<description><![CDATA[<p>The motivation behind the project was to enable clients to easily create their own business chatbot powered by an LLM.</p>
<p>The post <a href="https://shiftmag.dev/creating-an-llm-powered-chatbot-1728/">Hallucinations, prompts, cost, and other challenges we faced when creating an LLM-powered chatbot</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="1200" height="630" src="https://shiftmag.dev/wp-content/uploads/2023/10/LLM-ChatGPT-1.png?x94846" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" style="object-fit:cover;" srcset="https://shiftmag.dev/wp-content/uploads/2023/10/LLM-ChatGPT-1.png 1200w, https://shiftmag.dev/wp-content/uploads/2023/10/LLM-ChatGPT-1-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2023/10/LLM-ChatGPT-1-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2023/10/LLM-ChatGPT-1-768x403.png 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure>


<h2 class="wp-block-heading"><span id="motivation"><strong>Motivation</strong></span></h2>



<p class="wp-block-paragraph">With the arrival of ChatGPT and other LLMs (Large Language Models), chatbots have experienced a revolution. Talking to a chatbot has become much <strong>more like talking to a human</strong>. The motivation behind the project was to enable clients to easily <strong>create their own business chatbot powered by an LLM</strong>.</p>



<p class="wp-block-paragraph">These chatbots should:</p>



<ul class="wp-block-list">
<li>Help end users solve their problems faster.</li>



<li>Represent the brand’s values.</li>



<li>Intelligently transfer to human agents when needed.</li>
</ul>



<p class="wp-block-paragraph">Today, we enable our clients to build LLM chatbots in just 5 minutes through <a href="https://www.infobip.com/answers" target="_blank" rel="noreferrer noopener">Infobip&#8217;s Answers platform</a>.</p>



<h2 class="wp-block-heading"><span id="challenges"><strong>Challenges</strong></span></h2>



<p class="wp-block-paragraph">Mainstream adoption of LLMs is growing, but it is still recent and comes with some new challenges.</p>



<p class="wp-block-paragraph"><strong>Missing data</strong></p>



<p class="wp-block-paragraph">LLMs such as ChatGPT were reportedly trained on data up to September 2021. Clients may care about more recent data and about data specific to their business.</p>



<p class="wp-block-paragraph">How do we give the chatbot knowledge it was not trained on?</p>



<p class="wp-block-paragraph">One technique that has proven effective is <strong><em>in-context</em> learning</strong>.</p>



<p class="wp-block-paragraph">Given a user question, we retrieve a few small, <em>relevant</em> document chunks and provide them to the LLM chatbot. The chatbot should generate a response using the information in these chunks.</p>
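<p class="wp-block-paragraph">A minimal sketch of how such a prompt could be assembled. The function name <code>build_prompt</code>, the instruction wording, and the sample chunks are illustrative assumptions, not the prompt used in production.</p>

```python
def build_prompt(question, chunks):
    """Assemble a prompt that asks the model to answer only from the given chunks."""
    # Number each chunk so the model (and a human reviewer) can refer to it.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical chunks standing in for a client's documentation.
chunks = [
    "Support is available on weekdays from 08:00 to 16:00.",
    "Claims can be filed through the mobile app.",
]
prompt = build_prompt("When can I reach support?", chunks)
print(prompt)
```

<p class="wp-block-paragraph">The resulting string would be sent to the LLM as a single request. The model itself is not retrained; the new knowledge lives only in the prompt.</p>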



<h2 class="wp-block-heading"><span id="hallucinations"><strong>Hallucinations</strong></span></h2>



<p class="wp-block-paragraph">ChatGPT is an autoregressive <em>probabilistic</em> model.</p>



<p class="wp-block-paragraph">Given an input, ChatGPT predicts the token (word piece) that should come next, feeds it back with the input, and repeats the process until the <em>end</em> token is predicted. Since it is a probabilistic model, it may output something that is misleading or incorrect.</p>
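<p class="wp-block-paragraph">The loop described above can be sketched with a toy model. Everything here is hypothetical: the probability table is made up, and unlike a real LLM it looks only at the last token rather than the whole input. The point is the shape of the loop: sample a token, append it, repeat until the end token.</p>

```python
import random

END = "[END]"

# Hypothetical toy "model": next-token probabilities keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "[START]": {"the": 1.0},
    "the": {"policy": 0.6, "claim": 0.4},
    "policy": {"covers": 0.7, END: 0.3},
    "claim": {"covers": 0.5, END: 0.5},
    "covers": {"theft": 0.5, "damage": 0.5},
    "theft": {END: 1.0},
    "damage": {END: 1.0},
}


def generate(rng, max_tokens=10):
    """Sample one token at a time, feeding each token back in, until END."""
    tokens = ["[START]"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == END:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate(rng)))
```

<p class="wp-block-paragraph">Because each step is a random draw, the same input can produce different outputs, including fluent sentences that happen to be wrong.</p>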



<p class="wp-block-paragraph">While working with the digital insurance company LAQO, we had to keep an eye on this. In their industry, it is important to <strong>keep responses as accurate as possible</strong>, as they may have legal consequences.</p>



<p class="wp-block-paragraph">An example of hallucination could be a chatbot telling the user that LAQO covers some costs which are, in fact, not covered by the insurance type.</p>



<p class="wp-block-paragraph"><strong>There is currently no solution for hallucinations;</strong> even the most powerful models like GPT-4 hallucinate.</p>



<p class="wp-block-paragraph">Some things that could be done:</p>



<ul class="wp-block-list">
<li>Model parameters like <strong>temperature can be adjusted</strong> to lessen the chance of hallucination.</li>



<li><strong>Well-crafted prompts</strong> with instructions specific to the client’s business tend to reduce hallucinations. We can usually do better than the generic prompts that ship with popular LLM frameworks.</li>



<li><strong>The chatbot can be constrained</strong> to respond only to topics covered by the retrieved data chunks.</li>
</ul>
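<p class="wp-block-paragraph">To illustrate the temperature parameter, here is a small sketch that assumes the common formulation in which token scores (logits) are divided by the temperature before a softmax. The scores are made up, and hosted models may apply additional sampling settings.</p>

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if not temperature > 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

<p class="wp-block-paragraph">At a low temperature almost all probability goes to the top-scoring token, so the output becomes more predictable. That lowers the chance of an unlikely continuation but does not guarantee a correct one.</p>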



<h2 class="wp-block-heading"><span id="prompts"><strong>Prompts</strong></span></h2>



<p class="wp-block-paragraph">Writing prompts or, in other words, instructions for LLMs is tricky. Having a good understanding of how an LLM works helps a lot. Being patient and willing to experiment with different ways of stating an instruction may help even more.</p>



<p class="wp-block-paragraph"><strong>Different LLMs have different quirks</strong>. ChatGPT and similar LLMs are trained to follow instructions, but they are not perfect. Using simple and clear instructions helps.</p>



<h2 class="wp-block-heading"><span id="response-format"><strong>Response format</strong></span></h2>



<p class="wp-block-paragraph">Sometimes, an LLM may produce a response in an undesired format. It may even “leak” details of the prompt or system message, which are meant to be “internal” instructions. Instruction following is something that is constantly being improved in LLMs.</p>
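<p class="wp-block-paragraph">One way to guard against both problems is to check every response before showing it to the user. The sketch below assumes the chatbot was asked to reply as a JSON object with an <code>answer</code> key; that format, the function name <code>check_response</code>, and the crude leak check are all illustrative assumptions.</p>

```python
import json


def check_response(raw, system_prompt, required_keys=("answer",)):
    """Return (ok, reason). Rejects non-JSON output, missing keys, and prompt leaks."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "not a JSON object"
    missing = [k for k in required_keys if k not in data]
    if missing:
        return False, f"missing keys: {missing}"
    # Crude leak check: any full line of the system prompt echoed back verbatim.
    lines = [ln.strip() for ln in system_prompt.splitlines() if ln.strip()]
    if any(ln.lower() in raw.lower() for ln in lines):
        return False, "possible system prompt leak"
    return True, "ok"


system_prompt = "You are a support assistant.\nNever mention competitor products."
print(check_response('{"answer": "We are open 08:00-16:00."}', system_prompt))
print(check_response("Sure! Here is the answer...", system_prompt))
print(check_response('{"answer": "My rules: Never mention competitor products."}', system_prompt))
```

<p class="wp-block-paragraph">A rejected response could be retried or replaced with a fallback message, at the price of an extra LLM call.</p>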



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://shiftmag.dev/wp-content/uploads/2023/10/machine-learning-1024x538.png?x94846" alt="" class="wp-image-1731" srcset="https://shiftmag.dev/wp-content/uploads/2023/10/machine-learning-1024x538.png 1024w, https://shiftmag.dev/wp-content/uploads/2023/10/machine-learning-300x158.png 300w, https://shiftmag.dev/wp-content/uploads/2023/10/machine-learning-768x403.png 768w, https://shiftmag.dev/wp-content/uploads/2023/10/machine-learning.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading"><span id="context-window"><strong>Context window</strong></span></h2>



<p class="wp-block-paragraph">Current LLMs typically have a <em>small</em> context window, which is the number of tokens we can feed into the LLM plus the tokens it can generate in a single request. Splitting the work across multiple LLM requests is one strategy for dealing with a limited context window, but it comes with <strong>increased costs and latency.</strong></p>
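<p class="wp-block-paragraph">A rough sketch of budgeting a context window. It assumes the window is shared between the input and the generated answer, and it counts whitespace-separated words as a stand-in for a real tokenizer; the function names and numbers are hypothetical.</p>

```python
def count_tokens(text):
    """Very rough stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(chunks, max_context_tokens, reserved_for_answer):
    """Keep chunks, in order, while they fit in the budget left after reserving answer space."""
    budget = max_context_tokens - reserved_for_answer
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept, used


chunks = ["one two three", "four five", "six seven eight nine"]
kept, used = fit_to_budget(chunks, max_context_tokens=10, reserved_for_answer=4)
print(kept, used)
```

<p class="wp-block-paragraph">Chunks that do not fit would have to be dropped or handled in a further request.</p>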



<h2 class="wp-block-heading"><span id="retrieval"><strong>Retrieval</strong></span></h2>



<p class="wp-block-paragraph">Retrieval is about finding a small number (3-4) of data chunks relevant to the question. Many libraries and databases can be used for this.</p>



<p class="wp-block-paragraph">It becomes tricky to find relevant chunks when there is a chat history.</p>



<p class="wp-block-paragraph">The retrieval system should consider not only the follow-up question but also the previous questions. For smaller documentation, exact search (slower, more accurate) makes more sense than approximate search (faster, less accurate).</p>
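<p class="wp-block-paragraph">A minimal sketch of history-aware exact search. It scores every chunk against the follow-up question combined with earlier questions. The bag-of-words vectors are a toy substitute for a learned embedding model, and all names and sample texts are hypothetical.</p>

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, history, chunks, k=3):
    """Exact search: score every chunk against the follow-up plus earlier questions."""
    query = embed(" ".join(history + [question]))
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "travel insurance covers lost luggage",
    "car insurance covers theft and collision",
    "support is available on weekdays",
]
history = ["what does travel insurance cover"]
print(retrieve("and is luggage included", history, chunks, k=1))
```

<p class="wp-block-paragraph">On its own, the follow-up question says nothing about travel insurance. Adding the earlier question to the query supplies that missing context.</p>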



<h2 class="wp-block-heading"><span id="cost"><strong>Cost</strong></span></h2>



<p class="wp-block-paragraph"><strong>LLMs are a costly business.</strong> Even open-source solutions may require a lot of expensive hardware to operate at scale.</p>



<h2 class="wp-block-heading"><span id="latency"><strong>Latency</strong></span></h2>



<p class="wp-block-paragraph">Making multiple calls to an LLM will make end-users wait longer for a response, which may impact customer satisfaction.</p>



<h2 class="wp-block-heading"><span id="lora"><strong>LoRA</strong></span></h2>



<p class="wp-block-paragraph">In-context learning is a great technique, but we are also experimenting with “fine-tuning” open-source models with proprietary data.</p>



<p class="wp-block-paragraph">LoRA and many variants of this technique make the fine-tuning process much more accessible in terms of costs and time. So far, we have had great results fine-tuning image generation models (Stable Diffusion). We were able to teach the Diffusion model to <strong>generate images about items specific to brands</strong>.</p>
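<p class="wp-block-paragraph">To give a feel for why LoRA is cheaper: it keeps the original weight matrix frozen and trains only two small low-rank matrices whose product is added to it. The arithmetic below uses a hypothetical layer size and rank, not figures from our experiments.</p>

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 8  # hypothetical layer size and rank
full = d_out * d_in  # parameters updated by full fine-tuning of this layer
lora = lora_trainable_params(d_out, d_in, rank)
print(full, lora, f"{lora / full:.2%}")
```

<p class="wp-block-paragraph">For this example layer, LoRA trains well under 1% of the parameters that full fine-tuning would update.</p>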



<h2 class="wp-block-heading"><span id="next-steps"><strong>Next steps</strong></span></h2>



<p class="wp-block-paragraph">We&#8217;re working on a multimodal, multi-personality chatbot agent that will also be able to handle transactions. More details to come.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="520" src="https://shiftmag.dev/wp-content/uploads/2023/10/image-9-1024x520.png?x94846" alt="" class="wp-image-1864" srcset="https://shiftmag.dev/wp-content/uploads/2023/10/image-9-1024x520.png 1024w, https://shiftmag.dev/wp-content/uploads/2023/10/image-9-300x152.png 300w, https://shiftmag.dev/wp-content/uploads/2023/10/image-9-768x390.png 768w, https://shiftmag.dev/wp-content/uploads/2023/10/image-9.png 1348w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph"><em>This article was written by <strong>Danijel Temraz</strong>, Principal Engineer at Infobip, and <strong>Martina Ćurić</strong>, Staff Engineer at Infobip.</em></p>
<p>The post <a href="https://shiftmag.dev/creating-an-llm-powered-chatbot-1728/">Hallucinations, prompts, cost, and other challenges we faced when creating an LLM-powered chatbot</a> appeared first on <a href="https://shiftmag.dev">ShiftMag</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
