LLM Archives - ShiftMag

Forget the Model, It’s Workflows That Make LLM Products Run

Marko Crnjanski — Thu, 05 Feb 2026 14:18:39 +0000

From his experience leading AI product teams, Andrew Mende (Senior Product Manager, Machine Learning at Booking.com) explained what it truly takes to ship LLM-based products in production.

Making AI products reliable requires new workflows

For Mende, the buzz around AI signals a rare shift, like the rise of smartphones. But what does it mean for product teams?

This moment unlocks new ways of solving customer problems that were previously impossible due to technical constraints.

He was clear: traditional product management approaches often fail with AI-driven products.

LLM-based systems behave differently, demand new workflows, and bring new types of risk.

Unlike deterministic software, LLMs are probabilistic (identical inputs can produce different outputs), making experimentation easy but production readiness challenging, and forcing teams to rethink how they test, evaluate, and monitor features.

One of the biggest traps, Mende explained, is confusing a successful prototype with a scalable solution:

It’s easy to paste a prompt into ChatGPT and see results; much harder to make it reliable across thousands of real customer inputs.

Teams need structured datasets, big tables of real customer examples, to track accuracy, spot regressions, and see if changes actually work. Without them, it’s all guesswork.
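As a rough illustration of what such a dataset makes possible, here is a minimal sketch of an evaluation harness. Everything in it is an assumption made for the example: the `Example` record, the two stand-in "prompt versions" (plain keyword rules rather than real LLM calls), and the four invented customer messages. It is not Booking.com's tooling, only the general shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Example:
    customer_input: str  # a customer message (invented for this sketch)
    expected: str        # the label a correct system should return


def accuracy(system: Callable[[str], str], dataset: list[Example]) -> float:
    """Fraction of examples where the system's output matches the expected label."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(system(ex.customer_input) == ex.expected for ex in dataset)
    return hits / len(dataset)


def regressions(old: Callable[[str], str],
                new: Callable[[str], str],
                dataset: list[Example]) -> list[Example]:
    """Examples the old version answered correctly but the new version gets wrong."""
    return [
        ex for ex in dataset
        if old(ex.customer_input) == ex.expected
        and new(ex.customer_input) != ex.expected
    ]


# Hypothetical stand-ins for two prompt versions. In a real system each
# would call an LLM; keyword rules keep the sketch runnable offline.
def prompt_v1(text: str) -> str:
    return "refund" if "refund" in text.lower() else "other"


def prompt_v2(text: str) -> str:
    return "refund" if "money back" in text.lower() else "other"


DATASET = [
    Example("I want a refund for my booking", "refund"),
    Example("Can I get my money back?", "refund"),
    Example("What time is check-in?", "other"),
    Example("Please refund me, the room was dirty", "refund"),
]

if __name__ == "__main__":
    print("v1 accuracy:", accuracy(prompt_v1, DATASET))
    print("v2 accuracy:", accuracy(prompt_v2, DATASET))
    for ex in regressions(prompt_v1, prompt_v2, DATASET):
        print("regressed:", ex.customer_input)
```

The point of the second function is that an aggregate score alone can hide which specific inputs a change broke; listing them turns "it feels worse" into something a team can inspect.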

Focus on accuracy, cost, and speed

Mende’s practical approach to model selection focuses on accuracy, cost, and latency: start with the most capable model to see if the problem can be solved, then move to smaller or faster models to optimize performance.

This requires testing multiple configurations (context size, prompts, and parameters) since even small changes affect results. Beyond the model, context selection, prompt instructions, and external tools are critical:

For example, when a customer asks about a specific order, the system should fetch real-time data instead of relying on static knowledge. This combination of LLMs and tools turns simple prompts into full systems, but also increases complexity and maintenance costs.
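The selection approach described in this section (prove the problem is solvable with the most capable model, then step down) could be sketched roughly as follows. The `EvalResult` fields, the configuration names, and all the numbers are invented for illustration; in practice they would come from running each configuration against an evaluation dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalResult:
    config: str           # hypothetical label for a model + prompt + parameter combination
    accuracy: float       # fraction correct on the evaluation dataset
    cost_per_1k: float    # cost of 1,000 requests, in dollars
    p95_latency_s: float  # 95th-percentile response time, in seconds


def pick_config(results: list[EvalResult],
                min_accuracy: float,
                max_latency_s: float) -> Optional[EvalResult]:
    """Return the cheapest configuration that clears both bars, or None."""
    viable = [
        r for r in results
        if r.accuracy >= min_accuracy and r.p95_latency_s <= max_latency_s
    ]
    return min(viable, key=lambda r: r.cost_per_1k, default=None)


# Invented measurements, for illustration only.
RESULTS = [
    EvalResult("large-model/prompt-a", accuracy=0.94, cost_per_1k=30.0, p95_latency_s=4.2),
    EvalResult("medium-model/prompt-a", accuracy=0.91, cost_per_1k=6.0, p95_latency_s=1.8),
    EvalResult("small-model/prompt-a", accuracy=0.78, cost_per_1k=1.0, p95_latency_s=0.6),
]

if __name__ == "__main__":
    choice = pick_config(RESULTS, min_accuracy=0.90, max_latency_s=3.0)
    print(choice.config if choice else "no configuration meets the bar")
```

Returning `None` when nothing qualifies is deliberate: it forces the team to decide whether to relax a threshold or keep iterating, rather than silently shipping the least bad option.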

LLMs can transform how users interact – if teams build the right infrastructure

Mende concluded his How to Web lecture by saying LLMs shine by transforming user interaction: for the first time, digital products can understand plain language, turning customer requests directly into actions.

This shift brings digital experiences closer to human conversations and enables new product patterns that were out of reach just a few years ago.

The challenge now, Mende explained, is not whether LLMs work, but whether teams are willing to build the evaluation, monitoring, and infrastructure required to make them truly useful.

The post Forget the Model, It’s Workflows That Make LLM Products Run appeared first on ShiftMag.

Shift is Coming to Asia – and We’re Giving Away 10 Tickets!

ShiftMag — Fri, 17 Oct 2025 14:43:35 +0000

Infobip Shift, one of Europe’s leading developer conferences, is making its Asian debut on November 4th, 2025!

In partnership with Cradle, Malaysia’s startup ecosystem builder, we’re bringing developers, founders, and innovators from across ASEAN together in Kuala Lumpur – building connections, sparking ideas, and shaping the future of software.

“By bringing Shift to Malaysia, we’re giving Southeast Asian developers the chance to learn, connect, and join the same conversations shaping tech in Europe and the US,” says Stipe Cigic, Head of the Infobip Shift team:

The city has a fast-growing, diverse tech community, full of talent, curiosity, and energy – but without many events that truly bring developers together in one place.

Everything is AI. Are you ready?

Whether you’re working with AI at scale or exploring it for the first time, Shift KL brings together developers, researchers, and founders to share knowledge and insights.

From copilots and LLMs to agentic workflows and ethical challenges, AI is reshaping how software is built – and Shift KL is where you can see it in action and discuss what it means for your work.

World-class experts including Tejas Kumar (Developer Advocate, IBM), Dugald Morrow (Principal Developer Advocate, Atlassian), and Joyce Lin (Lead Tech Educator, LMArena) will share their perspectives on the latest AI trends and practical ways to apply them.

For Shift, it’s also a way of staying true to what we’ve always stood for: connecting developers wherever they are, and building a global community that keeps learning and growing together.

Shift KL will also be the third event on a third continent in the same year!

Get your ticket!

Calling all developers, tech aficionados, and industry experts!

The first 10 applicants to complete the form get free tickets, and the next 10 receive a 30% discount for Shift Kuala Lumpur 2025!

How We Built an AI Learning Assistant – Approved by Teachers

Jelena Matecic — Fri, 29 Aug 2025 11:10:21 +0000

Textbooks are full of rich knowledge, but let’s face it: students often miss the good stuff. Important facts get buried in small print, side notes, or skipped pages, and curiosity can fade fast.

That got us thinking: what if an AI assistant could make learning more interactive, personal, and fun?

In this article, I’ll share how the AI Base Engineering team at Infobip – of which I’m a member – built a prototype AI tutor for biology. I’ll explain why we chose this subject, how we tested it, and the key lessons we learned along the way.

Spoiler: it’s not about replacing teachers – it’s about helping students learn in new and meaningful ways.

So… Why an AI study buddy?

Our challenge was simple but ambitious: make textbook content more accessible, engaging, and curiosity-driven.

Biology was the perfect testing ground: it’s well-structured, widely taught, and available in digital form. For our prototype, we used official Croatian school textbooks, spanning 7th grade through the second year of high school.

The goal? To support every kind of learner – those falling behind, those racing ahead, and everyone in between. The AI assistant acts like a responsive study buddy: highlighting overlooked facts, answering questions from verified sources, and adapting explanations to each student’s level of understanding.

And just to be clear: this was never about replacing teachers. Our vision is to help students learn and engage more deeply, while keeping teachers central to the process.

Our AI tutor explains, not just defines

To build a reliable assistant, we grounded everything in the curriculum. Using trusted digital textbook content, we crafted precise prompts to guide the assistant toward clarity, simplicity, and curiosity-driven learning.

One of our favorite tactics? The assistant doesn’t just dump definitions. Instead, it might explain photosynthesis like this:

Plants are like little factories; sunlight turns water and air into sugar. Do you want to know more about how that happens?

We also trained the assistant to ask questions back: a Socratic approach that encourages critical thinking. It doesn’t just answer; it engages.

We taught our AI to talk the talk

Designing the assistant’s tone in Croatian was no small feat. The language includes formality distinctions and gendered grammar, so we had to strike a delicate balance: friendly, but not too casual; professional, but not robotic.

We also taught it to respond to tricky situations – from inappropriate language to sensitive topics like human reproduction – with calm professionalism and respect. When students pushed boundaries, the assistant didn’t scold; it simply guided them back toward curious, respectful inquiry.

To meet students where they already are, we brought the assistant to WhatsApp. And with Infobip’s Voice API, they can ask questions or get answers as voice messages. The result? A judgment-free, always-available biology buddy – just a tap (or a voice note) away.

When AI gets creative (and sometimes WRONG)

Let’s address the elephant in the room: the hallucinations.

Like all LLMs, ours sometimes got a bit too creative. Ask for an example? It might cheerfully invent one from thin air. Say hi? You could end up with a TED Talk on evolution. Ask who’s stronger, a lion or a wolf? You might get a philosophical journey through mammal diets, fur types, and migration patterns.

These hallucinations were part of the process – and even charming at times – but accuracy is essential in education. We improved prompts and curated the assistant’s knowledge base more tightly to fix this. Hallucinations might not disappear entirely, but we learned how to keep the assistant on track.

After all, when a student asks about mitosis, they shouldn’t end up hearing about whales.

Test drive, phase by phase: first staff, then students!

Phase 1: Internal Pilot

Our first testers were internal education and tech staff. They knew what to look for and how to break things. Their feedback helped iron out glitches and set a strong foundation.

Phase 2: Teacher Feedback

Next, we brought in real teachers. They tested the assistant against real student questions. Could it explain clearly? Did it stay age-appropriate? Was it pedagogically sound?

The feedback surprised us in a good way. Teachers appreciated the assistant’s thoroughness. When students asked if they could use the assistant during tests, it responded with integrity:

That wouldn’t be correct. But I can help you prepare by giving you 10 questions and evaluating your answers.

Not hardcoded, just good training.

Phase 3: Student Trials

Finally, students used the assistant in a classroom setting. They used it like a study buddy, asking it to quiz them or explain tricky terms. The results? Excited engagement.

They loved the follow-up questions that kept the conversation going. They liked the longer answers.

The only complaint? Voice messages sounded robotic! And yes, it sometimes reads formatting symbols out loud (literally saying “star” instead of bolding).
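One common way to stop a voice assistant from reading formatting symbols aloud is to strip the markup before the text reaches the text-to-speech step. The sketch below is an assumption about how that could look, not a description of what the team did; the function name `to_speakable` is made up, and the regular expressions cover only a small subset of Markdown. Note the simplification: it removes every `*` and `_`, which would also mangle text such as `2 * 3` or `snake_case`.

```python
import re


def to_speakable(text: str) -> str:
    """Remove common Markdown markers so a TTS engine does not read them aloud."""
    # Links: keep the visible label, drop the URL.
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Heading markers at the start of a line.
    text = re.sub(r"^\s{0,3}#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Bullet markers at the start of a line.
    text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)
    # Emphasis and inline-code markers anywhere in the text.
    text = re.sub(r"(\*\*|__|\*|_|`)", "", text)
    # Collapse runs of spaces and tabs left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


if __name__ == "__main__":
    reply = "**Mitosis** has *four* phases:\n- prophase\n- metaphase"
    print(to_speakable(reply))
```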

How AI Can Help Students Learn and Engage

Here’s what we saw, again and again. AI can help students learn and engage by providing:

Instant help – Students can ask questions privately, anytime, without fear of judgment.
Personalized explanations – If one metaphor doesn’t work, the assistant tries another.
Active learning – With questions like “Can you think of household acids?”, the assistant nudges students to connect concepts to real life.
A safe space – For shy students, the AI is a no-pressure place to be curious.

Notably, the assistant always encouraged students to verify with their teacher and the textbook. Teachers remain the core of the classroom, and the assistant is just that, an assistant.

Lessons Learned and the Road Ahead

This project started with a simple goal: help students get unstuck. Along the way, it became a deeper exploration of what AI can do in education. What we discovered is this: with careful design and clear boundaries, AI can enhance learning and engagement – complementing, not replacing, human teaching.

Success comes down to the details: prompt phrasing, tone, voice, UX, and content quality. Teacher and student feedback proved invaluable, showing how much students respond when learning feels personal, responsive, and judgment-free.

Next steps? We’ll improve voice UX, expand to new subjects, and keep gathering feedback to make the experience even better.

And to educators and tech innovators alike: building an AI assistant isn’t just a coding exercise, it’s a collaborative effort between tech and teaching. Done right, it becomes more than a tool – it becomes a trusted companion in the learning journey.

And if one more student walks away thinking, “Hey, biology is kinda cool,” then we know we’ve done something right.

OpenAI Drops GPT-OSS, But Can It Reclaim the Open LLM Crown?

Senko Rasic — Mon, 11 Aug 2025 12:08:00 +0000

A few days ago, OpenAI released GPT-OSS, a new open-weights model (its first open-weights language model since GPT-2 in 2019), in an attempt to take the state-of-the-art crown for open LLMs from its Chinese competitors.

You can be excused if that sentence makes you dizzy.

Previously, in the World of LLMs…

OpenAI, the American company behind ChatGPT, was created as a non-profit research lab back in 2015. While it initially published its research and models openly (GPT-2, Whisper), after striking gold with ChatGPT, OpenAI stopped publishing its models, citing safety reasons.

The situation changed when Llama, a model built by Meta (Facebook’s parent company), was accidentally leaked. Although it was less capable than OpenAI’s closed models, it was miles ahead of GPT-2 and the smaller, less-capable open models published by various university labs. Llama unleashed a storm of open-source activity, both in infrastructure (how to run the models) and in research (fine-tuning and customizing the models).

To its credit, Meta encouraged this adoption instead of trying to stifle it and published later models under an explicit open license.

Open models continued to be an interesting side story until January of this year, when a Chinese company called DeepSeek stunned everyone by releasing DeepSeek R1, a competitive open model trained at a fraction of the cost incurred by US AI companies. Other Chinese labs followed with similar, also open, models, such as Alibaba’s Qwen and Moonshot AI’s Kimi.

The DeepSeek moment stunned American AI companies. The quick pace of Chinese AI progress and the massive uptake, due to the models being open, led some to worry that China is about to surpass the US in AI technology, arguing that US companies should follow suit. The recently published US government AI Action Plan also aims to “encourage open-source and open-weights AI.”

This brings us to last week, when OpenAI released a long-promised open-weights model of its own, GPT-OSS. While it is not on par with the best OpenAI, Anthropic, or Google models, its release acknowledges that open models are here to stay.

Not so open-source after all

What makes a large language model (LLM) open, and why should we care?

In contrast to closed models (like GPT, Claude, and Gemini), which can only be used via an official API, anyone can run open models on their own infrastructure or on third-party infrastructure providers. The architecture of open models can be analyzed, and researchers from other AI labs can learn from their design choices.

To draw a parallel with open-source software, an open-source model would publish the training and inference source code (the “engine”), the model weights (the result of training, akin to compiled code for desktop or mobile apps), and the training data (the source data the model was trained on) under an open and permissive license, like MIT or Apache.

The source code is the least controversial part: there are many high-quality open-source LLM tools that support a wide variety of models, like llama.cpp, vLLM, Hugging Face Transformers, and LM Studio. Support for new open models is usually added within days of their publication.

The situation for model weights is a bit trickier. Many labs publish these under licenses that limit usage for potential competitors, restrict certain uses, or even ban usage in certain parts of the world. Meta has notoriously claimed its Llama models are “open-source” while using such a restrictive license. DeepSeek and Qwen also add restrictions to their model weights licenses, while OpenAI used the open-source Apache 2.0 license for the GPT-OSS model weights.

Allowing the use of model weights and source code under a permissive license is sufficient for most users, but it doesn’t go far enough: you can’t retrain the model from scratch if you don’t also have the training data. The problem here is that the training data for all top models almost certainly contains copyrighted material that may have been illegally obtained and used.

There are a number of ongoing court cases in the US to test this, such as those against Anthropic and Meta, which were partially won by the AI labs. However, the matter is far from settled, and it’s much safer for any company to not disclose the full dataset used in training, even for open models.

This leads us to the distinction between “open-weights” (you can use and customize the LLM) and “open-source” (you have access to all the source data and can retrain from scratch) models.

Since very few organizations have the massive computing infrastructure required to train a big model from scratch, the main sticking point about having all the source data is being able to inspect how the model was trained and how that impacted its performance.

In practice, for most users, the additional restrictions attached to model weights (like Meta’s “you can’t use Llama 4 in the EU”) are much more problematic.

Open LLMs are now powerful, accessible, and adaptable tools

Open LLMs have historically been less capable than the best models from OpenAI, Anthropic, and Google, and they also have big hardware requirements. Why would these models be anything more than a geek’s curiosity?

Start with capability. Since the DeepSeek moment, open models have come very close to the best ones. There’s still a gap, but it’s a much smaller one, and for many tasks – especially ones that don’t require state-of-the-art tech – open models can perform adequately.

The hardware capabilities of modern computers are also constantly improving. Macs, with their unified memory (where the GPU has access to all the computer’s RAM), are ideally suited to running models that require dozens or hundreds of GB of memory. With ongoing improvements in LLM architecture, training, and hardware, you can now run an LLM on your phone (Qwen3 4B) that’s more powerful than the original ChatGPT!
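To get a feel for why some models fit on a phone while others need a machine with a lot of unified memory, a back-of-the-envelope estimate helps: the weights alone take roughly the parameter count times the bits stored per weight, divided by eight to get bytes. The sketch below applies that arithmetic. The function name is made up, the parameter counts are round illustrative figures, and the estimate deliberately ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Round, illustrative sizes: a phone-class model and a workstation-class model,
    # each at 16-bit precision and at 4-bit quantization.
    for params in (4, 70):
        for bits in (16, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B parameters at {bits}-bit: about {gb:.0f} GB")
```

Quantization is the lever here: storing each weight in 4 bits instead of 16 cuts the footprint to a quarter, usually at some cost in output quality.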

Moreover, there is a healthy industry of third-party inference providers, such as Groq and Cerebras (which have their own custom chips), OpenRouter, TogetherAI, Replicate, and so on.

Running an LLM locally also avoids dependence on another company that could easily revoke your usage for commercial or geopolitical reasons and avoids transferring potentially sensitive data to third parties. A recent court ruling that forced OpenAI to keep all ChatGPT chat data was a stark reminder that these are not theoretical risks.

Finally, open models can be adapted (by fine-tuning or otherwise) for a specific purpose that wasn’t considered by the original authors. This is much cheaper than training a model from scratch and allows for powerful customization for a specific need.

Are open LLMs just hype?

Is the future of LLMs open? Their recent gains in capability and popularity might be just temporary. OpenAI’s GPT-OSS is less capable than other, private OpenAI models. Meta plans to be “more rigorous with what they open-source,” citing the same safety reasons OpenAI uses. The Chinese labs may decide to stop publishing theirs.

On the other hand, with so many different companies involved in cutting-edge AI research and many opening at least some of their models, there is plenty of fertile ground for further innovation and already a lot of open models to choose from. Competition is good!

What is Agency, Why AI Agents Lack It, and Why You Should Hire for It

Rino Čala — Wed, 23 Apr 2025 12:59:18 +0000

Ever since ChatGPT came out, the whole AI industry has jumped headfirst into LLMs.

Back in the day, early models like GPT-3 were like sentence finishers on autopilot. Handy, but not exactly mind readers. Modern LLMs, though? They’re built for instructions. You say the thing, they do the thing.

Need an email, report, or essay? Done. With the right plugins, they can now search docs, generate images, and even poke around your desktop like a helpful little robot assistant.

And one area where they also shine? Writing code.

Copilots, teammates, and the road to autonomy

During pretraining, LLMs are fed enormous amounts of code, allowing them to learn syntax and best practices for producing useful, working code.

To assess the real-world usefulness of LLMs on coding tasks (and their potential economic impact), the SWE-Lancer benchmark was introduced this year. It features over 1,400 freelance software engineering tasks sourced from Upwork, representing a total of $1 million in actual payouts. On this benchmark, the Claude 3.5 Sonnet model managed to “earn” $400,000 worth of tasks.

This kind of performance isn’t just theoretical – LLMs are already making their way into everyday development. Today, software engineers use LLMs through copilot tools that assist with writing, reviewing, and understanding code.

Copilots have access to the entire codebase, allowing them to provide valuable insights and intelligent code completions to engineers.

Now that LLMs have shown they’re pretty good at coding, the bar has been raised. Enter Devin, from the company Cognition – an attempt to build an AI teammate that doesn’t just help write code, but does the whole software engineering gig. We’re talking writing code, fixing its own bugs, Googling docs like a pro, and even testing the app it just built. It’s basically trying to be that one super-productive teammate who never takes coffee breaks.

And Devin’s not alone – these big dreams are starting to catch on with industry leaders everywhere. CEO of Anthropic, Dario Amodei, says AI will write all code for software engineers within a year. Meta CEO Mark Zuckerberg claims AI will replace mid-level engineers.

These ambitions have not gone unnoticed, and software engineers are beginning to wonder when they will be replaced.

But that day isn’t here yet. The reason?

AI agents still lack some essential qualities that make software engineers – and people in general – truly capable, like real agency. Turns out, there’s more to being a good engineer than just writing code.

What is agency?

Agency is typically defined as the ability of an individual to make meaningful choices and act on them in ways that influence their life and environment.

Key ingredients of agency? Autonomy, intentionality, capability, and a sprinkle of responsibility!

Individuals with high agency are intrinsically motivated. They believe they have the capability to take proactive action toward their goals and feel responsible for their success. They don’t rely on outside input or instructions – they find their own path.

On the contrary, individuals with low agency tend to be more passive, relying on constant external stimuli to take action. For them, life feels more shaped by fate and luck than by their own decisions.

Software engineers are selected for their technical skills, soft skills, and cultural fit, but proactiveness and autonomy – agency – are just as important. Engineers with high agency are goal-driven problem solvers who add great value. They stay ahead of tech trends and often shape, rather than just fit into, company culture.

Software engineers are expected to have agency to do their jobs well. So, if AI is going to take over, it better have some agency, too! Let’s see if it has what it takes.

AI that thinks before it speaks

Most LLMs today are instruction-based: you ask them a question, and they provide a detailed answer.

The first well-known example of this type of LLM was the GPT-3.5-turbo model, more famously known as ChatGPT. Over time, these models have improved significantly at answering questions.

Today, some of the most capable instruction-based LLMs include GPT-4.5 from OpenAI, Gemini 2.5 Pro from Google, Grok 3 from xAI, and DeepSeek-V3 from DeepSeek.

These LLMs are good at answering questions, but harder problems that require multiple steps of reasoning call for Chain-of-Thought (CoT) prompting.

With CoT prompting, we instruct the LLM to think step-by-step when solving a problem. The model then breaks its answer into multiple steps, which makes it less likely to overlook something and more likely to arrive at a correct answer.
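In practice, CoT prompting can be as small as one extra instruction appended to the question, plus a convention for where the final answer goes. Here is a minimal sketch under those assumptions. The exact wording of `COT_SUFFIX`, the `Answer:` convention, and the sample completion are all invented for illustration; no real model is called.

```python
from typing import Optional

# Hypothetical instruction text; the exact phrasing is a choice, not a standard.
COT_SUFFIX = (
    "Let's think step by step. "
    "Then give the final answer on a new line starting with 'Answer:'."
)


def build_prompt(question: str, use_cot: bool = True) -> str:
    """Return the question as-is, or with the step-by-step instruction appended."""
    return f"{question}\n\n{COT_SUFFIX}" if use_cot else question


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a completion that follows the 'Answer:' convention."""
    for line in reversed(completion.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    print(build_prompt("A train has 3 cars with 4 doors each. How many doors in total?"))

    # A hand-written completion standing in for a model response.
    sample_completion = (
        "There are 3 cars.\n"
        "Each car has 4 doors.\n"
        "3 times 4 is 12.\n"
        "Answer: 12"
    )
    print(extract_answer(sample_completion))
```

Asking for the answer on a marked line keeps the visible reasoning separate from the value the application actually needs to use.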

As this technique has proven beneficial, the industry has also come up with new types of models – reasoning models. These LLMs are natively trained to think before providing a final answer to the user. Once given a problem, the model begins thinking out loud about its reasoning process and, after arriving at a conclusion, presents the final answer to the user.

Examples of these models include o1 from OpenAI, DeepSeek R1 from DeepSeek, and Gemini 2.0 Flash Thinking from Google.

Say hello to Agent

So, CoT prompting and reasoning have enabled LLMs to solve complex problems, but in order for them to take actions or observe results to solve broader issues, like checking the weather in your town or placing an order, we need to give them tools.

A system that can take actions to solve a user’s problem is considered an Agent.

To help LLMs become AI agents, a novel ReAct pattern was introduced along with tools, giving them the ability to think and act more dynamically.

LLMs are instructed to think in cycles of Thought, Action, and Observation. When given a problem, the LLM first reasons about what to do (Thought), then outputs an instruction (Action) that an external program can interpret and execute.

For example, the action might be an API call or a simple calculation. Once the action is carried out, the result is returned to the LLM (Observation), which it uses to decide on the next step (another Thought). This cycle repeats until the AI agent completes the task.
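The Thought, Action, Observation cycle described above can be sketched as a short loop. This is a simplified illustration, not a reference implementation: the `Action: tool[argument]` text format, the `scripted_llm` stand-in (which returns canned text instead of calling a model), and the `weather` tool with its made-up reading are all assumptions chosen to keep the example runnable offline.

```python
import re
from typing import Callable

# Assumed text conventions for this sketch.
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def react_loop(llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]],
               task: str,
               max_steps: int = 5) -> str:
    """Alternate between model output and tool execution until a final answer appears."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)              # Thought plus either an Action or a Final Answer
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, argument = action.groups()
        tool = tools.get(name)
        observation = tool(argument) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"   # fed back to the model next turn
    raise RuntimeError("agent did not finish within max_steps")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: acts once, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I need current weather data.\nAction: weather[Zagreb]"
    return "Thought: I have what I need.\nFinal Answer: It is 21 C and sunny in Zagreb."


# Hypothetical tool returning a canned reading instead of calling a weather API.
TOOLS = {"weather": lambda city: f"{city}: 21 C, sunny"}

if __name__ == "__main__":
    print(react_loop(scripted_llm, TOOLS, "What's the weather in Zagreb?"))
```

The `max_steps` cap matters in real systems: without it, a model that never produces a final answer would keep calling tools indefinitely.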

AI agents have been gaining popularity lately, and much of the industry is racing to build practical, helpful versions. One of the most talked-about types is the desktop-controlling agent. Given a task (say, booking a flight to Spain) these agents can perform real actions on your desktop that lead to actual results, like a confirmed ticket.

Notable examples include Operator from OpenAI and Computer Use from Anthropic.

If AI agents can now perform actions on behalf of software engineers – and even see their desktops – what’s stopping them from fully replacing engineers in everyday tasks? The main limitation is agency.

What’s stopping AI from being like humans?

AI agents still haven’t reached the level of agency that human software engineers possess. They continue to lack several key qualities that define true agency in individuals:

Full autonomy – AI agents need external instructions and inputs to complete a task. They aren’t yet capable of discovering value on their own or pursuing goals without being explicitly told to. To reach true autonomy, they’d need the ability to initiate meaningful action independently.
Sensing the full environment – They still lack the ability to perceive the world like humans do. Full agency would require access to all human senses and the ability to act on them – through speech, physical actions, and even emotional understanding.
Intentionality – AI agents only begin acting when a user prompts them. We haven’t yet discovered a way to give them a built-in value system that would guide them toward universal goals and push them to act on their own initiative.
Capability – To be fully capable, AI agents would need more than just data – they’d need the full range of human senses and the power to interact with the world in complex ways.
Responsibility – AI agents can correct their mistakes and even apologize, but they don’t truly carry the weight of responsibility. They still require humans to guide, initiate, and finalize their tasks.

Agency = AGI?

Until AI agents fully develop the capabilities tied to human agency, they won’t replace software engineers. Instead, they’ll remain invaluable tools, not complete teammates.

Although AI agents have made great progress in automating tasks – clicking buttons, booking tickets, and more – they still lack true agency. They follow instructions effectively, but they’re not yet capable of independently setting goals, adapting to new situations, or thinking on their feet.

Real agency would be more than just an upgrade – it would signal a leap toward Artificial General Intelligence (AGI), and we’re not quite there… yet.

Tejas Kumar: The future of AI isn’t LLMs, but affordable small language models

Marin Pavelić — Tue, 08 Oct 2024 13:02:38 +0000

Tejas Kumar, an AI DevRel Engineer at DataStax, took the stage at the Infobip Shift conference with a no-hype, straight-to-the-point talk on AI.

He broke down what AI engineering looks like today, sharing techniques for cutting costs, avoiding hallucinations, and what’s going to be key for building the next wave of AI systems.

RAG solves the top 3 AI limitations

The main limitations developers face today when working with AI are hallucinations, knowledge cutoffs, and finite context windows. Tejas believes that these three “flies” can be swatted in one strike using a technique called Retrieval-Augmented Generation (RAG), which combines pre-trained language models with a real-time data retrieval system:

With RAG, you fetch data from an authoritative source and use it to enhance or alter the generated text from an LLM. This data reaches the LLM through prompt engineering.

Tejas demonstrated how RAG works with a simple example that took just a few clicks: he fed a webpage into an embedding model, which numerically encodes the data so it can be stored in a database.

When a user asks a question, the system runs a similarity search and pulls the most relevant information from the database into the prompt. This grounds responses in up-to-date information and greatly reduces the hallucinations common in LLMs like GPT.
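The retrieval step described above can be sketched in a few lines. To stay self-contained, this example swaps the learned embedding model for a toy one (word counts compared by cosine similarity), so it matches on shared words rather than on meaning. The three documents, the function names, and the prompt wording are invented for illustration and are not taken from the talk.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Place the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Invented knowledge-base entries.
DOCS = [
    "The return window for orders is 30 days.",
    "Support is available on weekdays from 9 to 17.",
    "Shipping to EU countries takes 3 to 5 business days.",
]

if __name__ == "__main__":
    print(build_prompt("How long is the return window?", DOCS))
```

The resulting prompt is what would be sent to the LLM; because the relevant passage travels with the question, the model does not have to rely on what it memorized during training.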

Chatbots are boring – AI should feel real

AI chatbots are everywhere today, but Tejas believes they’re mostly boring. They serve a purpose, but that purpose is very narrowly defined. That’s why Tejas offers an example of how a chatbot can be used more broadly – like searching Netflix.

Tejas entered “movies with a strong female lead” into Netflix’s search system, which traditionally might return incorrect or no results. However, if a search system uses semantic AI in the background – understanding the meaning of the user’s query rather than just keywords – the user experience can be significantly enhanced:

With semantic search, we improve search results and generate interactive user interfaces that understand user intent on demand.

Tejas illustrated how DataStax developed a tool for semantic search that not only delivers accurate results for such queries but can generate an interactive user interface (UI) on demand. This means that by typing “movies with a strong female lead,” Netflix could present relevant movie posters and trailers. This kind of interactive UI represents the future of AI, where developers can use tools like Langflow to integrate AI into applications without disrupting the user experience, Tejas emphasized:

As developers, we have a responsibility to our users. We must build AI experiences beyond simple chatbots and deliver real, purposeful interactions.

SSMs instead of LLMs?

Looking ahead, Tejas sees a shift from general-purpose LLMs to small specialized models (SSMs), which is his (unofficial) term for AI systems tailored to specific tasks:

What if, instead of models like GPT-4 with 600 billion parameters, we had a smaller model with 7 billion specialized parameters? That’s the future, and that’s where we should invest.

Tejas believes companies will turn to smaller models focused on individual needs. That way developers will drastically cut costs while maintaining good product performance.

Building Responsible AI Must Come First

AI must be developed ethically, and one of the key things to watch out for is what Tejas calls “authority bias” – where users assume that results generated by AI are always correct simply because they come from an authoritative-sounding source:

We need to be transparent about the data used to train LLMs. AI should be able to say, “Hey, this data might be wrong.”

The future of AI is in creating tools that allow models to recognize the limits of their capabilities. When AI can’t provide an answer, it should be able to use external tools or APIs to retrieve the necessary information to ensure accuracy.

In conclusion, Tejas encourages developers to think beyond simple chatbots because he believes the future of AI is tied to combining the power of LLMs with specialized models and dynamic interfaces that enhance user experiences.

AI won’t replace developers, but some skills will disappear

Tejas also spoke on the panel “AI-Powered Development Tools: Enhancing or Replacing Human Developers?” on the ShiftMag stage, where he was joined by Simi Olabisi, an AI expert from Microsoft. The discussion was moderated by our executive editor, Antonija Bilić Arar.

AI will do the opposite of what people expect. It won’t replace developers; it will make them better at their job, says Simi:

The tools we’re building at Microsoft are designed to handle repetitive tasks, allowing developers to focus on more complex and creative activities.

This brings us to the question of juniors and how they will learn. Simi believes they won’t need to spend time mastering basic tasks:

Just like floppy disks became obsolete, some fundamental skills may become less important to master, but that doesn’t mean they’ll skip important lessons. They’ll face challenging tasks early in their careers.

Think about this as an evolution from a paintbrush to a camera. Tejas pointed out that basic tools of human creativity are still necessary to solve 70 to 80% of coding tasks, but human oversight and creativity remain essential. Simi concluded that we’re not facing any dramatic change within the next five years. Tools will advance, and AI will continue to enhance our abilities, but developers remain a key part of the entire process.

The post Tejas Kumar: The future of AI isn’t LLMs, but affordable small language models appeared first on ShiftMag.

Engineer Explains: What are LLMs in less than 5 minutes

Antonija Bilic Arar — Thu, 08 Aug 2024 10:35:03 +0000

We have all heard of Large Language Models (LLMs), but do we actually know what they are?

LLMs are highly sophisticated deep-learning models trained on vast amounts of data so they can predict the next word in a sequence. They can process and generate human language.

But that’s just the surface-level explanation. We’ve asked Emanuel Lacic, senior researcher and principal engineer at Infobip, to explain LLMs as he would to a junior engineer, a senior engineer, and a CTO.

This video is a part of ShiftMag’s video series, Engineer Explains.

We’ve asked experienced engineers to share how they would explain some basic and some less basic tech terminology to different tech job titles or at three levels of experience — from junior developer to CTO.

More:
How would you explain APIs, internal developer platforms, software architecture, software testing, scaling infrastructure without breaking the bank, low-code as a dev tool, what is a database, Network APIs, Developer Relations or observability at three levels of experience?

The post Engineer Explains: What are LLMs in less than 5 minutes appeared first on ShiftMag.

Hallucinations, prompts, cost, and other challenges we faced when creating an LLM-powered chatbot

ShiftMag — Tue, 24 Oct 2023 09:07:12 +0000

Motivation

With the arrival of ChatGPT and other LLMs (Large Language Models), chatbots have experienced a revolution. Talking to a chatbot has become much more like talking to a human. The motivation behind the project was to enable clients to easily create their own business chatbot powered by an LLM.

These chatbots should:

Help end users solve their problems faster.
Represent the brand’s values.
Intelligently transfer to human agents when needed.

Today, we enable our clients to build LLM chatbots in just 5 minutes through Infobip’s Answers platform.

Challenges

Mainstream adoption of LLMs is growing, but it is still recent and comes with some new challenges.

Missing data

LLMs only know what was in their training data; ChatGPT, for example, was reportedly trained on data up to September 2021. Clients may care about more recent data and about data specific to their business.

How do we give the chatbot knowledge it was not trained on?

One technique that has proven effective is in-context learning.

Given a user question, we retrieve and provide small sections of relevant document chunks to the LLM chatbot. The chatbot should generate a response using the information in these chunks.
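As a rough illustration of the idea, the sketch below assembles a prompt from a question and a few retrieved chunks. The function name `build_prompt`, the instruction wording, and the sample chunks are assumptions made for this example, not the prompt used in production.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the given chunks."""
    # Number the chunks so the model (and a human reviewer) can tell them apart.
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical chunks, standing in for whatever the retrieval step returned.
prompt = build_prompt(
    "How do I reset my password?",
    ["Passwords can be reset from the Account page.", "Support is available 24/7."],
)
print(prompt)
```

The resulting string would then be sent to the LLM as a single request; the retrieved chunks travel with every question rather than being baked into the model.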

Hallucinations

ChatGPT is an autoregressive probabilistic model.

Given an input, ChatGPT predicts the token (word piece) that should come next, feeds it back with the input, and repeats the process until the end token is predicted. Since it is a probabilistic model, it may output something that is misleading or incorrect.
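The loop described above can be sketched with a toy model. The probability table below is entirely made up and conditions only on the previous token, whereas a real LLM conditions on the whole sequence; the point is only to show the predict, append, repeat cycle and why sampling can produce different outputs for the same input.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the previous token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"policy": 0.6, "claim": 0.4},
    "policy": {"covers": 0.7, "<end>": 0.3},
    "claim": {"covers": 0.5, "<end>": 0.5},
    "covers": {"damage": 0.5, "theft": 0.5},
    "damage": {"<end>": 1.0},
    "theft": {"<end>": 1.0},
}


def generate(rng: random.Random, max_tokens: int = 10) -> list[str]:
    """Sample one token at a time until the end token or the length limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        # Sample the next token in proportion to its probability.
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        # Feed the prediction back in as part of the input for the next step.
        tokens.append(next_token)
    return tokens[1:]  # drop the artificial start marker


print(" ".join(generate(random.Random())))
```

Running it several times gives different sentences from the same starting point, which is the property that makes misleading outputs possible.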

When working with the digital insurance company LAQO, this was something we had to keep an eye on. In their industry, responses must be as accurate as possible because they may have legal consequences.

An example of hallucination could be a chatbot telling the user that LAQO covers some costs which are, in fact, not covered by the insurance type.

There is currently no solution for hallucinations; even the most powerful models like GPT-4 hallucinate.

Some things that can be done:

Model parameters such as temperature can be adjusted to lessen the chance of hallucination.
Well-crafted prompts with instructions written specifically for the client’s business tend to reduce hallucinations. Usually, we can do better than the generic prompts that ship with popular LLM frameworks.
The chatbot can be constrained to respond only to topics covered by the retrieved data chunks.
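To make the temperature point concrete, here is a minimal sketch of how temperature rescales a model's raw scores (logits) before sampling. The logits are invented for illustration; the function is the standard softmax with a temperature divisor, not code from any particular LLM provider.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At low temperature almost all probability mass lands on the top-scoring token, so the model behaves more predictably; at higher temperature less likely tokens are sampled more often.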

Prompts

Writing prompts or, in other words, instructions for LLMs is tricky. Having a good understanding of how an LLM works helps a lot. Being patient and willing to experiment with different ways of stating an instruction may help even more.

Different LLMs have different quirks. ChatGPT and similar LLMs are trained to follow instructions, but they are not perfect. Using simple and clear instructions helps.

Response format

Sometimes, an LLM may produce a response in a format that is not desired. It may even “leak” details of the prompt or system message, which are meant to be “internal” instructions. Instruction following is something LLM developers are constantly working to improve.
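One possible safeguard is to validate each response before it reaches the end user. The sketch below assumes the chatbot was asked to reply with a JSON object; the key names, the `validate_response` helper, and the 20-character threshold for the leak check are all hypothetical choices for this example.

```python
import json


def validate_response(raw: str, system_prompt: str, required_keys: set[str]) -> dict:
    """Parse the model output as JSON and reject malformed or leaking responses."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Naive leak check: reject if any longer line of the system prompt appears verbatim.
    for line in system_prompt.splitlines():
        line = line.strip()
        if len(line) >= 20 and line.lower() in raw.lower():
            raise ValueError("response appears to leak system prompt text")
    return data


SYSTEM_PROMPT = "You are a support assistant for ExampleCo.\nNever discuss pricing."
reply = validate_response(
    '{"answer": "You can reset it from the Account page.", "handover": false}',
    SYSTEM_PROMPT,
    {"answer", "handover"},
)
print(reply["answer"])
```

When validation fails, the application could retry the request or fall back to a human agent; a verbatim check like this only catches the most obvious leaks.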

Context window

Current LLMs typically have a small context window, which is the total amount of information (tokens) that we can feed into the LLM and that the LLM can generate in a single request. Using multiple LLM requests may be one strategy to deal with a limited context window, but this comes with increased costs and latency.
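A simple way to stay inside the window is to stop adding retrieved chunks once a token budget is used up. In the sketch below, counting whitespace-separated words is a crude stand-in for a real tokenizer, and the function name and budget values are assumptions for illustration.

```python
def fit_chunks_to_budget(chunks: list[str], max_tokens: int) -> list[str]:
    """Keep chunks in ranked order until the approximate token budget is exhausted."""
    selected = []
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude estimate: one word counts as one token
        if used + cost > max_tokens:
            break  # the next chunk would overflow the budget
        selected.append(chunk)
        used += cost
    return selected


ranked_chunks = ["a b c", "d e", "f g h i"]  # hypothetical chunks, most relevant first
print(fit_chunks_to_budget(ranked_chunks, max_tokens=5))
```

In practice the budget would also need to reserve room for the instructions, the chat history, and the tokens the model is expected to generate.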

Retrieval

Retrieval is about finding a small number (3-4) of data chunks relevant to the question. There are many libraries and databases that can be used for this.

It becomes tricky to find relevant chunks when there is a chat history.

The retrieval system should consider not only the follow-up question but also the previous questions. For smaller documentation, exact search (slower, more accurate) makes more sense than approximate search (faster, less accurate).
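A minimal sketch of exact, history-aware retrieval follows. It scores every chunk against the combined history and question (exact search) and uses bag-of-words counts as a stand-in for a real embedding model; the `embed`, `cosine`, and `retrieve` names and the sample data are assumptions for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, history: list[str], chunks: list[str], k: int = 3) -> list[str]:
    """Exact search: score every chunk against the history plus the new question."""
    query = embed(" ".join(history + [question]))
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "car insurance covers collision damage",
    "travel insurance covers lost luggage",
    "office hours are nine to five",
]
history = ["tell me about travel insurance"]
print(retrieve("what does it cover", history, chunks, k=1))
```

On its own, the follow-up "what does it cover" shares no words with any chunk; including the earlier question is what lets the travel insurance chunk rank first.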

Cost

LLMs are a costly business. Even open-source solutions may require a lot of expensive hardware to operate at scale.

Latency

Making multiple calls to an LLM will make end-users wait longer for a response, which may impact customer satisfaction.

LoRA

In-context learning is a great technique, but we are also experimenting with “fine-tuning” open-source models with proprietary data.

LoRA (low-rank adaptation) and its many variants make the fine-tuning process much more accessible in terms of cost and time. So far, we have had great results fine-tuning image generation models (Stable Diffusion). We were able to teach the diffusion model to generate images of items specific to brands.
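The saving comes from training two small low-rank matrices instead of a full weight matrix: for a layer of shape d x k, LoRA trains a d x r matrix and an r x k matrix, so r * (d + k) values rather than d * k. The arithmetic below uses illustrative dimensions chosen for this example, not figures from the models mentioned above.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable values in a rank-r LoRA update for a d x k weight matrix."""
    # One d x r matrix plus one r x k matrix.
    return r * (d + k)


d, k, r = 4096, 4096, 8  # hypothetical layer shape and LoRA rank
full = d * k
lora = lora_trainable_params(d, k, r)
print(f"full fine-tuning: {full:,} values")
print(f"LoRA (rank {r}): {lora:,} values, {full // lora}x fewer")
```

For this single layer the trainable values drop from 16,777,216 to 65,536, which is why fine-tuning becomes feasible on much more modest hardware.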

Next steps

We’re working on creating a multimodal, multipersonality chatbot agent that will also be able to handle transactions. More details to come.

This article was written by Danijel Temraz, Principal Engineer at Infobip, and Martina Ćurić, Staff Engineer at Infobip.

The post Hallucinations, prompts, cost, and other challenges we faced when creating an LLM-powered chatbot appeared first on ShiftMag.