The Glossary You Must Read If You Wanna Talk About AI

Zvonimir Petkovic

I often hear AI terms used loosely, so I put together this guide to explain key concepts like agents, tools, and LLMs clearly.

AI terminology can be confusing, especially when words like agents, skills, tools, and LLMs get used interchangeably.

That’s why I put together this glossary as a quick reference, to explain these concepts and help everyone, technical or not, talk about AI clearly.

Agent Skill

An agent skill is a predefined capability or behavior that an AI agent uses to accomplish specific tasks like searching the web, writing code, sending emails, or reading files. Skills give agents a structured way to interact with tools, APIs, or data sources, making them more reliable and reusable across workflows. Think of them as modular “superpowers” you can plug into an agent.

At a minimum, skills are just folders the agent reads, containing logic, instructions, assets, templates, and more. Most of today’s state-of-the-art agent apps let you create your own custom skills.
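As a sketch, a minimal skill folder might contain a manifest like the one below. The exact layout and file names vary by platform (this one loosely follows the Anthropic-style `SKILL.md` convention), and the skill itself is hypothetical:

```markdown
---
name: jira-weekly-report
description: Summarize open Jira tickets into a weekly status report
---

# Jira weekly report

1. Call the Jira tool to fetch open tickets for the given project.
2. Group tickets by status and assignee.
3. Render the summary using templates/report.md.
```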

MCP

MCP (Model Context Protocol) is an open standard that lets AI agents connect to external tools and data sources consistently. Instead of creating a custom integration for every service (like Slack, Google Drive, or GitHub), MCP provides a universal “plug-in” format, allowing any MCP-compatible server to communicate with any MCP-compatible AI.

Think of it as USB-C, but for AI tool integrations.
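Under the hood, MCP messages are JSON-RPC. As a rough sketch, a client asking a server to run a tool sends something like this (the method name follows the MCP spec; the tool name and arguments are hypothetical):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "jira_fetch_tickets",
    "arguments": { "project": "AI", "limit": 10 }
  }
}
```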

Agent Tool

A tool is a function (code) that an AI agent can execute when it decides to. That’s why each tool has a name and a description, which influence when the model chooses to use it (for example, “Use this function to pull the latest tickets from a Jira project”).

Besides the name and description, a tool includes the code that the agent executes, called with the required arguments. For example, the agent could call:

jira_fetch_tickets(project="AI", limit=10)

Tools are also the components that power MCP servers. In the Open WebUI project, users can even write custom Python tools that agents can invoke.
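A minimal sketch in Python of what a tool looks like to an agent framework (the Jira call is stubbed with fake data, and the registry structure is illustrative, not any specific framework's API):

```python
# A tool bundles a name, a description the model reads, and the code to run.
def jira_fetch_tickets(project: str, limit: int = 10) -> list[dict]:
    """Use this function to pull the latest tickets from a Jira project."""
    # Stubbed data stands in for a real Jira API call.
    tickets = [{"key": f"{project}-{i}", "status": "Open"} for i in range(1, 4)]
    return tickets[:limit]

# What the agent framework sees: metadata plus a callable.
TOOLS = {
    "jira_fetch_tickets": {
        "description": jira_fetch_tickets.__doc__,
        "function": jira_fetch_tickets,
    }
}

# When the model decides to use the tool, the framework dispatches the call:
result = TOOLS["jira_fetch_tickets"]["function"](project="AI", limit=10)
print(result[0]["key"])  # AI-1
```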

Large Language Model (LLM)

A type of deep learning model designed primarily for language understanding and content generation. LLMs excel at processing and generating human-like text.

Token

In Generative AI, a token is the smallest unit of information a language model processes. Depending on the language and the model’s design, a token can represent a whole word, part of a word, or even a single character. Tokens are the building blocks that language models use to understand and generate text.
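As an illustration only, here is a toy greedy longest-match tokenizer. Real models use learned subword vocabularies (e.g. BPE) with tens of thousands of entries; this tiny hand-picked vocabulary just shows how a word can split into subword tokens:

```python
# Toy vocabulary; real tokenizers learn theirs from data.
VOCAB = {"token", "ization", "t", "o", "k", "e", "n", "i", "z", "a"}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match split; single characters guarantee progress."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest piece first
            if text[i:end] in VOCAB:
                tokens.append(text[i:end])
                i = end
                break
        else:
            i += 1  # skip characters not in the vocabulary
    return tokens

print(tokenize("tokenization"))  # ['token', 'ization']
```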

Context Window

The context window is the number of tokens an LLM can process as input or generate as output. Input and output limits are usually different, with the input capacity typically much larger than the output.
For example, GPT-4o has an input limit of 128,000 tokens and an output limit of 16,384 tokens.
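A rough back-of-the-envelope check of whether some text fits in that input limit. The 4-characters-per-token ratio is a common approximation for English, not an exact rule:

```python
INPUT_LIMIT = 128_000   # GPT-4o input tokens
CHARS_PER_TOKEN = 4     # rough average for English text

def fits_in_context(prompt_chars: int) -> bool:
    """Estimate whether a prompt fits in the model's input limit."""
    estimated_tokens = prompt_chars / CHARS_PER_TOKEN
    return estimated_tokens <= INPUT_LIMIT

# A 300-page book at ~2,000 characters per page is ~150,000 tokens:
print(fits_in_context(300 * 2_000))  # False
```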

Fine-Tuning

Fine-tuning is the process of modifying a language model’s neural network using your own data. It’s different from simply adding documents to a conversation or adjusting prompts, which don’t change the model’s underlying structure (see RAG for an alternative approach).

Retrieval Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a system that uses a vector database to store ingested data, such as documents, web pages, and other sources. When a question is asked, relevant data is retrieved and combined with the question before being sent to a language model (LLM). The LLM itself doesn’t change, but it “sees” the retrieved information, allowing it to answer based on this additional context.
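The retrieve-then-augment flow can be sketched in a few lines. This toy version uses word-count vectors and cosine similarity in place of a real embedding model and vector database, and the documents are made up:

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words counts. Real systems use an embedding model.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "The API rate limit is 100 requests per minute.",
    "Deployments run every Friday at noon.",
]

def rag_prompt(question: str) -> str:
    # Retrieve: pick the document most similar to the question.
    best = max(documents, key=lambda d: cosine(embed(question), embed(d)))
    # Augment: prepend the retrieved context before sending it to the LLM.
    return f"Context: {best}\n\nQuestion: {question}"

print(rag_prompt("What is the API rate limit?"))
```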

Prompt

A prompt is the user’s input that kicks off the model’s text generation. It guides the model to produce relevant and coherent responses based on the context or question. Prompts can be simple or detailed, shaping the quality and direction of the output.

ChatGPT

ChatGPT is an OpenAI product built on the GPT family of large language models (LLMs). As a product, it can use different LLMs, such as GPT-4o, GPT-4o-mini, and o1.

AI Agent

An AI agent is a software system powered by LLMs that performs tasks, answers questions, and automates processes for users. They can range from simple chatbots to advanced digital or robotic systems capable of running complex workflows autonomously. Key features include planning, using tools, perceiving their environment, and remembering past interactions, which help them improve performance over time.

Prompt Injection

Prompt injection is a type of cyberattack on large language models (LLMs), where malicious inputs are disguised as normal prompts to manipulate the model’s behavior or output. These attacks can make the model ignore safeguards, reveal sensitive information, or carry out unauthorized actions.

Google Gemini

Google’s family of Large Language Models (LLMs).

Anthropic Claude

Anthropic’s family of Large Language Models (LLMs).

Meta Llama

Meta’s family of Large Language Models.

LLM Parameters

LLM parameters are the components within a large language model that determine its behavior and capabilities. Learned during training, they include weights and biases that help the model understand and generate language. Generally, more parameters mean a smarter model, but they also require more computing power, especially memory (RAM), to run.

Copilot

Copilot is Microsoft’s branding for different AI agents, such as:

  • GitHub Copilot, which assists with coding
  • Microsoft 365 Copilot, which helps with Office and Windows tasks

System Prompt

A system prompt is a set of instructions or guidelines given to a language model to set its behavior, tone, and limits during a conversation.
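In chat-style APIs, the system prompt is typically the first message in the conversation. The wording below is just an example; the `role`/`content` message shape is the common convention:

```python
# Chat-completion style message list: the system prompt sets the rules,
# user (and later assistant) messages carry the conversation itself.
messages = [
    {
        "role": "system",
        "content": "You are a concise support agent. Answer in at most "
                   "two sentences and never share internal URLs.",
    },
    {"role": "user", "content": "How do I reset my password?"},
]
```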

Prompt Engineering

Prompt engineering is the practice of designing and refining prompts to optimize a language model’s performance and output. It involves crafting specific inputs that guide the model to produce the desired responses, improving accuracy, relevance, and coherence.

Digital Twin

Digital twins are virtual representations of assets, people, or processes and their environments that simulate strategies and optimize behaviors. In the CPaaS space, this usually refers to AI agents that mimic people using audio and video modalities.

Multimodal

Multimodal refers to the ability of AI systems to process and combine multiple types of data inputs (text, images, audio, video) to perform tasks or generate outputs. This approach allows AI models to understand and create content across different modalities, resulting in more comprehensive and context-aware applications.

Vector Database

Unlike traditional databases that store structured data in tables, vector databases are optimized for operations like similarity search, allowing efficient retrieval of data points that are mathematically close to a given query vector.

This capability is essential for applications such as recommendation systems, image recognition, natural language processing, and other AI-driven tasks where data is represented as vectors. For a common implementation, see RAG (Retrieval-Augmented Generation).

Hybrid Search

Hybrid search combines the strengths of vector search and traditional full-text search to improve the relevance of retrieved results. Vector search captures the semantic meaning of queries, matching based on context and intent, while full-text search ensures precise keyword matches.

By blending these approaches, hybrid search increases the likelihood of retrieving the most relevant documents, even when queries are vague or phrased differently from the source content. This boosts the accuracy of retrieval and enhances the overall effectiveness of the RAG (Retrieval-Augmented Generation) pipeline.
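The blending step can be sketched as a weighted sum of the two scores. This toy version uses word-count cosine similarity and a naive keyword-overlap score; production systems typically use BM25, learned embeddings, and fusion methods such as reciprocal rank fusion:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:  # toy word-count "embedding"
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def keyword_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)  # fraction of query words found verbatim

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    # Blend semantic and exact-match signals; alpha tunes the balance.
    return alpha * cosine(embed(query), embed(doc)) \
         + (1 - alpha) * keyword_score(query, doc)

docs = ["error code E42 means disk full", "how to free up storage space"]
ranked = sorted(docs, key=lambda d: hybrid_score("error E42", d), reverse=True)
print(ranked[0])  # the doc with the exact keyword match ranks first
```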

Embedding Model

An embedding model is a machine learning model trained to convert input text into numerical vectors, which can then be used for vector similarity search. Embedding models are a key part of the RAG (Retrieval-Augmented Generation) pipeline, as they transform user questions into vector representations.
