AI Glossary


Classification

Classification refers to making a decision about information. For example, given a conversation transcript, is the customer happy or sad? Although LLMs are 'generative', their generative capabilities can be used to perform classification simply by asking them to output a decision about the information.
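The happy/sad decision above can be sketched as a classification call. The prompt wording and the `call_llm` function here are illustrative stand-ins, not any particular API; a stub is used so the sketch runs without a real LLM.

```python
# Sketch: using a generative LLM as a classifier.
# `call_llm` is a hypothetical stand-in for a real LLM client call.

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call an LLM API here.
    # For illustration, pretend the model answered "happy".
    return "happy"

def classify_sentiment(transcript: str) -> str:
    # Ask for exactly one word so the answer is easy to parse.
    prompt = (
        "Read the customer conversation below and answer with exactly "
        "one word, 'happy' or 'sad'.\n\n"
        f"Conversation:\n{transcript}"
    )
    answer = call_llm(prompt).strip().lower()
    return answer if answer in ("happy", "sad") else "unknown"

label = classify_sentiment("Agent: Sorted! Customer: Great, thanks so much!")
```

Constraining the output to a fixed set of labels, and falling back to a default when the model answers something else, makes the generative output usable as a decision.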


Completion

Refers to the textual response from an LLM. The completion is the LLM's attempt to follow your instructions.


Embedding

LLMs can both read and write text. Embedding models are sort of like half an LLM - they only read information. Their understanding of what they've read is captured in a numerical vector called an embedding. Embeddings are a very useful way to search and understand textual data. For example, you can create an embedding for every article in a knowledge base, then embed customer questions in the same way. You can then find the most relevant articles by checking which article embeddings are closest to the question's embedding in the numeric space. Vector databases are often used for this type of search.

TLDR - embeddings capture deep understanding of textual information in a numeric form that is highly useful for search and retrieval, and they often form the backbone of RAG (Retrieval Augmented Generation).
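A toy sketch of the knowledge-base search described above, using made-up 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions) and cosine similarity as the closeness measure:

```python
import math

# Toy sketch: embeddings are just vectors, and search is finding the
# nearest vector. These tiny vectors and article titles are made up
# for illustration only.

articles = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Shipping times and tracking": [0.0, 0.8, 0.2],
    "Refund policy": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    # Higher value = vectors point in more similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_relevant(question_embedding):
    # The article whose embedding is closest to the question's embedding.
    return max(
        articles,
        key=lambda title: cosine_similarity(articles[title], question_embedding),
    )

# Pretend an embedding model turned "I forgot my login" into this vector:
best = most_relevant([0.8, 0.2, 0.1])
```

A vector database performs essentially this comparison, but at scale and with indexing so it stays fast over millions of embeddings.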

Few Shot Learning

Few-shot learning is a technique where, when asking an LLM a question, you first provide several examples to learn from in your prompt. For example, perhaps you want the LLM to classify whether a user question is in scope for the assistant. You might leverage few-shot prompting to get a better result by listing examples of both in-scope and out-of-scope questions before asking the LLM to make a decision about the current customer's question.

Few-shot can be contrasted with zero-shot, where you simply ask the LLM to make a decision without any examples of the behavior you're looking for.
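The in-scope/out-of-scope example above could be assembled into a few-shot prompt like this. The example questions, labels, and wording are illustrative:

```python
# Sketch of few-shot prompt assembly: a handful of labeled examples
# precede the real question. All examples here are made up.

EXAMPLES = [
    ("How do I update my billing address?", "in-scope"),
    ("What's the weather like today?", "out-of-scope"),
    ("Can I get a refund on my order?", "in-scope"),
]

def build_few_shot_prompt(question: str) -> str:
    lines = [
        "Classify each question as in-scope or out-of-scope "
        "for a customer support assistant.",
        "",
    ]
    for example_question, label in EXAMPLES:
        lines.append(f"Question: {example_question}")
        lines.append(f"Answer: {label}")
        lines.append("")
    # The real question goes last, with the answer left blank
    # for the LLM to complete.
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("Where is my package?")
```

Removing the `EXAMPLES` loop turns this into the zero-shot version of the same prompt.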

Fine Tuning

Fine-tuning refers to the practice of taking an AI model (such as an LLM) that has already been trained and training it further on additional examples specific to your domain. Along with prompt engineering/chaining and retrieval augmentation, fine-tuning is another method for getting an LLM to behave how you want.
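As a sketch, fine-tuning data is often prepared as JSONL - one training example per line. The "messages" layout below follows the chat-style convention some providers use (e.g. OpenAI's fine-tuning API); the question/answer pairs are made up:

```python
import json

# Sketch: fine-tuning data is just (input, desired output) pairs.
# This chat-style JSONL layout is one common convention; other
# providers use different formats. The example pairs are invented.

domain_examples = [
    ("What is your return window?",
     "Items can be returned within 30 days of delivery."),
    ("Do you ship internationally?",
     "Yes, we ship to over 40 countries."),
]

jsonl_lines = [
    json.dumps({
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    })
    for question, answer in domain_examples
]

# Each line of the resulting .jsonl file is one training example.
training_file_contents = "\n".join(jsonl_lines)
```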


Hallucination

Refers to when an LLM generates content that is false. Hallucinations are mitigated by ensuring prompts contain the information necessary to answer truthfully and accurately, as well as through techniques like prompt chaining. Collectively, the practice of preventing hallucinations is sometimes referred to as 'guardrailing'.

LLM (Large Language Model)

LLMs are large neural networks trained on massive amounts of data that have become synonymous with AI. Examples of LLMs include ChatGPT, Llama 2, Claude & Gemini. LLMs generally take text instructions (prompts) as input and produce text output (completions) to those instructions, often with human-level accuracy. LLMs are often used to produce generative content (e.g. summarizing a conversation or product review) but can also be used to perform classification (making decisions about information).


Prompt

The instructions sent to an LLM. Prompts are most commonly textual instructions for the LLM to follow. Different LLMs have slightly different prompting formats.

Prompt Chaining

Refers to the practice of breaking a problem down into multiple LLM invocations rather than trying to solve it all with a single prompt. For example, you might first use an LLM to classify whether a user's question is in scope for the assistant before using the LLM to generate a response.
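The two-step chain described above might look like this in code, with a stub standing in for real LLM calls (the prompts and stub behavior are illustrative):

```python
# Sketch of a two-step prompt chain: classify first, answer second.
# `call_llm` is a hypothetical stub so the example runs without an API.

def call_llm(prompt: str) -> str:
    # Stub: route on the prompt text to simulate two different calls.
    if "in-scope or out-of-scope" in prompt:
        return "in-scope"
    return "You can reset your password from the account settings page."

def answer_question(question: str) -> str:
    # Step 1: a cheap classification call gates the expensive one.
    scope = call_llm(
        "Is this question in-scope or out-of-scope "
        f"for a support assistant? {question}"
    )
    if scope.strip() != "in-scope":
        return "Sorry, I can't help with that."
    # Step 2: only generate a full answer for in-scope questions.
    return call_llm(f"Answer this customer question: {question}")

reply = answer_question("How do I reset my password?")
```

Each step gets a small, focused prompt, which is usually easier to get reliable than one prompt that must both filter and answer.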

Prompt Engineering

Prompt engineering refers to the art of constructing the right input or series of inputs to an LLM to achieve the desired result. Although LLMs exhibit impressive intelligence, your first attempts to instruct an LLM likely won't produce the behavior you want with any degree of reliability. You'll need to gradually refine the language, incorporate few-shot examples, split up prompts and add additional guardrails. All of this is referred to as prompt engineering.

Reinforcement Learning From Human Feedback

In addition to training language models on lots of examples of instructions and good completions of those instructions, Reinforcement Learning From Human Feedback (RLHF) is another key technique used in building LLMs. It refers to the practice of training a separate AI model to predict how humans would score an LLM's output, and then using the scoring model to further train the LLM.

Retrieval Augmented Generation (RAG)

Refers to the practice of retrieving information (often via search) and including that information in an LLM prompt to produce a generative result that is contextualized by the retrieved information.
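A minimal sketch of the retrieve-then-prompt pattern, using naive keyword overlap for retrieval (a real system would typically use embedding search) and a made-up knowledge base:

```python
# Minimal RAG sketch: retrieve relevant documents, then include them
# in the prompt. The knowledge base and keyword scoring here are
# illustrative; real systems use embedding search and a real LLM call.

KNOWLEDGE_BASE = [
    "Orders ship within 2 business days.",
    "Refunds are processed within 5-7 business days.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    # Score each document by how many question words it shares.
    words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: -len(words & set(doc.lower().split())),
    )
    return scored[:k]

def build_rag_prompt(question: str) -> str:
    # The retrieved text becomes context the LLM is told to rely on.
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt("How long do refunds take?")
```

The resulting prompt would then be sent to an LLM; because the answer is present in the supplied context, this also helps reduce hallucinations.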


Token

LLMs see the world as tokens. They read tokens and generate tokens. You can think of tokens as being similar to words, but they are often smaller than words (word fragments). All LLMs have a limited 'context window', which is essentially the number of tokens they can read and/or write in a single request.
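A rough illustration of how token counts relate to word counts. The four-characters-per-token figure below is only a common rule of thumb for English text, not a real tokenizer; real tokenizers split text into learned subword units:

```python
# Rough illustration only: real tokenizers (e.g. BPE-based ones) split
# on learned subword pieces. "~4 characters per token" is a common
# rule of thumb for English, not an exact count.

def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

text = "Tokenization splits text into subword pieces."
word_count = len(text.split())
estimated_tokens = estimate_tokens(text)
# A text usually costs more tokens than it has words, which is why
# context windows are measured in tokens rather than words.
```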