Glossary

The working vocabulary

Thirteen terms that come up when deciding whether to build an agent. Short definitions, with the distinctions that actually matter in practice. Skim or read in order.

Architectures

The four shapes a system can take.

Automation

A deterministic script. Same input → same output, every time. No LLM.

Example: a cron job that compresses log files older than 30 days, or a script that copies new rows from one database to another. Use when the steps are knowable, no judgment is required, and you want zero variance. Most "I need an AI for this" requests can be served by plain automation.

Single LLM call

One prompt, one response. The simplest LLM-powered solution.

Example: classifying a piece of text as spam, summarizing an article, translating a paragraph. The whole task is "input goes in, output comes out, done." Practitioners systematically under-use this; it solves more problems than the discourse suggests.

Workflow

Multiple LLM calls chained together in a fixed, predefined order.

Each step has a known input and a known output shape. Example: extract structured data from an invoice → validate against business rules → format as JSON for the downstream system. You — the engineer — control the path. The LLM only decides the contents of each step's output, not the order of steps.

Agent

An LLM running in a loop: act → observe → decide → repeat until done.

The LLM controls the path, not just the contents. Example: a coding assistant that reads a file, edits code, runs tests, reads the test output, and decides whether to keep going or stop. Anthropic's working definition: a system that "dynamically directs its own processes and tool usage." Simon Willison's tighter version: "tools in a loop."

Patterns

Building blocks and workflow sub-types.

Tool use

When an LLM invokes external functions — search, code execution, database queries, APIs.

Modern LLMs can request a tool call as part of their output. Your system executes the tool and feeds the result back into the next LLM call. Tool use is what turns an LLM from a text generator into a workflow or agent — without tools, there's nothing for the LLM to chain or loop over.

Routing

A workflow pattern: classify the input, then run a different fixed chain for each category.

Example: a support bot that classifies an incoming message as "refund," "shipping," or "account," then routes to three different prompt chains. Often mistaken for an agent — but the dynamic step (the classification) happens once at the top, then everything else is fixed. The LLM doesn't loop.

Prompt chaining

A workflow pattern: the output of one LLM call becomes the input to the next.

Useful when each step has a clear, narrower job than asking everything at once. Example: "outline the article → write each section → polish the whole thing" is three prompts in a chain, not one big "write me an article" prompt. Easier to debug, evaluate, and improve.

Practice

Operating these systems in production.

Human-in-the-loop (HITL)

Inserting a human approval step before an irreversible action.

Standard practice for agents (and many workflows) that send emails, make purchases, modify production systems, or take any action that needs cleanup if wrong. The right granularity is per-action, not per-task: approve the email send, not the entire customer-support session. Don't gate on the LLM's confidence — gate on the action's reversibility.

Blast radius

How much damage a wrong action can do before someone notices.

A research agent generating a wrong summary has small blast radius — you re-read the source, nothing else happened. A trading bot submitting wrong orders has huge blast radius — real money is gone. The strongest predictor of whether you need HITL, regardless of how agent-shaped the problem looks. (OWASP's 2026 Top 10 for Agentic Applications treats this as the central operational risk.)

Eval

A way to measure whether the system's output is good.

For a single LLM call, often a benchmark dataset with known correct answers. For an agent, harder — what does "good" mean across a multi-step trajectory with branches? If you can't define and automate this, you can't safely run an agent in production. The most common reason agent projects stall is that nobody figured out what to measure.

Structured output

Constraining the LLM to produce JSON that matches a schema you define.

Eliminates an entire class of parsing bugs and is the right default for most production single-call use cases. Anthropic and OpenAI both have native support — no need for regex parsing or "please respond in JSON" prompt tricks. If you find yourself wanting a workflow because the output is unpredictable, try structured output on a single call first.

Stopping condition

How an agent decides it's done.

An agent isn't an infinite loop; it has a termination criterion: the task is complete, a max-iterations cap is hit, or a confidence threshold is reached. Designing a clear stopping condition is one of the hardest parts of building a reliable agent. Vague conditions ("when the task is done") lead to runaway loops or premature stops.

Modern examples

Canonical 2025–2026 systems to compare against.

Deep Research agent

The canonical 2025–2026 example of an agent in production.

Perplexity Deep Research, OpenAI deep research, Anthropic's research agents. Reads sources, decides what to search for next, decides when it has enough material, writes up the answer. The path branches with every finding — you can't write the search order down ahead of time, which is the defining property of an agent.

References

All free except where noted. Listed roughly in the order you'd want to read them.

Foundational

Start here. The canonical taxonomy and consensus definitions.

Building Effective Agents

Anthropic — December 2024

The article this quiz is built on. Original taxonomy of single call vs workflow vs agent, plus the named workflow patterns (routing, chaining, orchestrator-workers). Short, opinionated, the single highest-leverage read.

I think ‘agent’ may finally have a widely enough agreed upon definition

Simon Willison — September 2025

Crystallizes the field's consensus into one sentence: “an LLM agent runs tools in a loop to achieve a goal.” A short, accessible read with helpful framing for non-practitioners.

A Practical Guide to Building Agents

OpenAI — 2025

OpenAI's own framework for deciding when to build an agent. Uses the same single-call / workflow / agent split as Anthropic, with their own emphasis on guardrails and validation. PDF, ~30 pages.

LangGraph — Workflows and Agents

LangChain — Updated regularly

Practical, implementation-level treatment of the same patterns described in the Anthropic essay. Best read when you're ready to write code.

Production & safety

What changes when you actually try to ship and operate one of these.

Effective Context Engineering for AI Agents

Anthropic — September 2025

What changes when you actually try to run an agent reliably. Tokens, memory, sub-agents, and the practical concerns that come up after the prototype works.

Building Agents with the Claude Agent SDK

Anthropic — September 2025

The “how” once you've decided you actually need an agent. Concrete patterns for the loop, tool definitions, error handling, and shipping behavior.

Demystifying Evals for AI Agents

Anthropic — January 2026

What "good" actually means for an agent across a multi-step trajectory, and how to define rubrics you can automate. Pairs with the Eval glossary entry — if you can't write this, you can't safely run an agent.

OWASP Top 10 for Agentic Applications 2026

OWASP GenAI Security Project — December 2025

The security threats unique to agents. Specifically ASI02 (tool misuse) and ASI08 (cascading failures) map directly to the Q9 blast-radius question; ASI05 (unexpected code execution) is the canonical case for human-in-the-loop on any agent that runs code. Read before shipping anything that takes real actions.

Practical Lessons from 750+ Real-World LLM and Agent Deployments

Hugo Bowne-Anderson — 2025

What actually works in production, across a large sample of deployments. Patterns that survive contact with real users — mostly: heavy constraints, narrow scopes, evals from day one.

The Case for Bounded Autonomy

MongoDB Engineering — 2025–2026

Argues for earning agent autonomy incrementally rather than treating “agent” as a binary architectural choice. Useful framing for the Q8/Q9 overlays in the quiz.

Take the quiz →Home