Operating these systems in production.
Human-in-the-loop (HITL)
Inserting a human approval step before an irreversible action.
Standard practice for agents (and many workflows) that send emails, make purchases, modify production systems, or take any action that needs cleanup if wrong. The right granularity is per-action, not per-task: approve the email send, not the entire customer-support session. Don't gate on the LLM's confidence — gate on the action's reversibility.
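A minimal sketch of a per-action gate follows. The action names, the `Action` type, and the `approve` and `run` callbacks are hypothetical, chosen only for illustration. The point it shows is that the approval check looks at whether the action is reversible, not at model confidence.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: this set of irreversible action names is illustrative only.
IRREVERSIBLE = {"send_email", "make_purchase", "modify_production"}


@dataclass
class Action:
    name: str
    payload: dict


def requires_approval(action: Action) -> bool:
    # Gate on the action's reversibility, never on the model's confidence.
    return action.name in IRREVERSIBLE


def execute(
    action: Action,
    approve: Callable[[Action], bool],  # hypothetical: asks a human, returns True/False
    run: Callable[[Action], str],       # hypothetical: actually performs the action
) -> str:
    # Approval is requested per action, so reversible steps are never blocked.
    if requires_approval(action) and not approve(action):
        return "rejected"
    return run(action)
```

In this sketch a customer-support session would call `execute` once per action, so a human sees only the email send and not every lookup that preceded it.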
Blast radius
How much damage a wrong action can do before someone notices.
A research agent generating a wrong summary has a small blast radius: you re-read the source, and nothing else happened. A trading bot submitting wrong orders has a huge blast radius: real money is gone. Blast radius is the strongest predictor of whether you need HITL, regardless of how agent-shaped the problem looks. (OWASP's 2026 Top 10 for Agentic Applications treats this as the central operational risk.)
Eval
A way to measure whether the system's output is good.
For a single LLM call, an eval is often a benchmark dataset with known correct answers. For an agent it is harder: what does "good" mean across a multi-step trajectory with branches? If you can't define and automate this, you can't safely run an agent in production. The most common reason agent projects stall is that nobody figured out what to measure.
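The two cases can be sketched roughly as below. The function names and the pass criteria are assumptions made for illustration. Exact match is only one possible scoring rule, and the trajectory check (required steps in order, no forbidden steps, a step budget) is one simple way to make "good" concrete for an agent.

```python
from typing import Callable


def exact_match_accuracy(
    system: Callable[[str], str],
    dataset: list[tuple[str, str]],
) -> float:
    """Single-call eval: fraction of questions answered with the expected string."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(
        1
        for question, expected in dataset
        if system(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(dataset)


def trajectory_passes(
    steps: list[str],
    required: list[str],
    forbidden: set[str],
    max_steps: int,
) -> bool:
    """Agent eval: required steps occur in order, nothing forbidden, within budget."""
    if len(steps) > max_steps or any(s in forbidden for s in steps):
        return False
    remaining = iter(steps)
    # `r in remaining` consumes the iterator, so this checks an ordered subsequence.
    return all(r in remaining for r in required)
```

Both functions return a value a CI job can assert on, which is the property that matters here: the measurement runs without a person reading transcripts.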
Structured output
Constraining the LLM to produce JSON that matches a schema you define.
Eliminates an entire class of parsing bugs and is the right default for most production single-call use cases. Anthropic and OpenAI both have native support — no need for regex parsing or "please respond in JSON" prompt tricks. If you find yourself wanting a workflow because the output is unpredictable, try structured output on a single call first.
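As a sketch of what the consuming side can look like, the example below defines a schema and checks a reply against it using only the standard library. The ticket schema and `parse_ticket` are hypothetical. The provider call is deliberately omitted, since the exact request shape differs by vendor and SDK version; the assumption is that a schema like this one is what you would hand to the provider's structured-output feature.

```python
import json

# Hypothetical schema for a support-ticket classifier, written as JSON Schema.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "account"]},
        "urgent": {"type": "boolean"},
    },
    "required": ["category", "urgent"],
    "additionalProperties": False,
}


def parse_ticket(raw: str) -> dict:
    """Parse a model reply and check it by hand against TICKET_SCHEMA.

    Raises ValueError on any mismatch (json.JSONDecodeError is a ValueError).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    props = TICKET_SCHEMA["properties"]
    missing = [k for k in TICKET_SCHEMA["required"] if k not in data]
    extra = [k for k in data if k not in props]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    if data["category"] not in props["category"]["enum"]:
        raise ValueError("category not in enum")
    if not isinstance(data["urgent"], bool):
        raise ValueError("urgent must be a boolean")
    return data
```

With native structured output the checks in `parse_ticket` should rarely fail, but keeping a validation step at the boundary is cheap and catches a schema that drifted from the code that reads it.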
Stopping condition
How an agent decides it's done.
An agent isn't an infinite loop; it has a termination criterion: the task is complete, a max-iterations cap is hit, or a confidence threshold is reached. Designing a clear stopping condition is one of the hardest parts of building a reliable agent. Vague conditions ("when the task is done") lead to runaway loops or premature stops.
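A minimal sketch of an agent loop with explicit stopping conditions is shown below. `run_agent`, `step`, and `is_done` are hypothetical names. The loop implements two of the criteria named above, task completion and a max-iterations cap, and reports which one ended the run.

```python
from typing import Callable


def run_agent(
    step: Callable[[list[str]], str],      # hypothetical: produces the next action/result
    is_done: Callable[[list[str]], bool],  # hypothetical: explicit completion check
    max_iterations: int = 10,
) -> tuple[list[str], str]:
    """Run steps until the task is complete or the iteration cap is hit."""
    history: list[str] = []
    for _ in range(max_iterations):
        history.append(step(history))
        if is_done(history):
            return history, "task_complete"
    # The cap guarantees termination even if is_done never fires.
    return history, "max_iterations"
```

Returning the stop reason alongside the history makes runaway loops visible in logs: a run that ends with `"max_iterations"` is a signal that `is_done` is too vague or the task is not converging.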