An agent that reads, writes, and runs code in a sandbox, calling tools and (optionally) sub-agents while a human approves the destructive parts. The shape that powers Cursor, Claude Code, OpenHands, Aider, Codex CLI.
Retrieval-grounded generation built to be defensible: hybrid retrieval, reranking, contextualised chunks, citations rendered to the user, and verification before the answer ships.
A low-latency conversational agent over a phone or microphone, with handoff to humans, mid-utterance cancellation, and per-call session boundaries. The shape behind LiveKit, Pipecat, Vapi, Retell.
An agent stack that satisfies data-residency and audit requirements: weights, inference, tools, and logs all sit inside an operator-controlled boundary, with provenance and incident response wired in.
An agent that operates over hours to weeks, surviving restarts and accumulating memory while remaining safe. The shape behind Devin, Manus, Sparrot, and durable LangGraph runs.
Two or more agents argue toward a better answer than any single agent would produce, with a frozen rubric to score the result. The shape behind debate-style alignment work and 'committee of critics' setups.
An agent that drives a real GUI: planning a task, grounding actions in pixels or DOM, and asking permission before destructive clicks. The shape behind OpenAI Operator, Anthropic Computer Use, Browser Use, Stagehand, MultiOn.
How long-running agents structure what they remember: a tiered short-to-long-term cascade, compaction across the context window, paging, and reasoning carry-forward across tool calls.
Several agents collaborate under a coordinator, with explicit hand-offs and a shared protocol. The shape behind LangGraph supervisor, OpenAI Swarm, AutoGen group chat, Bedrock multi-agent orchestrators.
The minimum set of constraints to put around any production agent before it touches the world: budgets, gates, charters, kill-switches, approvals.
How you keep an agent honest in production: harness, judge, decision log, provenance, shadow rollouts.
Get typed, schema-conformant data out of the model and verify it. The shape behind Outlines, Instructor, Pydantic AI, DSPy.
User-perceivable real-time output: tokens streamed as they arrive, citations attached as they resolve, a user who can stop the response at any time, and an agent that can interrupt the user when something matters.
Different ways to structure 'think then act': linear ReAct, plan-then-execute, parallel DAG planning, tree search with backtracking, and the outer/inner planner+executor split.
How requests reach the right model or specialist, and how the system stays up when an upstream provider fails. The shape behind LangChain fallbacks, model routers, provider cascades.
Patterns where the model reviews its own work before shipping it: scoped rubric reflection, self-refine, deterministic post-checks, process rewards.
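
As a rough illustration of the last pattern, here is a minimal sketch of a self-refine loop gated by deterministic post-checks. Everything named here is hypothetical: `Check`, `self_refine`, the two rubric items, and `toy_revise`, which stands in for a real model call. It assumes the rubric is a fixed list of pure functions over the draft text, which is only one way to scope a rubric.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Check:
    """One rubric item: a named, deterministic test over the draft text."""
    name: str
    passes: Callable[[str], bool]


def failed_checks(draft: str, rubric: List[Check]) -> List[str]:
    """Return the names of rubric items the draft does not satisfy."""
    return [check.name for check in rubric if not check.passes(draft)]


def self_refine(
    draft: str,
    revise: Callable[[str, List[str]], str],
    rubric: List[Check],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Revise the draft until every check passes or the round budget runs out.

    Returns the final draft plus any checks that still fail, so the caller
    can decide whether to ship, escalate, or discard.
    """
    for _ in range(max_rounds):
        failures = failed_checks(draft, rubric)
        if not failures:
            return draft, []
        draft = revise(draft, failures)
    return draft, failed_checks(draft, rubric)


# Hypothetical rubric: the answer must cite a source and contain no TODO marker.
RUBRIC = [
    Check("has_citation", lambda text: "[1]" in text),
    Check("no_todo", lambda text: "TODO" not in text),
]


def toy_revise(draft: str, failures: List[str]) -> str:
    """Stand-in for a model call that is told which checks failed."""
    if "no_todo" in failures:
        draft = draft.replace("TODO", "").strip()
    if "has_citation" in failures:
        draft = draft + " [1]"
    return draft


if __name__ == "__main__":
    final, remaining = self_refine(
        "Paris is the capital of France. TODO", toy_revise, RUBRIC
    )
    print(final)
    print(remaining)
```

Returning the remaining failures instead of raising keeps the ship-or-escalate decision with the caller, and the round budget keeps the loop from running indefinitely when the reviser cannot satisfy a check.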