Patterns: reflection, plan-execute and multi-agent
Agent architecture patterns and, above all, when NOT to use multi-agent.
An agent with a ReAct loop already solves a lot — but there are problem classes where a single agent simply falls short: tasks too long for the context window, work that requires parallel specialization, or outputs that need review before reaching the user. For those cases, architectural patterns exist. This lesson is a practical catalog of those patterns — and an honest warning about when they become a trap.
Pattern 1 — Reflection: the agent reviews its own output
Reflection is the simplest pattern in the catalog and, for that reason, the most underestimated. The idea: after generating a response, the agent makes a second pass — with a critique prompt — and decides whether the result is good enough or needs revision.
Think of it like a developer who writes code and reads it again before opening the PR. The model doesn't change its nature; it just receives the context of "reviewer" instead of "author".
How to implement: pass the previous output back to the model with an instruction like "Review the response below. Identify factual errors, reasoning gaps, and inconsistencies. If correct, respond APPROVED. Otherwise, rewrite." You can iterate N times or until APPROVED is received.
When to use: code generation, reports, and long responses where a single hallucination is costly. The cost is at least one extra model call per review pass, which is cheap compared to delivering garbage to the user.
Watch out for infinite loops: always define a max_iterations. Without it, a model that never converges will consume tokens indefinitely. Reflection doesn't guarantee correctness — it guarantees a second chance to catch the mistake.
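The loop described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `reflect`, `critic`, and `toy_critic` are hypothetical names, and `critic` stands in for a real model call that returns either the literal string APPROVED or a rewritten response.

```python
from typing import Callable, Tuple


def reflect(
    draft: str,
    critic: Callable[[str], str],
    max_iterations: int = 3,
) -> Tuple[str, bool]:
    """Run up to max_iterations critique passes over a draft.

    `critic` is an assumed stand-in for a model call with a reviewer prompt.
    It returns "APPROVED" or a rewritten response.
    Returns (final_text, approved).
    """
    current = draft
    for _ in range(max_iterations):
        verdict = critic(current)
        if verdict.strip() == "APPROVED":
            return current, True
        current = verdict  # the critic rewrote the response; review it again
    # Budget exhausted: return the last version and flag it as unapproved
    # instead of looping forever.
    return current, False


# Toy critic so the sketch runs without a model: fixes one typo, then approves.
def toy_critic(text: str) -> str:
    if "teh" in text:
        return text.replace("teh", "the")
    return "APPROVED"


result, approved = reflect("teh answer is 42", toy_critic)
print(result, approved)
```

Returning the `approved` flag alongside the text lets the caller decide what to do when the budget runs out, for example escalating to a human instead of silently shipping an unreviewed answer.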
Pattern 2 — Plan-and-Execute: plan before acting
In the standard ReAct loop (covered in lesson 11), the agent decides the next action at each step — it's greedy, step-by-step reasoning. This works well for short tasks, but in long tasks the agent can lose the thread, change strategy mid-way, or waste tokens repeating context.
Plan-and-Execute separates responsibilities into two phases:
- Planning phase: the model receives the objective and produces a structured plan — an ordered list of subtasks. No tools are called yet.
- Execution phase: each subtask in the plan is executed in sequence (or in parallel, if independent), possibly by specialized agents.
The advantage is clarity: the plan is an inspectable artifact. You can log it, validate it, and even show it to the user before executing, which makes room for the human-in-the-loop pattern we'll see shortly.
The disadvantage is rigidity: if the world changes during execution (a tool returns an error, data doesn't exist), the original plan may become invalid. Good implementations include a re-planning step when a subtask fails.
When to use: long research tasks, complex document generation, data pipelines where sequence matters. Avoid for short, unpredictable tasks — the planning overhead isn't worth it.
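A minimal sketch of the two phases, including a re-planning step when a subtask fails, might look like the following. All names here (`plan_and_execute`, `planner`, `executor`, the toy functions) are hypothetical; in a real system `planner` would be a model call and `executor` would invoke tools or sub-agents. Subtasks run sequentially in this sketch.

```python
from typing import Callable, List


def plan_and_execute(
    objective: str,
    planner: Callable[[str, List[str]], List[str]],
    executor: Callable[[str], str],
    max_replans: int = 1,
) -> List[str]:
    """Plan first, then execute subtasks in order, re-planning on failure.

    planner(objective, results_so_far) returns the remaining subtasks.
    executor(subtask) returns a result string or raises on failure.
    """
    results: List[str] = []
    plan = planner(objective, results)  # planning phase: no tools called yet
    replans = 0
    while plan:
        subtask = plan.pop(0)
        try:
            results.append(executor(subtask))
        except Exception as exc:
            if replans >= max_replans:
                raise RuntimeError(f"giving up on {subtask!r}") from exc
            replans += 1
            results.append(f"FAILED {subtask}: {exc}")
            # Re-plan with everything known so far, including the failure.
            plan = planner(objective, results)
    return results


# Toy planner and executor so the sketch runs without a model or tools.
def toy_planner(objective: str, done: List[str]) -> List[str]:
    if any(r.startswith("FAILED") for r in done):
        return ["use cached data", "write summary"]
    return ["fetch live data", "write summary"]


def toy_executor(subtask: str) -> str:
    if subtask == "fetch live data":
        raise ValueError("source unavailable")
    return f"done: {subtask}"


log = plan_and_execute("summarize sales", toy_planner, toy_executor)
print(log)
```

Because the plan is an ordinary list, it can be logged or shown to a user before the loop starts. The `max_replans` cap plays the same role as `max_iterations` in Reflection: it bounds the cost of a plan that never stabilizes.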
Pattern 3 — Multi-agent: supervisor, hierarchical, and swarm
Multi-agent is the pattern that appears most in demos and generates the most unnecessarily complex architectures. Before adopting it, understand the three variants:
Supervisor/Orchestrator → Sub-agents: a central agent receives the objective, decides which specialized sub-agent to trigger, and consolidates results. It's the most common and most controllable pattern. The supervisor doesn't execute tasks directly — it delegates.
Hierarchical: the supervisor can have sub-supervisors, which in turn have sub-agents. Useful for very large domains (e.g., a company with distinct departments). For a fixed fan-out, the number of agents grows exponentially with depth, and every extra level adds another round of model calls to each request, so use it carefully.
Peers/Swarm: agents without explicit hierarchy collaborate via messages. Each agent can initiate interactions with others. It's the most flexible and hardest-to-debug pattern — control flow emerges from collective behavior, not a central orchestrator.
The diagram below shows the supervisor → sub-agents pattern, which is the recommended starting point for any multi-agent system.
The critical point: each boundary between agents is a model call, which means added latency, added cost, and a new failure point. A system with five agents has roughly five times as many places where a hallucination can be introduced and then propagated downstream. Always start with the simplest agent that solves the problem.
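To make the compounding concrete, here is a back-of-the-envelope calculation. It assumes, as a simplification, that each model call fails independently with the same probability; real failures are often correlated, so treat the numbers as illustrative only. The function name and the 95% figure are hypothetical.

```python
def chain_success_probability(per_call_success: float, num_calls: int) -> float:
    """Probability that every call in a sequential chain succeeds.

    Assumes independent calls with identical success probability.
    """
    if not 0.0 <= per_call_success <= 1.0:
        raise ValueError("per_call_success must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return per_call_success ** num_calls


single = chain_success_probability(0.95, 1)
five = chain_success_probability(0.95, 5)
print(f"1 call:  {1 - single:.1%} chance of at least one bad step")
print(f"5 calls: {1 - five:.1%} chance of at least one bad step")
```

Under these assumptions, a chain of five calls at 95% reliability each has about a 22.6% chance of containing at least one bad step, versus 5% for a single call.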
Supervisor → Sub-agents Pattern
The orchestrator receives the user's objective, decomposes it into subtasks, and delegates to specialized sub-agents. Each sub-agent has its own toolset. The orchestrator consolidates results before responding.
- Supervisor Agent (Orchestrator)
- Search Agent
- Code Agent
- Writer Agent
- Web Search · external API
- Code Executor · sandbox
- Knowledge Base · RAG / vectors
- Human Approval
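The delegate-and-consolidate part of the supervisor pattern can be sketched as follows. This is an illustration under stated assumptions: `supervise` and the toy agents are hypothetical, and the routing decision (which sub-agent gets which instruction) is passed in as data, whereas in a real system the supervisor would produce it with a model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

SubAgent = Callable[[str], str]


def supervise(
    subtasks: List[Tuple[str, str]],
    agents: Dict[str, SubAgent],
) -> str:
    """Delegate (agent_name, instruction) pairs and consolidate the outputs.

    The supervisor never executes a task itself; it only routes and merges.
    """
    unknown = [name for name, _ in subtasks if name not in agents]
    if unknown:
        # Fail fast on a bad routing decision instead of guessing.
        raise KeyError(f"no sub-agent registered for: {unknown}")
    with ThreadPoolExecutor() as pool:
        # Independent subtasks run in parallel threads.
        futures = [pool.submit(agents[name], instr) for name, instr in subtasks]
        outputs = [f.result() for f in futures]  # keeps the subtask order
    return "\n".join(outputs)


# Toy sub-agents so the sketch runs without models or tools.
agents: Dict[str, SubAgent] = {
    "search": lambda q: f"[search] 3 sources on {q}",
    "code": lambda q: f"[code] ran analysis for {q}",
    "writer": lambda q: f"[writer] drafted {q}",
}

report = supervise([("search", "Q3 churn"), ("code", "Q3 churn")], agents)
print(report)
```

Validating agent names before dispatch makes a wrong routing decision visible immediately, which is exactly the class of bug that tends to dominate multi-agent debugging.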
In practice, most systems I see in production went multi-agent too early. The team creates a supervisor, three sub-agents, and ten tools in the first week — and spends the next weeks debugging why the orchestrator picked the wrong agent. My rule: if a single agent with good prompting and the right tools solves 80% of cases, it goes to production that way. Multi-agent comes in when there's a concrete limit: context overflow, latency that requires real parallelism, or domains so distinct that a single prompt can't cover them well. Architectural complexity has a cost — and that cost is paid in debugging, not tokens.
Human-in-the-Loop: the most underrated pattern
Human-in-the-loop (HITL) is not a system limitation — it's a deliberate architectural decision. Instead of letting the agent execute a full plan autonomously, you insert a human approval point before irreversible or high-impact actions.
Concrete examples: before sending an email on behalf of the user, before deleting database records, before publishing content, before executing a financial transaction. The practical rule: if the action can't be easily undone, put a human in the path.
Implementing HITL is conceptually simple: the agent stops, serializes the current state (plan + context), and waits for an external confirmation. On AWS, this translates to an SQS queue or a Step Functions task that uses the waitForTaskToken callback pattern. The real challenge is UX: how do you present the agent's state so that the human understands what they're approving?
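The stop, serialize, and wait cycle can be illustrated with an in-memory approval gate. This is only a sketch: `PendingAction` and `ApprovalGate` are hypothetical names, and the dictionary stands in for whatever durable store or workflow service a real deployment would use.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class PendingAction:
    action: str
    plan: List[str]
    context: Dict[str, str]


class ApprovalGate:
    """In-memory stand-in for a durable queue or workflow service."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def request(self, pending: PendingAction) -> str:
        """Serialize the agent state and return a token for the reviewer."""
        token = str(uuid.uuid4())
        self._pending[token] = json.dumps(asdict(pending))
        return token

    def describe(self, token: str) -> str:
        """Return the serialized state a reviewer would be shown."""
        return self._pending[token]

    def resolve(self, token: str, approved: bool) -> Optional[PendingAction]:
        """Resume with the stored state if approved, otherwise drop it."""
        payload = self._pending.pop(token)  # KeyError on unknown or reused token
        if not approved:
            return None
        return PendingAction(**json.loads(payload))


gate = ApprovalGate()
token = gate.request(
    PendingAction("send_email", ["draft", "send"], {"to": "user@example.com"})
)
resumed = gate.resolve(token, approved=True)
print(resumed)
```

Each token is single-use, so an approval cannot be replayed to trigger the same irreversible action twice.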
HITL is also a learning mechanism. Each approval or rejection is evaluation data — you can use these signals to improve prompts, identify error patterns, and calibrate when the agent can operate more autonomously.
Don't confuse HITL with distrust of the model. It's risk management. Mature systems start with more approval points and remove them gradually as the agent demonstrates reliability — not the other way around.
Which pattern to use?
Simple Agent (ReAct)
Pros:
- Low latency, easy to debug
- Predictable cost per call
- No coordination overhead
Cons:
- Limited context for long tasks
- No automatic output review
Default starting point. Use until you hit a concrete limit.
Reflection
Pros:
- Improves quality without changing architecture
- Easy to add to any existing agent
Cons:
- Doubles (or more) token cost
- Risk of loop without max_iterations
Add when quality matters more than speed.
Plan-and-Execute
Pros:
- Inspectable and auditable plan
- Enables HITL before execution
- Good for long, predictable tasks
Cons:
- Rigid: fails if the environment changes
- Planning overhead for short tasks
Use when sequence matters and you want plan visibility.
Supervisor Multi-agent
Pros:
- Real parallelism between sub-agents
- Domain specialization
Cons:
- High latency and coordination cost
- Multiple failure points
- Complex debugging
Use when a single agent has a concrete context or domain limit.
Frequently asked questions
Can I combine Reflection with Plan-and-Execute?
Yes, and it makes sense. You can apply Reflection in the planning phase (review the plan before executing) and/or on the final output of each subtask. Cost increases, but so does quality. Evaluate by the impact of failure — the more costly the mistake, the more justified the review cost.
Is Swarm suitable for production?
It can be, but it requires very good observability. Without a central orchestrator, tracing why the system made a specific decision is hard. If you use swarm in production, invest heavily in distributed tracing — each message between agents must be logged with enough context to reconstruct the flow.
How to implement HITL on AWS?
The most robust pattern is Step Functions with waitForTaskToken: the workflow pauses at the task, sends the task token to a queue or notification, and resumes when the human's decision is reported back with that token (SendTaskSuccess to continue, SendTaskFailure to reject). For simpler cases, an SQS queue with an approval Lambda works well. Bedrock AgentCore (lesson 17) has native HITL support; we'll cover that in detail.
What's the difference between hierarchical multi-agent and Plan-and-Execute?
Plan-and-Execute is about separating the reasoning and action phases within an agent (or system). Hierarchical is about the organizational structure of multiple agents with supervision levels. You can have a hierarchical system that uses Plan-and-Execute internally at each level — they're different dimensions of the architecture.
My direct take
Of these three patterns, Reflection is the one I recommend adding first to any production agent — the cost is low and the quality gain is immediate. Plan-and-Execute comes in when you need auditability or HITL before critical actions. Multi-agent is for when you have a concrete, measurable limit that a single agent can't overcome. The order matters: don't skip steps. Most problems that seem to require five agents are solved with one well-designed agent, good tools, and Reflection. Architectural complexity is an investment — demand ROI before paying it.
Quick check
1. When is multi-agent usually the wrong choice?
2. The Reflection pattern is about…