Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

The AI Architect Track

Module 3 · Agents · Lesson 13/22

Patterns: reflection, plan-execute and multi-agent

Agent architecture patterns and, above all, when NOT to use multi-agent.

7 min read

An agent with a ReAct loop already solves a lot — but there are problem classes where a single agent simply falls short: tasks too long for the context window, work that requires parallel specialization, or outputs that need review before reaching the user. For those cases, architectural patterns exist. This lesson is a practical catalog of those patterns — and an honest warning about when they become a trap.

Pattern 1 — Reflection: the agent reviews its own output

Reflection is the simplest pattern in the catalog and, for that reason, the most underestimated. The idea: after generating a response, the agent makes a second pass — with a critique prompt — and decides whether the result is good enough or needs revision.

Think of it like a developer who writes code and reads it again before opening the PR. The model doesn't change its nature; it just receives the context of "reviewer" instead of "author".

How to implement: pass the previous output back to the model with an instruction like "Review the response below. Identify factual errors, reasoning gaps, and inconsistencies. If correct, respond APPROVED. Otherwise, rewrite." You can iterate N times or until APPROVED is received.

When to use: code generation, reports, long responses where a single hallucination is costly. The cost is one extra model call per review pass, which is cheap compared to delivering garbage to the user.

Watch out for infinite loops: always define a max_iterations. Without it, a model that never converges will consume tokens indefinitely. Reflection doesn't guarantee correctness — it guarantees a second chance to catch the mistake.
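A minimal Reflection loop can be sketched in Python. The `model` callable, the exact-match `APPROVED` check, and the toy `fake_model` are assumptions for illustration; swap in your own model client. The loop returns an approval flag alongside the text so callers can tell a converged answer from one that merely ran out of iterations.

```python
from typing import Callable, Tuple

# Hypothetical model interface: takes a prompt, returns the model's text.
ModelFn = Callable[[str], str]

CRITIQUE_PROMPT = (
    "Review the response below. Identify factual errors, reasoning gaps, "
    "and inconsistencies. If correct, respond APPROVED. Otherwise, rewrite.\n\n"
    "Task: {task}\n\nResponse:\n{draft}"
)


def reflect(task: str, model: ModelFn, max_iterations: int = 3) -> Tuple[str, int, bool]:
    """Author pass, then up to max_iterations reviewer passes.

    Returns (final_text, review_passes_used, approved).
    """
    draft = model(task)  # author pass
    for i in range(1, max_iterations + 1):
        review = model(CRITIQUE_PROMPT.format(task=task, draft=draft))  # reviewer pass
        # Exact-match sentinel; real models may need a more tolerant check.
        if review.strip() == "APPROVED":
            return draft, i, True
        draft = review  # the reviewer rewrote the answer; review the rewrite next
    # Budget exhausted: return the latest draft, flagged as NOT approved.
    return draft, max_iterations, False


# Toy model for illustration: answers wrong once, then the reviewer fixes it.
def fake_model(prompt: str) -> str:
    if prompt.startswith("Review"):
        return "APPROVED" if "Paris" in prompt else "The capital of France is Paris."
    return "The capital of France is Lyon."


print(reflect("What is the capital of France?", fake_model))
# prints ('The capital of France is Paris.', 2, True)
```

Here the first review rewrites the wrong answer and the second review approves it, so the call costs three model invocations in total.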

Pattern 2 — Plan-and-Execute: plan before acting

In the standard ReAct loop (covered in lesson 11), the agent decides the next action at each step — it's greedy, step-by-step reasoning. This works well for short tasks, but in long tasks the agent can lose the thread, change strategy mid-way, or waste tokens repeating context.

Plan-and-Execute separates responsibilities into two phases:

Planning phase: the model receives the objective and produces a structured plan — an ordered list of subtasks. No tools are called yet.
Execution phase: each subtask in the plan is executed in sequence (or in parallel, if independent), possibly by specialized agents.

The advantage is clarity: the plan is an inspectable artifact. You can log it, validate it, and even show it to the user before executing, which opens the door to the human-in-the-loop pattern we'll see shortly.

The disadvantage is rigidity: if the world changes during execution (a tool returns an error, data doesn't exist), the original plan may become invalid. Good implementations include a re-planning step when a subtask fails.
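A minimal sketch of Plan-and-Execute with a re-planning step, assuming a hypothetical `plan` callable that accepts a note about the last failure and a hypothetical `execute` callable that raises `RuntimeError` when a subtask fails. Real planners would be model calls returning structured output; here they are plain functions so the control flow stays visible.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: the planner turns (objective, failure note) into an
# ordered list of subtasks; the executor runs one subtask and raises on failure.
Planner = Callable[[str, Optional[str]], List[str]]
Executor = Callable[[str], str]


def plan_and_execute(objective: str, plan: Planner, execute: Executor,
                     max_replans: int = 2) -> Tuple[List[List[str]], List[str]]:
    """Returns (every plan produced, results of the plan that succeeded)."""
    plans: List[List[str]] = []
    failure_note: Optional[str] = None
    for _ in range(max_replans + 1):
        steps = plan(objective, failure_note)  # planning phase: no tools called yet
        plans.append(steps)                    # the plan is an artifact: log, validate or show it
        results: List[str] = []
        for step in steps:                     # execution phase, in order
            try:
                results.append(execute(step))
            except RuntimeError as err:
                # The environment changed under the plan: re-plan with the failure as context.
                # (This sketch discards partial results; a real system might keep them.)
                failure_note = f"step {step!r} failed: {err}"
                break
        else:
            return plans, results              # every step succeeded
    raise RuntimeError(f"gave up after {max_replans} re-plans; last failure: {failure_note}")


# Toy planner/executor for illustration.
def toy_planner(objective: str, failure_note: Optional[str]) -> List[str]:
    if failure_note is None:
        return ["fetch sales.csv", "summarize"]
    return ["fetch sales_backup.csv", "summarize"]  # alternative route after a failure


def toy_executor(step: str) -> str:
    if step == "fetch sales.csv":
        raise RuntimeError("file not found")
    return f"done: {step}"


plans, results = plan_and_execute("Summarize Q3 sales", toy_planner, toy_executor)
print(len(plans), results)
# prints 2 ['done: fetch sales_backup.csv', 'done: summarize']
```

Keeping every plan in `plans` is what makes the approach auditable: the log shows not only what ran, but why the strategy changed. The `max_replans` cap plays the same role as `max_iterations` in Reflection.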

When to use: long research tasks, complex document generation, data pipelines where sequence matters. Avoid for short, unpredictable tasks — the planning overhead isn't worth it.

Pattern 3 — Multi-agent: supervisor, hierarchical, and swarm

Multi-agent is the pattern that appears most in demos and generates the most unnecessarily complex architectures. Before adopting it, understand the three variants:

Supervisor/Orchestrator → Sub-agents: a central agent receives the objective, decides which specialized sub-agent to trigger, and consolidates results. It's the most common and most controllable pattern. The supervisor doesn't execute tasks directly — it delegates.

Hierarchical: the supervisor can have sub-supervisors, which in turn have sub-agents. Useful for very large domains (e.g., a company with distinct departments). Coordination cost grows quickly with depth, because every extra level adds its own routing decisions, model calls, and failure points, so use it carefully.

Peers/Swarm: agents without explicit hierarchy collaborate via messages. Each agent can initiate interactions with others. It's the most flexible and hardest-to-debug pattern — control flow emerges from collective behavior, not a central orchestrator.

The diagram below shows the supervisor → sub-agents pattern, which is the recommended starting point for any multi-agent system.

The critical point: each boundary between agents is a model call, which means added latency, added cost, and a new failure point. In a five-agent system, a hallucination can enter at any hand-off and propagate to every downstream agent that consumes that output. Always start with the simplest agent that solves the problem.
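To see why hand-offs add up, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that each hand-off succeeds 95% of the time and that failures are independent; real failures are often correlated, so treat the numbers as intuition rather than a measurement.

```python
def chain_success(per_hop: float, hops: int) -> float:
    """Probability that every hand-off succeeds, assuming hops fail independently."""
    return per_hop ** hops


for hops in (1, 3, 5):
    print(hops, round(chain_success(0.95, hops), 3))
# prints:
# 1 0.95
# 3 0.857
# 5 0.774
```

Under these assumptions, a chain of five hand-offs that are each "95% reliable" completes cleanly only about three times out of four.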

Supervisor → Sub-agents Pattern

The orchestrator receives the user's objective, decomposes it into subtasks, and delegates to specialized sub-agents. Each sub-agent has its own toolset. The orchestrator consolidates results before responding.

🧠 Orchestration layer: Supervisor Agent (orchestrator)

⚙️ Specialized sub-agents: Search Agent · Code Agent · Writer Agent

🔧 Tools: Web Search (external API) · Code Executor (sandbox) · Knowledge Base (RAG / vectors)

🛡️ Human-in-the-Loop: Human Approval
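The supervisor flow can be sketched in a few lines of Python. The sub-agents, the fixed `decompose` split, and the task wording are hypothetical stand-ins; in a real system `decompose` and each sub-agent would be separate model calls with their own tools. Because these particular subtasks are independent, the sketch runs them in parallel with `ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical sub-agents: plain functions standing in for separate model calls,
# each of which would have its own toolset in a real system.
SubAgent = Callable[[str], str]
SUB_AGENTS: Dict[str, SubAgent] = {
    "search": lambda task: f"[search] {task}",
    "code": lambda task: f"[code] {task}",
    "writer": lambda task: f"[writer] {task}",
}


def decompose(objective: str) -> List[Tuple[str, str]]:
    """Stand-in for the supervisor's planning call: returns (agent, subtask) pairs.
    A real supervisor would ask the model; this fixed split is illustrative."""
    return [
        ("search", f"find sources on {objective}"),
        ("code", f"compute figures for {objective}"),
        ("writer", f"draft an outline for {objective}"),
    ]


def supervise(objective: str) -> str:
    assignments = decompose(objective)
    # Validate routing first: picking the wrong agent is the classic supervisor bug.
    unknown = [name for name, _ in assignments if name not in SUB_AGENTS]
    if unknown:
        raise ValueError(f"supervisor routed to unknown sub-agents: {unknown}")
    if not assignments:
        return ""
    # The supervisor delegates; it never does the work itself.
    # These subtasks are independent, so run them in parallel; map() keeps input order.
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        results = list(pool.map(lambda pair: SUB_AGENTS[pair[0]](pair[1]), assignments))
    # Consolidation: merge results (a real supervisor might make one more model call).
    return "\n".join(results)


print(supervise("Q3 revenue"))
```

Note that every entry in `SUB_AGENTS` is another prompt to maintain and another routing choice the supervisor can get wrong.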

In practice: start with one agent, not five

Senior Solutions Architect

In practice, most systems I see in production went multi-agent too early. The team creates a supervisor, three sub-agents, and ten tools in the first week — and spends the next weeks debugging why the orchestrator picked the wrong agent. My rule: if a single agent with good prompting and the right tools solves 80% of cases, it goes to production that way. Multi-agent comes in when there's a concrete limit: context overflow, latency that requires real parallelism, or domains so distinct that a single prompt can't cover them well. Architectural complexity has a cost — and that cost is paid in debugging, not tokens.

Human-in-the-Loop: the most underrated pattern

Human-in-the-loop (HITL) is not a system limitation — it's a deliberate architectural decision. Instead of letting the agent execute a full plan autonomously, you insert a human approval point before irreversible or high-impact actions.

Concrete examples: before sending an email on behalf of the user, before deleting database records, before publishing content, before executing a financial transaction. The practical rule: if the action can't be easily undone, put a human in the path.

Implementing HITL is conceptually simple — the agent stops, serializes the current state (plan + context), and waits for an external confirmation. On AWS, this translates to an SQS queue or a Step Functions waitForTaskToken. The real challenge is UX: how do you present the agent's state in a way the human understands what they're approving?
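A minimal, framework-free sketch of the stop, serialize, and wait cycle. The `IRREVERSIBLE` set, the `PendingApproval` fields, and `fake_execute` are assumptions for illustration; in production the serialized state would sit in a queue or database while a human reviews it.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List

# Hypothetical action names that cannot be easily undone.
IRREVERSIBLE = {"send_email", "delete_records", "publish_content", "transfer_funds"}

Execute = Callable[[str, Dict[str, Any]], str]


@dataclass
class PendingApproval:
    action: str
    args: Dict[str, Any]
    plan: List[str]
    context: str


def request_action(action: str, args: Dict[str, Any], plan: List[str],
                   context: str, execute: Execute) -> Dict[str, str]:
    if action in IRREVERSIBLE:
        # Stop and serialize the state (plan + context) so it can wait outside this
        # process, e.g. in a queue or database, until a human decides.
        state = json.dumps(asdict(PendingApproval(action, args, plan, context)))
        return {"status": "pending_approval", "state": state}
    return {"status": "done", "result": execute(action, args)}


def resume(state: str, approved: bool, execute: Execute) -> str:
    pending = PendingApproval(**json.loads(state))
    if not approved:
        return f"rejected: {pending.action}"  # also worth logging as evaluation data
    return execute(pending.action, pending.args)


def fake_execute(action: str, args: Dict[str, Any]) -> str:
    return f"executed {action} with {args}"


r = request_action("send_email", {"to": "cfo@example.com"},
                   ["draft email", "send email"], "user asked to notify the CFO", fake_execute)
print(r["status"])  # prints pending_approval
print(resume(r["state"], approved=True, execute=fake_execute))
# prints executed send_email with {'to': 'cfo@example.com'}
```

The `plan` and `context` fields are what the approval UI would render, which is where the UX challenge lives.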

HITL is also a learning mechanism. Each approval or rejection is evaluation data — you can use these signals to improve prompts, identify error patterns, and calibrate when the agent can operate more autonomously.

Don't confuse HITL with distrust of the model. It's risk management. Mature systems start with more approval points and remove them gradually as the agent demonstrates reliability — not the other way around.
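One way to turn approvals into calibration data is to count decisions per action type and relax an approval point only after enough evidence. The class name and thresholds below are hypothetical, and the sketch deliberately keeps irreversible actions gated no matter how good the track record looks.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ApprovalLog:
    """Turns human approve/reject decisions into evaluation data per action type.

    Thresholds are illustrative assumptions, not recommendations.
    """

    def __init__(self, min_samples: int = 50, min_approval_rate: float = 0.98):
        self.min_samples = min_samples
        self.min_approval_rate = min_approval_rate
        self.decisions: DefaultDict[str, List[bool]] = defaultdict(list)

    def record(self, action: str, approved: bool) -> None:
        self.decisions[action].append(approved)

    def approval_rate(self, action: str) -> float:
        history = self.decisions.get(action, [])
        return sum(history) / len(history) if history else 0.0

    def needs_human(self, action: str, irreversible: bool) -> bool:
        if irreversible:
            return True  # keep a human in the path of anything that can't be undone
        history = self.decisions.get(action, [])
        # Relax the approval point only after enough evidence of reliability.
        return len(history) < self.min_samples or self.approval_rate(action) < self.min_approval_rate


log = ApprovalLog(min_samples=3, min_approval_rate=0.9)
for _ in range(3):
    log.record("draft_reply", approved=True)
print(log.needs_human("draft_reply", irreversible=False))  # prints False
log.record("draft_reply", approved=False)
print(log.needs_human("draft_reply", irreversible=False))  # prints True (rate is now 0.75)
```

A single rejection puts the human back in the path, which matches the direction the text recommends: approval points are removed gradually and can return.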

Which pattern to use?

Simple Agent (ReAct)

Pros

Low latency, easy to debug
Predictable cost per call
No coordination overhead

Cons

Limited context for long tasks
No automatic output review

Default starting point. Use until you hit a concrete limit.

Reflection

Pros

Improves quality without changing architecture
Easy to add to any existing agent

Cons

Doubles (or more) token cost
Risk of loop without max_iterations

Add when quality matters more than speed.

Plan-and-Execute

Pros

Inspectable and auditable plan
Enables HITL before execution
Good for long, predictable tasks

Cons

Rigid: fails if the environment changes
Planning overhead for short tasks

Use when sequence matters and you want plan visibility.

Supervisor Multi-agent

Pros

Real parallelism between sub-agents
Domain specialization

Cons

High latency and coordination cost
Multiple failure points
Complex debugging

Use when a single agent has a concrete context or domain limit.

Key points from this lesson

Reflection = second pass with a critique prompt. Cheap to implement, effective for quality. Always define max_iterations.

Plan-and-Execute separates reasoning from execution. The plan is an inspectable artifact — great for HITL and auditing.

Supervisor multi-agent is the recommended starting point when you truly need multiple agents. Swarm is powerful and hard to debug.

Each boundary between agents = latency + cost + failure point. Premature multi-agent is a classic anti-pattern.

Human-in-the-loop is not weakness — it's risk management. Put humans in the path of irreversible actions.

Start simple. Add architectural complexity only when you hit a concrete limit, not an anticipated one.

Frequently asked questions

Can I combine Reflection with Plan-and-Execute?

Yes, and it makes sense. You can apply Reflection in the planning phase (review the plan before executing) and/or on the final output of each subtask. Cost increases, but so does quality. Evaluate by the impact of failure — the more costly the mistake, the more justified the review cost.

Is Swarm suitable for production?

It can be, but it requires very good observability. Without a central orchestrator, tracing why the system made a specific decision is hard. If you use swarm in production, invest heavily in distributed tracing — each message between agents must be logged with enough context to reconstruct the flow.
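A minimal sketch of the logging this requires: every message carries a shared `trace_id` and a `parent_id` pointing at the message that caused it, which is enough to rebuild the flow afterwards. The message fields and the in-memory `LOG` are assumptions; in production you would more likely emit spans to an existing distributed-tracing system such as OpenTelemetry.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AgentMessage:
    trace_id: str      # one id shared by every message for the same user request
    sender: str
    recipient: str
    content: str
    parent_id: Optional[str] = None  # id of the message that triggered this one
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)


LOG: List[AgentMessage] = []


def send(msg: AgentMessage) -> AgentMessage:
    LOG.append(msg)  # stand-in for emitting to a real tracing backend
    return msg


def reconstruct(trace_id: str, log: List[AgentMessage]) -> List[str]:
    """Rebuild the causal chain of one request as indented lines."""
    children: Dict[Optional[str], List[AgentMessage]] = {}
    for m in log:
        if m.trace_id == trace_id:
            children.setdefault(m.parent_id, []).append(m)
    lines: List[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        for m in children.get(parent, []):
            lines.append("  " * depth + f"{m.sender} -> {m.recipient}: {m.content}")
            walk(m.msg_id, depth + 1)

    walk(None, 0)
    return lines


trace = uuid.uuid4().hex
a = send(AgentMessage(trace, "user", "researcher", "compare vendors"))
b = send(AgentMessage(trace, "researcher", "analyst", "score these 3 vendors", parent_id=a.msg_id))
send(AgentMessage(trace, "analyst", "writer", "vendor B scores highest", parent_id=b.msg_id))
print("\n".join(reconstruct(trace, LOG)))
```

Without `parent_id`, the log would only show that messages happened, not which agent's output triggered which decision.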

How to implement HITL on AWS?

The most robust pattern is Step Functions with waitForTaskToken: the agent stops, sends the token to a queue or notification, and resumes when the human responds with the token. For simpler cases, an SQS queue with an approval Lambda works well. Bedrock AgentCore (lesson 17) has native HITL support — we'll cover that in detail.
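A sketch of the Step Functions variant, written as Python that builds the Amazon States Language definition. The queue URL, state names, 24-hour timeout, and the `$.plan` input field are assumptions; the `.waitForTaskToken` resource suffix, `$$.Task.Token`, and the `send_task_success`/`send_task_failure` calls follow the Step Functions callback pattern. The client is passed in (assumed to be `boto3.client("stepfunctions")`) so the logic can be tested without AWS.

```python
import json
from typing import Any, Dict

# Hypothetical queue URL; replace with your own resource.
APPROVAL_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/agent-approvals"


def approval_state_machine(queue_url: str) -> Dict[str, Any]:
    """Amazon States Language definition that pauses until a human returns the task token."""
    return {
        "StartAt": "WaitForHumanApproval",
        "States": {
            "WaitForHumanApproval": {
                "Type": "Task",
                # .waitForTaskToken: send the message, then pause until
                # SendTaskSuccess or SendTaskFailure is called with this token.
                "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
                "Parameters": {
                    "QueueUrl": queue_url,
                    "MessageBody": {
                        "plan.$": "$.plan",              # assumes the input carries the plan
                        "taskToken.$": "$$.Task.Token",  # token the approver must send back
                    },
                },
                "TimeoutSeconds": 86400,  # stop waiting after 24 hours
                "Catch": [{"ErrorEquals": ["Rejected", "States.Timeout"], "Next": "NotApproved"}],
                "Next": "ExecutePlan",
            },
            "ExecutePlan": {"Type": "Pass", "End": True},  # placeholder for the real work
            "NotApproved": {"Type": "Fail", "Error": "NotApproved",
                            "Cause": "Plan rejected by a human or approval timed out"},
        },
    }


def handle_human_decision(sfn_client: Any, task_token: str, approved: bool, reviewer: str) -> None:
    """Called by the approval Lambda or UI. sfn_client is assumed to be boto3.client("stepfunctions")."""
    if approved:
        sfn_client.send_task_success(taskToken=task_token,
                                     output=json.dumps({"approved_by": reviewer}))
    else:
        sfn_client.send_task_failure(taskToken=task_token, error="Rejected",
                                     cause=f"rejected by {reviewer}")


print(json.dumps(approval_state_machine(APPROVAL_QUEUE_URL), indent=2))
```

Routing rejections and timeouts to an explicit `Fail` state keeps "a human said no" distinguishable from "something crashed" in the execution history.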

What's the difference between hierarchical multi-agent and Plan-and-Execute?

Plan-and-Execute is about separating the reasoning and action phases within an agent (or system). Hierarchical is about the organizational structure of multiple agents with supervision levels. You can have a hierarchical system that uses Plan-and-Execute internally at each level — they're different dimensions of the architecture.

My direct take

Start simple. Scale with evidence.

Of these three patterns, Reflection is the one I recommend adding first to any production agent — the cost is low and the quality gain is immediate. Plan-and-Execute comes in when you need auditability or HITL before critical actions. Multi-agent is for when you have a concrete, measurable limit that a single agent can't overcome. The order matters: don't skip steps. Most problems that seem to require five agents are solved with one well-designed agent, good tools, and Reflection. Architectural complexity is an investment — demand ROI before paying it.

Quiz

Quick check

1. When is multi-agent usually the wrong choice?

2. The Reflection pattern is about…

References and further reading

LangGraph: Multi-agent architectures (LangChain docs)
Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023)
Plan-and-Solve Prompting (Wang et al., 2023)
Amazon Bedrock: Multi-agent collaboration
AWS Step Functions: Wait for a callback with a task token
Anthropic: Building effective agents
