Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

AI & Agents · Comparison

Agentic RAG on AWS: Architecture Bake-Off for Financial-Grade Platforms

Jun 13, 2026 · 8 min · Expert · AI-assisted


fernando.moretes.com

Agentic RAG has moved from lab experiment to platform requirement in financial environments that demand auditability, cost control, and predictable latency. In this article I compare four concrete architectural approaches on AWS, with real trade-offs, plausible numbers, and an unambiguous recommendation.

When Google Research published its analysis on reliable Agentic RAG for enterprise platforms, the signal I captured was not about models — it was about orchestration, governance, and where control responsibility should live in the stack. In financial environments where every LLM call needs to be auditable, every retrieval failure needs to be traceable, and every inference dollar needs to be justifiable, the choice of agent orchestration architecture is not an implementation detail: it is a first-class architectural decision. I have four serious candidates on AWS and I am going to put them head to head.

The Real Problem: Why Naive RAG Falls Short in Finance

Classic RAG — embed, retrieve, generate — solves the static knowledge problem but breaks along three critical dimensions in financial environments. First, multi-hop reasoning: a query like "what is customer X's consolidated credit exposure considering derivatives and open positions as of T-2?" requires multiple retrieval steps with dependencies between them, not a single vector search. Second, tool use: computing VaR, querying a credit limit system, or triggering a real-time pricing API are actions, not documents — and naive RAG has no action mechanism. Third, regulatory traceability: under BACEN, CVM, or SEC, the reasoning chain that led to an automated recommendation or decision must be reconstructible, not just the final output.

Agentic RAG solves all three by introducing a plan-execute-observe loop (ReAct, MRKL, or variants) where the agent dynamically decides which tools to invoke, in what order, and when the answer is good enough to terminate. The cost of that capability is operational complexity: agent loops introduce non-determinism, variable latency, infinite-loop risk, and an expanded attack surface. The central architectural choice is where that loop lives and who controls it — and that is precisely where the four approaches diverge materially.
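
A minimal plain-Python sketch of the plan-execute-observe loop with a hard step ceiling. It is deliberately framework-neutral: `decide` stands in for the model call, `tools` for the tool registry, and the `Action`/`Final` types are hypothetical, not any library's API. The default ceiling of 8 matches the figure this article recommends for financial use cases.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Action:
    """The model asks to call a tool (hypothetical shape)."""
    tool: str
    argument: str


@dataclass
class Final:
    """The model decides it has enough to answer."""
    answer: str


Decision = Union[Action, Final]


@dataclass
class LoopResult:
    answer: str
    steps: int
    observations: List[str] = field(default_factory=list)
    hit_step_limit: bool = False


def run_agent(
    question: str,
    decide: Callable[[str, List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> LoopResult:
    """Plan-execute-observe loop that never runs more than max_steps reasoning steps."""
    observations: List[str] = []
    for step in range(1, max_steps + 1):
        decision = decide(question, observations)  # plan
        if isinstance(decision, Final):
            return LoopResult(decision.answer, step, observations)
        if decision.tool not in tools:  # refuse tools outside the allowlist
            observations.append(f"error: tool {decision.tool!r} not allowed")
            continue
        observations.append(tools[decision.tool](decision.argument))  # execute, then observe
    # Ceiling reached: stop spending tokens and report it explicitly instead of guessing.
    return LoopResult("", max_steps, observations, hit_step_limit=True)
```

The step ceiling lives in the orchestrator, not in the prompt, so a model that never decides to stop still terminates.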

The Four Candidates: Architectural Identity of Each

Option A — Native Bedrock Agents: The agent lives entirely inside Bedrock. AWS manages the ReAct loop, tool routing (Action Groups backed by Lambda), session memory, and Knowledge Base integration (with OpenSearch Serverless as the vector backend). The operator defines the instruction prompt, the tool definitions as an OpenAPI schema, and the guardrails. Orchestration latency runs around 800 ms–1.5 s per reasoning step for Claude 3 Sonnet. Billing is per input/output token, including the extra tokens consumed by the agent's own orchestration prompts.

Option B — LangGraph + EKS: The agent loop runs in Python inside EKS pods, with LangGraph defining the state graph. The team controls every graph node and transition, checkpoints state in DynamoDB, and can integrate any retrieval backend: OpenSearch Service (provisioned), pgvector on Aurora, or Kendra. Orchestration overhead is predictable and tunable, but operational responsibility is total.

Option C — Step Functions-orchestrated: The agent loop is externalized to a Step Functions Express Workflow. Each reasoning step is a state in the machine: invoke model, evaluate, branch, retrieve, call a tool. LLM non-determinism is contained within individual states; the orchestration itself is deterministic, auditable through execution logs and X-Ray, and can be re-run from a logged input. Lambda executes the tools; Bedrock or SageMaker serves the model.
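
A sketch of what Option C's loop could look like in Amazon States Language, built as a Python dict so it can be serialized with `json.dumps`. Assumptions: the state input carries an Anthropic Messages request body under `modelRequest`, the optimized `bedrock:invokeModel` integration returns the parsed response under `Body`, and both Lambda function names are hypothetical. The tool Lambda is assumed to return the full updated state, including the next `modelRequest`.

```python
import json


def build_agent_definition(
    model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
    tool_fn: str = "hypothetical-tool-executor",
    validator_fn: str = "hypothetical-validate-tool-output",
    max_steps: int = 8,
) -> dict:
    """Amazon States Language for a bounded agent loop; every step is a named state."""
    return {
        "Comment": "Agentic RAG loop with a hard step ceiling (sketch)",
        "StartAt": "InitStepCounter",
        "States": {
            "InitStepCounter": {
                "Type": "Pass",
                "Result": {"count": 0},
                "ResultPath": "$.step",
                "Next": "InvokeModel",
            },
            "InvokeModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::bedrock:invokeModel",
                "Parameters": {"ModelId": model_id, "Body.$": "$.modelRequest"},
                # Keep only the fields the Choice state and tool executor need.
                "ResultSelector": {
                    "stop_reason.$": "$.Body.stop_reason",
                    "content.$": "$.Body.content",
                },
                "ResultPath": "$.model",
                "TimeoutSeconds": 60,
                # Broad retry for the sketch; narrow ErrorEquals to throttling errors in practice.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                    "JitterStrategy": "FULL",
                }],
                "Next": "IncrementStep",
            },
            "IncrementStep": {
                "Type": "Pass",
                "Parameters": {"count.$": "States.MathAdd($.step.count, 1)"},
                "ResultPath": "$.step",
                "Next": "CheckDone",
            },
            "CheckDone": {
                "Type": "Choice",
                "Choices": [
                    # The model says it is finished: stop here.
                    {"Variable": "$.model.stop_reason", "StringEquals": "end_turn", "Next": "Done"},
                    # Deterministic ceiling, independent of what the model wants.
                    {"Variable": "$.step.count", "NumericGreaterThanEquals": max_steps,
                     "Next": "StepLimitReached"},
                ],
                "Default": "ExecuteTool",  # e.g. stop_reason == "tool_use"
            },
            "ExecuteTool": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": tool_fn, "Payload.$": "$"},
                "OutputPath": "$.Payload",  # assumed: Lambda returns the full updated state
                "TimeoutSeconds": 30,
                "Next": "ValidateToolOutput",
            },
            "ValidateToolOutput": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": validator_fn, "Payload.$": "$"},
                "OutputPath": "$.Payload",
                "TimeoutSeconds": 10,
                "Next": "InvokeModel",
            },
            "StepLimitReached": {
                "Type": "Fail",
                "Error": "StepLimitReached",
                "Cause": "Agent hit the configured step ceiling",
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_agent_definition(), indent=2))
```

The serialized string is what you would pass as `definition` when creating the state machine. The loop back from ValidateToolOutput to InvokeModel is an ordinary `Next` edge; the Choice state's counter rule is what guarantees termination.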

Option D — Hybrid Bedrock + Step Functions: Bedrock Agents handles the inner ReAct loop while Step Functions orchestrates the outer business flow — input validation, context enrichment, agent invocation, post-processing, audit logging. The agent is a controlled black box, not the primary orchestrator.

Comparative Map: Four Agentic RAG Architectures on AWS

Each block below lists the main components of one architecture, starting with the layer that owns orchestration.

🟧 Option A — Bedrock Agents Native

Bedrock Agent · ReAct loop (managed)
Knowledge Base · OpenSearch Serverless
Action Groups · Lambda (OpenAPI)
Guardrails · Session Memory

🟦 Option B — LangGraph + EKS

LangGraph Pod · EKS (Fargate/EC2)
OpenSearch Service · Provisioned + kNN
DynamoDB · State checkpoint
Bedrock / SageMaker · Model endpoint

🟩 Option C — Step Functions Orchestrated

Step Functions · Express Workflow
Lambda · Tool executor
Bedrock InvokeModel · per-step call
X-Ray + CloudWatch · Full trace per step

🟨 Option D — Hybrid Bedrock + Step Functions

Step Functions · Outer orchestrator
Bedrock Agent · Inner ReAct loop
S3 + CloudWatch · Audit log sink

Critical Dimensions: Where Each Architecture Wins and Where It Bleeds

Regulatory auditability is where Step Functions (Option C) has a structural advantage. Every state transition can be recorded with its full input and output: in the execution history for Standard workflows, or in CloudWatch Logs for Express workflows when logging is set to level ALL with includeExecutionData enabled. X-Ray tracing adds per-state timing and errors on top. In a BACEN audit, I can then reconstruct exactly what the system decided at each reasoning step for any historical session, for as long as those logs are retained. Bedrock Agents (Option A) provides CloudTrail for API calls and model invocation logs in CloudWatch, but the internal reasoning loop is opaque: you see the input and the final output, not the intermediate reasoning steps, unless you explicitly set enableTrace: true in the InvokeAgent call.
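
A sketch of the create_state_machine arguments that turn on this audit trail for an Express workflow, built as a plain dict for boto3's `stepfunctions` client. The state machine name, role ARN, and log group ARN are placeholders; the retention setting on that log group is what actually determines how far back an audit can reach.

```python
import json


def express_audit_config(name: str, definition: dict, role_arn: str, log_group_arn: str) -> dict:
    """Keyword arguments for boto3's stepfunctions create_state_machine call."""
    return {
        "name": name,
        "definition": json.dumps(definition),
        "roleArn": role_arn,
        "type": "EXPRESS",
        "loggingConfiguration": {
            "level": "ALL",  # log every state transition, not only errors
            "includeExecutionData": True,  # include each state's input and output
            "destinations": [{"cloudWatchLogsLogGroup": {"logGroupArn": log_group_arn}}],
        },
        "tracingConfiguration": {"enabled": True},  # X-Ray timing per state
    }


if __name__ == "__main__":
    minimal = {"StartAt": "Done", "States": {"Done": {"Type": "Succeed"}}}
    kwargs = express_audit_config(
        "agentic-rag-audit-sketch",
        minimal,
        "arn:aws:iam::123456789012:role/hypothetical-sfn-role",
        "arn:aws:logs:us-east-1:123456789012:log-group:/aws/vendedlogs/states/agentic-rag:*",
    )
    print(json.dumps(kwargs["loggingConfiguration"], indent=2))
    # With credentials configured: boto3.client("stepfunctions").create_state_machine(**kwargs)
```

Because includeExecutionData writes full payloads to CloudWatch Logs, these logs inherit the PII-masking obligations discussed in the anti-patterns section.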

Inference cost is the second differentiator. In Bedrock Agents, each reasoning step consumes tokens for the system prompt, conversation history, retrieved context, and response. For Claude 3 Sonnet in us-east-1, that means roughly $3 per 1M input tokens and $15 per 1M output tokens. An agent with 5 reasoning steps and an average context of 4k tokens per step costs about $0.08 per session: about $0.06 in input tokens, plus the output tokens. Step Functions Express charges $1 per 1M workflow requests plus a duration charge based on memory and run time, which is practically zero, but you still pay for Bedrock tokens. The real difference is that Step Functions lets you terminate the loop early with deterministic evaluation logic, cutting unnecessary steps.
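
The arithmetic behind the ~$0.08 figure, as a small Python helper. The prices are the Claude 3 Sonnet rates quoted above; the 300 output tokens per step in the example is an assumption, since output length varies by prompt.

```python
# Claude 3 Sonnet on-demand prices quoted above (USD per 1M tokens, us-east-1).
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def session_token_cost(steps: int, input_tokens_per_step: int, output_tokens_per_step: int) -> float:
    """Token cost of one agent session; every step re-sends its full context."""
    input_cost = steps * input_tokens_per_step * INPUT_PRICE_PER_M / 1_000_000
    output_cost = steps * output_tokens_per_step * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # 5 steps x 4k input tokens; 300 output tokens per step is an assumption.
    print(f"${session_token_cost(5, 4_000, 300):.4f}")  # $0.0825
    # Same session terminated deterministically after 3 steps:
    print(f"${session_token_cost(3, 4_000, 300):.4f}")  # $0.0495
```

The second call shows why early termination matters: each step avoided removes a full context re-send, not just a few output tokens.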

P99 latency is where LangGraph+EKS (Option B) can disappoint. Pod cold starts, state serialization overhead in DynamoDB, and network latency to a provisioned OpenSearch Service domain can push P99 to 8–14 seconds under complex workloads. Bedrock Agents, being managed and serverless, shows a more consistent P99 of around 5–8 seconds for 5 steps, but offers no tuning control.

Technical Comparison: Four Agentic RAG Architectures

Dimension	A — Bedrock Agents	B — LangGraph + EKS	C — Step Functions	D — Hybrid
Agent loop control	AWS-managed (grey box)	Full (Python code)	Full (declarative states)	Partial (outer loop controlled)
Regulatory auditability	Medium (trace opt-in)	High (custom logging)	Very high (native execution logs + X-Ray)	High (SF trace + CloudTrail)
P50 latency (5 steps)	~3–5 s	~2–4 s (warm pods)	~3–6 s	~4–7 s
P99 latency (5 steps)	~5–8 s	~8–14 s (cold start)	~6–10 s	~7–12 s
Orchestration cost (excl. tokens)	Included in Bedrock	EKS node hours (~$0.10–0.20/h)	~$1/1M Express requests + duration	SF + Bedrock overhead
Guardrails and security	Native (Bedrock Guardrails)	Custom (code + WAF)	Custom (Lambda + WAF)	Bedrock Guardrails + SF validation
Operational complexity	Low	High	Medium	Medium-high
Portability (multi-cloud / on-prem)	Low (Bedrock lock-in)	High (framework-agnostic)	Low (AWS lock-in)	Low (AWS lock-in)
Hybrid retrieval support (sparse + dense)	Limited (managed KB)	Full (custom pipeline)	Full (via Lambda)	Partial (KB + Lambda)
Time-to-production (average team)	2–4 weeks	8–16 weeks	4–8 weeks	5–10 weeks

Security and Governance: What the Comparison Table Doesn't Fully Capture

In regulated financial environments, the attack surface of a RAG agent is qualitatively different from a REST API. The agent can be induced via prompt injection to exfiltrate context data, invoke unauthorized tools, or bypass guardrails. Each architecture has a distinct risk profile.

In Bedrock Agents, native Guardrails offers configurable content filters across harm categories (HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT, PROMPT_ATTACK) and word filters with custom lists, configured through the CreateGuardrail API via contentPolicyConfig and wordPolicyConfig. The problem is that Guardrails evaluates the user input and the final response, not the intermediate reasoning. An injection that manipulates the planning step can go undetected if the final output looks benign.
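
A sketch of the CreateGuardrail arguments for the filters listed above, as a dict for boto3's `bedrock` client (`create_guardrail(**kwargs)`). The guardrail name, blocked terms, filter strengths, and messages are illustrative. PROMPT_ATTACK applies to user input only, which is why its output strength is NONE.

```python
def guardrail_request(name: str, blocked_terms: list) -> dict:
    """Keyword arguments for boto3's bedrock create_guardrail call (illustrative values)."""
    harm_filters = [
        {"type": t, "inputStrength": "HIGH", "outputStrength": "HIGH"}
        for t in ("HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT")
    ]
    # Prompt-attack filtering only inspects input, so the output strength must be NONE.
    harm_filters.append({"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"})
    return {
        "name": name,
        "contentPolicyConfig": {"filtersConfig": harm_filters},
        "wordPolicyConfig": {"wordsConfig": [{"text": term} for term in blocked_terms]},
        "blockedInputMessaging": "This request cannot be processed.",
        "blockedOutputsMessaging": "This response was withheld by policy.",
    }
```

None of these filters sees the agent's intermediate planning steps, which is the gap described above.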

In LangGraph+EKS, security is entirely the team's responsibility. That means implementing: (1) input sanitization before any model call; (2) least-privilege IAM roles per graph node, so that a retrieval node has no permission to invoke payment APIs; (3) a KMS customer managed key for state encryption in DynamoDB (SSE type KMS), with a per-tenant key in multi-tenant environments; and (4) VPC endpoints for OpenSearch and Bedrock, eliminating public internet traffic.
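
A sketch of item (2) for a retrieval node, as an IAM policy document built in Python. The account ID, region, domain, index, table name, and the `tenantId` principal tag are placeholders; the `dynamodb:LeadingKeys` condition restricts checkpoint access to items whose partition key matches the caller's tenant tag.

```python
import json

ACCOUNT = "123456789012"  # placeholder account ID
REGION = "us-east-1"


def retrieval_node_policy(domain: str, index: str, checkpoint_table: str, model_id: str) -> dict:
    """Least-privilege IAM policy for a hypothetical retrieval node's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Search calls against a single index on a single OpenSearch Service domain.
                "Effect": "Allow",
                "Action": ["es:ESHttpGet", "es:ESHttpPost"],
                "Resource": f"arn:aws:es:{REGION}:{ACCOUNT}:domain/{domain}/{index}/_search",
            },
            {   # One foundation model, not every model in the account.
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/{model_id}",
            },
            {   # Checkpoints: only items whose partition key equals the caller's tenant tag.
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
                "Resource": f"arn:aws:dynamodb:{REGION}:{ACCOUNT}:table/{checkpoint_table}",
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": ["${aws:PrincipalTag/tenantId}"]
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    policy = retrieval_node_policy(
        "rag-docs", "regulatory", "agent-checkpoints", "anthropic.claude-3-sonnet-20240229-v1:0"
    )
    print(json.dumps(policy, indent=2))
```

Note what is absent: no wildcard resources and no action that reaches payment or write APIs, so a prompt-injected retrieval node has nothing dangerous to call.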

In Step Functions, state separation creates a natural place to insert deterministic validation between steps: a ValidateToolOutput state that checks schema, range, and permissions before the result reaches the next reasoning step. This is difficult to do reliably in Bedrock Agents without extensive Action Group customization. For environments with LGPD/GDPR requirements, data residency can also be enforced by scoping the execution and tool roles with the aws:RequestedRegion IAM condition key.
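
A minimal sketch of a ValidateToolOutput Lambda handler. The role-to-tool allowlist, the required output fields, and the amount ceiling are hypothetical. Raising an exception fails the state; Step Functions typically reports the Python exception class name as the error, so a Catch on `ToolOutputRejected` can route to a compensation or human-review state.

```python
from typing import Any, Dict

# Hypothetical contract: tools each caller role may use, and the expected output fields.
ALLOWED_TOOLS = {"analyst": {"get_credit_limit", "get_open_positions"}}
REQUIRED_FIELDS = {"customer_id": str, "amount": float, "currency": str}
MAX_AMOUNT = 1e10  # illustrative sanity ceiling


class ToolOutputRejected(Exception):
    """Raised so the state fails and a Catch/Retry policy can handle it."""


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    tool = event["toolResult"]
    role = event["caller"]["role"]
    # 1. Permission: the tool must be allowed for this caller role.
    if tool["name"] not in ALLOWED_TOOLS.get(role, set()):
        raise ToolOutputRejected(f"tool {tool['name']} not allowed for role {role}")
    # 2. Schema: required fields present with the expected types.
    output = tool["output"]
    for field_name, field_type in REQUIRED_FIELDS.items():
        if not isinstance(output.get(field_name), field_type):
            raise ToolOutputRejected(f"field {field_name} missing or not {field_type.__name__}")
    # 3. Range: reject implausible values before the model ever reasons over them.
    if not 0 <= output["amount"] <= MAX_AMOUNT:
        raise ToolOutputRejected(f"amount {output['amount']} out of range")
    return event  # validated state passes through unchanged
```

Because this check is ordinary code, it can be unit-tested and shown to an auditor line by line, which is the point of keeping it outside the model.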

Decision Matrix: Which Architecture for Which Context

A — Native Bedrock Agents

Pros

Lowest time-to-production (2–4 weeks)
Managed guardrails and session memory
Native integration with Knowledge Bases and OpenSearch Serverless
No agent infrastructure operational overhead

Cons

Opaque reasoning loop — limited auditability without explicit trace
Hybrid retrieval (BM25 + dense) limited to the Knowledge Base's managed search options, with little control over weighting or re-ranking
Strong Bedrock lock-in; costly migration
Limited control over per-tool retry and backoff policy

Ideal for MVPs and teams without distributed orchestration expertise. Not recommended for Tier-1 regulatory audit environments.

B — LangGraph + EKS

Pros

Full control over state graph — unit-testable
Support for custom hybrid retrieval and re-ranking
Portability: can run on any cloud or on-prem
Persistent state checkpointing for long-running sessions

Cons

High operational complexity: EKS, HPA, cold starts, dependency management
Time-to-production 3–4x longer than Bedrock Agents
Security and guardrails are entirely the team's responsibility
P99 degraded by cold starts without configured warm pool

Right for AI platforms with mature ML engineering teams needing portability and granular control. Overkill for most financial use cases.

C — Step Functions-Orchestrated

Pros

Maximum auditability: every state's input and output captured in execution history or CloudWatch Logs, with X-Ray timing per state
Deterministic orchestration with LLM non-determinism contained
Virtually zero orchestration cost (Express: $1 per 1M requests plus duration)
Native retry with jitter, per-state timeout, and Catch-based compensation (saga pattern)

Cons

ASL verbosity for complex loops — non-trivial maintenance
256 KB payload limit per state (mitigable with the S3 reference pattern)
No native session memory — must be implemented externally
Dynamic agent loops must be modeled as a Choice-state back edge with an explicit step counter: workable, but verbose

Best choice for agent flows with known maximum step count and Tier-1 regulatory audit requirements. My primary recommendation for banks and brokerages.

D — Hybrid Bedrock + Step Functions

Pros

Combines Bedrock development speed with SF flow control
Business flow audit in SF; internal reasoning in Bedrock trace
Easy to add deterministic pre/post-processing around the agent

Cons

Two orchestration systems to operate and debug
Bedrock Agent inner loop still partially opaque
Cumulative cost: Bedrock overhead + SF transitions

Good compromise for teams already using Bedrock Agents that need to add governance without rewriting everything. Not the ideal greenfield choice.

Observability and SLOs: What to Monitor in Each Architecture

Agentic RAG breaks traditional API SLOs because latency is a function of the number of reasoning steps, which is non-deterministic. At 800 ms–1.5 s per reasoning step, eight steps alone take 6.4–12 s, so a P99 < 5s SLO for an agent with up to 8 steps is unattainable without explicit termination control.
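
The same arithmetic as a budget check, using the per-step latency range quoted for Option A. It counts model time only and ignores retrieval and tool calls, so real budgets are tighter than this.

```python
def max_steps_within(budget_ms: int, per_step_ms: int) -> int:
    """How many sequential reasoning steps fit inside a latency budget."""
    return budget_ms // per_step_ms


def worst_case_ms(steps: int, per_step_ms: int) -> int:
    """Latency of a session that uses every allowed step."""
    return steps * per_step_ms


if __name__ == "__main__":
    for per_step in (800, 1_500):  # per-step range quoted for Claude 3 Sonnet
        print(per_step, max_steps_within(5_000, per_step), worst_case_ms(8, per_step))
    # 800 6 6400
    # 1500 3 12000
```

Read the other way round, a 5 s budget caps the agent at 3 to 6 steps, and that cap has to be enforced by the orchestrator.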

For Bedrock Agents, the primary observability signals are: invocation latency in CloudWatch (metric InvocationLatency), step count via trace parsing (field orchestrationTrace.rationale), and the ThrottlingException rate, which indicates pressure on the account's invocation limit (default: 10 TPS per model per region, increasable via Service Quotas). A realistic SLO is P95 < 8s with a 0.5% error budget for throttling.
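
A sketch of counting reasoning steps from an InvokeAgent response stream when enableTrace is set. It assumes the boto3 `bedrock-agent-runtime` event shapes: `chunk` events carry answer bytes and `trace` events carry an `orchestrationTrace`. Treating one rationale as one step is a heuristic, not an AWS-defined metric.

```python
from typing import Iterable, List, Tuple


def summarize_agent_stream(events: Iterable[dict]) -> Tuple[str, int, List[str]]:
    """Collect the answer text and the reasoning rationales from an InvokeAgent stream."""
    answer_parts: List[str] = []
    rationales: List[str] = []
    for event in events:
        if "chunk" in event:
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            orchestration = event["trace"]["trace"].get("orchestrationTrace", {})
            if "rationale" in orchestration:
                rationales.append(orchestration["rationale"]["text"])
    # One rationale per reasoning step: a proxy for step count.
    return "".join(answer_parts), len(rationales), rationales
```

With credentials, the input would be `response["completion"]` from `boto3.client("bedrock-agent-runtime").invoke_agent(agentId=AGENT_ID, agentAliasId=ALIAS_ID, sessionId=SESSION_ID, inputText=question, enableTrace=True)`, where the identifiers are placeholders. The returned step count can then be published as a custom metric per session.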

For Step Functions, each state machine emits native execution metrics: ExecutionTime, ExecutionsFailed, ExecutionsTimedOut. With OpenTelemetry, I can instrument each tool's Lambda with spans that propagate the Step Functions trace ID, creating an end-to-end trace tree in X-Ray or Datadog. The most useful SLO here is steps per session: if P95 exceeds 6 steps, that signals low-quality system prompts or retrieved documents, not a slow system.
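
A sketch of the steps-per-session alert using the nearest-rank percentile. The per-session step counts are assumed to come from whatever counter you already emit (a custom metric or a log field); the threshold of 6 is the one given above.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[int], pct: float) -> int:
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def steps_alert(steps_per_session: Sequence[int], threshold: int = 6) -> bool:
    """True when P95 steps per session exceeds the threshold."""
    return nearest_rank_percentile(steps_per_session, 95) > threshold
```

An alert here points at prompt or retrieval quality, so it should page whoever owns the prompts and the index rather than the on-call infrastructure team.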

For LangGraph+EKS, the pattern that works is the OpenTelemetry SDK with LangChain auto-instrumentation, exporting to ADOT Collector on EKS and then to CloudWatch EMF or Datadog. Critical metrics: llm.token.usage per graph node (for per-step cost control), retrieval.hit_rate (relevant documents / total retrieved), and tool.error_rate per tool name. A retrieval.hit_rate < 0.6 in production is an alert for vector index degradation — likely embedding drift or a stale index.
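
A minimal sketch of the hit-rate signal, assuming relevance judgments exist for sampled queries (for example, from a labeled evaluation set or reviewer feedback). How those judgments are produced is outside this sketch.

```python
from typing import Iterable, Set

HIT_RATE_ALERT = 0.6  # threshold from the text above


def retrieval_hit_rate(retrieved_ids: Iterable[str], relevant_ids: Set[str]) -> float:
    """Share of retrieved documents judged relevant (relevant / total retrieved)."""
    retrieved = list(retrieved_ids)
    if not retrieved:
        return 0.0
    return sum(doc in relevant_ids for doc in retrieved) / len(retrieved)


def index_degradation_suspected(rate: float) -> bool:
    """True when the hit rate falls below the alert threshold."""
    return rate < HIT_RATE_ALERT
```

Tracked per graph node and per index, a falling hit rate separates retrieval problems from model problems before users notice degraded answers.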

The Flexibility Paradox in Financial Agents

The architecture that gives the agent the most flexibility (LangGraph with an open graph) is precisely the one that most complicates regulatory audit. In finance, the goal is not to maximize agent autonomy — it is to maximize behavioral predictability within an envelope of authorized actions. This inverts the intuition of those coming from the AI research world: you want the most constrained agent that still solves the problem, not the most capable one. Step Functions enforces that constraint structurally.

Cost and Performance Reference Points (Production Estimates)

$0.06–0.12

Cost per session (5 steps, Claude 3 Sonnet, 4k tokens/step)

Valid for Options A, C, D. Option B adds ~$0.002/session amortized EKS cost

10 TPS

Default Bedrock invocation limit per model per region (actual defaults vary by model and region; check Service Quotas)

Throttling is the primary scale limiter for Options A and D at peak. Request increase via Service Quotas with use-case justification.
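
Until a quota increase lands, callers have to absorb throttling. A minimal sketch of capped exponential backoff with full jitter; `ThrottledError` is a hypothetical stand-in for the real error (with boto3 you would inspect a `botocore.exceptions.ClientError` for the `ThrottlingException` code), and the delays are illustrative.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as Bedrock's ThrottlingException."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry fn on throttling with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the throttle to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0, cap))  # full jitter spreads retries across clients
    raise RuntimeError("unreachable")
```

For simple cases, botocore's built-in retry modes (for example `botocore.config.Config(retries={"mode": "adaptive"})`) may be enough without custom code; an explicit wrapper like this is useful when the retry budget must be counted against a session-level latency SLO.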

256 KB

Payload limit per state in Step Functions

Use the S3 reference pattern: store RAG context in S3 and pass only the S3 URI between states to avoid the limit.
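
A sketch of the S3 reference pattern. The 256 KB figure is the Step Functions per-state payload quota; the 32 KB safety margin, the key prefix, and the injected `put_object` callable (used so the logic runs without AWS) are assumptions.

```python
import json
import uuid
from typing import Any, Callable, Dict

STEP_FUNCTIONS_PAYLOAD_LIMIT = 256 * 1024  # bytes, measured as UTF-8
SAFETY_MARGIN = 32 * 1024  # leave room for the rest of the state document


def offload_if_large(
    context: Dict[str, Any],
    put_object: Callable[[str, str, bytes], None],
    bucket: str,
    key_prefix: str = "rag-context/",
) -> Dict[str, Any]:
    """Return the context inline if small, else store it in S3 and return a reference."""
    body = json.dumps(context).encode("utf-8")
    if len(body) <= STEP_FUNCTIONS_PAYLOAD_LIMIT - SAFETY_MARGIN:
        return {"inline": context}
    key = f"{key_prefix}{uuid.uuid4()}.json"
    put_object(bucket, key, body)
    return {"s3Uri": f"s3://{bucket}/{key}", "bytes": len(body)}
```

With boto3, `put_object` could be `lambda b, k, data: s3.put_object(Bucket=b, Key=k, Body=data)` for an `s3 = boto3.client("s3")`; the next state's Lambda reads the object back from the URI before building its model request.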

Critical Anti-Patterns in Financial Agentic RAG

No step limit: Agents without a configured maxIterations can enter infinite loops consuming tokens indefinitely. Always set a ceiling — 8 steps is reasonable for most financial use cases.
Overpermissioned tools: Action Groups or Lambda tools with IAM policies using * on resources. Each tool should have a dedicated IAM role with minimum permissions and aws:ResourceTag conditions for per-tenant isolation.
RAG context without metadata filtering: Retrieving documents without filtering by tenantId, classification_level, or effective_date in the OpenSearch query. In multi-tenant environments, this is a data leakage vector between clients.
Logging prompts with PII: Enabling full trace in Bedrock or logging LangGraph payloads without masking CPF, bank account, and position data. This violates LGPD and creates audit risk.
Static embeddings in production: Indexing documents once and never reindexing. Switching embedding models without reindexing silently breaks retrieval, because query vectors and stored document vectors no longer share an embedding space. Monitor retrieval.hit_rate and reindex whenever the embedding model changes.
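
A sketch of a tenant-scoped k-NN query body for OpenSearch that addresses the metadata-filtering anti-pattern. It uses the k-NN query's `filter` clause (efficient filtering, which requires an engine and OpenSearch version that support it, such as Lucene or Faiss on recent releases). The field names come from the list above; the numeric classification scale, the keyword mapping of `tenantId`, and the vector field name are assumptions.

```python
from typing import Dict, List


def filtered_knn_query(
    embedding: List[float],
    tenant_id: str,
    max_classification: int,
    as_of: str,
    k: int = 5,
    vector_field: str = "embedding",
) -> Dict:
    """k-NN query body that can only match the caller's tenant and permitted documents."""
    doc_filter = {
        "bool": {
            "filter": [
                {"term": {"tenantId": tenant_id}},  # hard tenant isolation
                {"range": {"classification_level": {"lte": max_classification}}},
                {"range": {"effective_date": {"lte": as_of}}},  # no future-dated documents
            ]
        }
    }
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": embedding,
                    "k": k,
                    "filter": doc_filter,  # applied during the vector search itself
                }
            }
        },
    }
```

With opensearch-py this body would go to `client.search(index=..., body=...)`. The tenant ID should come from the authenticated session, never from model output, so an injected prompt cannot widen the filter.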

Architect's Note: What I Would Actually Do

Senior Solutions Architect

In every regulated financial environment I have architected, Option C — Step Functions with Bedrock InvokeModel per state — is the correct starting point, not because it is the most elegant, but because it is the most auditable and the easiest to explain to a compliance team that has never seen an AI agent. The hard-won lesson is that the AI adoption battle in finance is not technical — it is about institutional trust. A Step Functions flow with named states (EvaluateCreditQuery, RetrieveRegulatoryDocs, ValidateToolOutput) that an auditor can read in the AWS console is worth more than a perfectly optimized LangGraph loop that only the ML team understands. Once the business and governance are comfortable, then you evolve to the hybrid or to LangGraph — but start with what you can defend in a risk committee meeting.

Verdict: Step Functions-Orchestrated Is the Financially Responsible Choice


For financial platforms with serious regulatory requirements (BACEN, CVM, SEC, LGPD), Option C (Step Functions-orchestrated) is the primary recommendation. It delivers structural auditability that none of the other options offer natively, negligible orchestration cost, deterministic per-step retry and timeout, and a mental model that compliance and engineering teams can share. The 256 KB payload limit is the only real obstacle, and the S3 reference pattern solves it in less than a sprint. Native Bedrock Agents (Option A) is the right choice for MVPs, proofs of concept, and internal use cases where Tier-1 auditability is not a requirement. Do not dismiss it: it has the best time-to-production and the lowest operational overhead. LangGraph+EKS (Option B) only makes sense if you have an AI platform with a dedicated ML engineering team, need real portability (multi-cloud or on-prem), and are willing to invest 3–4x more engineering time. For most Brazilian banks and brokerages, that investment is not justified.

References and Further Reading

Amazon Bedrock Agents — Developer Guide
Amazon Bedrock Guardrails — Configuration Reference
AWS Step Functions — Express Workflows
Amazon OpenSearch Service — k-NN Plugin
AWS Well-Architected Framework — Machine Learning Lens
LangGraph — State Graph Documentation
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
Google Research Blog — Agentic RAG

#agentic-rag #bedrock #opensearch #eks #step-functions #financial-grade #governance #aws


Analyzed source: Agentic RAG
