Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

AI & Agents · Deep Dive

Bedrock Managed Knowledge Base: Anatomy of a Managed RAG Pipeline

Jun 23, 2026 · 9 min · expert · AI-assisted


fernando.moretes.com

Amazon Bedrock Managed Knowledge Base abstracts the entire RAG stack — connectors, parsing, embeddings, re-ranking, and agentic retrieval — into a single managed primitive. In this article, I disassemble each layer, expose the failure modes the documentation doesn't mention, and analyze the real trade-offs for engineers designing financial-grade AI systems on AWS.

When AWS announces something is "fully managed," the right question isn't "what does it do?" — it's "what does it hide from you, and at what cost?" Amazon Bedrock Managed Knowledge Base, launched at AWS Summit New York in June 2026, collapses six RAG infrastructure components — ingestion connectors, multimodal parsing, chunking, embeddings, vector store, and re-ranking — into a single API primitive. For engineering teams today operating hand-rolled RAG pipelines with Glue, OpenSearch, and Lambda stitched together by Step Functions, this is a genuine leverage shift. But managed abstractions in financial environments demand that you understand exactly where the control boundary ends and where operational risk begins.

What the abstraction actually hides

Before Managed Knowledge Base, building a production-grade RAG pipeline on AWS involved at least six independent infrastructure decisions: which embedding model to use (Titan Embeddings v2, Cohere Embed v3, or a custom model?), which chunking strategy (fixed, semantic, hierarchical?), which vector store (OpenSearch Serverless, Aurora pgvector, or Pinecone via external connector?), how to implement re-ranking (local cross-encoder or Cohere Rerank via Bedrock?), how to handle data source ACLs at retrieval time, and how to orchestrate incremental data sync. Each of those decisions carries experimentation latency — typically weeks of tuning before hitting acceptable recall@10 in production.

Managed Knowledge Base makes those decisions for you by default, automatically selecting and managing the embedding model, re-ranker, and underlying foundation model. This is non-trivial: automatic model selection implies AWS is making cost/latency/quality trade-offs on your behalf, and those choices shift as new models are released. In a regulated financial environment, where model decision traceability is an audit requirement, you need to understand that "managed" means AWS may silently update the underlying embedding model — potentially shifting the vector space distribution and invalidating existing indexes. This is not hypothetical: it's exactly the kind of change that breaks RAG pipelines in production without an obvious alarm.

Managed RAG Pipeline: Ingestion and Retrieval Flow

Two parallel tracks: ingestion (left) and agentic retrieval (right). Edges show where customer control ends and the managed plane takes over.

📥 Ingestion — Data Sources

Amazon S3 · bucket / prefix
SharePoint · OAuth connector
Google Drive · native connector
Web Crawler · HTML + images

⚙️ Managed Plane — Smart Parsing & Embedding

Smart Parsing · per-connector strategy
Multimodal FM · bounding box + caption
Adaptive Chunking · structure-aware
Embedding Model · auto-selected
Vector Store · managed index

🔍 Retrieval — Agentic Retriever

Agentic Retriever · query planner
Re-ranker Model · auto-selected
AgentCore Gateway · RBAC + observability

👤 Consumer

Bedrock Agent · or FM tool call
AgentCore Observability · metrics + eval

Smart Parsing: What is happening under the hood

Smart Parsing is not a single algorithm — it's a strategy router per connector type and content type. For the S3 connector, the system detects MIME type and applies differentiated strategies: PDFs with complex tables are sent to a foundation model that identifies bounding boxes and extracts tabular structure before chunking; Word documents preserve heading hierarchy; video files receive caption generation and scene description via multimodal processing. For the Web Crawler connector, HTML structure — including embedded images and tables — is preserved rather than flattened to plain text, which is a genuine improvement over BeautifulSoup-based parsers that discard visual context.
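To make the "strategy router" idea concrete, here is a minimal sketch of dispatch by MIME type. The strategy names and return values are hypothetical placeholders of mine; the managed service does not expose its internal routing, so treat this as a mental model rather than a description of the implementation.

```python
from typing import Callable, Dict

# Hypothetical strategies: each returns a label so the routing is observable.
def parse_pdf(doc: str) -> str:
    return f"fm-table-extraction:{doc}"

def parse_word(doc: str) -> str:
    return f"heading-hierarchy:{doc}"

def parse_video(doc: str) -> str:
    return f"caption-and-scene:{doc}"

def parse_html(doc: str) -> str:
    return f"structure-preserving-html:{doc}"

def parse_plain(doc: str) -> str:
    return f"plain-text:{doc}"

DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

STRATEGIES: Dict[str, Callable[[str], str]] = {
    "application/pdf": parse_pdf,
    DOCX: parse_word,
    "text/html": parse_html,
}

def route(mime_type: str, doc: str) -> str:
    """Pick a parsing strategy from the MIME type, falling back to plain text."""
    if mime_type.startswith("video/"):
        return parse_video(doc)
    return STRATEGIES.get(mime_type, parse_plain)(doc)
```

The fallback branch is the part worth copying into your own thinking: any content type without a dedicated strategy still gets indexed, just with less structure preserved.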

The critical point for financial systems engineers: regulatory documents (fund prospectuses, SEC filings, derivatives contracts) are dense with numerical tables, cross-references, and hierarchical structure. Naïve fixed-token-size chunking is the single biggest quality destroyer for RAG on these documents — it splits tables mid-row, separates headers from their data, and creates chunks without sufficient context for accurate retrieval. Smart Parsing's adaptive chunking, which uses an FM to understand document structure before deciding chunk boundaries, is genuinely superior for this document profile. The caveat: you have no direct visibility into which FM is being used for parsing, what the per-document cost is, or how behavior changes when AWS internally updates the parsing model. For a pipeline processing millions of regulatory documents, that cost opacity is a FinOps risk that needs monitoring via CloudWatch Metrics with alarms on KnowledgeBase/IngestDocumentCount and Bedrock costs per knowledge base ID.
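The difference between naive and structure-aware chunking is easier to see in code. The sketch below is my own simplification, not the Smart Parsing algorithm: it assumes the document has already been segmented into typed blocks (a step the managed service performs with an FM), and the `Block` shape and function names are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (kind, text), kind is "heading", "para" or "table"

def fixed_size_chunks(text: str, size: int) -> List[str]:
    """Naive baseline: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def structure_aware_chunks(blocks: List[Block], max_chars: int) -> List[str]:
    """Keep tables whole and prefix every chunk with the current heading."""
    chunks: List[str] = []
    heading = ""
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered blocks as one chunk, carrying the heading as context.
        if buffer:
            chunks.append("\n".join(([heading] if heading else []) + buffer))
            buffer.clear()

    for kind, text in blocks:
        if kind == "heading":
            flush()
            heading = text
        elif kind == "table":
            flush()
            buffer.append(text)  # a table is never split, even above max_chars
            flush()
        else:
            if sum(len(b) for b in buffer) + len(text) > max_chars:
                flush()
            buffer.append(text)
    flush()
    return chunks

blocks = [
    ("heading", "Fees"),
    ("para", "Annual charges apply."),
    ("table", "Class|Fee\nA|1.0%\nB|1.5%"),
]
aware = structure_aware_chunks(blocks, max_chars=40)
naive = fixed_size_chunks("\n".join(text for _, text in blocks), size=30)
```

With these inputs the naive splitter cuts the fee table across two chunks, while the structure-aware version emits the table intact under its "Fees" heading.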

Vector Space and Model Traceability: The Silent Risk

In regulated financial systems, traceability of which model generated which embeddings is not an engineering detail — it's an audit requirement. If AWS silently updates the default embedding model in Managed Knowledge Base, existing vectors in the index were generated by a different model than new vectors being inserted. This creates vector space inconsistency that silently degrades retrieval quality without triggering any obvious alarm. The mitigation: explicitly pin the embedding model version in knowledge base configuration when the API allows it, and monitor KnowledgeBase/RetrievalRelevanceScore as an SLI — a sustained drop of more than 5% should trigger a full re-indexing review.
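A "sustained drop of more than 5%" needs a precise definition before it can drive an alarm. One possible reading, sketched below, is that each of the last few samples sits more than 5% below a recorded baseline. The window size and function name are my assumptions, and the scores would come from whatever relevance metric your environment actually exposes.

```python
from typing import Sequence

def sustained_drop(baseline: float, recent: Sequence[float],
                   threshold: float = 0.05, window: int = 3) -> bool:
    """True when each of the last `window` samples is more than
    `threshold` (as a fraction) below the baseline score."""
    if len(recent) < window:
        return False  # not enough data to call the drop sustained
    floor = baseline * (1.0 - threshold)
    return all(score < floor for score in recent[-window:])
```

Requiring every sample in the window to breach the floor trades detection speed for fewer false alarms; a single noisy evaluation run will not trigger a re-indexing review.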

Agentic Retriever: Multi-Hop Query Planning and Its Limits

The Agentic Retriever is the most architecturally interesting piece. Instead of executing a single vector search and passing the top-K chunks to the FM, it decomposes the user query into a step-by-step execution plan — inferring intent, identifying which knowledge bases are relevant for each sub-query, executing retrievals in parallel or sequentially as needed, and combining results before returning context to the agent.

The example from the AWS announcement is illustrative: "What is the cloud infrastructure budget for the ML platform team?" and "Does our expense policy allow prepaying annual commitments?" together require retrieval from two different sources (budget data and policy documents) followed by result synthesis. A single-step retriever would fail to connect that information. The Agentic Retriever handles this with a plan of two retrieval steps plus a synthesis step: first retrieve who owns the ML platform and what their budget is; then retrieve the relevant expense policy; finally synthesize the answer.
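The plan-then-execute shape can be sketched in a few lines. In the managed service the plan is produced by an FM; here it is hard-coded, and the `Step` type, source names, and stub retrievers are all hypothetical stand-ins that return placeholder strings rather than real data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Step:
    source: str  # hypothetical knowledge base name
    query: str   # sub-query sent to that source

Retriever = Callable[[str], List[str]]

def execute_plan(plan: List[Step], retrievers: Dict[str, Retriever]) -> str:
    """Run each step against its source and join the evidence in plan order."""
    evidence: List[str] = []
    for step in plan:
        hits = retrievers[step.source](step.query)
        evidence.extend(f"[{step.source}] {hit}" for hit in hits)
    return "\n".join(evidence)

# Stub retrievers; a real system would call the knowledge base here.
retrievers: Dict[str, Retriever] = {
    "budgets": lambda q: [f"budget record matching '{q}'"],
    "policies": lambda q: [f"policy clause matching '{q}'"],
}

plan = [
    Step("budgets", "ML platform team cloud budget"),
    Step("policies", "prepaying annual commitments"),
]
context = execute_plan(plan, retrievers)
```

The joined `context` string is what a synthesis step would receive. Tagging each hit with its source is a small choice that pays off later when you need to audit which knowledge base contributed to an answer.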

The limits that matter in production: first, query planning is executed by an FM, which adds latency and cost per invocation — in financial systems with p99 latency SLOs below 2 seconds for agent queries, this needs to be measured, not assumed. Second, the execution plan is not deterministic across invocations — the same query may generate different plans, which complicates debugging and response auditing. Third, the Agentic Retriever operates within a single knowledge base or across multiple knowledge bases of the same Managed KB type — it cannot retrieve from legacy Self-Managed KB types in the same plan, which is a real limitation for gradual migrations. For systems requiring full traceability of each retrieval step, you'll want to instrument the AgentCore Observability dashboard and export traces to CloudWatch Logs with 7-year retention for financial regulatory compliance.
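Because plans are not deterministic, it helps to fingerprint each plan so that variation across invocations of the same query becomes measurable. This is a sketch under my own assumptions: the plan is represented as a list of plain dictionaries, and the record layout is hypothetical rather than the AgentCore trace format.

```python
import hashlib
import json
from typing import Any, Dict, List

def plan_fingerprint(plan: List[Dict[str, Any]]) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def audit_record(query: str, plan: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Bundle the query, the plan, and its fingerprint for log export."""
    return {"query": query, "plan": plan, "plan_sha256": plan_fingerprint(plan)}
```

Key order inside a step does not affect the fingerprint, but step order does, which matches the intuition that two plans with the same steps in a different sequence are different plans. Counting distinct fingerprints per query over time gives a rough measure of planner stability.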

AgentCore Gateway: Security, RBAC, and the Permissions Model

AgentCore Gateway is where Managed Knowledge Base connects to the AWS security model. When you create a Managed KB, IAM roles are auto-generated — but "automatic" does not mean "correct for your environment." In multi-tenant financial systems, where different users should only see documents they have permission for, automatic role generation is a starting point, not a destination.

The Managed Knowledge Base permissions model inherits ACLs from data sources at ingestion time — for SharePoint and OneDrive, this means SharePoint document permissions are propagated to the index. At retrieval time, the Agentic Retriever can filter results based on the querying user's identity, provided that identity is correctly passed via AgentCore Gateway. This is a genuine improvement over knowledge bases that ignore ACLs entirely, but correct implementation requires configuring sessionAttributes with the user identity context and ensuring AgentCore Gateway is configured with the correct IAM conditions to propagate that identity to the retrieval plane.
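As a sketch of what passing identity context might look like, the helper below builds a payload shaped like the `sessionState` parameter of the Bedrock Agents InvokeAgent API, where `sessionAttributes` is a string-to-string map. The attribute keys (`userId`, `tenantId`, `groups`) are my own naming, and whether AgentCore Gateway consumes exactly this shape is an assumption to verify against the current release.

```python
from typing import Dict, Iterable

def build_session_state(user_id: str, tenant_id: str,
                        groups: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Build a sessionState-style payload carrying the caller's identity.

    Values must be strings, so groups are flattened to a sorted,
    comma-separated list.
    """
    if not user_id or not tenant_id:
        # Fail closed: never issue a retrieval without an identity.
        raise ValueError("user_id and tenant_id are required")
    return {
        "sessionAttributes": {
            "userId": user_id,
            "tenantId": tenant_id,
            "groups": ",".join(sorted(set(groups))),
        }
    }
```

The fail-closed check matters more than the payload shape: an empty identity that silently falls through is exactly how "all users see all documents" happens.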

The Verified Permissions pattern for multi-tenant RAG — documented in the AWS Architecture Blog — is the natural complement here: you use Amazon Verified Permissions to evaluate authorization policies outside the retrieval path, and pass results as metadata filters to the knowledge base. This decouples authorization logic from retrieval logic, which is essential for auditability. Integration with Managed Knowledge Base via AgentCore Gateway is still maturing — verify that sessionAttributes-based filtering is available in the current release before assuming ACL enforcement is automatic end-to-end.
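The decoupling described above can be sketched as two steps: collect authorization decisions, then translate them into a metadata filter. The filter shape below is modeled on the `RetrievalFilter` structure of the existing Bedrock Knowledge Bases Retrieve API (`andAll`, `equals`, `in`); whether Managed Knowledge Base accepts the same structure is an assumption. The `is_authorized` callable is a stand-in for a Verified Permissions call, and the metadata keys are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

def allowed_collections(user: str, collections: Iterable[str],
                        is_authorized: Callable[[str, str], bool]) -> List[str]:
    """Ask the policy engine about each collection; keep the permitted ones."""
    return [c for c in collections if is_authorized(user, c)]

def build_filter(tenant_id: str, allowed: List[str]) -> Dict[str, Any]:
    """Translate authorization results into a retrieval metadata filter."""
    if not allowed:
        # Fail closed instead of sending an unfiltered query.
        raise PermissionError("no authorized collections for this user")
    return {
        "andAll": [
            {"equals": {"key": "tenant_id", "value": tenant_id}},
            {"in": {"key": "collection", "value": allowed}},
        ]
    }
```

Because the authorization decision is computed before the retrieval call, it can be logged on its own, which is what makes the pattern auditable independently of retrieval behavior.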

Failure Modes the Documentation Does Not Mention

Every managed system has failure modes that only surface in production. Based on patterns I've observed in financial-grade RAG pipelines, here are the ones that matter most for Managed Knowledge Base.

Incremental sync drift: Native connectors perform incremental sync, but "incremental" means documents deleted at the source may remain in the index for an indeterminate period, depending on the configured sync interval. In financial systems where stale documents (a fund prospectus with outdated risk information, for example) must never be retrieved, you need an explicit invalidation mechanism and monitoring of KnowledgeBase/DocumentDeleteCount to ensure deletions are being propagated.
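One way to detect sync drift independently of the connector is a periodic reconciliation between a source inventory and the set of indexed document IDs. How you obtain each set depends on your environment; the sketch below only shows the comparison, and the function names are mine.

```python
from typing import Set

def stale_documents(source_ids: Set[str], indexed_ids: Set[str]) -> Set[str]:
    """Documents still in the index but already deleted at the source."""
    return indexed_ids - source_ids

def missing_documents(source_ids: Set[str], indexed_ids: Set[str]) -> Set[str]:
    """Documents present at the source but not yet indexed."""
    return source_ids - indexed_ids
```

A non-empty result from `stale_documents` that persists longer than one sync interval is the signal that deletions are not propagating and that explicit invalidation should run.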

Embedding throttling on bulk ingestion: Bedrock has tokens-per-minute quotas per embedding model. For an initial ingestion of millions of documents, you will hit throttling. Managed Knowledge Base should handle retries internally, but effective ingestion rate is bounded by these quotas — which vary by region and model. Request quota increases before starting bulk ingestion.
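A back-of-the-envelope estimate shows why the quota matters. The numbers in the usage line are purely illustrative, not real quota values, and the formula ignores parsing FM calls, retries, and burst behavior, so treat the result as a lower bound.

```python
def estimated_ingestion_hours(doc_count: int, avg_tokens_per_doc: int,
                              tokens_per_minute: int) -> float:
    """Lower-bound hours to embed a corpus under a tokens-per-minute quota."""
    if tokens_per_minute <= 0:
        raise ValueError("tokens_per_minute must be positive")
    total_tokens = doc_count * avg_tokens_per_doc
    return total_tokens / tokens_per_minute / 60

# Illustrative only: 2M documents, 3,000 tokens each, 300k tokens per minute.
hours = estimated_ingestion_hours(2_000_000, 3_000, 300_000)
```

Under those assumed numbers the corpus holds 6 billion tokens, which takes 20,000 minutes, or roughly 333 hours (about two weeks), at the assumed quota. Running this arithmetic with your own figures before the first bulk load tells you how large a quota increase to request.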

Silent multimodal parsing failure: When the parsing FM fails to extract content from a complex image or table, the default behavior is to skip the problematic content and continue. This means critical documents may be partially indexed with no obvious alarm. Monitor KnowledgeBase/ParseFailureCount and implement post-ingestion coverage validation for critical documents.
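Post-ingestion coverage validation can be as simple as comparing the pages you expected to be indexed against the pages that produced at least one chunk. The sketch assumes you can obtain both: a page count per critical document and the set of page numbers present in chunk metadata. Both inputs and the function name are hypothetical.

```python
from typing import Dict, Set

def coverage_gaps(expected_pages: Dict[str, int],
                  indexed_pages: Dict[str, Set[int]],
                  min_ratio: float = 1.0) -> Dict[str, float]:
    """Return documents whose indexed-page ratio falls below `min_ratio`."""
    gaps: Dict[str, float] = {}
    for doc_id, page_count in expected_pages.items():
        if page_count <= 0:
            continue  # nothing to validate for empty documents
        covered = len(indexed_pages.get(doc_id, set()))
        ratio = covered / page_count
        if ratio < min_ratio:
            gaps[doc_id] = ratio
    return gaps
```

A document that is missing from the index entirely shows up with a ratio of 0.0, so the same check catches both partial parsing and total ingestion failure.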

Non-deterministic Agentic Retriever latency: Queries that trigger multi-hop plans have variable latency — a 3-hop plan may take 8-15 seconds depending on knowledge base size and synthesis complexity. For synchronous user interfaces with 30-second timeouts, this is acceptable. For APIs with p99 SLOs under 3 seconds, you need a circuit breaker that gracefully degrades to single-hop retrieval when the Agentic Retriever exceeds a configurable latency budget.
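A minimal sketch of such a circuit breaker follows. The `agentic` and `single_hop` callables are placeholders for your own retrieval functions, and the thresholds are arbitrary defaults. Two simplifications to keep in mind: the class is not thread-safe, and a timed-out agentic call keeps running in its worker thread because Python cannot interrupt it.

```python
import concurrent.futures
import time
from typing import Callable, List, Optional

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

class LatencyBreaker:
    """Fall back to single-hop retrieval when the agentic path is too slow.

    After `max_failures` consecutive timeouts the breaker opens and skips
    the agentic path until `cooldown_seconds` have elapsed.
    """

    def __init__(self, budget_seconds: float, max_failures: int = 3,
                 cooldown_seconds: float = 30.0) -> None:
        self.budget_seconds = budget_seconds
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, query: str,
             agentic: Callable[[str], List[str]],
             single_hop: Callable[[str], List[str]]) -> List[str]:
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.cooldown_seconds:
                return single_hop(query)  # open: skip the agentic path
            self._opened_at = None        # cooldown over: allow a trial call
            self._failures = 0
        future = _executor.submit(agentic, query)
        try:
            result = future.result(timeout=self.budget_seconds)
        except concurrent.futures.TimeoutError:
            future.cancel()  # best effort; a running call is not interrupted
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = time.monotonic()
            return single_hop(query)
        self._failures = 0
        return result
```

Errors raised by the agentic path other than timeouts propagate to the caller here; whether those should also count toward opening the breaker is a design decision for your environment.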

Critical Anti-Patterns with Managed Knowledge Base

Assuming ACL enforcement is automatic end-to-end: Connectors propagate permissions at ingestion, but identity-based filtering at retrieval requires explicit sessionAttributes configuration in AgentCore Gateway. Without it, all users see all documents — a data leak in multi-tenant environments.
Using the Agentic Retriever for all queries without a latency budget: Simple queries requiring a single retrieval hop don't need the query planner overhead. Implement query routing — simple queries go directly to single-step retrieval, complex queries use the Agentic Retriever — to control cost and latency.
Not monitoring retrieval quality drift after model updates: AWS may update embedding and parsing models internally. Without a continuous evaluation pipeline measuring recall@10 and MRR on a golden dataset, you won't know when retrieval quality has silently degraded.
Bulk ingestion without quota control: Starting ingestion of millions of documents without first requesting Bedrock tokens-per-minute quota increases leads to throttling, and the initial index may then take days to complete, with retry behavior opaque to the operator.
Treating Managed Knowledge Base as a direct drop-in replacement for Self-Managed KB in migrations: The Agentic Retriever does not operate in mixed plans with Self-Managed KBs. Gradual migrations need to be planned with a parallel operation period and retrieval parity validation before cutover.
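The continuous evaluation mentioned in the third anti-pattern comes down to two standard metrics computed over a golden dataset. The sketch below assumes the golden set maps each query to the set of relevant document IDs and that a run maps each query to a ranked list of retrieved IDs; the data layout and function names are mine.

```python
from typing import Dict, List, Set

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(golden: Dict[str, Set[str]], run: Dict[str, List[str]],
             k: int = 10) -> Dict[str, float]:
    """Average recall@k and MRR over every query in the golden dataset."""
    if not golden:
        raise ValueError("golden dataset is empty")
    n = len(golden)
    recall = sum(recall_at_k(run.get(q, []), rel, k) for q, rel in golden.items()) / n
    mrr = sum(reciprocal_rank(run.get(q, []), rel) for q, rel in golden.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}
```

Running `evaluate` on a schedule and after every sync, then comparing against a stored baseline, is what turns silent degradation into a visible regression.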

AWS Well-Architected Lenses for Managed Knowledge Base

Security

Configure sessionAttributes in AgentCore Gateway to propagate user identity to the retrieval plane. Use Amazon Verified Permissions for authorization outside the retrieval path and pass results as metadata filters. Enable KMS CMK for vector index encryption. Review auto-generated IAM roles and apply least privilege — especially for SaaS connectors requiring OAuth tokens stored in Secrets Manager.

Reliability

Implement a circuit breaker for the Agentic Retriever with graceful degradation to single-hop retrieval when latency exceeds budget. Monitor KnowledgeBase/DocumentDeleteCount to ensure deletion propagation. Validate ingestion coverage post-sync for critical documents via KnowledgeBase/ParseFailureCount. Plan Bedrock quota capacity before bulk ingestion.

Performance efficiency

Implement query routing to separate simple queries (single-hop) from complex queries (Agentic Retriever). Measure p50/p95/p99 latency per query type and configure CloudWatch alarms. For read-heavy knowledge bases, evaluate whether the cost of automatic re-ranking justifies the quality gain for your specific query profile.
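Query routing can start with something as crude as the heuristic below and be replaced by a trained classifier once you have labeled traffic. The hint words and the rule itself are my own assumptions for illustration; they are not derived from how the Agentic Retriever decides to plan.

```python
import re

# Words that often signal a multi-part question. Deliberately crude.
MULTI_HOP_HINTS = re.compile(
    r"\b(and|then|compare|versus|vs|both|across)\b", re.IGNORECASE
)

def route_query(query: str) -> str:
    """Return "agentic" for queries that look multi-part, else "single_hop"."""
    if query.count("?") > 1 or MULTI_HOP_HINTS.search(query):
        return "agentic"
    return "single_hop"
```

Whatever the routing rule, log the chosen route alongside the latency of each request so the p50/p95/p99 figures can be broken down per query type.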

Managed Knowledge Base vs. Self-Managed RAG: When to Use Each

Dimension	Managed Knowledge Base	Self-Managed RAG (OpenSearch + Glue + Lambda)
Embedding model control	Automatic (pin via config where the API allows it)	Full: you choose and version
Multi-tenant ACL enforcement	Native for SaaS connectors; requires explicit sessionAttributes config	Manual implementation: more work, more control
Retrieval p99 latency	Variable: 500ms-15s depending on Agentic Retriever plan	Predictable: 200ms-2s with OpenSearch tuning
Audit traceability	AgentCore Observability + CloudWatch; execution plan exportable	Full: you control every log and trace
Operational cost (TCO)	Lower engineering overhead; FM cost for parsing/retrieval	Higher engineering overhead; predictable infrastructure cost
Suitability for gradual migration	Limited: Agentic Retriever does not operate with Self-Managed KBs in the same plan	Full: you control the migration strategy

My Curation Note

Senior Solutions Architect

In production-grade financial systems, I would adopt Managed Knowledge Base for new greenfield projects where delivery speed is a priority and the document profile benefits from multimodal Smart Parsing — especially for SharePoint and OneDrive connectors where native ACL propagation solves a problem that takes weeks to implement correctly from scratch. What I would not do: use model defaults without explicitly pinning versions, and trust automatic ACL enforcement without validating the sessionAttributes flow end-to-end in a staging environment with synthetic data mirroring the production permission structure. The most expensive lesson I've learned in production RAG pipelines is that retrieval quality degrades silently — implement continuous evaluation with a golden dataset before go-live, not after.

Verdict: Real Primitive, But Not a Black Box

Amazon Bedrock Managed Knowledge Base is a genuine advance in reducing undifferentiated heavy lifting for enterprise RAG pipelines. Multimodal Smart Parsing, native connectors with ACL propagation, and the Agentic Retriever for multi-hop queries solve real problems that cost weeks of engineering to implement with acceptable quality. For teams building AI agents over corporate data dispersed across SharePoint, S3, and Google Drive, the time-to-value is dramatically lower than a self-managed approach. But in regulated financial environments, "managed" is not synonymous with "governance-free." You need to pin embedding model versions for audit traceability, explicitly configure ACL enforcement via sessionAttributes, implement continuous retrieval quality evaluation as an SLI, and monitor FM costs per knowledge base ID. The Agentic Retriever has non-deterministic latency that requires a circuit breaker for aggressive SLOs. And the cost opacity of Smart Parsing on ingestion of millions of documents is a FinOps risk requiring proactive instrumentation. My recommendation: adopt for greenfield, validate ACL enforcement end-to-end before production, implement continuous evaluation with a golden dataset, and monitor RetrievalRelevanceScore as the primary SLI. For legacy systems with Self-Managed KBs, plan careful migration — the Agentic Retriever limitation in mixed plans is real and affects gradual migration strategies.

References

Introducing Amazon Bedrock Managed Knowledge Base (AWS News Blog)
Secure multi-tenant RAG with Amazon Bedrock and Verified Permissions
Amazon Bedrock Knowledge Bases - Developer Guide
Amazon Bedrock AgentCore Gateway - Developer Guide
Amazon Verified Permissions - Developer Guide
Architecting AI-powered resilience framework on AWS
CNCF Blog: Agent Auth - A lawyer's day in court
Building RAG-based LLM applications for production (Anyscale)

#bedrock #rag #knowledge-base #agentic-ai #enterprise-ai #aws #financial-grade #vector-search


Analyzed source: Introducing Amazon Bedrock Managed Knowledge Base for faster, more accurate enterprise AI applications
