Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

Production RAG on AWS

Module 3 · Production on AWS · Lesson 09/12

Amazon Bedrock Knowledge Bases: managed RAG

The RAG pipeline as a managed service — connect the source and get ingestion and retrieval.


Building a RAG pipeline from scratch is instructive — but in production you'll want to delegate the ingestion, chunking, and indexing infrastructure so you can focus on what actually differentiates your product. Amazon Bedrock Knowledge Bases does exactly that: you connect a data source, choose the embedding model and vector store, and AWS handles the rest. In this lesson we'll dissect what the service delivers, where it shines, and where you still need to get your hands dirty.

Flow: data source → Knowledge Base → app / agent

Two consumption paths: direct application call (Retrieve / RetrieveAndGenerate) and use as a tool by a Bedrock Agent.

🗄️ Data Sources

Amazon S3 · PDFs, DOCX, HTML, MD
Confluence / SharePoint · native connectors
Web Crawler · public URLs

🟧 AWS: Bedrock Knowledge Base

Managed ingestion · parse + chunk + embed
Knowledge Base · central configuration
Vector Store · OpenSearch / Aurora / etc.

🤖 AWS: Consumption

Retrieve API · returns chunks + scores
RetrieveAndGenerate · retrieval + LLM inline
Bedrock Agent · uses the KB as a tool

💻 Application

Your app / Lambda · consumes the response

What the service delivers — and what it abstracts

Bedrock Knowledge Bases manages four steps that in lesson 04 you assembled manually: document parsing, chunking, embedding generation, and upsert into the vector store. You trigger a sync job (via console, SDK, or EventBridge) and the service processes new or modified files in S3 — or in other connected sources.
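Triggering a sync from code can be sketched as follows. This is a minimal example, assuming the boto3 `bedrock-agent` client is passed in as `client`; the function name `run_sync`, the polling interval, and the IDs in the usage note are illustrative, not part of the lesson.

```python
import time

# Ingestion job states after which polling can stop.
TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}


def run_sync(client, knowledge_base_id, data_source_id,
             poll_seconds=10, sleep=time.sleep):
    """Start an ingestion (sync) job and wait until it finishes.

    `client` is expected to behave like boto3.client("bedrock-agent").
    Returns the final ingestion job description.
    """
    started = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job_id = started["ingestionJob"]["ingestionJobId"]

    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
```

With real credentials, usage would look like `run_sync(boto3.client("bedrock-agent"), "KB_ID", "DS_ID")`, where both IDs are placeholders. In a Lambda you would more likely start the job and return, rather than block on polling.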

For parsing, the service uses Amazon Bedrock Data Automation (BDA) or native parsers. BDA can extract text from PDFs with tables and images using multimodal models — useful when your documents aren't clean text. For simple files (Markdown, HTML, plain text), the default parser is sufficient and cheaper.

Embeddings are generated by models available in Bedrock: Amazon Titan Embeddings, Cohere Embed, and others. You choose the model once in the KB configuration — and you can't change it later without reindexing everything. This is an architecture decision, not an operational one: choose carefully considering vector dimension, cost per token, and multilingual support (relevant for Portuguese content).
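To make the vector-dimension trade-off concrete, here is a rough back-of-the-envelope calculation. It assumes float32 vectors (4 bytes per value) and ignores index overhead and metadata, so treat the numbers as a lower bound; the chunk count and dimensions are illustrative.

```python
def raw_vector_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the embeddings alone (no index structures, no metadata)."""
    return num_chunks * dimensions * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB for easier reading."""
    return num_bytes / 2**30


if __name__ == "__main__":
    chunks = 1_000_000  # hypothetical corpus size after chunking
    for dims in (256, 512, 1024):
        size = raw_vector_bytes(chunks, dims)
        print(f"{dims:>5} dims: {to_gib(size):.2f} GiB")
```

Halving the dimension halves the raw storage, which is part of why the choice deserves attention before the first full ingestion: changing it later means re-embedding every chunk.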

The result lands in a vector store that you provision — OpenSearch Serverless, Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, MongoDB Atlas, or Pinecone. Lesson 10 covers each option in detail; the point here is that Knowledge Bases is not a vector store — it's the orchestration layer above it.

How the application consumes the Knowledge Base

There are two main APIs and one agentic mode.

Retrieve returns the most relevant chunks with relevance scores and metadata (source, page, section). Use this mode when you want to control generation — assemble the prompt yourself, apply additional reranking (lesson 05), filter by metadata (lesson 06), or generate with a model outside Bedrock. It's the most flexible mode.
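A minimal sketch of the Retrieve path, assuming `client` behaves like the boto3 `bedrock-agent-runtime` client. The helper names, the prompt wording, and the `min_score` threshold are illustrative choices, not something the service prescribes.

```python
def retrieve_chunks(client, knowledge_base_id, query, top_k=5):
    """Fetch the top_k most relevant chunks for a query."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return response["retrievalResults"]


def build_prompt(question, results, min_score=0.0):
    """Assemble a grounded prompt from retrieved chunks.

    Chunks scoring below min_score are dropped; the rest are numbered
    so the model can cite them as [n].
    """
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    context = "\n\n".join(
        f"[{i}] {r['content']['text']}" for i, r in enumerate(kept, start=1)
    )
    return (
        "Answer using only the numbered context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The resulting prompt string can be sent to any model, inside or outside Bedrock, which is the main reason to choose this mode.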

RetrieveAndGenerate does everything in one call: retrieves chunks, assembles the prompt internally, and returns the generated answer along with citations. Convenient for prototypes and cases where you don't need to customize the system prompt. The downside is less visibility and control over what happens between retrieval and generation.
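The single-call path can be sketched like this, again assuming a boto3 `bedrock-agent-runtime` client. The function name `ask` is hypothetical, and the source extraction only handles S3 locations; other connectors report their location under different keys.

```python
def ask(client, knowledge_base_id, model_arn, question):
    """Call RetrieveAndGenerate and return (answer, unique S3 source URIs)."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]

    # Each citation links part of the answer to the chunks that support it.
    sources = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return answer, sources
```

Returning the source URIs alongside the answer recovers some of the visibility this mode otherwise hides, since you can at least show users where each answer came from.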

Bedrock Agents can use a Knowledge Base as a native tool. The agent decides when to query the KB based on user intent — this is the agentic RAG that lesson 07 details. In practice, you register the KB with the agent using a natural language description ("use this base to answer questions about internal policies") and the model decides when to trigger the search.
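Registering a KB with an agent might look like the sketch below. It assumes a boto3 `bedrock-agent` client and an existing agent whose working draft is being edited; the function name and the IDs are placeholders.

```python
def attach_knowledge_base(client, agent_id, knowledge_base_id, description):
    """Associate a Knowledge Base with an agent's draft and re-prepare the agent.

    The description is what the model reads when deciding whether
    to query this KB, so it should state clearly what the KB covers.
    """
    association = client.associate_agent_knowledge_base(
        agentId=agent_id,
        agentVersion="DRAFT",
        knowledgeBaseId=knowledge_base_id,
        description=description,
        knowledgeBaseState="ENABLED",
    )
    # Changes to the draft take effect only after the agent is prepared again.
    client.prepare_agent(agentId=agent_id)
    return association
```

A vague description tends to produce an agent that either never searches or searches on every turn, so it is worth iterating on that text as you would on a prompt.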

All three modes support metadata filters — you can restrict the search by attributes like department, publication_date, or language without changing the semantic query. This is the same mechanism from lesson 06, just configured via API instead of directly in the vector store.
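As an illustration, a filter on `department` and `language` can be built as a plain dictionary and passed in the retrieval configuration. The helper names are hypothetical; the `equals` and `andAll` operators follow the Retrieve API's filter shape, and the attribute values are examples.

```python
def equals(key, value):
    """Match documents whose metadata attribute equals the value."""
    return {"equals": {"key": key, "value": value}}


def all_of(*conditions):
    """Combine conditions with AND.

    A single condition is returned as-is, because andAll
    expects at least two entries.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"andAll": list(conditions)}


def filtered_retrieval_config(filter_expr, top_k=5):
    """Build the retrievalConfiguration argument for a filtered search."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": filter_expr,
        }
    }


config = filtered_retrieval_config(
    all_of(equals("department", "legal"), equals("language", "pt")),
    top_k=3,
)
```

The `config` value goes into `retrievalConfiguration` on a `retrieve` call. The semantic query text stays untouched, and the filter only narrows which chunks are eligible.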

Managed chunking options

Fixed-size: splits by token count with configurable overlap — equivalent to fixed chunking from lesson 03. Simple and predictable.

Default: the service splits content into chunks of roughly 300 tokens while keeping sentences intact. Good starting point for well-structured documents.

Hierarchical: creates parent and child chunks — the child is indexed, the parent is sent to the LLM as expanded context. Reduces context loss without increasing the indexed chunk size.

Semantic chunking: uses embeddings to detect topic shifts before cutting. More expensive at ingestion time, but produces more coherent chunks for long, dense texts.

Custom (Lambda): you provide a Lambda function that receives the document and returns chunks. Full control — useful for proprietary formats or specific business logic.
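In the API, the strategy is expressed as a `chunkingConfiguration` block on the data source. The sketch below builds that block for three of the strategies above; the token sizes, overlap, and thresholds are illustrative defaults rather than recommendations, and the helper names are hypothetical.

```python
def fixed_size(max_tokens=300, overlap_percentage=20):
    """Fixed-size chunks with a percentage of overlap between neighbors."""
    return {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {
            "maxTokens": max_tokens,
            "overlapPercentage": overlap_percentage,
        },
    }


def hierarchical(parent_tokens=1500, child_tokens=300, overlap_tokens=60):
    """Two levels: large parent chunks, smaller child chunks that get indexed."""
    if child_tokens >= parent_tokens:
        raise ValueError("child chunks must be smaller than parent chunks")
    return {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            "levelConfigurations": [
                {"maxTokens": parent_tokens},
                {"maxTokens": child_tokens},
            ],
            "overlapTokens": overlap_tokens,
        },
    }


def semantic(max_tokens=300, buffer_size=0, breakpoint_percentile=95):
    """Split where embedding similarity between sentences drops."""
    return {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "maxTokens": max_tokens,
            "bufferSize": buffer_size,
            "breakpointPercentileThreshold": breakpoint_percentile,
        },
    }


def ingestion_config(chunking):
    """Wrap a chunking block as a vectorIngestionConfiguration value."""
    return {"chunkingConfiguration": chunking}
```

The result of `ingestion_config(...)` would be passed as `vectorIngestionConfiguration` when creating the data source.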

In practice: when managed is worth it

Senior Solutions Architect

In my experience, Bedrock Knowledge Bases handles 70-80% of enterprise RAG cases well: documents in S3, standard or hierarchical chunking, Titan or Cohere embeddings, OpenSearch Serverless as the store. Setup time drops from days to hours. The problem shows up when you need very specific chunking (e.g., processing source code respecting function boundaries), when you want reranking with your own model, or when sync job latency doesn't work for documents that change in real time. In those cases, Custom Lambda mode or a custom pipeline (lessons 03/04) is still the right choice. Use managed as the default and leave it only when you have a concrete reason.
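For the "reranking with your own model" case, one option is to over-fetch with Retrieve and reorder the results yourself. The sketch below assumes results shaped like the Retrieve response (`content.text` per chunk). The `keyword_overlap` scorer is a toy stand-in for a real reranking model, used here only so the example runs on its own.

```python
def keyword_overlap(query, text):
    """Toy scorer: fraction of query words that appear in the chunk."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(results, query, scorer=keyword_overlap, keep=3):
    """Reorder retrieved chunks by a custom score and keep the best ones.

    `scorer` is any callable (query, chunk_text) -> float; in practice
    it would wrap a cross-encoder or another reranking model.
    """
    scored = [(scorer(query, r["content"]["text"]), r) for r in results]
    # Sort on the score only, so ties never compare the result dicts.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored[:keep]]
```

A typical pattern is to retrieve 20 to 30 chunks and keep the top 3 to 5 after reranking. The exact numbers depend on your context budget and latency target.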

Managed vs. custom pipeline

Criterion	Knowledge Bases (managed)	Custom pipeline
Setup time	Hours	Days to weeks
Chunking control	4 strategies + custom Lambda	Total: any logic
Reranking	Not native (use Retrieve + Lambda)	Any model/service
Real-time ingestion	Async sync job (minutes)	Possible with streaming pipeline
Observability	Basic CloudWatch; less granular	You instrument: full control
Operational cost	Low: no infra to maintain	High: you operate everything

Where the vectors live — and what comes in the next lesson

Knowledge Bases doesn't store vectors internally. It delegates to a vector store that you choose and provision before creating the KB. Supported options include OpenSearch Serverless (the most common default on AWS), Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, MongoDB Atlas, and Pinecone.

Each option has different trade-offs in latency, cost, filter capability, and operations. OpenSearch Serverless is convenient because AWS manages scaling, but has a minimum cost even without traffic. Aurora with pgvector is a good choice if you already use RDS and want to consolidate infrastructure. Redis is excellent for very low latency on smaller datasets.

Lesson 10 covers each vector store in detail — hybrid search capabilities, cost model, scale limits, and when to choose each one. For now, the important point is: the choice of vector store is separate from the choice to use Knowledge Bases. You can use the managed service for ingestion and still have control over which store and which index configuration to use.

If you're starting a new project, my recommendation is: start with Knowledge Bases + OpenSearch Serverless, measure, and only migrate to a custom pipeline if you hit a concrete limit. The operational complexity of maintaining a custom RAG pipeline has a real cost that doesn't show up in the initial benchmark.

Setting up a Knowledge Base: minimum sequence

1. Provision the vector store
Create the collection in OpenSearch Serverless (or the Aurora cluster with pgvector). The KB will need the ARN and access credentials.

2. Create the Knowledge Base via console or IaC
Choose the embedding model (e.g., Titan Embeddings V2), point to the vector store, and define the chunking strategy. This configuration can't be changed later without reindexing.

3. Connect the data source
For S3: provide the bucket and prefix. Configure the IAM role with read permission on the bucket and write permission on the vector store.

4. Trigger the sync job
Via console, StartIngestionJob in the SDK, or EventBridge Scheduler for periodic syncs. The job is incremental: it processes only what changed.

5. Test with Retrieve before exposing to users
Use the Retrieve API with queries representative of your use case. Check scores, returned chunks, and source metadata. Only then integrate RetrieveAndGenerate or the agent.
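Steps 2 and 3 can be sketched with the SDK as follows, assuming a boto3 `bedrock-agent` client and an OpenSearch Serverless collection whose vector index already exists. All names, ARNs, and the field mapping (`embedding`, `text`, `metadata`) are placeholders and must match the index you actually created.

```python
def knowledge_base_request(name, role_arn, embedding_model_arn,
                           collection_arn, index_name):
    """Arguments for create_knowledge_base with an OpenSearch Serverless store."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Hypothetical field names; they must exist in the index.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }


def s3_data_source_request(knowledge_base_id, name, bucket_arn, prefix):
    """Arguments for create_data_source pointing at an S3 bucket and prefix."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {
                "bucketArn": bucket_arn,
                "inclusionPrefixes": [prefix],
            },
        },
    }


def provision(client, kb_request, bucket_arn, prefix, source_name="s3-docs"):
    """Create the KB, attach the S3 source, and return both IDs."""
    kb = client.create_knowledge_base(**kb_request)["knowledgeBase"]
    source = client.create_data_source(
        **s3_data_source_request(kb["knowledgeBaseId"], source_name, bucket_arn, prefix)
    )["dataSource"]
    return kb["knowledgeBaseId"], source["dataSourceId"]
```

The two returned IDs are what a subsequent `start_ingestion_job` call needs. In a real project the same structure usually lives in IaC (CloudFormation, CDK, or Terraform) rather than in a script.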

Frequently asked questions

Can I use Knowledge Bases with models outside Bedrock?

Yes — use the Retrieve API to fetch chunks and pass the result to any LLM. RetrieveAndGenerate is what requires a Bedrock model, since generation happens inside the service.

Is the sync job real-time?

No. It's an async job that you trigger manually or schedule. For documents that change at high frequency (seconds), a custom ingestion pipeline with direct upsert into the vector store is more appropriate.

Can I have multiple sources in the same KB?

Yes. A KB supports multiple data sources (different S3 buckets, Confluence, web crawler). All documents are indexed in the same vector store and available in the same search — use metadata filters to separate by origin if needed.

What happens if I change the embedding model?

You need to reindex all content. Vectors generated by different models are not comparable — mixing embeddings from distinct models in the same index produces incorrect results. Plan this decision before going to production.


