Amazon Bedrock Knowledge Bases: managed RAG
The RAG pipeline as a managed service — connect the source and get ingestion and retrieval.
Building a RAG pipeline from scratch is instructive — but in production you'll want to delegate the ingestion, chunking, and indexing infrastructure so you can focus on what actually differentiates your product. Amazon Bedrock Knowledge Bases does exactly that: you connect a data source, choose the embedding model and vector store, and AWS handles the rest. In this lesson we'll dissect what the service delivers, where it shines, and where you still need to get your hands dirty.
Flow: data source → Knowledge Base → app / agent
Two consumption paths: direct application call (Retrieve / RetrieveAndGenerate) and use as a tool by a Bedrock Agent.
- Amazon S3 · PDFs, DOCX, HTML, MD
- Confluence / SharePoint · native connectors
- Web Crawler · public URLs
- Managed ingestion · parse + chunk + embed
- Knowledge Base · central configuration
- Vector Store · OpenSearch / Aurora / etc.
- Retrieve API · returns chunks + scores
- RetrieveAndGenerate · retrieval + LLM inline
- Bedrock Agent · uses the KB as a tool
- Your app / Lambda · consumes the response
What the service delivers — and what it abstracts
Bedrock Knowledge Bases manages four steps that in lesson 04 you assembled manually: document parsing, chunking, embedding generation, and upsert into the vector store. You trigger a sync job (via console, SDK, or EventBridge) and the service processes new or modified files in S3 — or in other connected sources.
For parsing, the service uses Amazon Bedrock Data Automation (BDA) or native parsers. BDA can extract text from PDFs with tables and images using multimodal models — useful when your documents aren't clean text. For simple files (Markdown, HTML, plain text), the default parser is sufficient and cheaper.
Embeddings are generated by models available in Bedrock: Amazon Titan Embeddings, Cohere Embed, and others. You choose the model once in the KB configuration — and you can't change it later without reindexing everything. This is an architecture decision, not an operational one: choose carefully considering vector dimension, cost per token, and multilingual support (relevant for Portuguese content).
The result lands in a vector store that you provision — OpenSearch Serverless, Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, MongoDB Atlas, or Pinecone. Lesson 10 covers each option in detail; the point here is that Knowledge Bases is not a vector store — it's the orchestration layer above it.
How the application consumes the Knowledge Base
There are two main APIs and one agentic mode.
Retrieve returns the most relevant chunks with relevance scores and metadata (source, page, section). Use this mode when you want to control generation — assemble the prompt yourself, apply additional reranking (lesson 05), filter by metadata (lesson 06), or generate with a model outside Bedrock. It's the most flexible mode.
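The Retrieve path can be sketched roughly as follows. This is a minimal example, assuming `client` is a boto3 `bedrock-agent-runtime` client; the helper names `retrieve_chunks` and `build_prompt`, the knowledge base ID, and the prompt wording are illustrative assumptions rather than anything the service prescribes.

```python
from typing import Any


def retrieve_chunks(client: Any, kb_id: str, query: str, top_k: int = 5) -> list:
    """Call the Retrieve API and return a simplified list of chunks.

    `client` is assumed to be boto3.client("bedrock-agent-runtime").
    """
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [
        {
            "text": result["content"]["text"],
            "score": result.get("score"),
            "metadata": result.get("metadata", {}),
        }
        for result in response["retrievalResults"]
    ]


def build_prompt(question: str, chunks: list) -> str:
    """Assemble the generation prompt yourself (hypothetical wording)."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk['text']}" for i, chunk in enumerate(chunks)
    )
    return (
        "Answer using only the numbered context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Because the prompt is built in your own code, the resulting string can be sent to any model, and a reranking or filtering step can sit between the two functions.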
RetrieveAndGenerate does everything in one call: retrieves chunks, assembles the prompt internally, and returns the generated answer along with citations. Convenient for prototypes and cases where you don't need to customize the system prompt. The downside is less visibility and control over what happens between retrieval and generation.
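A minimal sketch of the single-call path, assuming `client` is a boto3 `bedrock-agent-runtime` client. The helper name `ask_knowledge_base` and the flattened return shape are assumptions made for illustration; the model ARN and knowledge base ID are whatever you configured.

```python
from typing import Any


def ask_knowledge_base(client: Any, kb_id: str, model_arn: str, question: str) -> dict:
    """Run retrieval and generation in one call and flatten the citations.

    `client` is assumed to be boto3.client("bedrock-agent-runtime").
    """
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    # Collect each distinct S3 source URI cited in the answer, in order.
    sources = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return {"answer": response["output"]["text"], "sources": sources}
```

Note that this sketch only surfaces S3 locations; references from other source types would need their own location fields handled.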
Bedrock Agents can use a Knowledge Base as a native tool. The agent decides when to query the KB based on user intent — this is the agentic RAG that lesson 07 details. In practice, you register the KB with the agent using a natural language description ("use this base to answer questions about internal policies") and the model decides when to trigger the search.
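Registering the KB with an agent might look like the sketch below, assuming `client` is a boto3 `bedrock-agent` client and that the association is made against the agent's working draft. The helper name `attach_kb_to_agent` and the IDs are hypothetical.

```python
from typing import Any


def attach_kb_to_agent(
    client: Any,
    agent_id: str,
    kb_id: str,
    description: str,
    agent_version: str = "DRAFT",
) -> dict:
    """Associate a Knowledge Base with an agent.

    `client` is assumed to be boto3.client("bedrock-agent"). The description
    is the natural-language hint the model reads when deciding whether to
    query this KB.
    """
    response = client.associate_agent_knowledge_base(
        agentId=agent_id,
        agentVersion=agent_version,
        knowledgeBaseId=kb_id,
        description=description,
        knowledgeBaseState="ENABLED",
    )
    return response["agentKnowledgeBase"]
```

The description deserves the same care as a tool description in any agent setup, since it is the main signal the model has for when to search. After changing associations, the agent generally has to be prepared again before the change takes effect.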
All three modes support metadata filters — you can restrict the search by attributes like department, publication_date, or language without changing the semantic query. This is the same mechanism from lesson 06, just configured via API instead of directly in the vector store.
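As an illustration, a metadata filter for the Retrieve API can be built as a plain dictionary. The sketch below assumes exact-match conditions only; the helper names `build_filter` and `filtered_retrieval_config` and the attribute values are made up for the example.

```python
def build_filter(**attributes) -> dict:
    """Build an exact-match metadata filter from keyword arguments.

    One attribute yields a single `equals` condition; several are combined
    with `andAll`, which expects at least two conditions.
    """
    conditions = [
        {"equals": {"key": key, "value": value}}
        for key, value in attributes.items()
    ]
    if not conditions:
        raise ValueError("at least one attribute is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"andAll": conditions}


def filtered_retrieval_config(metadata_filter: dict, top_k: int = 5) -> dict:
    """Wrap a filter in the retrievalConfiguration shape used by Retrieve."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": metadata_filter,
        }
    }
```

The result of `filtered_retrieval_config(build_filter(department="hr", language="pt"))` is passed as `retrievalConfiguration` to Retrieve; for RetrieveAndGenerate the same structure nests under `knowledgeBaseConfiguration`. The semantic query text itself stays unchanged.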
Managed chunking options
In my experience, Bedrock Knowledge Bases handles 70-80% of enterprise RAG cases well: documents in S3, standard or hierarchical chunking, Titan or Cohere embeddings, OpenSearch Serverless as the store. Setup time drops from days to hours. The problem shows up when you need very specific chunking (e.g., processing source code respecting function boundaries), when you want reranking with your own model, or when sync job latency doesn't work for documents that change in real time. In those cases, Custom Lambda mode or a custom pipeline (lessons 03/04) is still the right choice. Use managed as the default and leave it only when you have a concrete reason.
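For reference, the standard (fixed-size) and hierarchical strategies mentioned above are selected through the data source's ingestion configuration. The sketch below builds that configuration as plain dictionaries; the helper names and the token values are illustrative assumptions, not recommendations.

```python
def fixed_size_chunking(max_tokens: int = 300, overlap_percentage: int = 20) -> dict:
    """Ingestion configuration for fixed-size chunks with overlap."""
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_percentage,
            },
        }
    }


def hierarchical_chunking(
    parent_tokens: int = 1500, child_tokens: int = 300, overlap_tokens: int = 60
) -> dict:
    """Ingestion configuration for parent/child (hierarchical) chunks."""
    if child_tokens >= parent_tokens:
        raise ValueError("child chunks must be smaller than parent chunks")
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                "levelConfigurations": [
                    {"maxTokens": parent_tokens},
                    {"maxTokens": child_tokens},
                ],
                "overlapTokens": overlap_tokens,
            },
        }
    }
```

Either dictionary would be passed as `vectorIngestionConfiguration` when creating the data source with the `bedrock-agent` client.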
Managed vs. custom pipeline
| Criterion | Knowledge Bases (managed) | Custom pipeline |
|---|---|---|
| Setup time | Hours | Days to weeks |
| Chunking control | 4 strategies + custom Lambda | Total: any logic |
| Reranking | Not native (use Retrieve + Lambda) | Any model/service |
| Real-time ingestion | Async sync job (minutes) | Possible with streaming pipeline |
| Observability | Basic CloudWatch; less granular | You instrument: full control |
| Operational cost | Low: no infra to maintain | High: you operate everything |
Where the vectors live — and what comes in the next lesson
Knowledge Bases doesn't store vectors internally. It delegates to a vector store that you choose and provision before creating the KB. The currently supported options are: OpenSearch Serverless (most common default on AWS), Aurora PostgreSQL with pgvector, Redis Enterprise Cloud, MongoDB Atlas, and Pinecone.
Each option has different trade-offs in latency, cost, filter capability, and operations. OpenSearch Serverless is convenient because AWS manages scaling, but has a minimum cost even without traffic. Aurora with pgvector is a good choice if you already use RDS and want to consolidate infrastructure. Redis is excellent for very low latency on smaller datasets.
Lesson 10 covers each vector store in detail — hybrid search capabilities, cost model, scale limits, and when to choose each one. For now, the important point is: the choice of vector store is separate from the choice to use Knowledge Bases. You can use the managed service for ingestion and still have control over which store and which index configuration to use.
If you're starting a new project, my recommendation is: start with Knowledge Bases + OpenSearch Serverless, measure, and only migrate to a custom pipeline if you hit a concrete limit. The operational complexity of maintaining a custom RAG pipeline has a real cost that doesn't show up in the initial benchmark.
Setting up a Knowledge Base: minimum sequence
1. Provision the vector store. Create the collection in OpenSearch Serverless (or the Aurora cluster with pgvector). The KB will need the ARN and access credentials.
2. Create the Knowledge Base via console or IaC. Choose the embedding model (e.g., Titan Embeddings V2), point to the vector store, and define the chunking strategy. This configuration can't be changed later without reindexing.
3. Connect the data source. For S3: provide the bucket and prefix. Configure the IAM role with read permission on the bucket and write permission on the vector store.
4. Trigger the sync job. Use the console, StartIngestionJob in the SDK, or EventBridge Scheduler for periodic syncs. The job is incremental: it processes only what changed.
5. Test with Retrieve before exposing to users. Use the Retrieve API with queries representative of your use case. Check scores, returned chunks, and source metadata. Only then integrate RetrieveAndGenerate or the agent.
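The sync step can be scripted along these lines. This is a sketch, assuming `client` is a boto3 `bedrock-agent` client; the helper name `run_sync`, the polling interval, and the poll limit are arbitrary choices for the example.

```python
import time
from typing import Any

# Statuses after which the job will not change any further.
TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}


def run_sync(
    client: Any,
    kb_id: str,
    data_source_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
) -> dict:
    """Start an ingestion job and poll until it reaches a terminal status.

    `client` is assumed to be boto3.client("bedrock-agent").
    """
    job = client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]

    polls = 0
    while job["status"] not in TERMINAL_STATES:
        if polls >= max_polls:
            raise TimeoutError(f"ingestion job {job['ingestionJobId']} still running")
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        polls += 1
    return job
```

The caller should check the returned `status` before moving on to the Retrieve tests, since a `FAILED` job is also a terminal state.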
Frequently asked questions
Can I use Knowledge Bases with models outside Bedrock?
Yes — use the Retrieve API to fetch chunks and pass the result to any LLM. RetrieveAndGenerate is what requires a Bedrock model, since generation happens inside the service.
Is the sync job real-time?
No. It's an async job that you trigger manually or schedule. For documents that change at high frequency (seconds), a custom ingestion pipeline with direct upsert into the vector store is more appropriate.
Can I have multiple sources in the same KB?
Yes. A KB supports multiple data sources (different S3 buckets, Confluence, web crawler). All documents are indexed in the same vector store and available in the same search — use metadata filters to separate by origin if needed.
What happens if I change the embedding model?
You need to reindex all content. Vectors generated by different models are not comparable — mixing embeddings from distinct models in the same index produces incorrect results. Plan this decision before going to production.
Quick check
1. What does a Bedrock Knowledge Base give you out of the box?