bedrock-agent-starter
Production-shaped Amazon Bedrock agent starter — tools, IaC, and evals in 30 minutes.
git clone https://github.com/fernandofatech/bedrock-agent-starter.gitbedrock-agent-starter is a bilingual, opinionated template that wires together the pieces Bedrock leaves to you — tool registry, multi-turn memory, structured observability, Terraform IaC, and an eval harness — so you can fork it and ship a working agent instead of rebuilding the scaffolding from scratch.
Why this exists
Amazon Bedrock makes it straightforward to call a foundation model. It does not make it straightforward to ship a real agent. The moment you move past a single-turn prompt you need to decide how to register tools, how to pass tool results back into the conversation, where to persist session state, how to emit metrics without littering your business logic with CloudWatch boilerplate, and how to know whether a model swap broke something.
Every team I have seen start from scratch spends the first two weeks solving exactly those plumbing problems. This starter solves them once, in a way that is easy to replace piece by piece. The agent loop is built on the Bedrock Converse API, which is stable across Claude, Nova, Llama, and Mistral — so swapping models is a one-line environment variable change, not a refactor.
The project is also a portfolio artifact. It is bilingual (English and Portuguese), documented with MkDocs Material on GitHub Pages, and has a static landing page on Vercel. Every commit follows Conventional Commits, every PR runs ruff, mypy, and pytest. The intent is to demonstrate not just that I can wire up a Bedrock agent, but that I can do it in a way a team can actually maintain.
What you get out of the box
calculator, get_time, web_search stub) and a four-line decorator pattern to add your own; Pydantic infers the JSON schema automatically.Turns, InputTokens, OutputTokens, Duration, ToolErrors) under the BedrockAgent namespace.pytest and a golden JSONL file — replay prompts, assert expected substrings and tool calls, fail on regressions.How the agent works — request flow
A request enters either through the local CLI or API Gateway, passes through the agent loop which may invoke one or more Lambda tools, and emits structured logs and EMF metrics on every turn.
- agent chat · CLI
- API Gateway · HTTP API
- Lambda Handler · Python 3.12
- Agent Loop · Converse API
- Tool Registry · calculator / get_time / …
- Amazon Bedrock · Converse API
- DynamoDB · Sessions Table
- CloudWatch · Logs + EMF Metrics
Install and run locally
- 1
Clone the repository
Fork or clone directly. The
mainbranch is always in a working state. - 2
Create a virtual environment and install dependencies
Python 3.11+ is required. The
[dev]extra pulls in ruff, mypy, pytest, and MkDocs. - 3
Configure AWS credentials and model ID
Set
AWS_REGIONandBEDROCK_MODEL_ID. The default model is Claude 3.5 Sonnet v2. Your IAM identity needsbedrock:InvokeModelon the target model ARN. - 4
Start a local chat session
Run
agent chat. The CLI uses in-memory session state by default — no AWS storage resources needed for local iteration. - 5
Run the test suite and evals
pytestruns unit tests;pytest tests/evals/replays the golden JSONL set and fails on regressions. Run this before adding a new tool or swapping models. - 6
Deploy to AWS with Terraform
From the
terraform/directory, runterraform initthenterraform apply -var="project=my-agent". State backend, tagging strategy, and IAM permission boundaries are intentionally left for you to define per your organisation's standards.
git clone git@github.com:fernandofatech/bedrock-agent-starter.git
cd bedrock-agent-starter
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
export AWS_REGION=us-east-1
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
# Chat locally — no deployed infrastructure needed
agent chat
# > what time is it in Tokyo, and what is (123 * 456) - 789?
# [tool] get_time(tz="Asia/Tokyo") → "2026-05-15T22:41:09+09:00"
# [tool] calculator(expression="(123 * 456) - 789") → 55299
# It is 22:41 in Tokyo, and (123 × 456) − 789 = 55 299.
# Run unit tests + evals
pytest
pytest tests/evals/
# Deploy to AWS
cd terraform
terraform init
terraform apply -var="project=my-agent"Adding a tool — the four-line pattern
The tool registry is the part of this starter I am most deliberate about. A common failure mode in agent projects is that tool definitions drift from their implementations — the schema says one thing, the function does another, and the model hallucinates parameters that do not exist.
Here, the schema is derived from the function signature at import time via Pydantic. You write one decorator and one function; the registry picks it up automatically on the next invocation:
from agent.tools import tool
@tool(description="Translate text between languages using a deterministic table.")
def translate(text: str, source_lang: str, target_lang: str) -> str:
...
return translated
Pydantic infers text, source_lang, and target_lang as required string properties in the JSON schema that gets sent to Bedrock. If you add a parameter with a default value, it becomes optional in the schema. If you annotate with Literal["en", "pt", "es"], the schema gains an enum constraint — and the model will respect it.
This approach keeps the schema and the implementation co-located and in sync. It also means you can test a tool in pure Python without any Bedrock dependency, which is exactly what the unit tests do. The full walkthrough is in the docs.
Observability and evals — knowing when something breaks
Two things I have seen skipped most often in agent projects are structured observability and regression testing. Both are included here and both are lightweight enough that there is no excuse to remove them.
Observability: Every turn emits a JSON log line with session_id, turn, model_id, token counts, tool calls made, and wall-clock duration. These go to CloudWatch Logs. In parallel, CloudWatch EMF metrics are published under the BedrockAgent namespace — Turns, InputTokens, OutputTokens, Duration, and ToolErrors. EMF means you get metrics without a separate put_metric_data call; they are extracted from the log line by CloudWatch automatically. This is enough to build a dashboard and set alarms on token spend or error rate from day one.
Evals: The eval harness in tests/evals/ is intentionally simple. A golden.jsonl file contains prompt/expected-output pairs. The pytest runner sends each prompt through the real agent loop (or a mock, depending on your CI setup) and asserts that expected substrings appear in the response and that the expected tools were called. This is not a substitute for human evaluation, but it catches the class of regression where a prompt that worked with Claude 3.5 Sonnet stops working after you switch to Nova Pro or change a system prompt. Run pytest tests/evals/ before any model or prompt change.
What the Terraform skeleton does NOT do for you
The terraform/ directory provisions the core resources but intentionally leaves three things undefined: remote state backend (S3 + DynamoDB lock), resource tagging strategy, and IAM permission boundaries. These are organisation-specific decisions. If you apply the skeleton as-is in a shared AWS account, state will be local and IAM roles will have no boundaries. Define these before promoting to any non-sandbox environment.
Swapping models is one environment variable
Because the agent loop uses the Bedrock Converse API, changing from Claude 3.5 Sonnet to Amazon Nova Pro or Mistral Large is export BEDROCK_MODEL_ID=amazon.nova-pro-v1:0. Run pytest tests/evals/ after the swap to catch any prompt-format regressions before they reach production.
Frequently asked questions
Do I need a Bedrock model access request before running locally?
Yes. Bedrock model access is not enabled by default. Go to the Bedrock console → Model access, request access to the model you intend to use, and wait for approval (usually instant for Claude and Nova in us-east-1). Your IAM identity also needs bedrock:InvokeModel on the model ARN.
Can I use this with Bedrock Agents (the managed orchestration service) instead of the Converse API?
No — this starter implements its own agent loop using the Converse API directly. It does not use the managed Bedrock Agents service (which has its own action groups, knowledge bases, and session management). The trade-off is more control and lower cost at the expense of managing the loop yourself. If you want the managed service, this starter is not the right starting point.
How do I add persistent memory in production?
Set the MEMORY_BACKEND=dynamodb environment variable (or the equivalent Terraform variable). The Lambda handler will use the DynamoDB sessions table provisioned by Terraform. The table key is session_id; TTL is configurable. In-memory backend remains the default for local development.
Is the `web_search` tool functional?
It is a stub. It demonstrates the tool pattern but returns a placeholder response. Wiring it to a real search API (Brave Search, Tavily, SerpAPI) is a deliberate exercise left for the implementer — the integration point is clearly marked in the source.
Who this is for and when to use it
Use this starter if you are building a custom agent on Amazon Bedrock and want a solid foundation rather than a blank file. It is well-suited for: solution architects evaluating Bedrock for a client, engineers who need a working agent in a day and plan to extend it, and teams who want an example of how to structure observability and evals in an agent project from the start. Do not use it if you want the managed Bedrock Agents service (action groups, knowledge bases, managed orchestration) — this starter deliberately bypasses that in favour of a self-managed loop. Also do not use it if you need a multi-agent orchestration framework like LangGraph or AutoGen; the scope here is a single-agent loop with tools and memory. The project is MIT-licensed, bilingual, and actively maintained as part of my public portfolio. The documentation site at bedrock-agent.moretes.com and the GitHub Pages docs are kept in sync with the main branch.