# bedrock-agent-starter

Production-shaped Amazon Bedrock agent starter — tools, IaC, and evals in 30 minutes.

- URL: https://fernando.moretes.com/open-source/bedrock-agent-starter

- Markdown: https://fernando.moretes.com/open-source/bedrock-agent-starter/guide.md?lang=en

- GitHub: https://github.com/fernandofatech/bedrock-agent-starter

- Homepage: https://bedrock-agent.moretes.com

- Language: Python

- Topics: ai, ai-agents, aws, bedrock, github-actions, lambda, moretes, portfolio, solution-architecture, terraform

- Stars: 0

- Forks: 0

- Updated: 2026-05-16T02:23:27Z

---

bedrock-agent-starter is a bilingual, opinionated template that wires together the pieces Bedrock leaves to you — tool registry, multi-turn memory, structured observability, Terraform IaC, and an eval harness — so you can fork it and ship a working agent instead of rebuilding the scaffolding from scratch.

## Why this exists

Amazon Bedrock makes it straightforward to call a foundation model. It does not make it straightforward to ship a real agent. The moment you move past a single-turn prompt you need to decide how to register tools, how to pass tool results back into the conversation, where to persist session state, how to emit metrics without littering your business logic with CloudWatch boilerplate, and how to know whether a model swap broke something.

Every team I have seen start from scratch spends the first two weeks solving exactly those plumbing problems. This starter solves them once, in a way that is easy to replace piece by piece. The agent loop is built on the [Bedrock Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html), which is stable across Claude, Nova, Llama, and Mistral — so swapping models is a one-line environment variable change, not a refactor.

The project is also a portfolio artifact. It is bilingual (English and Portuguese), documented with MkDocs Material on GitHub Pages, and has a static landing page on Vercel. Every commit follows Conventional Commits, every PR runs ruff, mypy, and pytest. The intent is to demonstrate not just that I can wire up a Bedrock agent, but that I can do it in a way a team can actually maintain.

## What you get out of the box

- **Agent loop** on the Bedrock Converse API — model-agnostic, with first-class tool use across Claude, Nova, Llama, and Mistral.
- **Tool registry** with three working tools (`calculator`, `get_time`, `web_search` stub) and a four-line decorator pattern to add your own; Pydantic infers the JSON schema automatically.
- **Pluggable memory** — in-memory for local development, DynamoDB-backed for production, swapped by environment.
- **Structured observability** — JSON log lines per turn plus CloudWatch EMF metrics (`Turns`, `InputTokens`, `OutputTokens`, `Duration`, `ToolErrors`) under the `BedrockAgent` namespace.
- **Terraform IaC skeleton** that provisions Lambda, API Gateway HTTP API, DynamoDB sessions table, IAM roles, and a CloudWatch log group.
- **Eval harness** driven by `pytest` and a golden JSONL file — replay prompts, assert expected substrings and tool calls, fail on regressions.

## How the agent works — request flow

A request enters either through the local CLI or API Gateway, passes through the agent loop which may invoke one or more Lambda tools, and emits structured logs and EMF metrics on every turn.

### 💻 Local Dev

- agent chat CLI (user)

### ☁️ AWS Edge

- API Gateway HTTP API (edge)

### ⚙️ Compute

- Lambda Handler Python 3.12 (compute)
- Agent Loop Converse API (compute)
- Tool Registry calculator / get_time / … (compute)

### 🤖 AI

- Amazon Bedrock Converse API (ai)

### 🗄️ Storage

- DynamoDB Sessions Table (storage)

### 📊 Observability

- CloudWatch Logs + EMF Metrics (data)

### Flows

- cli -> agentloop: direct invocation
- apigw -> lambda: HTTP request
- lambda -> agentloop: delegates to
- agentloop -> bedrock: Converse API call
- bedrock -> agentloop: tool_use / text response
- agentloop -> toolregistry: dispatch tool call
- toolregistry -> agentloop: tool result
- agentloop -> dynamo: read/write session
- agentloop -> cw: JSON log + EMF metrics

## Install and run locally

1. **Clone the repository** — Fork or clone directly. The `main` branch is always in a working state.

2. **Create a virtual environment and install dependencies** — Python 3.11+ is required. The `[dev]` extra pulls in ruff, mypy, pytest, and MkDocs.

3. **Configure AWS credentials and model ID** — Set `AWS_REGION` and `BEDROCK_MODEL_ID`. The default model is Claude 3.5 Sonnet v2. Your IAM identity needs `bedrock:InvokeModel` on the target model ARN.

4. **Start a local chat session** — Run `agent chat`. The CLI uses in-memory session state by default — no AWS storage resources needed for local iteration.

5. **Run the test suite and evals** — `pytest` runs unit tests; `pytest tests/evals/` replays the golden JSONL set and fails on regressions. Run this before adding a new tool or swapping models.

6. **Deploy to AWS with Terraform** — From the `terraform/` directory, run `terraform init` then `terraform apply -var="project=my-agent"`. State backend, tagging strategy, and IAM permission boundaries are intentionally left for you to define per your organisation's standards.

_Full quickstart — from clone to first agent response_

```bash
git clone git@github.com:fernandofatech/bedrock-agent-starter.git
cd bedrock-agent-starter

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

export AWS_REGION=us-east-1
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0

# Chat locally — no deployed infrastructure needed
agent chat

# > what time is it in Tokyo, and what is (123 * 456) - 789?
# [tool] get_time(tz="Asia/Tokyo") → "2026-05-15T22:41:09+09:00"
# [tool] calculator(expression="(123 * 456) - 789") → 55299
# It is 22:41 in Tokyo, and (123 × 456) − 789 = 55 299.

# Run unit tests + evals
pytest
pytest tests/evals/

# Deploy to AWS
cd terraform
terraform init
terraform apply -var="project=my-agent"
```

## Adding a tool — the four-line pattern

The tool registry is the part of this starter I am most deliberate about. A common failure mode in agent projects is that tool definitions drift from their implementations — the schema says one thing, the function does another, and the model hallucinates parameters that do not exist.

Here, the schema is derived from the function signature at import time via Pydantic. You write one decorator and one function; the registry picks it up automatically on the next invocation:

```python
from agent.tools import tool

@tool(description="Translate text between languages using a deterministic table.")
def translate(text: str, source_lang: str, target_lang: str) -> str:
    ...
    return translated
```

Pydantic infers `text`, `source_lang`, and `target_lang` as required string properties in the JSON schema that gets sent to Bedrock. If you add a parameter with a default value, it becomes optional in the schema. If you annotate with `Literal["en", "pt", "es"]`, the schema gains an `enum` constraint — and the model will respect it.

This approach keeps the schema and the implementation co-located and in sync. It also means you can test a tool in pure Python without any Bedrock dependency, which is exactly what the unit tests do. The full walkthrough is in the [docs](https://fernandofatech.github.io/bedrock-agent-starter/adding-a-tool/).

## Observability and evals — knowing when something breaks

Two things I have seen skipped most often in agent projects are structured observability and regression testing. Both are included here and both are lightweight enough that there is no excuse to remove them.

**Observability:** Every turn emits a JSON log line with `session_id`, `turn`, `model_id`, token counts, tool calls made, and wall-clock duration. These go to CloudWatch Logs. In parallel, CloudWatch EMF metrics are published under the `BedrockAgent` namespace — `Turns`, `InputTokens`, `OutputTokens`, `Duration`, and `ToolErrors`. EMF means you get metrics without a separate `put_metric_data` call; they are extracted from the log line by CloudWatch automatically. This is enough to build a dashboard and set alarms on token spend or error rate from day one.

**Evals:** The eval harness in `tests/evals/` is intentionally simple. A `golden.jsonl` file contains prompt/expected-output pairs. The pytest runner sends each prompt through the real agent loop (or a mock, depending on your CI setup) and asserts that expected substrings appear in the response and that the expected tools were called. This is not a substitute for human evaluation, but it catches the class of regression where a prompt that worked with Claude 3.5 Sonnet stops working after you switch to Nova Pro or change a system prompt. Run `pytest tests/evals/` before any model or prompt change.

> **What the Terraform skeleton does NOT do for you:** The `terraform/` directory provisions the core resources but intentionally leaves three things undefined: remote state backend (S3 + DynamoDB lock), resource tagging strategy, and IAM permission boundaries. These are organisation-specific decisions. If you apply the skeleton as-is in a shared AWS account, state will be local and IAM roles will have no boundaries. Define these before promoting to any non-sandbox environment.

> **Swapping models is one environment variable:** Because the agent loop uses the Bedrock Converse API, changing from Claude 3.5 Sonnet to Amazon Nova Pro or Mistral Large is `export BEDROCK_MODEL_ID=amazon.nova-pro-v1:0`. Run `pytest tests/evals/` after the swap to catch any prompt-format regressions before they reach production.

## Frequently asked questions

### Do I need a Bedrock model access request before running locally?

Yes. Bedrock model access is not enabled by default. Go to the Bedrock console → Model access, request access to the model you intend to use, and wait for approval (usually instant for Claude and Nova in us-east-1). Your IAM identity also needs `bedrock:InvokeModel` on the model ARN.

### Can I use this with Bedrock Agents (the managed orchestration service) instead of the Converse API?

No — this starter implements its own agent loop using the Converse API directly. It does not use the managed Bedrock Agents service (which has its own action groups, knowledge bases, and session management). The trade-off is more control and lower cost at the expense of managing the loop yourself. If you want the managed service, this starter is not the right starting point.

### How do I add persistent memory in production?

Set the `MEMORY_BACKEND=dynamodb` environment variable (or the equivalent Terraform variable). The Lambda handler will use the DynamoDB sessions table provisioned by Terraform. The table key is `session_id`; TTL is configurable. In-memory backend remains the default for local development.

### Is the `web_search` tool functional?

It is a stub. It demonstrates the tool pattern but returns a placeholder response. Wiring it to a real search API (Brave Search, Tavily, SerpAPI) is a deliberate exercise left for the implementer — the integration point is clearly marked in the source.

## Who this is for and when to use it

Use this starter if you are building a custom agent on Amazon Bedrock and want a solid foundation rather than a blank file. It is well-suited for: solution architects evaluating Bedrock for a client, engineers who need a working agent in a day and plan to extend it, and teams who want an example of how to structure observability and evals in an agent project from the start.

Do not use it if you want the managed Bedrock Agents service (action groups, knowledge bases, managed orchestration) — this starter deliberately bypasses that in favour of a self-managed loop. Also do not use it if you need a multi-agent orchestration framework like LangGraph or AutoGen; the scope here is a single-agent loop with tools and memory.

The project is MIT-licensed, bilingual, and actively maintained as part of my public portfolio. The documentation site at [bedrock-agent.moretes.com](https://bedrock-agent.moretes.com) and the GitHub Pages docs are kept in sync with the main branch.

## Links and resources

- [GitHub — fernandofatech/bedrock-agent-starter](https://github.com/fernandofatech/bedrock-agent-starter)
- [Live portfolio site — bedrock-agent.moretes.com](https://bedrock-agent.moretes.com)
- [Project documentation (GitHub Pages)](https://fernandofatech.github.io/bedrock-agent-starter/)
- [Amazon Bedrock Converse API — AWS Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html)
- [Amazon Bedrock — product page](https://aws.amazon.com/bedrock/)
- [CloudWatch Embedded Metric Format (EMF)](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html)
- [Terraform AWS Provider](https://registry.terraform.io/providers/hashicorp/aws/latest/docs)
- [MkDocs Material](https://squidfunk.github.io/mkdocs-material/)

## Links

- [GitHub repository](https://github.com/fernandofatech/bedrock-agent-starter)
- [Homepage](https://bedrock-agent.moretes.com)
