# Design Doc: Frontier Model Governance on Bedrock with GPT, Claude, and Nova

This document proposes an AI Gateway architecture to orchestrate and govern multiple frontier models — OpenAI GPT-5.5/GPT-4.5, Anthropic Claude, Amazon Nova, and specialized models — within Amazon Bedrock. The design covers intelligent routing, guardrails, prompt registry, inference logging, per-tenant IAM, data residency, and fallback policy, with a focus on auditability and cost control in enterprise environments.

- URL: https://fernando.moretes.com/studies/design-doc-bedrock-openai-frontier-model-governance

- Markdown: https://fernando.moretes.com/studies/design-doc-bedrock-openai-frontier-model-governance/study.md?lang=en

- Type: Design Doc / RFC

- Company: AI Platform (scenario)

- Domain: AI / Governance

- Date: 2026-06-01

- Tags: bedrock, ai-gateway, guardrails, multi-model, governance, iam, data-residency, llmops

- Reading time: 11 min

---

With OpenAI models arriving on Amazon Bedrock and the growing maturity of Claude and Amazon Nova, enterprise platforms face a real governance problem: how to route, audit, control costs, and ensure compliance when the available model portfolio grows faster than the operational capacity to manage it. This RFC proposes an AI Gateway that treats this complexity as a platform problem — not an application problem.

## The Problem: Model Proliferation Without a Control Plane

Over the last 18 months, every product team started consuming LLMs independently. One squad uses Claude 3.5 Sonnet directly via the Anthropic SDK. Another calls GPT-4o via Azure OpenAI. A third experimented with Amazon Titan and now wants to migrate to Nova Pro. The result is predictable: no centralized cost visibility, no consolidated prompt audit trail, no fallback policy when a provider suffers degradation, and — most critically in regulated environments — no guarantee that sensitive data isn't crossing jurisdictional boundaries that violate LGPD, GDPR, or contractual data residency requirements.

Amazon Bedrock addresses part of this problem by consolidating multiple providers under a single API and AWS infrastructure. With the recent announcement of OpenAI models available on Bedrock (GPT-4.5, GPT-4.1, and variants), the platform becomes a legitimate aggregation point for most enterprise use cases. But Bedrock alone doesn't solve governance: it offers the primitives (Guardrails, Model Invocation Logging, IAM, VPC endpoints), but the responsibility for composing those primitives into a coherent architecture belongs to the architect.

The core problem this document addresses is: **how to build a unified control plane over multiple frontier models within Bedrock that is auditable, secure by design, cost-aware, and operationally sustainable?** The answer is not a product — it is an architecture.

## Goals and Non-Goals

- ✅ GOAL: Intelligent routing between models (GPT-5.5, GPT-4.5, Claude 3.7, Nova Pro/Lite, specialized models) with declarative policy per use case
- ✅ GOAL: Centralized guardrails (PII, toxicity, jailbreak, grounding) applied before and after each inference, regardless of model
- ✅ GOAL: Versioned Prompt Registry with traceability of which prompt version generated which output
- ✅ GOAL: Per-tenant IAM with credential isolation and model-specific access policies per workload
- ✅ GOAL: Guaranteed data residency — regulated customer data processed exclusively in approved regions
- ✅ GOAL: Token budget enforcement per tenant/use-case with alerts and hard stops

## Fact Sheet

- **Scenario:** Multi-tenant enterprise AI platform (composite scenario)
- **Models in scope:** GPT-4.5, GPT-4.1 (OpenAI via Bedrock), Claude 3.7 Sonnet, Claude 3.5 Haiku, Amazon Nova Pro, Nova Lite, Nova Micro
- **Base platform:** Amazon Bedrock (us-east-1, sa-east-1 for Brazilian data)
- **Estimated scale:** ~50M tokens/day, 15 tenants, 40+ distinct use cases
- **Regulatory requirements:** LGPD, GDPR, SOC 2 Type II, contractual data residency
- **Control stack:** API Gateway, Lambda, DynamoDB, S3, CloudWatch, Bedrock Guardrails, AWS IAM, Secrets Manager, EventBridge
- **Document status:** RFC — Proposed, pending security review and architecture approval

## Proposed Design: The AI Gateway as Control Plane

The central architecture is an **AI Gateway** — a service layer that sits between consumers (applications, agents, pipelines) and the models on Bedrock. It is not a dumb proxy: it executes governance business logic before, during, and after each inference call.

**Ingress and Authentication Layer**

All traffic enters via Amazon API Gateway with JWT authentication (Cognito) or API Key tied to a `tenant_id`. The first step is tenant context resolution: the Gateway Lambda queries DynamoDB (`tenant-config` table) to retrieve the tenant profile — which models are enabled, what the monthly token budget is, which data region is permitted, and what fallback policy is configured. This eliminates authorization logic scattered across applications.
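
A minimal sketch of the tenant context resolution step, assuming an in-memory dictionary in place of the `tenant-config` table. The attribute names (`enabled_models`, `monthly_token_budget`, `fallback_policy`), the short model names, and the helper functions are hypothetical and only mirror the fields described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantProfile:
    tenant_id: str
    enabled_models: frozenset
    monthly_token_budget: int
    allowed_regions: tuple
    fallback_policy: str


class TenantNotFound(Exception):
    """Raised when no profile exists for the tenant_id carried by the request."""


# In-memory stand-in for the `tenant-config` table (attribute names are assumed).
TENANT_CONFIG = {
    "tenant-a": {
        "enabled_models": ["nova-pro", "nova-lite"],
        "monthly_token_budget": 5_000_000,
        "allowed_regions": ["sa-east-1"],
        "fallback_policy": "on_error_or_latency",
    },
}


def resolve_tenant(tenant_id, table=TENANT_CONFIG):
    """Load the tenant profile once so later steps work from a typed object."""
    item = table.get(tenant_id)
    if item is None:
        raise TenantNotFound(tenant_id)
    return TenantProfile(
        tenant_id=tenant_id,
        enabled_models=frozenset(item["enabled_models"]),
        monthly_token_budget=item["monthly_token_budget"],
        allowed_regions=tuple(item["allowed_regions"]),
        fallback_policy=item["fallback_policy"],
    )


def is_model_enabled(profile, model_id):
    """Authorization check performed in the Gateway, not in each application."""
    return model_id in profile.enabled_models
```

In a real deployment the dictionary lookup would be a DynamoDB read; the rest of the flow only depends on the resolved `TenantProfile`.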

**Prompt Registry and Versioning**

Prompts are not inline strings in application code. They are versioned artifacts stored in S3 with metadata in DynamoDB: `prompt_id`, `version`, `model_affinity` (which models were tested with this prompt), `eval_score`, `author`, `approved_at`. The Gateway resolves the prompt by ID and version before assembling the inference payload. This ensures the audit log always references `prompt_id:version`, not raw text — which is critical for post-incident investigations and for demonstrating control in audits.
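
The resolution step could look roughly like the sketch below. It assumes an in-memory registry instead of S3 plus DynamoDB, uses `string.Template` as a placeholder templating mechanism, and treats a missing `approved_at` as "not approved"; the sample entry, its values, and the function name are illustrative, not part of the design.

```python
from dataclasses import dataclass
from string import Template
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    prompt_id: str
    version: str
    template: str              # prompt body (stored in S3 in the design)
    model_affinity: tuple      # models this version was tested with
    eval_score: float
    author: str
    approved_at: Optional[str]  # None is assumed to mean "not approved"


# Hypothetical registry content, keyed by (prompt_id, version).
REGISTRY = {
    ("contract-extraction", "1.3"): PromptVersion(
        prompt_id="contract-extraction",
        version="1.3",
        template="Extract the termination clauses from: $document",
        model_affinity=("claude-3-7-sonnet",),
        eval_score=0.9,
        author="platform-team",
        approved_at="2026-05-20T12:00:00Z",
    ),
}


def resolve_prompt(prompt_id, version, variables, registry=REGISTRY):
    """Return (rendered_text, audit_reference) for an approved prompt version."""
    entry = registry.get((prompt_id, version))
    if entry is None:
        raise KeyError(f"unknown prompt {prompt_id}:{version}")
    if entry.approved_at is None:
        raise PermissionError(f"prompt {prompt_id}:{version} is not approved")
    rendered = Template(entry.template).substitute(variables)
    # The audit log stores only this reference, never the raw text.
    return rendered, f"{prompt_id}:{version}"
```

Returning the `prompt_id:version` reference alongside the rendered text keeps the audit path independent of the prompt body.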

**Two-Phase Guardrails**

Bedrock Guardrails is invoked in two phases: pre-inference (input validation: PII detection, topic blocking, prompt injection patterns) and post-inference (output validation: grounding check, toxicity, internal data leakage). Guardrail configuration is per `guardrail_profile` associated with the use case, not the tenant, because the same tenant can have use cases with different tolerances (a public chatbot has stricter guardrails than an internal code analysis tool). When a guardrail blocks a request, a structured event is published to EventBridge, which triggers alerts and, for serious violations, automatically suspends the use case.
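
The two-phase control flow can be sketched as a wrapper around the model call. This is an illustration under assumptions: the checks are injected callables (in the design they would be Bedrock Guardrails evaluations, for example through the `ApplyGuardrail` API), `publish_violation` stands in for the EventBridge publish, and the regex-based PII check is a toy used only to make the example runnable.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    blocked: bool
    reason: str = ""


class GuardrailBlocked(Exception):
    def __init__(self, phase, reason):
        super().__init__(f"{phase}-inference guardrail blocked: {reason}")
        self.phase = phase
        self.reason = reason


# Toy stand-in for a PII detector: matches a CPF-shaped number.
CPF_PATTERN = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")


def toy_pii_check(text):
    if CPF_PATTERN.search(text):
        return Verdict(True, "pii:cpf")
    return Verdict(False)


def guarded_inference(prompt, invoke_model, pre_check, post_check, publish_violation):
    """Run the pre-check, the model call, then the post-check."""
    verdict = pre_check(prompt)
    if verdict.blocked:
        # The model is never invoked when the input is rejected.
        publish_violation({"phase": "pre", "reason": verdict.reason})
        raise GuardrailBlocked("pre", verdict.reason)

    output = invoke_model(prompt)

    verdict = post_check(output)
    if verdict.blocked:
        publish_violation({"phase": "post", "reason": verdict.reason})
        raise GuardrailBlocked("post", verdict.reason)
    return output
```

Because the checks are parameters, the same wrapper can serve use cases with different `guardrail_profile` settings.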

**Routing and Fallback**

The routing policy is declarative, stored per use case in DynamoDB: primary model, fallback model, fallback criterion (latency > X ms, 5xx error, or cost per token above threshold). The Gateway implements a simple circuit breaker: after N consecutive failures on the primary model, it opens the circuit and routes to the fallback for the configured cooldown period. The log records which model actually served each request — critical for debugging and for actual vs. planned cost analysis.
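
A simplified sketch of the circuit breaker and fallback routing described above. It keeps state in memory and closes the circuit as soon as the cooldown elapses (no half-open probing), both of which are simplifying assumptions; the class and function names are hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, closes again after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at = None  # None means the circuit is closed

    def is_open(self):
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the circuit and give the primary another chance.
            self._opened_at = None
            self._consecutive_failures = 0
            return False
        return True

    def record_success(self):
        self._consecutive_failures = 0

    def record_failure(self):
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            self._opened_at = self._clock()


def route(primary, fallback, breaker, invoke):
    """Return (model_that_served, response) so the audit log can record it."""
    if not breaker.is_open():
        try:
            response = invoke(primary)
        except Exception:
            breaker.record_failure()
        else:
            breaker.record_success()
            return primary, response
    return fallback, invoke(fallback)
```

Returning the serving model with the response is what lets the log distinguish primary from fallback traffic.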

**Token Budget and Cost Control**

Each tenant has a monthly token budget (configurable by model tier — Nova Micro tokens cost less than GPT-4.5 tokens). The Gateway maintains an incremental counter in DynamoDB with monthly TTL. When usage reaches 80% of the budget, an event is published to EventBridge for notification. At 100%, the Gateway returns 429 with header `X-Budget-Exhausted: true`. Hard stops are non-negotiable — no override exists without explicit approval recorded in the audit log.
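
The threshold behavior can be illustrated with a small in-memory counter. This sketch assumes that a request is rejected only once the budget is already exhausted (so the request that crosses the limit still completes) and that the 80% notification fires once per period; in DynamoDB the increment would be an atomic update rather than a Python attribute. All names are hypothetical.

```python
from dataclasses import dataclass, field


class BudgetExhausted(Exception):
    """Maps to the HTTP 429 response described in the text."""
    status_code = 429
    headers = {"X-Budget-Exhausted": "true"}


@dataclass
class BudgetCounter:
    monthly_budget: int
    used: int = 0
    alert_sent: bool = False
    events: list = field(default_factory=list)  # stand-in for EventBridge

    def consume(self, tokens):
        """Record token usage and return the remaining budget."""
        if self.used >= self.monthly_budget:
            # Hard stop: no further inference once the budget is spent.
            raise BudgetExhausted()
        self.used += tokens
        if not self.alert_sent and self.used >= 0.8 * self.monthly_budget:
            self.alert_sent = True
            self.events.append({"type": "budget.threshold", "percent": 80})
        return max(self.monthly_budget - self.used, 0)
```

A stricter variant would reserve the estimated tokens before the call and reject requests that cannot fit in the remaining budget.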

**Inference Logging**

Bedrock Model Invocation Logging is enabled per account and Region, with logs delivered to S3 and to CloudWatch Logs, where they are queried with CloudWatch Logs Insights. The Gateway enriches these logs with metadata that Bedrock doesn't natively capture: `tenant_id`, `use_case_id`, `prompt_id:version`, `guardrail_result`, `routing_decision` (primary vs. fallback), and `budget_remaining`. The schema is fixed and versioned: changes to the log schema go through review because breaking audit queries is a real operational risk.
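
One way to treat the enrichment schema as a contract is to encode it as a typed record with an explicit version. The sketch below is an assumption about shape, not the design's actual schema: the `gateway` envelope, the `schema_version` value, and the fields of the sample native log are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

AUDIT_SCHEMA_VERSION = "1.0"  # hypothetical; changed only through schema review
ROUTING_DECISIONS = {"primary", "fallback"}


@dataclass(frozen=True)
class GatewayMetadata:
    tenant_id: str
    use_case_id: str
    prompt_ref: str        # "prompt_id:version"
    guardrail_result: str
    routing_decision: str  # "primary" or "fallback"
    budget_remaining: int


def enrich(native_log, meta):
    """Attach Gateway metadata to a native invocation log record."""
    if meta.routing_decision not in ROUTING_DECISIONS:
        raise ValueError(f"invalid routing_decision: {meta.routing_decision}")
    if ":" not in meta.prompt_ref:
        raise ValueError("prompt_ref must look like prompt_id:version")
    record = dict(native_log)  # never mutate the native record
    record["gateway"] = {"schema_version": AUDIT_SCHEMA_VERSION, **asdict(meta)}
    return json.dumps(record, sort_keys=True)
```

Keeping the Gateway fields under a single versioned key makes it easier to evolve them without touching the fields Bedrock writes.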

## Architecture: AI Gateway over Amazon Bedrock

Complete flow of an inference request: from consumer to frontier model, passing through the governance control plane.

### 👤 Consumers

- **Application:** Tenant App / Agent (user)
- **Data Pipeline:** Batch / Streaming (user)

### 🔐 Auth & Ingress

- **API Gateway:** REST + JWT/API Key (edge)
- **Cognito:** Tenant Identity (security)

### 🧠 AI Gateway Control Plane

- **Gateway Lambda:** Orchestrator (compute)
- **Tenant Config:** DynamoDB (data)
- **Prompt Registry:** S3 + DynamoDB (storage)
- **Budget Counter:** DynamoDB with TTL (data)
- **Routing Engine:** Policy + Circuit Breaker (compute)

### 🛡️ Guardrails

- **Bedrock Guardrails:** PII / Toxicity / Grounding (security)
- **EventBridge:** Violation Events (messaging)

### 🤖 Bedrock Model Layer

- **Amazon Nova Pro:** Primary, general (ai)
- **Amazon Nova Lite:** Fallback, cost-optimized (ai)
- **Claude 3.7 Sonnet:** Primary, reasoning (ai)
- **Claude 3.5 Haiku:** Fallback, latency-optimized (ai)
- **GPT-4.5 (OpenAI):** via Bedrock (ai)
- **GPT-4.1 (OpenAI):** Fallback tier (ai)

### 📋 Audit & Observability

- **Invocation Logs:** S3 + CloudWatch (storage)
- **Log Enricher Lambda:** tenant/prompt/routing metadata (compute)
- **CloudWatch Insights:** Audit Queries (data)

### Flows

- app -> apigw: HTTPS + JWT
- pipeline -> apigw: HTTPS + API Key
- apigw -> cognito: Validate token
- apigw -> gw_lambda: Request + tenant_id
- gw_lambda -> tenant_cfg: Resolve profile
- gw_lambda -> prompt_reg: Fetch prompt version
- gw_lambda -> budget_ctr: Check/increment
- gw_lambda -> guardrails: Pre-inference
- gw_lambda -> router: Routing policy
- router -> nova_pro: Primary
- router -> nova_lite: Fallback
- router -> claude_sonnet: Primary
- router -> claude_haiku: Fallback
- router -> gpt45: Primary
- router -> gpt41: Fallback
- gw_lambda -> guardrails: Post-inference
- guardrails -> eb_violations: Violation detected
- nova_pro -> inv_log: Native Bedrock log
- claude_sonnet -> inv_log
- gpt45 -> inv_log
- inv_log -> audit_enricher: Enrich metadata
- audit_enricher -> cw_insights

## Data Residency and Tenant Isolation

This is the most frequently underestimated aspect in AI Gateway architectures. When a client has data classified as Brazilian PII under LGPD, or health data under sectoral regulation, the guarantee that this data doesn't leave the `sa-east-1` region cannot be a verbal promise — it needs to be architectural and auditable.

**Per-Tenant Residency Strategy**

The tenant profile in DynamoDB includes an `allowed_regions: ["sa-east-1"]` field and `allowed_models` which is the intersection of models available in that region with the models enabled for the tenant. The Gateway rejects requests that would route data from a restricted tenant to an unapproved region — and this rejection is logged as a policy violation event, not a silent error.

Amazon Bedrock today makes models available in multiple regions, but the portfolio varies. Nova Pro and Claude 3.5 Haiku are available in `sa-east-1`. GPT-4.5 via Bedrock is initially available in `us-east-1`. This means tenants with Brazilian residency restrictions **cannot use GPT-4.5 at this time** — and the Gateway must enforce this automatically, not depend on developer discipline.
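
The residency rule reduces to a set intersection plus a logged rejection. The sketch below assumes a static availability map that mirrors the scenario described above; the map, the short model names, and the event shape are hypothetical and would need to reflect actual regional availability.

```python
# Hypothetical availability map mirroring the scenario text.
MODEL_REGIONS = {
    "nova-pro": {"us-east-1", "sa-east-1"},
    "claude-3-5-haiku": {"us-east-1", "sa-east-1"},
    "gpt-4.5": {"us-east-1"},
}


class ResidencyViolation(Exception):
    """Raised when a request would leave the tenant's approved regions."""


def allowed_models(enabled_models, allowed_regions, availability=MODEL_REGIONS):
    """Models enabled for the tenant AND available in an approved region."""
    approved = set(allowed_regions)
    return {m for m in enabled_models if availability.get(m, set()) & approved}


def pick_region(model_id, enabled_models, allowed_regions, audit_events,
                availability=MODEL_REGIONS):
    """Return an approved region for the model, or log and reject."""
    candidates = availability.get(model_id, set()) & set(allowed_regions)
    if model_id not in enabled_models or not candidates:
        # Logged as a policy violation, never swallowed silently.
        audit_events.append({
            "type": "policy.violation",
            "model": model_id,
            "allowed_regions": sorted(allowed_regions),
        })
        raise ResidencyViolation(model_id)
    return sorted(candidates)[0]
```

With this map, a tenant restricted to `sa-east-1` gets `nova-pro` but is rejected for `gpt-4.5`, which matches the constraint stated above.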

**Credential Isolation**

Each tenant has a dedicated IAM Role with an access policy restricted to permitted models. The Gateway assumes this role via STS `AssumeRole` before invoking Bedrock, ensuring that the CloudTrail log records the tenant's role — not the Gateway's generic role. This is essential for auditing: in an investigation, you need to know exactly which tenant invoked which model, not just that the Gateway made a call.

Secrets Manager stores role ARN configurations per tenant. Credential rotation is automatic. Bedrock access is exclusively via VPC Endpoint — there is no data path that crosses the public internet, regardless of the model.
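
As an illustration of the per-tenant role, the sketch below builds an IAM policy document restricted to specific foundation models and the parameters for an STS `AssumeRole` call. The role naming scheme (`ai-gateway-<tenant_id>`), the session name format, and the model identifier in the usage example are assumptions; the resulting parameters would be passed to an STS client (for example `boto3.client("sts").assume_role(**params)`).

```python
def tenant_bedrock_policy(region, model_ids):
    """IAM policy allowing invocation of only the listed foundation models."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeApprovedModelsOnly",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": [
                    f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                    for model_id in model_ids
                ],
            }
        ],
    }


def assume_role_params(account_id, tenant_id, request_id):
    """Parameters for STS AssumeRole; the role name pattern is hypothetical."""
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/ai-gateway-{tenant_id}",
        # The session name appears in CloudTrail, so it carries the tenant
        # and request identifiers. STS limits it to 64 characters.
        "RoleSessionName": f"{tenant_id}-{request_id}"[:64],
        "DurationSeconds": 900,  # shortest duration STS accepts
    }
```

Putting the tenant and request identifiers in the session name is one way to make each CloudTrail entry traceable to a single Gateway request.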

**Evals and Quality**

The Prompt Registry includes evaluation scores per model: before promoting a prompt version to production, it goes through an automated evals pipeline that tests against a set of reference cases and records quality metrics (accuracy, hallucination rate, p95 latency) per model. This enables evidence-based routing decisions — not vendor preference. If Claude 3.7 has a superior eval score for the contract clause extraction use case, it is the primary for that use case, regardless of cost per token.
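
Evidence-based routing can be sketched as ranking models by eval score and keeping only those above a promotion threshold. The single aggregated score, the threshold, and the function name are simplifying assumptions; the design mentions several metrics (accuracy, hallucination rate, p95 latency) that a real pipeline would combine.

```python
def select_models(eval_scores, min_score):
    """Pick (primary, fallback) by eval score; cost is deliberately ignored."""
    eligible = [
        (score, model)
        for model, score in eval_scores.items()
        if score >= min_score
    ]
    if not eligible:
        raise ValueError("no model meets the promotion threshold")
    # Highest score first; ties are broken by model name for determinism.
    ranked = sorted(eligible, key=lambda pair: (-pair[0], pair[1]))
    primary = ranked[0][1]
    fallback = ranked[1][1] if len(ranked) > 1 else None
    return primary, fallback
```

For example, with made-up scores `{"claude-3-7-sonnet": 0.93, "nova-pro": 0.88, "nova-lite": 0.61}` and a threshold of 0.8, the function returns `("claude-3-7-sonnet", "nova-pro")`.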

## Architectural Alternatives Considered

### Custom AI Gateway (this design)

**Pros**
- Full control over routing logic, budget, and audit
- Native integration with IAM, CloudTrail, VPC Endpoints
- No additional vendor dependency in the critical path

**Cons**
- Development and maintenance cost of the Gateway
- Responsibility for evolving circuit breaker and retry logic

**Verdict:** Recommended for enterprise environments with regulatory requirements

### LiteLLM / Portkey as OSS proxy

**Pros**
- Fast to start, native multi-model support
- Active community, less code to maintain

**Cons**
- Per-tenant IAM integration requires significant customization
- Audit and data residency are not first-class features
- One more component to operate and patch

**Verdict:** Suitable for MVPs without regulatory requirements; not recommended for this scenario

### Amazon Bedrock Agents as orchestrator

**Pros**
- Managed by AWS, native integration with Guardrails and Knowledge Bases
- Reduces orchestration code

**Cons**
- Multi-model routing is not the primary use case for Bedrock Agents
- Less flexibility for budget enforcement and custom tenant logic
- Cost per agent step can be prohibitive at high volume

**Verdict:** Complementary for agent use cases; does not replace the governance Gateway

### Direct per-application Bedrock access

**Pros**
- Initial simplicity, no additional Gateway latency

**Cons**
- No centralized governance — the current state we are solving
- Impossible to audit, control cost, or reliably enforce data residency
- Each application reimplements (poorly) authentication, retry, and logging

**Verdict:** Rejected — this is the problem, not the solution

## Decision: Bedrock as Unified Inference Plane

**Status:** proposed

**Context**

With OpenAI models now available on Bedrock, there is the option to consolidate all inference traffic on Bedrock (eliminating direct integrations with OpenAI API and Anthropic API) or maintain direct integrations for some models.

**Decision**

Consolidate all inference traffic on Amazon Bedrock. Direct integrations with OpenAI API or Anthropic API are prohibited for production workloads. Exception: models not available on Bedrock may use direct integration with explicit architecture approval and with the same Gateway as intermediary.

**Consequences**
- ✅ Consolidated billing on AWS — a single point of cost and financial audit
- ✅ VPC Endpoints available for all models — enforceable data residency
- ✅ CloudTrail covers all invocations — unified audit
- ⚠️ Dependency on model availability on Bedrock — new OpenAI models may take time to arrive
- ⚠️ Possible additional latency vs. direct provider API call — to be measured and documented

## Rollout Plan

1. **Week 1-2: Foundation and Tenant Config** — Provision base infrastructure: API Gateway, Gateway Lambda, DynamoDB tables (tenant-config, prompt-registry, budget-counters), VPC Endpoints for Bedrock. Migrate 1 pilot tenant (non-regulated) to the Gateway. Validate end-to-end flow with Nova Lite.

2. **Week 3-4: Prompt Registry and Logging** — Implement Prompt Registry with S3 + DynamoDB. Migrate existing prompts to the registry with versioning. Enable Model Invocation Logging on Bedrock. Implement Log Enricher Lambda. Validate that audit queries in CloudWatch Insights correctly return tenant_id and prompt_id:version.

3. **Week 5-6: Guardrails and Policy Violation** — Configure Bedrock Guardrails per use case profile. Implement violation flow: Guardrails → EventBridge → SNS → alerts. Test with PII, toxicity, and prompt injection cases. Document false positive rate per profile and adjust thresholds.

4. **Week 7-8: Per-Tenant IAM and Data Residency** — Create per-tenant IAM Roles with access policies for specific models. Implement STS AssumeRole in the Gateway. Configure per-tenant region restrictions in DynamoDB. Test that tenant with allowed_regions=[sa-east-1] cannot invoke GPT-4.5 (us-east-1 only). Validate CloudTrail with tenant role.

5. **Week 9-10: Token Budget and Circuit Breaker** — Implement budget counter with DynamoDB atomic increments and TTL. Configure 80% alerts and 100% hard stop. Implement circuit breaker in Router with state in DynamoDB (N failures → open circuit → cooldown). Test automatic failover between primary and fallback models.

6. **Week 11-12: Evals Pipeline and Regulated Tenant Migration** — Implement automated evals pipeline for prompt promotion. Migrate tenants with regulatory requirements (LGPD) to the Gateway. Execute formal security review. Document operational runbook. Go-live with all tenants.

> **Risks and Mitigations:**
>
> **R1. Additional Gateway latency (High):** Each hop in the Gateway (DynamoDB reads, STS AssumeRole, pre/post Guardrails) adds latency. Estimate: 50-150ms per request. Mitigation: tenant config cache (5min TTL), resolved prompt cache (TTL per version), Guardrails in async mode for latency-tolerant use cases. Measure p95 and p99 from day 1.
>
> **R2. OpenAI model availability on Bedrock (Medium):** GPT-4.5 and GPT-4.1 are in initial availability on Bedrock. SLAs may differ from the direct OpenAI API. Mitigation: mandatory fallback configured for all use cases using GPT, with Claude as a tested alternative with documented eval scores.
>
> **R3. DynamoDB as Gateway SPOF (High):** Budget counter, tenant config, and circuit breaker state depend on DynamoDB. A DynamoDB degradation paralyzes the Gateway. Mitigation: DynamoDB Global Tables with replicas in multiple Regions, and a graceful degradation mode (fail-open with audit log) for tenant config reads, never for budget enforcement.
>
> **R4. False Positives in Guardrails (Medium):** Overly restrictive guardrails block legitimate requests, degrading UX. Mitigation: shadow mode period (log without blocking) before activating blocking mode. Weekly false positive rate review per profile for the first 4 weeks.
>
> **R5. Prompt Registry as deployment bottleneck (Low-Medium):** If the prompt approval process is bureaucratic, teams will circumvent it. Mitigation: automated evals pipeline with automatic approval when score > threshold. Manual approval only for high-risk changes (system prompts, persona changes).
>
> **R6. Cost of the Gateway itself (Low):** Lambda + DynamoDB + API Gateway for 50M tokens/day is estimated at < $500/month, negligible vs. inference cost. Still, monitor it to avoid surprises during peaks.
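
The tenant config cache named in the R1 mitigation could be as simple as the sketch below: a read-through cache with a 5-minute TTL. It is a single-process illustration under assumptions (no eviction, no locking), and the class name and loader interface are hypothetical.

```python
import time


class TTLCache:
    """Read-through cache: reloads a key once its entry is older than the TTL."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # e.g. a function that reads tenant config
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (loaded_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value
```

The trade-off is staleness: a tenant config change (for example a revoked model) can take up to one TTL to reach the Gateway, which is why the text reserves caching for config reads and not for budget enforcement.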

## AWS Well-Architected: Pillar Assessment

- **Security**: Per-tenant IAM with STS AssumeRole, VPC Endpoints for Bedrock, two-phase Guardrails, Secrets Manager for credentials, CloudTrail enabled. No inference data traverses the public internet.
- **Reliability**: Circuit breaker with automatic fallback, DynamoDB Global Tables, graceful degradation for config reads. The Gateway SLA depends on the Bedrock SLA; monitor the AWS Health Dashboard.
- **Performance efficiency**: Tenant config and prompt caching reduce control-plane latency. Async Guardrails for tolerant use cases. Routing to Nova Lite/Haiku in cost-optimized scenarios reduces inference latency.
- **Sustainability**: Routing to smaller models (Nova Micro, Haiku) for simple use cases reduces computational consumption. Token budgets limit unnecessary inference.

## Success Metrics and Targets

- **Gateway-added latency (p95):** < 150ms — measure from week 1
- **Audit coverage:** 100% of invocations with tenant_id + prompt_id:version in log
- **Time to detect data residency violation:** < 1 minute (EventBridge + SNS)
- **Guardrail false positive rate:** < 2% per use case after 4 weeks of tuning
- **Gateway availability:** 99.9% (excluding planned maintenance windows)
- **Inference cost deviation vs. budget:** < 5% overage per tenant per month
- **Prompt promotion time (evals → production):** < 30 minutes for automatic approval
- **Gateway adoption:** 100% of tenants migrated by week 12
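
The latency target can be checked with a few lines of arithmetic. This sketch assumes gateway-added latency is measured as total request time minus the time spent in the Bedrock call, and uses the nearest-rank percentile definition; both are assumptions, and the function names are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceiling division without floats
    return ordered[rank - 1]


def gateway_added_latency(total_ms, bedrock_ms):
    """Per-request overhead: end-to-end time minus time spent in Bedrock."""
    return [total - model for total, model in zip(total_ms, bedrock_ms)]


def meets_latency_target(total_ms, bedrock_ms, target_ms=150):
    added = gateway_added_latency(total_ms, bedrock_ms)
    return percentile(added, 95) < target_ms
```

Subtracting the model time isolates the overhead the Gateway is responsible for, so slow models do not count against the target.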

> **My Senior Take:** The most common mistake I see in enterprise AI platforms is treating governance as a compliance problem to be solved later, when in reality it's a platform problem that needs to be solved first. If you let 15 teams consume models directly for 6 months, you'll have 15 different authentication patterns, 15 different ways of logging (or not logging) prompts, and zero ability to answer 'what customer X data was sent to which model on which date', which is exactly the question an auditor will ask.
>
> On the arrival of OpenAI models on Bedrock: this is genuinely relevant for enterprise architectures, not just marketing. Consolidating GPT-4.5, Claude, and Nova under a single API, billing, and security model (VPC Endpoints, IAM, CloudTrail) eliminates an entire class of governance problems. But I wouldn't migrate to this without validating latency and SLA in your specific region and use case: the consolidation promise only holds if performance is comparable.
>
> The design I propose here is deliberately heavier than the minimum necessary for an MVP. For a product with 2 tenants and no regulatory requirements, LiteLLM + basic logging is sufficient. But for the described scenario (15 tenants, LGPD, contractual data residency, SOC 2 audit), each component of this design has a reason to exist that you'll need to justify to an auditor, not to me. The Prompt Registry seems like overhead until the day you need to prove that version 1.3 of the data extraction prompt was never used with regulated customer data. STS AssumeRole per tenant seems like paranoia until CloudTrail saves your investigation.
>
> One point I'd emphasize for those implementing: the audit log schema is a contract. Treat it as such. Version it, document it, and never make breaking changes without migration. Breaking audit queries in production is the kind of problem that surfaces at the worst possible moment.

## Verdict

This design is recommended for approval with the following conditions: (1) Gateway latency validation in a staging environment before migrating regulated tenants — the p95 < 150ms target needs to be measured, not assumed; (2) formal security review before go-live with LGPD tenants, specifically covering the STS AssumeRole flow and VPC Endpoints configuration; (3) mandatory shadow mode period for Guardrails (minimum 2 weeks per profile) before activating blocking mode in production.

Consolidation on Bedrock as the unified inference plane is the most important architectural decision in this design, and it is correct for the described scenario. The availability of OpenAI models on Bedrock removes the main argument against this consolidation. The AI Gateway as a control plane over Bedrock is the pattern I would implement in any enterprise platform with more than 5 tenants or with any regulatory requirement — not because it's elegant, but because it's the only design that allows answering audit questions with evidence, not assumption.

## References

- [AWS News Blog — OpenAI models on Amazon Bedrock](https://aws.amazon.com/blogs/aws/)
- [Amazon Bedrock — Product Page](https://aws.amazon.com/bedrock/)
- [Amazon Bedrock — Supported Foundation Models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html)
- [Amazon Bedrock Guardrails — Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html)
- [Amazon Bedrock Model Invocation Logging](https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html)
- [AWS IAM — AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
- [Amazon Bedrock VPC Endpoints](https://docs.aws.amazon.com/bedrock/latest/userguide/usingVPC.html)

