# Document Automation with Bedrock: A Modernization Journey

Legacy document extraction pipelines in financial environments accumulate silent technical debt: brittle OCR, manual rules, and absent traceability. In this article, I walk through a migration to Bedrock Data Automation, covering the architecture decisions, the risks we managed, and what genuinely changes in operations. The analysis is grounded in real patterns from critical financial systems, not lab demos.

- URL: https://fernando.moretes.com/blog/bedrock-data-automation-para-documentos-clinicos-e-operacoes

- Markdown: https://fernando.moretes.com/blog/bedrock-data-automation-para-documentos-clinicos-e-operacoes/article.md?lang=en

- Published: 2026-06-09T00:00:00.000Z

- Category: AI & Agents

- Tags: bedrock, document-ai, human-in-the-loop, step-functions, financial-grade, migration, audit, data-automation

- Reading time: 12 min

- Source: [Document automation with Bedrock](https://aws.amazon.com/blogs/architecture/)

---

Every financial institution has a drawer full of PDFs nobody wants to touch. Credit contracts, income statements, technical reports, powers of attorney — documents arriving in inconsistent formats, passing through expensive manual review, and still feeding downstream systems with wrong data. When I evaluated migrating a legacy document extraction pipeline to Bedrock Data Automation, the question was not 'can the AI read this?'. The real question was: 'how do we guarantee traceability, confidence control, and regulatory compliance when a language model makes decisions about critical financial data?' This article documents that journey — the decisions, the risks, and what held up after go-live.

## The Starting Point: Invisible Technical Debt

The legacy system was a three-layer composition that grew organically over eight years. At the base, a self-managed OCR engine (Tesseract with custom post-processing) ran on EC2 c5.2xlarge instances that were scaled manually. In the middle, a Python rule set of over 4,200 lines of regular expressions and conditional logic attempted to normalize fields like tax IDs, dates in multiple formats, and monetary values with regional separators. At the top, an SQS queue fed an RDS PostgreSQL database, with a human review layer triggered by a fixed confidence threshold of 70%.

The problem was not that the system did not work. It was that it failed in ways nobody could measure. Extraction accuracy for out-of-pattern documents dropped to 34% without any alarm firing, because the system had no extraction quality observability — only throughput metrics. The operational cost of human review consumed 61% of total pipeline cost, but that number was buried in an 'operations' cost center line, not attributed to the system.

When we ran a full diagnostic, we found three structural failures: no per-document traceability (no way to know which rule version extracted which field on which date), tight coupling between extraction schema and rule code (any new document type required an engineering sprint), and a complete absence of audit trail for regulatory purposes. For an environment under Central Bank supervision, this was classifiable operational risk.

## The Modernization Journey: Six Sequential Decisions

1. **Phase 0 — Document Inventory and Classification** — Before touching any AWS service, we catalogued the 23 document types in production by volume, regulatory criticality, and extraction complexity. We used a 3x3 matrix (volume × complexity × risk) to prioritize which types would migrate first. High-volume, low-complexity documents (standardized pay stubs) were the pilot. High-complexity, high-risk documents (contracts with variable clauses) were deferred to the final phase, after validating the confidence model.

2. **Phase 1 — Quality Baseline with Shadow Mode** — We implemented Bedrock Data Automation in shadow mode: the legacy pipeline remained the source of truth, but each processed document was simultaneously sent to Bedrock via S3 event notification → Lambda → Bedrock Data Automation API. Results were compared field by field and stored in S3 with divergence metadata. This gave us four weeks of real data to calibrate confidence thresholds per document type, with zero operational risk. The critical finding in this phase: Bedrock performed 23% better on low-quality scanned documents, exactly where Tesseract failed most.

3. **Phase 2: Human-in-the-Loop Architecture with Amazon A2I.** The legacy system's fixed 70% threshold was arbitrary and did not differentiate by field or document type. We replaced it with a three-tier confidence logic via Step Functions: (1) fields with confidence ≥ 0.92 pass through directly; (2) fields with confidence from 0.72 up to, but not including, 0.92 go to A2I review with field context and document excerpt; (3) fields below 0.72, or documents with more than 3 fields in review, are escalated to a senior analyst with a 4-hour SLA. A2I was configured with custom task templates showing the document excerpt alongside the extracted value, reducing average review time from 4.2 minutes to 1.8 minutes per field.

4. **Phase 3 — Immutable Audit Trail and Traceability** — For regulatory compliance, every pipeline event is written to an S3 bucket with Object Lock in COMPLIANCE mode (7-year retention, per CMN Resolution 4.658). The event schema includes: document_id, pipeline_version, model_id (Bedrock model ARN), extraction_timestamp, field_name, raw_value, confidence_score, review_action (if applicable), reviewer_id (anonymized hash), final_value, and downstream_system_id. The KMS key policy restricts decrypt to specific audit roles, with condition aws:PrincipalTag/role: auditor. CloudTrail with data events enabled on the bucket guarantees a log of every object access, including denied attempts.

5. **Phase 4 — Gradual Migration with Feature Flags** — Traffic cutover was controlled by feature flags stored in AWS AppConfig, with rollout by document type and by volume percentage. We started with 5% of pay stub volume, monitoring extraction divergence via CloudWatch custom metrics (namespace DocumentAutomation/ExtractionQuality). The advancement criterion was: divergence < 2% for 72 consecutive hours for the document type in question. This allowed us to identify and fix an edge case in pay stubs from companies with parent/subsidiary tax IDs before reaching 100% volume, with no production impact.

6. **Phase 5 — Decommissioning and Steady-State Observability** — Legacy pipeline decommissioning was done document type by document type, not as a big bang. OCR EC2 instances were kept on standby for 30 days after each type migrated, with automatic reactivation alarm if Bedrock error rate exceeded 5% for 15 minutes. The steady-state dashboard includes: auto-approval rate by document type (SLO: ≥ 85%), full pipeline p99 latency (SLO: ≤ 8s for documents ≤ 10 pages), cost per processed document (alert if > $0.04), and human review backlog (alert if > 200 items pending for > 30 minutes).
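
The three-tier confidence logic from Phase 2 can be sketched as a pure function, which is roughly what a Step Functions Choice state would encode. This is a minimal illustration, not the production router: the `Route`, `ExtractedField`, `route_field`, and `route_document` names are hypothetical, and the document-level rollup assumes that "fields in review" means every field that was not auto-approved.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    A2I_REVIEW = "a2i_review"
    SENIOR_ESCALATION = "senior_escalation"


# Thresholds quoted in Phase 2; in practice they are calibrated per document type.
AUTO_APPROVE_MIN = 0.92
REVIEW_MIN = 0.72
MAX_FIELDS_IN_REVIEW = 3


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float


def route_field(field: ExtractedField) -> Route:
    """Route a single field by its confidence score."""
    if field.confidence >= AUTO_APPROVE_MIN:
        return Route.AUTO_APPROVE
    if field.confidence >= REVIEW_MIN:
        return Route.A2I_REVIEW
    return Route.SENIOR_ESCALATION


def route_document(fields: list[ExtractedField]) -> Route:
    """Roll field routes up to one document-level decision (assumed rollup rule)."""
    routes = [route_field(f) for f in fields]
    needing_review = sum(1 for r in routes if r is not Route.AUTO_APPROVE)
    if Route.SENIOR_ESCALATION in routes or needing_review > MAX_FIELDS_IN_REVIEW:
        return Route.SENIOR_ESCALATION
    if needing_review > 0:
        return Route.A2I_REVIEW
    return Route.AUTO_APPROVE
```

Keeping the thresholds as named constants, rather than literals inside the state machine, makes it easier to treat them as a reviewed risk parameter instead of an implementation detail.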

## Document Automation Pipeline with Bedrock — Target Architecture

Full flow from document ingestion to downstream system, with confidence branches, human review, and immutable audit trail.

### 📥 Ingestion

- S3 Raw Documents Bucket (storage)
- Lambda Event Router (compute)

### 🤖 AI Extraction

- Bedrock Data Automation (ai)
- AppConfig Feature Flags (data)

### 🔀 Orchestration

- Step Functions Confidence Router (compute)
- Amazon A2I Human Review (compute)

### 🔒 Audit

- S3 + Object Lock Audit Trail, 7 years (storage)
- KMS Audit Key (security)
- CloudTrail Data Events (security)

### 📤 Delivery

- Downstream System (external)
- CloudWatch SLO Dashboard (data)

### Flows

- client -> s3in: Document upload
- s3in -> lambda_trigger: S3 Event Notification
- lambda_trigger -> appconfig: Check feature flag
- lambda_trigger -> bedrock_da: Invoke extraction
- bedrock_da -> sfn: Result + confidence
- sfn -> a2i: Confidence 0.72–0.91
- sfn -> s3audit: Write immutable event
- s3audit -> kms: Encrypt/Decrypt
- s3audit -> cloudtrail: Data events
- a2i -> sfn: Review completed
- sfn -> downstream: Validated data
- sfn -> cw: Quality metrics
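
The "Check feature flag" step in the flow above has to decide, per document, whether the new pipeline handles it. One common way to do percentage rollouts is to hash the document ID into a stable bucket, so the same document always gets the same answer and raising the percentage only adds documents. The sketch below assumes a hypothetical flag payload of the form `{document_type: percent}`; how that payload is fetched from AppConfig is left out.

```python
import hashlib


def rollout_bucket(document_id: str) -> int:
    """Map a document ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_pipeline(document_id: str, document_type: str, rollout: dict[str, int]) -> bool:
    """Return True when this document falls inside the rollout percentage.

    `rollout` is a hypothetical flag payload, e.g. {"pay_stub": 5}.
    Document types missing from the payload stay on the legacy pipeline.
    """
    percent = rollout.get(document_type, 0)
    return rollout_bucket(document_id) < percent


if __name__ == "__main__":
    flags = {"pay_stub": 5}
    print(use_new_pipeline("doc-0001", "pay_stub", flags))
    print(use_new_pipeline("doc-0001", "contract", flags))
```

Because the bucket depends only on the document ID, a document that was routed to the new pipeline at 5% is still routed there at 50%, which keeps divergence metrics comparable between rollout steps.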

## Bedrock Data Automation: What Actually Changes in Configuration

Bedrock Data Automation is not an OCR wrapper with an LLM on top. The important operational distinction is that it operates on an extraction blueprint — a JSON schema defining expected fields, types, validations, and extraction instructions in natural language. This fundamentally changes the maintenance model: instead of debugging Python regex, you iterate on the blueprint and version it in S3.

In practice, we configured separate blueprints per document family, not per exact type. An 'income documents' blueprint covers pay stubs, tax returns, and bank statements with conditional instructions. This reduced the number of blueprints from 23 to 7, with equivalent coverage. Each blueprint is versioned with an immutable ARN: when we update, we create a new version, and the pipeline keeps using the previous version until the new one passes shadow mode validation.

The most relevant technical attention point is handling multi-page documents with distributed fields. Bedrock Data Automation processes the document as a unit, but fields like 'total value' in a 40-page contract may be on the last page while 'contracting parties' are on the first. We configured `page_range_hints` in the blueprint for document types where we know the field distribution, which reduced average contract processing latency from 11.2s to 6.8s without accuracy loss — the model does not need to 'search' for the field across the entire document.

For documents with complex tables (financial statements, for example), Bedrock Data Automation's structured output includes bounding box coordinates per cell. We store these coordinates in the audit trail, allowing a human auditor to see exactly where in the physical document each value was extracted from — something impossible in the legacy system.
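
To make the audit record concrete, here is a sketch of one per-field event using the field names from the Phase 3 schema, plus validation before serialization. The `bounding_box` field (left, top, width, height) and the specific validation rules are assumptions for illustration; the actual coordinate format in the Bedrock Data Automation output is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    document_id: str
    pipeline_version: str
    model_id: str
    extraction_timestamp: str  # ISO 8601, UTC
    field_name: str
    raw_value: str
    confidence_score: float
    final_value: str
    downstream_system_id: str
    review_action: Optional[str] = None
    reviewer_id: Optional[str] = None  # anonymized hash
    # Hypothetical layout: (left, top, width, height)
    bounding_box: Optional[tuple[float, float, float, float]] = None


def validate(event: AuditEvent) -> list[str]:
    """Return a list of problems; an empty list means the event may be written."""
    errors = []
    required = ("document_id", "pipeline_version", "model_id",
                "field_name", "downstream_system_id")
    for name in required:
        if not getattr(event, name):
            errors.append(f"{name} must not be empty")
    if not 0.0 <= event.confidence_score <= 1.0:
        errors.append("confidence_score must be within [0, 1]")
    if event.review_action is not None and not event.reviewer_id:
        errors.append("reviewer_id is required when review_action is set")
    if event.bounding_box is not None and any(v < 0 for v in event.bounding_box):
        errors.append("bounding_box values must be non-negative")
    return errors


def serialize(event: AuditEvent) -> str:
    """Serialize a validated event; raise instead of emitting a malformed one."""
    errors = validate(event)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(asdict(event), sort_keys=True)
```

Rejecting malformed events before the write matters more than usual here, because an object written under Object Lock in COMPLIANCE mode cannot be corrected afterwards.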

## Before and After: Operational Indicators

- **34% → 91%** — Accuracy on out-of-pattern documents. Measured over 4 weeks of shadow mode with manual ground truth on a sample of 2,400 documents
- **61% → 18%** — Human review cost as % of total pipeline cost. Auto-approval rate rose from 38% to 87% with thresholds calibrated per document type
- **0 → 100%** — Audit trail coverage per extracted field. Every field now has full traceability: model, blueprint version, confidence score, and review action
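
The field-by-field comparison that shadow mode relies on can be sketched as follows. Note that this measures divergence between the two pipelines, not accuracy: the accuracy figures above additionally required manual ground truth. The normalization rule (whitespace and case folding) and the function names are assumptions for illustration.

```python
from typing import Iterable, Optional


def _normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and ignore case so cosmetic differences do not count."""
    return None if value is None else " ".join(value.split()).casefold()


def field_divergence(legacy: dict[str, str], candidate: dict[str, str]) -> dict[str, bool]:
    """Return, per field name, whether the two pipelines disagree.

    A field present in only one result counts as a divergence.
    """
    names = sorted(set(legacy) | set(candidate))
    return {n: _normalize(legacy.get(n)) != _normalize(candidate.get(n)) for n in names}


def divergence_rate(pairs: Iterable[tuple[dict[str, str], dict[str, str]]]) -> float:
    """Share of compared fields on which the pipelines disagree."""
    compared = diverged = 0
    for legacy, candidate in pairs:
        result = field_divergence(legacy, candidate)
        compared += len(result)
        diverged += sum(result.values())
    return diverged / compared if compared else 0.0


if __name__ == "__main__":
    sample = [
        ({"tax_id": "123", "income": "1,000.00"}, {"tax_id": "123 ", "income": "1000.00"}),
        ({"name": "ACME"}, {"name": "acme"}),
    ]
    print(f"{divergence_rate(sample):.3f}")
```

A rate computed this way per document type is also a natural input for the rollout advancement criterion (divergence below 2% for 72 consecutive hours).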

## Step Functions as Orchestration Backbone: Design Decisions

The choice of Step Functions Express Workflows for orchestration was deliberate and not obvious. Express Workflows have a maximum duration of 5 minutes and do not persist state between executions — this seemed like a problem for documents entering human review, which can take hours. The solution was to split into two workflows: an Express Workflow for the happy path (extraction + validation + delivery, p99 at 12s), and a Standard Workflow for the human review path, which can last up to 24h with native wait state for the A2I callback.

The callback pattern uses the Step Functions `SendTaskSuccess` / `SendTaskFailure` API actions: when the reviewer completes the task in the A2I interface, a Lambda is triggered and calls `SendTaskSuccess` with the task token stored in DynamoDB. This eliminates polling and keeps Standard Workflow cost low, because you pay per state transition, not per wait time.
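
A minimal sketch of that callback Lambda is shown below. It assumes the Lambda is triggered by the EventBridge event that A2I emits on a human loop status change, and that the event `detail` carries `humanLoopName`, `humanLoopStatus`, and `humanLoopOutput.outputS3Uri`; verify the exact shape against the A2I documentation. The Step Functions client and the token store are injected so the logic can run without AWS access: in a real Lambda, `sfn_client` would be `boto3.client("stepfunctions")` and `token_store` a DynamoDB lookup.

```python
import json


def handle_review_completed(event: dict, sfn_client, token_store: dict) -> str:
    """Resume the waiting Standard Workflow once a human review finishes.

    `token_store` is a stand-in for the DynamoDB table that maps a human loop
    name to the task token saved when the workflow entered the wait state.
    """
    detail = event["detail"]
    human_loop_name = detail["humanLoopName"]
    status = detail["humanLoopStatus"]
    token = token_store[human_loop_name]

    if status == "Completed":
        output = {
            "humanLoopName": human_loop_name,
            "outputS3Uri": detail["humanLoopOutput"]["outputS3Uri"],
        }
        # boto3 signature: send_task_success(taskToken=..., output=<JSON string>)
        sfn_client.send_task_success(taskToken=token, output=json.dumps(output))
        return "success"

    # Any other terminal status fails the task so the workflow can handle it.
    sfn_client.send_task_failure(
        taskToken=token,
        error="HumanReviewFailed",
        cause=f"A2I human loop ended with status {status}",
    )
    return "failure"
```

Passing only the S3 URI of the review output, rather than the reviewed values themselves, keeps the task output small and leaves the sensitive content in the bucket.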

An idempotency detail that cost a sprint to get right: the initial routing Lambda can be invoked more than once for the same document (S3 event delivery guarantees at-least-once). We implemented deduplication via DynamoDB with a 24h TTL: before invoking Bedrock, the Lambda checks whether `document_id + s3_etag` already exists in the table. If so, it returns the cached result. The `s3_etag` is critical here — `document_id` alone is not sufficient, because the same document can be resubmitted with corrections.
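
The deduplication logic can be sketched with an in-memory cache standing in for the DynamoDB table. The class and function names are hypothetical. Two caveats apply to a real table: the check-then-write shown here is not atomic, so a conditional write would be needed to close the race between concurrent invocations, and DynamoDB TTL deletion is not immediate, which is why the expiry is also checked on read.

```python
import time
from typing import Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # 24h, as described above


def dedup_key(document_id: str, s3_etag: str) -> str:
    """Combine document ID and ETag so a corrected resubmission gets a new key."""
    etag = s3_etag.strip('"')  # S3 often returns the ETag wrapped in quotes
    return f"{document_id}#{etag}"


class DedupCache:
    """In-memory stand-in for the DynamoDB deduplication table."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._items: dict[str, tuple[dict, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[dict]:
        item = self._items.get(key)
        if item is None:
            return None
        result, expires_at = item
        if self._clock() >= expires_at:  # expired entries are treated as absent
            del self._items[key]
            return None
        return result

    def put(self, key: str, result: dict) -> None:
        self._items[key] = (result, self._clock() + TTL_SECONDS)


def process_once(document_id: str, s3_etag: str, cache: DedupCache,
                 extract: Callable[[str], dict]) -> dict:
    """Invoke `extract` at most once per (document_id, s3_etag) within the TTL."""
    key = dedup_key(document_id, s3_etag)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = extract(document_id)
    cache.put(key, result)
    return result
```

The injected `clock` exists only to make the TTL behavior testable without waiting.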

For observability, each Step Functions execution emits events to EventBridge, which feeds a Kinesis Data Firehose → S3 for historical analysis and a Lambda that publishes custom metrics to CloudWatch. X-Ray is enabled on all Lambdas and Step Functions, allowing latency tracing for each step individually — we identified that 34% of total latency was in the routing Lambda cold start, resolved with Provisioned Concurrency of 5 instances during peak hours.

> **Real Risks That Almost Broke the Migration:**
>
> **1. Model drift without notification.** Bedrock Data Automation can update the underlying model without explicit notice if you do not pin the model ID with a version. In a regulatory environment, this is unacceptable: a silent model change can alter extraction behavior and invalidate audit trail traceability. Always use versioned model ARNs and configure a CloudWatch alarm to detect `model_id` changes in audit logs.
>
> **2. Bedrock throughput limits.** Bedrock Data Automation has TPS quotas per account and per region. During batch processing peaks (end of month, for example), we hit the 10 TPS limit in us-east-1 and needed to implement exponential backoff with jitter in the invocation Lambda. Request quota increases in advance: the process takes 3 to 10 business days and approval is not guaranteed.
>
> **3. A2I cost at unexpected volume.** If the confidence threshold is calibrated too conservatively, human review volume explodes. In a test with threshold ≥ 0.95, 43% of documents went to review, which is operationally unviable. A2I cost is per review task, not per document, so multiple fields in review on the same document multiply the cost. Monitor the review/auto-approval ratio daily in the first weeks.
>
> **4. Object Lock and error recovery.** With S3 Object Lock in COMPLIANCE mode, you cannot delete or overwrite audit events, not even as root. If an incorrect event is written due to a bug, it stays there for the retention period. Implement rigorous event schema validation before writing to the audit bucket, with a Dead Letter Queue for malformed events.
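
For the throughput risk, exponential backoff with jitter can be sketched as below. This uses the "full jitter" variant (a random delay between zero and the capped exponential value), which is one of several reasonable choices. `ThrottledError` is a hypothetical stand-in for whatever throttling exception the real client raises, and the default `base`, `cap`, and `max_attempts` values are illustrative.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for the service's throttling exception."""


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 20.0,
                   rng: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield one delay per retry: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(invoke: Callable[[], T], max_attempts: int = 5,
                      base: float = 0.5, cap: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call `invoke`, retrying on ThrottledError up to `max_attempts` total calls."""
    delays = backoff_delays(max_attempts, base, cap, rng)
    while True:
        try:
            return invoke()
        except ThrottledError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the throttling error
            sleep(delay)
```

Inside a Lambda, the total of all delays also has to fit within the function timeout, so the cap and attempt count should be chosen with that budget in mind.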

## AI Governance in a Regulatory Environment: Beyond Checkbox Compliance

The hardest question in this migration was not technical — it was governance. When an AI model extracts an income value that will feed a credit decision, who is responsible for the error? The regulatory answer requires the institution to demonstrate it has sufficient controls to detect, correct, and trace errors, regardless of origin (human or algorithmic).

We implemented three governance layers that go beyond what most reference architectures suggest. First, **versioned model card**: for each blueprint version and Bedrock model in use, we maintain a structured document in S3 with: production entry date, measured accuracy rate by document type, known limitations, and risk committee approval. This document is referenced in the audit trail of each extraction.

Second, **confidence distribution monitoring**: beyond auto-approval rate SLOs, we monitor the statistical distribution of confidence scores by document type week over week. A shift in distribution (even without SLO violation) is an early signal of model drift or a change in the pattern of incoming documents. We implemented this with CloudWatch Metric Math calculating the 25th percentile of the confidence score — if it drops more than 8 percentage points in 7 days, it automatically opens an investigation ticket.
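
The drift check itself is simple enough to sketch offline. The version below compares the 25th percentile of two weekly samples and flags a drop of more than 8 percentage points. It assumes confidence scores on a 0 to 1 scale (so 8 points equals 0.08) and uses Python's `statistics.quantiles` with the inclusive method; CloudWatch may interpolate percentiles differently, so treat the numbers as approximate rather than identical to the dashboard.

```python
from statistics import quantiles
from typing import Sequence


def p25(scores: Sequence[float]) -> float:
    """25th percentile of a sample of confidence scores (needs at least 2 values)."""
    return quantiles(scores, n=4, method="inclusive")[0]


def drift_detected(previous_week: Sequence[float], current_week: Sequence[float],
                   max_drop_points: float = 8.0) -> bool:
    """True when the p25 confidence dropped by more than `max_drop_points`.

    Scores are assumed to be in [0, 1]; the drop is expressed in percentage points.
    """
    drop_points = (p25(previous_week) - p25(current_week)) * 100
    return drop_points > max_drop_points


if __name__ == "__main__":
    last_week = [0.90, 0.92, 0.94, 0.96, 0.98]
    this_week = [0.78, 0.80, 0.90, 0.95, 0.97]
    print(drift_detected(last_week, this_week))
```

Watching the lower quartile rather than the mean is what makes this an early signal: a growing tail of low-confidence documents moves p25 well before it moves the average or the auto-approval SLO.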

Third, **operationalized right to explanation**: for each credit decision that used data extracted by the pipeline, the downstream system can query the audit API (Lambda + API Gateway with IAM authorizer) and receive the full audit trail for that document, including excerpts from the original document with bounding boxes highlighting the extracted fields. This is not just compliance — it is the ability to respond to a customer dispute in minutes, not days.

## Total Cost of Ownership: The Real Math

Migrations to managed AI services frequently underestimate real cost because they compare legacy compute cost with the new service's API cost, ignoring adjacent costs. I will be specific about what we measured.

In the legacy system, monthly cost for 180,000 processed documents was: EC2 (c5.2xlarge × 4 instances, 24/7): $1,104; RDS PostgreSQL (db.r5.large Multi-AZ): $420; human review cost (analysts, 61% of time in review): $8,200; rule maintenance (0.3 engineering FTE): $2,100. Total: ~$11,824/month.

In the new system, for the same volume: Bedrock Data Automation (estimate based on public pricing, ~$0.015/page, average 3 pages/document): $8,100; Lambda + Step Functions + A2I: $340; S3 (including audit bucket with Object Lock): $180; human review cost (13% of volume, reduced time): $1,640; blueprint maintenance (0.05 FTE): $350. Total: ~$10,610/month.

The direct cost reduction is modest (~10%). The real gain is in three places that do not appear in the bill: (1) elimination of regulatory risk from absent audit trail — a Central Bank fine for lack of traceability can cost orders of magnitude more; (2) speed of onboarding new document types — from a 3-week sprint to 2 days of blueprint iteration; (3) scalability without fixed cost — the new pipeline has no idle EC2 instances on weekends, representing an additional $280/month savings during low-volume periods.
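
The arithmetic behind these totals is easy to reproduce, which is useful when re-running the comparison with your own volumes. The sketch below uses only the figures quoted above; the dictionary keys and function names are illustrative.

```python
DOCUMENTS_PER_MONTH = 180_000
AVG_PAGES_PER_DOCUMENT = 3
PRICE_PER_PAGE_USD = 0.015  # estimate quoted above, based on public pricing

LEGACY_MONTHLY_USD = {
    "ec2": 1_104,
    "rds_postgresql": 420,
    "human_review": 8_200,
    "rule_maintenance": 2_100,
}


def bedrock_monthly_cost(documents: int, pages_per_document: float,
                         price_per_page: float) -> float:
    """Per-page pricing scales linearly with document volume."""
    return documents * pages_per_document * price_per_page


def new_monthly_costs(documents: int) -> dict[str, float]:
    return {
        "bedrock_data_automation": bedrock_monthly_cost(
            documents, AVG_PAGES_PER_DOCUMENT, PRICE_PER_PAGE_USD),
        "lambda_step_functions_a2i": 340,
        "s3": 180,
        "human_review": 1_640,
        "blueprint_maintenance": 350,
    }


def reduction(legacy_total: float, new_total: float) -> float:
    """Relative cost reduction, as a fraction of the legacy total."""
    return (legacy_total - new_total) / legacy_total


if __name__ == "__main__":
    legacy_total = sum(LEGACY_MONTHLY_USD.values())
    new_total = sum(new_monthly_costs(DOCUMENTS_PER_MONTH).values())
    print(f"legacy: ${legacy_total:,.0f}/month")
    print(f"new:    ${new_total:,.0f}/month")
    print(f"direct reduction: {reduction(legacy_total, new_total):.1%}")
```

With the quoted inputs this reproduces roughly $11,824 and $10,610 per month, a direct reduction of about 10%. Only the Bedrock line varies with `documents` in this sketch; the other line items are held at their quoted values, which would not hold at very different volumes.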

The attention point: if volume grows to 500,000 documents/month, Bedrock Data Automation cost grows linearly while EC2 cost would grow in steps. Above ~350,000 documents/month, it is worth re-evaluating whether a proprietary fine-tuned model or a hybrid solution (Bedrock for complex cases, lightweight model for simple cases) would be more cost-effective.

## Well-Architected Pillars Assessment

- **security**: KMS with restrictive key policy (PrincipalTag condition), S3 Object Lock COMPLIANCE for audit trail, CloudTrail data events, IAM with least privilege per function (extraction, review, audit separated), VPC endpoints for Bedrock and S3 eliminating public internet traffic.
- **reliability**: Dead Letter Queue on all Lambdas, retry with exponential backoff and jitter for Bedrock calls, deduplication via DynamoDB with TTL, automatic fallback to legacy pipeline via feature flag if Bedrock error rate > 5% for 15 minutes.
- **performance**: page_range_hints for latency reduction on multi-page documents, Provisioned Concurrency on routing Lambda, Express Workflows for happy path (p99 12s), Standard Workflow isolation only for human review path.
- **cost**: No idle EC2 instances, cost per processed document monitored with alert at $0.04, hybrid model re-evaluation planned for volume > 350k documents/month, S3 Intelligent-Tiering on raw documents bucket after 30 days.

> **Architect's Note:** If I could redo this migration with what I know today, I would have invested more time in the document inventory phase before touching any service — classification by regulatory risk, not by volume, should have been the primary prioritization criterion. The most expensive mistake I have seen in similar projects is treating the confidence threshold as a technical parameter when it is, in practice, a business risk decision that needs risk committee approval, not the engineering team's. The lesson I carry: in regulated financial environments, AI architecture does not start with the services diagram — it starts with the risk matrix and the accountability model. Everything else is implementation.

## Verdict: The Migration Is Worth It, But Not the Way Most Teams Do It

Bedrock Data Automation is a real paradigm shift for document extraction pipelines in financial environments — not for accuracy gain in isolation, but for the combination of accuracy + traceability + maintainability that the blueprint model offers. The migration is worth it when the cost of maintaining manual rules and the regulatory risk of absent audit trail are honestly accounted for. What I do not recommend is big bang migration, threshold calibration without shadow mode, and delegating the confidence decision to the engineering team without risk involvement. Do the document inventory first, validate in shadow mode for at least four weeks, pin model ARNs with explicit versions, and treat the audit trail as a non-negotiable requirement from day zero — not as a compliance add-on.

## References

- [AWS Bedrock Data Automation — Developer Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/data-automation.html)
- [Amazon A2I — Human Review Workflows](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-use-augmented-ai-a2i-human-review-loops.html)
- [AWS Step Functions — Callback Pattern with Task Tokens](https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-token)
- [S3 Object Lock — Compliance Mode](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock-overview.html)
- [AWS Well-Architected — Machine Learning Lens](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/welcome.html)
- [AWS AppConfig — Feature Flags](https://docs.aws.amazon.com/appconfig/latest/userguide/appconfig-creating-feature-flags.html)
- [Resolução CMN 4.658 — Política de Segurança Cibernética](https://www.bcb.gov.br/estabilidadefinanceira/exibenormativo?tipo=Resolu%C3%A7%C3%A3o%20CMN&numero=4658)
- [AWS Architecture Blog — Document Automation with Bedrock](https://aws.amazon.com/blogs/architecture/)