# Custom Lens for Data Platforms: Anatomy of a Pattern

The AWS Well-Architected Custom Lens is often treated as a documentation artifact — but when applied to enterprise data platforms, it becomes an operational governance mechanism with real teeth. In this article, I dissect the pattern's anatomy, expose its most common adoption failures, and propose a reference design that connects lens reviews to automated remediation pipelines.

- URL: https://fernando.moretes.com/blog/well-architected-lens-para-plataformas-de-dados-enterprise

- Markdown: https://fernando.moretes.com/blog/well-architected-lens-para-plataformas-de-dados-enterprise/article.md?lang=en

- Published: 2026-06-10T12:00:00.000Z

- Category: Data Platforms

- Tags: well-architected, custom-lens, data-platform, governance, data-mesh, glue, msk, finops

- Reading time: 8 min

- Source: [Custom Lens and enterprise data platforms](https://aws.amazon.com/blogs/architecture/)

---

Every enterprise data platform accumulates governance debt faster than technical debt. The AWS Well-Architected Custom Lens is the only AWS-native mechanism that lets you codify your domain-specific quality criteria — ingestion latency, data lineage, tenant isolation, cost per dataset — and turn them into structured, repeatable reviews. But the pattern has a serious adoption problem: most teams treat it as an annual audit checklist rather than a continuous feedback loop integrated into the delivery cycle. This article tears down the pattern end-to-end, with a focus on financial-grade data platforms where the cost of a superficial review is measured in compliance incidents.

## The Problem Custom Lens Actually Solves

The standard Well-Architected Framework was designed to be universal. That is its strength and its limitation. When you review a data platform against the generic pillars, the reliability questions ask whether you have multi-AZ — but they don't ask whether your Kafka/MSK ingestion pipeline has consumer idempotency configured, whether your schema registry is versioned with BACKWARD compatibility, or whether your Glue Job has bookmarking enabled for safe reprocessing. These gaps are not framework bugs; they are the exact space Custom Lens was designed to fill.

In financial environments, the consequences of these gaps are concrete. A pipeline without auditable lineage control can result in regulatory reconciliation failure. A dataset incorrectly partitioned in S3 — say, by `event_date` instead of `processing_date` — creates lag windows that invalidate intraday risk reports. A DynamoDB table with a poorly chosen partition key for portfolio positions generates hot partitions that degrade read latency precisely at peak load: market close.

Custom Lens solves this by letting you codify **domain-context questions**, not just generic best practices. Each question offers a set of choices, and risk rules map the selected choices to a risk level (HIGH, MEDIUM, NONE); the WA Tool then aggregates these risks per pillar into what effectively serves as the workload score. The difference between a well-built lens and a poorly built one lies in the granularity of choices: vague choices generate useless scores; choices with objective, measurable criteria generate actions.

## Pattern Anatomy: Custom Lens as a Governance Loop

The full flow runs from lens definition to automated remediation, passing through workload review and improvement plan generation integrated into the data pipeline.

### 📐 Lens Authoring — Governance Team

- **Lens JSON**: pillars, questions, and choices
- **WA Tool**: publish and version the lens

### 🗂️ Workload Review — Data Platform Team

- **WA Workload**: data platform workload ARN
- **Review Session**: risk scores per pillar
- **Milestones**: point-in-time snapshots

### 🔁 Automated Remediation — DevSecOps Pipeline

- **EventBridge**: WA risk change event
- **Step Functions**: remediation orchestrator
- **Lambda**: config drift detector
- **AWS Config**: compliance rules

### 📊 Observability — Data Platform Health

- **CloudWatch**: custom metrics and alarms
- **Jira / Backlog**: improvement plan items

### Flows

- Lens JSON -> WA Tool: upload via CLI/API
- WA Tool -> WA Workload: associate the lens to the workload
- WA Workload -> Review Session: start review
- Review Session -> Milestones: save milestone
- Review Session -> EventBridge: HIGH risk detected
- EventBridge -> Step Functions: trigger remediation
- Step Functions -> Lambda: check drift
- Step Functions -> AWS Config: evaluate compliance
- Step Functions -> CloudWatch: publish metrics
- Step Functions -> Jira / Backlog: create improvement ticket

## Lens Anatomy: JSON Structure and Design Decisions

A Custom Lens is a JSON document with a well-defined hierarchy: `pillars` → `questions` → `choices`. Each choice carries an `id`, a `title`, and supporting entries such as `helpfulResource` and `improvementPlan`. Risk is evaluated per question through `riskRules`: boolean conditions over choice ids, each mapped to a risk level. In practice you define what safe looks like, and leaving the safe choices **unselected** is what generates the risk.
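
A minimal sketch of that hierarchy, written as a Python dict and serialized to JSON. The ids, titles, and `example.com` URLs are hypothetical, and the field names (`schemaVersion`, `improvementPlan`, the `default` condition) follow the published custom lens format as far as I can tell, so validate the output in the WA Tool before relying on it. The helper is an illustrative lint, not part of any AWS SDK.

```python
import json

# Hypothetical lens content: ids, titles, and URLs are illustrative only.
lens = {
    "schemaVersion": "2021-11-01",
    "name": "Data Platform Lens",
    "description": "Domain-specific review criteria for data platforms.",
    "pillars": [
        {
            "id": "stream_ingestion",
            "name": "Stream Ingestion Reliability",
            "questions": [
                {
                    "id": "ingest_q1",
                    "title": "How do you make reprocessing safe?",
                    "description": "Covers consumer idempotency and Glue Job bookmarking.",
                    "choices": [
                        {
                            "id": "ingest_q1_idempotent",
                            "title": "Consumers are idempotent",
                            "helpfulResource": {
                                "displayText": "Replays must not duplicate records.",
                                "url": "https://example.com/idempotency",
                            },
                            "improvementPlan": {
                                "displayText": "Add idempotency keys to consumers.",
                                "url": "https://example.com/idempotency-plan",
                            },
                        },
                        {
                            "id": "ingest_q1_bookmarks",
                            "title": "Glue Job bookmarking is enabled",
                            "helpfulResource": {
                                "displayText": "Bookmarks track processed data.",
                                "url": "https://example.com/bookmarks",
                            },
                            "improvementPlan": {
                                "displayText": "Enable job bookmarks on all Glue Jobs.",
                                "url": "https://example.com/bookmarks-plan",
                            },
                        },
                    ],
                    # Rules are conditions over choice ids; 'default' is the catch-all.
                    "riskRules": [
                        {"condition": "ingest_q1_idempotent && ingest_q1_bookmarks", "risk": "NO_RISK"},
                        {"condition": "ingest_q1_idempotent || ingest_q1_bookmarks", "risk": "MEDIUM_RISK"},
                        {"condition": "default", "risk": "HIGH_RISK"},
                    ],
                }
            ],
        }
    ],
}


def questions_missing_default_rule(lens_doc):
    """Return ids of questions whose riskRules lack a catch-all 'default' condition."""
    missing = []
    for pillar in lens_doc["pillars"]:
        for question in pillar["questions"]:
            conditions = [rule["condition"] for rule in question.get("riskRules", [])]
            if "default" not in conditions:
                missing.append(question["id"])
    return missing


lens_json = json.dumps(lens, indent=2)
```

Keeping the safe combination explicit and letting `default` carry the HIGH risk mirrors the idea that the absence of safe behavior is what produces the finding.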

For a data platform, I organize pillars around five domains the standard framework doesn't adequately cover:

1. **Stream Ingestion Reliability** — idempotency, dead-letter queues on MSK, consumer group lag monitoring
2. **Data Quality and Lineage** — AWS Glue Data Catalog integration, lineage tags in S3 object metadata, schema validation with AWS Glue Schema Registry
3. **Tenant Isolation and Segmentation** — Lake Formation row-level and column-level security, per-tenant KMS CMK, S3 bucket policies with `aws:PrincipalTag` conditions
4. **Processing Cost and Efficiency** — Glue DPU sizing, S3 Intelligent-Tiering enabled, Athena workgroup with data scanned limits
5. **Pipeline Observability** — OpenTelemetry spans in Glue Jobs, custom ingestion latency metrics in CloudWatch, SLOs defined per dataset

A critical design decision: **do not mix operational questions with architectural questions in the same pillar**. Operational questions ("is monitoring configured?") have binary answers and change frequently. Architectural questions ("does the design support event replay?") are more stable. Mixing them creates a volatile score that loses meaning over time.

## When to Use This Pattern — and When It Is Overkill

Custom Lens makes sense when you have **multiple workloads in the same domain** that need to be evaluated with consistent criteria over time. In a financial data platform with 15 ingestion pipelines, 8 data products, and 3 landing zones, the lens becomes the only mechanism that ensures all teams answer the same questions with the same risk vocabulary.

The pattern is also justified when you have **regulatory requirements that need to be tracked** — BACEN, LGPD, SOX. In that case, the WA Tool milestone functions as auditable evidence that the review happened on a specific date with a specific set of risks identified and accepted. This has direct value in external audits.

On the other hand, **do not use Custom Lens** when:
- You have only one or two workloads and the review can be done with the standard framework plus additional notes
- Your team lacks the capacity to keep the lens updated — an outdated lens is worse than no lens, because it creates a false sense of coverage
- You are in proof-of-concept phase — the overhead of creating and publishing a lens is not justified before having production workloads
- The goal is documentation only: for that, a well-written ADR is more efficient

The rule of thumb I use: **Custom Lens is justified when the cost of inconsistent reviews across teams exceeds the cost of maintaining the lens**. On platforms with more than 5 workloads in the same domain and quarterly reviews, that break-even is reached quickly.

## Anti-Patterns: How Custom Lens Fails in Practice

- **Lens as a static artifact**: creating the JSON once, publishing it, and never revisiting it. After 6 months, the questions no longer reflect the actual state of the platform. The score becomes noise.
- **Questions without objective choices**: "Do you monitor your pipelines?" without defining what constitutes adequate monitoring. This allows any answer to be justified, hollowing out the review's value.
- **Review without milestone**: conducting the review without saving a milestone means losing the risk evolution history. The WA Tool does not create milestones automatically; you must create one explicitly via API or console.
- **Poorly scoped workload**: associating a single workload to the entire data platform instead of creating workloads per domain (ingestion, transformation, serving). This dilutes the score and makes it impossible to prioritize remediation.
- **Ignoring the improvement plan**: the WA Tool generates an automatic improvement plan based on identified risks. Not integrating that plan into the team's backlog is the most common and most costly anti-pattern — risks are identified, documented, and ignored.
- **Lens with more than 30 questions**: excessively long lenses generate review fatigue. Teams start checking choices without reading, invalidating the process. Keep between 15 and 25 questions per lens.
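
Two of these anti-patterns are cheap to guard against in automation. The sketch below assumes `wa_client` is a boto3 `wellarchitected` client passed in by the caller; the function names and the 25-question budget constant are illustrative choices taken from the guidance above, not AWS defaults.

```python
import uuid

MAX_QUESTIONS = 25  # upper bound suggested in the list above


def count_questions(lens_doc):
    """Count questions across all pillars of a lens document (dict form)."""
    return sum(len(p.get("questions", [])) for p in lens_doc.get("pillars", []))


def lens_is_reviewable(lens_doc, max_questions=MAX_QUESTIONS):
    """True when the lens stays within the question budget."""
    return count_questions(lens_doc) <= max_questions


def save_milestone(wa_client, workload_id, review_label):
    """Create an explicit milestone so the review leaves a point-in-time record."""
    response = wa_client.create_milestone(
        WorkloadId=workload_id,
        MilestoneName=review_label,
        ClientRequestToken=str(uuid.uuid4()),  # idempotency token
    )
    return response["MilestoneNumber"]
```

Running `lens_is_reviewable` in the lens repository's CI and calling `save_milestone` as the last step of every review keeps both rules from depending on anyone's memory.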

## Integration with Remediation Pipeline: Closing the Loop

The real value of Custom Lens emerges when you close the loop between review and remediation. The WA Tool API exposes events via EventBridge — specifically, the `AWS Well-Architected Tool:UpdateAnswer` event is emitted whenever an answer changes risk level. You can use this as a trigger for a remediation pipeline.
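
As a rough sketch of the trigger, the rule pattern below assumes that `UpdateAnswer` calls reach EventBridge as CloudTrail-recorded API events; that delivery path, the `aws.wellarchitected` source string, and the helper name are assumptions to verify in your own account. Under that assumption the event may not carry the resulting risk level, so filtering for HIGH risk would happen downstream.

```python
import json

# Assumption: UpdateAnswer arrives as an "AWS API Call via CloudTrail" event.
UPDATE_ANSWER_PATTERN = {
    "source": ["aws.wellarchitected"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["wellarchitected.amazonaws.com"],
        "eventName": ["UpdateAnswer"],
    },
}


def matches_update_answer(event):
    """Local mirror of the pattern, useful for unit-testing handlers offline."""
    detail = event.get("detail", {})
    return (
        event.get("source") in UPDATE_ANSWER_PATTERN["source"]
        and event.get("detail-type") in UPDATE_ANSWER_PATTERN["detail-type"]
        and detail.get("eventSource") in UPDATE_ANSWER_PATTERN["detail"]["eventSource"]
        and detail.get("eventName") in UPDATE_ANSWER_PATTERN["detail"]["eventName"]
    )


# JSON form, as it would be supplied to an EventBridge rule definition.
pattern_json = json.dumps(UPDATE_ANSWER_PATTERN)
```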

The design I've implemented in production uses Step Functions with the following flow:

1. **EventBridge Rule** captures HIGH risk events from the WA Tool and invokes a Step Function
2. **Triage Lambda** queries the WA Tool API to get the full context of the question and unselected choice
3. **AWS Config Rule** checks whether the corresponding configuration drift exists in the environment (for example, if the question is about Glue Job bookmarking, the Config Rule checks whether `--job-bookmark-option` is set to `job-bookmark-enable` on the workload's Glue Jobs)
4. **Conditional decision**: if drift exists, automatically creates a Jira ticket with full context and a link to the choice documentation; if it doesn't exist, marks the risk as "accepted" via API with automatic justification
5. **CloudWatch Custom Metric** `WATool/HighRiskCount` is published after each execution, enabling alarms and trend dashboards
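
The steps above can be sketched as plain Python functions of the kind a Lambda task might wrap. This is an illustration under assumptions, not the production code: `wa_client` and `cloudwatch_client` are boto3 clients supplied by the caller, `bookmark_drift` is a local stand-in for the AWS Config rule, and the action names returned by `decide` are hypothetical.

```python
def unselected_choice_ids(answer):
    """Choices offered by the question that the reviewer did not select."""
    selected = set(answer.get("SelectedChoices", []))
    return [c["ChoiceId"] for c in answer.get("Choices", []) if c["ChoiceId"] not in selected]


def triage(wa_client, workload_id, lens_alias, question_id):
    """Step 2: fetch the question context from the WA Tool."""
    answer = wa_client.get_answer(
        WorkloadId=workload_id, LensAlias=lens_alias, QuestionId=question_id
    )["Answer"]
    return {
        "question_id": question_id,
        "risk": answer.get("Risk"),
        "unselected": unselected_choice_ids(answer),
    }


def bookmark_drift(glue_job):
    """Step 3 stand-in: True when a Glue job definition lacks enabled bookmarks."""
    args = glue_job.get("DefaultArguments", {})
    return args.get("--job-bookmark-option") != "job-bookmark-enable"


def decide(context, drift_exists):
    """Step 4: route HIGH risks with confirmed drift to the backlog."""
    if context["risk"] != "HIGH":
        return "ignore"
    return "create_ticket" if drift_exists else "mark_accepted"


def publish_high_risk_count(cloudwatch_client, count):
    """Step 5: publish HighRiskCount under the WATool namespace."""
    cloudwatch_client.put_metric_data(
        Namespace="WATool",
        MetricData=[{"MetricName": "HighRiskCount", "Value": count, "Unit": "Count"}],
    )
```

Keeping each step a small function with injected clients makes the orchestration testable without touching AWS, which matters when the pipeline itself becomes audit evidence.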

This integration transforms the WA Tool from a passive audit mechanism into an active component of the governance cycle. The operational cost is low: Step Functions Express Workflows charge per execution and duration — for quarterly reviews with ~20 questions, the monthly cost is under $1. The value, on the other hand, is the elimination of the gap between identification and action that kills most governance programs.

## Security and IAM: Who Can Do What in the WA Tool

The WA Tool has a permissions model that most teams ignore until they have an incident. By default, any principal with `wellarchitected:*` can create workloads, conduct reviews, and — critically — **delete milestones**. In an audit context, a deleted milestone is destroyed evidence.

The control I implement uses three separate IAM roles with the least privilege principle:

- **`WATool-Reviewer`**: `wellarchitected:CreateWorkload`, `wellarchitected:UpdateAnswer`, `wellarchitected:GetWorkload`, `wellarchitected:ListAnswers`. No permission to create or delete milestones.
- **`WATool-Auditor`**: `wellarchitected:CreateMilestone`, `wellarchitected:GetMilestone`, `wellarchitected:ListMilestones`. No permission to update answers. This role is assumed only by the CI/CD pipeline after the review is complete.
- **`WATool-Admin`**: full permissions, including `wellarchitected:DeleteWorkload` and `wellarchitected:DisassociateLenses`. Restricted to an administrator group with mandatory MFA; an `aws:MultiFactorAuthAge` condition requires the MFA authentication to be at most 1 hour old.
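
A sketch of the three roles as policy documents, built as Python dicts so they can be linted and rendered to JSON in CI. The action lists come from the bullets above; the account id is a placeholder, and placing the MFA conditions on the admin role's trust policy (checked when the role is assumed) is one common approach that I am assuming here rather than describing a confirmed setup.

```python
import json


def allow(actions):
    """Build a single-statement identity policy allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}],
    }


REVIEWER_POLICY = allow([
    "wellarchitected:CreateWorkload",
    "wellarchitected:UpdateAnswer",
    "wellarchitected:GetWorkload",
    "wellarchitected:ListAnswers",
])

AUDITOR_POLICY = allow([
    "wellarchitected:CreateMilestone",
    "wellarchitected:GetMilestone",
    "wellarchitected:ListMilestones",
])

ADMIN_POLICY = allow(["wellarchitected:*"])

# Trust policy for WATool-Admin: MFA must be present and at most 1 hour old.
ADMIN_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
            "Action": "sts:AssumeRole",
            "Condition": {
                "Bool": {"aws:MultiFactorAuthPresent": "true"},
                "NumericLessThan": {"aws:MultiFactorAuthAge": "3600"},
            },
        }
    ],
}


def actions_of(policy):
    """Flatten the allowed actions of a policy for quick assertions."""
    return {action for stmt in policy["Statement"] for action in stmt["Action"]}


reviewer_json = json.dumps(REVIEWER_POLICY, indent=2)
```

A CI check that `actions_of(REVIEWER_POLICY)` and `actions_of(AUDITOR_POLICY)` never intersect is a simple way to keep the separation of duties from eroding over time.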

An important detail: the WA Tool **does not support resource-based policies** — all control is via IAM identity policies. This means you cannot use `aws:ResourceTag` conditions directly on WA Tool resources. The alternative is to use SCPs in AWS Organizations to restrict WA Tool actions to specific roles in production accounts.
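
The SCP approach could look like the sketch below: a deny statement on destructive WA Tool actions for every principal except the admin role. The statement id and the exact set of denied actions are illustrative assumptions; the role name matches the `WATool-Admin` role described above.

```python
import json

# Service control policy: only WATool-Admin may remove workloads or lenses.
SCP_PROTECT_WA_EVIDENCE = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDestructiveWAToolActionsExceptAdmin",
            "Effect": "Deny",
            "Action": [
                "wellarchitected:DeleteWorkload",
                "wellarchitected:DisassociateLenses",
            ],
            "Resource": "*",
            "Condition": {
                "ArnNotLike": {
                    "aws:PrincipalARN": "arn:aws:iam::*:role/WATool-Admin"
                }
            },
        }
    ],
}

scp_json = json.dumps(SCP_PROTECT_WA_EVIDENCE, indent=2)
```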

For published lenses, grant `wellarchitected:CreateLensShare` carefully: sharing a lens across accounts exposes the question and choice structure, which may reveal control gaps you prefer to keep internal.

## Pattern Key Points

- Custom Lens codifies domain-specific quality criteria that the standard framework doesn't cover — for data platforms, this includes lineage, tenant isolation, and per-dataset SLOs.
- Milestones are the only auditable evidence mechanism in the WA Tool — protect them with separate IAM roles and never delegate deletion permission to reviewer roles.
- Integrating EventBridge + Step Functions with the WA Tool transforms passive reviews into active remediation loops, with operational cost under $1/month for quarterly reviews.
- Keep between 15 and 25 questions per lens. Beyond that, review fatigue invalidates the process — teams check choices without reading.
- Create separate workloads per functional domain (ingestion, transformation, serving), not a single workload for the entire platform. Diluted scores don't drive prioritization.
- The improvement plan generated by the WA Tool must be integrated into the team's backlog via automation — without this integration, the pattern generates risk documentation without corresponding action.

> **Version Your Lens as Code:** Treat the Custom Lens JSON as infrastructure: store it in Git with semver, use pull requests for changes to questions and choices, and automate the upload in the CI pipeline via `aws wellarchitected import-lens`, followed by `aws wellarchitected create-lens-version` to publish. This ensures change traceability: when a question is removed or a risk criterion is relaxed, you have a commit with author, date, and justification. In regulatory audits, this traceability is worth as much as the lens content itself.
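
A CI publish step might look like the following sketch. It assumes `wa_client` is a boto3 `wellarchitected` client, that the lens version string comes from your Git tag, and that the import completes before the version is created; a real pipeline may need to wait on the import status first. The function name and parameters are hypothetical.

```python
import json
import uuid


def publish_lens(wa_client, lens_json, version, existing_lens_arn=None, is_major=False):
    """Import a lens document and publish it as a new lens version."""
    json.loads(lens_json)  # fail fast on malformed JSON before calling AWS

    import_args = {"JSONString": lens_json, "ClientRequestToken": str(uuid.uuid4())}
    if existing_lens_arn:
        # Passing the ARN updates an existing custom lens instead of creating one.
        import_args["LensAlias"] = existing_lens_arn
    lens_arn = wa_client.import_lens(**import_args)["LensArn"]

    wa_client.create_lens_version(
        LensAlias=lens_arn,
        LensVersion=version,
        IsMajorVersion=is_major,
        ClientRequestToken=str(uuid.uuid4()),
    )
    return lens_arn
```

Mapping the semver major bump to `is_major=True` keeps the published lens versions aligned with the Git history the audit trail relies on.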

## Well-Architected Pillars Applied to the Pattern

- **Security**: Separate IAM roles for Reviewer, Auditor, and Admin with mandatory MFA for Admin. SCPs blocking milestone deletion in production accounts. Lens JSON stored in S3 with versioning and KMS CMK.
- **Reliability**: Milestones as point-in-time snapshots ensure risk history is preserved even if workloads are recreated. Remediation pipeline with exponential retry in Step Functions for WA Tool API failures.
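
For the exponential retry, a Step Functions task state can declare a `Retry` block in its Amazon States Language definition. The sketch below builds one as a Python dict; the interval, attempt count, state names, and the placeholder Lambda ARN are illustrative values, not the settings of the pipeline described here.

```python
import json


def wa_api_task(resource_arn, next_state):
    """Task state that retries failed WA Tool calls with exponential backoff."""
    return {
        "Type": "Task",
        "Resource": resource_arn,
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 4,
                "BackoffRate": 2.0,
            }
        ],
        "Next": next_state,
    }


def retry_delays(retrier):
    """Seconds waited before each retry attempt under this retrier."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]


triage_state = wa_api_task(
    "arn:aws:lambda:us-east-1:111122223333:function:wa-triage",  # placeholder ARN
    "CheckDrift",
)
state_json = json.dumps(triage_state, indent=2)
```

With these values the waits before the four retries are 2, 4, 8, and 16 seconds, which is usually enough to ride out throttling on a low-volume API.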

## Custom Lens vs. Data Platform Governance Alternatives

| Mechanism | Temporal Traceability | AWS-Native Integration | Maintenance Cost | Audit Value |
| --- | --- | --- | --- | --- |
| Custom Lens (WA Tool) | High (milestones by date) | High (EventBridge, native API) | Medium (requires ongoing curation) | High (structured evidence) |
| AWS Config + Conformance Packs | High (compliance history) | High (native) | Low (declarative rules) | Medium (technical, not narrative) |
| ADRs + Runbooks (Confluence/Git) | Low (no risk versioning) | None | Low (free text) | Low (unstructured) |
| Security Hub + Standards | High (historical findings) | High (native) | Low (managed by AWS) | High (security focused) |

> **My Practical Perspective:** In financial data platforms I've operated, Custom Lens proved useful not for the quality of the score itself, but for the **conversation it forces** between engineering and risk teams. The hardest lesson I learned: a lens without an executive sponsor dies at the second review. The engineering team fills in the answers, the improvement plan goes to the backlog, nobody prioritizes it, and at the third review the same HIGH risks appear again. The remediation pipeline automation I described here solves part of the problem — but the other part is political: the lens needs to be connected to a platform OKR with a named owner. Without that, it's expensive documentation.

## Verdict: Worth It, But Only with the Loop Closed

Custom Lens for data platforms is a mature and undervalued pattern. When implemented correctly — with workloads per domain, IAM-protected milestones, automated remediation pipeline, and executive sponsor — it delivers what no other AWS tool delivers: a structured, temporal, and auditable view of governance debt specific to your data domain. The implementation cost is low (the WA Tool is free; the automation pipeline costs pennies), but the maintenance cost is real and proportional to the quality of the questions you write. My recommendation: start with a 20-question lens covering the five domains I described, automate the improvement plan to the backlog in the first sprint, and review the lens every 6 months with the same rigor you apply to your architecture review. If you don't have the capacity for that, use the standard framework with additional notes — a poorly maintained lens is worse than no lens.

**Rating:** Recommended with conditions

## References

- [AWS Well-Architected Custom Lens — Official Documentation](https://docs.aws.amazon.com/wellarchitected/latest/userguide/lenses-custom.html)
- [AWS Well-Architected Tool API Reference](https://docs.aws.amazon.com/wellarchitected/latest/APIReference/Welcome.html)
- [AWS Glue Schema Registry — Developer Guide](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html)
- [AWS Lake Formation — Data Filtering and Cell-Level Security](https://docs.aws.amazon.com/lake-formation/latest/dg/data-filtering.html)
- [AWS Architecture Blog — Well-Architected for Data Analytics](https://aws.amazon.com/blogs/architecture/category/analytics/)
- [Data Mesh — Delivering Data-Driven Value at Scale (Zhamak Dehghani)](https://www.oreilly.com/library/view/data-mesh/9781492092384/)
- [AWS Step Functions — Express Workflows Pricing](https://aws.amazon.com/step-functions/pricing/)
- [AWS Config — Conformance Packs](https://docs.aws.amazon.com/config/latest/developerguide/conformance-packs.html)
