Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

Security & Resilience · Deep Dive

Scalable User Search with Amazon Cognito: A Deep-Dive Analysis

Jun 3, 2026 · 8 min · advanced · AI-assisted



fernando.moretes.com

Amazon Cognito excels at authentication, but its user-listing API was never designed for high-frequency search against large user pools. In this article, I analyze how to build a scalable search layer on top of Cognito, the failure modes that emerge when you ignore native API limits, and the real trade-offs between eventual consistency, data privacy, and operational cost.

Every architect who has run an Amazon Cognito User Pool in production with more than 100,000 users knows the exact moment the ListUsers API starts to disappoint: silent throttling, forced pagination, no full-text search, and zero sub-second latency guarantees. The problem is not a bug — it is an intentional design boundary. Cognito was built to be an identity control plane, not a search engine. When product teams demand user autocomplete, custom-attribute filtering, or real-time filtered listings, the right answer is not to force the native API beyond its limits: it is to design a dedicated search layer, asynchronously synchronized, with privacy and consistency as first-class citizens.

Why the Native Cognito API Does Not Scale for Search

Cognito's ListUsers accepts a Filter parameter with a proprietary syntax that supports only exact-match (=) and prefix (^=) comparisons against a fixed set of standard attributes: email, phone_number, cognito:user_status, and a handful of others. There is no fuzzy search, no relevance scoring, no search on custom attributes (custom:*), and no sorting by arbitrary fields. The default rate limit for ListUsers is 5 RPS per User Pool in most regions, with limited burst. In a financial system where dozens of microservices call this endpoint (onboarding, KYC, customer support, back-office), that ceiling is hit within seconds.
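To make the boundary concrete, here is a minimal sketch of a helper that builds a ListUsers Filter string. The filter grammar offers an exact-match operator (=) and a prefix operator (^=) on standard attributes only. The helper name build_filter and the choice to reject values containing double quotes are assumptions made for illustration, not part of any AWS SDK.

```python
# Standard attributes that ListUsers can filter on; custom:* attributes are not searchable.
SEARCHABLE = {
    "username", "email", "phone_number", "name", "given_name",
    "family_name", "preferred_username", "cognito:user_status", "status", "sub",
}


def build_filter(attribute: str, value: str, prefix: bool = False) -> str:
    """Return a ListUsers Filter string such as: email ^= "ana"."""
    if attribute not in SEARCHABLE:
        raise ValueError(f"{attribute!r} cannot be used in a ListUsers filter")
    if '"' in value:
        # Simplification for this sketch: refuse values that would need escaping.
        raise ValueError("double quotes are not supported by this helper")
    operator = "^=" if prefix else "="
    return f'{attribute} {operator} "{value}"'
```

Anything a product team typically asks for beyond this (substring match, typo tolerance, filtering on custom:tier) has no expression in the grammar, which is why the helper can only raise.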

Beyond throttling, there is a structural latency problem. Each ListUsers call with pagination (PaginationToken) is sequential; you cannot parallelize a scan of a 500,000-user pool. Full-pool iteration can take minutes, making any real-time use case infeasible.
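A back-of-the-envelope calculation shows why a full scan takes minutes. The sketch below assumes the maximum ListUsers page size of 60 users and a steady request rate with no throttling retries; the function name full_scan_estimate is illustrative.

```python
import math

PAGE_SIZE = 60  # maximum Limit accepted by ListUsers


def full_scan_estimate(total_users: int, requests_per_second: float) -> tuple[int, float]:
    """Return (pages, seconds) for a sequential scan of the whole pool."""
    pages = math.ceil(total_users / PAGE_SIZE)
    # Pages are fetched one after another, so the request rate bounds the duration.
    seconds = pages / requests_per_second
    return pages, seconds


pages, seconds = full_scan_estimate(500_000, 5)
minutes = round(seconds / 60, 1)
```

For the 500,000-user pool in the example, this gives 8,334 pages and roughly 28 minutes at 5 RPS, before accounting for any retries.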

The third failure vector is privacy. Returning user attributes directly from a search API — even an internal one — without field-level filtering exposes PII (name, national ID, date of birth) to consumers that may only need the sub (unique identifier). In environments governed by LGPD or PCI-DSS, this is a concrete compliance risk, not a theoretical one.

Scalable Search Architecture on Top of Cognito

Full flow: Cognito lifecycle events trigger async sync to OpenSearch, while the search API serves reads with IAM-controlled field projection.

Identity Plane (Amazon Cognito): Cognito User Pool with lifecycle triggers; Post-Confirmation and Post-Authentication Lambda.
Event Bus (Async Sync): EventBridge custom bus; SQS DLQ with max 3 retries; Sync Lambda performing an idempotent upsert.
Search Plane (OpenSearch): OpenSearch users-v2 index; KMS CMK for encryption at rest.
API Layer (Search Entrypoint): API Gateway (REST + Authorizer); Search Lambda applying field projection; WAF with rate limit and IP rules.
Observability: CloudWatch SLO dashboards.

How the Synchronization Pipeline Actually Works

The central mechanism is an event-driven asynchronous synchronization pipeline. Cognito exposes Lambda triggers for lifecycle events such as PostConfirmation, PostAuthentication, and PreTokenGeneration. It has no native trigger for deletion, so deletes must be captured by routing AdminDeleteUser through a custom Lambda that emits the event before making the SDK call. Each of these Lambdas publishes an event to EventBridge with a versioned schema, { "source": "identity.cognito", "detail-type": "user.created" }, containing only the user's sub and the non-sensitive attributes required for indexing.
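As an illustration of the publishing step, the sketch below turns a Cognito trigger event into an EventBridge PutEvents entry. The bus name identity-bus, the schema_version field, and the attribute allowlist are hypothetical; only the source and detail-type values come from the text above.

```python
import json

# Hypothetical allowlist of attributes considered safe to forward for indexing.
INDEXABLE_ATTRIBUTES = ("custom:display_name", "cognito:user_status")


def to_eventbridge_entry(trigger_event: dict, bus_name: str = "identity-bus") -> dict:
    """Build one PutEvents entry carrying only the sub and allowlisted attributes."""
    attrs = trigger_event["request"]["userAttributes"]
    detail = {
        "schema_version": "1",  # assumed versioning field
        "sub": attrs["sub"],
        "attributes": {k: attrs[k] for k in INDEXABLE_ATTRIBUTES if k in attrs},
    }
    return {
        "Source": "identity.cognito",
        "DetailType": "user.created",
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }
```

In a real trigger handler the entry would be passed to the EventBridge client's put_events call, and the handler would return the trigger event unchanged so that Cognito can continue the flow.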

The sync Lambda consumes these events and performs an idempotent upsert in OpenSearch using the sub as the document _id. Idempotency is not optional here: EventBridge guarantees at-least-once delivery, so duplicates will arrive, and a non-idempotent handler would re-index needlessly and risk version inconsistencies. The options are to use OpenSearch's _seq_no and _primary_term fields for optimistic concurrency control, or simply to accept that an upsert by _id is naturally idempotent for the search use case.
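The second option, relying on the document _id, can be sketched with an in-memory stand-in for the index. InMemoryIndex and handle_sync_event are hypothetical names; against a real domain the upsert would be an index request addressed to the document whose _id is the sub.

```python
class InMemoryIndex:
    """Stand-in for the users-v2 index: documents keyed by _id."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.writes = 0

    def upsert(self, doc_id: str, doc: dict) -> None:
        # Writing the same _id replaces the document instead of adding a new one.
        self.docs[doc_id] = dict(doc)
        self.writes += 1


def handle_sync_event(index: InMemoryIndex, detail: dict) -> None:
    """Apply one sync event; redelivery of the same event leaves the same state."""
    document = {"sub": detail["sub"], **detail["attributes"]}
    index.upsert(detail["sub"], document)
```

Delivering the same event twice costs an extra write but leaves exactly one document per user. This sketch does not protect against out-of-order events; that is the case the _seq_no and _primary_term approach addresses.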

For the initial load — migrating an existing pool of N users — the pattern is a backfill job that uses ListUsers with pagination, but executed exactly once, off the critical path, with explicit rate limiting (max 4 RPS to stay below the 5 RPS limit) and a DynamoDB checkpoint for resumption on failure. This job does not need to be fast; it needs to be correct and resumable.
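A minimal sketch of such a backfill loop follows. The list_page and index_users callables are injected so the example runs without AWS; in practice they would wrap ListUsers and the OpenSearch bulk upsert. A plain dict stands in for the DynamoDB checkpoint item, and the sketch assumes a saved pagination token is still accepted when the job resumes.

```python
import time
from typing import Callable, Optional

MAX_RPS = 4  # stay below the 5 RPS ListUsers limit

Page = tuple[list[dict], Optional[str]]


def run_backfill(
    list_page: Callable[[Optional[str]], Page],
    index_users: Callable[[list[dict]], None],
    checkpoint: dict,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Index every page once, resuming from checkpoint["token"] if present."""
    if checkpoint.get("done"):
        return 0
    token = checkpoint.get("token")
    indexed = 0
    while True:
        users, next_token = list_page(token)
        index_users(users)
        indexed += len(users)
        if next_token is None:
            checkpoint["done"] = True
            return indexed
        # Persist progress only after the page was indexed successfully.
        token = next_token
        checkpoint["token"] = token
        sleep(1 / MAX_RPS)
```

Because the checkpoint advances only after a page is indexed, a crash replays at most one page, which is harmless when the upsert is keyed by sub.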

Eventual Consistency Is Not a Weakness — It Is a Contract

The inconsistency window between Cognito and the OpenSearch index is typically 200ms to 2s under normal conditions, but can reach minutes if the sync Lambda is throttled or EventBridge is under load. For search use cases (autocomplete, back-office listing), this window is acceptable. For authorization use cases (verifying a user exists before issuing a token), Cognito is the source of truth — never the search index. Explicitly documenting this contract in the system's ADR prevents product teams from building incorrect dependencies on the search layer.

Privacy by Design: Field Projection and Data Minimization

The most common mistake I see in search implementations over identity data is indexing everything and filtering at presentation time. This violates LGPD's data minimization principle (Art. 6, III) and creates an index that, if compromised, exposes the full profile of every user. The correct approach is to index only the fields needed for search and return only the fields needed by the consumer.

In OpenSearch, this translates into two distinct design decisions. First, the index mapping must omit sensitive fields such as national ID, date of birth, and full phone number — these fields must not exist in the index. If an attribute is not in the mapping, it cannot be leaked. Second, response projection via _source filtering in the query DSL must be applied by the Search Lambda based on the caller's JWT token scope. A token with scope search:basic receives only { sub, display_name, email_prefix }; a token with scope search:admin receives additional fields such as account_status and created_at.
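The scope-to-fields projection could look like the following sketch, which builds the query DSL body the Search Lambda would send. The field lists mirror the two scopes described above; the function name build_search_body and the choice of a match query on display_name are assumptions.

```python
SCOPE_FIELDS = {
    "search:basic": ["sub", "display_name", "email_prefix"],
    "search:admin": ["sub", "display_name", "email_prefix", "account_status", "created_at"],
}


def build_search_body(scopes: list[str], text: str, size: int = 10) -> dict:
    """Return a query body whose _source filter depends on the caller's scopes."""
    allowed: list[str] = []
    for scope in scopes:
        for field in SCOPE_FIELDS.get(scope, []):
            if field not in allowed:
                allowed.append(field)
    if not allowed:
        # No recognised scope: refuse rather than fall back to returning everything.
        raise PermissionError("token carries no search scope")
    return {
        "size": size,
        "_source": allowed,
        "query": {"match": {"display_name": text}},
    }
```

Failing closed on an unknown scope matters: an empty _source filter would otherwise return the whole document.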

The sync Lambda must also apply hashing or truncation before indexing. For example, indexing email_domain (the part after @) instead of the full email enables corporate domain searches without exposing individual addresses. For name search, techniques like n-gram tokenization in OpenSearch enable autocomplete without storing the full name in plaintext — though this increases index size by 3-4x and must be evaluated against cluster storage cost.
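Two of these transformations are easy to sketch in the sync Lambda itself. The example below derives email_domain and shows what an edge n-gram expansion of a name looks like. Edge n-grams are one n-gram variant commonly used for autocomplete and are chosen here for illustration; in a real index the expansion would be done by an OpenSearch analyzer, not application code.

```python
def email_domain(email: str) -> str:
    """Return the lower-cased part after the last @, discarding the local part."""
    if "@" not in email:
        raise ValueError("not an email address")
    return email.rsplit("@", 1)[1].lower()


def edge_ngrams(term: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Return the prefixes of term between min_gram and max_gram characters."""
    term = term.lower()
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]
```

A single eight-letter name expands into seven indexed tokens, which gives a feel for why n-gram fields inflate index size.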

Real Numbers: Native Cognito vs. Dedicated Search Layer

ListUsers rate limit per User Pool: 5 RPS. Native ceiling; burst undocumented and inconsistent across regions.

p99 search latency via OpenSearch: <50ms. With a well-mapped index on an r6g.large.search instance under 500 RPS load.

Base OpenSearch cluster cost (1x r6g.large node): ~$180/month. For pools up to 2M users with an optimized index; scales horizontally to 3 nodes for HA.

Failure Modes Nobody Documents

Silent drift between Cognito and OpenSearch is the most insidious failure mode. If the sync Lambda fails silently — for example, an IAM permission error after a role rotation — the index ages without a visible alarm. The mitigation is a periodic reconciliation job (daily or weekly) that compares counts and random samples between Cognito and OpenSearch, publishing a search.index.drift_count metric to CloudWatch with an alarm on any value > 0 for more than 1 hour.
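The comparison at the heart of the reconciliation job can be sketched as a set difference over sampled subs. The metric name comes from the text above; the function names are hypothetical, and the datum follows the MetricName/Value/Unit shape that CloudWatch PutMetricData expects.

```python
def drift_count(cognito_subs: set[str], indexed_subs: set[str]) -> int:
    """Count users present on only one side of the comparison."""
    missing = cognito_subs - indexed_subs    # in Cognito but never indexed
    orphaned = indexed_subs - cognito_subs   # deleted in Cognito, still searchable
    return len(missing) + len(orphaned)


def drift_metric(count: int) -> dict:
    """Build one metric datum for the drift alarm."""
    return {"MetricName": "search.index.drift_count", "Value": count, "Unit": "Count"}
```

Counting orphans as well as missing users matters here: an orphan is a deleted user who still shows up in search results.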

Index explosion from dynamic custom attributes is another vector. Cognito allows up to 50 custom attributes per User Pool. If the sync Lambda indexes all of them without an explicit mapping, OpenSearch will use dynamic mapping and create fields for each variation, leading to a mapping explosion that can bring down the cluster. The solution is to always define an explicit mapping with dynamic: false and an allowlist of indexable fields.
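A sketch of the explicit mapping and the matching allowlist filter follows, expressed as a Python dict that would be sent as the index-creation body. The field names and types are illustrative assumptions based on the fields mentioned in this article.

```python
USERS_MAPPING = {
    "mappings": {
        "dynamic": False,  # unmapped fields are not added to the mapping
        "properties": {
            "sub": {"type": "keyword"},
            "display_name": {"type": "text"},
            "email_domain": {"type": "keyword"},
            "account_status": {"type": "keyword"},
            "created_at": {"type": "date"},
        },
    }
}


def to_indexable(document: dict) -> dict:
    """Drop every attribute that is not declared in the explicit mapping."""
    allowed = USERS_MAPPING["mappings"]["properties"]
    return {key: value for key, value in document.items() if key in allowed}
```

The filter is not redundant with dynamic: false. That setting stops new fields from being mapped and searched, but unmapped values are still stored in _source, so stripping them before the write is what keeps them out of the index entirely.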

Cascading throttling during backfill is an operational risk. If the initial backfill and the real-time sync pipeline compete for the same Lambda worker pool, the backfill can exhaust reserved concurrency and delay real lifecycle events. The solution is to run the backfill in a separate Lambda function with isolated reserved concurrency and a dedicated SQS queue with a conservative batch size (10-20 messages).

PII leakage via query logging is frequently overlooked. OpenSearch can log full queries to CloudWatch Logs, including search terms that may contain names or email fragments. In regulated environments, slow logs and audit logs must be configured with field masking or disabled for sensitive fields.

Anti-Patterns That Are Expensive in Production

Calling ListUsers in a loop on the critical path of a product API — unpredictable latency, guaranteed throttling under load, and no latency SLA.
Using the search index as the source of truth for authorization decisions — the index is eventually consistent; a deleted user may appear as active for seconds or minutes.
Indexing the full Cognito user object (including custom:* attributes) without explicit mapping — mapping explosion, increased storage cost, and expanded PII exposure surface.
Synchronizing via periodic Cognito polling instead of lifecycle triggers — inconsistency window proportional to polling interval, unnecessary ListUsers cost, and no per-event granularity.
Exposing the OpenSearch endpoint directly via API Gateway without a projection Lambda — impossible to apply token-scope-based field-level security without an intermediate logic layer.
Omitting the periodic reconciliation job — silent drift between Cognito and the index becomes invisible until a customer support incident reveals it.

Search Backend Options: OpenSearch vs. DynamoDB vs. RDS

Criterion	OpenSearch	DynamoDB + GSI	RDS Aurora (ILIKE)
Full-text / Fuzzy	Native (BM25, n-gram)	Not supported	Limited (ILIKE, pg_trgm)
p99 Latency @ 500 RPS	< 50ms	< 10ms (exact key)	50-200ms (without optimized index)
Cost for 2M users	~$180-360/mo (1-3 nodes)	~$30-80/mo (WCU/RCU + storage)	~$200-500/mo (instance + storage)
Native field-level security	Yes (OpenSearch Security plugin)	No (requires app-level logic)	No (requires app-level logic)
Operational complexity	High (cluster, snapshots, upgrades)	Low (serverless)	Medium (RDS managed, but schema migrations)

Security and IAM: Zero Trust in the Search Layer

In a financial environment, the user search layer is a high-value target: any data leak here can result in regulatory fines and reputational damage. The security model must follow the principle of least privilege at every hop.

The Search Lambda must have an IAM role with permission only for es:ESHttpGet and es:ESHttpPost on the specific OpenSearch domain ARN, with an aws:SourceVpc condition to ensure calls only occur within the VPC. OpenSearch access must be configured with Fine-Grained Access Control enabled, mapping the Lambda IAM role to an OpenSearch role with read-only permissions on the users-v2 index.

The Sync Lambda needs es:ESHttpPut and es:ESHttpDelete on the same domain, but must use a separate role — never the same role as the Search Lambda. Separating read and write roles limits the blast radius if one of the functions is compromised.
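The read/write split can be sketched as two identity policies generated from the same helper. The account ID, region, and domain name in the ARN are placeholders, and the aws:SourceVpc condition mentioned above is left out of this sketch to keep it short.

```python
def domain_policy(domain_arn: str, actions: list[str]) -> dict:
    """Return an IAM policy allowing only the given HTTP actions on one domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": f"{domain_arn}/*",
        }],
    }


# Placeholder ARN for illustration only.
DOMAIN_ARN = "arn:aws:es:sa-east-1:111122223333:domain/user-search"

SEARCH_POLICY = domain_policy(DOMAIN_ARN, ["es:ESHttpGet", "es:ESHttpPost"])
SYNC_POLICY = domain_policy(DOMAIN_ARN, ["es:ESHttpPut", "es:ESHttpDelete"])
```

Keeping the two action lists disjoint is what limits the blast radius: a compromised Search Lambda holds no permission to write or delete documents.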

The OpenSearch domain must be deployed inside a private VPC, with no public endpoint, and Security Groups restricting access only to the relevant Lambdas. The KMS CMK for encryption at rest must have a key policy that allows only the Lambda roles and explicitly named administrators, with no kms:* grant for Principal: "*".
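A small check like the one below could run in CI to catch an overly permissive key policy before it ships. It is a sketch under simplifying assumptions: the function name is hypothetical, and it flags any Allow statement with a wildcard principal without evaluating Condition blocks that might narrow it.

```python
def wildcard_statements(policy: dict) -> list[dict]:
    """Return the Allow statements whose principal is the wildcard."""
    flagged = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        # Principal is either the bare string "*" or a dict such as {"AWS": ...}.
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        principals = aws if isinstance(aws, list) else [aws]
        if "*" in principals:
            flagged.append(statement)
    return flagged
```

An empty result is necessary but not sufficient; the policy still has to be reviewed for roles that are named explicitly but should not be there.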

For auditing, CloudTrail must be enabled with data events for the OpenSearch domain, and the API Gateway Access Log must be sent to an S3 bucket with Object Lock enabled (COMPLIANCE mode, 90-day retention for PCI-DSS) to guarantee immutability of access logs.

AWS Well-Architected Pillars Assessment

Security

Field-level security via OpenSearch Fine-Grained Access Control; separate IAM roles for read and write; KMS CMK with restrictive key policy; VPC-only endpoint; CloudTrail with data events; API Gateway with WAF and per-IP and per-token rate limiting.

Reliability

EventBridge with DLQ for unprocessed sync events; periodic reconciliation job for drift detection; backfill with DynamoDB checkpoint for resumption; OpenSearch with 3 nodes across multiple AZs for HA; CloudWatch alarms on search.index.drift_count and p99 latency.

Performance efficiency

OpenSearch index with explicit mapping and n-gram tokenizer for autocomplete; Search Lambda with Provisioned Concurrency to eliminate cold starts during peak hours; API Gateway response cache for frequent queries (30s TTL for back-office listings).

Cost optimization

OpenSearch Serverless as an alternative for intermittent workloads (cost per OCU-hour vs. reserved instance); Lambda on ARM (Graviton2) reduces compute cost by ~20%; S3 Intelligent-Tiering for OpenSearch snapshots; monthly review of unused indexes.

Architect's Note: What I Would Do Differently

Senior Solutions Architect

In production, the mistake that cost me the most was not documenting the eventual consistency contract from day zero — product teams built business logic assuming the search index was synchronous with Cognito, and the result was subtle bugs in onboarding flows. Today, the first thing I do is write the ADR with the acceptable maximum lag SLO (e.g., 99.9% of events indexed in < 60s) and expose a search.sync.lag_seconds metric on the product dashboard, not just the infra dashboard. The second lesson: never use OpenSearch Serverless for a user pool with more than 500k records without first modeling OCU cost — the per-OCU-hour billing can surprise you under high-frequency query workloads. For most financial cases I have seen, a 3-node r6g.large.search cluster with 1-year Reserved Instances is more predictable and 40-60% cheaper than Serverless at steady state.
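The lag SLO in that example reduces to a simple ratio, sketched below. The function name lag_slo_met is illustrative, and the lag samples are assumed to be per-event measurements in seconds, for instance the difference between event time and index time.

```python
def lag_slo_met(lags_seconds: list[float], threshold: float = 60.0, target: float = 0.999) -> bool:
    """Return True when the share of events indexed under threshold meets target."""
    if not lags_seconds:
        return True  # no events in the window, nothing violated
    within = sum(1 for lag in lags_seconds if lag < threshold)
    return within / len(lags_seconds) >= target
```

Evaluated over a rolling window, the same ratio can back both the ADR's stated objective and the alarm on search.sync.lag_seconds.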

Verdict: Build the Search Layer, Don't Work Around Cognito

Strongly Recommended with Caveats

Amazon Cognito is a solid choice for identity management in financial systems — but its listing API is not a search engine and never will be. Trying to work around its limits with aggressive polling, short-lived caching, or parallel queries is a race against throttling that you will lose at scale. The correct architecture is to accept Cognito as the identity control plane and build a dedicated search layer — based on OpenSearch, synchronized via lifecycle events, with privacy by design and eventual consistency documented as an explicit contract. The incremental cost (an OpenSearch cluster at ~$180-360/month) is trivial compared to the cost of a throttling incident in production or a privacy audit that finds unnecessary PII indexed. Invest in the correct synchronization pipeline, document the lag SLO, and treat the search index for what it is: an optimized read projection, not a source of truth.

References and Further Reading

Amazon Cognito: ListUsers API Reference
Amazon OpenSearch Service: Fine-Grained Access Control
Amazon Cognito: Lambda Trigger Overview
Amazon EventBridge: Event Delivery and Retries
AWS Well-Architected Framework: Security Pillar
OpenSearch: Index Mapping and Dynamic Templates
LGPD: Lei Geral de Proteção de Dados Pessoais (Art. 6º)
AWS Architecture Blog: Scalable user search with Amazon Cognito


Analyzed source: Scalable user search with Amazon Cognito
