Scalable User Search with Amazon Cognito: A Deep-Dive Analysis
Amazon Cognito excels at authentication, but its user-listing API was never designed for high-frequency search against large user pools. In this article, I analyze how to build a scalable search layer on top of Cognito, the failure modes that emerge when you ignore native API limits, and the real trade-offs between eventual consistency, data privacy, and operational cost.
Every architect who has run an Amazon Cognito User Pool in production with more than 100,000 users knows the exact moment the ListUsers API starts to disappoint: silent throttling, forced pagination, no full-text search, and zero sub-second latency guarantees. The problem is not a bug — it is an intentional design boundary. Cognito was built to be an identity control plane, not a search engine. When product teams demand user autocomplete, custom-attribute filtering, or real-time filtered listings, the right answer is not to force the native API beyond its limits: it is to design a dedicated search layer, asynchronously synchronized, with privacy and consistency as first-class citizens.
Why the Native Cognito API Does Not Scale for Search
Cognito's ListUsers accepts a Filter parameter with proprietary syntax and supports only exact (=) and prefix (^=) matching on a fixed set of standard attributes (email, phone_number, cognito:user_status, and a handful of others). There is no fuzzy search, relevance scoring, custom-attribute search (custom:*), or sorting by arbitrary fields, and each page returns at most 60 users. The default rate limit for ListUsers is 5 RPS per User Pool in most regions, with limited burst; confirm the current value for your account in Service Quotas, since limits can differ by account and change over time. In a financial system with dozens of microservices calling this endpoint (onboarding, KYC, customer support, back-office), that ceiling is hit within seconds.
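To make the filter limitation concrete, here is a minimal sketch of a helper that builds a ListUsers Filter string. The function name and the allowlist constant are hypothetical; the syntax it emits (attribute, operator, quoted value, with = for exact and ^= for prefix) follows the documented filter format. It only builds the string and makes no AWS call.

```python
# Standard attributes that ListUsers can filter on; custom:* is not searchable.
SEARCHABLE_ATTRIBUTES = {
    "username", "email", "phone_number", "name", "given_name",
    "family_name", "preferred_username", "cognito:user_status", "status", "sub",
}


def build_list_users_filter(attribute: str, value: str, prefix: bool = False) -> str:
    """Return a Filter string such as: email ^= "fer" (hypothetical helper)."""
    if attribute not in SEARCHABLE_ATTRIBUTES:
        raise ValueError(f"{attribute!r} is not searchable with ListUsers")
    # Quotation marks inside the value must be escaped with a backslash.
    escaped = value.replace('"', '\\"')
    operator = "^=" if prefix else "="
    return f'{attribute} {operator} "{escaped}"'
```

Anything beyond these two operators (substring, fuzzy, or custom-attribute matching) cannot be expressed here, which is the gap the dedicated search layer fills.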
Beyond throttling, there is a structural latency problem. Each ListUsers call with pagination (PaginationToken) is sequential; you cannot parallelize a scan of a 500,000-user pool. Full-pool iteration can take minutes, making any real-time use case infeasible.
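A back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes the ListUsers maximum page size of 60 users and the 5 RPS figure quoted above; the function name is hypothetical, and the result is a lower bound that ignores network latency and retries.

```python
import math


def full_scan_seconds(total_users: int, page_size: int = 60,
                      calls_per_second: float = 5.0) -> float:
    """Lower bound on the time to page sequentially through a whole pool."""
    calls = math.ceil(total_users / page_size)  # one call per page
    return calls / calls_per_second
```

For a 500,000-user pool this gives 8,334 sequential calls, or roughly 28 minutes at a sustained 5 RPS, with every other ListUsers consumer starved for the duration.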
The third failure vector is privacy. Returning user attributes directly from a search API — even an internal one — without field-level filtering exposes PII (name, national ID, date of birth) to consumers that may only need the sub (unique identifier). In environments governed by LGPD or PCI-DSS, this is a concrete compliance risk, not a theoretical one.
Scalable Search Architecture on Top of Cognito
Full flow: Cognito lifecycle events trigger async sync to OpenSearch, while the search API serves reads with IAM-controlled field projection.
- Cognito User Pool · Lifecycle triggers
- Post-Confirmation · & Post-Auth Lambda
- EventBridge · Custom Bus
- SQS DLQ · max 3 retries
- Sync Lambda · Idempotent upsert
- OpenSearch · users-v2 index
- KMS CMK · Encryption at rest
- API Gateway · REST + Authorizer
- Search Lambda · Field projection
- WAF · Rate limit + IP rules
- CloudWatch · SLO dashboards
How the Synchronization Pipeline Actually Works
The central mechanism is an event-driven asynchronous synchronization pipeline. Cognito exposes Lambda triggers for lifecycle events such as PostConfirmation, PostAuthentication, and PreTokenGeneration. Deletion has no native trigger, so AdminDeleteUser calls must be routed through a custom Lambda that publishes the deletion event before making the SDK call. Each of these functions publishes an event to EventBridge with a versioned schema ({ "source": "identity.cognito", "detail-type": "user.created" }) containing only the user's sub and the non-sensitive attributes required for indexing.
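As a minimal sketch of the publishing step, the function below turns a PostConfirmation trigger event into one EventBridge PutEvents entry. The bus name, the schema_version field, and the attribute allowlist are assumptions for illustration; only the source and detail-type values come from the schema described above.

```python
import json

# Attributes forwarded to the bus. This allowlist is an assumption for the
# sketch; anything not listed here never leaves the trigger.
FORWARDED_ATTRIBUTES = ("preferred_username", "locale")


def to_eventbridge_entry(cognito_event: dict, bus_name: str = "identity-bus") -> dict:
    """Build one PutEvents entry from a PostConfirmation trigger event."""
    attrs = cognito_event["request"]["userAttributes"]
    detail = {
        "schema_version": 1,  # hypothetical versioning field
        "sub": attrs["sub"],
        "attributes": {k: attrs[k] for k in FORWARDED_ATTRIBUTES if k in attrs},
    }
    return {
        "Source": "identity.cognito",
        "DetailType": "user.created",
        "Detail": json.dumps(detail),  # PutEvents expects Detail as a JSON string
        "EventBusName": bus_name,
    }
```

In the trigger handler, the entry would be passed to the EventBridge client's put_events(Entries=[entry]) call, and the handler must return the original event object to Cognito so the sign-up flow can continue.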
The sync Lambda consumes these events and performs an idempotent upsert in OpenSearch using the sub as the document _id. Idempotency is not optional here: EventBridge guarantees at-least-once delivery, and a duplicate event without idempotency results in unnecessary re-indexing with version inconsistency risk. The solution is to use OpenSearch's _seq_no and _primary_term fields for optimistic concurrency control, or simply accept that an upsert by _id is naturally idempotent for the search use case.
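Both options can be sketched as a single request builder. The helper name is hypothetical, and it only builds the HTTP method, path, and body instead of calling a cluster; the if_seq_no and if_primary_term query parameters are the ones OpenSearch uses for optimistic concurrency control.

```python
import json
from urllib.parse import quote, urlencode


def build_upsert_request(index, doc, seq_no=None, primary_term=None):
    """Return (method, path, body) for indexing a document under _id = sub."""
    path = f"/{index}/_doc/{quote(doc['sub'], safe='')}"
    if seq_no is not None and primary_term is not None:
        # Conditional write: rejected if the stored document changed meanwhile.
        path += "?" + urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    # Sorted keys keep the body byte-identical for identical events.
    return "PUT", path, json.dumps(doc, sort_keys=True)
```

Because the path is derived from the sub, replaying the same event produces the same request and overwrites the same document, so a duplicate delivery leaves the index content unchanged.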
For the initial load — migrating an existing pool of N users — the pattern is a backfill job that uses ListUsers with pagination, but executed exactly once, off the critical path, with explicit rate limiting (max 4 RPS to stay below the 5 RPS limit) and a DynamoDB checkpoint for resumption on failure. This job does not need to be fast; it needs to be correct and resumable.
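The backfill loop can be sketched as follows. Everything here is an illustration under stated assumptions: list_page is a hypothetical wrapper around ListUsers that returns (users, next_token), index_users is whatever enqueues or upserts a page, and the checkpoint is a plain dict standing in for a DynamoDB item. The sketch also assumes the saved pagination token is still accepted when the job resumes.

```python
import time


def run_backfill(list_page, index_users, checkpoint, max_rps=4.0, sleep=time.sleep):
    """Page through the pool once, resuming from the saved token on restart.

    Returns the number of pages processed in this run.
    """
    if checkpoint.get("done"):
        return 0  # an earlier run already finished
    token = checkpoint.get("pagination_token")
    pages = 0
    while True:
        users, next_token = list_page(token)
        index_users(users)  # may re-run for one page after a crash; upserts are idempotent
        pages += 1
        if next_token is None:
            checkpoint["done"] = True
            checkpoint.pop("pagination_token", None)
            return pages
        # Persist progress only after the page was indexed successfully.
        checkpoint["pagination_token"] = next_token
        token = next_token
        sleep(1.0 / max_rps)  # stay under the ListUsers rate limit
```

Writing the checkpoint after indexing means a crash can replay at most one page, which is harmless as long as the downstream write is an upsert by sub.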
Eventual Consistency Is Not a Weakness — It Is a Contract
The inconsistency window between Cognito and the OpenSearch index is typically 200ms to 2s under normal conditions, but can reach minutes if the sync Lambda is throttled or EventBridge is under load. For search use cases (autocomplete, back-office listing), this window is acceptable. For authorization use cases (verifying a user exists before issuing a token), Cognito is the source of truth — never the search index. Explicitly documenting this contract in the system's ADR prevents product teams from building incorrect dependencies on the search layer.
Privacy by Design: Field Projection and Data Minimization
The most common mistake I see in search implementations over identity data is indexing everything and filtering at presentation time. This violates LGPD's data minimization principle (Art. 6, III) and creates an index that, if compromised, exposes the full profile of every user. The correct approach is to index only the fields needed for search and return only the fields needed by the consumer.
In OpenSearch, this translates into two distinct design decisions. First, the index mapping must omit sensitive fields such as national ID, date of birth, and full phone number — these fields must not exist in the index. If an attribute is not in the mapping, it cannot be leaked. Second, response projection via _source filtering in the query DSL must be applied by the Search Lambda based on the caller's JWT token scope. A token with scope search:basic receives only { sub, display_name, email_prefix }; a token with scope search:admin receives additional fields such as account_status and created_at.
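A minimal sketch of scope-based projection in the Search Lambda might look like this. The scope names and field lists are the ones described above; the function name and the choice of a match query on display_name are assumptions for illustration.

```python
# Fields each token scope is allowed to receive.
SCOPE_FIELDS = {
    "search:basic": ["sub", "display_name", "email_prefix"],
    "search:admin": ["sub", "display_name", "email_prefix",
                     "account_status", "created_at"],
}


def build_search_body(term, scopes, size=10):
    """Build a query DSL body whose _source is limited by the caller's scopes."""
    allowed = []
    for scope in scopes:
        for field in SCOPE_FIELDS.get(scope, []):  # unknown scopes grant nothing
            if field not in allowed:
                allowed.append(field)
    if not allowed:
        raise PermissionError("token carries no search scope")
    return {
        "size": size,
        "_source": allowed,  # OpenSearch returns only these fields
        "query": {"match": {"display_name": term}},
    }
```

The projection is decided server-side from the validated token, so a caller cannot widen the field list by editing the request.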
The sync Lambda must also apply hashing or truncation before indexing. For example, indexing email_domain (the part after @) instead of the full email enables corporate domain searches without exposing individual addresses. For name search, techniques like n-gram tokenization in OpenSearch enable autocomplete without storing the full name in plaintext — though this increases index size by 3-4x and must be evaluated against cluster storage cost.
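The truncation step can be sketched as a small transform applied before indexing. The function name and the mapping of preferred_username to display_name are assumptions for this example; the email_domain derivation is the technique described above.

```python
def to_index_document(attrs: dict) -> dict:
    """Reduce raw user attributes to the minimal searchable document."""
    doc = {"sub": attrs["sub"]}
    email = attrs.get("email", "")
    if "@" in email:
        # Keep only the part after the last @, normalized to lowercase.
        doc["email_domain"] = email.rsplit("@", 1)[1].lower()
    if "preferred_username" in attrs:
        doc["display_name"] = attrs["preferred_username"]
    # Everything else (birthdate, phone_number, national ID) is dropped here.
    return doc
```

Because the document is built from scratch instead of by deleting fields from the input, a new sensitive attribute added to the pool later is excluded by default.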
Real Numbers: Native Cognito vs. Dedicated Search Layer
Failure Modes Nobody Documents
Silent drift between Cognito and OpenSearch is the most insidious failure mode. If the sync Lambda fails silently — for example, an IAM permission error after a role rotation — the index ages without a visible alarm. The mitigation is a periodic reconciliation job (daily or weekly) that compares counts and random samples between Cognito and OpenSearch, publishing a search.index.drift_count metric to CloudWatch with an alarm on any value > 0 for more than 1 hour.
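The comparison at the heart of the reconciliation job can be sketched as a set difference over user identifiers. The function names are hypothetical, and the inputs would come from a sampled or full ListUsers pass and an OpenSearch scroll; the metric name is the one used above.

```python
def drift_report(cognito_subs, indexed_subs) -> dict:
    """Count users missing from the index and stale documents left in it."""
    cognito, indexed = set(cognito_subs), set(indexed_subs)
    missing = cognito - indexed  # in Cognito, never indexed
    stale = indexed - cognito    # still indexed after deletion in Cognito
    return {
        "missing": len(missing),
        "stale": len(stale),
        "drift_count": len(missing) + len(stale),
    }


def to_metric_datum(report: dict) -> dict:
    """Shape the result as one CloudWatch MetricData entry."""
    return {
        "MetricName": "search.index.drift_count",
        "Value": report["drift_count"],
        "Unit": "Count",
    }
```

Counting both directions matters: comparing totals alone would miss the case where one missing user and one stale document cancel each other out.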
Index explosion from dynamic custom attributes is another vector. Cognito allows up to 50 custom attributes per User Pool. If the sync Lambda indexes all of them without an explicit mapping, OpenSearch will use dynamic mapping and create fields for each variation, leading to a mapping explosion that can bring down the cluster. The solution is to always define an explicit mapping with dynamic: false and an allowlist of indexable fields.
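A minimal sketch of such a mapping, expressed as the Python dict a provisioning script would send when creating the index. The field names and types are assumptions based on the fields mentioned in this article, and the allowlisted helper is hypothetical.

```python
import json

USERS_V2_INDEX = {
    "mappings": {
        "dynamic": False,  # unmapped fields are not indexed and create no new mappings
        "properties": {
            "sub": {"type": "keyword"},
            "display_name": {"type": "text"},
            "email_domain": {"type": "keyword"},
            "account_status": {"type": "keyword"},
            "created_at": {"type": "date"},
        },
    }
}


def allowlisted(doc: dict) -> dict:
    """Drop any field that is not part of the explicit mapping."""
    fields = USERS_V2_INDEX["mappings"]["properties"]
    return {k: v for k, v in doc.items() if k in fields}


create_index_body = json.dumps(USERS_V2_INDEX)  # body for PUT /users-v2
```

The allowlist filter is still needed alongside dynamic: false, because unmapped fields are kept in the stored _source even though they are not searchable.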
Cascading throttling during backfill is an operational risk. If the initial backfill and the real-time sync pipeline compete for the same Lambda worker pool, the backfill can exhaust reserved concurrency and delay real lifecycle events. The solution is to run the backfill in a separate Lambda function with isolated reserved concurrency and a dedicated SQS queue with a conservative batch size (10-20 messages).
PII leakage via query logging is frequently overlooked. OpenSearch can log full queries to CloudWatch Logs, including search terms that may contain names or email fragments. In regulated environments, slow logs and audit logs must be configured with field masking or disabled for sensitive fields.
Anti-Patterns That Are Expensive in Production
- Calling ListUsers in a loop on the critical path of a product API: unpredictable latency, guaranteed throttling under load, and no latency SLA.
- Using the search index as the source of truth for authorization decisions: the index is eventually consistent; a deleted user may appear as active for seconds or minutes.
- Indexing the full Cognito user object (including custom:* attributes) without explicit mapping: mapping explosion, increased storage cost, and expanded PII exposure surface.
- Synchronizing via periodic Cognito polling instead of lifecycle triggers: inconsistency window proportional to polling interval, unnecessary ListUsers cost, and no per-event granularity.
- Exposing the OpenSearch endpoint directly via API Gateway without a projection Lambda: impossible to apply token-scope-based field-level security without an intermediate logic layer.
- Omitting the periodic reconciliation job: silent drift between Cognito and the index becomes invisible until a customer support incident reveals it.
Search Backend Options: OpenSearch vs. DynamoDB vs. RDS
| Criterion | OpenSearch | DynamoDB + GSI | RDS Aurora (ILIKE) |
|---|---|---|---|
| Full-text / Fuzzy | Native (BM25, n-gram) | Not supported | Limited (ILIKE, pg_trgm) |
| p99 Latency @ 500 RPS | < 50ms | < 10ms (exact key) | 50-200ms (without optimized index) |
| Cost for 2M users | ~$180-360/mo (1-3 nodes) | ~$30-80/mo (WCU/RCU + storage) | ~$200-500/mo (instance + storage) |
| Native field-level security | Yes (OpenSearch Security plugin) | No (requires app-level logic) | No (requires app-level logic) |
| Operational complexity | High (cluster, snapshots, upgrades) | Low (serverless) | Medium (RDS managed, but schema migrations) |
Security and IAM: Zero Trust in the Search Layer
In a financial environment, the user search layer is a high-value target: any data leak here can result in regulatory fines and reputational damage. The security model must follow the principle of least privilege at every hop.
The Search Lambda must have an IAM role with permission only for es:ESHttpGet and es:ESHttpPost on the specific OpenSearch domain ARN, with an aws:SourceVpc condition to ensure calls only occur within the VPC. OpenSearch access must be configured with Fine-Grained Access Control enabled, mapping the Lambda IAM role to an OpenSearch role with read-only permissions on the users-v2 index.
The Sync Lambda needs es:ESHttpPut and es:ESHttpDelete on the same domain, but must use a separate role — never the same role as the Search Lambda. Separating read and write roles limits the blast radius if one of the functions is compromised.
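The read/write split can be sketched as two policy documents built from one helper. The domain ARN, account ID, and VPC ID are placeholders, and the helper name is hypothetical; the actions and the aws:SourceVpc condition are the ones named above. Whether the VPC condition key is populated for your network path should be verified in your own environment.

```python
def domain_policy(actions, domain_arn, vpc_id):
    """Build a least-privilege IAM policy for one Lambda role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": f"{domain_arn}/*",
            "Condition": {"StringEquals": {"aws:SourceVpc": vpc_id}},
        }],
    }


# Placeholder identifiers, not real resources.
DOMAIN_ARN = "arn:aws:es:us-east-1:123456789012:domain/users-search"
VPC_ID = "vpc-0123456789abcdef0"

SEARCH_ROLE_POLICY = domain_policy(["es:ESHttpGet", "es:ESHttpPost"], DOMAIN_ARN, VPC_ID)
SYNC_ROLE_POLICY = domain_policy(["es:ESHttpPut", "es:ESHttpDelete"], DOMAIN_ARN, VPC_ID)
```

With no shared action between the two documents, a compromised Search Lambda cannot modify or delete documents, and a compromised Sync Lambda cannot run queries through its own role.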
The OpenSearch domain must be deployed inside a private VPC, with no public endpoint, and Security Groups restricting access only to the relevant Lambdas. The KMS CMK for encryption at rest must have a key policy that allows only the Lambda roles and explicitly named administrators: no kms:* for Principal: "*".
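For auditing, CloudTrail must be enabled with data events for the OpenSearch domain, and the API Gateway access log must be sent to an S3 bucket with Object Lock enabled in COMPLIANCE mode to guarantee immutability of access logs. Size the retention period to your compliance regime: PCI-DSS requires at least 12 months of audit log history, with the most recent three months immediately available, so a 90-day lock on its own is not sufficient.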
AWS Well-Architected Pillars Assessment
Security
Field-level security via OpenSearch Fine-Grained Access Control; separate IAM roles for read and write; KMS CMK with restrictive key policy; VPC-only endpoint; CloudTrail with data events; API Gateway with WAF and per-IP and per-token rate limiting.
Reliability
EventBridge with DLQ for unprocessed sync events; periodic reconciliation job for drift detection; backfill with DynamoDB checkpoint for resumption; OpenSearch with 3 nodes across multiple AZs for HA; CloudWatch alarms on search.index.drift_count and p99 latency.
Performance efficiency
OpenSearch index with explicit mapping and n-gram tokenizer for autocomplete; Search Lambda with Provisioned Concurrency to eliminate cold starts during peak hours; API Gateway response cache for frequent queries (30s TTL for back-office listings).
Cost optimization
OpenSearch Serverless as an alternative for intermittent workloads (cost per OCU-hour vs. reserved instance); Lambda on ARM (Graviton2) reduces compute cost by ~20%; S3 Intelligent-Tiering for OpenSearch snapshots; monthly review of unused indexes.
In production, the mistake that cost me the most was not documenting the eventual consistency contract from day zero — product teams built business logic assuming the search index was synchronous with Cognito, and the result was subtle bugs in onboarding flows. Today, the first thing I do is write the ADR with the acceptable maximum lag SLO (e.g., 99.9% of events indexed in < 60s) and expose a search.sync.lag_seconds metric on the product dashboard, not just the infra dashboard. The second lesson: never use OpenSearch Serverless for a user pool with more than 500k records without first modeling OCU cost — the per-OCU-hour billing can surprise you under high-frequency query workloads. For most financial cases I have seen, a 3-node r6g.large.search cluster with 1-year Reserved Instances is more predictable and 40-60% cheaper than Serverless at steady state.
Verdict: Build the Search Layer, Don't Work Around Cognito
Amazon Cognito is a solid choice for identity management in financial systems — but its listing API is not a search engine and never will be. Trying to work around its limits with aggressive polling, short-lived caching, or parallel queries is a race against throttling that you will lose at scale. The correct architecture is to accept Cognito as the identity control plane and build a dedicated search layer — based on OpenSearch, synchronized via lifecycle events, with privacy by design and eventual consistency documented as an explicit contract. The incremental cost (an OpenSearch cluster at ~$180-360/month) is trivial compared to the cost of a throttling incident in production or a privacy audit that finds unnecessary PII indexed. Invest in the correct synchronization pipeline, document the lag SLO, and treat the search index for what it is: an optimized read projection, not a source of truth.