Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

Financial Systems · Deep Dive

Lambda Response Streaming for Real-Time Pricing Engines

Jun 16, 2026 · 9 min · expert · AI-assisted


~18ms

Median TTFB (first chunk)

Lambda arm64 1769MB, ElastiCache Valkey sub-ms, no cold start (provisioned concurrency)

20 MB

Payload limit per invocation

Versus 6 MB for a synchronous Lambda invocation: 3.3x more room for instrument bundles

$0.0000133334

Cost per GB-second (arm64, us-east-1 list price)

20% cheaper per GB-second than x86 ($0.0000166667); discounted further with Compute Savings Plans

fernando.moretes.com

Lambda response streaming fundamentally changes the serverless execution contract by allowing bytes to be flushed to the client before the function completes, with deep implications for real-time pricing engines. In this analysis I dissect the internal mechanism, the failure modes the documentation underemphasizes, and the architecture decisions that separate a prototype from a production financial system.

Real-time pricing engines are among the most demanding workloads in financial systems: end-to-end latency under 200 ms, market data consistency, auditability of every quote generated, and zero tolerance for partial responses without explicit signaling. Lambda response streaming — available since 2023 via InvokeWithResponseStream and native Function URL integration — rewrites the serverless execution contract in ways that open genuine possibilities for this domain, but also introduce subtle failure modes that can be costly in financial production. This analysis goes beyond the tutorial: I examine the chunked transfer mechanism inside the Lambda runtime, throughput and payload limits, idempotency traps, and how to build a pricing pipeline that is simultaneously observable, secure, and economically rational.

The Old Contract and Why It Fails for Pricing

The classic Lambda invocation model is synchronous and atomic: the function executes, accumulates the entire response in memory, and only then does the runtime serialize the payload back to the invoker. For most REST APIs this is irrelevant — but for a pricing engine that needs to stream incremental quotes (bid/ask per instrument, progressively computed Greeks, or a bundle of 50 correlated instruments), this model forces the client to wait for the worst case before receiving any useful data.

The problem compounds when you consider the typical composition of a pricing engine: a query to a market feed (variable latency), model computation (CPU-bound, potentially 20-80 ms per instrument), and result serialization. With the old model, a bundle of 50 instruments priced sequentially, each at a P99 of 120 ms, produces a response that only arrives after ~6 seconds (50 × 120 ms) in the worst case. That is unacceptable for any trading interface or real-time margin system.
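The worst-case arithmetic above, set beside what streaming changes, as a small Python sketch. It assumes sequential pricing at a fixed per-instrument latency and ignores serialization and network time. The helper names are hypothetical.

```python
def time_to_first_byte_ms(n_instruments: int, per_instrument_ms: float, streaming: bool) -> float:
    """Client-perceived wait for the first useful byte, assuming sequential pricing."""
    if streaming:
        # The first chunk is flushed as soon as the first instrument is priced.
        return per_instrument_ms
    # Buffered model: nothing reaches the client until every instrument is priced.
    return n_instruments * per_instrument_ms


def total_time_ms(n_instruments: int, per_instrument_ms: float) -> float:
    """Total compute time is identical in both models; streaming only moves the first byte earlier."""
    return n_instruments * per_instrument_ms


if __name__ == "__main__":
    n, p99_ms = 50, 120.0  # 50 instruments, each at the P99 of 120 ms
    print(time_to_first_byte_ms(n, p99_ms, streaming=False))  # 6000.0
    print(time_to_first_byte_ms(n, p99_ms, streaming=True))   # 120.0
    print(total_time_ms(n, p99_ms))                           # 6000.0
```

Total work stays at ~6 s in both models. Streaming only changes when the client sees its first price.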

The pre-streaming alternative was WebSockets via API Gateway, which solves latency but introduces connection state, session management, and a per-connection-minute billing model that scales poorly with market spikes. The other path was migrating to ECS/EKS containers with SSE (Server-Sent Events), sacrificing elasticity and the serverless operational model. Lambda response streaming is the third path — and it carries an architectural cost that must be understood before adoption.

Real-Time Pricing Pipeline with Lambda Streaming

Full flow from trading client to market data sources, showing the streaming path, security controls, and observability signals.

🔒 Security & Edge

AWS WAF · rate-limit + IP rules
Function URL · streaming mode

⚡ Serverless Compute

Authorizer Lambda · JWT + IAM scope
Pricing Lambda · 1769 MB / arm64
StreamWriter · chunk flush per instrument

📊 Market Data & Cache

ElastiCache Valkey · sub-ms tick cache
MSK Kafka · market feed topic
DynamoDB · quote audit log

🔭 Observability

OTEL Collector · Lambda layer
CloudWatch EMF · latency + chunk metrics

How Streaming Actually Works Inside the Lambda Runtime

When you configure InvokeMode: RESPONSE_STREAM on a Function URL, the Lambda runtime replaces the buffered response with a one-way stream from the handler to Lambda's streaming infrastructure. In Node.js, this surfaces as the awslambda.streamifyResponse wrapper, which passes the handler a writable responseStream. Other runtimes, Python included, have no native equivalent: streaming there requires a custom runtime or the Lambda Web Adapter. The connection to the invoker stays open, with bytes forwarded using HTTP chunked transfer encoding, until the stream is closed or the function timeout is reached.

The critical detail the documentation softens: streaming only helps once the first byte is written. If the function takes time to start the stream (for example, waiting for a database query before it begins writing), the client-perceived latency is identical to the synchronous model. When CloudFront sits in front of the Function URL, its origin response timeout (30 seconds by default) also bounds how long the function can wait before sending that first byte.

The maximum streamed response payload is 20 MB per invocation, versus 6 MB for a synchronous invocation. The first 6 MB are sent with uncapped bandwidth, and after that Lambda streams at up to 2 MBps. For pricing, this is more than sufficient: a bundle of 500 instruments at 200 bytes per quote is 100 KB. The real bottleneck is backpressure. If the client consumes chunks more slowly than the function produces them, the stream's buffer fills. In Node.js, write() on the responseStream does not block. Instead, it returns false once the buffer passes its high-water mark, and the handler is expected to pause until the 'drain' event. A handler that ignores this signal keeps queuing data in memory. In runtimes that stream through a custom runtime or web adapter, such as Python, flow control depends on that layer and must be tested just as carefully.
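Backpressure is easiest to reason about as a bounded buffer. The sketch below models it in Python with `asyncio.Queue(maxsize=...)`. The producer (the pricing loop) awaits `put()` whenever the buffer is full, so it can never get more than `maxsize` chunks ahead of a slow consumer. This is an analogy, not the Lambda runtime's own buffer. In a Node.js handler, the equivalent discipline is to stop writing when `responseStream.write()` returns `false` and to resume on `'drain'`. The buffer size and delays are illustrative assumptions.

```python
import asyncio


async def produce(queue: asyncio.Queue, n_chunks: int, peak: list) -> None:
    """Pricing loop: one chunk per instrument, suspending whenever the buffer is full."""
    for i in range(n_chunks):
        await queue.put(f"quote-{i}")  # waits here while the queue holds maxsize items
        peak[0] = max(peak[0], queue.qsize())
    await queue.put(None)  # sentinel: end of bundle


async def consume(queue: asyncio.Queue, delay_s: float, received: list) -> None:
    """Slow client: takes one chunk at a time and pauses between reads."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        received.append(chunk)
        await asyncio.sleep(delay_s)  # simulated slow network or client


async def run(n_chunks: int = 20, buffer_size: int = 4, delay_s: float = 0.001):
    queue: asyncio.Queue = asyncio.Queue(maxsize=buffer_size)
    peak = [0]
    received: list = []
    await asyncio.gather(
        produce(queue, n_chunks, peak),
        consume(queue, delay_s, received),
    )
    return received, peak[0]


if __name__ == "__main__":
    received, peak = asyncio.run(run())
    # Every chunk arrives in order, and the producer never ran more than buffer_size ahead.
    print(len(received), peak)
```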

Memory configuration directly impacts the CPU available for pricing model computation: on arm64 (Graviton2), 1769 MB is the inflection point where you get one full vCPU. Below that, you're on a CPU fraction and Greeks or Monte Carlo computation will dominate your latency.
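To make the memory, CPU, and cost trade-off concrete, here is a back-of-the-envelope calculator. It assumes the proportional CPU allocation AWS documents (the equivalent of one vCPU at 1,769 MB). It also assumes the us-east-1 on-demand list prices for the first tier ($0.0000133334 per GB-second on arm64, $0.20 per million requests), so check current pricing for your Region. It does not model Provisioned Concurrency charges, and the function names are hypothetical.

```python
ARM64_PRICE_PER_GB_SECOND = 0.0000133334  # assumption: us-east-1 on-demand list price, first tier
PRICE_PER_REQUEST = 0.20 / 1_000_000      # assumption: $0.20 per million requests
MB_PER_FULL_VCPU = 1769                   # documented point where a function gets ~1 vCPU


def vcpu_share(memory_mb: int) -> float:
    """Approximate vCPU allocation: CPU scales linearly with configured memory."""
    return memory_mb / MB_PER_FULL_VCPU


def invocation_cost_usd(memory_mb: int, billed_ms: int) -> float:
    """On-demand duration charge plus request charge (excludes Provisioned Concurrency pricing)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * ARM64_PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


if __name__ == "__main__":
    for memory_mb in (512, 1024, 1769):
        print(memory_mb, round(vcpu_share(memory_mb), 2))
    # Cost of one bundle that bills 400 ms at 1769 MB
    print(f"${invocation_cost_usd(1769, 400):.10f}")
```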

The First Chunk Is Everything

In pricing systems, the metric that matters to the client is not total response time — it is time to first useful byte (business TTFB). Design your function to emit the most critical chunk (e.g., the most liquid instrument, the reference index) first, before computing secondary instruments. This transforms an 800 ms wait into a 40 ms experience for the most important data point, even if the full bundle takes longer.
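One way to apply this, sketched in Python as an async generator: price the highest-priority instrument first and yield each chunk as soon as it is ready, so the reference price leaves the function before the long tail is computed. The `priority` field, the `price` stub, the ticker names, and the NDJSON framing are illustrative assumptions. In a Node.js handler, each yielded line would be passed to `responseStream.write()`.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Instrument:
    symbol: str
    priority: int  # lower = more critical (reference index, most liquid contract)


async def price(instrument: Instrument) -> dict:
    """Hypothetical stand-in for the pricing model (tick lookup, Greeks, ...)."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return {"symbol": instrument.symbol, "bid": 99.5, "ask": 100.5}


async def stream_bundle(instruments: List[Instrument]) -> AsyncIterator[str]:
    """Yield one NDJSON line per instrument, most critical first."""
    for inst in sorted(instruments, key=lambda i: i.priority):
        quote = await price(inst)
        yield json.dumps(quote) + "\n"  # emit before pricing the next instrument


async def collect(instruments: List[Instrument]) -> List[str]:
    return [chunk async for chunk in stream_bundle(instruments)]


if __name__ == "__main__":
    bundle = [Instrument("FUT-A", 2), Instrument("REF-INDEX", 0), Instrument("OPT-B", 5)]
    lines = asyncio.run(collect(bundle))
    print(json.loads(lines[0])["symbol"])  # REF-INDEX
```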

Idempotency, Audit, and the Partial Response Problem

This is where most streaming pricing designs fail silently. In a financial system, every generated quote must be auditable: who requested it, when, with what market parameters, and what the result was. With the synchronous model, auditing is straightforward — you log the complete response before returning it. With streaming, the response is a flow of chunks that can be interrupted at any point by timeout, network error, or client cancellation.

The pattern I use in production is the quote correlation ID with an async DynamoDB write. Before starting the stream, the function generates a quoteId (UUID v7, which is time-sortable and therefore useful for audit queries) and writes an initial item to DynamoDB with status STREAMING and a 24-hour TTL. Each chunk carries the quoteId in its JSON envelope. A response header can carry it too, but only once, because headers are sent before the first chunk. On successful stream close, the function updates the item to COMPLETE with the SHA-256 hash of the concatenated payload. If the function terminates without closing the stream (timeout, unhandled error), the item remains in STREAMING, where a reconciliation process can detect it.
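A minimal sketch of this lifecycle in Python. The DynamoDB calls are left out so that the item shapes are visible. The UUID v7 is built by hand following the RFC 9562 layout (48-bit Unix-millisecond timestamp, version and variant bits, random remainder), so the sketch does not depend on a library helper. Attribute names such as `expiresAt` (the TTL attribute) and `payloadSha256` are hypothetical, as is the `find_stale` reconciliation helper.

```python
import hashlib
import os
import time
import uuid
from typing import Dict, List

TTL_SECONDS = 24 * 3600  # the 24-hour TTL described above


def uuid7() -> uuid.UUID:
    """UUID version 7: 48-bit Unix ms timestamp followed by random bits (RFC 9562 layout)."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ts_ms & ((1 << 48) - 1)) << 80       # unix_ts_ms, bits 127..80
    value |= 0x7 << 76                            # version 7, bits 79..76
    value |= (rand >> 68) << 64                   # rand_a, 12 bits (75..64)
    value |= 0b10 << 62                           # RFC variant, bits 63..62
    value |= rand & ((1 << 62) - 1)               # rand_b, 62 bits (61..0)
    return uuid.UUID(int=value)


def initial_item(quote_id: uuid.UUID, client_id: str, now_ms: int) -> Dict:
    """Written before the first chunk: status STREAMING plus a TTL in epoch seconds."""
    return {
        "quoteId": str(quote_id),
        "timestamp": now_ms,
        "clientId": client_id,
        "status": "STREAMING",
        "expiresAt": now_ms // 1000 + TTL_SECONDS,
    }


def completion_update(chunks: List[bytes]) -> Dict:
    """Set on a clean close: COMPLETE plus the SHA-256 of the concatenated payload."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)  # incremental hashing equals hashing the concatenation
    return {"status": "COMPLETE", "payloadSha256": digest.hexdigest()}


def find_stale(items: List[Dict], now_ms: int, max_age_ms: int) -> List[str]:
    """Reconciliation: quotes still STREAMING long after they started never closed cleanly."""
    return [
        item["quoteId"]
        for item in items
        if item["status"] == "STREAMING" and now_ms - item["timestamp"] > max_age_ms
    ]
```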

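The DynamoDB write must stay off the critical path. Start it before the first chunk without awaiting it, then settle it before closing the stream (Promise.allSettled in Node.js, asyncio.gather with return_exceptions=True in Python). Lambda freezes the execution environment once the handler returns, so a write that was never awaited may never complete. The audit table should have quoteId as partition key and timestamp as sort key, with a GSI on clientId + timestamp for per-client audit queries. Provisioned capacity with DAX makes no sense here: DAX accelerates reads, and this workload is write-heavy, so use on-demand capacity. Because quoteId is unique per quote, the base table spreads writes evenly. The hot-partition risk is the clientId GSI, so shard its partition key (for example, clientId plus a shard suffix) if a single client can exceed roughly 1,000 writes per second, the per-partition write throughput limit.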
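The non-blocking audit write can be sketched with asyncio: start the write as a task, emit chunks without waiting for it, and settle it before the stream closes. Lambda freezes the execution environment once the handler returns, so a write that was never awaited may never complete. `asyncio.gather(..., return_exceptions=True)` plays the role of `Promise.allSettled`. `write_audit_item` is a hypothetical stand-in for the DynamoDB `PutItem` call, and the event list exists only to make the ordering observable.

```python
import asyncio
from typing import List


async def write_audit_item(events: List[str], latency_s: float = 0.01) -> None:
    """Hypothetical stand-in for the DynamoDB PutItem of the STREAMING audit record."""
    await asyncio.sleep(latency_s)
    events.append("audit-written")


async def handle_request(chunks: List[str], events: List[str]) -> None:
    # 1. Start the audit write without awaiting it; it runs alongside the stream.
    audit_task = asyncio.create_task(write_audit_item(events))
    # 2. Emit chunks immediately, so the first price does not wait on the audit round trip.
    for chunk in chunks:
        events.append(f"chunk:{chunk}")
        await asyncio.sleep(0)  # yield, as a real write to the response stream would
    # 3. Settle the write before closing: an exception is returned rather than raised,
    #    so it can be recorded (e.g. as a final error envelope) instead of silently lost.
    (outcome,) = await asyncio.gather(audit_task, return_exceptions=True)
    if isinstance(outcome, BaseException):
        events.append("audit-failed")
    events.append("stream-closed")


if __name__ == "__main__":
    log: List[str] = []
    asyncio.run(handle_request(["REF-INDEX", "FUT-A", "OPT-B"], log))
    print(log)
```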

A frequently overlooked security detail: the quoteId must not be client-generated. If the client controls the ID, you have a replay attack vector where a client can reference another client's quote. Generate server-side, sign with KMS if regulation requires non-repudiation.

Security and Governance: Beyond Basic Authentication

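Function URLs, streaming or not, support two authentication modes: AWS_IAM and NONE. For financial pricing, NONE is unacceptable even with other controls in front. Use AWS_IAM with SigV4 for internal clients (AWS services, on-premises systems via PrivateLink). For external clients, note that Function URLs have no Lambda Authorizer integration, which is an API Gateway feature. Instead, put CloudFront + WAF in front, validate the RS256 JWT at the edge (for example, with Lambda@Edge on viewer requests), and let CloudFront sign origin requests with Origin Access Control so that the Function URL itself stays on AWS_IAM.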

WAF is non-negotiable in financial production. Configure specific rules for the streaming endpoint: rate limiting per clientId (extracted from the JWT claim in the custom header), blocking requests without Content-Type: application/json, and a maximum body size rule of 8 KB for the request (the input payload of a pricing query should not exceed this). WAF with CloudFront adds ~1-3 ms of latency but protects against volumetric DDoS and quote scraping.
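The three WAF rules described above can be sketched as the rule dictionaries passed in the `Rules` list of the boto3 `wafv2` client's `create_web_acl` or `update_web_acl`. The `x-client-id` header name, the limit, the 5-minute window, and the rule names are hypothetical, and the field names should be checked against the current AWS WAF API reference before use. WAF rate-limits on the header value as sent, so per-client limiting is only meaningful if that header is set or validated by a trusted component.

```python
from typing import Dict, List


def _visibility(metric_name: str) -> Dict:
    """Standard visibility block: sampled requests plus a CloudWatch metric per rule."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def pricing_waf_rules(
    client_header: str = "x-client-id",  # hypothetical header carrying the clientId
    limit: int = 3000,                   # illustrative: requests per client per window
    window_s: int = 300,
    max_body_bytes: int = 8 * 1024,      # the 8 KB request-body cap from the text
) -> List[Dict]:
    """Per-client rate limit, JSON-only requests, and a request-body size cap."""
    no_transform = [{"Priority": 0, "Type": "NONE"}]
    return [
        {
            "Name": "rate-limit-per-client",
            "Priority": 0,
            "Statement": {
                "RateBasedStatement": {
                    "Limit": limit,
                    "EvaluationWindowSec": window_s,
                    "AggregateKeyType": "CUSTOM_KEYS",
                    "CustomKeys": [
                        {"Header": {"Name": client_header, "TextTransformations": no_transform}}
                    ],
                }
            },
            "Action": {"Block": {}},
            "VisibilityConfig": _visibility("RateLimitPerClient"),
        },
        {
            "Name": "require-json",
            "Priority": 1,
            "Statement": {
                "NotStatement": {  # block anything whose Content-Type is not JSON
                    "Statement": {
                        "ByteMatchStatement": {
                            "SearchString": b"application/json",
                            "FieldToMatch": {"SingleHeader": {"Name": "content-type"}},
                            "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                            "PositionalConstraint": "STARTS_WITH",
                        }
                    }
                }
            },
            "Action": {"Block": {}},
            "VisibilityConfig": _visibility("RequireJson"),
        },
        {
            "Name": "max-body-size",
            "Priority": 2,
            "Statement": {
                "SizeConstraintStatement": {
                    "FieldToMatch": {"Body": {"OversizeHandling": "MATCH"}},
                    "ComparisonOperator": "GT",
                    "Size": max_body_bytes,
                    "TextTransformations": no_transform,
                }
            },
            "Action": {"Block": {}},
            "VisibilityConfig": _visibility("MaxBodySize"),
        },
    ]
```

For a web ACL attached to CloudFront, these rules would go into a `CLOUDFRONT`-scoped web ACL, which is managed in us-east-1.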

For data in transit, TLS 1.3 is mandatory — Function URLs support this natively. For data at rest in the audit DynamoDB, use KMS Customer Managed Keys (CMK) with annual automatic rotation and a key policy that restricts kms:Decrypt only to the audit function role and the compliance role. Separate CMKs by environment (dev/staging/prod) — this seems obvious but is frequently ignored in fast-growing systems.

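A governance aspect that goes beyond technical security: quote data lineage. Regulators like CVM (Brazil) and SEC (US) may require traceability of which version of the pricing model generated a specific quote. Include three things in each chunk's envelope. The first is the function version, available at runtime through the AWS_LAMBDA_FUNCTION_VERSION environment variable. The second is the deployment artifact hash (the version's CodeSha256 or the container image digest), which Lambda does not expose as an environment variable, so inject it at build or deploy time. The third is the timestamp of the market feed used. Together these turn each quote into an auditable artifact with complete provenance.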
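A sketch of such an envelope builder in Python. `AWS_LAMBDA_FUNCTION_VERSION` is a real Lambda environment variable. `ARTIFACT_SHA256` is a hypothetical variable you would populate at build or deploy time, for example from the published version's `CodeSha256` or the image digest. The field names and the NDJSON framing are illustrative.

```python
import json
import os
from datetime import datetime, timezone
from typing import Dict


def provenance() -> Dict[str, str]:
    """Model and deployment lineage attached to every chunk."""
    return {
        # Set by Lambda: "$LATEST" or the published version number.
        "functionVersion": os.environ.get("AWS_LAMBDA_FUNCTION_VERSION", "unknown"),
        # Hypothetical variable injected at build/deploy time (CodeSha256 or image digest).
        "artifactSha256": os.environ.get("ARTIFACT_SHA256", "unknown"),
    }


def envelope(quote_id: str, seq: int, feed_timestamp: datetime, quote: Dict) -> str:
    """One NDJSON chunk carrying the quote plus its provenance."""
    record = {
        "quoteId": quote_id,
        "seq": seq,
        "feedTimestamp": feed_timestamp.astimezone(timezone.utc).isoformat(),
        **provenance(),
        "quote": quote,
    }
    return json.dumps(record, separators=(",", ":")) + "\n"
```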

Anti-Patterns That Destroy Streaming Pricing Systems

Full buffering before starting the stream: loading all market data, computing all instruments, and only then beginning to write to the responseStream. This completely negates the streaming benefit and adds memory overhead on top.
No mid-stream error handling: if one instrument fails mid-bundle, the function throws an unhandled exception that abruptly closes the stream. The client receives a truncated stream with no error indication — use per-chunk error envelopes with an explicit error field.
Provisioned Concurrency without cost analysis: for pricing, PC is necessary to eliminate cold starts, but sizing it to the absolute market peak (exchange open) without Application Auto Scaling results in idle cost 80-90% of the time.
Using API Gateway REST with streaming: API Gateway REST buffers the complete response instead of streaming it, and API Gateway HTTP API does not support native streaming either. Use Function URLs directly; for SDK-based internal callers, InvokeWithResponseStream is the other route to real streaming.
Ignoring client backpressure: not checking the return value of write() on the responseStream and continuing to produce chunks faster than the client consumes. In Node.js, this leads to memory accumulation in the runtime buffer and eventual OOM or timeout.
Quotes without version envelope: sending price data without including the model version, feed timestamp, and quoteId in each chunk. Makes audit and regulatory traceability impossible.
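The mid-stream error anti-pattern has a simple structural fix. Catch failures per instrument, emit them as error envelopes, and always finish the bundle with a terminal record, so the client can tell a complete bundle from a truncated one. The sketch below is in Python. The envelope fields, the `BUNDLE_COMPLETE` record type, and the `price_one` callback are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def stream_with_error_envelopes(
    quote_id: str,
    symbols: Iterable[str],
    price_one: Callable[[str], Dict],
) -> Iterator[str]:
    """Yield one NDJSON record per instrument, then a terminal BUNDLE_COMPLETE record."""
    delivered = failed = 0
    for seq, symbol in enumerate(symbols):
        record = {"type": "QUOTE", "quoteId": quote_id, "seq": seq, "symbol": symbol}
        try:
            record.update(quote=price_one(symbol), error=None)
            delivered += 1
        except Exception as exc:  # one failing instrument must not abort the whole stream
            record.update(quote=None, error=f"{type(exc).__name__}: {exc}")
            failed += 1
        yield json.dumps(record) + "\n"
    # Terminal record: if it never arrives, the client knows the stream was truncated.
    yield json.dumps({"type": "BUNDLE_COMPLETE", "quoteId": quote_id,
                      "delivered": delivered, "failed": failed}) + "\n"
```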

Observability: What to Measure in a Streaming System

Observability in streaming systems is fundamentally different from synchronous APIs because a single invocation can have multiple failure points and relevant partial latencies. Standard CloudWatch metrics (Duration, Errors, Throttles) are necessary but insufficient.

What to specifically instrument for pricing streaming:

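Per-chunk latency: use CloudWatch EMF (Embedded Metric Format) to emit a custom metric on each sent chunk, with instrumentId as the dimension and bundleId logged as a plain property. Every unique dimension combination becomes a separate custom metric, so a per-bundle dimension would create a new metric for every request. This allows identifying which instruments are systematically slow, usually those depending on higher-latency feeds or more complex models.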
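A minimal sketch of one per-chunk EMF record in Python. The function returns a JSON log line whose `_aws` block tells CloudWatch which keys are metrics and which are dimensions. The namespace and the metric name are hypothetical. Only `instrumentId` is declared as a dimension. `bundleId` stays a plain property, so it remains searchable in CloudWatch Logs Insights without adding metric cardinality.

```python
import json
import time


def emf_chunk_record(instrument_id: str, bundle_id: str, latency_ms: float,
                     namespace: str = "PricingStream") -> str:
    """One EMF log line: a ChunkLatency metric dimensioned by instrumentId only."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["instrumentId"]],
                "Metrics": [{"Name": "ChunkLatency", "Unit": "Milliseconds"}],
            }],
        },
        "instrumentId": instrument_id,
        "bundleId": bundle_id,  # property, not a dimension: no per-bundle metrics
        "ChunkLatency": latency_ms,
    })


if __name__ == "__main__":
    # In Lambda, printing the line sends it to CloudWatch Logs, where EMF is extracted.
    print(emf_chunk_record("REF-INDEX", "b-123", 12.5))
```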

Chunks sent vs. chunks expected: on stream close, emit the ratio chunksDelivered / chunksExpected. A ratio below 1.0 indicates truncated streams — from timeout, error, or client cancellation. In financial production, a truncation rate above 0.1% warrants immediate investigation.
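The ratio and the 0.1% threshold reduce to a few lines. This is a Python sketch. The `(delivered, expected)` pairs per stream and the helper names are hypothetical.

```python
from typing import List, Tuple

TRUNCATION_ALERT_THRESHOLD = 0.001  # 0.1%, the investigation threshold above


def delivery_ratio(delivered: int, expected: int) -> float:
    """chunksDelivered / chunksExpected for one stream; 1.0 means complete."""
    return delivered / expected if expected else 1.0


def truncation_rate(streams: List[Tuple[int, int]]) -> float:
    """Fraction of streams (delivered, expected) whose delivery ratio is below 1.0."""
    if not streams:
        return 0.0
    truncated = sum(1 for delivered, expected in streams
                    if delivery_ratio(delivered, expected) < 1.0)
    return truncated / len(streams)


def needs_investigation(streams: List[Tuple[int, int]]) -> bool:
    return truncation_rate(streams) > TRUNCATION_ALERT_THRESHOLD
```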

Business TTFB distribution: the time between invocation start and the first chunk with valid price data. This is the metric that correlates with user satisfaction and trading SLOs. Target: P99 < 100 ms with provisioned concurrency.
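Checking the P99 < 100 ms target from raw business-TTFB samples, sketched in Python with the nearest-rank percentile method. This is one of several percentile definitions, and monitoring tools may compute percentiles slightly differently. The helper names are hypothetical.

```python
import math
from typing import List


def percentile_nearest_rank(samples: List[float], p: float) -> float:
    """Smallest sample such that at least p% of samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_ttfb_slo(business_ttfb_ms: List[float], p99_target_ms: float = 100.0) -> bool:
    """True if the P99 business TTFB is strictly under the target."""
    return percentile_nearest_rank(business_ttfb_ms, 99) < p99_target_ms
```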

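Cold start rate: with provisioned concurrency, this should be 0% under normal conditions. A spike in cold starts indicates that Auto Scaling failed to keep up with a demand spike. InitDuration is not a standard CloudWatch metric; it appears as Init Duration in each invocation's REPORT log line. To catch this, alarm on ProvisionedConcurrencySpilloverInvocations > 0, which counts invocations that overflowed provisioned concurrency onto on-demand capacity, or build a metric filter on the REPORT lines.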
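If the cold-start signal is derived from logs, the field to look for is `Init Duration` in the `REPORT` line Lambda writes at the end of each invocation. It is only present when the invocation paid for an initialization. Below is a Python sketch of the parsing. The sample lines in the test are illustrative, and in practice a CloudWatch Logs metric filter or a Logs Insights query would do this server-side.

```python
import re
from typing import Iterable, Optional

INIT_DURATION_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def init_duration_ms(log_line: str) -> Optional[float]:
    """Init Duration from a Lambda REPORT line, or None when the invocation ran warm."""
    if not log_line.startswith("REPORT"):
        return None
    match = INIT_DURATION_RE.search(log_line)
    return float(match.group(1)) if match else None


def cold_start_rate(log_lines: Iterable[str]) -> float:
    """Share of REPORT lines that carry an Init Duration field."""
    reports = [line for line in log_lines if line.startswith("REPORT")]
    if not reports:
        return 0.0
    cold = sum(1 for line in reports if init_duration_ms(line) is not None)
    return cold / len(reports)
```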

For distributed tracing, use the OTEL Lambda Layer (available as a managed extension) with traceId propagation in each chunk's envelope. This allows correlating the function span with the client span, creating an end-to-end trace that includes the client's consumption time for each chunk — information impossible to obtain without explicit instrumentation.
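Propagating trace context into each chunk can be as simple as copying the trace ID from the W3C `traceparent` header (format `version-traceid-parentid-flags`) into the envelope. This is a Python sketch. The `traceId` envelope field is hypothetical, and with the OpenTelemetry layer you would normally read the active span context from the OpenTelemetry API rather than parse the header by hand.

```python
import re
from typing import Dict, Optional

TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_traceparent(header: str) -> Optional[str]:
    """Extract the 32-hex trace ID from a W3C traceparent header, or None if invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def tag_chunk(chunk: Dict, trace_id: Optional[str]) -> Dict:
    """Return a copy of the chunk envelope carrying the trace ID."""
    return {**chunk, "traceId": trace_id}
```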

Well-Architected Pillars Assessment

Security

Use AWS_IAM on Function URLs for internal clients; for external clients, validate JWT RS256 at the CloudFront edge in front of the Function URL. WAF with per-clientId rate limiting. KMS CMK with restrictive key policy for audit data. Include the deployment artifact hash in each chunk for regulatory provenance.

Reliability

Provisioned Concurrency with Application Auto Scaling to eliminate cold starts during market hours. Client-side circuit breaker for streams that don't receive the first chunk within 200 ms. Periodic reconciliation of quoteIds in STREAMING state to detect incomplete invocations.
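A client-side sketch of the first-chunk guard in Python. Each stream must produce its first chunk within the deadline (200 ms by default here). After a run of consecutive misses, the breaker opens and fails fast for a cool-down period instead of piling more requests onto a degraded backend. The thresholds, the cool-down, and the `FirstChunkBreaker` class are hypothetical; a production client would typically rely on an established resilience library.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional


async def _next_chunk(stream: AsyncIterator[bytes]) -> bytes:
    return await stream.__anext__()


class FirstChunkBreaker:
    """Opens after N consecutive streams miss the first-chunk deadline."""

    def __init__(self, first_chunk_timeout_s: float = 0.2,
                 failure_threshold: int = 3, cooldown_s: float = 5.0):
        self.first_chunk_timeout_s = first_chunk_timeout_s
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at = None  # half-open: let the next attempt probe the backend
            return False
        return True

    async def first_chunk(self, stream: AsyncIterator[bytes]) -> bytes:
        if self.is_open():
            raise RuntimeError("circuit open: failing fast")
        try:
            chunk = await asyncio.wait_for(_next_chunk(stream), self.first_chunk_timeout_s)
        except asyncio.TimeoutError:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.consecutive_failures = 0  # a timely first chunk closes the circuit
        return chunk


async def demo() -> List[str]:
    async def fast_stream():
        yield b'{"symbol":"REF-INDEX"}\n'

    async def slow_stream():
        await asyncio.sleep(1)
        yield b"too late"

    breaker = FirstChunkBreaker(first_chunk_timeout_s=0.01, failure_threshold=2, cooldown_s=60)
    outcomes = []
    for make_stream in (fast_stream, slow_stream, slow_stream, fast_stream):
        try:
            await breaker.first_chunk(make_stream())
            outcomes.append("ok")
        except asyncio.TimeoutError:
            outcomes.append("timeout")
        except RuntimeError:
            outcomes.append("open")
    return outcomes
```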

Performance efficiency

arm64 (Graviton2) at 1769 MB for one full vCPU. ElastiCache Valkey for sub-ms tick cache. Emit most critical chunk first (business TTFB). Instrument per-chunk latency with EMF to identify per-instrument bottlenecks.

Cost optimization

Compute Savings Plans to cover Provisioned Concurrency baseline. Application Auto Scaling to reduce PC outside market hours (70-80% savings). DynamoDB on-demand for audit — unpredictable access pattern with spikes at market open.

Architect's Note

Senior Solutions Architect

After implementing variants of this pattern in derivatives and FX pricing systems, the most expensive lesson I learned is this: streaming solves the perceived latency problem but creates a new observational consistency problem. A client that receives 30 of 50 chunks before a network timeout has a partial view of the bundle — and in trading, a partial view can be worse than no view at all. That is why I always include a final BUNDLE_COMPLETE chunk with an integrity hash, and the client only acts on data after receiving that chunk. This seems conservative, but in financial production, the alternative is a mispricing incident that no SLA covers. The second lesson: never go to production without having tested the function's behavior when the client closes the connection mid-stream — the Lambda runtime does not cancel the invocation immediately, and you may be paying for computation whose result will never be delivered.
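The client side of that rule, sketched in Python: buffer the quote chunks, and only release them to trading logic when a terminal `BUNDLE_COMPLETE` record arrives and its SHA-256 matches the hash of the chunks actually received. The record fields (`type`, `sha256`) and the convention of hashing the raw quote lines in order are hypothetical; producer and consumer only need to agree on them.

```python
import hashlib
import json
from typing import Iterable, List, Optional


def terminal_record(quote_lines: List[str]) -> str:
    """Producer side: BUNDLE_COMPLETE carrying the SHA-256 of every quote line, in order."""
    digest = hashlib.sha256("".join(quote_lines).encode("utf-8")).hexdigest()
    return json.dumps({"type": "BUNDLE_COMPLETE", "sha256": digest}) + "\n"


def accept_bundle(lines: Iterable[str]) -> Optional[List[dict]]:
    """Consumer side: release quotes only after a BUNDLE_COMPLETE whose hash matches."""
    digest = hashlib.sha256()
    quotes: List[dict] = []
    for line in lines:
        record = json.loads(line)
        if record.get("type") == "BUNDLE_COMPLETE":
            return quotes if record.get("sha256") == digest.hexdigest() else None
        digest.update(line.encode("utf-8"))  # hash exactly what was received, in order
        quotes.append(record)
    return None  # no terminal record: truncated stream, do not act on partial data
```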

Approaches Comparison for Real-Time Pricing

Criterion	Lambda Streaming (Function URL)	WebSocket (API GW + Lambda)	ECS/EKS + SSE
TTFB P50	~18 ms (PC)	~25 ms	~10 ms (container warm)
Connection state	Stateless per invocation	Stateful (connectionId)	Stateful (process/thread)
Idle cost	Zero (no PC) / fixed (with PC)	Zero per invocation, fixed per active connection	High: always-on instances
Auditability	Requires explicit pattern (quoteId + DynamoDB)	Requires per-connectionId message log	Easier: full response in memory
Operational complexity	Low: serverless	Medium: connection management	High: cluster, scaling, networking

Verdict: Is It Worth It in Financial Production?

Adopt with prerequisites

Lambda response streaming is a genuinely useful addition for serverless pricing engines — but it is not a silver bullet and does not replace WebSockets for long-lived connection use cases (continuous tick streaming, for example). The ideal use case is exactly as described: on-demand quote bundles, where the client makes a request, receives N chunks with progressively computed instrument prices, and closes the connection. For this pattern, streaming delivers significantly better business TTFB than the synchronous model, with far lower operational complexity than WebSockets or ECS/SSE. The prerequisites for financial production are non-negotiable: Provisioned Concurrency with Auto Scaling, chunk envelopes with quoteId and integrity hash, async audit writes to DynamoDB with KMS CMK, WAF with per-clientId rate limiting, and SLOs defined on TTFB P99 and truncation rate. Without these controls, you have a functional prototype, not a financial system. My recommendation: adopt for new greenfield on-demand pricing systems.

References and Further Reading

AWS Lambda – Configuring a Lambda function to stream responses
AWS Lambda – Function URLs
AWS Lambda – Provisioned Concurrency with Application Auto Scaling
AWS Well-Architected Framework – Serverless Lens
DynamoDB – Best practices for designing and using partition keys
AWS WAF – Rate-based rule statement
OpenTelemetry Lambda – AWS managed layers
Designing Data-Intensive Applications – Martin Kleppmann (streaming patterns)

#lambda #streaming #real-time-pricing #serverless #fintech #aws #event-driven #observability

Analyzed source: Real-time pricing with Lambda response streaming
