When (not) to use AI: the architecture decision
Maturity is knowing when AI is the right tool — and when an if/else solves it better.
The most valuable skill of an AI architect is not knowing how to build agents — it's knowing when not to. AI solves real problems, but it also creates unnecessary complexity, cost, and risk when applied in the wrong place. This lesson is about judgment: the criterion that separates a well-designed system from an expensive project that fails silently.
Where AI genuinely shines
AI — especially LLMs — excels at problems where the correct answer doesn't fit into an explicit rule. Think of classifying the intent of a support message with ambiguous language, extracting structured fields from a scanned PDF contract, summarizing 40 pages of incident logs into three actionable paragraphs, or answering questions about a knowledge base that changes every week.
The common denominator: ambiguity, natural language, variation in form with stable meaning. In these cases, writing manual rules is fragile: you spend weeks covering edge cases and still miss new ones. A model trained on human language generalizes to phrasings your rules never anticipated.
Other strong cases: draft generation (where a human reviews), semantic search (lesson 04), fuzzy classification with many categories, and synthesis of distributed information. The pattern is always the same: variable input, output tolerant of small errors, and a human or downstream system able to absorb imperfections.
If your problem has these characteristics, AI is not hype — it's the right tool.
Where AI is the wrong choice
There are three situations where I actively recommend not using AI, and I learned each one the hard way.
A deterministic rule exists and is stable. If the logic is if status == 'APPROVED' and amount < 1000: auto_release(), an LLM only adds latency, cost, and non-determinism. if/else is faster, testable, auditable, and doesn't hallucinate.
Critical accuracy with no verification mechanism. Tax calculation, medication dosage, financial transfer — any domain where errors are costly and there's no human or system checking the output. LLMs make arithmetic mistakes, confuse dates, and fabricate references. If you don't have robust evals (lesson 09) and a guardrail (lesson 10) covering the critical path, don't put AI in that flow.
Cost or latency doesn't work out. An endpoint that needs to respond in 80ms at $0.0001 per call is not a candidate for a 70B-parameter LLM. Do the math before prototyping. Sometimes a smaller model, a traditional classifier, or simply a search index solves it at 1/100th the cost.
The most common mistake I see: teams using an LLM to parse JSON that already comes structured from the API, or to validate a field that has a three-line regex. That's complexity with no benefit.
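A minimal sketch of the deterministic alternatives mentioned above: the release rule as a pure function, an already-structured payload parsed with the standard library, and a field checked with a short regex. The payload shape, the field names, and the `ORD-` order-id format are hypothetical, chosen only for illustration.

```python
import json
import re

# Hypothetical order-id format, assumed for illustration: "ORD-" plus 8 digits.
ORDER_ID_PATTERN = re.compile(r"ORD-\d{8}")


def should_auto_release(status: str, amount: float) -> bool:
    """The stable rule from the text: approved and under 1000."""
    return status == "APPROVED" and amount < 1000


def parse_order(raw: str) -> dict:
    """Parse an already-structured API payload; no model involved."""
    order = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not ORDER_ID_PATTERN.fullmatch(order["order_id"]):
        raise ValueError(f"invalid order_id: {order['order_id']!r}")
    return order


if __name__ == "__main__":
    payload = '{"order_id": "ORD-00012345", "status": "APPROVED", "amount": 250.0}'
    order = parse_order(payload)
    print(should_auto_release(order["status"], order["amount"]))
```

Both functions are covered by plain unit tests, which is the auditability argument in practice: every branch can be asserted without sampling a model.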
Decision flow: use AI or not?
Walk through this flow for any requirement before choosing an approach. Each node is a real architecture question.
- Requirement arrives
- Is there a deterministic rule?
- Use if/else or regex
- Is the input ambiguous or natural language?
- Does the system tolerate occasional errors?
- Do cost and latency work out?
- Don't use AI (revisit the requirement)
- Hybrid pattern: AI + rule + human
- AI as primary component
- Add evals and guardrails
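One way to read the flow as code is sketched below. The node names come from the list above, but the edges between them are an assumption, since the list does not show how the nodes connect: here a requirement that cannot tolerate errors goes to the hybrid pattern, and one whose cost or latency does not fit goes back to "don't use AI". The `Requirement` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    RULES = "use if/else or regex"
    NO_AI = "don't use AI (revisit the requirement)"
    HYBRID = "hybrid pattern: AI + rule + human"
    AI_PRIMARY = "AI as primary component"


@dataclass(frozen=True)
class Requirement:
    has_deterministic_rule: bool
    input_is_ambiguous: bool
    tolerates_occasional_error: bool
    cost_and_latency_fit: bool


def choose_approach(req: Requirement) -> Approach:
    """Walk the questions in list order; the branch targets are assumed."""
    if req.has_deterministic_rule:
        return Approach.RULES
    if not req.input_is_ambiguous:
        return Approach.NO_AI
    if not req.tolerates_occasional_error:
        return Approach.HYBRID
    if not req.cost_and_latency_fit:
        return Approach.NO_AI
    return Approach.AI_PRIMARY


def needs_evals_and_guardrails(approach: Approach) -> bool:
    """Any path that includes a model ends at 'add evals and guardrails'."""
    return approach in (Approach.HYBRID, Approach.AI_PRIMARY)
```

For example, `choose_approach(Requirement(False, True, False, True))` returns `Approach.HYBRID` under these assumed edges.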
Use AI vs. don't use AI: comparative analysis
AI as primary component
- Handles natural language and ambiguity without manual rules
- Generalizes to cases not anticipated at design time
- Scales capability without scaling rule engineering
- Non-deterministic: same input may produce different outputs
- Per-token cost accumulates at high volume
- Requires evals, guardrails, and continuous monitoring
Right for: fuzzy classification, text extraction, synthesis, semantic search, generation with review
Deterministic logic (if/else, regex, rules)
- 100% predictable and auditable behavior
- Near-zero marginal cost at scale
- Testable with simple unit tests
- Fragile to unanticipated form variations (free language)
- Maintenance grows with rule complexity
- Doesn't scale to open-ended domains
Right for: field validation, status-based routing, financial calculations, fixed-format parsing
Hybrid pattern (AI + rule + human)
- AI handles most cases; rules cover critical ones
- Human enters only on low-confidence cases
- Reduces risk without sacrificing automation
- More components = more failure surface to manage
- Requires defining confidence thresholds and human-review SLAs
Right for: content moderation, medical triage, credit approval, any high-impact flow
The decision framework in four questions
Before any line of code, answer these four questions in order. They work as a filter — each negative answer shortens the path.
1. Does the problem resist a deterministic solution? If an explicit rule can compute the correct answer from the input, you don't need AI. A parser, a SQL query, or a pure function solves it better.
2. Does the system tolerate occasional errors? LLMs make mistakes. If an error causes financial, legal, or health damage without a containment mechanism, the risk isn't justified — unless you add mandatory output verification (evals + human in the loop).
3. Is there a source of truth to ground the answer? If yes, RAG (lesson 06) or grounding (lesson 19) can reduce hallucination. If there's no source and accuracy is critical, rethink the approach.
4. Does the cost per call and latency fit the SLA and budget? Calculate: monthly volume × average tokens × price per token. Compare with the value generated. If the math doesn't work even with the cheapest model on Bedrock (lesson 16), the problem may not be an AI one — or it needs a different architecture (cache, smaller model, pre-computation).
These four questions don't eliminate creativity — they direct energy to where AI genuinely delivers value.
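The arithmetic in question 4 fits in a few lines. This is a rough sketch: every figure is a hypothetical placeholder rather than a real price, and it applies a single blended token price, whereas providers usually price input and output tokens separately.

```python
def cost_per_call_usd(avg_tokens_per_call: int, usd_per_1k_tokens: float) -> float:
    """Token cost of a single call."""
    return avg_tokens_per_call / 1000 * usd_per_1k_tokens


def monthly_cost_usd(calls_per_month: int, avg_tokens_per_call: int,
                     usd_per_1k_tokens: float) -> float:
    """Monthly volume x average tokens x price per token."""
    return calls_per_month * cost_per_call_usd(avg_tokens_per_call, usd_per_1k_tokens)


if __name__ == "__main__":
    # Hypothetical inputs: 2M calls/month, 1,500 tokens/call, $0.002 per 1k tokens.
    calls, tokens, price = 2_000_000, 1_500, 0.002
    print(f"per call:  ${cost_per_call_usd(tokens, price):.4f}")
    print(f"per month: ${monthly_cost_usd(calls, tokens, price):,.2f}")
```

With these placeholder numbers a call costs about $0.003, which is already well above a $0.0001-per-call budget like the one mentioned earlier. That is the kind of mismatch worth finding before prototyping.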
In practice, most production systems that work well use the hybrid pattern: AI processes the general case (80-90% of volume), deterministic rules block or redirect critical cases, and humans review low-confidence ones. This pattern isn't weakness — it's mature engineering. I never put AI in a critical path without at least one deterministic verification layer after it. The model can be wrong; the system cannot.
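A minimal sketch of that layering, assuming the model call returns a label plus a confidence score in [0, 1]. Many model APIs do not return a calibrated confidence, so treat that field as an assumption. The categories, critical terms, and threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelResult:
    label: str
    confidence: float  # assumed score in [0, 1]


ALLOWED_LABELS = {"billing", "refund", "technical"}  # hypothetical categories
CRITICAL_TERMS = ("chargeback", "lawsuit")           # hypothetical critical cases
CONFIDENCE_THRESHOLD = 0.8                           # hypothetical threshold


def route(message: str, classify: Callable[[str], ModelResult]) -> str:
    text = message.lower()
    # Rule layer before the model: critical cases never reach automation.
    if any(term in text for term in CRITICAL_TERMS):
        return "human_review"
    result = classify(message)
    # Deterministic verification layer after the model.
    if result.label not in ALLOWED_LABELS:
        return "human_review"
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    return f"auto:{result.label}"
```

Passing `classify` in as a parameter keeps the routing logic testable with a stub, so the deterministic layers can be verified without calling a real model.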
How to evaluate a new requirement
1. Describe the problem in one sentence without mentioning AI. If the description already implies a clear rule, it's probably not an AI case. E.g., 'reject order if CPF is invalid' → a format check plus check-digit arithmetic, no model required.
2. List failure cases and their impact. How much does a false positive cost? A false negative? If both are expensive, you need verification, and AI may not be the right component for the final decision.
3. Estimate cost before prototyping. Volume × tokens × price. Add network latency and cold start if serverless. Compare with the cost of the equivalent deterministic solution.
4. Define the success criterion before the first prompt. Without an evaluation metric (lesson 09), you won't know when to stop iterating. Define: minimum acceptable accuracy, maximum latency, cost per transaction.
5. Design the fallback before the happy path. What happens when the model returns low confidence, a timeout, or an invalid response? Define this before building the main flow.
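The fallback in the last step can be sketched before any real model call exists. The example below assumes the model is asked to reply with JSON containing `decision` and `confidence` fields. That response contract, the `needs_human` outcome, and the 0.7 threshold are all hypothetical.

```python
import json
from typing import Callable


def call_with_fallback(call_model: Callable[[str], str], prompt: str,
                       min_confidence: float = 0.7) -> dict:
    """Return the model's decision, or a safe fallback naming the reason."""
    try:
        raw = call_model(prompt)
    except TimeoutError:
        return {"decision": "needs_human", "reason": "timeout"}
    try:
        parsed = json.loads(raw)
        decision = parsed["decision"]
        confidence = float(parsed["confidence"])
    except (KeyError, TypeError, ValueError):
        # Covers malformed JSON (JSONDecodeError is a ValueError),
        # missing fields, and non-numeric confidence values.
        return {"decision": "needs_human", "reason": "invalid_response"}
    if confidence < min_confidence:
        return {"decision": "needs_human", "reason": "low_confidence"}
    return {"decision": decision, "reason": "ok"}
```

Recording the reason alongside the fallback makes it possible to count how often each failure mode occurs once the system is instrumented.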
Frequent architecture questions
What if I don't know the volume before production?
Use the cheapest model that meets minimum quality, add cache for repeated inputs (lesson 19), and instrument everything from day one. Surprise AI costs almost always come from lack of observability, not unexpected volume.
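As a rough illustration of caching repeated inputs with basic instrumentation, the sketch below uses an in-process cache from the standard library. `fake_model` is a stand-in for a real model call, and lowercasing plus whitespace collapsing is an assumed normalization that may not suit every workload. A production system would more likely use a shared cache and a metrics backend.

```python
from functools import lru_cache


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a paid model call."""
    return f"answer to: {prompt}"


@lru_cache(maxsize=1024)
def cached_answer(normalized_prompt: str) -> str:
    # Only cache misses reach this line, so only misses cost money.
    return fake_model(normalized_prompt)


def answer(prompt: str) -> str:
    # Normalize so trivially different inputs share one cache entry.
    return cached_answer(" ".join(prompt.lower().split()))


if __name__ == "__main__":
    answer("What is   RAG?")
    answer("what is rag?")
    info = cached_answer.cache_info()
    print(f"hits={info.hits} misses={info.misses}")
```

The `cache_info()` counters are the minimal form of the instrumentation mentioned above: misses approximate billable calls, and the hit rate shows how much volume the cache absorbs.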
Can I use AI for critical business logic if a human is reviewing?
Yes — that's the hybrid pattern. The condition is that human review is real, with a defined SLA, and not a checkbox nobody reads. If the human approves everything without reading, you don't have review — you have security theater.
Traditional classifier (classic ML) vs. LLM: when to use each?
Trained classifier: when you have enough labeled data, latency < 50ms is a requirement, and categories are stable. LLM: when categories change, labeled data is scarce, or classification requires reasoning over broad context. LLM is more flexible; classifier is more predictable and cheaper.
Opening the final module: from prototype to production
This lesson closes the foundations cycle and opens the final module of the track. You now have the complete map: you understand how models learn (lessons 01-03), how to represent and retrieve knowledge (04-06), how to connect AI to the world (07-08), how to evaluate and protect systems (09-10), how to build agents (11-15), and how to use AWS infrastructure for all of it (16-19).
What remains is the transition from prototype to a system that works in production with real users — and the guided project that consolidates all of this into practice.
In lesson 21, we'll cover what changes when you leave the notebook: observability, prompt versioning, CI/CD for AI systems, cost management at scale, and the deployment patterns that work on Bedrock AgentCore. Not theory — these are the decisions you'll make in the first weeks of every real project.
Lesson 22 is the guided project: you'll design a RAG + agent system from scratch, making each architecture decision with the criteria you learned here. It includes a final exam that tests judgment, not memorization.
Maturity in AI is not knowing how to use every feature — it's knowing how to choose the right ones for the right problem, with explicit trade-offs. You've arrived.
Architect's verdict
AI is a powerful tool with a specific risk profile: non-deterministic, expensive at scale, and silently wrong when it fails. Use it where ambiguity and natural language make manual rules impractical. Avoid it where accuracy is critical without verification, where the rule already exists, or where the cost doesn't work out. The hybrid pattern — AI for the general case, rules for the critical, human for the uncertain — is the most mature architecture I know for production systems. Judgment is what differentiates an architect from someone who just knows how to call the API.