Leveraging RAG for AI Development
Module 3 of 8
Intro

The Wrong Tool Costs More Than No Tool

2 min read

Developers who've worked with RAG long enough usually have the same story: they chose the wrong memory architecture early, built on it for months, and then had to tear it out. Not because the technology failed — because the requirements never matched the approach.

The choice between RAG, fine-tuning, and extended context isn't a preference. It's an engineering decision with real costs attached. Get it right and you build something that scales. Get it wrong and you're rebuilding in six months with the added complexity of a system that's already in production.

This module forces you to make that choice under pressure — across three scenarios where the stakes are different, the knowledge shapes are different, and the right answer is different each time. The goal isn't to always pick RAG. It's to argue a specific position with technical and governance reasoning, and hold that position under challenge.

Your artifact — Debate
A written defense of a memory architecture position across three developer scenarios — position stated, technically justified, and held or revised under challenge with explicit reasoning for any position change.
  • Apply the size/dynamism/auditability framework to real architecture decisions
  • Argue for a specific memory approach with technical justification
  • Defend a position under challenge without retreating to "it depends"
  • Identify the NIST MAP risk profile for each memory architecture option
  • Recognize when a hybrid approach is justified versus when it's a way to avoid committing to a decision
Scenario

Three Teams, Three Problems

3 min read

Three teams are building AI systems that need persistent memory. Each has a different constraint. Each is leaning toward a different solution. Each is asking for a second opinion.

Team A — Legal research assistant: A firm wants an AI that can answer questions about their case files — thousands of documents accumulated over twenty years. The documents change: new filings added weekly, old decisions revised when appeals come through. Attorneys need to be able to cite the exact document behind any AI answer. The team is leaning toward fine-tuning on the case file corpus. They believe this will make the AI "speak like a lawyer" and answer questions faster than retrieval would allow.

Team B — Personal developer assistant: A solo developer wants an AI that knows everything they know — their Obsidian notes, their codebase, their decision history, their reading notes. The knowledge set is roughly 4,000 notes spanning five years, updated daily. They're leaning toward extended context: prepend the full vault to every prompt. They have a 200k-token context window and believe they can fit enough.

Team C — Company onboarding bot: A startup wants an AI to answer new-hire questions about company policy, benefits, team structure, and culture. The knowledge is about 150 documents, mostly stable, updated quarterly. They're leaning toward RAG because they've heard it's the modern approach. They haven't considered whether the problem actually requires it.

Each team is about to make an expensive commitment. Each team's instinct might be wrong. And the person they're asking for a second opinion is you.

Lesson

When Each Approach Wins

3 min read

The right memory architecture is determined by three properties of the knowledge: its volume, its rate of change, and whether answers must be traceable to specific sources. These three properties divide the decision space cleanly.

Fine-tuning wins when the knowledge is large, stable, and style-heavy. Fine-tuning teaches a model how to think and communicate in a domain: its vocabulary, its reasoning patterns, its conventions. If the knowledge changes rarely and you care more about fluency than source attribution, fine-tuning is the right investment. The failure mode is staleness: every knowledge update requires retraining, and a fine-tuned model cannot tell you which training example produced a given answer.

Extended context wins when the knowledge is small, stable, and always relevant. If the full knowledge set fits comfortably in the context window and every query needs access to all of it, prepending the full context is the simplest correct solution. The failure mode is scale: as knowledge grows or queries specialize, you pay the token cost of irrelevant context on every call.
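Whether a knowledge set "fits comfortably" can be checked with rough arithmetic before committing. The sketch below is one way to do that, not a real tokenizer: the 4-characters-per-token ratio and the 8,000-token reserve are assumptions chosen for illustration, and the names `ContextBudget`, `estimate_tokens`, `fits_in_context`, and `tokens_per_document` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: roughly 4 characters per token for English prose.
# A real tokenizer for your model will give different numbers.
CHARS_PER_TOKEN = 4


@dataclass(frozen=True)
class ContextBudget:
    window_tokens: int    # total context window of the model
    reserved_tokens: int  # room kept free for the question and the answer

    def available(self) -> int:
        """Tokens left for prepended knowledge."""
        return self.window_tokens - self.reserved_tokens


def estimate_tokens(total_chars: int) -> int:
    """Estimate tokens from a character count (ceiling division)."""
    return -(-total_chars // CHARS_PER_TOKEN)


def fits_in_context(total_chars: int, budget: ContextBudget) -> bool:
    """True if the whole knowledge set fits in the available budget."""
    return estimate_tokens(total_chars) <= budget.available()


def tokens_per_document(doc_count: int, budget: ContextBudget) -> int:
    """Average tokens each document may use if all are prepended."""
    return budget.available() // doc_count


if __name__ == "__main__":
    # 200k-token window and 4,000 notes, as in the Team B scenario.
    budget = ContextBudget(window_tokens=200_000, reserved_tokens=8_000)
    print(tokens_per_document(4_000, budget))  # average tokens allowed per note
```

Under these assumptions, a per-document allowance that is much smaller than your typical document means the full set will not fit and something has to select what goes into the prompt.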

RAG wins when the knowledge is large, dynamic, or must be auditable. RAG is the right choice when you can't predict which documents are relevant at query time, when the knowledge base changes frequently enough to make retraining impractical, or when you need to trace every answer back to a specific source. The failure mode is retrieval quality: a RAG system returns what the retrieval layer finds, and if the retrieval layer finds the wrong things, the AI answers confidently from noise.
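The three cases above can be written down as a first-pass decision rule. This is a minimal sketch of one literal reading of the lesson, not an authoritative procedure: the rule order (auditability first, then dynamism, then volume) and all names (`KnowledgeProfile`, `Recommendation`, `recommend`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import NamedTuple


class Approach(Enum):
    FINE_TUNING = "fine-tuning"
    EXTENDED_CONTEXT = "extended context"
    RAG = "RAG"


@dataclass(frozen=True)
class KnowledgeProfile:
    fits_in_context: bool      # volume: does the full set fit in the window?
    changes_often: bool        # dynamism: too frequent to retrain or reload?
    needs_citations: bool      # auditability: must answers cite a source?
    style_heavy: bool = False  # is domain fluency the main goal?


class Recommendation(NamedTuple):
    approach: Approach
    deciding_property: str


def recommend(profile: KnowledgeProfile) -> Recommendation:
    """First-pass recommendation; the rule order is an assumption."""
    if profile.needs_citations:
        return Recommendation(Approach.RAG, "auditability")
    if profile.changes_often:
        return Recommendation(Approach.RAG, "dynamism")
    if profile.fits_in_context:
        return Recommendation(Approach.EXTENDED_CONTEXT, "volume")
    if profile.style_heavy:
        return Recommendation(Approach.FINE_TUNING, "volume")
    # Large and stable, but fluency is not the goal: retrieval still applies.
    return Recommendation(Approach.RAG, "volume")


if __name__ == "__main__":
    # A large, stable, style-heavy corpus with no citation requirement.
    profile = KnowledgeProfile(
        fits_in_context=False,
        changes_often=False,
        needs_citations=False,
        style_heavy=True,
    )
    print(recommend(profile))
```

Returning the deciding property alongside the approach mirrors what the debate asks for: a recommendation plus the property that drives it.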

NIST AI RMF — MAP Function

MAP requires identifying the failure modes of the AI system in context. For memory architecture decisions, this means naming the specific failure mode of your chosen approach before committing to it. Fine-tuning fails via stale confident answers. Extended context fails via irrelevant injection at scale. RAG fails via empty or wrong retrieval. Name the failure mode for the approach you're recommending, then explain how you'll detect it.
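For the RAG failure mode named above, detection can start with a simple check on what the retrieval layer returned. The sketch below assumes the retriever exposes a similarity score per result where higher means more similar; the `0.5` threshold and the names `RetrievalStatus` and `check_retrieval` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class RetrievalStatus(Enum):
    EMPTY = "empty"  # nothing was retrieved
    WEAK = "weak"    # results exist but none scored above the threshold
    OK = "ok"        # at least one result scored above the threshold


def check_retrieval(scores: Sequence[float], min_score: float = 0.5) -> RetrievalStatus:
    """Classify one retrieval call from its similarity scores.

    Assumption: higher scores mean closer matches, and 0.5 is a placeholder
    threshold that would need tuning against your own data.
    """
    if not scores:
        return RetrievalStatus.EMPTY
    if max(scores) < min_score:
        return RetrievalStatus.WEAK
    return RetrievalStatus.OK


if __name__ == "__main__":
    print(check_retrieval([]))            # RetrievalStatus.EMPTY
    print(check_retrieval([0.21, 0.34]))  # RetrievalStatus.WEAK
    print(check_retrieval([0.21, 0.82]))  # RetrievalStatus.OK
```

A check like this catches empty and weak retrieval only. Wrong retrieval with a high score looks healthy here and needs a labeled evaluation set to detect.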

NIST AI RMF — MEASURE Function

MEASURE requires testing the AI system's behavior against its intended purpose. For memory architectures, this means: How will you measure whether the approach is working? For RAG, retrieval precision and recall are measurable. For fine-tuning, you need evaluation sets that test both fluency and factual accuracy. For extended context, you need to verify the model actually uses the injected context and doesn't ignore it.
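Retrieval precision and recall can be computed per query once you have a labeled set of relevant documents. The following is a minimal sketch of precision@k and recall@k; the function name and the document IDs in the example are hypothetical, and it assumes each retrieved ID appears once in the ranked list.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one query.

    retrieved: document IDs in ranked order, best first (no duplicates).
    relevant:  IDs a human labeled as relevant to the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Precision divides by k even if fewer than k results came back,
    # so a retriever that returns too little is penalized.
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]
    labeled = {"a", "d", "e"}
    print(precision_recall_at_k(ranked, labeled, k=3))
    print(precision_recall_at_k(ranked, labeled, k=4))
```

Averaging these values over a fixed set of test queries gives a number you can track before and after changes to the retrieval layer.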

EU AI Act — Article 13: Transparency

Article 13 applies to high-risk AI systems: they must be transparent enough that deployers can interpret the system's output and use it appropriately. RAG provides a transparency advantage: every answer can reference the retrieved document. A fine-tuned model cannot provide this traceability on its own. This is not a blocker for fine-tuning, but it is a governance constraint that must be addressed through other means if fine-tuning is chosen for high-stakes applications.
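The traceability advantage only holds if the retrieved sources travel with the answer. One way to enforce that is sketched below, assuming each retrieved chunk carries a document ID; the types `SourceChunk` and `CitedAnswer`, the function `format_with_citations`, and the sample IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceChunk:
    doc_id: str  # identifier of the document the chunk came from
    text: str    # the retrieved passage given to the model


@dataclass(frozen=True)
class CitedAnswer:
    text: str
    sources: tuple[SourceChunk, ...]


def format_with_citations(answer: CitedAnswer) -> str:
    """Render an answer with its source IDs; refuse answers with none."""
    if not answer.sources:
        raise ValueError("answer has no retrieved source to cite")
    # Deduplicate and sort so the citation list is stable across runs.
    doc_ids = sorted({chunk.doc_id for chunk in answer.sources})
    return f"{answer.text} [sources: {', '.join(doc_ids)}]"


if __name__ == "__main__":
    answer = CitedAnswer(
        text="Example answer text.",
        sources=(
            SourceChunk("doc-002", "passage two"),
            SourceChunk("doc-001", "passage one"),
            SourceChunk("doc-002", "another passage"),
        ),
    )
    print(format_with_citations(answer))
```

Raising an error on an uncited answer is a design choice for this sketch: it turns a missing source into a visible failure instead of an unattributed response.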

Hybrid approaches are valid but expensive. Build one approach correctly before combining two imperfectly.

Context

How to Hold a Position

2 min read

In the lab, you'll take a position on each of the three teams' approaches. You'll be challenged. Three things will make your position defensible.

Name the property that drives the decision

Volume, dynamism, or auditability — which one is the deciding factor for this scenario? State it explicitly. "I'm recommending RAG because the knowledge is dynamic and citations are required" is a defensible position. "RAG is usually better" is not.

Name the failure mode of your recommendation

Every architecture fails in a specific way. Naming the failure mode of your chosen approach before being challenged shows you understand the tradeoffs. It also defines what you'd monitor in production.

Changing your position is acceptable — if you name the reason

If a counterargument changes your mind, say which argument changed it and why. "You're right that the 150-document corpus fits in context — I'm revising my recommendation to extended context for now, with a migration plan to RAG when the corpus exceeds 200 documents" is a strong position change. "OK maybe" is not.

You'll apply all three to each scenario in the lab.

⚔ Debate Lab
Architecture Decision Debate
~25 minutes · 3 scenarios
What you're doing
Evaluate each team's instinct. Take a position: confirm their approach, redirect them to a different one, or recommend a hybrid with specific conditions. Defend your reasoning under challenge.
Roles
🤔 You (Architect on Call): You're giving the second opinion. Take a clear position and defend it with technical and governance reasoning.
🎯 AI (Skeptical CTO): I'll push back on every recommendation. I want the failure mode named and the governance constraint addressed.
Three scenarios
Legal research (fine-tune instinct) · Personal dev assistant (context window instinct) · Onboarding bot (RAG instinct)
Framework — apply to each
Volume, dynamism, or auditability — which drives the decision?
Name the failure mode of your recommendation
NIST MAP — what breaks and how will you detect it?
Art.13 — if answers must be traceable, fine-tuning alone won't do
Success criteria
A clear recommendation for each team. Position changes are fine — but every change needs an explicit reason.