Developers who have worked with RAG long enough usually tell the same story: they chose the wrong memory architecture early, built on it for months, and then had to tear it out. The technology didn't fail; the requirements never matched the approach.
The choice between RAG, fine-tuning, and extended context isn't a matter of preference. It's an engineering decision with real costs attached. Get it right and you build something that scales. Get it wrong and, six months later, you're rebuilding a system that's already in production.
This module forces you to make that choice under pressure, across three scenarios where the stakes, the shape of the knowledge, and the right answer all differ. The goal isn't to always pick RAG. It's to argue a specific position with technical and governance reasoning, and to hold that position under challenge.
Three teams are building AI systems that need persistent memory. Each has a different constraint. Each is leaning toward a different solution. Each is asking for a second opinion.
Team A: Legal research assistant. A firm wants an AI that can answer questions about its case files: thousands of documents accumulated over twenty years. The documents change, with new filings added weekly and old decisions revised when appeals come through. Attorneys need to be able to cite the exact document behind any AI answer. The team is leaning toward fine-tuning on the case-file corpus. They believe this will make the AI "speak like a lawyer" and answer questions faster than retrieval would allow.
Team B: Personal developer assistant. A solo developer wants an AI that knows everything they know: their Obsidian notes, their codebase, their decision history, their reading notes. The knowledge set is roughly 4,000 notes spanning five years, updated daily. They're leaning toward extended context: prepend the full vault to every prompt. They have a 200k-token context window and believe enough of the vault will fit.
Team C: Company onboarding bot. A startup wants an AI to answer new-hire questions about company policy, benefits, team structure, and culture. The knowledge is about 150 documents, mostly stable, updated quarterly. They're leaning toward RAG because they've heard it's the modern approach. They haven't considered whether the problem actually requires it.
Each team is about to make an expensive commitment. Each team's instinct might be wrong. And the person they're asking for a second opinion is you.
The right memory architecture is determined by three properties of the knowledge: its volume, its rate of change, and whether answers must be traceable to specific sources. These three properties divide the decision space cleanly.
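One way to make the three properties concrete is to treat them as an ordered checklist. The sketch below is a hypothetical encoding of the rules in the next three paragraphs, not a definitive decision procedure: the `KnowledgeProfile` fields, the order of the checks, and the final fallback are all assumptions, and each boolean hides a judgment call (how often is "often"?) that you still have to argue explicitly in the lab.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeProfile:
    """Hypothetical summary of a knowledge set along the three deciding properties."""
    fits_in_context: bool      # volume: does the whole set fit comfortably in the window?
    changes_often: bool        # rate of change: too often to retrain or treat as stable?
    needs_citations: bool      # auditability: must answers trace to a specific source?
    style_heavy: bool = False  # is fluency in the domain's conventions the main goal?


def recommend(p: KnowledgeProfile) -> str:
    # Auditability or dynamism: retrieval keeps sources traceable and updates cheap.
    if p.needs_citations or p.changes_often:
        return "RAG"
    # Small and stable: prepend everything, the simplest correct option.
    if p.fits_in_context:
        return "extended context"
    # Large and stable: fine-tune when fluency matters more than attribution.
    if p.style_heavy:
        return "fine-tuning"
    # Large, stable, not style-driven (assumed fallback): retrieve only what each query needs.
    return "RAG"
```

For example, `recommend(KnowledgeProfile(fits_in_context=True, changes_often=False, needs_citations=False))` returns `"extended context"`. The ordering is itself a position: putting auditability first means a citation requirement overrides everything else, which is exactly the kind of choice you should be ready to defend.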
Fine-tuning: the knowledge is large, stable, and style-heavy. Fine-tuning teaches a model how to think and communicate in a domain: its vocabulary, its reasoning patterns, its conventions. If the knowledge changes rarely and you care more about fluency than source attribution, fine-tuning is the right investment. The failure mode is staleness: every knowledge update requires retraining, and a fine-tuned model cannot tell you which training example produced a given answer.
Extended context: the knowledge is small, stable, and always relevant. If the full knowledge set fits comfortably in the context window and every query needs access to all of it, prepending the full context is the simplest correct solution. The failure mode is scale: as the knowledge grows or queries become more specialized, you pay the token cost of irrelevant context on every call, and eventually the knowledge no longer fits at all.
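Whether a corpus "fits comfortably" is simple arithmetic worth doing before committing. The sketch below assumes a fixed, hypothetical `reserved_tokens` allowance for the system prompt, the question, and the answer, and treats document sizes as an average; real token counts depend on the tokenizer and on how unevenly sized the documents are.

```python
def per_doc_token_budget(window_tokens: int, reserved_tokens: int, num_docs: int) -> float:
    """Average tokens each document may use if the whole corpus is prepended.

    reserved_tokens is a hypothetical allowance for the system prompt,
    the user's question, and the model's answer.
    """
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    available = window_tokens - reserved_tokens
    if available <= 0:
        raise ValueError("reserved_tokens leaves no room for the corpus")
    return available / num_docs


def daily_context_tokens(corpus_tokens: int, calls_per_day: int) -> int:
    """Tokens spent on injected context per day: the whole corpus rides along on every call."""
    return corpus_tokens * calls_per_day
```

With Team B's numbers and an assumed 8,000-token reserve, `per_doc_token_budget(200_000, 8_000, 4_000)` gives `48.0`: each note would have to average under 48 tokens, metadata included, for the full vault to fit. The second function shows the scale failure mode directly, since the context cost grows with every call whether or not the injected notes are relevant.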
RAG: the knowledge is large, dynamic, or must be auditable. RAG is the right choice when you can't predict in advance which documents a query will need, when the knowledge base changes too often for retraining to be practical, or when you need to trace every answer back to a specific source. The failure mode is retrieval quality: a RAG system answers from whatever the retrieval layer finds, and if the retrieval layer finds the wrong things, the AI answers confidently from noise.
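The traceability argument rests on one structural detail: retrieval returns document identifiers along with text, so the prompt (and therefore the answer) can cite them. The minimal sketch below illustrates that shape. It swaps in a bag-of-words cosine score as a stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the document IDs in the usage note are hypothetical names for illustration only.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    # Lowercased word counts: a crude stand-in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, best first, dropping zero-score documents."""
    q = _vector(query)
    scored = [(doc_id, _cosine(q, _vector(text))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored[:k] if pair[1] > 0]


def build_prompt(query: str, corpus: dict[str, str], k: int = 3) -> str:
    """Label each retrieved passage with its ID so the model can cite its sources."""
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return f"Answer using only the sources below and cite their IDs.\n\n{context}\n\nQuestion: {query}"
```

With a toy corpus such as `{"memo-001": "Appeal reversed the 2019 decision on contract damages", "memo-002": "Weekly filing schedule for the commercial litigation team", "memo-003": "Office holiday calendar"}`, the question "Which decision did the appeal reverse?" ranks `memo-001` first and never injects `memo-003`. A production system would replace the scorer, but the contract stays the same: every passage that reaches the model carries an ID an attorney can check.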
MAP (a core function of the NIST AI RMF) requires identifying the failure modes of the AI system in its context of use. For memory architecture decisions, this means naming the specific failure mode of your chosen approach before committing to it. Fine-tuning fails via confidently stated stale answers. Extended context fails via irrelevant injection at scale. RAG fails via empty or wrong retrieval. Name the failure mode for the approach you're recommending, then explain how you'll detect it.
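For RAG, "explain how you'll detect it" can start with a guard that runs before the model answers. The sketch below flags empty or weak retrieval from a list of `(doc_id, score)` pairs; the `check_retrieval` name, the `0.35` threshold, and the `min_hits` parameter are hypothetical and would need calibrating against queries with known answers.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCheck:
    ok: bool
    reason: str


def check_retrieval(
    hits: list[tuple[str, float]], min_score: float = 0.35, min_hits: int = 1
) -> RetrievalCheck:
    """Flag empty or weak retrieval so the system can decline instead of answering from noise."""
    if not hits:
        return RetrievalCheck(False, "empty retrieval: no documents returned")
    strong = [doc_id for doc_id, score in hits if score >= min_score]
    if len(strong) < min_hits:
        best = max(score for _, score in hits)
        return RetrievalCheck(False, f"weak retrieval: best score {best:.2f} below {min_score}")
    return RetrievalCheck(True, f"{len(strong)} document(s) above threshold")
```

A failed check is both a runtime decision (decline, or ask a clarifying question) and a monitoring signal worth logging. Note what it cannot catch: retrieval that is confidently wrong, with high scores on irrelevant documents, only shows up in evaluation against labeled queries.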
MEASURE requires testing the AI system's behavior against its intended purpose. For memory architectures, the question is: how will you measure whether the approach is working? For RAG, retrieval precision and recall are measurable. For fine-tuning, you need evaluation sets that test both fluency and factual accuracy. For extended context, you need to verify that the model actually uses the injected context rather than ignoring it, for example by asking questions whose answers appear only in that context.
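Precision and recall at a cutoff k are the usual starting metrics for retrieval. The sketch below assumes you have a small labeled evaluation set: for each query, the IDs the retriever returned (best first, no duplicates) and the set of IDs a reviewer marked as relevant. The function names are hypothetical.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k = relevant hits in the top k / k; recall@k = relevant hits / all relevant docs.

    Assumes retrieved is ordered best-first and contains no duplicate IDs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_at_k(eval_set: list[tuple[list[str], set[str]]], k: int) -> tuple[float, float]:
    """Average precision@k and recall@k over (retrieved, relevant) pairs."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    scores = [precision_recall_at_k(retrieved, relevant, k) for retrieved, relevant in eval_set]
    return (
        sum(p for p, _ in scores) / len(scores),
        sum(r for _, r in scores) / len(scores),
    )
```

For example, if the retriever returns `["a", "b", "c", "d"]` and the relevant set is `{"a", "c", "e"}`, then at k = 3 both precision and recall are 2/3. Tracking these numbers over time is what turns "retrieval quality" from a stated failure mode into something you can monitor.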
If the AI system informs decisions that affect others, it may be subject to Art. 13 of the EU AI Act, which requires high-risk systems to be transparent enough for deployers to interpret their output and use it appropriately. RAG provides a transparency advantage: every answer can reference the retrieved document. Fine-tuning cannot provide this traceability. That doesn't rule fine-tuning out, but it is a governance constraint that must be addressed through other means if fine-tuning is chosen for high-stakes applications.
Hybrid approaches are valid but expensive. Build one approach correctly before combining two imperfectly.
In the lab, you'll take a position on each of the three teams' approaches. You'll be challenged. Three things will make your position defensible.
Volume, dynamism, or auditability: which one is the deciding factor for this scenario? State it explicitly. "I'm recommending RAG because the knowledge is dynamic and citations are required" is a defensible position. "RAG is usually better" is not.
Every architecture fails in a specific way. Naming the failure mode of your chosen approach before being challenged shows you understand the tradeoffs. It also defines what you'd monitor in production.
If a counterargument changes your mind, say which argument changed it and why. "You're right that the 150-document corpus fits in context. I'm revising my recommendation to extended context for now, with a migration plan to RAG when the corpus exceeds 200 documents" is a strong position change. "OK maybe" is not.
You'll apply all three to each scenario in the lab.