Leveraging RAG for AI Development
Module 1 of 8
Intro

Memory Is the Problem

2 min read

Every AI system you build starts fresh. Send it a message, get a response, send another — and unless you explicitly feed it the prior conversation, it remembers nothing. That's fine for one-off tasks. It's a serious problem for anything that needs to know who you are, what you've decided, what you've built, or where you left off.

Developers have tried to solve this three ways: stuff more context into the prompt window, fine-tune the model on their data, or build a retrieval layer that fetches the right information at the right time. The third approach, Retrieval-Augmented Generation (RAG), is the one that scales when knowledge is large or changes often. This course is about building it properly, using Obsidian as your knowledge store.

This module establishes the baseline: what RAG is, what it isn't, and how to evaluate whether a given memory problem is a RAG problem or something else entirely.

Your artifact — Skill
A RAG strategy audit for an AI tool you're familiar with — evaluating its memory approach, identifying where retrieval would improve it, and recommending a specific implementation path with justification.
By the end of this module, you will be able to:
  • Distinguish RAG from fine-tuning and extended context window approaches
  • Identify which memory problems RAG solves and which it doesn't
  • Evaluate a real AI tool's memory approach against the three options
  • Apply NIST MAP thinking to understand RAG as a system with failure modes
  • Write a specific, justified retrieval strategy recommendation
Scenario

The Assistant That Forgets

3 min read

A software team builds an internal AI assistant to help engineers navigate their codebase. The assistant can answer questions about specific files, explain architecture decisions, and suggest implementation patterns. Within a week, engineers love it.

Within a month, they have a problem. The assistant doesn't know anything that happened before the current conversation. Ask it about a decision made last quarter and it has nothing. Ask it to explain why the team chose one database over another — a decision documented in a 47-page architecture review — and it makes something up. Ask it who owns a particular service, and it guesses.

The team's first instinct: use more of the context window. They start prepending 10,000 tokens of documentation to every request. Costs triple. Latency jumps. And the assistant still doesn't know what it doesn't know. It's just reading more of the wrong documents.

Their second instinct: fine-tune the model on their codebase. They spend three weeks preparing data, run the fine-tune, and deploy. The model now speaks their codebase's idioms fluently. But it still doesn't know about the architecture review — that document wasn't in the fine-tune set. And when the codebase changes, the fine-tune is instantly stale. Every update requires retraining.

Their third option, which they haven't tried: build a retrieval layer. Index the documents that matter. When a question comes in, retrieve the relevant chunks, inject them into the prompt, and let the model answer from actual sources. The architecture review is in the index. The ownership docs are in the index. When things change, update the index — not the model.

The choice between these three approaches isn't obvious from the outside. Each has costs. Each has failure modes. Understanding them is the difference between building something that scales and building something you'll replace in six months.

Lesson

Three Ways to Give AI Memory

4 min read

RAG is not the only way to give an AI system persistent knowledge. It's the right choice in specific situations. Understanding those situations requires understanding what the other options actually are.

Option 1: Extended context window

You inject the information directly into every prompt. If the AI needs to know about a 50-page document, you include the full document in the system prompt or user message. Many current models accept 100k to 200k tokens or more, so this is technically feasible for a lot of use cases.

When it works: Small, stable knowledge sets. One reference document. A fixed set of rules. When the relevant information is always relevant.

When it breaks: Large or dynamic knowledge bases. When you don't know in advance which documents are relevant. When costs from token volume become prohibitive. When latency from processing massive contexts becomes user-noticeable.
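The cost point above is easy to check with arithmetic. The sketch below is a rough estimate only: the four-characters-per-token rule of thumb, the request volume, and the price per million input tokens are all assumptions chosen for illustration, not figures from any provider.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate, assuming about 4 characters per token."""
    return len(text) // 4


def monthly_context_cost(
    context_tokens: int,
    requests_per_day: int,
    usd_per_million_input_tokens: float,
    days: int = 30,
) -> float:
    """Cost of re-sending the same injected context with every request."""
    total_tokens = context_tokens * requests_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_input_tokens


# Hypothetical numbers: a 100k-token document sent with 500 requests a day
# at an assumed price of $3 per million input tokens.
example_cost = monthly_context_cost(
    context_tokens=100_000,
    requests_per_day=500,
    usd_per_million_input_tokens=3.0,
)
print(f"${example_cost:,.2f} per month")  # $4,500.00 per month
```

The injected context is paid for on every request whether or not the question needs it, which is why this option suits small, always-relevant knowledge sets.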

Option 2: Fine-tuning

You bake the knowledge into the model's weights by training on your data. The model learns your terminology, your patterns, your domain. It answers faster because it doesn't need retrieved documents: it already knows.

When it works: Stable knowledge that changes rarely. Domain-specific tone, style, or vocabulary. Teaching a model how to reason in your domain, not just what to know.

When it breaks: Dynamic knowledge — anything that changes more often than you can retrain. Specific factual claims that must be verifiably sourced. Cases where you need to audit exactly why the model said what it said. Fine-tuning is opaque: you can't trace a specific answer back to a specific source.

Option 3: Retrieval-Augmented Generation (RAG)

You maintain a separate knowledge store (a set of documents, notes, records) and build a retrieval layer that queries it at inference time. When a question comes in, the retrieval layer finds the relevant pieces, injects them into the prompt, and the model answers from those sources.

When it works: Large knowledge bases. Knowledge that changes frequently. Cases where you need source attribution — you can trace every answer back to a specific retrieved document. Obsidian vaults, company wikis, conversation histories, codebases.

When it breaks: Poorly structured knowledge stores. Bad chunking strategies that retrieve the wrong pieces. Retrieval systems that fail silently — the model answers confidently from nothing because the retrieval returned empty.
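The retrieve-then-inject loop, and the silent-failure case, can be sketched in a few lines. This is a minimal illustration, not a production design: keyword-overlap cosine similarity stands in for the embedding search a real system would likely use, and the names `Chunk`, `retrieve`, `build_prompt`, the `min_score` threshold, and the sample notes are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Optional

# Tiny illustrative stopword list so matches depend on content words.
STOPWORDS = frozenset(
    {"a", "an", "and", "by", "did", "for", "in", "is", "of", "the", "to", "we", "what", "who", "why"}
)


@dataclass(frozen=True)
class Chunk:
    source: str  # e.g. the path of a note in the knowledge store
    text: str


def tokenize(text: str) -> Counter:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2, min_score: float = 0.1) -> list[Chunk]:
    """Return up to k chunks scoring at least min_score, best first."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(c.text)), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query: str, chunks: list[Chunk]) -> Optional[str]:
    """Return None on empty retrieval so the caller cannot answer from nothing."""
    if not chunks:
        return None
    context = "\n\n".join(f"[{c.source}]\n{c.text}" for c in chunks)
    return (
        "Answer using only the sources below and cite them by name.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical notes standing in for an indexed knowledge store.
NOTES = [
    Chunk("adr/database.md", "We chose PostgreSQL over MongoDB because the billing data is relational."),
    Chunk("ownership.md", "The payments service is owned by the platform team."),
    Chunk("style.md", "Use four spaces for indentation."),
]

hits = retrieve("Why did we choose PostgreSQL?", NOTES)
print([c.source for c in hits])
print(build_prompt("What is on the cafeteria menu today?", retrieve("cafeteria menu today", NOTES)))
```

Returning `None` instead of an empty context forces the calling code to handle "nothing was retrieved" explicitly, for example by telling the user no source was found, rather than letting the model answer confidently from nothing.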

NIST AI RMF — MAP Function

Before choosing a memory architecture, MAP the system: What knowledge does the AI need access to? Who creates and maintains that knowledge? What happens when the knowledge is wrong, stale, or missing? For RAG systems, the MAP function requires naming the retrieval failure modes specifically: failed retrieval, wrong chunk retrieved, stale index, no source for the claim.
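One way to make the four named failure modes concrete is to give each a name in code and check for it on every request. The sketch below is an assumption-laden illustration: the `RetrievalResult` fields, the score threshold, and the seven-day index age limit are hypothetical, and a low top score is only a rough proxy for "wrong chunk retrieved".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class RetrievalFailure(Enum):
    FAILED_RETRIEVAL = "failed retrieval"
    WRONG_CHUNK = "wrong chunk retrieved"
    STALE_INDEX = "stale index"
    NO_SOURCE = "no source for the claim"


@dataclass
class RetrievalResult:
    scores: list[float]        # similarity score of each retrieved chunk
    indexed_at: datetime       # when the index was last rebuilt
    cited_sources: list[str]   # sources the final answer cites


def diagnose(
    result: RetrievalResult,
    now: datetime,
    min_score: float = 0.3,                        # hypothetical threshold
    max_index_age: timedelta = timedelta(days=7),  # hypothetical freshness limit
) -> list[RetrievalFailure]:
    """Return every failure mode that applies to this retrieval."""
    failures = []
    if not result.scores:
        failures.append(RetrievalFailure.FAILED_RETRIEVAL)
    elif max(result.scores) < min_score:
        # Heuristic: a weak best match suggests the wrong chunk came back.
        failures.append(RetrievalFailure.WRONG_CHUNK)
    if now - result.indexed_at > max_index_age:
        failures.append(RetrievalFailure.STALE_INDEX)
    if not result.cited_sources:
        failures.append(RetrievalFailure.NO_SOURCE)
    return failures
```

Logging the returned list per request gives a running count of each failure mode, which is the kind of evidence the audit questions later in this module ask for.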

NIST AI RMF — GOVERN Function

A RAG system is an AI system with two distinct components — the retrieval layer and the generation layer — each requiring governance. Who owns the knowledge store? Who decides what goes in and what comes out? Who audits retrieval quality? Governance of a RAG system means governance of both halves.

EU AI Act — Article 13: Transparency

RAG has a transparency advantage over fine-tuning: each answer can be traced to the specific sources that were retrieved for it. That supports the transparency obligations Art. 13 places on high-risk AI systems, because users and deployers can see which documents informed the response. Attribution supports compliance but does not establish it on its own. Build source attribution into your RAG pipeline from the start, not as an afterthought.
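Building attribution in from the start can be as simple as never returning answer text without the list of sources that were placed in the prompt. The following is a sketch under assumptions: `AttributedAnswer` and `answer_with_sources` are hypothetical names, and `generate` is a caller-supplied function wrapping whatever model is in use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AttributedAnswer:
    text: str
    sources: tuple[str, ...]

    def render(self) -> str:
        """Answer text followed by the sources that were in the prompt."""
        if not self.sources:
            return f"{self.text}\n\nSources: none retrieved"
        listing = "\n".join(f"- {s}" for s in self.sources)
        return f"{self.text}\n\nSources:\n{listing}"


def answer_with_sources(
    question: str,
    retrieved: list[tuple[str, str]],   # (source name, chunk text) pairs
    generate: Callable[[str], str],     # hypothetical wrapper around a model call
) -> AttributedAnswer:
    context = "\n\n".join(f"[{source}]\n{text}" for source, text in retrieved)
    prompt = f"{context}\n\nQuestion: {question}"
    # De-duplicate source names while keeping retrieval order.
    sources = tuple(dict.fromkeys(source for source, _ in retrieved))
    return AttributedAnswer(text=generate(prompt), sources=sources)
```

Note what this does and does not show: the source list records which documents were given to the model, not proof that the model relied on them.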

The right choice depends on the shape of your knowledge: its size, its rate of change, and how much you need to audit the outputs.

Context

How to Audit a Memory Approach

2 min read

In the lab, you'll evaluate an AI tool's memory approach and recommend a retrieval strategy. Three questions will structure your analysis.

Question 1 — How large and how dynamic is the knowledge?

Knowledge that's small and stable is a context window problem. Knowledge that's medium and mostly stable might be a fine-tuning problem. Knowledge that's large, dynamic, or needs source attribution is a RAG problem. Size and rate-of-change are the primary filters.
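The filter in Question 1 can be written down as a small decision function. This is a sketch only: the text gives no numeric cutoffs, so the token limits and the changes-per-month threshold below are hypothetical placeholders to tune for your own case.

```python
from enum import Enum


class Approach(Enum):
    CONTEXT_WINDOW = "extended context window"
    FINE_TUNING = "fine-tuning"
    RAG = "retrieval-augmented generation"


def recommend(
    size_tokens: int,
    changes_per_month: float,
    needs_attribution: bool,
    small_limit: int = 50_000,        # hypothetical: fits comfortably in a prompt
    medium_limit: int = 5_000_000,    # hypothetical: plausible fine-tuning corpus
    stable_limit: float = 1.0,        # hypothetical: at most one change a month
) -> Approach:
    """Apply the size and rate-of-change filters from Question 1."""
    if needs_attribution:
        return Approach.RAG
    dynamic = changes_per_month > stable_limit
    if dynamic:
        return Approach.RAG
    if size_tokens <= small_limit:
        return Approach.CONTEXT_WINDOW
    if size_tokens <= medium_limit:
        return Approach.FINE_TUNING
    return Approach.RAG


print(recommend(size_tokens=20_000, changes_per_month=0.5, needs_attribution=False).value)
```

The function only covers the first question. The recommendation it returns still has to survive Questions 2 and 3 before it goes into your audit.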

Question 2 — What are the failure modes?

For each option: What does "wrong" look like, and how often will it happen? Context window failures look like irrelevant injection. Fine-tuning failures look like confident hallucination from stale weights. RAG failures look like empty retrieval returned as confident answers. Name the failure modes for the option you're recommending, not just its advantages.

Question 3 — Can you audit the output?

Can you trace the AI's answer back to its source? For tools making decisions that affect others, auditability isn't optional. RAG wins on auditability — every chunk retrieved is traceable. Fine-tuning loses — you can't tell why the model said what it said. Extended context wins conditionally — you injected the source, but did the model actually use it?

You'll apply all three to a real tool in the lab.

⚙ Skill Lab
RAG Strategy Audit
~20 minutes · 1 tool audit
What you're doing
Choose an AI tool you actually use — a coding assistant, a writing tool, an internal assistant. Evaluate its memory approach. Then recommend a specific retrieval strategy to improve it.
Roles
🔍 You (Architect): You evaluate the tool's memory approach and propose an improvement. Be specific about what you'd build and why.
🎯 AI (Technical Reviewer): I'll probe your analysis. I want specifics: not "RAG would help" but how, where, and what would fail.
Analysis framework
How large and how dynamic is the knowledge?
What are the failure modes for each option?
Can you audit the output back to a source?
NIST MAP — Name the retrieval failure modes specifically
Success criteria
A clear recommendation with justification. Not "RAG is better" but "this tool needs a retrieval layer for X reason, using Obsidian as the store, with Y as the primary failure mode to design against."