Leveraging RAG for AI Development
Module 6 of 8
Intro

Capture Is a Pipeline, Not a Habit

2 min read

A RAG system is only as good as what's in it. The best retrieval pipeline in the world can't surface knowledge that was never captured. Most developers build the retrieval layer carefully and leave the capture layer to chance — they take notes when they remember, tag them when they feel like it, and process them when they have time.

That's not a system. That's wishful thinking.

The knowledge that matters most is the knowledge generated in the moment of doing: the decision made at the end of a long code review, the tradeoff surfaced in a Claude conversation at 11 PM, the architecture call settled in a Slack thread that nobody archived. This knowledge disappears not because it was forgotten — but because no trigger existed to capture it.

This module is about building capture as a pipeline, not a habit. You'll design the triggers, the quality gates, and the governance policy that determine what enters your retrieval store — and what stays out.

Your artifact — Build
An automated memory capture workflow — trigger conditions, note templates, tagging rules, indexing schedule, and a governance policy specifying what gets stored and who reviews it.
  • Design event-driven capture triggers for a developer knowledge workflow
  • Define a note quality gate that blocks noise from entering the indexed vault
  • Build a staging area process with explicit review criteria
  • Write a governance policy specifying vault scope, exclusions, and ownership
  • Apply NIST GOVERN to define accountability for what enters the knowledge base
Scenario

The Vault Has a Retrieval Problem

3 min read

A developer has spent three months building a RAG pipeline on their Obsidian vault. The retrieval layer works. Embeddings are clean. Chunk sizes are tuned. Semantic search returns relevant results for what's in the vault.

The problem is the vault itself.

It contains everything they've written deliberately — book notes, project specs, reference pages they moved from Notion. What it doesn't contain is everything they've generated in the flow of actual work. Their conversations with Claude, where most of their reasoning actually happens, aren't in the vault. Their pull request comments — where they explained why they made a specific tradeoff — aren't in the vault. Architecture decisions settled in a Slack thread at the end of a sprint aren't there. The code review that surfaced a security assumption isn't there.

The knowledge that matters most is the knowledge generated in the moment of doing. None of it makes it into the retrieval store.

The developer has tried adding a "daily notes" practice. It lasts two weeks before it breaks down under deadline pressure. They've tried a weekly review. Same result. Every manual habit collapses when the work gets intense — which is exactly when the most important decisions are being made.

The solution isn't more discipline. It's a capture pipeline with event-driven triggers, quality gates, and a governance policy that runs whether or not the developer remembers to run it.

Lesson

Memory Capture as a Pipeline

4 min read

A capture pipeline has three stages: triggers that generate notes, quality gates that filter them, and governance rules that define scope. Skip any stage and the pipeline degrades — either missing knowledge worth capturing, or flooding the vault with noise that degrades retrieval quality.

Instead of "remember to take notes," define triggers — specific events that automatically generate or prompt for a note. A trigger has three parts: the event that fires it, the template it produces, and the destination where the draft lands.

  • Code review completed → generate a decision note with the PR number, the tradeoff surfaced, and the reasoning behind the final call.
  • Claude conversation exceeds 20 exchanges → extract key decisions and open questions into a staging note.
  • Daily journal completed → tag and route to the appropriate domain folder.
  • Architecture decision record written → index immediately with the ADR tag, bypassing staging.

The value of a trigger is that it fires on events that already happen in your workflow. You're not adding a new habit — you're attaching a capture action to something you already do.
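A minimal sketch of the three-part trigger described above (event, template, destination), assuming the workflow can emit events as a name plus a payload dictionary. The event names, folder paths, and the `handle_event` helper are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


def always(payload: dict) -> bool:
    """Default condition: fire on every occurrence of the event."""
    return True


@dataclass(frozen=True)
class Trigger:
    event: str                      # the event that fires the trigger
    template: str                   # the note template it produces
    destination: str                # where the draft lands
    should_fire: Callable[[dict], bool] = always


# Hypothetical event names and folder paths (assumptions, not a real API).
TRIGGERS = [
    Trigger(
        event="code_review_completed",
        template="Decision note for PR #{pr_number}\n"
                 "Tradeoff: {tradeoff}\nReasoning: {reasoning}",
        destination="staging/decisions",
    ),
    Trigger(
        event="claude_conversation_updated",
        template="Key decisions: {decisions}\nOpen questions: {open_questions}",
        destination="staging/conversations",
        # Fire only once the conversation exceeds 20 exchanges.
        should_fire=lambda payload: payload.get("exchanges", 0) > 20,
    ),
    Trigger(
        event="adr_written",
        template="ADR: {title}\n{body}",
        destination="vault/adr",    # indexed immediately, bypassing staging
    ),
]


def handle_event(event: str, payload: dict) -> list[tuple[str, str]]:
    """Return (destination, draft_text) for every trigger that fires."""
    drafts = []
    for trigger in TRIGGERS:
        if trigger.event == event and trigger.should_fire(payload):
            drafts.append((trigger.destination, trigger.template.format(**payload)))
    return drafts
```

Keeping triggers as data rather than scattered scripts means the full list of what can fire is visible in one place, which makes it easier to review when triggers start to accumulate.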

Automated capture generates noise unless constrained. A quality gate is a checklist that runs before a note leaves staging and enters the indexed vault. A well-designed gate checks three things: self-containment (can someone understand this note without context from the conversation that generated it?), taxonomy coverage (does it have at least one domain tag and one note-type tag?), and scope compliance (does this content belong in the vault at all, per the governance policy?).

Notes that fail the gate stay in the staging area; they never go directly to the indexed vault. The staging area is where notes wait for human review. It is not an open-ended holding bin: it has a defined review cadence and a clear definition of "ready to index."
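The three gate checks can be sketched as a pass/fail function. This is an illustration under stated assumptions: the tag names are a hypothetical taxonomy, the excluded-tag set stands in for a real governance policy, and self-containment is recorded as a reviewer's judgment rather than inferred automatically.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy and exclusions; replace with your own.
DOMAIN_TAGS = {"domain/backend", "domain/security"}
TYPE_TAGS = {"type/decision", "type/adr", "type/question"}
EXCLUDED_TAGS = {"personal", "draft"}


@dataclass
class Note:
    title: str
    body: str
    tags: set[str] = field(default_factory=set)
    self_contained: bool = False    # set by a human reviewer, not inferred


def gate_failures(note: Note) -> list[str]:
    """Return the reasons a note fails the gate (empty list means pass)."""
    failures = []
    if not note.self_contained:
        failures.append("self-containment: not confirmed by a reviewer")
    if not note.tags & DOMAIN_TAGS:
        failures.append("taxonomy: missing domain tag")
    if not note.tags & TYPE_TAGS:
        failures.append("taxonomy: missing note-type tag")
    if note.tags & EXCLUDED_TAGS:
        failures.append("scope: carries an excluded tag")
    return failures


def ready_to_index(note: Note) -> bool:
    """A note leaves staging only when every check passes."""
    return not gate_failures(note)
```

Returning the list of failures, rather than a bare boolean, gives the reviewer a concrete reason for each note that stays in staging.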

Not everything should go into the RAG knowledge base. Personal information, draft thinking, and sensitive communications need explicit scope rules. The governance policy defines: what content types belong in the vault, what is explicitly excluded, and who has authority to add or remove content categories.

Without a governance policy, scope creep degrades retrieval quality over time. When the vault contains a mix of finished decisions, half-formed thoughts, and personal diary entries, every query retrieves a mix of signal and noise — and the developer can't tell which is which.

The governance policy is not a document you write once and forget. It's a living constraint that the automation enforces at capture time.
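One way to make the policy something automation can enforce is to express it as data. The sketch below assumes each captured note carries a content-type label; the class name, the labels, and the owner identifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GovernancePolicy:
    owner: str              # who has authority to change content categories
    allowed: set[str]       # content types that belong in the vault
    excluded: set[str]      # content types that are explicitly out of scope

    def admits(self, content_type: str) -> bool:
        """Exclusions win, and unknown content types are rejected by default."""
        return content_type in self.allowed and content_type not in self.excluded

    def exclude(self, content_type: str, requested_by: str) -> None:
        """Move a content type out of scope; only the owner may do this."""
        if requested_by != self.owner:
            raise PermissionError(f"{requested_by} cannot change vault scope")
        self.excluded.add(content_type)
        self.allowed.discard(content_type)


# Hypothetical policy for illustration.
policy = GovernancePolicy(
    owner="vault-owner",
    allowed={"decision", "adr", "code-review"},
    excluded={"personal-diary", "private-message"},
)
```

Rejecting unknown content types by default is a design choice: a new trigger cannot widen vault scope until the owner adds its content type explicitly.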

NIST GOVERN — Capture Pipeline Ownership

GOVERN, one of the four core functions of the NIST AI Risk Management Framework, requires defining who owns the AI risk management process. Applied to memory capture: who owns the capture pipeline configuration? Who approves changes to what triggers fire? Who has authority to modify the governance policy that determines vault scope? Without clear ownership, the pipeline drifts: triggers accumulate, exclusions erode, and nobody is accountable when the vault starts returning garbage.

NIST MANAGE — Remediation When Automation Misfires

MANAGE is the NIST AI Risk Management Framework function that covers what happens when a risk materializes. For capture pipelines, the specific risk is that automated capture stores something it shouldn't: sensitive communications, draft content that misrepresents a decision, or personal information. MANAGE requires a defined remediation process: how is the content identified? Who reviews and removes it? How is the indexing refreshed after removal? The process must exist before the misfire happens, not after.
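The remediation questions above map onto a small, auditable routine. This sketch assumes the index can be modeled as a dictionary from note ID to indexed text; a real vector store would need its own delete and refresh calls, which are not shown here. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RemediationRecord:
    note_id: str
    reviewer: str           # who reviewed and removed the content
    reason: str             # how the content was identified as a misfire
    removed_at: datetime


def remediate(
    index: dict[str, str],
    note_id: str,
    reviewer: str,
    reason: str,
    audit_log: list[RemediationRecord],
) -> bool:
    """Remove a note from the index and record who removed it and why.

    Returns True when something was removed, which signals that the
    index needs to be refreshed.
    """
    if note_id not in index:
        return False
    del index[note_id]
    audit_log.append(
        RemediationRecord(note_id, reviewer, reason, datetime.now(timezone.utc))
    )
    return True
```

The audit log is what lets the process answer "who reviewed and removed it" after the fact, not only at the moment of removal.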

UNESCO Accountability — Incorrect Information Retrieved and Used

UNESCO's Recommendation on the Ethics of Artificial Intelligence requires human accountability for AI-generated outcomes. When the vault contains an incorrect note (a decision captured imprecisely, or a conclusion drawn before all the facts were in) and that note gets retrieved and used to inform a new decision, who is responsible? The accountability framework must name a human who reviews flagged retrievals and who has authority to update or remove incorrect knowledge from the vault.

Context

Three Things to Bring Into the Lab

2 min read

In the lab you'll build the full capture workflow for a developer's Obsidian vault. Three decisions shape everything else — get these clear before you start designing.

1. Your capture triggers

Identify the events in your workflow that generate knowledge worth storing. Pull requests, long AI conversations, architecture reviews, meeting decisions, sprint retrospectives — which three events already happen regularly and carry the most decision-value? For each trigger: name the event, describe the template it produces (what fields does the generated note contain?), and specify the destination where the draft lands.

2. Your staging area design

Captured notes go through a staging area before entering the indexed vault. Define: what makes a note ready to leave staging? Is it self-contained? Does it have the required tags? Does it comply with the governance policy? Who reviews staged notes, and how often? A staging area without a review cadence is just a second inbox that never gets processed.
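A review cadence only works if overdue notes are visible. The sketch below assumes each staged note has a known capture date and that the cadence is seven days; both the cadence value and the `overdue` helper are assumptions for illustration.

```python
from datetime import date, timedelta

# Assumed cadence; pick the interval your reviewer can actually sustain.
REVIEW_CADENCE = timedelta(days=7)


def overdue(
    staged: dict[str, date],
    today: date,
    cadence: timedelta = REVIEW_CADENCE,
) -> list[str]:
    """Return staged note names that have waited longer than the cadence."""
    return sorted(
        name for name, captured in staged.items() if today - captured > cadence
    )
```

Running a check like this on a schedule, and sending the result to the named reviewer, keeps staging from turning into the unprocessed second inbox described above.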

3. Your governance policy

Write a one-paragraph policy that specifies what types of content belong in the vault, what is explicitly excluded, and who has authority to add or remove content categories. The policy must be concrete enough that the automation can enforce it — "no personal information" is not enforceable; "no content that names individuals outside of decision-context" is. Without this, scope creep degrades retrieval quality over time.

The AI reviewer will ask you two questions the lesson didn't answer: what happens when a Claude conversation captures something sensitive? And who decides when a staged note is ready to index? Have a specific answer for both before you begin.

⚙ Build Lab
Design the Memory Capture Workflow
~30 minutes · 4 deliverables
What you're building
Design the complete automated memory capture pipeline for a developer's Obsidian vault. The AI reviewer will push you to specify, not generalize — every deliverable needs concrete details, not principles.
Roles
🏗 You: Pipeline Designer
Design the capture workflow: triggers, quality gate, staging process, governance policy. The AI will challenge every vague claim.
🔍 AI: Systems Reviewer
I'll ask what happens in the edge cases. What if sensitive content gets captured? What if staging fills up and nobody reviews it?
Four deliverables
  • Three capture triggers with specific automation logic
  • A note quality gate specification (pass/fail criteria)
  • A staging area process with review cadence and owner
  • A governance policy for vault scope (what's in, what's out, who decides)
Framework reminders
NIST GOVERN — who owns the capture pipeline?
NIST MANAGE — what happens when automation captures something sensitive?
UNESCO — who is accountable when a retrieved note contains incorrect information?
Success criteria
A pipeline specification detailed enough that someone else could implement it. No "it depends" answers — every decision names the specific condition that drives it.