Intro

Structure Is the Product


Most developers think about RAG as an engineering problem — embeddings, vector stores, retrieval algorithms. The engineering matters. But the retrieval layer can only surface what the knowledge base actually contains, and knowledge bases fail in ways that have nothing to do with algorithms.

They fail because notes are written for humans who already know the context, not for a retrieval system that knows nothing. They fail because tags are inconsistent, links are orphaned, and folder hierarchies reflect the day they were created rather than how knowledge actually clusters. They fail because nothing governs what goes in, so everything goes in — and retrieval returns noise.

This module is about Obsidian as a RAG-ready knowledge store. Not Obsidian as a note-taking app — that's a different use case. Obsidian as a structured, governed, retrieval-optimized vault that an AI can actually navigate.

Your artifact — Skill

An Obsidian vault template — folder hierarchy, tagging taxonomy, note template, and linking conventions — designed for retrieval, with written rationale for each structural decision.

By the end of this module, you will:

Explain why vault structure directly affects retrieval quality
Design a folder hierarchy and tagging taxonomy for a specific knowledge domain
Write a note template that optimizes for both human readability and chunk retrieval
Apply NIST GOVERN thinking to define what belongs in the vault and who decides
Document structural decisions with rationale a retrieval engineer could use

Scenario

The Vault That Returns Nothing Useful


A developer builds a RAG pipeline on top of their Obsidian vault. They've been keeping notes for two years — meeting notes, project documentation, code decisions, architecture reviews, reading notes from technical articles. Thousands of notes. The pipeline is technically solid: the embeddings are good, the vector store is configured correctly, the retrieval algorithm is tuned.

The retrieval quality is terrible.

Queries about architecture decisions return meeting notes from two years ago. Queries about project status return reading notes from unrelated articles. Queries about a specific codebase component return nothing — the notes about that component exist, but they were written as running commentary in daily logs, embedded in date-stamped files under folder names like "2024-03" that tell the retrieval layer nothing about the content.

The problem isn't the retrieval algorithm. The problem is that the vault was built for a human who already knew where everything was. Tags were added inconsistently — some notes have three tags, some have none. Links were made organically — there are 40-note clusters with no links to the rest of the vault. Folder hierarchies reflect the original project structure, which has since been completely reorganized.

When the developer looks at what the retrieval layer actually sees — chunks of 200–500 tokens extracted from note files — they see fragments without context. A chunk that says "decided to use PostgreSQL because of the constraint mentioned above" is useless to a retrieval system that has no idea what the constraint was or which note mentioned it.
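The problem is easy to reproduce. Below is a minimal Python sketch of naive fixed-size chunking. It assumes whitespace-separated words are a rough stand-in for tokens (real pipelines count tokens with the embedding model's tokenizer), and it uses a tiny chunk size so the split is visible. The note text is a made-up example.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` whitespace-separated words.

    Word count is only an approximation of token count.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical note: the constraint and the decision sit in different sentences.
note = (
    "Orders service storage. The write path must support multi-row transactions "
    "across orders and payments. "
    "We decided to use PostgreSQL because of the constraint mentioned above."
)

for i, chunk in enumerate(chunk_words(note, 14)):
    print(i, chunk)
```

With a 14-word chunk size, the second chunk is exactly the decision sentence, while the constraint it depends on lands in the first chunk. A query that retrieves only the decision gets no rationale.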

The fix isn't a better algorithm. The fix is designing the vault for retrieval from the start.

Lesson

Designing for Retrieval


A retrieval-optimized vault has four properties: self-contained notes, consistent taxonomy, meaningful structure, and governed scope. None of these come naturally from how people take notes. They require deliberate design.

Self-Contained Notes

Each note should be understandable without reading the notes around it. When the retrieval layer extracts a chunk, that chunk will be injected into a prompt with no surrounding context. If the note says "see the previous decision" or "as discussed in the meeting," the retrieval layer has nothing to work with.

In practice: Every note begins with a one-sentence context statement. Decisions include the decision, the rationale, and the alternatives considered — in the same note. References to other notes use Obsidian's wiki-link syntax so the retrieval layer can follow the connection if needed.
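These rules can be checked automatically before a note is indexed. The sketch below assumes plain Markdown notes with the frontmatter already stripped. The list of relative-reference phrases and the one-sentence heuristic are illustrative assumptions, not Obsidian features; the link pattern follows Obsidian's [[Note name]] wiki-link syntax, including the #heading and |alias variants.

```python
import re

# Phrases that only make sense next to surrounding text.
# This list is an illustrative assumption; extend it for your own writing habits.
RELATIVE_REFERENCES = [
    "see above",
    "see below",
    "as discussed",
    "mentioned above",
    "the previous decision",
]

# Obsidian wiki-links: [[Note name]], [[Note name#Heading]], [[Note name|alias]].
# The capture group keeps only the target note name.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def lint_note(body: str) -> dict:
    """Report self-containment problems in a note body (frontmatter already stripped)."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    first = lines[0] if lines else ""
    lowered = body.lower()
    return {
        # Heuristic: the first non-empty line is one sentence, not a heading.
        "has_context_statement": bool(first)
        and not first.startswith("#")
        and ". " not in first
        and first.endswith("."),
        # Relative references that a standalone chunk cannot resolve.
        "relative_references": [p for p in RELATIVE_REFERENCES if p in lowered],
        # Explicit links the retrieval layer could follow.
        "links": [m.strip() for m in WIKI_LINK.findall(body)],
    }
```

A note that opens with a heading and leans on "as discussed" fails both checks, while a note that opens with its context sentence and links explicitly passes.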

Consistent Taxonomy

Tags must be defined before they're used, not added organically as notes accumulate. A tag taxonomy is a controlled vocabulary: a fixed list of tags with defined meanings, used consistently across every note in the vault.

In practice: Keep the taxonomy small and mostly flat: fifteen to twenty top-level tags covering the primary knowledge domains. Hierarchical tags (e.g., #decision/architecture) are useful for subtypes because they let you drill down without multiplying top-level tags. Every new note must carry at least one tag from the taxonomy.
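Enforcing a controlled vocabulary can be as simple as checking each note's tags against a fixed set. This sketch assumes tags arrive as a list of strings; frontmatter tags are usually written without the leading #, inline tags with it, so the check strips it. The taxonomy shown is a hypothetical example for a software-engineering vault.

```python
# Hypothetical controlled vocabulary; nested tags are listed explicitly.
TAXONOMY = {
    "decision",
    "decision/architecture",
    "decision/database",
    "component",
    "runbook",
    "reference",
}


def check_tags(tags: list[str], taxonomy: set[str] = TAXONOMY) -> list[str]:
    """Return problems with a note's tags; an empty list means the tags are valid."""
    problems = []
    normalized = [t.lstrip("#") for t in tags]
    if not normalized:
        problems.append("note has no tags; at least one taxonomy tag is required")
    for tag in normalized:
        if tag not in taxonomy:
            problems.append(f"unknown tag: {tag}")
    return problems
```

Running this as a pre-index check turns "tags are inconsistent" from a vague worry into a list of specific notes to fix.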

Meaningful Structure

Folder hierarchies should reflect the knowledge domain, not the calendar. Date-stamped folders are for journals and logs — they're actively harmful for topic retrieval because they scatter related knowledge across time. A folder called decisions/database is retrievable. A folder called 2024-03 is not.

In practice: Top-level folders represent primary knowledge domains. Notes move to the appropriate domain folder when they're mature enough to reference. Daily logs and meeting notes live in a staging area and are processed into domain folders within 24–48 hours.
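Both rules in this subsection can be checked mechanically. The sketch below flags date-named folders in a note's path and lists staging notes that have waited longer than 48 hours. The staging folder name `inbox` and the way creation times are supplied are assumptions; in a real vault you might read them from frontmatter or file metadata.

```python
import re
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Matches folder names such as "2024-03" or "2024-03-15".
DATE_FOLDER = re.compile(r"^\d{4}-\d{2}(-\d{2})?$")

STAGING = "inbox"  # hypothetical name for the staging folder
MAX_STAGING_AGE = timedelta(hours=48)


def date_named_folders(note_path: str) -> list[str]:
    """Return folder names in a vault-relative path that look like dates, not topics."""
    folders = PurePosixPath(note_path).parts[:-1]  # drop the file name
    return [f for f in folders if DATE_FOLDER.match(f)]


def overdue_in_staging(notes: dict[str, datetime], now: datetime) -> list[str]:
    """Return staging notes older than MAX_STAGING_AGE.

    `notes` maps a vault-relative path to the note's creation time.
    """
    return sorted(
        path
        for path, created in notes.items()
        if PurePosixPath(path).parts[0] == STAGING and now - created > MAX_STAGING_AGE
    )
```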

Governed Scope

Not everything belongs in the RAG vault. Personal notes, unprocessed drafts, and irrelevant reference material are noise that degrades retrieval quality. Scope governance means defining what belongs in the vault — and enforcing it.
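One way to enforce scope is an allowlist applied at indexing time: only notes in approved top-level folders are embedded. The folder names and the `status: draft` opt-out below are hypothetical; the point is that the scope decision lives in one reviewable place rather than in each author's head.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist of top-level folders that feed the retrieval index.
# Anything not listed (personal notes, drafts, staging) is out of scope by default.
IN_SCOPE = {"decisions", "components", "runbooks", "reference"}


def should_index(note_path: str, frontmatter: dict) -> bool:
    """Decide whether a note belongs in the retrieval index."""
    top = PurePosixPath(note_path).parts[0]
    if top not in IN_SCOPE:
        return False
    # An in-scope note can still opt out while it is unfinished.
    return frontmatter.get("status") != "draft"
```

Defaulting to exclusion is a deliberate choice: a newly created folder stays out of the index until whoever owns the vault adds it to the allowlist.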

Governance Standards: NIST AI RMF and EU AI Act

NIST AI RMF — GOVERN Function

The GOVERN function calls for accountability structures: clearly defined roles and responsibilities for managing an AI system's risks, including the risks that come from its inputs. For a RAG knowledge base, this means answering three questions. Who decides what goes in the vault? Who reviews notes for quality before they're indexed? Who removes stale or incorrect information? Without GOVERN-level ownership, the knowledge base drifts, and retrieval quality degrades silently.
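Ownership is easier to audit when it is recorded on each note. This sketch assumes hypothetical `owner` and `last_reviewed` frontmatter fields (with `last_reviewed` already parsed to a date) and a six-month review interval; substitute whatever fields and cadence your governance policy defines.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=180)  # hypothetical policy: review every six months


def governance_issues(frontmatter: dict, today: date) -> list[str]:
    """Flag notes with no accountable owner or an overdue review."""
    issues = []
    if not frontmatter.get("owner"):
        issues.append("no owner")
    reviewed = frontmatter.get("last_reviewed")
    if reviewed is None:
        issues.append("never reviewed")
    elif today - reviewed > REVIEW_INTERVAL:
        overdue = (today - reviewed - REVIEW_INTERVAL).days
        issues.append(f"review overdue by {overdue} days")
    return issues
```

A report built from this check gives the person responsible for removing stale content a concrete worklist instead of a general instruction to "keep things current."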

NIST AI RMF — MAP Function

The MAP function focuses on establishing context and identifying what could go wrong. For a knowledge base, the primary failure modes are stale knowledge returned as current, out-of-scope content polluting retrieval, and notes missing the context that would make their chunks interpretable. MAP these risks explicitly in your vault governance policy; don't discover them after the retrieval layer is live.

EU AI Act — Article 17: Quality Management

Article 17 requires providers of high-risk AI systems to maintain a documented quality management system, including procedures for data management. If your RAG system is part of a high-risk system, the knowledge base falls within that scope. Even outside that scope, a RAG system that informs consequential decisions (code choices, architecture, policy) benefits from the same discipline: document the vault's structure, scope, and update procedures as part of your compliance posture.

These four properties aren't constraints on how you take notes. They're design requirements for a system that works.

Context

What to Design Before the First Note


In the lab, you'll design an Obsidian vault template for a specific knowledge domain. Three decisions must be made before the first note is written.

Decision 1 — Folder hierarchy

Top-level folders represent retrieval domains — the primary categories a query might target. Keep it shallow: three levels maximum. More depth means more places a note can hide. Each folder represents a concept, not a time period. Name folders for what they contain, not when they were created.
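The depth limit is simple to enforce with a script run against the vault. This sketch assumes vault-relative note paths with forward slashes and counts folder levels only, not the file name.

```python
from pathlib import PurePosixPath

MAX_DEPTH = 3  # folder levels, not counting the note file itself


def too_deep(note_paths: list[str], max_depth: int = MAX_DEPTH) -> list[str]:
    """Return note paths nested more than `max_depth` folders deep."""
    return [p for p in note_paths if len(PurePosixPath(p).parts) - 1 > max_depth]
```

For example, `components/orders/api/routes.md` sits exactly three folders deep and passes, while `components/orders/api/v2/routes.md` is flagged.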

Decision 2 — Tagging taxonomy

Define the full tag list before using any of them. Every tag must have a one-sentence definition. Flat tags for categories, hierarchical tags for subtypes. The taxonomy is a contract — once notes are indexed against it, changing tag names breaks retrieval. Define it carefully and change it rarely.
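Treating the taxonomy as a contract suggests keeping it in a single file that tooling can check. In this hypothetical sketch, the taxonomy maps each tag to its one-sentence definition. The checks confirm each definition is a single sentence (a rough heuristic), that every subtype's parent tag exists, and which tags a proposed revision would remove, since removed or renamed tags are what break retrieval over already-indexed notes.

```python
# Hypothetical taxonomy: each tag maps to its one-sentence definition.
TAXONOMY = {
    "decision": "A choice that was made, with its rationale and alternatives.",
    "decision/architecture": "A decision about system structure or service boundaries.",
    "decision/database": "A decision about data stores, schemas, or migrations.",
    "component": "Durable documentation of one codebase component.",
    "runbook": "Step-by-step operational procedure for a known task.",
}


def taxonomy_problems(taxonomy: dict[str, str]) -> list[str]:
    """Check that every tag has a one-sentence definition and every subtype has a parent."""
    problems = []
    for tag, definition in taxonomy.items():
        text = definition.strip()
        # Rough one-sentence test: non-empty, ends with a period, no internal ". ".
        if not text or not text.endswith(".") or ". " in text:
            problems.append(f"{tag}: definition should be exactly one sentence")
        if "/" in tag:
            parent = tag.rsplit("/", 1)[0]
            if parent not in taxonomy:
                problems.append(f"{tag}: parent tag '{parent}' is not defined")
    return problems


def removed_tags(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Tags in the currently indexed taxonomy that a proposed revision would drop."""
    return set(old) - set(new)
```

Any non-empty result from `removed_tags` is a signal to retag the affected notes and re-index before the revision ships.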

Decision 3 — Note template

A note template enforces self-containment. At minimum: a one-sentence context statement at the top, a tags field, created and updated dates, and a body that never uses relative references ("see above," "as discussed"). Decision notes add the decision, the alternatives considered, and the rationale. The template is the retrieval contract.
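A minimal sketch of a template generator, assuming YAML frontmatter for tags and dates (a common Obsidian convention) and a hypothetical dictionary for the decision fields. Obsidian uses the file name as the note's title, so the body opens directly with the context statement.

```python
from datetime import date
from typing import Optional


def render_note(
    context: str,
    tags: list[str],
    created: date,
    decision: Optional[dict] = None,
) -> str:
    """Render a Markdown note: frontmatter, then the one-sentence context statement.

    If `decision` is given, it must have the hypothetical keys "decision",
    "alternatives" (a list), and "rationale"; each becomes its own section.
    """
    lines = [
        "---",
        "tags: [" + ", ".join(tags) + "]",
        f"created: {created.isoformat()}",
        f"updated: {created.isoformat()}",  # same as created until the first edit
        "---",
        context,  # context statement always comes first
        "",
    ]
    if decision is not None:
        lines += ["## Decision", decision["decision"], ""]
        lines += ["## Alternatives considered"]
        lines += [f"- {alt}" for alt in decision["alternatives"]]
        lines += ["", "## Rationale", decision["rationale"], ""]
    return "\n".join(lines)
```

Because the decision, alternatives, and rationale are rendered into the same note, a chunk containing the decision is never far from the reasoning behind it.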

You'll design all three in the lab, for a specific knowledge domain of your choice.

⚙ Skill Lab

Vault Template Design

~25 minutes · 3 design decisions

What you're doing

Pick a knowledge domain you actually work in. Design the three structural elements that make a vault retrieval-ready: folder hierarchy, tagging taxonomy, and note template. Justify each decision.

Roles


You (Knowledge Architect)
You design the structure. Every decision needs a rationale a retrieval engineer could act on.


AI (Retrieval Engineer)
I'll push on every structural decision. I want to know what breaks if you get it wrong and what changes when the domain evolves.

Three design decisions, plus governance

Folder hierarchy — concept-based, max 3 levels

Tagging taxonomy — defined list, each tag with a one-sentence meaning

Note template — enforces self-containment and context

NIST GOVERN — who owns the vault, who reviews, who removes stale content

Success criteria

A complete template design with documented rationale for each structural decision — specific enough that a teammate could implement it without asking follow-up questions.


✓ Module Complete

You've completed Module 2 of 8.
