Leveraging RAG for AI Development
Module 7 of 8
Intro

The Knowledge Base Is Not Neutral

2 min read

The retrieval layer doesn't just surface information — it shapes what the AI knows. Every decision about what goes into the knowledge base, how it's weighted by embeddings, and who can add or remove content is a governance decision with real consequences. A poorly governed knowledge base produces answers that reflect whoever controlled the content, not reality.

Developers building RAG systems spend most of their time on the technical layer — chunking strategies, embedding models, vector stores, retrieval quality. The governance layer gets left for later. Later usually means after the first governance failure: a junior developer discovers that the AI has been presenting one senior engineer's opinions as team policy, or that a resolved bug is still the top retrieval result three months after the fix.

This module forces you to take positions on three hard governance scenarios and defend them. Not abstract positions — specific decisions about what content belongs in a shared knowledge base, who has authority over it, and what happens when it goes stale.

Your artifact — Debate
A written defense of a knowledge governance position across three scenarios involving content inclusion, retrieval bias, and content lifecycle, with each position stated and justified under the NIST GOVERN framework.
  • Distinguish governed knowledge artifacts from neutral information stores
  • Apply NIST GOVERN thinking to content ownership, opinion marking, and lifecycle policy
  • Identify retrieval equity failures and explain their governance response
  • Define a content expiry policy with specific review cadences and owners
  • Defend a governance position under challenge and revise it with explicit reasoning
Scenario

Three Problems in the First Month

3 min read

A team of five developers builds a RAG-powered knowledge base for their shared codebase. The idea is straightforward: anyone on the team can add notes, decisions, and documentation to an Obsidian vault. The AI retrieves from that vault when developers ask questions about design choices, architecture, and implementation patterns. After one month, three governance failures have already appeared.

A senior developer added their own architecture notes to the shared vault shortly after launch. The notes described their preferred approach to service decomposition — written in assertive, declarative language. When junior developers ask the AI about design choices, it retrieves those notes and returns them as answers. The AI frames a single developer's preferences as established design decisions. The junior developers have no way to know they're receiving one person's opinion, not team policy. Three junior developers have already made implementation choices based on it.

The team notices that retrieval quality isn't consistent across team members. Developers with deep domain experience phrase their queries in precise technical terms and receive highly relevant, specific results. Developers who are newer to the domain phrase their queries in plain language and receive generic, surface-level results — or sometimes nothing at all. The knowledge base effectively works better for the people who need it least. The newer developers have started going directly to the senior developer instead of using the system, which defeats the entire purpose of building it.

Six months before the knowledge base was built, a developer documented a serious performance issue with a third-party API vendor — slow response times, inconsistent error codes, unreliable uptime. That note was added to the vault at launch. The issue was resolved four months ago after the vendor released a major update. But the note is still in the vault, still unarchived, and still returned as the top retrieval result for any query about that vendor. The team's relationship with the vendor has been damaged. The vendor's account manager has asked twice why the team's AI keeps telling developers the vendor is unreliable.

Lesson

A RAG Knowledge Base Is a Governed Artifact

4 min read

The three problems in the scenario have different symptoms, but they share the same root: the team built a retrieval system and left governance for later. A RAG knowledge base is not a neutral information store. It is a governed artifact — and every governance gap will eventually produce a retrieval failure that damages something real.

All content in a shared knowledge base must be attributed, and opinion content must be marked as such. An architecture note from one developer is not the same as a team decision. The knowledge base must distinguish between "this is what we decided as a team" and "this is what one person recommended." Without that distinction, the retrieval layer launders individual opinion into apparent consensus.

The governance mechanism is content classification with mandatory attribution. Every note has an author, a creation date, and a content type. Content types carry different epistemic weight: a Decision requires team sign-off; a Recommendation is attributed to its author and labeled as individual perspective; a Reference documents factual external information; a Draft is marked in-progress and excluded from high-confidence retrieval by default. The retrieval system can surface this metadata alongside the content so the developer receiving the answer knows exactly what kind of knowledge they're receiving.
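
As a minimal sketch of how that classification might look in code: the field names (`author`, `created`, `content_type`), the label wording, and the helpers `retrievable` and `render` are all assumptions made for illustration, not part of Obsidian or any retrieval library. The idea is only that Drafts stay out of default retrieval and every returned note carries its provenance.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ContentType(Enum):
    DECISION = "Decision"              # team consensus, requires sign-off
    RECOMMENDATION = "Recommendation"  # one author's perspective
    REFERENCE = "Reference"            # factual external information
    DRAFT = "Draft"                    # in progress


# Hypothetical label wording shown to the developer with each answer.
LABELS = {
    ContentType.DECISION: "Team decision",
    ContentType.RECOMMENDATION: "Individual recommendation, not team policy",
    ContentType.REFERENCE: "External reference",
    ContentType.DRAFT: "Draft, in progress",
}


@dataclass(frozen=True)
class Note:
    title: str
    author: str
    created: date
    content_type: ContentType
    body: str


def retrievable(notes, include_drafts=False):
    """Drop Drafts unless the caller explicitly opts in."""
    return [
        n for n in notes
        if include_drafts or n.content_type is not ContentType.DRAFT
    ]


def render(note):
    """Prefix the note body with its type, author, and creation date."""
    header = (
        f"[{LABELS[note.content_type]} | author: {note.author} "
        f"| created: {note.created.isoformat()}]"
    )
    return f"{header}\n{note.body}"
```

With this shape, the senior developer's architecture note from the scenario would reach a junior developer headed "Individual recommendation, not team policy" rather than as an unattributed answer.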

If retrieval quality varies by how users phrase queries, the knowledge base is not equitably accessible. Under UNESCO fairness principles, a shared organizational knowledge system should return comparable quality results regardless of a user's level of domain expertise. A system that works well only for expert users is not a shared resource — it's a private resource with public branding.

Two mechanisms address this. First, hybrid retrieval combines semantic search, which matches on meaning and can bridge the gap between plain-language queries and technically worded notes, with keyword search, which rewards exact terms such as identifiers and error codes. The two scores are blended so that neither approach dominates. Second, query reformulation assistance, where the system suggests alternative phrasings when a query returns low-confidence results, helps users who don't yet know the technical vocabulary. NIST GOVERN thinking calls for monitoring retrieval quality across user types in shared systems. If you're not measuring whether your knowledge base works equally well for all team members, you can't govern it.
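
The score blending and the low-confidence check can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: both score maps are assumed to be already scaled to the range 0 to 1, and the weight `alpha`, the `threshold`, and the function names are hypothetical values a team would have to tune.

```python
def blend_scores(semantic, keyword, alpha=0.5):
    """Blend two {doc_id: score} maps into one ranked list.

    Assumes both inputs are already scaled to [0, 1]. A document
    missing from one map contributes 0.0 from that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(semantic) | set(keyword)
    blended = {
        doc: alpha * semantic.get(doc, 0.0) + (1 - alpha) * keyword.get(doc, 0.0)
        for doc in doc_ids
    }
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(blended.items(), key=lambda item: (-item[1], item[0]))


def needs_reformulation(ranked, threshold=0.4):
    """True when there is no result or the best one is below the threshold."""
    return not ranked or ranked[0][1] < threshold


ranked = blend_scores({"a": 0.9, "b": 0.2}, {"b": 0.8, "c": 0.4})
if needs_reformulation(ranked):
    print("Low-confidence results: suggest alternative phrasings")
```

A weighted sum is only one way to combine the two rankings; the point of the sketch is that the blend weight and the confidence threshold are explicit, reviewable settings rather than hidden defaults.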

Content has a lifecycle: created, used, outdated, removed. A knowledge base without a content review and removal process accumulates stale information that degrades retrieval quality over time. The stale vendor note isn't a technical problem — it's a governance gap. Nobody was assigned responsibility for reviewing that note. No expiry date was set. No archival process was defined.

Every content type should have a defined review cadence and a named owner responsible for archiving or removing it. Decisions should be reviewed when the underlying circumstances change. References to external systems should be reviewed quarterly or when the external system updates. Recommendations should have a default expiry of six months unless explicitly extended. Content that passes its review date without action should be automatically flagged and surfaced to its owner for an explicit decision. It should not be automatically removed, because automated removal of content is its own governance risk.
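
A sketch of the flag-but-do-not-remove rule, assuming each note records an `owner` and a `last_reviewed` date. Those field names and the day counts (91 for "quarterly", 182 for "six months") are illustrative choices. Event-triggered reviews, such as a vendor update, are outside this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Calendar review intervals in days. None means the type is reviewed on
# events (changed circumstances), not on a calendar. Types missing from
# the table are not calendar-reviewed in this sketch.
REVIEW_INTERVAL_DAYS = {
    "Decision": None,
    "Reference": 91,        # roughly quarterly
    "Recommendation": 182,  # roughly six months
}


@dataclass
class TrackedNote:
    title: str
    owner: str
    content_type: str
    last_reviewed: date


def review_due(note, today):
    """True when the note is past its calendar review date."""
    interval = REVIEW_INTERVAL_DAYS.get(note.content_type)
    if interval is None:
        return False
    return today > note.last_reviewed + timedelta(days=interval)


def flag_overdue(notes, today):
    """Group overdue note titles by owner. Nothing is archived or deleted."""
    flagged = {}
    for note in notes:
        if review_due(note, today):
            flagged.setdefault(note.owner, []).append(note.title)
    return flagged
```

Run on a schedule, a check like this would have put the stale vendor note in front of a named owner once its quarterly review lapsed, while leaving the keep, archive, or remove decision to a person.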

NIST GOVERN — Content Ownership and Lifecycle

NIST GOVERN requires defined policies for who owns AI system inputs, not just outputs. For a RAG knowledge base, this means: who can add content, who can remove it, and who is responsible when the content is wrong. Governance without enforcement is not governance. If contribution rules exist but anyone can add anything without review, the rules don't function.

NIST MAP — Whose Perspective Is Encoded

NIST MAP asks you to identify who is affected by the AI system and how. For a knowledge base, MAP requires identifying whose perspective is being encoded in the retrieved content. A knowledge base that over-represents senior team members' opinions will consistently return answers that reflect their worldview — even if that's not the team's actual position. MAP the sources, not just the outputs.

UNESCO Fairness — Equitable Access to the Knowledge Store

UNESCO's AI ethics principles require that AI systems serve all users equitably. For a shared knowledge base, this means equitable retrieval quality across skill levels. A system that gives better answers to developers who already know the answers is not serving its governance purpose. Retrieval equity is a measurable property — measure it.
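
One way to make retrieval equity measurable, sketched under assumptions: the team keeps a small evaluation set of queries, each tagged with the asker's experience group and the notes a reviewer judged relevant. The metric (hit rate in the top `k` results), the group names, and the function names are illustrative; other retrieval metrics would work the same way.

```python
from collections import defaultdict


def hit_rate_by_group(records, k=3):
    """Share of queries per group with a relevant note in the top k.

    Each record is (group, retrieved_ids_in_rank_order, relevant_ids).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, retrieved, relevant in records:
        totals[group] += 1
        if set(retrieved[:k]) & set(relevant):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def equity_gap(rates):
    """Difference between the best-served and worst-served group."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


records = [
    ("expert", ["d1", "d2"], {"d1"}),
    ("expert", ["d3"], {"d3"}),
    ("newcomer", ["d9", "d8"], {"d1"}),
    ("newcomer", ["d1"], {"d1"}),
]
rates = hit_rate_by_group(records)
print(rates, equity_gap(rates))
```

Tracking the gap over time gives the team a number to assign an owner to, which is what turns the fairness principle into a governance control.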

Context

Three Questions Before the Lab

2 min read

In the lab, you'll defend governance positions across the three scenarios. These context questions prepare you to argue specifically — not in principles, but in policy decisions with real consequences.

Question 1 — Content Classification

Before you can govern content, you have to classify it. Define at least three content types for a shared developer knowledge base — for example: Decisions (team consensus, requires sign-off), Recommendations (individual opinion, attributed to author), References (factual documentation about external systems), Drafts (in-progress, excluded from high-confidence retrieval). Each type needs different governance rules: who can add it, how it's marked in the note and in retrieval results, and when it expires or requires review. Classification without enforcement is decoration.

Question 2 — Access and Contribution Rules

Who can add content to the shared vault? All team members? Only designated curators? Does the answer depend on the content type — any developer can add a Draft, but only team leads can add a Decision? Who can remove content, and under what conditions? Who can archive it versus permanently delete it? Governance without enforcement is not governance. If your policy allows anyone to add anything, you don't have a content governance policy — you have a wiki. That's a different thing.
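
To show what an enforceable answer could look like, here is a sketch of a role-by-action permission table. The roles, actions, and specific grants are one hypothetical policy, not a recommended one; the point is that the rule lives in a check that runs before content is written, rather than in a document nobody reads.

```python
ALL_TYPES = frozenset({"Decision", "Recommendation", "Reference", "Draft"})

# Hypothetical policy: which content types each role may act on.
POLICY = {
    "developer": {
        "add": frozenset({"Draft", "Recommendation", "Reference"}),
        "archive": frozenset(),
        "delete": frozenset(),
    },
    "team_lead": {
        "add": ALL_TYPES,
        "archive": ALL_TYPES,
        "delete": frozenset({"Draft"}),
    },
}


def is_allowed(role, action, content_type):
    """Unknown roles and unknown actions are denied by default."""
    return content_type in POLICY.get(role, {}).get(action, frozenset())


def require_allowed(role, action, content_type):
    """Raise before the vault is touched if the policy forbids the action."""
    if not is_allowed(role, action, content_type):
        raise PermissionError(f"{role} may not {action} a {content_type}")
```

Calling `require_allowed("developer", "add", "Decision")` raises `PermissionError`, while the same call for a `"Draft"` passes silently.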

Question 3 — Content Expiry and the Stale Content Problem

Define a content expiry policy before you enter the lab. Which content types expire? What triggers a review — a calendar date, a system event (vendor update, architecture change), or both? What happens to content that reaches its review date without action — is it flagged, archived, or removed? Who owns the archival decision? The stale vendor note in the scenario wasn't a technical failure. It was a policy gap: nobody was responsible for reviewing it, and no expiry trigger was defined. Name the person, the cadence, and the consequence of inaction for each content type.

These aren't rhetorical questions. In the lab, you'll be challenged to defend specific answers to all three. Come in with positions.

⚖ Debate Lab
Knowledge Governance Positions
~25 minutes · 3 governance scenarios
Three Scenarios
Individual opinions in a shared vault · Retrieval equity failure · Stale content governance
Roles
You (Governance Architect): State a specific governance position on each scenario. Hold it under challenge or revise it with explicit reasoning.
AI (Governance Challenger): I'll push back on every position. Vague policy is not governance. I want enforcement mechanisms, named owners, and measurable outcomes.
Framework — apply to each
  • NIST GOVERN: who owns the content and enforces the policy?
  • NIST MAP: whose perspective is being encoded or excluded?
  • UNESCO Fairness: does the policy serve all users equitably?
  • Name the enforcement mechanism, not just the principle
Success criteria
A specific governance position on all three scenarios — with enforcement mechanisms, named owners or roles, and a measurable outcome for each. Position changes are acceptable with explicit reasoning.