A knowledge graph without ranking is a graph where every path is equally worth following. The AI has no way to prioritize traversal — it follows edges at random, or by arbitrary ordering, and retrieves context that may or may not be relevant. Edge ranking gives the graph a gradient: some paths lead toward useful context, others don't.
Reinforcement makes that gradient adaptive. Every time an edge is traversed and the context it led to was useful, its weight increases. When that context turns out to be wrong or irrelevant, or the edge goes untraversed for long stretches, its weight falls. Over time, the graph becomes a map of what actually helps, not what was theoretically connected at import time.
This module teaches you to build a ranked edge implementation: the data structures and update rules that make a static graph into a dynamic, learning knowledge system.
Three months after importing a 400-note Obsidian vault into a memory graph, a developer notices a pattern: the AI keeps traversing the same unhelpful paths. When it retrieves a decision node, it consistently follows the "related" edges to general context notes, even when the project-specific notes that actually explain the decision are connected by "derived_from" edges one hop away.
The problem: all edges were initialized with the same weight. The graph has no preference. It doesn't know that derived_from edges tend to lead to more useful context than generic related edges for this use case. It doesn't know that the project architecture node is traversed in 80% of sessions and the research summary node is almost never used. Every path looks equally promising at traversal time.
Separately, a different problem has emerged: some edges were relevant six months ago and haven't been traversed since. They still carry their original weight and compete with current context on equal footing. An edge from an old project phase that was completed in January is just as "heavy" as an edge created last week.
The graph needs ranking — a system that orders traversal candidates by actual usefulness. And it needs reinforcement — a mechanism that updates weights based on what the graph learns about which paths produce good outcomes. Without these, a knowledge graph is a static artifact. With them, it becomes a system that gets better the more it's used.
Edge ranking is determined by a priority score computed at traversal time. The score combines the edge's stored weight with real-time signals about the current retrieval context. Higher-scoring edges are traversed first. When context budget runs out, lower-scoring edges are skipped.
Three signals combine to produce traversal priority: edge weight (the stored, updated value from previous reinforcement), semantic proximity (how similar the target node is to the current query), and edge type priority (a static bonus assigned to edge types that consistently lead to useful context for this use case). A weighted sum of these three produces the traversal score. Edge type priority lets you encode domain knowledge: for a software project, depends_on edges may consistently outperform related edges, so you assign them a higher type priority.
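The weighted sum and the budget cutoff described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the mixing coefficients in SIGNAL_WEIGHTS, the values in TYPE_PRIORITY, and the names Edge, traversal_score, and rank_edges are all assumptions chosen for the example, and semantic proximity is assumed to arrive precomputed as a number between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical mixing coefficients and type priorities; tune both for your graph.
SIGNAL_WEIGHTS = {"edge_weight": 0.5, "semantic": 0.3, "type_priority": 0.2}
TYPE_PRIORITY = {"derived_from": 1.0, "depends_on": 0.9, "related": 0.4}
DEFAULT_TYPE_PRIORITY = 0.5  # assumed fallback for edge types not listed above


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str
    weight: float  # stored value in [0, 1], updated by reinforcement


def traversal_score(edge: Edge, semantic_proximity: float) -> float:
    """Weighted sum of stored weight, query similarity, and edge type priority."""
    type_priority = TYPE_PRIORITY.get(edge.edge_type, DEFAULT_TYPE_PRIORITY)
    return (
        SIGNAL_WEIGHTS["edge_weight"] * edge.weight
        + SIGNAL_WEIGHTS["semantic"] * semantic_proximity
        + SIGNAL_WEIGHTS["type_priority"] * type_priority
    )


def rank_edges(edges, proximity_by_target, budget):
    """Return at most `budget` edges, highest traversal score first."""
    ranked = sorted(
        edges,
        key=lambda e: traversal_score(e, proximity_by_target.get(e.target, 0.0)),
        reverse=True,
    )
    return ranked[:budget]
```

With equal stored weights and equal semantic proximity, a derived_from edge outranks a related edge purely on type priority, which is the behavior the vault scenario was missing.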
When an edge is traversed and the retrieved context was used (meaning it appeared in the AI response or was explicitly confirmed as useful), the edge weight increases. A simple additive rule: weight += learning_rate × (1 − weight). The increase is proportional to the remaining headroom, so returns diminish as the weight approaches the ceiling (1.0). When context was not used, the weight does not increase (passive non-reinforcement). Explicit negative feedback, where the context was wrong or irrelevant, can apply a small decrease: weight -= learning_rate × weight.
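The three cases above (used, not used, explicitly negative) map onto a small update function. The sketch below assumes an event label is already available for each traversal; the labels "used", "unused", and "negative" and the function name update_weight are illustrative choices, not part of any library.

```python
def update_weight(weight: float, event: str, learning_rate: float = 0.1) -> float:
    """Apply one reinforcement event to a stored edge weight in [0, 1]."""
    if event == "used":
        # Move toward the ceiling; the step shrinks as weight approaches 1.0.
        return weight + learning_rate * (1.0 - weight)
    if event == "negative":
        # Explicit negative feedback: small proportional decrease.
        return weight - learning_rate * weight
    if event == "unused":
        # Passive non-reinforcement: leave the weight unchanged.
        return weight
    raise ValueError(f"unknown event: {event!r}")
```

At a learning rate of 0.1, an edge at 0.5 moves to 0.55 after a use event and to 0.45 after negative feedback. The same use event applied to an edge at 0.9 adds only 0.01.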
Edges that are not traversed should decay over time, or they occupy high rank indefinitely. A time-based decay function applies periodically: weight = weight × decay_rate, where decay_rate is slightly below 1.0 (e.g., 0.95 per week). Set a floor (e.g., 0.1) so edges never fully disappear unless explicitly archived. The decay rate should match how quickly your domain changes — fast-moving projects need faster decay.
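One way to run the periodic decay is a pass over all stored weights at the end of each period. The sketch below assumes weights live in a dictionary keyed by edge ID and that the caller supplies the set of edges traversed during the period; both the storage shape and the name decay_pass are assumptions for illustration.

```python
def decay_pass(weights, traversed_ids, decay_rate=0.95, floor=0.1):
    """Return new weights after one decay period (for example, one week).

    Edges traversed during the period are left alone. All others are
    multiplied by decay_rate but never pushed below the floor.
    """
    updated = {}
    for edge_id, weight in weights.items():
        if edge_id in traversed_ids:
            updated[edge_id] = weight
        else:
            # min(weight, floor) keeps decay from *raising* a weight that
            # negative feedback has already pushed below the floor.
            updated[edge_id] = max(min(weight, floor), weight * decay_rate)
    return updated
```

With a decay rate of 0.95 per week and a floor of 0.1, an untouched edge that starts at 0.8 reaches the floor after 41 weekly passes. A faster-moving project might pick a rate such as 0.85 to shorten that window.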
A ranking system that isn't measured drifts invisibly. The MEASURE function of the NIST AI Risk Management Framework calls for defining evaluation metrics before they're needed. For edge ranking: track the distribution of traversal scores (are they spreading out or clustering?), the rate of weight updates (is reinforcement actually happening?), and the fraction of sessions where top-ranked edges were traversed and used. These metrics tell you whether the ranking system is learning or stagnating.
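The three metrics above can be computed from data the ranking system already produces. The sketch below is one possible reading: it uses population standard deviation as the measure of score spread and defines update rate as weight updates per traversal. Both definitions, and the function name ranking_metrics, are assumptions rather than prescribed measures.

```python
from statistics import pstdev


def ranking_metrics(scores, update_count, traversal_count, top_edge_used):
    """Summarize ranking health for one reporting window.

    scores          -- traversal scores computed during the window
    update_count    -- number of weight updates applied
    traversal_count -- number of edges traversed
    top_edge_used   -- one bool per session: was the top-ranked edge
                       traversed and its context used?
    """
    return {
        # Near zero means scores are clustering and ranking barely discriminates.
        "score_spread": pstdev(scores) if len(scores) > 1 else 0.0,
        # Near zero means reinforcement is not actually happening.
        "update_rate": update_count / traversal_count if traversal_count else 0.0,
        "top_edge_hit_rate": (
            sum(top_edge_used) / len(top_edge_used) if top_edge_used else 0.0
        ),
    }
```

Comparing these values across reporting windows is what reveals learning or stagnation; a single snapshot says little on its own.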
Reinforcement updates modify the stored state of the memory system. Each update should be logged: which edge, what event triggered the update, what the before/after weight was. This log enables auditing (why did the AI prefer this path?) and recovery (reverting a batch of updates that turned out to be wrong).
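A log entry needs only the fields listed above plus a timestamp. The following sketch keeps the log as an in-memory list and shows how a batch of entries can be reverted; the names WeightUpdate, apply_and_log, and revert are hypothetical, and a real system would likely persist entries (for example, one JSON object per line) instead of holding them in memory.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WeightUpdate:
    edge_id: str
    event: str            # what triggered the update, e.g. "used" or "decay"
    weight_before: float
    weight_after: float
    timestamp: str        # ISO 8601, UTC


def apply_and_log(weights, log, edge_id, event, new_weight):
    """Write the new weight and append a before/after record to the log."""
    entry = WeightUpdate(
        edge_id=edge_id,
        event=event,
        weight_before=weights[edge_id],
        weight_after=new_weight,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    weights[edge_id] = new_weight
    log.append(entry)
    return entry


def revert(weights, entries):
    """Undo a batch of updates, newest first, restoring each prior weight."""
    for entry in reversed(entries):
        weights[entry.edge_id] = entry.weight_before


def to_json_line(entry: WeightUpdate) -> str:
    """Serialize one entry for an append-only log file."""
    return json.dumps(asdict(entry))
```

Reverting newest first matters when the same edge appears more than once in a batch: the oldest recorded weight is the one that ends up restored.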
Ranking and reinforcement are conceptually straightforward, but three implementation decisions determine whether the system works in practice.
Reinforcement only works if you can reliably detect when traversed context was useful. Explicit signals, such as the user confirming an answer or marking retrieved context as helpful, are reliable but infrequent. Implicit signals, such as retrieved content appearing in the AI's output or the session continuing productively after retrieval, are more frequent but noisier. You need to define specifically what constitutes a use event for your system and how you'll detect it without requiring manual feedback on every turn.
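As one illustration of a concrete use-event definition, the sketch below gives user feedback priority when it exists and otherwise falls back to a crude lexical check: what fraction of the retrieved context's distinct words also appear in the response. The overlap heuristic, the 0.5 threshold, and the function name detect_use_event are all assumptions for the example. A production system might compare embeddings or track citations instead.

```python
import re


def _tokens(text: str) -> set:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def detect_use_event(retrieved, response, user_confirmed=None, overlap_threshold=0.5):
    """Classify one traversal as "used", "negative", or "unused".

    user_confirmed -- True/False when the user gave feedback, None otherwise.
    """
    # User feedback, when present, overrides the automatic check.
    if user_confirmed is True:
        return "used"
    if user_confirmed is False:
        return "negative"

    # Automatic check: share of the retrieved context's distinct tokens
    # that reappear in the response. Noisy, but available on every turn.
    context_tokens = _tokens(retrieved)
    if not context_tokens:
        return "unused"
    overlap = len(context_tokens & _tokens(response)) / len(context_tokens)
    return "used" if overlap >= overlap_threshold else "unused"
```

Note that the automatic path never returns "negative": absence of overlap is treated as non-use, not as evidence that the context was wrong.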
The learning rate controls how quickly edges respond to reinforcement. Too high: a single positive use event sends an edge to near-maximum weight, drowning out other signals. Too low: the system takes hundreds of uses to meaningfully update weights — the graph barely learns. A rate between 0.05 and 0.20 is typical. The right value depends on how many sessions your system runs per day and how quickly your use case changes.
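The effect of the learning rate can be checked with a little arithmetic. Under the additive rule weight += learning_rate × (1 − weight), each use event shrinks the gap to 1.0 by a factor of (1 − learning_rate), so after n consecutive use events the weight is 1 − (1 − start) × (1 − learning_rate)^n. The helper names below are illustrative.

```python
import math


def weight_after(n: int, start: float, learning_rate: float) -> float:
    """Weight after n consecutive use events under the additive rule."""
    return 1.0 - (1.0 - start) * (1.0 - learning_rate) ** n


def uses_to_reach(target: float, start: float, learning_rate: float) -> int:
    """Smallest number of consecutive use events that lifts start to target.

    Assumes 0 < learning_rate < 1 and start, target < 1.
    """
    if target <= start:
        return 0
    gap_ratio = (1.0 - target) / (1.0 - start)
    return math.ceil(math.log(gap_ratio) / math.log(1.0 - learning_rate))
```

Starting from 0.5 with a target of 0.9, a rate of 0.05 needs 32 use events and a rate of 0.20 needs 8. A rate of 0.01 needs 161, which is the "barely learns" regime, while a rate of 0.9 jumps from 0.5 to 0.95 in a single event.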
If a small set of edges gets reinforced frequently, they approach the ceiling and block other edges from competing. A normalization pass (periodically scaling all weights so the maximum remains at 1.0) prevents ceiling creep. Alternatively, a soft ceiling, in which weight increases shrink as the weight approaches 1.0, achieves a similar effect continuously; the rule weight += learning_rate × (1 − weight) already behaves this way. Specify which approach you'll use.
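A normalization pass in the sense described above can be as simple as dividing every weight by the current maximum. The sketch below assumes weights are stored in a dictionary keyed by edge ID; the function name normalize_weights is hypothetical. This kind of pass matters most when the update rule is not itself bounded, for example a plain additive increment that can push weights past 1.0.

```python
def normalize_weights(weights):
    """Scale all weights so the largest equals 1.0, preserving their ratios."""
    peak = max(weights.values(), default=0.0)
    if peak <= 0.0:
        # Nothing to scale: empty graph or all weights at zero.
        return dict(weights)
    return {edge_id: weight / peak for edge_id, weight in weights.items()}
```

Because every weight is divided by the same number, the ranking order is unchanged. Only the scale is reset, so that the stored weight stays comparable with the other signals in the traversal score.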
You'll apply all three decisions in the lab — building a complete ranked edge implementation spec with defined update rules, decay schedule, and measurement plan.