Intro

Specs That Agents Can Execute

2 min read

Agents don't fail because they're not smart enough. They fail because the specification they're operating from is ambiguous, incomplete, or built for a human reader instead of a machine executor.

There's a specific kind of spec that an agent can use: one that defines the working context at the pipeline level (what this agent knows, what it has access to, what it's responsible for), and the task scope at the stage level (what input it receives, what output it must produce, what quality bar it must meet). That's what a Context.md, a Claude.md, and an Architecture Record Document (ARD) are for — each one handles a different layer of the agent's working environment.

Most developers write specs after they build. This module teaches you to write the spec that makes the build possible — and to structure it so an agent can execute from it directly, not just understand it.

Your artifact — Build Lab

A pipeline specification document — ARD, Context.md, and Claude.md written for a real pipeline, with each section justified and structured for agent execution

By the end of this module, you will:

Write a Context.md that gives an agent an accurate model of the project at the pipeline level
Write a Claude.md (or equivalent system config) that defines agent behavior, constraints, and escalation rules
Structure an ARD that captures architectural decisions with enough context for downstream agents to build from
Identify the difference between a spec written for humans and one written for agent execution
Apply NIST GOVERN principles to pipeline configuration: who owns the spec, who can change it, and what happens when agents conflict with it

Scenario

Two Teams, Same Pipeline

3 min read

Two teams both build the same thing: a pipeline that takes a product requirement and produces a tested, reviewed code change. Same AI models, same infrastructure, same task.

Team A has no shared specification. Each developer configures their agent sessions individually. The AI knows what the developer typing the prompt knows. It doesn't know the project's architectural conventions, the naming standards, the existing modules it shouldn't duplicate, or the test format the team has agreed on. Every agent session starts from scratch. The developers are good, so outputs are usually reasonable. But "usually reasonable" means rework on one in four PRs, and nobody can explain why that one looked different.

Team B writes a Context.md before they build. It describes the project structure, what exists, what the agent is responsible for, and what it should not touch. They write a Claude.md that defines how the agent should behave: when to ask for clarification, when to proceed, what to do when it encounters something ambiguous, and when to stop and escalate. They write an ARD for each architectural decision — not a changelog, but a document that captures the options considered, the decision made, and the tradeoffs accepted. When a new agent session starts, it loads all three.

Team B's outputs are consistent enough to be reviewable by automated gates. Team A's aren't. By the time both teams have ten pipelines running, Team B can add a new agent to any pipeline without a setup session. Team A can't hand anything off without a 30-minute briefing.

The difference isn't configuration management. It's that Team B treats the spec as the first thing they build — not documentation they write after the fact, but a working artifact that the pipeline depends on.

Lesson

Three Documents, Three Layers

4 min read

A pipeline specification lives at three layers. Each document is responsible for a different scope of the agent's working environment. Miss any one layer and the agent will fill the gap with assumptions — usually reasonable ones, occasionally catastrophic.

Context.md — the project layer

Context.md answers one question: what does this agent need to know about the project to do its job without asking? It is not a README. It is a machine-readable model of the working environment: what exists, what the agent is responsible for, what it should not touch, and what external dependencies it needs to know about.

A good Context.md is specific enough that two different developers reading it would produce the same mental model of the project. If it's vague enough that reasonable people interpret it differently, agents will interpret it differently too.

Claude.md (system config) — the behavior layer

Claude.md defines how the agent behaves, not what it knows. It covers: when to proceed vs. ask for clarification, what output format to use, what the agent should never do, and when to escalate or stop. It's the agent's standing orders.

The most important section most developers omit: escalation rules. Under what conditions should the agent stop and wait? What constitutes an ambiguous instruction it should not resolve on its own? An agent without defined escalation rules will always proceed — because proceeding feels like progress. An agent with defined escalation rules will sometimes stop, and that stop is information.

ARD — the decision layer

An Architecture Record Document captures a specific decision: what options were considered, what was chosen, why, and what tradeoffs were accepted. It's not a log of what was built — it's the reasoning behind a choice that downstream agents need to understand to build consistently with it.

An ARD that says "we use Postgres" is useless to an agent. An ARD that says "we chose Postgres over SQLite because we need concurrent writes from multiple pipeline stages, and we accepted the operational overhead of running a separate database process" gives the agent what it needs to make consistent choices in the spaces the ARD doesn't cover.

Governance Standards — The Regulatory Layer

NIST AI RMF — GOVERN Function

GOVERN requires that organizational policies for AI systems are defined before deployment. For agentic pipelines, this means: who owns the Context.md? Who can change it? What happens when a developer modifies the Claude.md without team review? The spec documents are governance artifacts — they define what the AI system is authorized to do. Treating them as living documents with no ownership is a GOVERN failure.

O*NET — Complex Problem Solving (2.B.3.g)

Identifying complex problems and reviewing related information to develop and evaluate options and implement solutions. Writing an ARD is applied complex problem solving — you're not just recording a decision, you're forcing yourself to name the alternatives and articulate why you rejected them. An agent reading that ARD inherits the problem-solving work you did.

EU AI Act — Article 17: Quality Management

Article 17 requires documented quality management processes for AI systems used in consequential contexts. A Claude.md with defined escalation rules and a Context.md with defined scope boundaries are quality management artifacts. If your pipeline is used for anything consequential, these documents are not optional infrastructure — they're part of your quality record.

Context

Before You Write the Spec

2 min read

Three tests for a spec before you write it. Each one catches a different class of failure.

1. The cold-start test

Could an agent with no prior knowledge of this project read your spec and produce consistent output on the first try? If the answer depends on an agent having "seen this before," the spec is missing something. A good spec makes the first session and the tenth session produce equivalent quality. If it doesn't, the delta is context that belongs in the spec.

2. The conflict test

If two instructions in your spec pull in different directions, which one wins? Agents that receive conflicting instructions don't error — they choose. They choose based on what feels more salient to the model, which is not the same as what you intended. Spec conflicts are governance failures: they mean the agent has discretion you didn't mean to give it. Find every place in your spec where two instructions could produce different behavior and resolve them explicitly.

3. The escalation test

Name five situations where you want the agent to stop and wait rather than proceed. If you can't name five, your escalation rules aren't complete. The most common missing escalation: "when the agent encounters something that contradicts the ARD." An agent without that rule will silently build inconsistently with prior decisions.

In the lab, you'll write all three spec documents for a real pipeline, then test them against all three checks. You'll apply all three in the lab.

■ Build Lab

Pipeline Specification

~25 minutes · 3 documents

What you're building

A full pipeline specification: Context.md (project layer), Claude.md (behavior layer), and an ARD (decision layer). Pick a real pipeline — yours or one you're designing. We'll build all three documents and test them against the cold-start, conflict, and escalation checks.

Roles

📐

You — Spec AuthorDescribe the pipeline and draft each document. Be honest about what you don't know yet.

🔬

AI — Spec ReviewerI'll test each document against the three checks and push until the spec passes all of them.

Three documents to produce

Context.md — project scope, what exists, what agent is responsible for

Claude.md — behavior rules, escalation conditions, output format

ARD — one key architectural decision with options, choice, and tradeoffs

Three checks to pass

Cold-start test — consistent output without prior context

Conflict test — no two instructions that pull in different directions

Escalation test — five named situations where agent must stop

Shift + Enter for a new line

✓ Module Complete

You've completed Module 2 of 8.

Next Module →