Building Agentic Pipelines
Module 6 of 8
Intro

Staging, Linting, and Quality Gates

2 min read

A pipeline without a staging environment is a pipeline where the first real test is in production. Staging, linting, and quality gates are not bureaucratic overhead — they are the infrastructure that makes pipeline failures recoverable instead of catastrophic. This module is about building them as first-class pipeline components, not as afterthoughts bolted on after the first production incident.

The artifact you'll produce: a gate specification document — the formal record of what a pipeline stage accepts as input, what it must produce as output, what validation rules apply, and what happens when validation fails. This document is what you hand to the next engineer who maintains the pipeline. Without it, every gate is undocumented behavior.

Your artifact — Build Lab
A gate specification document — input schema, output schema, validation rules, failure behavior, and staging environment requirements written for a real pipeline stage
  • Write a gate specification document with input schema, output schema, and validation rules
  • Define the failure behavior for a gate — retry, reroute, escalate, or terminate — before the gate is deployed
  • Specify a staging environment for a pipeline stage — what does the stage need to run safely without production consequences?
  • Apply NIST MANAGE to pipeline gates — maintaining pipeline integrity means maintaining gate specifications over time
  • Identify which validation rules are schema-checkable vs. which require semantic evaluation
Scenario

Three Weeks of Silent Failures

3 min read

A team's pipeline has been running in production for four months. The pipeline takes a natural language task description and produces a database migration script. The script runs through a syntax validator and, if it passes, is queued for execution.

Four months in, the team's model provider updates its API. The update introduces a subtle change to the model's output format: field names that were camelCase become snake_case. The syntax validator passes the new output because snake_case is syntactically valid. The migration scripts execute. They insert records under the wrong field names, creating new columns that don't match the application's schema.

For three weeks, the pipeline creates ghost columns in the database while the expected columns receive no data. The symptom, silently missing data, only appears in reports. Reports run weekly. Three report cycles pass before anyone notices. By the time the diagnosis is complete, 47,000 records have missing data that must be manually corrected.

The syntax validator caught what it was designed to catch. No one designed it to catch schema drift. There was no output schema — no document that said "field names must match this list." There was no staging environment — the first run on the new model output format was a production run. There was no failure behavior defined for format mismatches — because the possibility of a format mismatch wasn't anticipated.

A gate specification document would not have prevented the model update. It would have defined an output schema with explicit field name validation, caught the format change in staging before production, and defined a failure behavior — alert, quarantine, do not execute — for schema validation failures.

Lesson

First-Class Gates

4 min read

A gate is a contract. The input schema describes what the gate will accept. The output schema describes what the gate guarantees it will pass to the next stage. The validation rules are the terms of the contract. The failure behavior is what happens when the contract is violated.

Schema Validation

Does the output match the expected structure? Field names, types, required fields, format constraints. Automatable. Fast. Catches format drift, model update surprises, and missing fields. The most common type to implement and the easiest to skip because it feels like overhead — until it isn't.
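
As a minimal sketch of a schema check aimed at the kind of field-name drift described in the scenario, the snippet below compares a record's field names against an explicit expected list. The field names (`userId`, `createdAt`) and the function name `check_field_names` are hypothetical, chosen only for illustration; a real gate would take its list from the gate specification.

```python
# Hypothetical expected field names; a real gate would load these from its spec.
EXPECTED_FIELDS = frozenset({"userId", "createdAt"})


def check_field_names(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    actual = set(record)
    problems = []
    # Fields the spec requires but the record lacks.
    for name in sorted(EXPECTED_FIELDS - actual):
        problems.append(f"missing field: {name}")
    # Fields the record carries but the spec does not list (e.g. renamed fields).
    for name in sorted(actual - EXPECTED_FIELDS):
        problems.append(f"unexpected field: {name}")
    return problems
```

A record whose keys arrive as `user_id` and `created_at` would be reported as both missing the expected fields and carrying unexpected ones, which is the signal a syntax-only check cannot give.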

Semantic Validation

Does the output mean what it's supposed to mean? Does the migration script actually implement the intent of the task description? Requires a model or human reviewer. Slower. Catches logical errors that are structurally valid. The most common type to skip because it's harder — and the most common source of production failures.

Regression Validation

Does the output match the expected output for known inputs? Requires a test suite of input-output pairs. Catches regressions when the producing model or stage changes. The least commonly implemented and the most valuable when the producing stage undergoes a model update.
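
One way to picture a regression check is a small suite of recorded input-output pairs replayed against the producing stage. The sketch below assumes the stage is a callable from task description to script text; `GOLDEN_CASES`, `run_regression`, and the SQL strings are hypothetical. It uses exact string comparison, which only suits deterministic stages; a nondeterministic model stage would likely need a normalized or property-based comparison instead.

```python
from typing import Callable

# Hypothetical golden cases: (input, expected output) recorded from a known-good run.
GOLDEN_CASES = [
    ("add column email to users", "ALTER TABLE users ADD COLUMN email TEXT;"),
    ("drop table sessions", "DROP TABLE sessions;"),
]


def run_regression(stage: Callable[[str], str], cases=GOLDEN_CASES) -> list[str]:
    """Run the stage on each known input and return the inputs whose output changed."""
    failures = []
    for task, expected in cases:
        if stage(task) != expected:
            failures.append(task)
    return failures
```

An empty result means the stage still reproduces every recorded output; any returned task is a candidate regression to investigate before the change reaches production.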

The three types catch different classes of failure and are not interchangeable. The most common mistake is relying on schema validation where semantic validation is needed, or skipping validation entirely because semantic validation is hard.

A staging environment for a pipeline stage is any execution context where the stage can run without production consequences. It doesn't have to be a separate server — it can be a dry-run mode that logs what the stage would do without doing it, a separate database schema, a sandboxed execution environment, or a test dataset that mirrors production without being production.

The key requirement: the staging environment must expose the same failure modes as production, or it is useless. A staging environment that only tests happy paths is not a staging environment — it's a formality.
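
A dry-run mode can be sketched as running the real statement inside a transaction and rolling it back. The example below is an illustration only: it assumes SQLite via Python's standard `sqlite3` module (the lesson names no database), a single SQL statement per migration, and a hypothetical function name `apply_migration`. Because the statement actually executes against the engine, SQL errors surface in the dry run as they would in a real run.

```python
import sqlite3


def apply_migration(conn: sqlite3.Connection, migration_sql: str, dry_run: bool = True) -> bool:
    """Execute one SQL statement in a transaction; roll back when dry_run is True.

    Returns True if the statement executed without a database error.
    """
    try:
        conn.execute("BEGIN")
        conn.execute(migration_sql)
    except sqlite3.Error:
        conn.rollback()  # no-op if no transaction is open
        return False
    if dry_run:
        conn.rollback()  # discard the change: nothing persists
    else:
        conn.commit()
    return True
```

This relies on SQLite rolling back schema changes inside a transaction. Not every database does; MySQL, for example, commits implicitly on most DDL statements, so the same approach would need a separate schema or sandbox there.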

NIST AI RMF — MANAGE Function

The MANAGE function requires maintaining AI system performance over time — including after model updates, input distribution changes, and scaling events. For pipeline gates, MANAGE means maintaining the gate specification as the pipeline evolves. A gate specification that was written at deployment and never updated is an artifact that describes the past, not the present. NIST MANAGE applied to gate design means scheduling gate specification reviews when the producing stage changes.
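
A review trigger of this kind could be as simple as recording which version of the producing stage the spec was last reviewed against. The sketch below is an assumption about how a team might track this, not something NIST prescribes; `GateSpecRecord`, `needs_review`, and the version strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateSpecRecord:
    gate_name: str
    reviewed_against_version: str  # producing-stage version at the last spec review


def needs_review(spec: GateSpecRecord, current_stage_version: str) -> bool:
    """True when the producing stage has changed since the spec was last reviewed."""
    return spec.reviewed_against_version != current_stage_version
```

For example, `needs_review(GateSpecRecord("migration_gate", "model-v1"), "model-v2")` returns `True`, which a deployment check could treat as a blocker until the spec is re-reviewed.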

O*NET — Systems Evaluation (6.A.1.b)

Systems evaluation requires identifying when a system is not meeting its goals. A gate specification document is the primary tool for systems evaluation in a pipeline — it defines what "meeting goals" means for each stage, which is the prerequisite for evaluating whether it's happening. If the spec doesn't exist, evaluation is guesswork.

Context

What the Spec Must Contain

3 min read

Three sections every gate specification document must contain — all produced in the lab.

1. Input and output schemas — the contract's terms

The input schema describes what the gate will accept: data types, required fields, format constraints, size limits. The output schema describes what the gate guarantees will pass to the next stage: what fields are present, what format they're in, what invariants hold. Both schemas must be specific enough to write a validator for. "Valid JSON" is not an output schema. "Valid JSON with fields: task_id (string), migration_sql (string, max 10,000 chars), target_table (string matching /^[a-z_]+$/)" is.
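
The example output schema above is specific enough to turn directly into a validator. The following is a minimal sketch using only the standard library; the function name `validate_output` and the error message wording are illustrative, and the sketch assumes extra fields are tolerated, which the example schema does not state either way.

```python
import re

TARGET_TABLE_PATTERN = re.compile(r"^[a-z_]+$")
MAX_SQL_CHARS = 10_000
REQUIRED_FIELDS = ("task_id", "migration_sql", "target_table")


def validate_output(payload: dict) -> list[str]:
    """Check a payload against the example output schema; return a list of violations."""
    errors = []
    # Every required field must be present and be a string.
    for field in REQUIRED_FIELDS:
        if field not in payload:
            errors.append(f"{field}: missing")
        elif not isinstance(payload[field], str):
            errors.append(f"{field}: expected string")
    if errors:
        return errors  # skip content checks when presence or type checks fail
    if len(payload["migration_sql"]) > MAX_SQL_CHARS:
        errors.append("migration_sql: longer than 10,000 chars")
    # fullmatch rejects a trailing newline that a bare `$` would tolerate.
    if not TARGET_TABLE_PATTERN.fullmatch(payload["target_table"]):
        errors.append("target_table: must match ^[a-z_]+$")
    return errors
```

By contrast, "valid JSON" gives a validator nothing to check beyond parsing, which is why it does not qualify as an output schema.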

2. Validation rules — the specific checks

List each validation rule explicitly: what it checks, what it accepts, what it rejects, and which rule type it is (schema / semantic / regression). This list is what you maintain when the pipeline changes. If a new failure mode is discovered in production, the fix is adding a validation rule to this list — not patching the producing stage in isolation.
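
One possible way to keep this list maintainable is to store it as structured data rather than free text, so each rule carries its four attributes. The sketch below is illustrative: the class names, rule ids, and rule wording are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    SCHEMA = "schema"
    SEMANTIC = "semantic"
    REGRESSION = "regression"


@dataclass(frozen=True)
class ValidationRule:
    rule_id: str
    checks: str    # what the rule looks at
    accepts: str   # what passes
    rejects: str   # what fails
    rule_type: RuleType


# Hypothetical rules for the migration stage from the scenario.
RULES = [
    ValidationRule("R1", "target_table format", "lowercase letters and underscores",
                   "anything else", RuleType.SCHEMA),
    ValidationRule("R2", "script implements the task description", "reviewer approves",
                   "reviewer rejects", RuleType.SEMANTIC),
    ValidationRule("R3", "output for known inputs", "matches recorded output",
                   "differs from recorded output", RuleType.REGRESSION),
]


def rules_of_type(rule_type: RuleType) -> list[str]:
    """Return the ids of all rules of the given type, in list order."""
    return [r.rule_id for r in RULES if r.rule_type is rule_type]
```

With the list in this form, adding a rule after a production incident is a one-entry change, and a query such as `rules_of_type(RuleType.SEMANTIC)` shows at a glance which checks cannot be fully automated.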

3. Failure behavior — what happens when the contract is violated

For each validation rule, specify the failure behavior: retry the producing stage with the same input, retry with modified parameters, escalate to a human reviewer, or terminate and log. Some violations are recoverable — nondeterministic model outputs that might succeed on retry. Some are not — input that violates the input schema before the stage even runs means the upstream stage needs to be fixed. Specify which is which before the gate is deployed.
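
Per-rule failure behavior can be sketched as a lookup from rule to action. In the example below, the rule ids, the mapping, the retry budget, and the choice to escalate once retries are exhausted are all assumptions made for illustration; the spec you write decides the real values.

```python
from enum import Enum


class FailureBehavior(Enum):
    RETRY_SAME_INPUT = "retry_same_input"
    RETRY_MODIFIED = "retry_modified_parameters"
    ESCALATE = "escalate_to_human"
    TERMINATE = "terminate_and_log"


# Hypothetical rule ids mapped to the behavior the spec assigns to each.
FAILURE_BEHAVIOR = {
    "input_schema": FailureBehavior.TERMINATE,          # upstream must be fixed; retry cannot help
    "output_schema": FailureBehavior.RETRY_SAME_INPUT,  # nondeterministic output may pass on retry
    "semantic_review": FailureBehavior.ESCALATE,
}

MAX_RETRIES = 2  # assumed retry budget


def next_action(rule_id: str, attempts_so_far: int) -> FailureBehavior:
    """Decide what to do after rule_id fails; unknown rules terminate rather than pass silently."""
    behavior = FAILURE_BEHAVIOR.get(rule_id, FailureBehavior.TERMINATE)
    retrying = behavior in (FailureBehavior.RETRY_SAME_INPUT, FailureBehavior.RETRY_MODIFIED)
    if retrying and attempts_so_far >= MAX_RETRIES:
        return FailureBehavior.ESCALATE  # retry budget spent: hand off to a human
    return behavior
```

Defaulting unknown rules to terminate keeps an unlisted failure from slipping through, which reflects the point that behavior should be decided before the gate is deployed.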

In the lab, you'll produce a gate specification document for a real pipeline stage — your own or the database migration stage from the scenario. The AI will push you to complete all three sections with sufficient specificity for implementation.

◆ Build Lab
Gate Specification Document
~25 minutes · 1 pipeline stage
What you're building
A gate specification document for a real pipeline stage — input schema, output schema, validation rules with types, failure behavior per rule, and staging environment requirements. Use your own pipeline or the database migration stage from the scenario.
Roles
🏗 You (Gate Architect): Produce the spec. Start with schemas, add validation rules, define failure behavior. Be specific enough to implement.
🔍 AI (Specification Reviewer): A senior engineer reviewing your gate spec before implementation. Will not accept vague schemas or undefined failure behavior.
Framework — apply to your gate
Input schema: data types, required fields, format constraints
Output schema: specific enough to write a validator for
Validation rule types: schema / semantic / regression
Failure behavior: retry / escalate / terminate — per rule, not per gate
Staging environment: exposes the same failure modes as production
NIST MANAGE: when does this spec get reviewed and by whom?
Success criteria
A complete spec: both schemas specific enough to validate, at least three validation rules with types named, failure behavior defined per rule, and a maintenance trigger for NIST MANAGE.