Multi-Model Configuration

Module 2 of 8 — Build a Command Center

One of the first things people discover when they build their own command center: not all tasks deserve the same model. Paying flagship prices for a task a lighter model handles perfectly isn't a minor inefficiency — it's a signal that your system has no brain. Choosing the right model for the right job is one of the clearest wins you get from owning your own infrastructure.

But model selection isn't just about cost. Capability ceilings matter. Context window limits matter. Latency for interactive tasks matters. A routing configuration that maps task types to models, and enforces cost ceilings when things go wrong, is a piece of infrastructure that can pay for itself in the first week of real use.

In this module, you'll build that routing configuration. You'll work through a model selection matrix, define task-type routing rules, set an API key management strategy, and specify cost threshold behavior. By the end, your command center will know which model to call and why, not just "the latest one."

What You'll Be Able to Do

Build a model selection matrix mapping task types to specific models with cost and capability justification
Write routing rules that direct traffic by task classification, not just user preference
Define cost ceiling behavior — what the system does when usage hits a threshold
Articulate an API key management strategy for multi-model setups
Apply the NIST AI RMF MAP function to identify model selection failure modes before they happen

Portfolio Artifact

Build Lab — Routing Configuration Document

A multi-model routing specification: task-type classifications, model assignments with cost and capability justification, API key management approach, and cost ceiling behavior — documented with explicit NIST MAP failure mode identification.

One Model for Everything

A solo developer is building a research assistant for their team. The assistant handles three very different tasks: quick reformatting of pasted text, multi-document synthesis for reports, and a coding sub-agent that writes and critiques implementation plans. All three currently call the same model — the largest, most capable one available.

Month one invoice: $340. Month two: $390. The developer pulls usage logs and finds that 68% of API spend went to reformatting tasks — work that's essentially find-and-replace with light reasoning. The synthesis tasks, which require multi-document context and careful argument construction, used 19% of the spend. The coding sub-agent, which needed real capability, used 13%.

The allocation is backwards. The most expensive model is doing the cheapest work at scale, while the tasks that actually need heavy reasoning account for a small slice of the spend.
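
To make the imbalance concrete, here is a rough what-if in Python. It assumes the spend shares from the logs apply to the month-two invoice, and that a lighter model would cost one tenth as much for the reformatting work. The 10x figure is purely illustrative, not a quoted price, and the function names are hypothetical.

```python
# What-if sketch: apply the scenario's spend shares to the month-two invoice
# and reprice only the reformatting work at an ASSUMED 10x discount.
MONTH_TWO_INVOICE = 390.00  # dollars, from the scenario
SPEND_SHARE = {"reformatting": 0.68, "synthesis": 0.19, "coding": 0.13}
ASSUMED_LIGHT_MODEL_DISCOUNT = 10  # assumption for illustration only


def spend_by_task(invoice, shares):
    """Split an invoice into dollars per task type."""
    return {task: round(invoice * share, 2) for task, share in shares.items()}


def rerouted_total(spend, discount):
    """Total bill if reformatting moves to a model `discount` times cheaper."""
    cheap_reformatting = spend["reformatting"] / discount
    return round(cheap_reformatting + spend["synthesis"] + spend["coding"], 2)


current = spend_by_task(MONTH_TWO_INVOICE, SPEND_SHARE)
projected = rerouted_total(current, ASSUMED_LIGHT_MODEL_DISCOUNT)

print(current)    # dollars per task type on the current single-model setup
print(projected)  # projected monthly total under the assumed discount
```

Under those assumptions the reformatting line falls from about $265 to about $27, and the monthly total from $390 to roughly $151. The exact numbers depend entirely on the real price gap between your models.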

The developer's instinct is correct: this is a routing problem, not a model problem. The tools exist. What's missing is a configuration layer that classifies tasks and assigns them to appropriate models — with explicit rules, not intuition.

What This Reveals

Vendor-provided clients give you one model selector and one API key. You pick a model at session start, and everything in that session uses it. That's fine for exploration. It breaks down the moment you have a system with heterogeneous task types running at any meaningful volume.

A command center with routing configuration changes the cost structure of AI work. It also changes the failure structure: when you've explicitly mapped task types to models, you can see exactly where the system is calling the wrong tool. Debugging gets easier. Costs become predictable. And you have a document that explains the system's decisions, which matters when something goes wrong at 2am.

The Model Selection Matrix

The most important insight in multi-model configuration: task type is the primary routing variable, not user preference. Users don't know which model handles their task best. Your configuration layer should. The routing config is the place where that knowledge lives — explicit, versioned, and auditable.

Four Dimensions of Model Selection

When you're building a routing matrix, evaluate models along four axes:

Dimension	What It Drives	When It's Binding
Cost per token	Monthly spend at scale	High-volume, low-complexity tasks
Capability ceiling	Output quality on hard problems	Multi-step reasoning, synthesis, code review
Context window	How much input it can hold	Long document tasks, multi-turn sessions
Latency profile	Responsiveness for interactive use	Real-time chat, streaming interfaces

A routing matrix picks the cheapest model whose capability ceiling, context window, and latency all clear the bar for the task. You don't use a heavier model than the task requires. You don't use a lighter model that can't reliably hit the quality floor.
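
The selection rule above can be sketched as a filter followed by a minimum. The model names, prices, capability scores, and latency figures below are placeholders invented for illustration; substitute values from your own providers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    cost_per_mtok: float   # dollars per million tokens (placeholder values)
    capability: int        # 1 = light, 2 = standard, 3 = heavy (assumed scale)
    context_tokens: int    # maximum input the model can hold
    typical_latency_s: float


@dataclass(frozen=True)
class TaskRequirement:
    min_capability: int
    min_context_tokens: int
    max_latency_s: float


# Hypothetical catalog; none of these are real model names or prices.
CATALOG = [
    ModelSpec("light-model", 0.50, 1, 32_000, 0.8),
    ModelSpec("mid-model", 3.00, 2, 128_000, 1.5),
    ModelSpec("heavy-model", 15.00, 3, 200_000, 4.0),
]


def pick_model(req: TaskRequirement,
               catalog: List[ModelSpec] = CATALOG) -> Optional[ModelSpec]:
    """Return the cheapest model that clears every bar, or None if none does."""
    eligible = [
        m for m in catalog
        if m.capability >= req.min_capability
        and m.context_tokens >= req.min_context_tokens
        and m.typical_latency_s <= req.max_latency_s
    ]
    return min(eligible, key=lambda m: m.cost_per_mtok, default=None)
```

Returning None when nothing qualifies is a deliberate choice in this sketch: it forces the caller to decide what to do rather than silently falling back to the most expensive model.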

Task Classification

Before you can route, you need a taxonomy. A workable starting taxonomy has three tiers:

Tier 1 — Lightweight: Reformatting, summarization of short inputs, classification, extraction from structured data. These tasks have clear correct answers, bounded scope, and rarely require judgment. Route to the cheapest capable model.
Tier 2 — Standard: Multi-turn conversation, code generation, single-document analysis, short-form writing with revision. These tasks need fluency and moderate reasoning. Route to mid-tier.
Tier 3 — Heavy: Multi-document synthesis, architectural review, complex reasoning chains, tasks where errors compound. Route to the most capable model you have configured.
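
One way to hold this taxonomy in code is two small lookup tables: task type to tier, and tier to model. The task-type keys and model names here are hypothetical labels chosen for the sketch, not part of any provider's API.

```python
from enum import Enum


class Tier(Enum):
    LIGHTWEIGHT = 1
    STANDARD = 2
    HEAVY = 3


# Task types taken from the taxonomy above; the key spellings are assumptions.
TASK_TIERS = {
    "reformatting": Tier.LIGHTWEIGHT,
    "short_summarization": Tier.LIGHTWEIGHT,
    "classification": Tier.LIGHTWEIGHT,
    "structured_extraction": Tier.LIGHTWEIGHT,
    "multi_turn_conversation": Tier.STANDARD,
    "code_generation": Tier.STANDARD,
    "single_document_analysis": Tier.STANDARD,
    "short_form_writing": Tier.STANDARD,
    "multi_document_synthesis": Tier.HEAVY,
    "architectural_review": Tier.HEAVY,
    "complex_reasoning": Tier.HEAVY,
}

# Placeholder model names; replace with whatever you have configured.
TIER_MODELS = {
    Tier.LIGHTWEIGHT: "light-model",
    Tier.STANDARD: "mid-model",
    Tier.HEAVY: "heavy-model",
}


def route(task_type: str) -> str:
    """Map a known task type to its configured model.

    Unknown task types raise KeyError here on purpose: what to do with an
    unclassified task is a separate policy decision.
    """
    return TIER_MODELS[TASK_TIERS[task_type]]
```

Keeping the two tables separate means you can swap the model behind a tier without touching the classification, and both tables can live in version control alongside the routing document.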

Cost Ceiling Behavior

A routing config without a cost ceiling is incomplete. Define what happens when spend for a session, a user, or a day hits a threshold:

Soft ceiling: Notify and continue. Log the breach, alert the operator, keep serving.
Hard ceiling: Downgrade routing. Send all Tier 3 tasks to the Tier 2 model until the ceiling resets.
Shutdown ceiling: Stop serving API calls until a human intervenes.

Which ceiling behavior you pick depends on who pays the bill and what the tolerance for service interruption is. Document the choice. If you don't, the default behavior is "keep spending," which is not a choice — it's an omission.
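
The three ceiling behaviors can be expressed as one small decision function. This is a minimal sketch that assumes a single configured policy and a single spend threshold; the names `CeilingPolicy`, `Decision`, and `apply_ceiling` are hypothetical, and tracking the spend itself is left to your own metering.

```python
from dataclasses import dataclass
from enum import Enum


class CeilingPolicy(Enum):
    SOFT = "soft"          # notify and continue
    HARD = "hard"          # downgrade Tier 3 to Tier 2
    SHUTDOWN = "shutdown"  # stop serving until a human intervenes


@dataclass(frozen=True)
class Decision:
    serve: bool   # whether the API call may go out at all
    tier: int     # tier to route to if served
    alert: bool   # whether to log the breach and notify the operator


def apply_ceiling(requested_tier: int, spend: float, threshold: float,
                  policy: CeilingPolicy) -> Decision:
    """Decide how to handle a request given current spend and the policy."""
    if spend < threshold:
        return Decision(serve=True, tier=requested_tier, alert=False)
    if policy is CeilingPolicy.SOFT:
        return Decision(serve=True, tier=requested_tier, alert=True)
    if policy is CeilingPolicy.HARD:
        # Only Tier 3 is affected; Tier 1 and Tier 2 keep their assignment.
        return Decision(serve=True, tier=min(requested_tier, 2), alert=True)
    return Decision(serve=False, tier=requested_tier, alert=True)
```

Because the policy is an explicit argument, the "keep spending" default cannot happen by omission: the caller has to pass one of the three documented behaviors.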

Governance Standards — The Regulatory Layer

NIST AI RMF (MAP): The MAP function calls for identifying and categorizing AI tasks before deployment. A routing matrix is a MAP artifact: it documents which model handles which task class and why, creating an auditable record of system design decisions.

NIST AI RMF (MEASURE): Cost ceilings and routing logs create measurement infrastructure. Without them, you can't tell whether the system is behaving as designed or drifting toward unintended spend or capability mismatches.

O*NET, Complex Problem Solving (2.B.2.i): Designing a routing matrix requires identifying the problem (heterogeneous tasks, one model), developing and evaluating alternatives (tier taxonomy, model options), and selecting the appropriate solution. That sequence is the core of the O*NET complex problem solving skill.

NIST AI RMF (GOVERN): Documenting routing decisions, cost threshold behavior, and API key strategy creates the governance record that lets a team understand, audit, and change the system's behavior over time.

Three Decisions Before You Route

A routing configuration document needs to answer three structural questions before it covers task-to-model mappings. These are the decisions that determine whether the routing rules are coherent or just a list of preferences.

Decision 1 — What is the unit of classification?

Are you routing by request type (what the user asked for), agent identity (which sub-agent is calling), or content characteristics (length, complexity, presence of code)? These aren't the same. A coding sub-agent making a simple lookup is a Tier 1 task, even though it came from a "heavy" agent. Define the unit before you define the rules — otherwise, the rules will be ambiguous at every edge case.
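
The coding sub-agent example can be made concrete with a small sketch showing how two classification units disagree on the same request. The agent names, request types, and tier numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    agent: str         # which sub-agent is calling
    request_type: str  # what was actually asked for
    input_tokens: int  # a simple content characteristic


# Two candidate classification units, each with its own (assumed) table.
TIER_BY_AGENT = {"coding_agent": 3, "formatter": 1}
TIER_BY_REQUEST_TYPE = {"lookup": 1, "implementation_plan": 3}


def tier_by_agent(req: Request) -> int:
    """Classify by who is calling."""
    return TIER_BY_AGENT[req.agent]


def tier_by_request_type(req: Request) -> int:
    """Classify by what was asked for."""
    return TIER_BY_REQUEST_TYPE[req.request_type]


# A "heavy" agent making a simple lookup.
simple_lookup = Request(agent="coding_agent", request_type="lookup",
                        input_tokens=120)

print(tier_by_agent(simple_lookup))         # routes by caller identity
print(tier_by_request_type(simple_lookup))  # routes by the work requested
```

The same request lands in Tier 3 under one unit and Tier 1 under the other, which is why the routing document has to name a single unit (or an explicit precedence order) before any rule is written.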

Decision 2 — How many API keys, and who holds them?

Multi-model setups typically need at least one API key per provider. The questions that matter: Are keys stored at the system level or per-user? Who can rotate them? What's the access control model if this command center serves a team? A routing config that doesn't specify key management is assuming someone else has figured this out. They probably haven't.
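
As one possible answer to the storage question, the sketch below assumes system-level keys, one per provider, supplied through environment variables. The provider labels and variable names are hypothetical; a secrets manager or per-user store would look different.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping of provider label to environment variable name.
PROVIDER_KEY_ENV = {
    "provider_a": "PROVIDER_A_API_KEY",
    "provider_b": "PROVIDER_B_API_KEY",
}


def load_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Load one API key per provider, failing loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in PROVIDER_KEY_ENV.values() if not env.get(var)]
    if missing:
        # Name the missing variables, but never echo key values into logs.
        raise RuntimeError("Missing API key variables: " + ", ".join(missing))
    return {provider: env[var] for provider, var in PROVIDER_KEY_ENV.items()}
```

Failing at startup when a key is missing is a design choice in this sketch: it surfaces a key management gap before the first routed request rather than in the middle of one. Who may set or rotate those variables still has to be written down in the routing document.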

Decision 3 — What does the system do when it can't classify?

Every routing system has an unclassified case — the task that doesn't fit any tier. Define the fallback now: default to mid-tier? Route to the operator for classification? Reject with an explanation? The fallback behavior is often where cost surprises and capability gaps hide. Your configuration document should state it explicitly.
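
The three fallback options named above can be sketched as an explicit policy value, so the unclassified case never depends on an accident of the code path. The enum and function names are hypothetical, and the tier numbers assume the three-tier taxonomy from earlier in this module.

```python
from enum import Enum
from typing import Mapping, Optional, Tuple


class Fallback(Enum):
    DEFAULT_MID_TIER = "default_mid_tier"
    ASK_OPERATOR = "ask_operator"
    REJECT = "reject"


def resolve(task_type: str, known_tiers: Mapping[str, int],
            fallback: Fallback) -> Tuple[str, Optional[int]]:
    """Return (action, tier) for a task, applying the fallback if unclassified."""
    if task_type in known_tiers:
        return ("route", known_tiers[task_type])
    if fallback is Fallback.DEFAULT_MID_TIER:
        return ("route", 2)
    if fallback is Fallback.ASK_OPERATOR:
        return ("hold_for_operator", None)
    return ("reject", None)


# Illustrative subset of a tier table.
KNOWN = {"reformatting": 1, "multi_document_synthesis": 3}

print(resolve("reformatting", KNOWN, Fallback.REJECT))
print(resolve("translate_poem", KNOWN, Fallback.DEFAULT_MID_TIER))
```

Logging every request that takes the fallback branch is a cheap way to find out which task types the taxonomy is missing.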

You'll apply all three decisions in the lab — the AI will push you to define each one before accepting a routing rule as complete.

BUILD LAB

⏱ Estimated time: 20–30 minutes

Your Role

You

Architect. You're designing a routing configuration document for your command center — making explicit decisions about task classification, model assignment, API key strategy, and cost ceiling behavior.

AI Role

Infrastructure Reviewer

Challenges vague routing rules, asks what happens at the edges, and checks whether the three structural decisions (classification unit, key management, fallback behavior) are actually answered before accepting a routing config as complete.

Framework Reminders

Classification unit: What determines the tier? Request type, agent identity, or content characteristics?

Key management: Per-provider keys, storage level, rotation ownership.

Fallback behavior: What happens to a task that doesn't fit any tier?

Cost ceiling: Soft, hard, or shutdown, and who decides?

Completion Criteria

Produce a routing configuration document that covers: task tier taxonomy, model assignments with justification, API key strategy, cost ceiling behavior, and fallback handling. The AI will check each element.

