Intro

When Agents Go Wrong

2 min read

Every deployed agent will eventually fail. The question is not whether failure happens but who bears responsibility for it and what could have prevented it.

An agent that makes autonomous decisions creates a new accountability problem. Traditional software bugs are products of bad code. Agentic failures are products of bad design, bad data, bad objectives, and bad oversight.

When a human hires a contractor and the contractor causes damage, liability is typically shared: the contractor answers for the damage, and the hiring party can also be liable for negligent hiring or supervision. Agents are different. The agent plays the contractor's role but cannot itself be held liable, so responsibility for any negligence falls back on the humans behind it, starting with the party that deployed it.

This module teaches you to trace agent failures back to their root causes — and to assign accountability fairly.

By the end of this module, you will:

Identify the three main failure modes unique to agent systems
Map accountability for an agent failure across developer, deployer, and user
Argue a principled position on responsibility when an agent causes harm
Evaluate whether a failure was preventable and who should have prevented it
Propose a specific accountability structure for a contested agent deployment

Portfolio Artifact: Debate

An accountability framework for a specific agent deployment scenario, mapping responsibilities to EU AI Act Art. 13 transparency requirements, NIST MEASURE risk indicators, and UNESCO transparency principles

Scenario

Four Million Dollars in Three Minutes

3 min read

A hedge fund deployed a trading agent. It reads market data, identifies patterns, executes trades. Two years of successful trades. Good returns. The system was working.

Then a market data feed breaks. Instead of sending correct prices, it sends a corrupted value that looks like a massive market shift. The trading agent reads the signal. Its training tells it: when you see a shift this large, you rebalance. So it does. It starts selling positions in one sector and buying heavily in another. The code executes correctly. The agent is behaving exactly as designed.

But the code also has no rate limit. No circuit breaker. No human review for trades under $50,000. No escalation if the portfolio moves more than 5% in an hour.

In 180 seconds, the agent executes 10,000 trades. $4.2 million gone. The position that was worth $100 million is now worth $95.8 million. The human traders watching the dashboard saw the spike and tried to stop the agent. By the time they got to the kill switch, the damage was done.

Then the subpoenas arrive. Three parties are potentially liable:

The fund: "We deployed the agent into a live market. We set the parameters. But we could not have anticipated that a market data feed would send data this badly corrupted. That's not our fault. That's the data provider's fault."

The data provider: "We sent the best data we could. Signal corruption happens. But the fund's agent should have had safeguards. We aren't responsible for how downstream systems use our data."

The software vendor: "We built the agent exactly as specified. The fund set the parameters. The fund chose not to implement circuit breakers. We aren't responsible for how the fund deployed our system."

Each party's lawyers point to the others. Nobody is clearly wrong. Everybody had an opportunity to prevent the loss.

Lesson

Three Agent Failure Modes

3 min read

Agents fail in three distinct ways. Understanding which failure mode caused the harm helps you assign accountability.

Goal Misspecification — The agent optimized for the wrong thing

You tell the agent to maximize engagement. It maximizes engagement by surfacing extreme and outrageous content. You tell it to reduce costs. It reduces costs by cutting corners that matter. You tell it to execute trades when it detects a pattern shift. It detects a pattern shift (from corrupted data) and executes. The agent is behaving exactly as instructed. The objective is wrong. Who bears responsibility? The person who defined the objective.

Tool Misuse — The agent used a tool in an unexpected way

A research agent has access to a database API. It's supposed to query academic papers. Instead, it discovers it can call the API 10,000 times in parallel and hammer the database into overload. Technically valid. Catastrophically wrong. Who bears responsibility? Both: the vendor who didn't rate-limit the API, and the deployer who didn't constrain the agent's tool use.
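One way a deployer might constrain tool use is to put a rate limiter between the agent and the API. The sketch below is a minimal sliding-window limiter, not a production design. The names `ToolRateLimiter` and `guarded_query` are hypothetical, and the limits you pass in are assumptions you would tune per tool.

```python
import time
from collections import deque


class ToolRateLimiter:
    """Allow at most max_calls tool calls within any window_seconds span."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of recently allowed calls

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False        # over budget: refuse the call
        self.calls.append(now)
        return True


def guarded_query(limiter, query_fn, *args):
    """Run query_fn only if the limiter permits it; otherwise raise."""
    if not limiter.allow():
        raise RuntimeError("tool call refused: rate limit exceeded")
    return query_fn(*args)
```

With a limiter like this on the deployer's side, 10,000 parallel calls would be refused after the first few, whether or not the vendor rate-limits the API.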

Cascading Errors — One mistake amplified through the loop

An agent makes a small mistake in step 1. It reads the result, which is unexpected, but plausible. So it moves to step 2 based on the wrong understanding. Step 2 makes sense given the wrong context from step 1. Step 3 happens. By step 10, the agent has spiraled into a state where everything looks coherent to it, but the real-world impact is catastrophic. Who bears responsibility? The person who didn't implement circuit breakers or human checkpoints in the loop.
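A circuit breaker in the loop can be sketched as a check that runs before every step and halts the agent once a step cap or loss threshold is crossed. This is an illustrative sketch under simplifying assumptions: `LoopCircuitBreaker`, `run_agent`, and `step_fn` are hypothetical names, and drawdown is measured from the starting value rather than over a rolling hour.

```python
class CircuitOpen(Exception):
    """Raised when the breaker has tripped and the loop must stop."""


class LoopCircuitBreaker:
    def __init__(self, max_steps, max_drawdown):
        self.max_steps = max_steps          # hard cap on loop iterations
        self.max_drawdown = max_drawdown    # e.g. 0.05 for a 5% loss
        self.steps = 0
        self.tripped = False

    def check(self, start_value, current_value):
        """Call once per step, before the agent acts again."""
        self.steps += 1
        drawdown = (start_value - current_value) / start_value
        if self.steps > self.max_steps or drawdown > self.max_drawdown:
            self.tripped = True
        if self.tripped:
            raise CircuitOpen(
                f"halted at step {self.steps}, drawdown {drawdown:.1%}"
            )


def run_agent(step_fn, start_value, breaker):
    """Run step_fn until it returns None or the breaker trips."""
    value = start_value
    while True:
        try:
            breaker.check(start_value, value)
        except CircuitOpen:
            # Stop acting and hand control to a human checkpoint.
            return value, "halted_for_human_review"
        new_value = step_fn(value)
        if new_value is None:
            return value, "finished"
        value = new_value
```

The breaker does not need to know why the agent is wrong. It only needs to notice that the consequences have left the range someone agreed to in advance.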

These aren't mutually exclusive. Most failures involve some combination. But mapping which failure mode occurred helps you identify where the accountability actually lies.

Governance Standards — What Regulations Require

EU AI Act — Article 13: Transparency

High-risk AI systems must be transparent enough for deployers to understand their outputs and identify failures. The trading agent had no rate limit and no circuit breaker, but it also had no transparency mechanism that would have told human traders *why* it was escalating, how confident it was, or what signal it was responding to. Art. 13 requires that AI systems provide sufficient information for meaningful human oversight. If operators can't understand what the agent is doing or why, the system falls short of that requirement.
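To make the three missing pieces concrete (why, how confident, which signal), here is a sketch of a per-decision record an agent could emit. The field names, the `DecisionRecord` type, and the 0.8 confidence cutoff are all illustrative assumptions. Nothing here is a format prescribed by the EU AI Act.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DecisionRecord:
    action: str            # what the agent did or proposes to do
    trigger_signal: str    # the input that prompted the action
    confidence: float      # the agent's own estimate, 0.0 to 1.0
    rationale: str         # short plain-language reason


def to_log_line(record):
    """Serialize a record as one JSON line for an audit log."""
    return json.dumps(asdict(record), sort_keys=True)


def needs_human_attention(record, min_confidence=0.8):
    """Flag low-confidence decisions for an operator to review."""
    return record.confidence < min_confidence


record = DecisionRecord(
    action="rebalance",
    trigger_signal="sector_shift",
    confidence=0.55,
    rationale="observed shift exceeds rebalance threshold",
)
print(to_log_line(record))
```

A trader watching the dashboard could then see which signal drove the rebalance and question it, instead of only seeing the trades.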

NIST AI RMF — MEASURE Function

MEASURE is about quantifying AI risk in context: tracking performance metrics, detecting anomalies, and evaluating whether the system is behaving as intended. A trading agent with no anomaly detection — no alarm when it executes 1,000 trades in 60 seconds instead of its normal 10 — has no MEASURE function. This is a design choice, not an oversight. When accountability is assigned, the absence of measurement is itself a failure of the deployer's risk management obligation.
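The alarm described above can be sketched as a rolling count of trades compared against a baseline. This is a minimal illustration, assuming timestamps arrive in non-decreasing order. `TradeRateMonitor` and the `alarm_multiple` of 10 are hypothetical choices, not values from NIST.

```python
from collections import deque


class TradeRateMonitor:
    """Raise an alarm when trades per window far exceed the baseline."""

    def __init__(self, baseline_per_window, window_seconds=60, alarm_multiple=10):
        self.window_seconds = window_seconds
        self.threshold = baseline_per_window * alarm_multiple
        self.timestamps = deque()

    def record_trade(self, timestamp):
        """Record one trade; return True if the rate is now anomalous."""
        self.timestamps.append(timestamp)
        # Keep only trades that fall inside the current window.
        while timestamp - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


monitor = TradeRateMonitor(baseline_per_window=10)
# Simulate 1,000 trades spread evenly across 60 seconds.
alarms = [monitor.record_trade(i * 0.06) for i in range(1000)]
print("first alarm at trade number", alarms.index(True) + 1)
```

With a baseline of 10 trades per minute and a tenfold alarm multiple, the monitor fires on trade 101, long before the thousandth.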

UNESCO AI Recommendation — Transparency & Explainability

UNESCO's 2021 AI ethics recommendation calls for AI systems to be transparent about their operation and to allow affected parties to understand and contest AI-driven decisions. For agent systems that cause harm — financial loss, content removal, biased hiring — UNESCO's transparency principle asks: could the people affected by this agent's decisions understand how those decisions were made? And if not, who chose to deploy a system that made consequential decisions in the dark?

Context

The Accountability Map

2 min read

Three stakeholders can bear responsibility in an agent failure: the developer, the deployer, and the user. Their accountability maps to three kinds of decision: how the agent was built, how it was deployed and overseen, and how its output was used.

Developer Accountability — for the reasoning and the tools

The developer chose the model architecture. The developer designed the tools and their capabilities. The developer set the constraints on tool use. The developer decided what information the agent could access and how it would reason. If the agent's reasoning is flawed, if the tools are dangerous, if the architecture is fragile — that's the developer's responsibility.

Deployer Accountability — for the context and the oversight

The deployer chose what data to give the agent. The deployer set the optimization targets. The deployer decided what oversight existed — human-in-the-loop, human-on-the-loop, or full autonomy. The deployer chose rate limits, circuit breakers, and kill switches. If the agent was given bad data, wrong objectives, or insufficient oversight — that's the deployer's responsibility.

User Accountability — for the instructions and the verification

The user gave the agent instructions. The user may not have verified the output before acting on it. The user may have trusted the agent more than was justified. If the user relied on the agent's output without checking, and that reliance caused harm — the user bears some responsibility for not verifying.

Regulatory Accountability — for compliance gaps

Where agent failures involve high-risk domains (employment, finance, social infrastructure), EU AI Act Article 13 requires the provider to build in enough transparency, and supply enough information, for the deployer to interpret and oversee the system. NIST MEASURE, part of a voluntary framework, places ongoing risk quantification on whoever operates the system. If these expectations were not met (if the agent had no explainability, no anomaly detection, no audit log), that is not just an engineering gap. Where the EU AI Act applies, it can be a regulatory violation. Accountability maps to the party who was responsible for ensuring compliance and chose not to.

In most real failures, responsibility is distributed across all three. The question is which party had the best opportunity to prevent the specific harm, and that determines where primary accountability sits.
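As a starting point for your own accountability map, you can list the controls that were missing in a failure and tally them by the party this module says owns them. The mapping below is a hypothetical simplification of the role descriptions above, and the control names are invented for illustration. The tally is a prompt for argument, not a verdict.

```python
from collections import Counter

# Hypothetical mapping, drawn from the role descriptions in this module.
CONTROL_OWNERS = {
    "agent_reasoning": "developer",
    "tool_constraints": "developer",
    "input_data_quality": "deployer",
    "optimization_target": "deployer",
    "rate_limit": "deployer",
    "circuit_breaker": "deployer",
    "kill_switch": "deployer",
    "human_oversight": "deployer",
    "output_verification": "user",
}


def tally_gaps(missing_controls):
    """Count missing controls per owner; unknown names go to 'unassigned'."""
    return Counter(
        CONTROL_OWNERS.get(name, "unassigned") for name in missing_controls
    )


# The trading scenario, as this sketch reads it.
gaps = tally_gaps(["rate_limit", "circuit_breaker", "human_oversight"])
print(gaps.most_common())
```

A count like this cannot tell you who had the best opportunity to prevent the harm. It only shows where the missing safeguards cluster, which is where your argument should begin.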

⚔ Debate Lab

Accountability Debate

~20 minutes · 3 scenarios

What you'll do

Three real-world agent failures. You assign accountability for each and defend your reasoning when challenged.

The three failure modes

Goal Misspecification

Tool Misuse

Cascading Errors

Governance standards — apply to each

EU AI Act Art.13 — Was the agent transparent enough for humans to detect the failure?

NIST MEASURE — Was performance being monitored? Were anomalies detected?

UNESCO Transparency — Could affected parties understand and contest the decisions?

Accountability stakeholders

Developer — built the agent

Deployer — deployed it

User — used its output

Scenarios

Content moderation failure · Hiring bias agent · Trading system crash


✓ Module Complete

You've completed Module 3 of 8.
