Intro

Build a Research Agent

A research agent is one of the first kinds of agents many practitioners deploy — and one of the most likely to hallucinate without the right architecture.

The task seems simple: given a research question, find relevant sources, synthesize findings, and produce a document. But "find" and "synthesize" require precision. An agent that confabulates citations, mixes sources with conflicting methodologies, or misses recent evidence isn't just unhelpful; it's dangerous.

This module teaches you to design a complete research agent from architecture through deployment.

By the end of this module, you will:

Design a complete research agent architecture with decomposition, routing, and verification
Write the system prompt for a research agent with specific scope boundaries
Specify the tool stack: databases, verification, synthesis
Design the loop logic and confidence flagging for uncertain claims
Document the output format and human review gates
Identify where hallucination is most likely and how to prevent it

Portfolio Artifact: build a complete research agent specification, including architecture, system prompt, and output format

Scenario

The Briefing That Didn't

A policy think tank wants an agent that can research legislative proposals. When a bill is introduced in Congress, the agent reads it, researches relevant precedents in law and policy, identifies what other jurisdictions have done, and produces a briefing document for legislators.

Currently a researcher does this manually. It takes three days. The think tank's director wants the agent to do it in four hours.

The first draft of the agent works. It searches case law, finds precedents, reads relevant policy papers, and synthesizes them. For six weeks it produces useful briefings: legislators cite them, and other think tanks pick them up.

Then, in week seven, a briefing reaches congressional staff. It claims that a specific law, one that has passed and is in effect, says the opposite of what it actually says. The staff fact-check the claim. The law is on the books, and the agent's description of it is backwards.

The agent had searched correctly. It found the actual law. But in the synthesis step, it misread the implication and inverted the meaning. The output was internally consistent — it cited real laws, the citations were accurate — but the conclusion was wrong.

The question became: how do you build a research agent that can't invert the meaning of sources it's reading?

Lesson

Research Agent Architecture

A research agent has five critical components.

1. Question Decomposition

Break the research task into answerable sub-questions. A research question such as "What are the policy precedents for AI regulation?" should decompose into: "Which countries have passed AI regulations? What frameworks do they use? What problems did they encounter? How have those frameworks evolved?" Each sub-question is answerable on its own; together they cover the original question.
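
One way to keep decomposition checkable is to store each sub-question as data rather than free text, tagged with the kind of evidence it needs so the routing step can use it. The sketch below is illustrative only: the `SubQuestion` type, the `intent` labels, and `validate_decomposition` are hypothetical names, and in a real agent the sub-questions would come from a model call. This code only checks the result for obvious defects.

```python
from dataclasses import dataclass

# Hypothetical intent labels; each tells the routing step what kind of evidence is needed.
INTENTS = {"inventory", "framework", "problems", "evolution"}


@dataclass(frozen=True)
class SubQuestion:
    text: str
    intent: str


def validate_decomposition(parent: str, subs: list[SubQuestion]) -> list[str]:
    """Return a list of problems; an empty list means the decomposition passes these checks."""
    problems = []
    if len(subs) < 2:
        problems.append(f"{parent!r} was not actually decomposed")
    seen = set()
    for sq in subs:
        text = sq.text.strip()
        if not text.endswith("?"):
            problems.append(f"not phrased as a question: {text!r}")
        if sq.intent not in INTENTS:
            problems.append(f"unknown intent {sq.intent!r} for {text!r}")
        if text.lower() in seen:
            problems.append(f"duplicate sub-question: {text!r}")
        seen.add(text.lower())
    return problems


subs = [
    SubQuestion("Which countries have passed AI regulations?", "inventory"),
    SubQuestion("What frameworks do they use?", "framework"),
    SubQuestion("What problems did they encounter?", "problems"),
    SubQuestion("How have frameworks evolved?", "evolution"),
]
print(validate_decomposition("What are the policy precedents for AI regulation?", subs))  # → []
```

These checks catch structural gaps, not missing topics. Whether the sub-questions together are comprehensive still needs a human or a second review pass.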

2. Source Routing

Different questions need different data sources. "What frameworks exist" goes to policy databases. "What problems did they encounter" goes to research papers and case studies. "How have they evolved" goes to legislative history. Match sub-questions to the right sources, not just "search everywhere."
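
Routing can be as simple as an explicit lookup table from sub-question intent to source type. In this minimal sketch the source names and the `route` function are hypothetical placeholders for whatever databases the agent can actually query. The design choice that matters is the failure path: an intent with no configured source raises an error instead of falling back to "search everywhere."

```python
# Hypothetical source names; swap in whatever databases or search tools the agent can actually reach.
ROUTES: dict[str, list[str]] = {
    "inventory": ["policy_database"],
    "framework": ["policy_database"],
    "problems": ["research_papers", "case_studies"],
    "evolution": ["legislative_history"],
}


class UnroutableQuestion(Exception):
    """Raised instead of silently falling back to searching everywhere."""


def route(intent: str) -> list[str]:
    if intent not in ROUTES:
        raise UnroutableQuestion(f"no source configured for intent {intent!r}")
    return list(ROUTES[intent])  # copy so callers can't mutate the table


print(route("problems"))  # → ['research_papers', 'case_studies']
```

A routing gap then surfaces as an explicit error that a person can resolve by adding a source, rather than as a quietly noisier search.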

3. Retrieval and Verification

Pull sources and verify that they exist. Don't hallucinate citations. If a search returns nothing, say so explicitly. If you can't find a source for a claim, flag the claim for human review; don't invent a source.
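
A minimal sketch of the verification rule, assuming the retrieval layer returns records with a stable identifier. The `Source` and `VerificationReport` types, the `doc_id` field, and the sample IDs are hypothetical. Every citation the model produces is checked against what retrieval actually returned; anything unmatched goes to review instead of into the output, and an empty search is recorded as such.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    doc_id: str  # hypothetical stable identifier from the retrieval layer
    title: str


@dataclass
class VerificationReport:
    verified: list[str] = field(default_factory=list)
    needs_review: list[str] = field(default_factory=list)  # cited but never retrieved
    empty_search: bool = False


def verify_citations(cited_ids: list[str], retrieved: list[Source]) -> VerificationReport:
    report = VerificationReport(empty_search=not retrieved)  # report "nothing found" explicitly
    known = {s.doc_id for s in retrieved}
    for cid in cited_ids:
        if cid in known:
            report.verified.append(cid)
        else:
            report.needs_review.append(cid)  # flag it; never keep an unmatched citation
    return report


retrieved = [Source("doc-001", "Statute A"), Source("doc-002", "Policy paper B")]
report = verify_citations(["doc-001", "doc-999"], retrieved)
print(report.verified, report.needs_review)  # → ['doc-001'] ['doc-999']
```

Note what this does not catch: in the briefing scenario every citation was real, so an existence check alone would have passed. It has to be paired with checks at the synthesis step.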

4. Synthesis

Combine findings from multiple sources, but be explicit about conflicts: "Source A says X, Source B says Y, and they conflict because..." Don't hide uncertainty; expose it. Synthesis is where misreading happens most often, as in the briefing that inverted the meaning of a law it had retrieved correctly.
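
One way to make conflicts impossible to hide is to have the synthesis step emit structured findings, each carrying the verbatim passage it rests on, and then group them by topic. The sketch below assumes hypothetical `Finding` fields and a free-text stance label; the sources and quotes are placeholder text, not real documents. Keeping the quote next to the stance also gives a reviewer a direct way to spot an inverted reading.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str     # label of the source the finding came from
    claim_key: str  # normalized topic the finding addresses
    stance: str     # hypothetical free-text stance label
    quote: str      # verbatim passage the finding rests on


def find_conflicts(findings: list[Finding]) -> dict[str, list[Finding]]:
    """Return only the claim keys on which sources take different stances."""
    by_key: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        by_key[f.claim_key].append(f)
    return {k: fs for k, fs in by_key.items() if len({f.stance for f in fs}) > 1}


def render_conflict(key: str, fs: list[Finding]) -> str:
    says = ", ".join(f'{f.source} says {f.stance} ("{f.quote}")' for f in fs)
    # The reason for the conflict is left for the model or a reviewer to fill in, never dropped.
    return f"On {key}: {says}. They conflict because: [TO EXPLAIN OR FLAG]"


findings = [
    Finding("Source A", "audit requirement", "audits required", "providers shall conduct audits"),
    Finding("Source B", "audit requirement", "audits voluntary", "audits are voluntary"),
    Finding("Source C", "sandbox programs", "sandboxes permitted", "sandboxes may be established"),
]
conflicts = find_conflicts(findings)
for key, fs in conflicts.items():
    print(render_conflict(key, fs))
```

Grouping by a normalized `claim_key` assumes the model labels topics consistently; in practice that labeling is itself a place to spot-check.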

5. Confidence Flagging

Mark claims the agent is uncertain about. "The legislative history suggests X, but only one source explicitly states this" or "I found no precedent for this situation." Let humans make the final judgment on uncertain claims.
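
Confidence flags are easier to audit when they come from explicit counts rather than the model's self-assessment. This is a toy rule under stated assumptions: the `Confidence` labels, the thresholds in `assess`, and the rule that anything short of multi-source agreement goes to a person are all illustrative choices to tune per domain.

```python
from enum import Enum


class Confidence(Enum):
    SUPPORTED = "supported by multiple sources"
    SINGLE_SOURCE = "only one source states this"
    CONTESTED = "sources conflict"
    NO_EVIDENCE = "no precedent or source found"


def assess(supporting: int, contradicting: int) -> Confidence:
    """Toy rule; the thresholds are assumptions to tune per domain."""
    if supporting == 0 and contradicting == 0:
        return Confidence.NO_EVIDENCE
    if contradicting > 0:
        return Confidence.CONTESTED
    if supporting == 1:
        return Confidence.SINGLE_SOURCE
    return Confidence.SUPPORTED


def needs_human_review(c: Confidence) -> bool:
    # Anything short of multi-source agreement goes to a person.
    return c is not Confidence.SUPPORTED


for supporting, contradicting in [(3, 0), (1, 0), (2, 1), (0, 0)]:
    c = assess(supporting, contradicting)
    print(f"{supporting} for / {contradicting} against: {c.value}, review={needs_human_review(c)}")
```

The label text maps directly onto the phrasing the lesson suggests ("only one source explicitly states this", "I found no precedent"), so the flag can be rendered into the output document unchanged.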

Each component protects against a different failure mode. Decomposition prevents missing important sub-questions. Routing ensures you're looking in the right place. Verification prevents hallucinated citations. Explicit synthesis keeps conflicting sources from being silently blended. Confidence flagging enables human oversight.

Context

Where Research Agents Fail

Three specific failure modes plague research agents.

Hallucinated citations — the agent generates sources that don't exist

The agent is supposed to cite sources. When it can't find a source, it generates one that sounds plausible. It's not malicious; it's trying to be complete. But a hallucinated citation is worse than no citation because it's convincing and false.

Source mixing — combining findings from sources with incompatible methodologies

An agent synthesizes findings from a longitudinal study (20 years of data) and a small pilot study (50 subjects) as though they're equivalent. It cites both but doesn't acknowledge the methodological difference, so the synthesis is misleading.
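
Source mixing is easier to catch when each source carries methodology metadata that the synthesis step must check before combining findings. In this sketch the `Study` fields, the design labels, and the `SMALL_SAMPLE` threshold are hypothetical; the threshold is illustrative, not a statistical standard. Fields a source doesn't report are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Study:
    label: str
    design: str                          # e.g. "longitudinal", "pilot"
    sample_size: Optional[int] = None    # None when the source doesn't report it
    years_of_data: Optional[float] = None


SMALL_SAMPLE = 100  # illustrative threshold, not a statistical standard


def methodology_warnings(studies: list[Study]) -> list[str]:
    """Warnings the synthesis must surface before treating these studies as comparable."""
    warnings = []
    if len({s.design for s in studies}) > 1:
        warnings.append("mixed designs: " + ", ".join(f"{s.label} ({s.design})" for s in studies))
    for s in studies:
        if s.sample_size is not None and s.sample_size < SMALL_SAMPLE:
            warnings.append(f"{s.label}: small sample (n={s.sample_size})")
    return warnings


studies = [
    Study("Study 1", "longitudinal", years_of_data=20),
    Study("Study 2", "pilot", sample_size=50),
]
for w in methodology_warnings(studies):
    print(w)
```

The warnings don't block synthesis; they are meant to be carried into the output so the reader sees the difference the agent would otherwise have flattened.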

Recency blindness — pulling older evidence when newer evidence exists

An agent completes its research, cites sources from 2022, and misses a major 2024 study that contradicts the 2022 findings. The agent's search was correct; the problem is that no recency check was built into the verification step.
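
A minimal recency check compares the newest cited source against the date of the briefing and, if the gap is too large, asks for a follow-up search restricted to later publications. The `recency_gap` function and its one-year window are assumptions for illustration. A recent newest source doesn't prove nothing newer contradicts it, so a stricter variant would always run a date-filtered follow-up search.

```python
from datetime import date, timedelta
from typing import Optional


def recency_gap(cited: list[date], as_of: date,
                max_age: timedelta = timedelta(days=365)) -> Optional[date]:
    """Return the date after which a follow-up search is needed, or None if the newest
    cited source is within max_age of as_of. The one-year window is an assumption."""
    if not cited:
        raise ValueError("no cited sources; handle as an empty search during verification")
    newest = max(cited)
    return newest if as_of - newest > max_age else None


gap = recency_gap([date(2022, 3, 1), date(2022, 9, 15)], as_of=date(2024, 6, 1))
if gap is not None:
    print(f"Search again for sources published after {gap.isoformat()}")
```

Here the newest citation is from 2022 and the briefing is dated 2024, so the check asks for exactly the kind of follow-up search that would have surfaced the contradicting study.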

These failures point to specific architectural fixes: explicit hallucination prevention, methodological comparison in synthesis, and timestamp-aware searching.

🔨 Build Lab

Design a Research Agent

~30 minutes

What you'll do

Design a complete research agent specification. You'll map the architecture, choose tools, write the system prompt, and identify failure points.

The five components

Question Decomposition

Source Routing

Retrieval & Verification

Synthesis

Confidence Flagging

Domains to choose from

Policy research, competitive intelligence, scientific literature, legal research, market analysis.

✓ Module Complete

You've completed Module 5 of 8.
