A research agent is one of the first kinds of agents many practitioners deploy — and one of the most likely to hallucinate without the right architecture.
The task seems simple: given a research question, find relevant sources, synthesize findings, produce a document. But "find" and "synthesize" require precision. An agent that confabulates citations, mixes sources with conflicting methodologies, or misses recent evidence isn't just unhelpful — it's dangerous.
This module teaches you to design a complete research agent from architecture through deployment.
A policy think tank wants an agent that can research legislative proposals. When a bill is introduced in Congress, the agent reads it, researches relevant precedents in law and policy, identifies what other jurisdictions have done, and produces a briefing document for legislators.
Currently a researcher does this manually. It takes three days. The think tank's director wants the agent to do it in four hours.
The first draft of the agent works. It searches case law, finds precedents, reads relevant policy papers, and synthesizes them. For six weeks it produces useful briefings: legislators cite them, and other think tanks pick them up.
Then, in week seven, a briefing reaches congressional staff. It claims that a specific law, one that has passed and is in effect, says the opposite of what the statute actually says. The staff fact-check the claim. The law is on the books, and the agent's reading of it is backwards.
The agent had searched correctly. It found the actual law. But in the synthesis step, it misread the implication and inverted the meaning. The output was internally consistent — it cited real laws, the citations were accurate — but the conclusion was wrong.
The question became: how do you build a research agent that can't invert the meaning of sources it's reading?
A research agent has five critical components.
Decomposition: break the research task into answerable sub-questions. A question like "What are the policy precedents for AI regulation?" should decompose into: "Which countries have passed AI regulations? What frameworks do they use? What problems did they encounter? How have the frameworks evolved?" Each sub-question is answerable on its own; together they are comprehensive.
Routing: different questions need different data sources. "What frameworks exist" goes to policy databases. "What problems did they encounter" goes to research papers and case studies. "How have they evolved" goes to legislative history. Match each sub-question to the right sources instead of searching everywhere.
Verification: pull sources and verify that they exist. Don't hallucinate citations. If a search returns nothing, say so explicitly. If you can't find a source for a claim, flag it for human review; don't invent one.
Synthesis: combine findings from multiple sources, but be explicit about conflicts: "Source A says X, Source B says Y, and they conflict because..." Don't hide uncertainty; expose it. This is the step where misreading happens most often, as it did in the week-seven briefing.
Confidence flagging: mark claims the agent is uncertain about, for example "The legislative history suggests X, but only one source explicitly states this" or "I found no precedent for this situation." Let humans make the final judgment on uncertain claims.
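A minimal sketch of how a decomposed plan might be tracked, assuming the sub-questions have already been produced (for example by a model call, not shown here). `SubQuestion` and `ResearchPlan` are hypothetical names. The point is that the plan can report which sub-questions still lack findings, so coverage gaps surface before synthesis begins.

```python
from dataclasses import dataclass, field


@dataclass
class SubQuestion:
    text: str
    # Source-backed findings gathered for this sub-question.
    findings: list[str] = field(default_factory=list)


@dataclass
class ResearchPlan:
    question: str
    sub_questions: list[SubQuestion]

    def unanswered(self) -> list[str]:
        """Return sub-questions with no findings yet; synthesis should wait on these."""
        return [sq.text for sq in self.sub_questions if not sq.findings]


plan = ResearchPlan(
    question="What are the policy precedents for AI regulation?",
    sub_questions=[
        SubQuestion("Which countries have passed AI regulations?"),
        SubQuestion("What frameworks do they use?"),
        SubQuestion("What problems did they encounter?"),
        SubQuestion("How have frameworks evolved?"),
    ],
)

# Hypothetical finding recorded against the first sub-question only.
plan.sub_questions[0].findings.append("finding backed by source S1")
print(plan.unanswered())
```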
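A keyword-based router is one simple way to make this matching explicit. The sketch below assumes hypothetical source names (`policy_database`, `research_papers`, `case_studies`, `legislative_history`); a production system might use a classifier instead. An empty result means "unrouted" and should prompt review rather than a search of every source.

```python
# Hypothetical routing table: keyword in the sub-question -> source names.
# Keyword matching is an illustrative assumption, not a recommended classifier.
ROUTES = [
    ("framework", ["policy_database"]),
    ("problem", ["research_papers", "case_studies"]),
    ("evolve", ["legislative_history"]),
]


def route(sub_question: str) -> list[str]:
    """Return the sources a sub-question should be sent to.

    An empty list means no rule matched; the caller should treat that as
    unrouted and ask for review instead of searching every source.
    """
    q = sub_question.lower()
    sources: list[str] = []
    for keyword, targets in ROUTES:
        if keyword in q:
            for target in targets:
                if target not in sources:  # keep order, avoid duplicates
                    sources.append(target)
    return sources


print(route("What problems did they encounter?"))
```

A question can match more than one rule ("How have frameworks evolved?" goes to both the policy database and legislative history), which is usually what you want.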
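One way to enforce this rule, sketched under the assumption that the retrieval step returns document IDs: a citation survives only if its ID is among the documents actually retrieved, and an empty search is reported as empty. `Citation`, `verify_citations`, and `report_search` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_id: str  # ID of the document the claim is attributed to
    claim: str


def verify_citations(citations, retrieved_ids):
    """Split citations into those backed by a retrieved document and those that are not.

    `retrieved_ids` is the set of document IDs the search step actually returned.
    Citations pointing anywhere else are flagged for human review, never kept.
    """
    verified, flagged = [], []
    for c in citations:
        if c.source_id in retrieved_ids:
            verified.append(c)
        else:
            flagged.append(c)
    return verified, flagged


def report_search(query, results):
    """State explicitly when a search came back empty instead of filling the gap."""
    if not results:
        return f"No sources found for: {query}"
    return f"{len(results)} source(s) found for: {query}"
```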
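A sketch of conflict-aware synthesis, assuming an earlier step has already reduced each source's position to a normalized proposition plus an assert/deny stance. That extraction step is itself where an inversion like the week-seven error can happen. Keeping the verbatim `quote` next to each reading does not prevent misreading, but it lets a reviewer check the reading against the source text. `Finding` and `synthesize` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source_id: str
    proposition: str  # normalized statement, e.g. "the law requires disclosure"
    holds: bool       # True if the source asserts the proposition, False if it denies it
    quote: str        # verbatim text the reading is based on, kept for human review


def synthesize(findings):
    """Group findings by proposition and surface disagreements instead of smoothing them over."""
    by_prop = defaultdict(list)
    for f in findings:
        by_prop[f.proposition].append(f)

    lines = []
    for prop, group in by_prop.items():
        asserting = sorted(f.source_id for f in group if f.holds)
        denying = sorted(f.source_id for f in group if not f.holds)
        if asserting and denying:
            lines.append(
                f"CONFLICT on '{prop}': {', '.join(asserting)} assert it; "
                f"{', '.join(denying)} deny it. Review the quotes before concluding."
            )
        elif asserting:
            lines.append(f"'{prop}' asserted by {', '.join(asserting)}.")
        else:
            lines.append(f"'{prop}' denied by {', '.join(denying)}.")
    return lines
```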
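A simple flagging rule, sketched here as an assumption rather than a standard: a claim backed by no sources, or by fewer than `min_sources` distinct sources, is returned with a review flag. The threshold of two is illustrative and would need tuning per domain. `confidence_flag` is a hypothetical helper.

```python
from typing import Iterable, Optional


def confidence_flag(claim: str, supporting_sources: Iterable[str], min_sources: int = 2) -> Optional[str]:
    """Return a review flag for weakly supported claims, or None if support looks adequate.

    `min_sources` is an assumed threshold, not an established cutoff.
    """
    n = len(set(supporting_sources))  # count distinct sources, not repeated mentions
    if n == 0:
        return f"NO SOURCE: found no support for '{claim}'; needs human review."
    if n < min_sources:
        return f"LOW CONFIDENCE: '{claim}' rests on {n} source(s); needs human review."
    return None
```

Counting distinct sources matters: three citations to the same document still count as one source of support.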
Each component protects against a different failure mode. Decomposition prevents missing important sub-questions. Routing ensures you're looking in the right place. Verification prevents hallucinated sources. Synthesis with explicit conflict reporting prevents sources from being silently blended or misread. Confidence flagging enables human oversight.
Three specific failure modes plague research agents.
Confabulated citations: the agent is supposed to cite sources. When it can't find one, it generates a plausible-sounding citation instead. It's not malicious; it's trying to be complete. But a hallucinated citation is worse than no citation, because it is both convincing and false.
Mixed methodologies: an agent synthesizes findings from a longitudinal study (20 years of data) and a small pilot study (50 subjects) as though they were equivalent. It cites both but doesn't acknowledge the methodological difference, so the synthesis is misleading.
Missed recent evidence: an agent completes its research, cites sources from 2022, and misses a major 2024 study that contradicts the 2022 findings. The search itself was correct; the problem is that no recency check was built into the verification step.
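Methodological comparison can be made mechanical if each source carries basic study metadata. The sketch below assumes a hypothetical `Study` record and illustrative thresholds. It returns caveats that synthesis should attach rather than silently merging the findings. In the usage example, the longitudinal study's sample size and the pilot's duration are made-up values; only the 20 years and 50 subjects come from the scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    source_id: str
    design: str          # e.g. "longitudinal", "pilot"
    n_subjects: int
    years_of_data: float


def methodology_caveats(studies, size_ratio=10.0, span_gap_years=5.0):
    """Return caveats to attach when synthesizing across methodologically different studies.

    The thresholds are illustrative assumptions, not established cutoffs.
    """
    caveats = []
    designs = sorted({s.design for s in studies})
    if len(designs) > 1:
        caveats.append(f"Mixed study designs ({', '.join(designs)}); findings are not equivalent.")
    sizes = [s.n_subjects for s in studies if s.n_subjects > 0]
    if sizes and max(sizes) / min(sizes) >= size_ratio:
        caveats.append(f"Sample sizes range from {min(sizes)} to {max(sizes)}; weight findings accordingly.")
    spans = [s.years_of_data for s in studies]
    if spans and max(spans) - min(spans) >= span_gap_years:
        caveats.append(f"Observation periods range from {min(spans):g} to {max(spans):g} years.")
    return caveats


# Hypothetical metadata: 5000 subjects and the 1-year pilot duration are invented.
studies = [
    Study("S1", "longitudinal", n_subjects=5000, years_of_data=20),
    Study("S2", "pilot", n_subjects=50, years_of_data=1),
]
for caveat in methodology_caveats(studies):
    print(caveat)
```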
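A recency check can run as part of verification. This sketch assumes each source has a topic label and a publication date, and it uses an in-memory `corpus` as a stand-in for whatever date-filtered search the agent can actually run. It flags uncited sources on the same topic that are newer than the newest citation, and it also flags topics whose newest citation is older than an assumed `max_age_days`. `Source` and `recency_warnings` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    source_id: str
    topic: str
    published: date


def recency_warnings(cited, corpus, as_of, max_age_days=365):
    """Flag topics where uncited newer sources exist, or where the newest citation is old.

    `corpus` stands in for a date-filtered search; `max_age_days` is an assumed threshold.
    """
    cited_ids = {s.source_id for s in cited}
    warnings = []
    for topic in sorted({s.topic for s in cited}):
        newest = max(s.published for s in cited if s.topic == topic)
        newer = sorted(
            s.source_id for s in corpus
            if s.topic == topic and s.published > newest and s.source_id not in cited_ids
        )
        if newer:
            warnings.append(f"{topic}: uncited newer sources {', '.join(newer)} after {newest.isoformat()}")
        if (as_of - newest).days > max_age_days:
            warnings.append(f"{topic}: newest citation is from {newest.isoformat()}; search for recent work")
    return warnings
```

In the 2022/2024 scenario, the first check is what catches the contradicting study, provided the search step can see it. The second check fires even when the search finds nothing newer, which is a cue to widen the search.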
These failures point to specific architectural fixes: explicit hallucination prevention, methodological comparison in synthesis, and timestamp-aware searching.