AutoResearch: Autonomous Paper-Writing Agent with Claude

Source post: @datasciencebrain Telegram (crosspost from Hasan Toor on X)
Scraped claim: "Takes a research idea and outputs a full academic paper" with genuine citations, experiments, and conference-ready LaTeX, no human intervention.
Stack built here: Claude + LangGraph + arXiv API + Tavily search + LaTeX
The core idea
A single-prompt LLM call can give you a blog post. A research paper needs:
- Literature review grounded in real citations
- A genuine research question / hypothesis
- Experiments or analysis, actual runs, not made-up numbers
- Structured LaTeX output, abstract, intro, methods, results, conclusion
- Iterative refinement, the first draft is always rough
This is an agent orchestration problem, not a prompting problem. LangGraph is the right tool. Claude is the right brain (strong at structured writing, code, and reasoning over long contexts).
Architecture
topic
│
▼
[planner]────────▶ research_questions, sections_plan
│
▼
[literature_searcher]──▶ arXiv + web → papers + bibtex
│
▼
[experiment_designer]──▶ executable Python
│
▼
[experiment_runner] ───▶ results.json + plots
│
▼
[writer]──────────▶ LaTeX draft per section
│
▼
[reviewer]────────▶ critique
│
├─ needs_work ──▶ [writer] (revise)
└─ approved ───▶ [compiler]──▶ paper.pdf
Complete implementation
1. Install
pip install langgraph anthropic arxiv tavily-python matplotlib numpy
# LaTeX: install TeXLive or MiKTeX locally for pdflatex
2. State + setup
import os, subprocess, json
from typing import List, Dict
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
import anthropic, arxiv
from tavily import TavilyClient
claude = anthropic.Anthropic()
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
MODEL = "claude-opus-4-7"
class PaperState(TypedDict):
topic: str
research_questions: List[str]
outline: Dict[str, str] # section -> what it should cover
papers: List[Dict] # {title, authors, year, abstract, bibtex}
experiment_code: str
experiment_results: Dict
draft: Dict[str, str] # section -> latex
critique: str
revision_count: int
final_tex: str
pdf_path: str
def ask_claude(system: str, user: str, max_tokens: int = 4096) -> str:
resp = claude.messages.create(
model=MODEL, max_tokens=max_tokens,
system=system,
messages=[{"role": "user", "content": user}],
)
return resp.content[0].text
3. Planner: decompose topic into research questions and outline
def planner(state: PaperState) -> Dict:
out = ask_claude(
system="You are a research planner. Output ONLY valid JSON.",
user=(
f"Topic: {state['topic']}\n\n"
"Produce:\n"
'1. "research_questions": 2-4 specific, testable questions\n'
'2. "outline": {abstract, introduction, related_work, methods, experiments, results, discussion, conclusion} — '
"one sentence each describing what that section should cover\n\n"
"Return JSON only."
),
)
parsed = json.loads(out)
return {
"research_questions": parsed["research_questions"],
"outline": parsed["outline"],
"revision_count": 0,
}
4. Literature searcher: arXiv + web, with real citations
def literature_searcher(state: PaperState) -> Dict:
papers = []
# arXiv: academic source
for q in state["research_questions"]:
search = arxiv.Search(query=q, max_results=5, sort_by=arxiv.SortCriterion.Relevance)
for p in search.results():
papers.append({
"title": p.title,
"authors": [a.name for a in p.authors],
"year": p.published.year,
"abstract": p.summary[:500],
"arxiv_id": p.entry_id.split("/")[-1],
"bibtex": (
f"@article{{{p.entry_id.split('/')[-1]},\n"
f" title={{{p.title}}},\n"
f" author={{{' and '.join(a.name for a in p.authors)}}},\n"
f" year={{{p.published.year}}},\n"
f" eprint={{{p.entry_id.split('/')[-1]}}},\n"
f" archivePrefix={{arXiv}}\n}}"
),
})
# Tavily: recent web context (blogs, industry reports)
for q in state["research_questions"]:
for r in tavily.search(q, max_results=3)["results"]:
papers.append({"title": r["title"], "url": r["url"], "abstract": r["content"][:400]})
return {"papers": papers}
5. Experiment designer + runner: real code, real numbers
def experiment_designer(state: PaperState) -> Dict:
code = ask_claude(
system=(
"You are a research scientist. Write a Python script that runs a real, small "
"experiment answering the research questions. Use numpy, matplotlib, or sklearn. "
"Save numerical results to `results.json` and plots to `fig_*.png`. "
"Keep runtime under 60 seconds. Output ONLY the Python code, no markdown."
),
user=f"Research questions: {state['research_questions']}\nTopic: {state['topic']}",
)
# Strip markdown fences if present
code = code.replace("```python", "").replace("```", "").strip()
return {"experiment_code": code}
def experiment_runner(state: PaperState) -> Dict:
with open("experiment.py", "w") as f:
f.write(state["experiment_code"])
try:
subprocess.run(["python", "experiment.py"], check=True, timeout=120, capture_output=True)
with open("results.json") as f:
results = json.load(f)
except Exception as e:
results = {"error": str(e), "fallback": "experiment failed, will note in paper"}
return {"experiment_results": results}
6. Writer: LaTeX per section
LATEX_SYSTEM = (
"You are an academic paper writer. Output only LaTeX body content "
"(no preamble, no \\begin{document}). Use \\cite{arxiv_id} for citations. "
"Be precise, hedged, and quantitative. No marketing language."
)
def writer(state: PaperState) -> Dict:
draft = {}
context = (
f"Topic: {state['topic']}\n"
f"Research questions: {state['research_questions']}\n"
f"Experiment results: {json.dumps(state['experiment_results'])}\n"
f"Available citations:\n"
+ "\n".join(f"- {p['title']} ({p.get('arxiv_id','web')})" for p in state['papers'][:20])
)
if state.get("critique"):
context += f"\n\nPREVIOUS CRITIQUE TO ADDRESS:\n{state['critique']}"
for section, desc in state["outline"].items():
section_tex = ask_claude(
system=LATEX_SYSTEM,
user=f"{context}\n\nWrite the '{section}' section. Target: {desc}. 150-400 words.",
max_tokens=2048,
)
draft[section] = section_tex
return {"draft": draft}
7. Reviewer: critique loop (self-correction)
def reviewer(state: PaperState) -> Dict:
full = "\n\n".join(f"=== {k} ===\n{v}" for k, v in state["draft"].items())
critique = ask_claude(
system=(
"You are a tough peer reviewer. Identify: unsupported claims, missing citations, "
"logical gaps, overclaims, missing limitations. Return JSON: "
'{"approved": bool, "issues": [str], "suggestions": [str]}'
),
user=full,
)
parsed = json.loads(critique)
return {"critique": json.dumps(parsed)}
def should_revise(state: PaperState) -> str:
critique = json.loads(state["critique"])
if critique["approved"] or state["revision_count"] >= 2:
return "compile"
return "revise"
def increment_revision(state: PaperState) -> Dict:
return {"revision_count": state["revision_count"] + 1}
8. Compiler: assemble + pdflatex
LATEX_PREAMBLE = r"""
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx,amsmath,cite,hyperref}
\title{%s}
\author{AutoResearch Agent (Claude)}
\date{\today}
\begin{document}
\maketitle
"""
def compiler(state: PaperState) -> Dict:
body = LATEX_PREAMBLE % state["topic"]
for section in ["abstract", "introduction", "related_work", "methods",
"experiments", "results", "discussion", "conclusion"]:
if section in state["draft"]:
body += f"\n\\section{{{section.replace('_',' ').title()}}}\n{state['draft'][section]}\n"
# Bibliography
body += "\n\\begin{thebibliography}{99}\n"
for p in state["papers"]:
if "bibtex" in p:
body += f"\\bibitem{{{p['arxiv_id']}}} {p['authors'][0]} et al. ({p['year']}). \\emph{{{p['title']}}}.\n"
body += "\\end{thebibliography}\n\\end{document}\n"
with open("paper.tex", "w", encoding="utf-8") as f:
f.write(body)
subprocess.run(["pdflatex", "-interaction=nonstopmode", "paper.tex"], capture_output=True)
return {"final_tex": body, "pdf_path": "paper.pdf"}
9. Wire the graph
g = StateGraph(PaperState)
g.add_node("plan", planner)
g.add_node("search_lit", literature_searcher)
g.add_node("design_exp", experiment_designer)
g.add_node("run_exp", experiment_runner)
g.add_node("write", writer)
g.add_node("review", reviewer)
g.add_node("revise_counter", increment_revision)
g.add_node("compile", compiler)
g.add_edge(START, "plan")
g.add_edge("plan", "search_lit")
g.add_edge("search_lit", "design_exp")
g.add_edge("design_exp", "run_exp")
g.add_edge("run_exp", "write")
g.add_edge("write", "review")
g.add_conditional_edges("review", should_revise, {"revise": "revise_counter", "compile": "compile"})
g.add_edge("revise_counter", "write") # loop back — writer uses state['critique']
g.add_edge("compile", END)
app = g.compile()
# Run
result = app.invoke(
{"topic": "Are smaller LLMs with chain-of-thought competitive with larger LLMs on math word problems?"},
{"recursion_limit": 25},
)
print("Paper compiled:", result["pdf_path"])
Safety notes and limitations
- Hallucinated citations are the #1 failure mode. The arXiv search here returns real papers. Never let the writer invent citation keys. Always constrain to the retrieved set (the writer prompt does this).
- Experiments can fail silently. The
experiment_runnercatches errors and the writer must acknowledge them. Don't let the reviewer pass a paper with fabricated results. - Reviewer bias toward approval. Set a hard
revision_count >= 2cap to avoid infinite loops, then force compile with flagged limitations. - This is not peer review. Outputs are research drafts, useful for accelerating writing, not replacing scholarly work.
Why Claude is the right choice for this
- Long context (200K). The reviewer can see the entire paper draft at once.
- Structured output. JSON for plans, LaTeX for sections, both reliable.
- Code generation. The experiment designer needs runnable Python and Claude is strong here.
- Hedged writing. Claude's default style suits academic voice (vs. ChatGPT's promotional tendency).
Resume angle
"Built an autonomous research agent with LangGraph + Claude: planner decomposes a topic, arXiv-grounded literature searcher retrieves real citations, code-gen agent designs and runs experiments, writer produces LaTeX per section, reviewer enforces self-critique loops, pdflatex compiles the output. End-to-end: idea to conference-format PDF in roughly 5 minutes."