White Paper

Self-Healing RAG with LangGraph: Build It in 60 Minutes

Jake McCluskey

Source post: datasciencebrain Instagram, 10-slide carousel

Claim: Build in 60 minutes, free. Stack: LangGraph + Groq + ChromaDB

Underlying pattern: Self-RAG (LangChain/LangGraph reference architecture)

The core idea (what the slides are selling)

Naive RAG blindly trusts retrieved chunks and hallucinates with confidence. This system adds a feedback loop where the LLM grades its own work at three checkpoints:

  1. Are the retrieved docs actually relevant? (retrieval grader)
  2. Is the generated answer grounded in those docs? (hallucination grader)
  3. Does the answer actually address the question? (answer grader)

If any grader fails, rewrite the query and retry. If it still fails, return "I don't know" instead of hallucinating.

The graph (what the diagram on the slides shows)

START → retrieve → grade_documents → [decide_to_generate]
                                        ├─ relevant?  → generate
                                        └─ none good? → transform_query → retrieve (loop)

generate → [grade_generation_v_documents_and_question]
             ├─ "not supported"  → generate (retry)
             ├─ "not useful"     → transform_query → retrieve (loop)
             └─ "useful"         → END

Four nodes, two conditional edges. That's the whole system.
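
If the graph machinery feels abstract, the same feedback logic can be read as a flat Python loop. This is only an illustrative sketch: it calls the retriever, graders, rewriter, and RAG chain that sections 3 and 4 below construct, and it simplifies one branch (on a "not supported" result the real graph retries generate, while this sketch just rewrites the query).

# Illustrative flat-loop equivalent of the graph above. Every chain referenced
# here (retriever, retrieval_grader, rag_chain, hallucination_grader,
# answer_grader, question_rewriter) is built in the sections that follow.
def self_healing_answer(question: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        docs = retriever.invoke(question)
        docs = [
            d for d in docs
            if retrieval_grader.invoke(
                {"question": question, "document": d.page_content}
            ).binary_score == "yes"
        ]
        if not docs:                                   # no relevant docs: rewrite and retry
            question = question_rewriter.invoke({"question": question})
            continue
        generation = rag_chain.invoke({"context": docs, "question": question})
        grounded = hallucination_grader.invoke(
            {"documents": docs, "generation": generation}
        ).binary_score == "yes"
        useful = answer_grader.invoke(
            {"question": question, "generation": generation}
        ).binary_score == "yes"
        if grounded and useful:                        # both post-generation checks pass
            return generation
        question = question_rewriter.invoke({"question": question})   # simplified retry
    return "I don't know"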

Complete implementation

1. Install

pip install langgraph langchain langchain-community langchain-groq chromadb fastembed pydantic tiktoken beautifulsoup4

2. Setup: Groq + Chroma + embeddings

import os
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel, Field
from typing import List
from langchain_core.documents import Document
from typing_extensions import TypedDict
from langgraph.graph import END, StateGraph, START

os.environ["GROQ_API_KEY"] = "YOUR_GROQ_KEY"   # free tier available

llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")  # free, local

3. Build the vector store

urls = ["https://your-source-1", "https://your-source-2"]
docs = [WebBaseLoader(u).load() for u in urls]
docs_list = [d for sub in docs for d in sub]

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
splits = splitter.split_documents(docs_list)

vectorstore = Chroma.from_documents(
    documents=splits,
    collection_name="rag-chroma",
    embedding=embeddings,
)
retriever = vectorstore.as_retriever()
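
A quick smoke test confirms ingestion worked before any graph wiring. The query string here is only an example; what comes back depends entirely on the URLs you loaded.

# Sanity check: does retrieval return anything sensible?
retrieved = retriever.invoke("What does the source say about attention?")
print(len(retrieved))                    # number of chunks returned (typically 4 by default)
print(retrieved[0].page_content[:200])   # peek at the top hit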

4. The three graders plus rewriter (the "self-healing" brain)

All graders return a binary yes/no using structured output.

class BinaryScore(BaseModel):
    binary_score: str = Field(description="'yes' or 'no'")

def grading_prompt(system, human):
    return ChatPromptTemplate.from_messages([("system", system), ("human", human)])

# A. Retrieval grader — is this chunk actually relevant?
retrieval_grader = grading_prompt(
    "You are a grader assessing the relevance of a retrieved document to a user question. "
    "If it contains keywords or semantic meaning related to the question, grade it relevant. "
    "Give a binary 'yes' or 'no'.",
    "Retrieved document:\n\n{document}\n\nUser question: {question}"
) | llm.with_structured_output(BinaryScore)

# B. Hallucination grader — is the answer grounded in the docs?
hallucination_grader = grading_prompt(
    "You are a grader assessing whether an LLM generation is grounded in / supported by "
    "a set of retrieved facts. Give a binary 'yes' or 'no'.",
    "Set of facts:\n\n{documents}\n\nLLM generation: {generation}"
) | llm.with_structured_output(BinaryScore)

# C. Answer grader — does the answer actually resolve the question?
answer_grader = grading_prompt(
    "You are a grader assessing whether an answer addresses / resolves a question. "
    "Give a binary 'yes' or 'no'.",
    "User question:\n\n{question}\n\nLLM generation: {generation}"
) | llm.with_structured_output(BinaryScore)

# D. Question rewriter — when retrieval fails, improve the query
question_rewriter = grading_prompt(
    "You are a question re-writer that converts an input question to a better version "
    "optimized for vectorstore retrieval. Reason about the underlying semantic intent.",
    "Initial question:\n\n{question}\n\nFormulate an improved question."
) | llm | StrOutputParser()

# E. RAG chain — the actual answer generator
rag_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context.\n\nContext: {context}\n\nQuestion: {question}"
)
rag_chain = rag_prompt | llm | StrOutputParser()
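
Before wiring the graph, it helps to invoke one grader directly and confirm structured output works with your Groq model. The sample question is illustrative; the grade depends on your corpus.

# Smoke test: each grader chain returns a BinaryScore object.
sample_q = "What is self-attention?"        # illustrative question
sample_doc = splits[0].page_content         # any chunk from step 3
score = retrieval_grader.invoke({"question": sample_q, "document": sample_doc})
print(score.binary_score)                   # "yes" or "no"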

5. Graph state

class GraphState(TypedDict):
    question: str
    generation: str
    documents: List[Document]   # Document objects returned by the retriever

6. The four node functions

def retrieve(state):
    docs = retriever.invoke(state["question"])
    return {"documents": docs, "question": state["question"]}

def grade_documents(state):
    q = state["question"]
    kept = [
        d for d in state["documents"]
        if retrieval_grader.invoke({"question": q, "document": d.page_content}).binary_score == "yes"
    ]
    return {"documents": kept, "question": q}

def generate(state):
    gen = rag_chain.invoke({"context": state["documents"], "question": state["question"]})
    return {"documents": state["documents"], "question": state["question"], "generation": gen}

def transform_query(state):
    better = question_rewriter.invoke({"question": state["question"]})
    return {"documents": state["documents"], "question": better}

7. The two decision edges (the self-healing logic)

def decide_to_generate(state):
    # After grading: do we have ANY relevant docs?
    return "generate" if state["documents"] else "transform_query"

def grade_generation_v_documents_and_question(state):
    # 1. Is the generation grounded? (hallucination check)
    grounded = hallucination_grader.invoke({
        "documents": state["documents"], "generation": state["generation"]
    }).binary_score
    if grounded != "yes":
        return "not supported"      # retry generate

    # 2. Does the answer address the question?
    useful = answer_grader.invoke({
        "question": state["question"], "generation": state["generation"]
    }).binary_score
    return "useful" if useful == "yes" else "not useful"  # END vs. rewrite query

8. Wire the graph

workflow = StateGraph(GraphState)

workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("generate", generate)
workflow.add_node("transform_query", transform_query)

workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents", decide_to_generate,
    {"transform_query": "transform_query", "generate": "generate"},
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "generate", grade_generation_v_documents_and_question,
    {"not supported": "generate", "useful": END, "not useful": "transform_query"},
)

app = workflow.compile()
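
Optionally, render the compiled graph to confirm the wiring matches the diagram at the top. Recent LangGraph versions can emit a Mermaid description of the compiled graph.

# Optional: print a Mermaid diagram of the compiled graph to verify the wiring.
print(app.get_graph().draw_mermaid())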

9. Run it

result = app.invoke(
    {"question": "How do transformers handle long context?"},
    {"recursion_limit": 10}    # caps the retry loop → this is where "I don't know" falls out
)
print(result["generation"])
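To watch the self-healing loop in action, stream the run instead of invoking it: each step yields the name of the node that just finished, so transform_query → retrieve cycles are visible as they happen. A sketch using the same example question.

# Stream node-by-node updates to watch retries happen (same config as above).
for step in app.stream(
    {"question": "How do transformers handle long context?"},
    {"recursion_limit": 10},
):
    print(list(step.keys()))    # e.g. ['retrieve'], ['grade_documents'], ['generate'], ...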

The "I don't know" behavior comes from the recursion_limit. When the rewrite then retrieve then generate loop burns through its attempts without passing the graders, LangGraph raises GraphRecursionError. Catch it and return a fallback:

from langgraph.errors import GraphRecursionError

try:
    result = app.invoke({"question": q}, {"recursion_limit": 6})
    answer = result["generation"]
except GraphRecursionError:
    answer = "I don't know — I couldn't find grounded support in the documents."

Why this maps 1:1 to the Instagram slides

Slide claim                               Code location
"LLM doesn't know when it's wrong"        Problem framing, no code
"Self-grades against retrieved docs"      hallucination_grader + answer_grader
"Rewrite the question and retry"          transform_query node + edge back to retrieve
"Return honest 'I don't know'"            recursion_limit + GraphRecursionError catch
"LangGraph + Groq + ChromaDB"             StateGraph, ChatGroq, Chroma
"60 minutes, free"                        FastEmbed (local embeddings) + Groq free tier + Chroma local

Original sources this is derived from

  • LangChain official Self-RAG tutorial (the canonical reference this Instagram post repackages)
  • DataCamp Self-RAG tutorial: https://www.datacamp.com/tutorial/self-rag
  • Medium "Build a Reliable RAG Agent using LangGraph" (Plaban Nayak), adds a web-search fallback branch
  • Medium "Self-Healing RAG" (Toni Ramchandani), paywalled, same pattern

The Instagram carousel is a visual repackaging of the standard Self-RAG pattern. Nothing here is novel, but the diagram-first explanation is the value-add.