Self-Healing RAG with LangGraph: Build It in 60 Minutes

Source post: datasciencebrain Instagram, 10-slide carousel
Claim: Build in 60 minutes, free. Stack: LangGraph + Groq + ChromaDB
Underlying pattern: Self-RAG (LangChain/LangGraph reference architecture)
The core idea (what the slides are selling)
Naive RAG blindly trusts retrieved chunks and hallucinates with confidence. This system adds a feedback loop where the LLM grades its own work at three checkpoints:
- Are the retrieved docs actually relevant? (retrieval grader)
- Is the generated answer grounded in those docs? (hallucination grader)
- Does the answer actually address the question? (answer grader)
If any grader fails, rewrite the query and retry. If it still fails, return "I don't know" instead of hallucinating.
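Stripped of LangGraph, the control flow is a plain retry loop. A minimal stdlib sketch, with `retrieve`, `generate`, `graders`, and `rewrite` as caller-supplied stand-ins for the real chains:

```python
def self_healing_answer(question, retrieve, generate, graders, rewrite, max_tries=3):
    """Retry retrieve/generate until all three graders pass; else answer honestly."""
    for _ in range(max_tries):
        docs = retrieve(question)
        relevant = [d for d in docs if graders["relevant"](question, d)]
        if not relevant:                                # checkpoint 1 failed
            question = rewrite(question)                # rewrite query, retry retrieval
            continue
        answer = generate(question, relevant)
        if not graders["grounded"](relevant, answer):   # checkpoint 2 failed
            continue                                    # regenerate against the same docs
        if graders["useful"](question, answer):         # checkpoint 3 passed
            return answer
        question = rewrite(question)                    # off-topic answer: rewrite, retry
    return "I don't know"
```

With real LLM-backed graders plugged in, `max_tries` plays the role LangGraph's `recursion_limit` plays in the full implementation below.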
The graph (what the diagram on the slides shows)
START → retrieve → grade_documents → [decide_to_generate]
  ├─ relevant? → generate
  └─ none good? → transform_query → retrieve (loop)
generate → [grade_generation_v_documents_and_question]
  ├─ "not supported" → generate (retry)
  ├─ "not useful" → transform_query → retrieve (loop)
  └─ "useful" → END
Four nodes, two conditional edges. That's the whole system.
Complete implementation
1. Install
pip install langgraph langchain langchain-community langchain-groq chromadb fastembed pydantic
2. Setup: Groq + Chroma + embeddings
import os
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel, Field
from typing import List
from typing_extensions import TypedDict
from langgraph.graph import END, StateGraph, START
os.environ["GROQ_API_KEY"] = "YOUR_GROQ_KEY" # free tier available
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5") # free, local
3. Build the vector store
urls = ["https://your-source-1", "https://your-source-2"]
docs = [WebBaseLoader(u).load() for u in urls]
docs_list = [d for sub in docs for d in sub]
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
splits = splitter.split_documents(docs_list)
vectorstore = Chroma.from_documents(
    documents=splits,
    collection_name="rag-chroma",
    embedding=embeddings,
)
retriever = vectorstore.as_retriever()
4. The three graders plus rewriter (the "self-healing" brain)
All graders return a binary yes/no using structured output.
class BinaryScore(BaseModel):
    binary_score: str = Field(description="'yes' or 'no'")

def grading_prompt(system, human):
    return ChatPromptTemplate.from_messages([("system", system), ("human", human)])
# A. Retrieval grader — is this chunk actually relevant?
retrieval_grader = grading_prompt(
    "You are a grader assessing the relevance of a retrieved document to a user question. "
    "If it contains keywords or semantic meaning related to the question, grade it relevant. "
    "Give a binary 'yes' or 'no'.",
    "Retrieved document:\n\n{document}\n\nUser question: {question}"
) | llm.with_structured_output(BinaryScore)
# B. Hallucination grader — is the answer grounded in the docs?
hallucination_grader = grading_prompt(
    "You are a grader assessing whether an LLM generation is grounded in / supported by "
    "a set of retrieved facts. Give a binary 'yes' or 'no'.",
    "Set of facts:\n\n{documents}\n\nLLM generation: {generation}"
) | llm.with_structured_output(BinaryScore)
# C. Answer grader — does the answer actually resolve the question?
answer_grader = grading_prompt(
    "You are a grader assessing whether an answer addresses / resolves a question. "
    "Give a binary 'yes' or 'no'.",
    "User question:\n\n{question}\n\nLLM generation: {generation}"
) | llm.with_structured_output(BinaryScore)
# D. Question rewriter — when retrieval fails, improve the query
question_rewriter = grading_prompt(
    "You are a question re-writer that converts an input question to a better version "
    "optimized for vectorstore retrieval. Reason about the underlying semantic intent.",
    "Initial question:\n\n{question}\n\nFormulate an improved question."
) | llm | StrOutputParser()
# E. RAG chain — the actual answer generator
rag_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context.\n\nContext: {context}\n\nQuestion: {question}"
)
rag_chain = rag_prompt | llm | StrOutputParser()
5. Graph state
class GraphState(TypedDict):
    question: str
    generation: str
    documents: List[str]  # actually LangChain Document objects at runtime
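Each node returns a partial dict and LangGraph merges it into the running state. A stdlib sketch of the default update rule (an assumption worth noting: real LangGraph channels can also take reducer annotations, e.g. for appending, which this sketch ignores):

```python
from typing import List, TypedDict

class GraphState(TypedDict, total=False):
    question: str
    generation: str
    documents: List[str]

def apply_update(state: GraphState, update: dict) -> GraphState:
    # Default channel semantics: last write wins; untouched keys persist.
    return {**state, **update}

state: GraphState = {"question": "original q"}
state = apply_update(state, {"documents": ["chunk A"]})   # after retrieve
state = apply_update(state, {"generation": "draft answer"})  # after generate
# state now carries question, documents, and generation together
```

This is why the node functions below can safely return only the keys they changed.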
6. The four node functions
def retrieve(state):
    docs = retriever.invoke(state["question"])
    return {"documents": docs, "question": state["question"]}

def grade_documents(state):
    q = state["question"]
    kept = [
        d for d in state["documents"]
        if retrieval_grader.invoke({"question": q, "document": d.page_content}).binary_score == "yes"
    ]
    return {"documents": kept, "question": q}

def generate(state):
    gen = rag_chain.invoke({"context": state["documents"], "question": state["question"]})
    return {"documents": state["documents"], "question": state["question"], "generation": gen}

def transform_query(state):
    better = question_rewriter.invoke({"question": state["question"]})
    return {"documents": state["documents"], "question": better}
7. The two decision edges (the self-healing logic)
def decide_to_generate(state):
    # After grading: do we have ANY relevant docs?
    return "generate" if state["documents"] else "transform_query"

def grade_generation_v_documents_and_question(state):
    # 1. Is the generation grounded? (hallucination check)
    grounded = hallucination_grader.invoke({
        "documents": state["documents"], "generation": state["generation"]
    }).binary_score
    if grounded != "yes":
        return "not supported"  # retry generate
    # 2. Does the answer address the question?
    useful = answer_grader.invoke({
        "question": state["question"], "generation": state["generation"]
    }).binary_score
    return "useful" if useful == "yes" else "not useful"  # END vs. rewrite query
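Both edge functions return plain strings, so the routing logic can be unit-tested without a single LLM call. A sketch in which grade strings stand in for the real grader chains (`decide_to_generate` is repeated here so the snippet runs standalone):

```python
def decide_to_generate(state):
    # Same logic as the edge above: any surviving docs means we can generate.
    return "generate" if state["documents"] else "transform_query"

def grade_generation(state, hallucination_grade, answer_grade):
    # hallucination_grade / answer_grade stand in for the real grader outputs.
    if hallucination_grade != "yes":
        return "not supported"
    return "useful" if answer_grade == "yes" else "not useful"

assert decide_to_generate({"documents": []}) == "transform_query"
assert decide_to_generate({"documents": ["kept chunk"]}) == "generate"
assert grade_generation({}, "no", "yes") == "not supported"
assert grade_generation({}, "yes", "no") == "not useful"
assert grade_generation({}, "yes", "yes") == "useful"
```

The returned strings must match the keys of the mapping passed to `add_conditional_edges` exactly, or the graph will fail at runtime.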
8. Wire the graph
workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("generate", generate)
workflow.add_node("transform_query", transform_query)
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents", decide_to_generate,
    {"transform_query": "transform_query", "generate": "generate"},
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "generate", grade_generation_v_documents_and_question,
    {"not supported": "generate", "useful": END, "not useful": "transform_query"},
)
app = workflow.compile()
9. Run it
result = app.invoke(
    {"question": "How do transformers handle long context?"},
    {"recursion_limit": 10},  # caps the retry loop → this is where "I don't know" falls out
)
print(result["generation"])
The "I don't know" behavior comes from the recursion_limit. When the rewrite → retrieve → generate loop exhausts its attempts without passing the graders, LangGraph raises GraphRecursionError. Catch it and return a fallback:
from langgraph.errors import GraphRecursionError
try:
    result = app.invoke({"question": q}, {"recursion_limit": 6})
    answer = result["generation"]
except GraphRecursionError:
    answer = "I don't know — I couldn't find grounded support in the documents."
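In practice you would wrap this pattern in a small helper. A sketch, where `RecursionLimitError` and the stub apps are stand-ins for `GraphRecursionError` and the compiled graph:

```python
class RecursionLimitError(Exception):
    """Stand-in for langgraph.errors.GraphRecursionError."""

def ask(invoke, question, limit=6, fallback="I don't know."):
    # Translate a blown retry budget into an honest fallback instead of a crash.
    try:
        result = invoke({"question": question}, {"recursion_limit": limit})
        return result["generation"]
    except RecursionLimitError:
        return fallback

# Stub "apps" exercising both paths:
happy = lambda state, cfg: {"generation": "grounded answer"}
def stuck(state, cfg):
    raise RecursionLimitError

print(ask(happy, "answerable question"))    # grounded answer
print(ask(stuck, "unanswerable question"))  # I don't know.
```

With the real graph you would pass `app.invoke` as the first argument and catch the real `GraphRecursionError` instead.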
Why this maps 1:1 to the Instagram slides
| Slide claim | Code location |
|---|---|
| "LLM doesn't know when it's wrong" | Problem framing, no code |
| "Self-grades against retrieved docs" | hallucination_grader + answer_grader |
| "Rewrite the question and retry" | transform_query node + edge back to retrieve |
| "Return honest 'I don't know'" | recursion_limit + GraphRecursionError catch |
| "LangGraph + Groq + ChromaDB" | StateGraph, ChatGroq, Chroma |
| "60 minutes, free" | FastEmbed (local embeddings) + Groq free tier + Chroma local |
Original sources this is derived from
- LangChain official Self-RAG tutorial (the canonical reference this Instagram post repackages)
- DataCamp Self-RAG tutorial: https://www.datacamp.com/tutorial/self-rag
- Medium "Build a Reliable RAG Agent using LangGraph" (Plaban Nayak), adds a web-search fallback branch
- Medium "Self-Healing RAG" (Toni Ramchandani), paywalled, same pattern
The Instagram carousel is a visual repackaging of the standard Self-RAG pattern. Nothing novel, but the diagram-first explanation is the value.