GraphRAG explained: knowledge graphs and better retrieval

May 12, 2026 · 8 min read

Retrieval-augmented generation (RAG) changed how language models access external knowledge. Instead of encoding everything in weights, you retrieve relevant context at inference time and hand it to the model. It works. But it has a ceiling, and that ceiling shows up clearly when you start building serious agent systems.

GraphRAG is an approach to retrieval that uses a knowledge graph as its substrate instead of a flat vector index. The retrieval is structurally aware: it follows relationships between concepts rather than just finding the most semantically similar chunks. It tends to do better on questions that require synthesizing information across multiple sources, which is exactly the kind of question agents tend to ask.

The limits of standard RAG

Standard RAG works like this: you chunk your documents, embed each chunk as a vector, and at query time you embed the question and retrieve the top-k most similar chunks. Those chunks go into the model's context window alongside the query.
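As a minimal sketch of that loop, the snippet below uses a toy bag-of-words "embedding" in place of a real embedding model. The function names (embed, cosine, top_k) and the sample chunks are made up for illustration; a production system would swap in a learned embedding and a vector index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, already split out of some source documents.
chunks = [
    "the cache layer stores query results",
    "deploys run every friday afternoon",
    "cache invalidation happens on every write",
]
results = top_k("how does the cache work", chunks, k=2)
print(results)
```

Note that each chunk is scored on its own. Nothing in the ranking knows whether two chunks are related to each other, only how close each one is to the query.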

This is effective for local questions: "What does this document say about X?" But agents rarely ask local questions. They ask relational ones: "What decisions have produced good outcomes in situations similar to this one?" "What is the relationship between these two concepts across all the material I have processed?" "Which sources agree on this, and which contradict it?"

Flat vector retrieval cannot answer these questions well. It finds similar chunks, not connected reasoning chains. You get relevant fragments, not a coherent picture.

Vector search finds what looks like the answer. Graph traversal finds what connects to the answer.

What GraphRAG actually does

GraphRAG builds a knowledge graph from your source material during indexing, then uses that graph during retrieval. Instead of returning top-k similar chunks, it traverses the graph to find structurally relevant context: concepts connected to the query, evidence chains that support or complicate the answer, and hub nodes that bridge multiple relevant areas.

Microsoft Research published influential work on this in 2024, reporting that graph-based retrieval outperformed a standard vector RAG baseline on the comprehensiveness and diversity of answers to questions requiring global synthesis across a large document corpus. The takeaway: for complex questions that span many documents, the graph structure provides retrieval signal that pure embedding similarity cannot.

The core mechanism in that work is community detection. The graph is partitioned into clusters of tightly connected nodes, and each cluster gets a generated summary. When a question comes in, retrieval starts at the community level (coarse, high-level context) and drills into individual nodes (specific facts and relationships). This two-level retrieval gives the model both context and detail.
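Here is a rough sketch of the coarse-then-fine idea, under heavy simplifying assumptions. Connected components stand in for a real community detection algorithm such as Leiden or Louvain, concatenated node text stands in for a generated cluster summary, and word overlap stands in for embedding similarity. The graph, the facts, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph: undirected edges between concept nodes.
edges = [
    ("cache", "redis"), ("redis", "eviction"),
    ("deploy", "rollback"), ("rollback", "incident"),
]
# Hypothetical one-line description attached to each node.
facts = {
    "cache": "the cache sits in front of the database",
    "redis": "redis backs the cache",
    "eviction": "eviction uses an lru policy",
    "deploy": "deploy runs on friday",
    "rollback": "rollback reverts a failed deploy",
    "incident": "incident reports follow every rollback",
}


def communities(edges):
    """Group nodes into connected components (stand-in for community detection)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(adj[node] - seen)
        groups.append(sorted(group))
    return groups


def overlap(query, text):
    """Count distinct words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, groups, facts):
    # Coarse step: pick the community whose summary best matches the query.
    summaries = {i: " ".join(facts[n] for n in g) for i, g in enumerate(groups)}
    best = max(summaries, key=lambda i: overlap(query, summaries[i]))
    # Fine step: pick the best-matching node inside that community.
    node = max(groups[best], key=lambda n: overlap(query, facts[n]))
    return summaries[best], facts[node]


groups = communities(edges)
summary, detail = retrieve("what eviction policy is used", groups, facts)
print(detail)
```

The returned pair mirrors the two levels: the summary carries the neighborhood the answer lives in, and the detail is the specific node that matched.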

Why graphs improve retrieval quality

The improvement comes from two properties that flat indexes do not have.

First, graphs encode relationships explicitly. A vector index knows that two chunks are similar, but it does not know how they are related. A knowledge graph knows that concept A "caused" event B, which "contradicts" claim C. That relationship structure is retrieval signal you cannot get from embedding distance alone.
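To make the difference concrete, here is a small sketch of typed edges stored as (source, relation, target) triples. The Edge type and the related helper are illustrative names, and the third edge ("supports", "claim D") is invented so that filtering by relation type is visible.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


edges = [
    Edge("concept A", "caused", "event B"),
    Edge("event B", "contradicts", "claim C"),
    Edge("concept A", "supports", "claim D"),  # hypothetical extra edge
]


def related(node: str, relation: str, edges: list[Edge]) -> list[str]:
    """Return targets reached from `node` via edges of the given relation type."""
    return [e.target for e in edges if e.source == node and e.relation == relation]


caused_by_a = related("concept A", "caused", edges)
contradicted_by_b = related("event B", "contradicts", edges)
print(caused_by_a, contradicted_by_b)
```

A vector index has no equivalent of the relation argument. It can tell you that "concept A" and "event B" sit near each other, but not which one caused the other.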

Second, graphs support multi-hop reasoning. An agent can follow a chain: decision D was triggered by observation O, which references tool T, which has a known failure mode F. Answering "what could cause D to fail?" requires traversing that chain. A flat index can retrieve each of those nodes independently, but it cannot surface the chain as a unit.
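The chain from that example can be sketched as a breadth-first search over typed edges. The relation labels (triggered_by, references, has_failure_mode) and the find_chain helper are assumptions made for illustration, not a prescribed schema.

```python
from collections import deque

# Directed, typed edges for the example chain: D -> O -> T -> F.
graph = {
    "decision D": [("triggered_by", "observation O")],
    "observation O": [("references", "tool T")],
    "tool T": [("has_failure_mode", "failure mode F")],
    "failure mode F": [],
}


def find_chain(graph, start, goal):
    """Breadth-first search returning the shortest list of (relation, node) hops."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(relation, neighbor)]))
    return None  # no chain connects start to goal


chain = find_chain(graph, "decision D", "failure mode F")
for relation, node in chain:
    print(f"--{relation}--> {node}")
```

The result is the whole path, with the relation on every hop, which is the unit a flat index has no way to return.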

What this means for agent memory

The implications for agent memory systems are direct. An agent that stores its operational history as a knowledge graph can retrieve not just "what happened in similar situations" but "what chain of events led to outcomes like the one I am trying to produce or avoid."

This is qualitatively different from embedding-based memory retrieval. It gives agents access to their own reasoning chains, not just their past outputs. The memory is navigable, not just searchable.

It also changes what "relevance" means during retrieval. In a flat index, relevance is embedding similarity to the current query. In a graph, relevance includes structural proximity: a node that is three hops from the query concept but sits on the critical path of the reasoning chain can be more useful than a node with high embedding similarity but no structural connection.
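One way to express this is a blended score, sketched below. The formula, the 1 / (1 + hops) proximity term, and the alpha weight of 0.2 are all assumptions chosen for illustration. The sketch only measures hop distance; it does not model whether a node sits on a critical path, which would need knowledge of the reasoning chain itself.

```python
from collections import deque


def hops(adj, start, target):
    """Shortest hop count from start to target, or None if unreachable."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relevance(similarity, hop_count, alpha=0.2):
    """Blend embedding similarity with structural proximity, 1 / (1 + hops)."""
    proximity = 0.0 if hop_count is None else 1.0 / (1.0 + hop_count)
    return alpha * similarity + (1.0 - alpha) * proximity


# Hypothetical graph: "query" reaches "on_path" in three hops; "lookalike" is isolated.
adj = {"query": ["a"], "a": ["b"], "b": ["on_path"], "lookalike": []}
# Hypothetical embedding similarities to the query.
similarity = {"on_path": 0.3, "lookalike": 0.9}

scores = {
    node: relevance(sim, hops(adj, "query", node))
    for node, sim in similarity.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these weights the connected node outranks the lookalike despite its lower similarity. With alpha closer to 1 the order flips, so the weight is something to tune against your own queries rather than a constant to copy.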

How VanillaGraph relates to GraphRAG

VanillaGraph is not a full GraphRAG implementation, but it builds the foundation that GraphRAG requires: a structured, typed knowledge graph generated from your source material using wikilink-based relationships.

The pipeline takes documents and produces a graph of typed nodes connected by wikilinks. That graph is the substrate. What you do with it during retrieval is up to your agent architecture. You can run embedding search over the nodes, traverse the graph using the wikilink structure, or combine both approaches.
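To illustrate the general shape of a wikilink-derived graph, here is a small sketch. It is not VanillaGraph's API or output format: the docs structure, the node types, the build_graph function, and the [[Target|alias]] pipe syntax are assumptions borrowed from common wiki conventions.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|alias]], capturing only the target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

# Hypothetical documents: title -> (node type, body text).
docs = {
    "Cache": ("concept", "Backed by [[Redis]]. See [[Eviction policy|eviction]]."),
    "Redis": ("tool", "Used by the [[Cache]]."),
    "Eviction policy": ("concept", "No outgoing links."),
}


def build_graph(docs):
    """Return (node types, adjacency) built from the wikilinks in each body."""
    types = {title: node_type for title, (node_type, _) in docs.items()}
    links = defaultdict(list)
    for title, (_, body) in docs.items():
        for match in WIKILINK.finditer(body):
            target = match.group(1).strip()
            if target not in links[title]:  # skip repeated links to one target
                links[title].append(target)
    return types, dict(links)


types, links = build_graph(docs)
print(links)
```

The adjacency map is what a traversal step would walk, and the node titles and bodies are what an embedding search would index, so both retrieval styles can run over the same structure.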

We ship it this way intentionally. GraphRAG retrieval strategies are still evolving. The community detection approach from Microsoft Research is one method. Traversal-first approaches are another. The right strategy depends on your data structure and query patterns. Rather than baking in a retrieval strategy, we give you the graph and clear interfaces, so you can implement the retrieval logic that fits your use case.

Tradeoffs worth knowing

GraphRAG is not better than standard RAG in every situation. A few honest tradeoffs follow.

Indexing cost is higher. Building a knowledge graph requires more work than chunking and embedding. The pipeline has to extract entities, detect relationships, and maintain the graph structure as new material is added. For small, static document sets, standard RAG may be simpler and fast enough.

Graph quality matters a lot. A poorly constructed graph with missing or incorrect relationships will produce worse retrieval than a well-tuned vector index. The quality of your retrieval is bounded by the quality of your graph construction.

Not all questions are relational. For simple factual lookups, standard RAG is fine. GraphRAG's advantage is on complex, multi-source, relationship-dependent questions. If your agent mostly asks simple questions, the additional complexity may not be worth it.

The cases where GraphRAG clearly wins are agents with long operational histories, multi-agent systems where context spans multiple sources, and any situation where reasoning chains matter more than individual facts.