Build an Ontology for LLM Grounding

Matthew Petrillo

A Practical Guide for Enterprises Evaluating GraphRAG

April 2026

This paper is about what it actually takes to give an LLM reliable access to enterprise data — and why building a domain ontology, properly, is the foundational investment that makes everything else work.

Business executives are being told, with great enthusiasm, that they can point a large language model at their data and start asking questions in plain English. The demo usually looks impressive, but things get complicated a few months into the project. The pitch is not wrong. LLMs really can provide natural language access to structured and unstructured data. They also can assist with data analysis. The problem is not the outcome — it’s the gap between the demo and making it work reliably, at scale, on actual enterprise data.

That gap has a name: data quality. Most enterprises have a data quality problem that they have been deferring for years. Like a toddler fed on junk food, an LLM fed on junk data will act up. It will confidently answer questions based on whatever data it is given. If that data is incomplete, inconsistent, ambiguous, or poorly structured, the LLM’s answers will be confidently wrong. The model is not broken. The data is broken. The LLM just made it visible. That new visibility is frustrating executives and making life uncomfortable for the IT department.


A note on scope: This document addresses GraphRAG — the pattern of combining a knowledge graph with an LLM so that answers are grounded in your own data. It does not address general LLM fine-tuning or embedding-only RAG. Both of those approaches have their place, but neither solves the structural data problems this document discusses.

 

The Junkyard Problem

Large enterprises have been moving data around for years: Shared directories. Data warehouses. Data lakes. Data meshes. Each new architectural pattern promised to make data accessible and useful. Many of them delivered on that promise for specific use cases — reporting, analytics, dashboards — while quietly inheriting the quality problems of the systems that fed them.

Here is a description I have used with clients that tends to land: imagine you have six junkyards. Each one has been accumulating parts for decades. Some of those parts are useful. Many are not. The labeling is inconsistent. Duplicates abound. Nobody knows exactly what is in any of them.

Now imagine consolidating all six into one bigger junkyard. You have not solved anything. You have just built a bigger pile of the same junk, and now it takes longer to search through it. This is, unfortunately, an accurate description of a significant percentage of enterprise data consolidation efforts. The problems just move with the data.

What GraphRAG does — and this is important to understand — is not solve the junkyard problem. It does not fix data quality. It does not eliminate inconsistency. What it does is make the junkyard problem impossible to ignore. When a non-technical user asks a question in plain English and the answer is wrong or incomplete, the failure is visible in a way it never was when the same problem was buried inside a SQL query that only a data analyst ever looked at.

This is actually a good thing. Organizations that implement GraphRAG tend to end up with better data than they started with, not because the technology fixed anything, but because the visibility of the problem finally created the organizational will to address it.

Importantly, data pipelines now need to be built for data to be consumed by computers, not by humans. Humans are pretty forgiving consumers—just look at the behavior of the users of web search tools. You don’t have to supply those users with “correct data,” you just have to rank results. But LLMs aren’t so forgiving, especially when asked to summarize or count the totality of results. They don’t know how to differentiate good results from bad. LLMs also don’t know how much of your corporate wiki is outdated garbage (hint: it’s probably a lot).

An LLM is Not a Data Quality Tool

An LLM works with the data it is given; it cannot repair that data. Specifically, it cannot:
  • Determine that two records with different identifiers refer to the same real-world entity
  • Resolve conflicting values for the same attribute across source systems
  • Know that a field labeled “status” in one system means something different from “status” in another
  • Fill gaps in data that was never captured in the first place
You Are Building a Data Pipeline

    One of the most common misconceptions I encounter is that GraphRAG is an alternative to building a data pipeline. It is not. It is a data pipeline with a knowledge graph and an LLM at the end of it.

    If you want an LLM to answer questions reliably about your business data, three things must be true: the data must be current, it must be clean, and it must be structured in a way the system can reason about. Ensuring all three is the job of data engineers, knowledge engineers, and data stewards. It is not a job the LLM can do for itself.

    This is not a reason not to build a GraphRAG system. It is a reason to be clear-eyed about what you are committing to. Organizations that succeed at this treat it as a data management discipline first and an AI project second. Organizations that approach it the other way — “let’s get the AI working and figure out the data later” — tend to produce systems that look good in demos and fail in production.

    The Three Roles That Matter

    The three roles are the ones already named: data engineers, who keep the pipeline current; knowledge engineers, who build and maintain the ontology; and data stewards, who decide which sources are authoritative and how conflicts get resolved. None of these roles is optional. A well-built ontology sitting on top of poor-quality data produces confidently wrong answers. Good-quality data without a well-built ontology produces a system that can answer only the questions the engineers happened to anticipate. Both problems are common. The solution to both is the same: invest in the ontology and in the data quality work together, not separately.

    Ontology Is the Answer

    An ontology is not a schema. This distinction matters.

    A schema tells a database what tables and columns exist. An ontology tells a reasoning system what things mean — what they are, how they relate to each other, what can be inferred from those relationships, and what would be a logical contradiction. A schema is a description of storage. An ontology is a description of knowledge.

    When you connect an LLM to a knowledge graph built on a well-constructed ontology, you give the model something it cannot get from a vector database or a property graph: a formal, machine-readable account of what the data means. The LLM does not have to guess that “customer” and “client” refer to the same concept. It does not have to infer that a contract has a party and a term and an obligation. Those relationships are declared, explicitly, in the ontology. The model works from those declarations rather than from statistical inference over text.
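To make that concrete, here is a minimal, illustrative sketch of such declarations in OWL/Turtle. Every name under the ex: namespace is invented for the example, not taken from any real schema:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/ontology#> .

# "Client" and "Customer" are declared to be the same concept,
# so either term resolves to the same class.
ex:Customer a owl:Class .
ex:Client   a owl:Class ;
            owl:equivalentClass ex:Customer .

# A contract is declared to have a party, a term, and an obligation.
ex:Contract a owl:Class .
ex:Party    a owl:Class .

ex:hasParty a owl:ObjectProperty ;
            rdfs:domain ex:Contract ;
            rdfs:range  ex:Party .
ex:hasObligation a owl:ObjectProperty ;
            rdfs:domain ex:Contract .
ex:hasTerm  a owl:DatatypeProperty ;
            rdfs:domain ex:Contract .
```

A dozen lines like these are what the LLM works from instead of statistical guesswork: the equivalence and the domain/range constraints are facts it can rely on.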

    This is the difference between grounding and guessing. Both approaches can produce fluent, confident answers. Only one of them is reliably correct.

    What the Ontology Gives the LLM

    Without ontology: “We asked the same question three different ways and got three different answers.”
    With ontology: Meaning is formally defined once. Synonyms, aliases, and variant terms are declared. The same concept always resolves to the same answer.

    Without ontology: “It found the document but missed the five related ones.”
    With ontology: Typed relationships in the graph connect related entities. A query for a contract retrieves its parties, its obligations, its amendments — because those connections are declared, not inferred.

    Without ontology: “The LLM keeps making up fields that don’t exist.”
    With ontology: The ontology constrains what properties can exist on what entities. Hallucinated fields fail validation before they reach the user.

    Without ontology: “It doesn’t understand that our ‘active’ means something specific.”
    With ontology: Domain vocabulary is defined in the ontology with explicit scope and meaning. “Active” in your context means exactly what you declare it to mean.

    Without ontology: “We added a new product line and had to update everything.”
    With ontology: A new subclass is automatically included in all existing queries through the class hierarchy. The ontology absorbs the change; the queries do not need to.

    Without ontology: “I didn’t know we had a PDF-to-partner deadline.”
    With ontology: Predicate families group related properties. A query for “all deadlines” returns every deadline type in the ontology — including specialist ones users may not know by name.

    The Serialization Advantage

    There is a practical benefit to RDF-based ontologies that often goes unmentioned in evaluations: they are documents. A well-built RDFS/OWL ontology is a file you can version-control, publish, share with partners, align with industry standards, and load into a different system when your current one is no longer adequate.

    This matters more than it sounds. Enterprises build systems that last for years. The vendor landscape shifts. Architectural assumptions change. An ontology expressed in RDF/OWL is portable in a way that a schema embedded in a specific database product is not. If you ever need to migrate, extend, federate, or audit your knowledge graph, having the ontology as a standalone, standards-based artifact is a significant advantage. It is also something you can show to a regulator, an auditor, or a business partner as a formal statement of how your data is structured and what it means.

    “Can We Just Use Neo4j?”

    This is, by some margin, the most common technology question I get asked in this space. Labeled property graphs (LPGs) — Neo4j being the most prominent — are fast to set up, have good developer tooling, and have a large community. The question is a reasonable one.

    The benefits of LPGs are real but, in my experience, overstated for enterprise LLM grounding specifically. Let me explain why, and where the tradeoffs land.

    What LPGs Do Well

    LPGs are genuinely good at certain things: rapid prototyping, graph traversal on relatively clean, well-understood data, and developer onboarding. If you need something working quickly with a small, controlled dataset and a team of engineers who know the domain well, an LPG can get you there faster than an RDF-based knowledge graph.

    That speed is the main thing LPGs have going for them in corporate settings. And it is worth acknowledging: for some projects, speed to a working prototype genuinely matters. Some vendors, including ArangoDB, have made meaningful progress on the scaling issues that historically limited LPGs. The technology is not standing still.

    Where the Benefits End

    The problem is not whether LPGs scale. The problem is what they fundamentally are, and what that means for LLM grounding.

    1. No native ontology. An LPG has a schema, not an ontology. Node labels, relationship types, and property keys live inside the database, but that schema cannot express class hierarchies, predicate families, domain and range constraints, or formal inference rules in any standard way. When you want the LLM to reason about what your data means, an LPG schema gives you much less to work with than an OWL ontology. Some LPG implementations attempt to layer ontological structure on top, but this adds complexity without the standards-based portability of native RDF.
    2. The data still has to get there. LPGs require their own storage layer. In most enterprise environments, data lives in relational databases, data warehouses, document stores, APIs, and flat files. Getting that data into an LPG requires the same ETL work as any other consolidation effort — the same pipelines, the same data quality investment, the same maintenance burden. The speed-to-setup advantage erodes quickly once you are dealing with real enterprise data at real enterprise scale.
    3. Hybrid implementations are harder than they look. In practice, most enterprise graph implementations do not materialize all data in the graph. Some data is virtualized — meaning queries are rewritten on the fly against underlying relational databases. In an RDF-based knowledge graph, virtualization is well-supported through standards-based SPARQL-to-SQL rewriting. In an LPG environment, this kind of hybrid architecture is more complex and less standardized. The result is a system where some data goes through the graph and some does not, and the LLM receives inconsistent context depending on which path the query happened to take. This is a subtle but serious problem for reliability.
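As an illustration of that standards-based rewriting, here is a minimal R2RML mapping sketch. The table and column names are invented; the point is the shape: rows of a relational CUSTOMER table are exposed as ex:Customer instances, so SPARQL queries against the graph can be rewritten into SQL on the fly without copying the data.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ontology#> .

# Map each row of the relational CUSTOMER table to an
# ex:Customer resource, without materializing it in the graph.
<#CustomerMap> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "CUSTOMER" ] ;
    rr:subjectMap [
        rr:template "http://example.org/customer/{ID}" ;
        rr:class ex:Customer
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
```

Because R2RML is a W3C standard, the same mapping file works across conforming virtualization engines; there is no equivalent standard on the LPG side.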

    You will likely start over. Organizations that choose an LPG for speed and simplicity frequently discover, one to two years into the project, that the architecture does not support what the business has come to want from it. The ontological structure is not there to support complex semantic queries. The virtualization story does not work the way the vendor described. The schema has drifted as requirements changed and nobody maintained it. At that point, the options are to live with a system that does less than it should, or to start over with a proper RDF-based knowledge graph — and to rebuild the data pipelines, the data quality work, and the user trust that the first system failed to earn.

    I have seen this play out enough times that I no longer consider it an edge case. The time saved at the start is real. The cost paid at the restart is larger.

    On ArangoDB and multi-model databases

    Vendors like ArangoDB offer multi-model databases that combine graph, document, and key-value storage with improved scaling characteristics. These are genuine engineering advances. But the scaling argument misses the point. The issue with LPGs for LLM grounding is not primarily whether they can handle large graphs at speed. It is that they lack the ontological grounding — the formal semantics — that makes LLM answers reliable. A faster pile of unlabeled parts is still a pile of unlabeled parts.

    What GraphRAG with an Ontology Actually Buys You

    Let me describe concretely what a well-implemented GraphRAG system backed by a domain ontology does differently from keyword search, embedding-only RAG, or an LPG-based graph.

    Queries That Know What They Are Looking For

    A standard RAG system embeds your query and finds text that is semantically similar to it. This works reasonably well when the answer is in the document and the question is phrased similarly to the document’s language. It works poorly when the question uses different terminology from the document, when the answer requires combining information from multiple places, or when the question implies a relationship that is not stated explicitly in any single document.

    An ontology-backed graph handles all three of those cases. Terminology is resolved through formal equivalence declarations — the ontology says that “client” and “customer” refer to the same class, so both terms retrieve the same results. Multi-source combination happens through graph traversal — the ontology defines what connects to what, so the system can follow those connections reliably. Implied relationships are made explicit as typed predicates — there is no need to infer that a contract has parties when the ontology declares it.
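A sketch of what such a grounded query looks like in SPARQL, using the same illustrative ex: names an ontology might declare:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ontology#>

# Find every contract with its parties and (optionally) obligations.
# The path a/rdfs:subClassOf* also matches instances typed as any
# subclass of ex:Contract, so new contract types are included
# automatically, and ex:hasParty is a declared, typed predicate
# rather than an inferred relationship.
SELECT ?contract ?party ?obligation WHERE {
  ?contract a/rdfs:subClassOf* ex:Contract ;
            ex:hasParty ?party .
  OPTIONAL { ?contract ex:hasObligation ?obligation . }
}
```

Nothing in this query depends on how any source document phrased the relationship; it depends only on what the ontology declares.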

    Property Families: The Unsung Hero

    Here is a capability that rarely appears in vendor demos but delivers immediate operational value: RDF ontologies support property families through rdfs:subPropertyOf. This means you can group related properties under a common parent — all identifiers, all deadlines, all financial metrics — and query the entire family with a single statement. The query returns everything in the family, including properties added after the query was written.

    The business implication is concrete. A user who asks “what are all the deadlines for this item?” gets a complete answer — including specialist deadlines like partner delivery cutoffs that sit outside the standard production schedule and that the user did not know to ask about. This is not something a keyword search, an embedding search, or an LPG schema can reliably deliver. It is a direct consequence of the formal ontological structure.

    This is also where the LLM performance argument closes. Without property families, an LLM generating a query has to guess or enumerate specific property names. It misses the ones it does not know about. It sometimes invents ones that do not exist. With property families declared in the ontology, the LLM only needs to know the family name. Completeness is guaranteed by the ontology, not by the LLM’s training data.
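A minimal sketch of a property-family query, assuming an illustrative ex:deadline parent property and an item identifier invented for the example:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ontology#>

# "What are all the deadlines for this item?"
# rdfs:subPropertyOf* walks the whole property family, so
# specialist deadline types declared later are returned
# without changing the query.
SELECT ?deadlineType ?date WHERE {
  ex:item-123 ?deadlineType ?date .
  ?deadlineType rdfs:subPropertyOf* ex:deadline .
}
```

The LLM generating this query only needs to know the family name ex:deadline; the ontology, not the model, guarantees completeness.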

    The Comparison in Practice

     

    Terminology
      Embedding RAG: similarity-dependent; different terms may not match.
      LPG + LLM: schema-dependent; only what the developer anticipated.
      Ontology-backed GraphRAG: formally declared; synonyms, aliases, and equivalences are explicit.

    Relationships
      Embedding RAG: implicit, inferred from text proximity.
      LPG + LLM: explicit but schema-only, with no formal semantics.
      Ontology-backed GraphRAG: explicit and typed, with domain/range constraints and inference.

    Completeness
      Embedding RAG: top-k; whatever scored highest.
      LPG + LLM: whatever the query explicitly asked for.
      Ontology-backed GraphRAG: guaranteed by subClassOf* and subPropertyOf*; all members of a class or property family.

    New data types
      Embedding RAG: re-embed; existing queries unchanged but may miss new content.
      LPG + LLM: schema update required; queries may need revision.
      Ontology-backed GraphRAG: declare as subclass or subproperty; automatically included in all existing queries.

    Portability
      Embedding RAG: embeddings are model-specific.
      LPG + LLM: schema is database-specific.
      Ontology-backed GraphRAG: the OWL ontology is a standards-based document; version-controlled, shareable, portable.

    LLM failure mode
      Embedding RAG: hallucination from vague retrieval.
      LPG + LLM: a missing schema entry means a missing answer; hallucinated property names return no results.
      Ontology-backed GraphRAG: a missing ontology declaration means a degraded but predictable fallback.

    What Good Looks Like

    A well-implemented GraphRAG system with a domain ontology is not a moonshot. Enterprises that do this successfully follow a few simple rules:

    1. Start with the Ontology, Not the Technology

    Before choosing a graph database, before writing a line of pipeline code, before the first LLM API call: build the ontology. Map your domain — what are the things in your world, what are the relationships between them, what do you call them, what do they mean? This is knowledge engineering work. It requires domain experts, not just data engineers. It requires time. The organizations that try to shortcut this step — to derive the ontology from existing data, or to have an LLM generate it, or to adopt someone else’s generic ontology without adaptation — consistently produce systems that answer only a narrow range of questions well.

    To be clear: existing ontologies and LLMs can both be useful starting points. The problem is treating them as finished products. An ontology derived from your data describes your data as it is. An ontology built for your domain describes your data as it should be. Those are very different things, and only the second one supports reliable LLM grounding.

    2. Treat Data Quality as a Prerequisite

    The pipeline question comes before the LLM question. Organizations that succeed ask: is our data current? Is it consistent? Do we have a canonical source for each data type, or are there competing definitions in different systems? These questions need answers before the LLM has useful work to do.

    This is where data stewardship becomes critical. The ontology provides the framework — the formal definitions, the controlled vocabularies, the validation constraints. The data stewards provide the judgment — which source is authoritative, how conflicts get resolved, what counts as a data quality violation. Both are necessary. Neither can substitute for the other.

    3. Define the Business Case for the LLM

    In my experience, the single most common reason GraphRAG projects underperform is that the organization never clearly answered this question: what do we want the LLM to do? There are three defensible answers, and they require somewhat different implementations:

    1. Non-technical access to structured data. Business users should be able to ask questions about operational data in plain language and get reliable answers. This requires a well-built ontology, clean data, and a query generation layer that uses ontological structure rather than guessing at property names.
    2. Extraction from unstructured sources. The LLM processes documents, emails, reports, and other unstructured content and populates the knowledge graph with extracted entities and relationships. This requires clear ontological definitions of what to extract and a validation layer to check extractions against ontology constraints before they enter the graph.
    3. Analysis assistance. The LLM performs common analytical tasks — summarization, comparison, trend identification — using graph-retrieved context. This requires the retrieval system to assemble relevant, complete context, which depends on the typed predicate chains that the ontology provides.

    Most organizations want all three, eventually. That is fine. The discipline is to be explicit about which one you are building first, what success looks like for that use case specifically, and how the ontology design supports it.
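For the extraction use case in particular, the validation layer can be sketched with SHACL shapes derived from the ontology. The shape below is illustrative (same invented ex: names as before): it rejects extracted contracts that lack a party or that carry properties the ontology does not allow.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/ontology#> .

# Extracted contracts must name at least one party, and may carry
# only the listed properties; anything else, such as a hallucinated
# field, fails validation before it enters the graph.
ex:ContractShape a sh:NodeShape ;
    sh:targetClass ex:Contract ;
    sh:property [
        sh:path ex:hasParty ;
        sh:minCount 1
    ] ;
    sh:property [ sh:path ex:hasObligation ] ;
    sh:property [ sh:path ex:hasTerm ] ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) .
```

Running extractions through a SHACL validator before load is what turns “the LLM keeps making up fields” into a caught error rather than a wrong answer.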

    4. Take the Maintenance Contract Seriously

    An ontology is not a one-time deliverable. It is a living artifact that must be maintained as the business changes. When a new product type is introduced, it needs to be added as a subclass. When a new partner system introduces a new identifier, that identifier needs to be declared as a subproperty of the identifier family. When the business decides that a term means something different than it used to, the ontology needs to reflect that change.

    This is not a burden unique to ontologies. As discussed above, businesses are already doing this work through layers of organizational structure and management; it just looks like the everyday function of the business rather than a line in the IT budget. Every data system requires maintenance, and that maintenance is an accountable cost. The difference is that in a well-built ontology-backed GraphRAG system, the maintenance protocol is explicit and testable, and the results are auditable. You can write a test that verifies every new property is declared in the right predicate family. You can run a validation suite that catches data quality violations before they reach the LLM. You can version-control the ontology and review changes before they go to production. This is what it means to have a data management discipline rather than data management by fiat.
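One way to make that predicate-family test concrete is a SPARQL check run in CI against the ontology file. The root family name ex:property is an assumption for the example; substitute whatever roots your ontology declares.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ontology#>

# Maintenance check: list properties that are not declared under
# any predicate family. A non-empty result should fail the build.
SELECT ?orphan WHERE {
  ?orphan a rdf:Property .
  FILTER NOT EXISTS {
    ?orphan rdfs:subPropertyOf+ ex:property .
  }
}
```

Because the ontology is a version-controlled document, this check runs on every proposed change, the same way unit tests run on code.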

    The Bottom Line

    Executives usually ask: “Is this worth the effort?” After twenty years of working with enterprises on knowledge graphs and data infrastructure, my answer is: absolutely.

    The ontology is worth the effort because broader access to clear, accurate data is always worth more than the cost of maintaining it. Let’s put it this way: every company, at some level, is in the business of providing clear, accurate data—to employees, to managers, to shareholders, regulators, and customers. That is not optional; failure has market and legal ramifications. The cost of meeting that need, however, has always involved layers of interpretation that look invisible because the interpretation work is embedded in the organization itself. The company structure follows the function.

    The organizations that try to skip the pipeline — to use a property graph with no formal semantics, or to wire an LLM directly to a data warehouse with no graph at all, or to have the LLM “figure out” the structure from the data — are not saving time. They are adding a faulty interpretation layer, with predictable results.

    The marketplace is full of products that promise to make this easy. Some of them are genuinely useful. None of them eliminates the need for clean data, a well-built ontology, and people who are accountable for both. The sooner an organization accepts that, the sooner it can stop evaluating tools and start building something that works.

    A summary for decision-makers

  • LLMs cannot produce reliable answers from unreliable data. Implementing GraphRAG does not fix your data pipeline — it exposes what is wrong with it. That exposure is valuable, but you need to be prepared to act on it.

  • A domain ontology is not optional overhead. It is the formal structure that lets the LLM know what your data means. Without it, you have a system that makes educated guesses about your business. With it, you have a system that works from declarations.

  • Labeled property graphs are faster to set up but structurally limited for LLM grounding. The time saved at the start is real. The likelihood of having to start over is also real.

  • You are building a data pipeline. The data engineers, knowledge engineers, and data stewards who maintain it are not supporting the AI project — they are the AI project.

  • Build the ontology first. Design it for your domain, not derived from your data as-is. Treat it as a living document, not a one-time deliverable. Everything downstream gets easier when the ontology is right.