Knowledge Representation and Graphs
Knowledge representation is a key foundation for AI, intimately intertwined with data structures. Retrieval-Augmented Generation (RAG) provides one window into this. RAG augments a Large Language Model (LLM) by giving it a searchable database that provides additional context beyond what the LLM was trained on.
Consider an LLM trained at a particular time, for example 3 months ago. This LLM will have no knowledge of events after its training date and will do poorly on questions about current events. Providing a RAG database of news articles from the last 3 months lets the LLM retrieve context for questions about recent events. A common corporate use is to augment an LLM with non-public company data the LLM has not seen: proprietary code, SOPs, legal documents, etc.
The data in these examples is mostly text, though images and video are often relevant RAG data. Traditional relational databases are not well suited to these types of "unstructured" data. Most RAG systems instead use a vector database, which represents text, images, and video as vectors: lists of numbers.
Dense and Sparse Vectors
Vector databases employ two complementary approaches to represent and search content.
Dense vectors (also called semantic or embedding vectors) capture meaning. A neural network encoder transforms text into a high-dimensional vector (typically 384–1536 dimensions) where semantically similar content clusters together in vector space. Searching with dense vectors finds conceptually related content even when exact words differ—a query about "heart attack" retrieves documents discussing "myocardial infarction" or "cardiac arrest."
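As a rough illustration, here is a minimal sketch of dense-vector similarity using numpy. The tiny 4-dimensional vectors and their values are made-up stand-ins for the output of a real embedding model, which would produce hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 means same direction, near 0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; a real encoder would produce 384-1536 dimensions.
embeddings = {
    "heart attack":          np.array([0.9, 0.1, 0.0, 0.2]),
    "myocardial infarction": np.array([0.8, 0.2, 0.1, 0.3]),
    "stock market rally":    np.array([0.0, 0.9, 0.8, 0.1]),
}

query = embeddings["heart attack"]
for text, vector in embeddings.items():
    print(f"{text:25s} {cosine_similarity(query, vector):.2f}")
# "myocardial infarction" scores close to 1.0 despite sharing no words with
# the query; "stock market rally" scores near 0.
```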
Sparse vectors capture keywords. These are high-dimensional but mostly zeros, with non-zero values only for terms present in the text. Traditional methods like TF-IDF and BM25 weight terms by frequency and distinctiveness. Sparse search excels at precise term matching—finding exact product names, codes, or technical terminology that dense vectors might blur together.
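Here is a from-scratch sketch of sparse, TF-IDF-style vectors; the documents and the weighting formula are toy examples, and production systems typically rely on BM25 as implemented by a search engine rather than this simplified version.

```python
import math
from collections import Counter

docs = {
    "d1": "aspirin treats headache and mild pain",
    "d2": "ibuprofen treats headache and inflammation",
    "d3": "aspirin dosage for cardiac patients",
}

def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Sparse vectors: each document maps only the terms it contains to a weight."""
    n = len(docs)
    tokenized = {d: text.split() for d, text in docs.items()}
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for d, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[d] = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

vectors = tf_idf(docs)
# Keyword query: score is the sum of weights for query terms present in the doc.
query_terms = {"aspirin", "headache"}
for d, vec in vectors.items():
    print(d, round(sum(vec.get(t, 0.0) for t in query_terms), 3))
# d1 scores highest because it contains both exact terms.
```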
Hybrid search combines both: dense vectors catch semantic relationships while sparse vectors ensure keyword precision. Modern RAG systems typically score and merge results from both approaches, leveraging their complementary strengths.
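One common way to merge the two result lists is reciprocal rank fusion; the sketch below uses made-up ranked lists and the conventional constant k = 60, and is not tied to any particular vector database.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Merge ranked result lists: each doc scores 1/(k + rank), summed across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

dense_results  = ["d2", "d1", "d4"]   # semantic nearest neighbors
sparse_results = ["d1", "d3", "d2"]   # best keyword matches
print(reciprocal_rank_fusion([dense_results, sparse_results]))
# d1 and d2 rise to the top because both searches rank them highly.
```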
From Vectors to Graphs: RDF and Knowledge Representation
RAG using vector databases works well for representing text as data. However, language and the "knowledge" represented by that language have structure beyond numerical vector representations. GraphRAG uses a knowledge graph data structure as a RAG source.
Resource Description Framework (RDF), adopted as a W3C recommendation in 1999, and related standards like RDFS and OWL were developed well before modern LLMs, yet they provide powerful knowledge representations that extend the capabilities of LLMs through graph databases. RDF and related standards attempt to standardize the ubiquitous, simple, but powerful subject-predicate-object structure of human thought and language.
RDF, RDFS, and OWL: Building Knowledge Graphs
RDF (Resource Description Framework) represents knowledge as triples: subject-predicate-object statements. Each triple is an atomic fact: "Aspirin treats Headache," "Aspirin is_a Drug," "Headache affects Head." Subjects and objects are nodes; predicates are edges. URIs uniquely identify resources, enabling graphs to link across datasets—the foundation of the semantic web.
RDFS (RDF Schema) adds vocabulary for defining classes and hierarchies. You can declare that "Drug" is a class, "Aspirin" is an instance of Drug, and "Analgesic" is a subclass of Drug. RDFS enables inference: if Aspirin is an Analgesic and Analgesics are Drugs, reasoners conclude Aspirin is a Drug.
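As a concrete sketch, the triples and subclass hierarchy above can be expressed with the rdflib Python library. The http://example.org/ namespace and its terms are placeholders, and the SPARQL property path stands in for a full RDFS reasoner.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

# RDFS schema: classes and a subclass hierarchy.
g.add((EX.Drug, RDF.type, RDFS.Class))
g.add((EX.Analgesic, RDF.type, RDFS.Class))
g.add((EX.Analgesic, RDFS.subClassOf, EX.Drug))

# RDF instance data: subject-predicate-object triples.
g.add((EX.Aspirin, RDF.type, EX.Analgesic))
g.add((EX.Aspirin, EX.treats, EX.Headache))
g.add((EX.Headache, EX.affects, EX.Head))

# A SPARQL property path follows subClassOf chains, so Aspirin is found
# as a Drug even though that triple was never stated explicitly.
results = g.query("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?thing WHERE {
        ?thing a/rdfs:subClassOf* <http://example.org/Drug> .
    }
""")
for row in results:
    print(row.thing)   # -> http://example.org/Aspirin
```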
OWL (Web Ontology Language) provides richer expressiveness for complex domains. OWL can specify cardinality constraints (a person has exactly one biological mother), property characteristics (if A is married to B, then B is married to A), and complex class definitions (a "Parent" is a Person who has at least one child). OWL ontologies enable sophisticated automated reasoning.
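For illustration, here is roughly what such OWL axioms look like in Turtle, loaded with rdflib. The example.org terms are hypothetical, and actually drawing the inferences would require an OWL reasoner, which is not shown here.

```python
from rdflib import Graph

owl_ttl = """
@prefix ex:   <http://example.org/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Property characteristic: if A is married to B, then B is married to A.
ex:marriedTo a owl:SymmetricProperty .

# Cardinality constraint: a Person has exactly one biological mother.
ex:Person a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:hasBiologicalMother ;
        owl:cardinality "1"^^xsd:nonNegativeInteger
    ] .

# Complex class definition: a Parent is a Person with at least one child.
ex:Parent owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        ex:Person
        [ a owl:Restriction ;
          owl:onProperty ex:hasChild ;
          owl:minCardinality "1"^^xsd:nonNegativeInteger ]
    )
] .
"""

g = Graph()
g.parse(data=owl_ttl, format="turtle")
print(len(g), "triples loaded")   # the axioms themselves are stored as ordinary triples
```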
Together, these standards enable definition of a knowledge graph schema: the vocabulary of entity types (classes), relationship types, and constraints governing valid graph structure. The schema provides the blueprint; instance data populates actual nodes and edges conforming to that blueprint.
Hybrid Search: Combining Vectors and Graphs
RAG systems can represent the same text as dense vectors, sparse vectors, and a knowledge graph, enabling hybrid search that combines and synergizes the best of each. Vector search finds semantically relevant passages; graph traversal surfaces structured relationships and enables multi-hop reasoning that pure vector similarity cannot capture.
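A highly simplified sketch of the idea follows, with an in-memory dictionary standing in for a graph database and a stubbed vector_search standing in for dense retrieval; both are illustrative assumptions rather than any vendor's API.

```python
# Hypothetical in-memory stand-ins for a vector database and a graph database.
knowledge_graph = {
    "Aspirin":   [("treats", "Headache"), ("is_a", "Analgesic")],
    "Analgesic": [("is_a", "Drug")],
    "Headache":  [("affects", "Head")],
}

def vector_search(query: str, top_k: int = 2) -> list[str]:
    """Stand-in for dense retrieval: returns entity IDs most similar to the query."""
    return ["Aspirin", "Headache"][:top_k]

def expand(entities: list[str], hops: int = 2) -> set[tuple[str, str, str]]:
    """Multi-hop graph traversal starting from the seed entities found by vector search."""
    facts, frontier = set(), set(entities)
    for _ in range(hops):
        next_frontier = set()
        for subject in frontier:
            for predicate, obj in knowledge_graph.get(subject, []):
                facts.add((subject, predicate, obj))
                next_frontier.add(obj)
        frontier = next_frontier
    return facts

seeds = vector_search("what class of drug helps with headaches?")
for triple in sorted(expand(seeds)):
    print(triple)
# Two hops surface ("Analgesic", "is_a", "Drug"), a fact no single retrieved
# passage needs to state explicitly; graph traversal recovers it.
```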
The Schema Challenge
A graph schema is critical for a useful knowledge graph. A schema defines the types of entities (things) the graph can contain, the relationships among them, and the properties of those entities and relationships. For well-defined, focused applications, schemas can be manually defined; a patient notes schema is sketched below.
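This is a hand-written sketch of what such a patient notes schema might contain; the entity types, relationship types, and properties are illustrative choices, not a clinical standard.

```python
# A manually defined schema for a patient-notes knowledge graph (illustrative only).
patient_notes_schema = {
    "entity_types": {
        "Patient":    ["name", "date_of_birth", "mrn"],
        "Clinician":  ["name", "specialty"],
        "Condition":  ["name", "icd10_code"],
        "Medication": ["name", "dosage"],
        "Encounter":  ["date", "location"],
    },
    "relationship_types": [
        ("Patient",    "DIAGNOSED_WITH", "Condition"),
        ("Patient",    "PRESCRIBED",     "Medication"),
        ("Clinician",  "TREATED",        "Patient"),
        ("Encounter",  "INVOLVES",       "Patient"),
        ("Medication", "TREATS",         "Condition"),
    ],
}
```

An extraction pipeline would consult such a schema to decide which entities and relationships to pull out of free-text notes.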
For more comprehensive applications, like representing the complete knowledge graph of a real, evolving business, creating and maintaining a realistic knowledge graph schema can be challenging and time-consuming. Human language is messy, with subtleties like synonyms (different words referring to the same entity) and disambiguation challenges (the same word used for different entities). As a business evolves, new entities and relationships continually arise while existing ones change.
Automating creation and maintenance of an accurate, complete knowledge graph schema as the world changes is a more challenging AI task than resolving text and other resources according to a given, fixed knowledge graph schema. It's somewhat of a chicken-and-egg situation. A schema is needed to guide what types of entities and relationships should be extracted from documents to construct the knowledge graph. But appropriately creating this schema requires a deep understanding of the knowledge the graph should represent.
In practice, both for humans and AI, creating the schema and populating it with instances proceed as parallel discovery. AI knowledge graph construction systems that include schema discovery exist, for example AutoSchemaKG. In addition to GraphRAG, knowledge graphs enhance and guide agentic AI systems, in particular by providing powerful intelligent memory. AI construction and use of knowledge graphs is both an active research area and a growing vendor ecosystem, including Neo4j, Stardog, and Linkurious.
Here's an example of a knowledge graph schema and graph generated by Neo4j's graph builder from a 35-page overview produced by ChatGPT deep research.