Evolution of Knowledge Graphs

Natural evolution proceeds in fits and starts, sometimes resulting in progress, sometimes not. So does AI research.

In a recent article (August 2023), Luna Dong – one of the most visible and successful people building industrial-strength knowledge graphs at Meta, Amazon, and Google – offers an insightful characterization of the evolution of successive generations of knowledge graphs. Very few people have this level of expertise in knowledge graphs at web scale, so when she stops to consider what works and what doesn't, we need to listen carefully. Her view is also valuable because it captures how AI engineers as a group generally think about knowledge graphs, and it describes quite accurately how methods for building them have evolved.

From an engineering perspective, knowledge graphs have evolved from entity-based knowledge graphs with clear semantics to "text-rich" graphs with more flexible but more ambiguous free text as entities to "dual neural" graphs that attempt to sidestep the explicit representation of semantic structure and rely instead on implicit relations as represented in embeddings. 

Let's unpack this description to understand these evolutionary steps in more detail.

Entity-based knowledge graphs 

Entity-based knowledge graphs are based on the "seed crazy idea" that we can get computers to model the world as we do: in terms of entities with attributes and the relations between them. The nodes of the graph, then, are mostly named entity instances corresponding to distinct real-world individuals, grouped under the hand-crafted categories of entities (other nodes) in an ontology.

The semantics of entity-based knowledge graphs are transparently defined in terms of the mappings from node or relation labels (strings) to real-world individuals, attributes, and categories. When the mappings between labels and real-world entities [i.e., between strings and things] are explicit and reliable, then the strings' semantics are clear. Q39729 has no semantics at all until we systematically associate it with the real-world individual named Jack Nicholson. For centuries, these mappings have been at the core of our understanding of what meaning and semantics are. 

Entity-based knowledge graphs make explicit the mappings – stored or computed – between strings and things that constitute a machine-accessible semantics available to algorithms, so they are crucial for getting computers to model the world as we do.
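To make this concrete, here is a minimal sketch of the idea in Python. It uses Wikidata-style identifiers; the labels, category ID, and relation name are illustrative assumptions, not a real graph store:

```python
# Toy entity-based knowledge graph (illustrative only).
# Opaque IDs like "Q39729" have no semantics by themselves; they
# acquire meaning through explicit mappings to labels and, ultimately,
# to the real-world individuals and categories they denote.

# Node table: opaque ID -> human-readable label (a "string to thing" proxy)
labels = {
    "Q39729": "Jack Nicholson",   # a real-world individual
    "Q33999": "actor",            # a hand-crafted category (ID assumed)
}

# Edge list: (subject ID, relation, object ID) triples
triples = [
    ("Q39729", "instance_of", "Q33999"),
]

def describe(subject_id: str) -> list[str]:
    """Render each triple about subject_id in human-readable form."""
    return [
        f"{labels[s]} --{rel}--> {labels[o]}"
        for (s, rel, o) in triples
        if s == subject_id
    ]

print(describe("Q39729"))
```

The point of the separation is that algorithms can traverse the opaque IDs while the label table keeps the mapping to things explicit and inspectable.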

The key engineering challenge in building entity-based knowledge graphs is linguistic variability: categories, relations, and instances are expressed with very diverse strings across different sources and languages. This hinders data integration and makes the mapping from strings to things that much harder. Much progress has been made in developing tools and systems for establishing these mappings at scale, but more is needed in deploying them to reduce the dependence on human experts – a key blocker for scalability.

Text-rich graphs 

Text-rich graphs are based on the "seed crazy idea" that we can mine and store semantic structure from unstructured or semi-structured source data alone, i.e., from corpora of strings. For humans, strings are informative surrogates for concepts, so perhaps machines can treat them as such, too.

Text-rich graphs evolved to address the problem of modeling domains where pre-existing resources with semantic structure are sparse and ambiguities abound, with vague and fluid semantic boundaries between values and even classes – i.e., most of the real world. The other motivation was scale: many domains are both economically important and so vast that depending on slow, high-expertise human workers to provide explicit semantics is not seen as feasible.

Examples given are domains like Products, Bioinformatics, Health, Law, and Events. Engineers argue that we cannot clearly model these domains with entities and relationships because there are millions of types and attributes and many of them overlap – not to speak of the massive variability of the strings we work with. Essentially, engineers are confessing to an over-dependence on the existing human-created knowledge resources that work so well when available. But all too often, engineers see the linguists and ontologists who created those essential knowledge resources as blockers rather than assets.  

Interestingly, there are ontologies and entity-based knowledge graphs for Products, Bioinformatics, Health, Law, and Events – lots of them! The past few years have seen huge investments by financial institutions, legal firms, industry associations, and healthcare organizations to create them at scale. I've built some of them myself – for products and services as well as parts of healthcare. That fact undermines a key motivation for text-rich graphs – we simply don't need text-rich graphs as substitutes for entity-based knowledge graphs.

But the more fundamental reason why text-rich graphs are maladaptive is that they have no semantics – no mappings between strings and things. The efforts to build text-rich graphs shifted focus to identifying and modeling graph-structured relations between strings and only strings: nodes and attributes are mostly (uninterpreted) free text. Early transformer approaches and initial language models recognized no distinction between strings and things, and made no effort to map one to the other.

So evolving from entity-based knowledge graphs to text-rich graphs seems very clearly to be regressive evolution. In this stage, researchers seem to have skipped machine-accessible semantics entirely: the vast majority of features in these models are not real-world individuals, attributes, or categories but other co-occurring strings, usually compressed into the values of embeddings. This is why I hesitate to call their results knowledge graphs. But developing text-rich graphs was not wasted effort; it was simply an incomplete solution for building graphs of conceptual knowledge.

"Dual-neural" Graphs

The "seed crazy idea" behind the next generation of graphs was that in some cases it may not be necessary to explicitly model knowledge or semantic structure at all. Instead we might be able to capture semantics implicitly through things like embeddings in LLMs. In an initial version of this scenario, some knowledge would be encoded in knowledge graphs, some in LLMs, and some in both. 

Things get confusing here. We enter strings into an LLM and get strings back in response, and we understand those responses as relevant and meaningful. But this is because we base our assessments on our own human-accessible semantics: we understand what the strings mean, or how they map to our concepts of things in the world.

But many people jump to the conclusion that the machines are doing the same thing – understanding – just as we would conclude if we were talking with other human interlocutors. Text-rich graphs and early LLMs exploit the many correlations between strings to mimic understanding but do not have machine-accessible semantics. These models include no representation of concepts or of conceptual structure or of how patterns of strings (e.g., embeddings) map to these structures – they assume that it is not necessary to explicitly model knowledge at all. So clearly there was no understanding because there was no model of how strings map to our concepts or things in the world. 

Evolving from entity-based knowledge graphs to text-rich graphs and dual-neural graphs seems very clearly to be maladaptive – a continuation of regressive evolution. Mixing and merging text-rich graphs and text-only LLMs in different ways does not contribute to the survival and proliferation of machine-accessible semantics.  

Next Evolutionary Steps for Knowledge Graphs

Things are evolving dramatically with recent research. Entity-based knowledge graphs are being used to tame unruly LLMs and LLMs are becoming instrumental in creating and expanding knowledge graphs.

Explicit conceptual representations – in the form of entity-based knowledge graphs and ontologies – are being leveraged at each step of LLM training, tuning, and deployment. Entity-based knowledge graphs now serve as training data, inform loss functions, guide prompt optimization, filter outputs, and guide LLM construction, evaluation, and use in many ways.  Each time entity-based knowledge graphs are used, the performance of an LLM improves – because the LLM now has access to one kind of machine-accessible semantics: string-concept mappings. 
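One common deployment-time pattern can be sketched as follows: retrieve facts about the entities mentioned in a question from the knowledge graph and prepend them to the prompt, so the LLM's output is grounded in explicit string-to-thing mappings. The graph contents and function names below are illustrative assumptions:

```python
# Sketch of KG-grounded prompting: facts for linked entities are pulled
# from an entity-based knowledge graph and prepended to the prompt.
# The KG contents and the entity ID are illustrative assumptions.

KG = {
    "Q39729": [
        ("Jack Nicholson", "occupation", "actor"),
        ("Jack Nicholson", "award received", "Academy Award"),
    ],
}

def build_grounded_prompt(question: str, entity_ids: list[str]) -> str:
    """Prepend KG facts for the linked entities to the user question."""
    facts = [
        f"- {s} {r} {o}."
        for eid in entity_ids
        for (s, r, o) in KG.get(eid, [])
    ]
    return "Known facts:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"

prompt = build_grounded_prompt("What awards has Jack Nicholson won?", ["Q39729"])
print(prompt)
```

Filtering LLM outputs works analogously in reverse: generated claims are checked against the graph's explicit facts before being shown to users.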

Attributes of real-world entities in the form of multimodal inputs are also being mapped systematically to strings, so multimodal LLMs have moved beyond modeling only string-string relations. These LLMs perform better because they build and have access to another kind of machine-accessible semantics: string-sensation mappings.

Now, because they build and leverage machine-accessible semantics, it seems reasonable to say that these newer systems really do understand, at least partially, as humans do. They may not always understand, but it is clear that these advances already constitute very significant evolutionary progress for knowledge graphs.

