A Knowledge Graph?

Just a short post regarding the now-popular term 'KnowledgeGraph'. This is not just about semantics (whether KnowledgeGraph is the right term or not), but more about how the term seems to mislead many into thinking that what such a graph contains is knowledge, although it is just information. I will then show with a few examples why this is (somewhat) important.

Let's start with the most basic: data, which is just a collection of uninterpreted differences (symbols, literals, signals, etc.), and 'uninterpreted' here is key. Information, on the other hand, is a meaningful organization of data, and here 'meaningful' is key. But meaningful to whom? Well, it depends. So '23' or 'F' or 'Xavia' on their own are meaningless, but this data can be used in a structured (meaningful!) representation:

('Xavia', 'F', 23, 'accountant')

So data like '23', 'F' and 'Xavia' can be used to represent some entity whose name is 'Xavia', who is 23 years old, who is female, and who is an accountant. Each field in this record (or tuple, or vector, ...) means something to the designer: it corresponds to some attribute (or feature, or property, ...), and we usually have some implicit label for each ('age', 'gender', 'occupation', etc.). It is unfortunate, therefore, that we call these collections of structured data (that is, this information) a 'database', as technically speaking the entire collection of structured data is an 'information base' (no one wants a huge store of just uninterpreted and meaningless blobs of data!).
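The difference between the bare literals and the labeled record can be sketched in a few lines of Python. This is only an illustration: the `Person` type and its field names are assumptions of mine, standing in for the implicit labels mentioned above.

```python
from typing import NamedTuple

# Data: uninterpreted literals. Nothing here says what each value stands for.
raw = ('Xavia', 'F', 23, 'accountant')


# Information: the same literals organized under labels chosen by a designer.
# The type name and field names are hypothetical, picked for illustration.
class Person(NamedTuple):
    name: str
    gender: str
    age: int
    occupation: str


xavia = Person(*raw)

print(xavia.age)        # the literal 23, now interpreted as an age
print(xavia._asdict())  # every value paired with its label
```

The values are identical in both cases; only the second form carries the designer's interpretation along with them.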

Anyway, that's the end of the obvious.

Now, what the popular term KnowledgeGraph refers to is really just an InformationGraph: a huge store of structured data, mostly held in a triple store (basically, Entity-RELATION-Entity), which is not much different from the ER (entity-relationship) diagrams well known to all data modelers, in particular in the (now standard) relational databases. But then why is it that such structures do not contain 'knowledge'?

Let's assume we have this in some database:

# (person, child-id)
('Elvis Presley', 2322)
('Bob Marley', 3421)
('Bob Marley', 5343)
('Bob Marley', 7643)
('Jimi Hendrix', 8737)
('Jimi Hendrix', 1501)
('Carlos Santana', 2828)

# (child-id, name, gender)
(2322, 'Lisa Marie Presley', 'F')
(5343, 'Ziggy Marley', 'M')
(7643, 'Bob Marley Jr.', 'M')
(3421, 'Miriam Marley', 'F')
(8737, 'Erica Hendrix', 'F')
(1501, 'Keith Hendrix', 'M')
(2828, 'Salvador Santana', 'M')

Now one can ask (query) this store questions like 'how many children did Elvis Presley have?' and get the right answer: just 1, 'Lisa Marie Presley'. But what about a question (query) like 'how many sons did Elvis Presley have?' In a system that does not do any inferencing (not even basic, level-1 reasoning) you would not get the obvious answer, 0. Why? Because you do not have knowledge in the so-called knowledge graph. In particular, you do not have knowledge like this:

(child(x,y) & gender(x, 'F')) => daughter(x, y)
(child(x,y) & gender(x, 'M')) => son(x, y)

The above says: if x is a child of y, and the gender of x is 'F', then x is a daughter of y, and likewise for a son, who is a male child. This is basic knowledge that a 4-year-old has. This, by the way, is called intensional information (information that is only implicitly there and thus has to be inferred), as opposed to extensional information (information that is explicitly there and need only be extracted by the appropriate query).
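The two rules and the two kinds of query can be sketched in Python as follows. This is a minimal sketch, not how any real triple store or reasoner works: the names `PARENT_OF`, `CHILD`, `children_of`, `derive` and `sons_of` are all hypothetical, and the facts are the toy tables from this post.

```python
# Extensional facts: the two toy tables from the post, as plain Python data.
PARENT_OF = [  # (person, child-id)
    ('Elvis Presley', 2322),
    ('Bob Marley', 3421),
    ('Bob Marley', 5343),
    ('Bob Marley', 7643),
    ('Jimi Hendrix', 8737),
    ('Jimi Hendrix', 1501),
    ('Carlos Santana', 2828),
]
CHILD = {  # child-id -> (name, gender)
    2322: ('Lisa Marie Presley', 'F'),
    5343: ('Ziggy Marley', 'M'),
    7643: ('Bob Marley Jr.', 'M'),
    3421: ('Miriam Marley', 'F'),
    8737: ('Erica Hendrix', 'F'),
    1501: ('Keith Hendrix', 'M'),
    2828: ('Salvador Santana', 'M'),
}


def children_of(person):
    """Extensional query: only reads facts that are explicitly stored."""
    return [CHILD[cid][0] for parent, cid in PARENT_OF if parent == person]


def derive(parent_of, child):
    """Apply the two rules: a child with gender 'F' is a daughter, 'M' a son."""
    derived = {'daughter': [], 'son': []}
    for parent, cid in parent_of:
        name, gender = child[cid]
        if gender == 'F':
            derived['daughter'].append((name, parent))
        elif gender == 'M':
            derived['son'].append((name, parent))
    return derived


def sons_of(person, derived):
    """Intensional query: answered from inferred facts, not stored ones."""
    return [name for name, parent in derived['son'] if parent == person]


facts = derive(PARENT_OF, CHILD)
print(len(children_of('Elvis Presley')))     # 1
print(len(sons_of('Elvis Presley', facts)))  # 0
print(sons_of('Bob Marley', facts))          # ['Ziggy Marley', 'Bob Marley Jr.']
```

Without `derive` there is simply no 'son' relation to query, which is the point of the example. Note that answering 0 also assumes the store lists all of a person's children (a closed-world assumption).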

Now, let's go to an actual system powered by a so-called KnowledgeGraph. Try these queries on your Google Assistant:

how many children did Elvis Presley have?
how many sons did Elvis Presley have?

Yup. It can extract the fact that Elvis Presley had one child, and that she is female, but it cannot deduce the simple implication of that information, namely that Elvis Presley had zero (or no) sons.

So is it a KnowledgeGraph, or just a relational structure of many simple (and logically unconnected) factoids?

Later ...

PS: I just had a few minutes to kill and I thought why not write something useful - OK, so I'm assuming it is useful :)

You might get the logically correct answer, but that presumes you've asked the right question. How do data, and the focus of others attempting to filter it at a distance, give the person on the spot the right perspective? (You can get lost down the rabbit hole, where filtering plentiful data generates more information that is not relevant.)

You rightly suggest that a knowledge graph is really an information graph. But is there a better representation to use in conversational dialogue? We think it is better to have a knowledge graph that can be used during the conversation. Please suggest.

Agreed! Well put, this is precisely what's being overlooked... data is data, structure is structure, the knowledge is not there (let alone wisdom ;-) ).

I know these 'data --> information --> knowledge' debates are tiring for some people, but maybe if we resolve them we won't continue to have such high failure rates in Information Management projects.

'23' isn't "meaningless"... if anything, by itself it is too full of meaning... Twenty-three is the ninth prime number, the smallest odd prime that is not a twin prime. Twenty-three is also the fifth factorial prime and the second Woodall prime. It is an Eisenstein prime with no imaginary part and real part of the form 3n − 1. There are 23 definitions of the number in Urban Dictionary: https://www.urbandictionary.com/define.php?term=23 It's also the name of a song. Likewise, there are 123 people named 'Xavia' in Detroit; 17 uses for the letter 'F'; but only 1 definition for 'accountant'. By my reckoning, just for Detroit that makes for a possible 48,093 interpretations of the information you provided in your example. Of course, adding headings would reduce the number drastically, but you get my point: there is still far more Information in your example than there needs to be...

Data is always recorded with intent, and Information always needs a physical/structural carrier, of which Data is just one example. I think Shannon would say, based on the numbers above, that '23' carries more information than 'accountant'. Besides, even if your example unambiguously identified one individual, I would still have to have 'Knowledge' - independent of the truth of the statement - that '23' is a number in a base-ten counting system.

When you put it that way, the proper progression should be:

Knowledge of the strings 'Xavia', 'F', 23, 'accountant'
Information about the nature of the relationship between those four strings I 'know' about
Data as the recorded observation that "Xavia is a 23-year-old female accountant."

Thanks for taking the time to articulate this. I wonder if nomenclature is a brand-recognition issue: 'Ontology' is too obscure (I said it once in a meeting and was rightfully mocked as “college guy”). 'KnowledgeGraph', on the other hand, stimulates the curiosity of those handling both data and representations. If I understand correctly, data -> pipeline is in broad terms an empiric -> rational project:

0. Data
1. Data + induction -> info
2. Info + deduction -> knowledge
