EINQA: Knowledge as Inquiry, the EINGRAPH as Q&A with Supportive AI
2> The Context and the Objective
In my previous article “EINGRAPH - Knowledge as Elementary Information Node Graph” I presented a practical way to map knowledge from any domain using a completely general and basic ontology.
Now we need to ask:
I will try to answer all those questions in this article.
Let’s, as usual, cut the elephant-sized problem into small pieces.
We will see:
And, at the end, some concluding considerations.
3> EINs and Inquiry: Infinite Diversity in Infinite Combinations
Each EIN’s “quantic nature”, i.e., the combination of color (remember Green→When, Blue→Where, Yellow→Who, Magenta→What), shape (bounded or free) and flavor (from human or from AI), generates 16 possible informational EIN natures.
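A quick Python check of that combinatorics (the color/shape/flavor names follow the article; the tuple encoding is my own illustration):
from itertools import product

colors = ["When", "Where", "Who", "What"]   # Green, Blue, Yellow, Magenta
shapes = ["bounded", "free"]
flavors = ["human", "AI"]

# Cartesian product: 4 colors x 2 shapes x 2 flavors = 16 EIN natures
natures = list(product(colors, shapes, flavors))
print(len(natures))   # prints 16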
And each nature has its own specific world of formal query languages.
Let's take some examples:
Some languages have a code implementation, others are just a set of directives suitable for the creation of custom information retrieval systems.
In general, these are complex languages, sometimes extremely specific, sometimes progressively hybridized to try to cover more information natures.
For example, CQL, although specifically tailored to the bibliographic textual world, is inadequate for processing graph information and is not integrated into a DB engine.
And, moreover, all those languages require perfect knowledge of the information structure to be used.
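To give a taste of that formality, this is roughly what a bibliographic inquiry in CQL looks like when carried over the SRU protocol, sketched in Python (the endpoint URL is a placeholder, not a real service):
import requests

# A CQL inquiry: index, relation, term, combined with boolean operators.
cql = 'dc.title any "knowledge graph" and dc.date >= "2020"'

# SRU carries the CQL query as a parameter of a plain HTTP GET.
resp = requests.get(
    "http://example.org/sru",                # placeholder SRU endpoint
    params={
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,
        "maximumRecords": "10",
    },
)
print(resp.status_code)
Even this “simple” textual case already demands that the user knows the available indexes (here dc.title and dc.date) in advance.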
Not a promising situation, indeed.
And it never rains but it pours: even admitting that a knowledge graph end user would study the structure and would study this bunch of languages, how can we ensure that the inquiry he writes gives a correct answer?
If we are a government organization, we can be audited on the information, the knowledge, we give to citizens; not to mention when we are dealing with legal or medical information.
Human evolution resolved this problem in a fairly complex way: the professions, the studies, the “culture”. If we need to know something in the legal domain we go to an attorney, in the medical one to a doctor, and so on.
And yet, even though with EINGRAPH we scaled knowledge representation down to a quantic level, expressing it through infinite combinations of EINs, this fact puts a plain reality in front of us:
to access a knowledge domain in an effective, fair, explainable, transparent and resilient way, well, we still need the “human minds” of domain experts.
So, with EINs, we solved the problems of knowledge discovery, storage and representation, but not the problem of using that knowledge.
Till now.
4> The Multimodel DB approach to EINGRAPH materialization
We saw, despite the simplified EINGRAPH approach to ontology, how complex it can be to efficiently access knowledge. Yet to “efficiently access knowledge” we first need a technological enabler to store, index and inquire (in a word, to “materialize”) the EINGRAPH and its related information.
So, again, we need to break this “prerequisite problem” into pieces:
Let’s start.
4.1> EINGRAPH basic Ontology
We begin with a very basic representation of the EINGRAPH ontology using Graffoo symbols (see https://opencitations.wordpress.com/2011/06/29/graffoo-a-graphical-framework-for-owl-ontologies/).
We must say some important things:
So, for example, suppose we want to represent this assertion (Assertion A in the following):
Person A, whose name is Alice, is a friend of Person B, whose name is Bob; both were born in 1990 and they live in Rome.
Using RDF triples (in OWL/XML format) we would have:
<rdf:RDF xmlns="http://localhost/ein#"
xml:base="http://localhost/ein"
xmlns:ein="http://localhost/ein#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:eingraph="ein#">
<!--
///////////////////////////////////////////////////////////////////////////////////////
// Individuals
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://localhost/ein#Alice -->
<owl:NamedIndividual rdf:about="http://localhost/ein#Alice">
<rdf:type rdf:resource="http://localhost/ein#who"/>
<People_related rdf:resource="http://localhost/ein#Bob"/>
<When_someone_deal_with_time rdf:resource="http://localhost/ein#1990"/>
<Where_someone_stay_or_go_or_come_from rdf:resource="http://localhost/ein#Rome"/>
</owl:NamedIndividual>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://localhost/ein#Alice"/>
<owl:annotatedProperty rdf:resource="http://localhost/ein#People_related"/>
<owl:annotatedTarget rdf:resource="http://localhost/ein#Bob"/>
<rdfs:isDefinedBy>GUID_RelationType (is friend of)</rdfs:isDefinedBy>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://localhost/ein#Alice"/>
<owl:annotatedProperty rdf:resource="http://localhost/ein#When_someone_deal_with_time"/>
<owl:annotatedTarget rdf:resource="http://localhost/ein#1990"/>
<rdfs:isDefinedBy>GUID_RelationType (is born)</rdfs:isDefinedBy>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://localhost/ein#Alice"/>
<owl:annotatedProperty rdf:resource="http://localhost/ein#Where_someone_stay_or_go_or_come_from"/>
<owl:annotatedTarget rdf:resource="http://localhost/ein#Rome"/>
<rdfs:isDefinedBy>GUID_RelationType (stay in)</rdfs:isDefinedBy>
</owl:Axiom>
<!-- http://localhost/ein#Bob -->
<owl:NamedIndividual rdf:about="http://localhost/ein#Bob">
<rdf:type rdf:resource="http://localhost/ein#who"/>
<When_someone_deal_with_time rdf:resource="http://localhost/ein#1990"/>
<Where_someone_stay_or_go_or_come_from rdf:resource="http://localhost/ein#Rome"/>
</owl:NamedIndividual>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://localhost/ein#Bob"/>
<owl:annotatedProperty rdf:resource="http://localhost/ein#When_someone_deal_with_time"/>
<owl:annotatedTarget rdf:resource="http://localhost/ein#1990"/>
<rdfs:isDefinedBy>GUID_RelationType (is born)</rdfs:isDefinedBy>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://localhost/ein#Bob"/>
<owl:annotatedProperty rdf:resource="http://localhost/ein#Where_someone_stay_or_go_or_come_from"/>
<owl:annotatedTarget rdf:resource="http://localhost/ein#Rome"/>
<rdfs:isDefinedBy>GUID_RelationType (stay in)</rdfs:isDefinedBy>
</owl:Axiom>
<!-- http://localhost/ein#Rome -->
<owl:NamedIndividual rdf:about="http://localhost/ein#Rome">
<rdf:type rdf:resource="http://localhost/ein#where"/>
<location rdf:datatype="http://localhost/ein#GIS_PolyLine">Location coordinate</location>
</owl:NamedIndividual>
<!-- http://localhost/ein#1990 -->
<owl:NamedIndividual rdf:about="http://localhost/ein#1990">
<rdf:type rdf:resource="http://localhost/ein#when"/>
<when_microstructure rdf:datatype="http://www.w3.org/2000/01/rdf-schema#Literal">Sample XML time microstructure</when_microstructure>
</owl:NamedIndividual>
<rdf:Description>
<rdfs:comment>Type of relation</rdfs:comment>
</rdf:Description>
</rdf:RDF>"
Not so simple, indeed…
Considering the nature of RDF representation (i.e., triples in the form subject → predicate → object), to represent the simple Assertion A we need (at least) 9 triples.
Moreover, observe that the predicates we used are specific and not generic/typed as in figure 1; this is correct in RDF knowledge representation because RDF assumes, as a basic principle, that you “know the exact ontology” underlying the knowledge graph.
As a consequence, the number of triples will increase even further, because we need to clarify each predicate type using more triples.
Strictly speaking, there is no constraint forcing an exact ontology in RDF: you can write whatever you want in a triple, but in that case the RDF store becomes a useless data swamp.
This observation, i.e., the need for a deep ontology as a logical prerequisite of RDF usage, opens a “religious struggle” between RDF and other ways to store knowledge graphs. I will discuss my position in §4.3.
So we have an EIN basic ontology that is really simple and generic/abstract, BUT this seems to be in conflict with the mainstream way to represent knowledge, the RDF triple store.
4.2> Occam’s razor: why not use a Multimodel DB?
Aside from the theory of knowledge representation, there are some “little” technical and physical aspects to consider:
There are essentially two data management methods that can be used, but let me explain with a side-by-side sample:
In the end, I can express what derives from the above as follows:
if you need a very specific data/knowledge approach, use a mix of specific DBs/Data Stores; but if you do not know exactly how the system will evolve, then use the Swiss Army knife, i.e., the Multimodel DB. In either case, avoid thinking that you can use a specific DB/Data Store to mimic other data models. Data are powerful but harsh…
4.3> The eternal fight: RDF vs. Property Graph
Now that we have, from my POV, defined both the basic ontology (see §4.1) and the technical instruments to operate it (see §4.2), there remains the last Hamletic question: RDF or LPG (Labeled Property Graph)?
There are plenty of articles, publications and blogs discussing this question, so, from my POV, I strongly suggest three of them:
The general portrait that emerges can be summarized in the following points:
In a word, RDF (the true nature of RDF) is indeed extremely rigid, not intuitive, not self-evident for humans. It is perfect in academic arenas but not in the real, imperfect and incomplete world.
In fact, in recent years more or less all “new” graph DB producers (and indeed also the Multimodel DB ones) have been moving to a completely different approach: the Labeled Property Graph, or LPG.
The LPG approach uses the intuitive representation in which an Entity, a “labeled node with some properties” (sometimes called a VERTEX), relates to other Entities through “labeled connections with some properties” (sometimes called EDGES).
The natural idea of a graph, that’s all.
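As a contrast with the RDF/XML above, here is Assertion A sketched as an LPG using Python’s networkx (labels and property names are illustrative, not a fixed EINGRAPH vocabulary):
import networkx as nx

g = nx.MultiDiGraph()

# Labeled vertices carrying their own properties
g.add_node("Alice", label="who", name="Alice", born=1990)
g.add_node("Bob",   label="who", name="Bob",   born=1990)
g.add_node("Rome",  label="where", location="GIS polyline here")

# Labeled edges, which could carry properties too
g.add_edge("Alice", "Bob",  label="is_friend_of")
g.add_edge("Alice", "Rome", label="lives_in")
g.add_edge("Bob",   "Rome", label="lives_in")
Note how “born in 1990” collapses into a plain vertex property, instead of needing its own node plus an annotated triple.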
And the ontology? Has it vaporized?
No, it “emerges” from LPG analysis, because there is an entire branch of mathematics (Graph Theory) that gives us instruments to analyze graphs.
So we can discover clusters, typical paths between vertices, topological patterns and so on: all things that indirectly imply an ontology.
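For instance, a generic community-detection algorithm lets groupings “emerge” from the topology alone, with no schema declared anywhere; a minimal sketch with networkx:
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two loosely connected friendship clusters, no schema declared anywhere
g = nx.Graph()
g.add_edges_from([("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol"),
                  ("Dave", "Eve"), ("Eve", "Frank"), ("Dave", "Frank"),
                  ("Carol", "Dave")])   # a weak bridge between the clusters

# The clusters emerge from the topology, suggesting latent classes
for community in greedy_modularity_communities(g):
    print(sorted(community))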
Think about the elegance of this approach: we do not impose an order on the knowledge, we discover the order inside the knowledge. From a philosophical POV I think this is the correct way to see things, because every time someone has attempted to harness human creativity, it has failed.
And, more importantly, the fact that vertices and edges can be LABELED and can possess PROPERTIES lets us (in a Multimodel DB) connect the Graph Data Model with the Relational Model and/or the Document Based Model (for example).
Now I am sure you are howling: “But what about performance? RDF is a champion!”.
That’s true for Oracle RDF (see: https://www.oracle.com/a/tech/docs/rdfgraph-1-trillion-benchmark.pdf), BUT it is also true for LPG, again with Oracle (see: https://www.oracle.com/a/tech/docs/ldbc-graph-benchmark-2020-06-30-neo-only-v3.1.pdf).
Consider, on top, that LPG requires one or two orders of magnitude fewer elements in the graph representation than RDF, and this ratio becomes even more favorable to LPG the more properties you need to manage.
In other words, in a Multimodel DB where you can use properties as “pointers” to other integrated data models, LPG can be more “compact” in terms of “bytes on disk”, because its primary job is to “catch the topology” between pieces of information living in the other data models.
Furthermore, those pieces of information can be the whole classical documental, cataloguing and archival knowledge corpus. So no need for a mystic LPGizer (remember the RDFizer?).
Simple!!
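To make the “properties as pointers” idea concrete, here is a minimal sketch, with sqlite3 standing in for the relational model and networkx for the LPG (all names are illustrative):
import sqlite3
import networkx as nx

# Relational side: the heavy descriptive payload lives here
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
db.execute("INSERT INTO person VALUES (1, 'Alice', 'a long biography...')")
db.execute("INSERT INTO person VALUES (2, 'Bob', 'another biography...')")

# Graph side: only the topology, with a property pointing into the table
g = nx.MultiDiGraph()
g.add_node("Alice", person_id=1)
g.add_node("Bob", person_id=2)
g.add_edge("Alice", "Bob", label="is_friend_of")

# Follow the pointer: from a vertex property to the relational row
pid = g.nodes["Alice"]["person_id"]
row = db.execute("SELECT bio FROM person WHERE id = ?", (pid,)).fetchone()
print(row[0])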
4.4> Final Considerations about technological choices
In the previous chapters we understood 3 main things:
5> Q&A: classical and collaborative-AI approaches
Now that we have (or could have) a Multimodel DB loaded with the LPG-based EINGRAPH plus other knowledge material in a collaborative data model scheme, let’s call this whole bunch of information the “Corpus”.
Now we face the problem of making effective inquiries on it. Consider, again, that our Corpus is multimodel in the sense that it contains:
The beautiful side of the matter is that, using a Multimodel DB, all those searches can be collapsed into a single extended-syntax SQL-like predicate. This simplifies the application’s inquiry building enormously, from a certain POV; yet the resulting single SQL-like search predicate is much, much more complex than a simple single-model professional query. This is the bad side of the matter.
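To give a flavor of what such a collapsed predicate can look like, here is a minimal sketch using plain sqlite3 (a real Multimodel DB would use its own extended syntax; here a JSON column stands in for the document model, an edge table plus a recursive CTE for the graph model, and one SQL statement filters across both; it assumes an SQLite build with the JSON1 functions):
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (id TEXT PRIMARY KEY, doc TEXT)")   # document model
db.execute("CREATE TABLE edge (src TEXT, dst TEXT, label TEXT)")  # graph model
db.execute("""INSERT INTO node VALUES
    ('Alice', '{"born": 1990, "city": "Rome"}'),
    ('Bob',   '{"born": 1990, "city": "Rome"}'),
    ('Carol', '{"born": 1985, "city": "Milan"}')""")
db.execute("""INSERT INTO edge VALUES
    ('Alice', 'Bob', 'is_friend_of'),
    ('Bob', 'Carol', 'is_friend_of')""")

# One predicate mixing graph traversal (recursive CTE) and document
# filtering (json_extract): people reachable from Alice, born in 1990.
rows = db.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT 'Alice'
        UNION
        SELECT e.dst FROM edge e JOIN reach r ON e.src = r.id
    )
    SELECT n.id FROM node n JOIN reach USING (id)
    WHERE json_extract(n.doc, '$.born') = 1990 AND n.id <> 'Alice'
""").fetchall()
print(rows)   # [('Bob',)]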
Moreover, we must remember what we said in §3 about the formal languages we could use to inquire the Corpus; this aspect shows us the theoretical and unavoidable complexity of interacting with users to build an effective inquiry. Indeed, we are not speaking of a “recommendation system”, nor of a “faceted search”, nor of a “geolocated/time-located search”, nor, finally, of a “text search”. We are speaking of a complete and free “knowledge inquiry”; which is the same as saying we “ask a question to the Corpus”, possibly in natural language. But how?
To solve the problem, we slice the elephant, as usual:
Let’s start.
5.1> How a classical Q&A system “normally” works
A classical Q&A system (beware: NOT a chatbot, but a real Natural Language Q&A system) generally works according to the following high-level process:
In detail we have:
NOTE A: All the techniques I briefly summarized above are the object of countless publications and are in continuous evolution. If you want to go deeper, here are 5 references that can be useful:
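To fix ideas, here is a toy, self-contained sketch of such a retrieve-then-read pipeline (the corpus and both steps are naive stand-ins for real IR and reader models):
import re

# Toy retrieve-then-read pipeline; real systems replace both steps
# with trained IR models and reader/generator models.
corpus = {
    "doc1": "Alice and Bob were born in 1990 and they live in Rome.",
    "doc2": "Rome is the capital of Italy.",
}

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, k=1):
    # Stand-in retriever: rank documents by naive word overlap.
    q = tokens(question)
    ranked = sorted(corpus.values(), key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def read(question, passages):
    # Stand-in reader: a real system extracts or generates the answer span.
    return passages[0]

question = "Where do Alice and Bob live?"
print(read(question, retrieve(question)))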
5.2> Weaknesses of the classical approach
There are many weaknesses in this approach, which is “academically perfect” but… Let’s see:
From my POV, this is enough to say that with this approach we are in a “Kobayashi Maru” style scenario (see Star Trek). So we need to change the “scenario conditions”.
5.3> The EINGRAPH Q&A (EINQA) collaborative approach
In the classical Q&A approach the “human” is relegated to the question-asking and result-reading phases, did you notice that?
But what if “humans” take care of search and query creativity?
What if “humans” took care of “what, how and why” a certain “search template” is created and used?
We could obtain an EINQA based on the following process:
Let’s analyze the process in detail:
Note B: it will be necessary to provide for the creation of a “QDP library management console”. In fact, questions related to QDP usage authorization or to the QDP certification and deployment process could also be involved, for example. In any case, we are speaking about a relatively simple and completely classic application.
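The article does not fix a QDP record format, so the following dataclass is only a hypothetical guess at the minimal fields such a library entry would need:
from dataclasses import dataclass, field

@dataclass
class QDP:
    """One certified query pattern in the library (hypothetical fields)."""
    qdp_id: str
    description: str        # natural-language purpose, used for matching
    query_template: str     # expert-written multimodel query, with placeholders
    author: str             # the domain expert who wrote it
    certified: bool = False # certification status before deployment
    authorized_roles: list = field(default_factory=list)  # who may run it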
Note C: there are plenty of algorithms that could be used. In general, the embeddings need to be as “fine” as possible (so use looong vectors) and language-specific customization is important. This process step is the key point (and luckily the only one in which “probability” acts) for the whole EINQA.
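A minimal sketch of this matching step, assuming the sentence-transformers library (the model name is just an example, and the QDP descriptions are invented):
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model works; this one is only an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

qdp_descriptions = [
    "Find all people related to a given person within a time range",
    "Find all documents mentioning a place in a given period",
]
user_question = "Who were Alice's friends around 1990?"

# Embed the library and the question, then pick the closest QDP
# by cosine similarity; the query itself stays expert-certified.
lib_emb = model.encode(qdp_descriptions, convert_to_tensor=True)
q_emb = model.encode(user_question, convert_to_tensor=True)
scores = util.cos_sim(q_emb, lib_emb)[0]
best = int(scores.argmax())
print(qdp_descriptions[best], float(scores[best]))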
This is only the basic process.
Some “evolutions” are possible:
In any case, I suppose the basic principle is clear:
domain experts decide, certify and document what to search, how to search and where to search, thereby instilling human creativity and consciousness into the system, while AI helps inexperienced end users query the knowledge Corpus using their own language.
5.4> The ethical giveback of the EINGRAPH Q&A approach
Aside from the evident good probabilistic effect of the EINQA approach (if EINQA’s AI segment performs at 80%, the overall system performs at 80%, because the AI only selects among expert-certified queries), the biggest giveback of this approach is from the ethical perspective.
Let’s see how the AI ethics pillars are affected:
On top of that, there is also a sociological effect: “the EINQA approach is, by design, respectful of the work, professionalism and creativity of domain experts”.
EINQA does not remove “human labor”, it “valorizes” human labor.
6> Conclusions
This article completes the theoretical journey I started in “Knowledge as Elementary Information Node Graph A.k.a. the EINGRAPH”.
We have seen how, avoiding mainstream lines of thought, one can conjugate Ethics and Technology, abstract methods and real life, in projects, industry and society.
In the end, the EINGRAPH/EINQA final target is to be a powerful system that augments human intelligence and knowledge and NEVER substitutes for them.