EINQA: Knowledge as Inquiry, the EINGRAPH as Q&A with supportive AI
From EINGRAPH to EINQA

2> The Context and the objective

In my previous article “EINGRAPH - Knowledge as Elementary Information Node Graph” I presented a practical way to map knowledge from any domain using a completely general and basic ontology.

Now we need to ask:

  • How can we technically realize this graph?
  • How can we scale the system, considering that EINs mainly contain pointers to big information chunks?
  • How can we query the knowledge in an effective way?

I try to answer all those questions in this paper.

Let’s, as usual, cut the elephant problem into small pieces.

We will see:

  • Chapter 3: EINs and inquiry, Infinite Diversity in Infinite Combinations,
  • Chapter 4: Multimodel DB approach to EINGRAPH materialization
  • Chapter 5: Q&A classical and with collaborative AI approaches
  • Chapter 6: Conclusions

And, at the end, some concluding considerations.

3> EINs and inquiry, Infinite Diversity in Infinite Combinations

Each EIN’s “quantic nature”, i.e., the combination of color (remember: Green→When, Blue→Where, Yellow→Who, Magenta→What), shape (bounded or free) and flavor (from human or from AI), generates 16 possible informational EIN natures.
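The count follows directly from the three dimensions; here is a minimal Python enumeration (the string labels are mine, for illustration only):

```python
from itertools import product

# The three "quantic" dimensions of an EIN described above
colors = ["When", "Where", "Who", "What"]   # Green, Blue, Yellow, Magenta
shapes = ["bounded", "free"]
flavors = ["human", "AI"]

# Every combination of the three dimensions is a distinct EIN nature
natures = list(product(colors, shapes, flavors))
print(len(natures))  # 4 x 2 x 2 = 16
```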

And each nature has its own specific world of formal query languages.

Let's take some examples:

  • Textual information: CQL, XPath, XQuery, JSONiq, JQL, Poliqarp, …
  • Relational information: SQL, DMX, MDX, …
  • Geolocated information: GIS specializations/functions of various languages
  • Graph information: Cypher, Gremlin, GraphQL, RDFQL, SPARQL, …
  • All of those may be filtered by a confidence factor, considering the flavor
  • All of those may derive from a join, considering the shape

Some languages have a code implementation, others are just a set of directives suitable for the creation of custom information retrieval systems.

In general, these are complex languages, sometimes extremely specific, sometimes progressively hybridized to try to cover more information natures.

For example, CQL, although specifically tailored to the bibliographic textual world, is inadequate for processing graph information and is not integrated into any DB engine.

And, moreover, all those languages require perfect knowledge of the information structure to be used.

Not a promising situation, indeed.

And it never rains but it pours: even admitting that a knowledge graph end user would study the structure and study this bunch of languages, how can we ensure that the inquiry he wrote gives a correct answer?

If we are a government organization, we are accountable for the information, the knowledge, we give to citizens, not to mention when we are dealing with legal or medical information.

Human evolution resolved this problem in a fairly complex way: the professions, the study, the “culture”. If we need to know something in the legal environment we go to an attorney, in the medical one to a doctor, and so on.

Yet we, with EINGRAPH, scaled the knowledge representation down to a quantic level, expressing it with the infinite combinations of EINs.

This fact puts the reality in front of us:

to access a knowledge domain in an effective, fair, explicable, transparent, resistant way, well, we need the “human minds” of domain experts.

So we, with EINs, solved the problems connected to knowledge discovery, storage and representation, but not the problem connected to using that knowledge.

Until now.

4> Multimodel DB approach to EINGRAPH materialization

We saw, despite the simplified EINGRAPH approach to ontology, how complex it can be to efficiently access knowledge. Yet, to “efficiently access knowledge”, we first need a technological enabler to store, index and inquire (in a word, to “materialize”) the EINGRAPH and the related information.

So we, again, need to break this “prerequisite problem” into pieces:

  • Problem 1 → define a more formal EINGRAPH basic ontology. See §4.1.
  • Problem 2 → simplify as much as possible the technology arena. See §4.2.
  • Problem 3 → the Multimodel DB, the RDF and other stuff. See §4.3.
  • Conclusions → so what, and under what conditions? See §4.4.

Let’s start.

4.1> EINGRAPH basic Ontology

We begin with a very basic representation of the EINGRAPH ontology using Graffoo symbols (see https://opencitations.wordpress.com/2011/06/29/graffoo-a-graphical-framework-for-owl-ontologies/ ).

Figure 1: EINGRAPH Basic Ontology

We must point out some important things:

  1. Every colored EIN is a sub-class of a generic, totipotent “grey EIN”.
  2. Every Predicate is a sub-class of a Generic Predicate.
  3. Inside the EIN there is the minimum possible payload (ideally, to allow an “in memory” EINGRAPH where possible).
  4. Every EIN or Predicate possesses a “Type”. This means that we “consciously avoid” sub-class proliferation in the ontology’s nodes.
  5. Every EIN “points/refers” with a GUID (or similar primary-key method) to a Data Lake resource (this may also be an external URI but, as we saw before, it is better to have “lightweight EINs”).
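A minimal Python sketch of a “lightweight EIN” following the points above (the field names are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field
from uuid import uuid4

# A minimal "lightweight" EIN: small payload, free "Type" label,
# and a GUID pointer to the heavy resource kept in the Data Lake.
@dataclass
class EIN:
    color: str         # "when" | "where" | "who" | "what" ("grey" = totipotent)
    ein_type: str      # free "Type" label instead of sub-class proliferation
    payload: str = ""  # minimum possible payload, to allow in-memory graphs
    resource_guid: str = field(default_factory=lambda: str(uuid4()))

alice = EIN(color="who", ein_type="Person", payload="Alice")
print(len(alice.resource_guid))  # 36: a GUID, not the heavy resource itself
```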

So, for example, suppose we want to represent this assertion (Assertion A in the following):

Person A, whose name is Alice, is a friend of Person B, whose name is Bob; both were born in 1990 and they live in Rome.

Using RDF triples (in RDF/XML serialization) we would have:

<rdf:RDF xmlns="http://localhost/ein#"
     xml:base="http://localhost/ein"
     xmlns:ein="http://localhost/ein#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:eingraph="ein#">

    <!--
    ///////////////////////////////////////////////////////////////////////////////////////
    // Individuals
    ///////////////////////////////////////////////////////////////////////////////////////
    -->

    <!-- http://localhost/ein#Alice -->
    <owl:NamedIndividual rdf:about="http://localhost/ein#Alice">
        <rdf:type rdf:resource="http://localhost/ein#who"/>
        <People_related rdf:resource="http://localhost/ein#Bob"/>
        <When_someone_deal_with_time rdf:resource="http://localhost/ein#1990"/>
        <Where_someone_stay_or_go_or_come_from rdf:resource="http://localhost/ein#Rome"/>
    </owl:NamedIndividual>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://localhost/ein#Alice"/>
        <owl:annotatedProperty rdf:resource="http://localhost/ein#People_related"/>
        <owl:annotatedTarget rdf:resource="http://localhost/ein#Bob"/>
        <rdfs:isDefinedBy>GUID_RelationType (is friend of)</rdfs:isDefinedBy>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://localhost/ein#Alice"/>
        <owl:annotatedProperty rdf:resource="http://localhost/ein#When_someone_deal_with_time"/>
        <owl:annotatedTarget rdf:resource="http://localhost/ein#1990"/>
        <rdfs:isDefinedBy>GUID_RelationType (is born)</rdfs:isDefinedBy>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://localhost/ein#Alice"/>
        <owl:annotatedProperty rdf:resource="http://localhost/ein#Where_someone_stay_or_go_or_come_from"/>
        <owl:annotatedTarget rdf:resource="http://localhost/ein#Rome"/>
        <rdfs:isDefinedBy>GUID_RelationType (stay in)</rdfs:isDefinedBy>
    </owl:Axiom>

    <!-- http://localhost/ein#Bob -->
    <owl:NamedIndividual rdf:about="http://localhost/ein#Bob">
        <rdf:type rdf:resource="http://localhost/ein#who"/>
        <When_someone_deal_with_time rdf:resource="http://localhost/ein#1990"/>
        <Where_someone_stay_or_go_or_come_from rdf:resource="http://localhost/ein#Rome"/>
    </owl:NamedIndividual>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://localhost/ein#Bob"/>
        <owl:annotatedProperty rdf:resource="http://localhost/ein#When_someone_deal_with_time"/>
        <owl:annotatedTarget rdf:resource="http://localhost/ein#1990"/>
        <rdfs:isDefinedBy>GUID_RelationType (is born)</rdfs:isDefinedBy>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://localhost/ein#Bob"/>
        <owl:annotatedProperty rdf:resource="http://localhost/ein#Where_someone_stay_or_go_or_come_from"/>
        <owl:annotatedTarget rdf:resource="http://localhost/ein#Rome"/>
        <rdfs:isDefinedBy>GUID_RelationType (stay in)</rdfs:isDefinedBy>
    </owl:Axiom>

    <!-- http://localhost/ein#Rome -->
    <owl:NamedIndividual rdf:about="http://localhost/ein#Rome">
        <rdf:type rdf:resource="http://localhost/ein#where"/>
        <location rdf:datatype="http://localhost/ein#GIS_PolyLine">Location coordinate</location>
    </owl:NamedIndividual>

    <!-- http://localhost/ein#1990 -->
    <owl:NamedIndividual rdf:about="http://localhost/ein#1990">
        <rdf:type rdf:resource="http://localhost/ein#when"/>
        <when_microstructure rdf:datatype="http://www.w3.org/2000/01/rdf-schema#Literal">Sample XML time microstructure</when_microstructure>
    </owl:NamedIndividual>
    <rdf:Description>
        <rdfs:comment>Type of relation</rdfs:comment>
    </rdf:Description>
</rdf:RDF>

Not that simple, indeed…

Considering the nature of the RDF representation (i.e., triples in the form subject → predicate → object), to represent the simple Assertion A we need (at least) 9 triples:

  1. DateRepresentation1 -> HasValue -> 1990
  2. Place1 -> HasValue -> Rome_Coordinates
  3. Person A -> HasName -> Alice
  4. Person A -> IsBorn -> DateRepresentation1
  5. Person A -> LiveIn -> Place1
  6. Person B -> HasName -> Bob
  7. Person B -> IsBorn -> DateRepresentation1
  8. Person B -> LiveIn -> Place1
  9. Person A -> IsFriendOf -> Person B

Not only that: observe that the predicates we used are specific, and not generic/typed as in Figure 1; this is correct in RDF knowledge representation because RDF assumes, as a basic principle, that you “know the exact ontology” underlying the knowledge graph.

As a consequence, the number of triples will increase even more, because we need further triples to clarify each predicate’s type.

There is, actually, no hard constraint to have an exact ontology in order to use RDF; you can write whatever you want in a triple, but in that case the RDF store becomes a data mud/swamp without any use.

This observation, i.e., the need for a deep ontology as a logical prerequisite for RDF usage, opens a “religious struggle” between RDF and other ways to store a knowledge graph. I will discuss my position in §4.3.

So we have an EIN basic ontology that is really simple and generic/abstract, BUT this seems to be in conflict with the mainstream way to represent knowledge: the RDF triple store.

4.2> Occam’s razor -> why not use Multimodel DB?

Aside from the theory of knowledge representation, there are some “little” technical and physical aspects to consider:

  • Where do we put this data?
  • How do we manage and index it?
  • How can we build a complete query (i.e., one considering not only the relationships between elements but also the properties of elements and of relations)?

There are essentially two data management methods that can be used, but let me explain with a side-by-side sample:

Table 1: Specialized DB Vs. Multimodel DB

In the end, I can express what derives from the above comparison as follows:

if you need a very specific data/knowledge approach, use a mix of specific DBs/Data Stores; but if you do not know exactly how the system will evolve, then use the Swiss Army knife, i.e., the Multimodel DB. In any case, avoid thinking that you can use a specific DB/Data Store to mimic other data models. Data are powerful but harsh…

4.3> The eternal fight: RDF Vs. Property Graph

Now that we have, from my POV, defined both the basic ontology (see §4.1) and the technical instruments to operate it (see §4.2), there remains one last Hamletic question: RDF or LPG (Labeled Property Graph)?

There are plenty of articles, publications and blogs that discuss this question.

The general portrait that emerges can be summarized in the following points:

  • RDF is “the” mature and formally faultless approach to knowledge graphs, with several decades of history.
  • Yet it never became widely diffused due to some intrinsic breaking factors:

  1. RDF, used according to its machine-reasoning capability and not as an uncontrolled triple store, assumes a deep reference ontology.
  2. If this reference ontology does not exist, RDF (and its reasoners) is futile, not useful.
  3. There is no room for “labeling” (observe how RDF* suggests a sort of “hybridization” with LPG), and that compels you to represent everything with a huge number of triples (2 or 3 orders of magnitude more than the pure information topology), making even simple things complex.

  • RDF represents a choice of field, i.e., everything you represent needs to be in triple-store format, and this is a huge problem because you need to “transform”, or better RDFize (yes, there are pieces of code called RDFizers), all the classical documental, cataloguing, archival knowledge corpus under the specific ontology you apply. So, there is no room for other data model types. Not relational, not document-based, nothing… all triples.

In one word, RDF (the true nature of RDF) is indeed extremely rigid, not intuitive, not self-evident for humans. It is perfect in academic arenas but not in the real, imperfect and incomplete world.

In fact, in recent years, more or less all the “new” graph DB producers (and indeed also the Multimodel DB ones) have moved to a completely different approach: the Labeled Property Graph, or LPG.

The LPG approach uses an intuitive representation in which an Entity, a “Labeled Node with some Properties” (sometimes called a VERTEX), relates to other Entities through “Labeled Connections with some Properties” (sometimes called EDGES).

The natural idea of a graph, that’s all.

And the ontology? Has it vaporized?

No, it “emerges” from LPG analysis, because there is an entire branch of mathematics (Graph Theory) that gives us instruments to analyze graphs.

So we can discover clusters, typical paths between vertices, topological patterns and so on; all things that indirectly imply an ontology.

Think about the elegance of this approach: we do not impose an order on the knowledge, we discover the order inside the knowledge. From a philosophical POV I think this is the correct way to see things, because every time someone has attempted to harness human creativity, it has failed.

And, more importantly, the fact that Vertices and Edges can be LABELED and can possess PROPERTIES lets us (in a Multimodel DB) connect the Graph Data Model with the Relational Model and/or the Document-Based Model (for example).
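A toy Python sketch of this idea (all names and URI schemes are invented for illustration): labeled, property-carrying vertices and edges, where a property acts as a pointer into another data model.

```python
# A toy Labeled Property Graph: vertices and edges carry a label plus
# arbitrary properties; "doc_id"/"geo_id" act as pointers into other
# data models (document store, GIS store) of a Multimodel DB.
vertices = {
    "alice": {"label": "Person", "props": {"name": "Alice", "doc_id": "docstore://cv/alice"}},
    "rome":  {"label": "City",   "props": {"name": "Rome",  "geo_id": "gis://polyline/rome"}},
}
edges = [
    {"src": "alice", "dst": "rome", "label": "LIVES_IN", "props": {"since": 1990}},
]

# Topology query on the graph side: where does Alice live?
home = next(e["dst"] for e in edges if e["src"] == "alice" and e["label"] == "LIVES_IN")

# Follow the pointer-property into the geolocation data model
print(vertices[home]["props"]["geo_id"])  # gis://polyline/rome
```

The graph only “catches the topology”; the heavy payloads stay in the data models the properties point to.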

Now I am sure you are howling: “But what about performance? RDF is a champion!”.

That’s true, also with Oracle RDF (see: https://www.oracle.com/a/tech/docs/rdfgraph-1-trillion-benchmark.pdf), BUT it is also true for LPG, again with Oracle (see: https://www.oracle.com/a/tech/docs/ldbc-graph-benchmark-2020-06-30-neo-only-v3.1.pdf).

Consider, on top of that, that LPG requires 1 or 2 orders of magnitude fewer elements in the graph representation than RDF, and this ratio becomes more favorable to LPG the more properties you need to manage.

In other words, LPG, in a Multimodel DB in which you can use properties as “pointers” to other integrated data models, can be more “compact” in terms of “bytes on disk”, because its primary job is to “catch the topology” between pieces of information that live in “other data models”.

Furthermore, those pieces of information can be all the classical documental, cataloguing, archival knowledge corpus. So there is no need for a mythical LPGizer (remember the RDFizer?).

Simple!!

4.4> Final Considerations about technological choices

In the previous chapters we understood 3 main things:

  1. EINGRAPH indeed uses a “Labeled Basic Ultra-General Ontology” that can be progressively adapted using node and relation types. This choice enables knowledge modeling with EINGRAPH also in situations in which a complete, formal, and deep ontology isn’t available. EINGRAPH is also useful when there are time constraints; consider that building a complete ontology “upfront of data usage” is complex and extremely time consuming.
  2. Due to the Labeled nature of EINGRAPH’s ontology, the RDF representation is not the best choice. Using an LPG Knowledge Graph model, preferably with properties as “pointers” to other data model structures, gives us much better efficiency, and the real ontology (the real knowledge) will emerge from the LPG. In one statement: “EINGRAPH fits well with an LPG of pointers”, i.e., the LPG deals with the knowledge topology and other data models/structures deal with the other knowledge aspects. Let’s say that we enable a “data models team collaboration”.
  3. Due to the “collaborative and multimodel” approach inherent to the EINGRAPH LPG, it is natural to use a “Multimodel DB”, avoiding all the integration, aggregation and reconciliation problems that would arise from a forest of different punctual and “locally optimal” technologies. It is better to win the war than single battles, don’t you think?

5> Q&A classical and with collaborative AI approaches

Now that we have (or could have) a Multimodel DB loaded with the LPG-based EINGRAPH plus other knowledge stuff in a collaborative data model scheme, let’s call this whole bunch of information the “Corpus”.

Now we face the problem of making effective inquiries on it. Consider, again, that our corpus is multimodel in the sense that it contains:

  • Knowledge topology in the LPG, with labeled Vertices and Edges,
  • Text, images, their embeddings and their metadata in parallel data structures linked to the LPG,
  • Geolocation information linked to the LPG,
  • Time information connected to LPG elements,
  • Levels of confidence on Vertices and Edges, if they derive from AI-based elaborations over standard data.

The beautiful side of the matter is that, using a Multimodel DB, all those searches can be collapsed into a single, extended-syntax, SQL-like predicate. This simplifies the application’s inquiry building enormously, from a certain POV; yet the resulting single SQL-like search predicate is “more, much more” complex than a simple professional one. This is the bad side of the matter.
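To make the “collapsed into a single predicate” idea concrete, here is a hedged Python sketch (corpus rows, field names and thresholds are all invented): one filter mixing topology, time, AI-confidence and text conditions that would otherwise need four separate engines.

```python
# Toy corpus rows: each vertex carries its graph label plus fields
# that, in a real Multimodel DB, would live in sibling data models.
corpus = [
    {"id": "v1", "label": "Event", "year": 1990, "confidence": 0.95, "text_hit": True},
    {"id": "v2", "label": "Event", "year": 1975, "confidence": 0.99, "text_hit": True},
    {"id": "v3", "label": "Event", "year": 1992, "confidence": 0.60, "text_hit": False},
]

# The "single predicate": graph label + time window + AI confidence + text
# match, expressed as one filter instead of four separate queries.
hits = [v["id"] for v in corpus
        if v["label"] == "Event"
        and 1985 <= v["year"] <= 1995
        and v["confidence"] >= 0.8
        and v["text_hit"]]
print(hits)  # ['v1']
```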

Moreover, we must remember what we said in §3 about the formal languages we could use to inquire the corpus, and this aspect depicts the theoretical and unavoidable complexity of interacting with users to build an effective inquiry. Indeed, we are speaking neither of a “recommendation system”, nor of a “faceted search”, nor of a “geolocated/time-located search”, nor, finally, of a “text search”. We are speaking of a complete and free “knowledge inquiry”; which is the same as saying that we “ask a question to the corpus”, possibly in natural language. But how?

To solve the problem, we slice the elephant, as usual:

  • §5.1 How a Classical Q&A system “normally” works
  • §5.2 Classical approach weaknesses with EINGRAPH
  • §5.3 The EINGRAPH Q&A collaborative approach
  • §5.4 Ethical giveback in the EINGRAPH Q&A approach

Let’s start.

5.1> How a Classical Q&A system “normally” works

A classical Q&A, beware, NOT a chatbot but a real Natural Language Q&A system, in general works according to the following high-level process:

Figure 2: Classic Q&A system

In detail we have:

Table 2: Macro Steps of classical Q&A system

NOTE A: all the techniques I briefly resumed above are the object of uncountable publications and are in continuous evolution. If you want to go deeper, here are 5 references that can be useful:

  1. Question Answering over Linked Data (QALD-5) → https://ceur-ws.org/Vol-1391/173-CR.pdf
  2. SPARQL Template-Based Question Answering → https://www.researchgate.net/publication/240615221
  3. HAWK – Hybrid Question Answering using Linked Data → https://www.researchgate.net/publication/300884120_HAWK_-_Hybrid_Question_Answering_Using_Linked_Data
  4. A Universal Question-Answering Platform for Knowledge Graphs → https://arxiv.org/abs/2303.00595
  5. Leveraging Abstract Meaning Representation for Knowledge Base Question Answering → https://arxiv.org/abs/2012.01707

5.2> Classical approach weaknesses

There are many weaknesses in this approach, which is “academically perfect” but… Let’s see:

Table 3: Classical Q&A Weaknesses

From my POV, there is enough to say that with this approach we are in a “Kobayashi Maru” style scenario (see Star Trek). So we need to change the “scenario conditions”.

5.3> The EINGRAPH Q&A (EINQA) collaborative approach

In the classical Q&A approach the “human” is relegated to the asking-questions and reading-results phases; did you notice that?

But what if “humans” took care of search and query creativity?

What if “humans” took care of “what, how, why” a certain “search template” is created and used?

We could obtain an EINQA based on the following process:

Figure 3: EINQA with standard AI approach

Let’s analyze in detail the process:

Table 4: EINQA process steps

Note B: it will be necessary to provide for the creation of a “QDP library management console”. In fact, questions related to QDP usage authorization, or to the QDP certification and deployment process, could also be involved, for example. In any case, we are speaking about a relatively simple and completely classic application.

Note C: there are plenty of algorithms that could be used. In general, the embeddings need to be as “fine” as possible (so use looong vectors) and language-related customization is important. This process step is the key point (and luckily the only one in which “probability” acts) for all of EINQA.
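The matching step of Note C can be sketched as a cosine-similarity ranking of the question’s embedding against the QDP library. This is a minimal, self-contained sketch: the QDP names are invented and the 3-dimensional vectors stand in for the long, language-tuned embeddings a real system would use.

```python
import math

# Toy QDP library: template name -> embedding (invented values)
qdp_library = {
    "find_person_by_city": [0.9, 0.1, 0.0],
    "find_events_by_year": [0.1, 0.9, 0.2],
}
question_vec = [0.8, 0.2, 0.1]  # embedding of the user's natural-language question

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank QDPs by similarity to the question: the only probabilistic step of EINQA
ranked = sorted(qdp_library, key=lambda k: cosine(question_vec, qdp_library[k]), reverse=True)
print(ranked[0])  # find_person_by_city
```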

This is only the basic process.

Some “evolutions” are possible:

  • Instead of the pattern “single-shot question → possible QDP list”, a pattern like “chat → possible QDP list” would be possible and, in that case, the chat could be governed by an LLM tuned to guide the end user in the QDP choice,
  • It would also be possible to “extract” concrete parameter values from the question, or better from the chat, avoiding the step “Select QDP and input search parameters”.

In any case, I suppose the basic principle is clear:

domain experts decide, certify and document what to search, how to search and where to search, thereby instilling human creativity and consciousness into the system, while, in the meantime, AI helps inexperienced end users query the knowledge corpus using their own language.

5.4> Ethical Giveback in EINGRAPH Q&A approach

Aside from the evident good probabilistic effect of the EINQA approach (if EINQA’s AI segment has 80% performance, we have 80% overall performance…), the biggest giveback of this approach is from the ethical perspective.
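The probabilistic point can be checked with a quick back-of-envelope comparison (the step accuracies are invented for illustration): a classical pipeline multiplies the accuracy of every probabilistic step, while EINQA has a single one.

```python
# Classical Q&A: several chained probabilistic steps (accuracies invented)
classical_steps = [0.9, 0.9, 0.9, 0.9]  # e.g. NER, entity linking, template choice, ranking
classical = 1.0
for p in classical_steps:
    classical *= p  # errors compound multiplicatively across the chain

# EINQA: the only probabilistic step is the question -> QDP matching
einqa = 0.8  # 80% matching performance => 80% overall performance

print(round(classical, 3), einqa)  # 0.656 0.8
```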

Let’s see how the AI ethics pillars are affected:

Table 5: EINQA ethic giveback

On top of that, there is also a sociological effect: “the EINQA approach is, by design, respectful of the work, professionalism and creativity of domain experts”.

EINQA does not remove “human labor”; it “valorizes human labor”.

6> Conclusions

This article completes the theoretical journey I started in “Knowledge as Elementary Information Node Graph A.k.a. the EINGRAPH”.

We saw how, avoiding mainstream lines of thought, we can conjugate Ethics and Technology, abstract methods and real life in projects, industry and society.

In the end, the EINGRAPH/EINQA final target is to be a powerful system that augments human intelligence and knowledge and NEVER substitutes for them.
