Structured Query Interface to a Knowledge Graph
http://www.stanford.edu/~vinayc/intelligent-life/

Structured Query Interface to a Knowledge Graph

A structured query interface can be an important ingredient to interacting with a knowledge graph. In such an interface, the user starts typing expressions, with system suggesting completions in a way that the resulting expression can be mapped into an underlying query language such as Cypher or SPARQL. To illustrate structured queries, consider the following snippet of a knowledge graph schema from Wikidata.

Country schema snippet

For the instance data corresponding to the above schema, we can pose the following queries.

city with largest area
top five cities by area
countries whose capitals have area at least 500 squared kilometers
states bordering Oregon and Washington

second tallest mountain in France
country with the most number of rivers

One way to specify a structured query interface is to specify rules of grammar in the Backus Naur Form (BNF). The rules shown below illustrate this approach for the set of examples considered above.

<np> ::= <noun> "and" <noun>
<np> ::= <geographical-region> |
        <geographical-region> <spatial-relation> <geographical-region>
<geographical-region> ::= "capital of country" | "city" | "country" |
                         "mountain" | "river" | "state"
<property> ::= "area" | "height"
<aggregate-relator> ::= "with the most number of" | "with largest" |"with"
<aggregate-modifier> ::= "top" <number> | "second tallest"
<spatial-relation> ::= "bordering" | "inside"
<number-constaint> ::= "atleast" <quantity>
<quantity> ::= <number> <unit>
<unit> ::= "Square Kilometer"
<ranking> ::= "by"

We can reduce the queries above to expressions that conform to the grammar above. For example, the first query, "city with the largest area", conforms to the following expression:

<geographical-region> <aggregate-relator> <property>

Next consider the query "top five cities by area". The following expression that captures this query is equivalent to "top five city by area". For our structured query interface to be faithful to original English, we will need to incorporate the lexical knowledge about plurals.

<aggregate-modifier> <geographical-region> <ranking> <property>

Finally, we show below the expression for the third query: "countries whose capitals have area at least 500 squared kilometers". The expression shown below has the following version in English: "country with capital with area at least 500 squared kilometers". In addition to the incorrect pluralization, it uses different words, for example, "with" instead of "whose", and "with" instead of "have". Most reasonable speakers of English would consider both the queries to be equivalent. This example highlights the tradeoffs in designing structured query interfaces: they may not be faithful to all the different ways of posing the query in English, but they can handle a large number of practical useful cases. For example, the above grammar will correctly handle the query "state with largest area", and a numerous other variations.

<geographical-region> <aggregate-relator> <geographical-region>   
       <aggregate-relator> <property> <number-constraint> <quantity>

Once we have developed a BNF grammar for a structured query interface that specifies the range of queries of interest, it is straightforward to check whether an input query is legal, and also to generate a set of legal queries which could be suggested to the user proactively to autocomplete what they have already typed. We show below a logic programming translation of the above BNF grammar.

np(X) :- np(Y) & np(Z) & append(X,Y,Z)
np(X) :- geographical_region(X)
np(A) :- geographical_region(X) & spatial_relation(Y) & geographical_region(Z) & append(A,X,Y,Z)
geographical_region(capital)
geographical_region(city)
geographical_region(country)
geographical_region(mountain)
geographical_region(river)
geographical_region(state)
property(area)
property(height)
aggregate_relator(with_the_most_number_of)
aggregate_relator(with_largest)
aggregate_relator(with)
aggregate_modifier(second_tallest)
aggregate_modifier(X) :- number(N) & append(X,top,N)
spatial_relation(bordering)
aggregate_relation(insider)
number_constraint(X) :- quantity(Q) & append(X,atleast,Q)
quantity(Q) :- number(N) & unit(U) & append(Q,N,U)
unit(square_kilomters)
ranking(by)

Improvements in structured queries accepted by the system require improving the grammar. This approach can be very cost-effective for situations where the entities of interest and their desired relationships can be easily captured in a grammar. In the next section, we consider approaches that aim to accept queries in English, and try to overcome the problem of required engineering of the grammar through a machine learning approach.

To view or add a comment, sign in

More articles by Vinay K. Chaudhri

  • Natural Language Interfaces to a Knowledge Graph

    In a previous article, I talked about structured query interfaces for a knowledge graph. People tend to reject the…

  • General Principles of Visualization Design

    In a previous article, I argued that showing lots of nodes and edges on screen is not a compelling ways to understand a…

    2 Comments
  • How do users interact with a Knowledge Graph?

    The purpose of a knowledge graph is to answer a user's questions. Some of the questions may be known upfront, while…

    4 Comments
  • Knowledge Graphs and Human-Level AI

    Early work in AI focused on explicit representation of knowledge and initiated the field of knowledge graphs through…

  • Knowledge Graphs as a Test-Bed for Current Generation AI Algorithms

    Knowledge graphs have a two way relationship with the current generation AI algorithms. On one hand, knowledge graphs…

    1 Comment
  • What is Graph Data Science?

    Graph data science is an emerging discipline that aims to derive knowledge by leveraging structure in the data…

  • Knowledge Graph as a Representation of AI Output

    In this note, I consider how knowledge graphs are being used as a target output representation for natural language…

    1 Comment
  • A Dozen Long-Term Systems Research Problems

    In 2003, Journal of the Association of Computing Machinery invited several Turing Award Winners to write their vision…

    2 Comments
  • What should be next for Hybrid Natural Language Processing?

    Dr. Gomez-Perez has just released his book titled A Practical Guide to Hybrid Natural Language Processing, and he…

  • What is a Knowledge Graph?

    Knowledge graphs have emerged as a compelling abstraction for organizing world's structured knowledge over the…

    1 Comment

Others also viewed

Explore content categories