Structured Query Interface to a Knowledge Graph
A structured query interface can be an important ingredient to interacting with a knowledge graph. In such an interface, the user starts typing expressions, with system suggesting completions in a way that the resulting expression can be mapped into an underlying query language such as Cypher or SPARQL. To illustrate structured queries, consider the following snippet of a knowledge graph schema from Wikidata.
For the instance data corresponding to the above schema, we can pose the following queries.
city with largest area top five cities by area countries whose capitals have area at least 500 squared kilometers states bordering Oregon and Washington
second tallest mountain in France country with the most number of rivers
One way to specify a structured query interface is to specify rules of grammar in the Backus Naur Form (BNF). The rules shown below illustrate this approach for the set of examples considered above.
<np> ::= <noun> "and" <noun> <np> ::= <geographical-region> | <geographical-region> <spatial-relation> <geographical-region> <geographical-region> ::= "capital of country" | "city" | "country" | "mountain" | "river" | "state" <property> ::= "area" | "height" <aggregate-relator> ::= "with the most number of" | "with largest" |"with" <aggregate-modifier> ::= "top" <number> | "second tallest" <spatial-relation> ::= "bordering" | "inside" <number-constaint> ::= "atleast" <quantity> <quantity> ::= <number> <unit>
<unit> ::= "Square Kilometer" <ranking> ::= "by"
We can reduce the queries above to expressions that conform to the grammar above. For example, the first query, "city with the largest area", conforms to the following expression:
<geographical-region> <aggregate-relator> <property>
Next consider the query "top five cities by area". The following expression that captures this query is equivalent to "top five city by area". For our structured query interface to be faithful to original English, we will need to incorporate the lexical knowledge about plurals.
<aggregate-modifier> <geographical-region> <ranking> <property>
Finally, we show below the expression for the third query: "countries whose capitals have area at least 500 squared kilometers". The expression shown below has the following version in English: "country with capital with area at least 500 squared kilometers". In addition to the incorrect pluralization, it uses different words, for example, "with" instead of "whose", and "with" instead of "have". Most reasonable speakers of English would consider both the queries to be equivalent. This example highlights the tradeoffs in designing structured query interfaces: they may not be faithful to all the different ways of posing the query in English, but they can handle a large number of practical useful cases. For example, the above grammar will correctly handle the query "state with largest area", and a numerous other variations.
<geographical-region> <aggregate-relator> <geographical-region>
<aggregate-relator> <property> <number-constraint> <quantity>
Once we have developed a BNF grammar for a structured query interface that specifies the range of queries of interest, it is straightforward to check whether an input query is legal, and also to generate a set of legal queries which could be suggested to the user proactively to autocomplete what they have already typed. We show below a logic programming translation of the above BNF grammar.
np(X) :- np(Y) & np(Z) & append(X,Y,Z) np(X) :- geographical_region(X) np(A) :- geographical_region(X) & spatial_relation(Y) & geographical_region(Z) & append(A,X,Y,Z) geographical_region(capital) geographical_region(city) geographical_region(country) geographical_region(mountain) geographical_region(river) geographical_region(state) property(area) property(height) aggregate_relator(with_the_most_number_of) aggregate_relator(with_largest) aggregate_relator(with) aggregate_modifier(second_tallest) aggregate_modifier(X) :- number(N) & append(X,top,N) spatial_relation(bordering) aggregate_relation(insider) number_constraint(X) :- quantity(Q) & append(X,atleast,Q) quantity(Q) :- number(N) & unit(U) & append(Q,N,U)
unit(square_kilomters) ranking(by)
Improvements in structured queries accepted by the system require improving the grammar. This approach can be very cost-effective for situations where the entities of interest and their desired relationships can be easily captured in a grammar. In the next section, we consider approaches that aim to accept queries in English, and try to overcome the problem of required engineering of the grammar through a machine learning approach.