Entity-attribute queries in Web search

Prereq.: Research papers: foundation of industry-academic...

May be relevant: Commerce queries in natural language

Approaches that interpret free-format queries into a more structured representation segment or partition the query tokens by purpose (references to types, entities, attribute names, attribute values, relations) and then launch the interpreted query on structured knowledge bases. [1] proposes two new, natural formulations that exploit the bidirectional flow of information between the knowledge base and the corpus. One, inspired by probabilistic language models, computes expected response scores over the uncertainties of query interpretation. The other is based on max-margin discriminative learning, with latent variables representing those uncertainties. In the context of typed entity search, both formulations bridge a considerable part of the accuracy gap between a generic query that does not constrain the type at all and the upper bound where the "perfect" target entity type of each query is provided by humans. Both formulations are superior to a two-stage approach that first chooses a target type using recent query type prediction techniques and then launches a type-restricted entity search query.
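To make the contrast with the two-stage approach concrete, here is a minimal sketch of the "expected response score" idea. It is not the model from [1]: the function names, the type posterior, and the per-type scores are all hypothetical, and the scores are assumed to be given rather than learned.

```python
from collections import defaultdict


def expected_response_scores(type_posterior, typed_scores):
    """Rank entities by the expectation of their score over candidate target types.

    type_posterior: {type: P(type | query)}            (assumed given)
    typed_scores:   {type: {entity: score under type}} (assumed given)
    """
    totals = defaultdict(float)
    for target_type, p_type in type_posterior.items():
        for entity, score in typed_scores.get(target_type, {}).items():
            totals[entity] += p_type * score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def two_stage_scores(type_posterior, typed_scores):
    """Baseline: commit to the single most likely type, then rank within it."""
    best_type = max(type_posterior, key=type_posterior.get)
    candidates = typed_scores.get(best_type, {})
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers for one ambiguous query.
type_posterior = {"type_a": 0.55, "type_b": 0.45}
typed_scores = {
    "type_a": {"entity_1": 0.30, "entity_2": 0.20},
    "type_b": {"entity_3": 0.90},
}

joint_ranking = expected_response_scores(type_posterior, typed_scores)
baseline_ranking = two_stage_scores(type_posterior, typed_scores)
print(joint_ranking[0][0], baseline_ranking[0][0])
```

In this toy case the baseline commits to type_a and can never return entity_3, while the expectation keeps the slightly less likely type in play and ranks entity_3 first.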

Much recent work focuses on formal interpretation of natural question utterances, with the goal of executing structured queries on knowledge graphs (KGs) such as Freebase. [2] addresses two limitations of this approach when applied to open-domain, entity-oriented Web queries. First, Web queries are rarely well-formed questions. They are telegraphic, with missing verbs, prepositions, clauses, case and phrase clues. Second, the KG is always incomplete and cannot directly answer many queries. A novel technique is proposed to segment a telegraphic query and assign a coarse-grained purpose to each segment: a base entity e1, a relation type r, a target entity type t2, and contextual words s. The query seeks an entity e2 ∈ t2 for which r(e1, e2) holds, further evidenced by the schema-agnostic words s. Query segmentation is integrated with the KG and with an unstructured corpus in which mentions of entities have been linked to the KG. Instead of committing to the best or any single query segmentation, evidence in favor of each candidate e2 is aggregated across several segmentations.
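The aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the inference procedure of [2]: the Segmentation class, its weight field, the entity names, and the evidence table are hypothetical stand-ins for scores that would come from the KG and the entity-linked corpus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    e1: str        # base entity
    r: str         # relation type
    t2: str        # target entity type
    s: tuple       # contextual (schema-agnostic) words
    weight: float  # plausibility of this segmentation (assumed given)


def aggregate_evidence(segmentations, evidence_fn):
    """Sum weighted evidence for each candidate e2 over all segmentations."""
    totals = defaultdict(float)
    for seg in segmentations:
        for e2, evidence in evidence_fn(seg).items():
            totals[e2] += seg.weight * evidence
    return dict(totals)


# Hypothetical segmentations of one telegraphic query.
seg_1 = Segmentation("base_entity_1", "relation_x", "type_p", ("first",), 0.6)
seg_2 = Segmentation("base_entity_2", "relation_y", "type_q", ("first",), 0.4)

# Hypothetical evidence; in practice this would combine KG and corpus signals.
EVIDENCE = {
    seg_1: {"cand_X": 0.5, "cand_Y": 0.4},
    seg_2: {"cand_Y": 0.9},
}

totals = aggregate_evidence([seg_1, seg_2], lambda seg: EVIDENCE[seg])
best_e2 = max(totals, key=totals.get)
```

Here the most plausible segmentation alone would prefer cand_X, but cand_Y is supported by both segmentations and wins once the evidence is pooled.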

Search engines increasingly rely on large knowledge bases of facts (e.g., about famous individuals, organizations or locations) to provide direct answers to users’ queries (e.g., Sarkozy’s wife). However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction assumes that facts are expressed with verb phrases, and has therefore had difficulty with noun-based relations. ReNoun [3] focuses on nominal attributes on the long tail (e.g., Hollande’s ex-girlfriend) and leverages a large ontology of noun attributes mined from a text corpus and user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that each fact mention an attribute in the ontology. It then generalizes from this seed set to produce a much larger set of extractions, which are then scored.
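As a rough illustration of the seed-generation step only, the sketch below applies one high-precision pattern and keeps a fact only if its attribute is in the ontology. The pattern, the tiny ontology, and the example sentences are hypothetical; the actual patterns used by ReNoun are not reproduced here.

```python
import re

# Hypothetical attribute ontology (lower-cased noun attributes).
ATTRIBUTE_ONTOLOGY = {"ceo", "wife", "ex-girlfriend"}

# One illustrative pattern: "The <attribute> of <Subject> is <Object>."
SEED_PATTERN = re.compile(
    r"[Tt]he (?P<attr>[\w-]+) of (?P<subj>[A-Z][\w ]*?) is (?P<obj>[A-Z][\w ]*?)\."
)


def extract_seed_fact(sentence):
    """Return (subject, attribute, object), or None if no seed fact is found."""
    match = SEED_PATTERN.search(sentence)
    if match is None:
        return None
    attribute = match.group("attr").lower()
    # Keep the fact only when the attribute is known to the ontology.
    if attribute not in ATTRIBUTE_ONTOLOGY:
        return None
    return (match.group("subj"), attribute, match.group("obj"))


accepted = extract_seed_fact("The CEO of Acme Corp is Jane Doe.")
rejected = extract_seed_fact("The color of Acme Corp is Blue.")
```

The ontology check is what keeps the seed set precise: the second sentence matches the surface pattern but is dropped because "color" is not a listed attribute.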

[4] exploits Web search queries to uncover the semantics of open-domain class labels in particular, and of compound noun phrases in general. A weakly-supervised method is applied to anonymized queries to extract lexical interpretations of compound noun phrases (e.g., "fortune 500 companies"). The interpretations turn the implicit properties or subsuming roles (listed in, from, made by) that modifiers (fortune 500, italian, victorinox) play within longer noun phrases (fortune 500 companies, italian composers, victorinox knives) into explicit strings. The roles of modifiers relative to the heads of noun phrase compounds cannot be characterized in terms of a finite list of possible compounding relationships. Hence, interpretations are not restricted to a closed, pre-defined set.

1. Learning Joint Query Interpretation and Response Ranking

2. Knowledge graph and corpus driven segmentation and answer inference for telegraphic entity-seeking queries

3. ReNoun: Fact extraction for nominal attributes

4. Interpreting compound noun phrases using web search queries

