Entity linking in commerce queries
Prereq.: Research papers: foundation of industry-academic...
May be relevant:
Product entity synonym discovery
Product mention recognition in social forums
Search queries carry a large amount of information that can be organized into a knowledge base for use in downstream applications (e.g., parsing, coreference resolution, and entity linking). [1] focuses on automatically identifying rare yet useful (e.g., for online advertising) brand and product entities from a large collection of Web queries in the online shopping domain. An unsupervised approach based on adaptor grammars is proposed that requires neither human annotation nor external resources (e.g., IMDB, DBpedia). To reduce noise and normalize query patterns, a standardization step groups multiple search patterns and word orderings together and rewrites each group into its most frequent form. Three different sets of grammar rules are presented to infer query structure and extract brand and product entities.
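The standardization step can be sketched as grouping queries that share the same bag of words and mapping each group to its most frequent surface form. This is a minimal illustration of the idea, not the paper's actual procedure (which also normalizes search patterns):

```python
from collections import Counter, defaultdict

def standardize_queries(queries):
    """Group queries sharing the same word set and rewrite each group
    to its most frequent ordering (a hypothetical sketch of the
    standardization idea, not [1]'s exact algorithm)."""
    groups = defaultdict(Counter)
    for q in queries:
        key = frozenset(q.lower().split())      # ignore word order
        groups[key][q.lower()] += 1
    # canonical form = most frequent surface form within each group
    canon = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return [canon[frozenset(q.lower().split())] for q in queries]

queries = ["nike shoes red", "red nike shoes", "nike shoes red", "ipad case"]
print(standardize_queries(queries))
# → ['nike shoes red', 'nike shoes red', 'nike shoes red', 'ipad case']
```

Collapsing word-order variants like this concentrates frequency mass on one pattern per query group, which is what makes the downstream grammar induction less noisy.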
Conventional automatic techniques use large corpora (e.g., news articles) to learn entity types (e.g., person, movie, or place). Such corpora, however, capture general knowledge about entities, which makes it difficult to satisfy users with specific, personalized needs. Query logs, which contain billions of entities, expose word patterns and click-through behavior not found in text corpora, and thus provide a complementary source for discovering entity types from user behavior. [2] tackles two challenges in this regard: (1) queries are short texts, so information related to entities is usually very sparse; and (2) search logs contain large amounts of irrelevant or noisy information. Query logs are first modeled as a bipartite graph connecting entities to their auxiliary information, such as contextual words and clicked URLs. A graph-based framework called ELP (Ensemble framework based on Label Propagation) is then proposed to learn both entity types and auxiliary signals; within ELP, two separate strategies are designed to address the sparsity and noise problems in query logs.
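The core mechanism, propagating type labels from a few seed entities across a bipartite graph of entities and auxiliary nodes, can be sketched as follows. The graph data, seed labels, and uniform edge weighting here are illustrative assumptions, not ELP's actual formulation:

```python
from collections import defaultdict

def propagate_labels(edges, seeds, iterations=10):
    """Toy label propagation over a bipartite graph.
    `edges` maps entity -> set of auxiliary nodes (context words / URLs);
    `seeds` maps a few entities to known type-label distributions.
    A simplified sketch; ELP's two ensemble strategies are not modeled."""
    adj = defaultdict(set)
    for ent, auxes in edges.items():
        for aux in auxes:
            adj[ent].add(aux)
            adj[aux].add(ent)
    labels = {n: dict(seeds.get(n, {})) for n in adj}
    for _ in range(iterations):
        new = {}
        for node in adj:
            scores = defaultdict(float)
            for nb in adj[node]:
                for lab, w in labels[nb].items():
                    scores[lab] += w / len(adj[nb])
            if node in seeds:                 # clamp seeds to known labels
                new[node] = dict(seeds[node])
            else:
                total = sum(scores.values()) or 1.0
                new[node] = {lab: w / total for lab, w in scores.items()}
        labels = new
    return labels

edges = {"iphone 7": {"buy", "case"}, "galaxy s8": {"buy", "case"},
         "harry potter": {"watch", "trailer"}, "frozen": {"watch", "trailer"}}
seeds = {"iphone 7": {"product": 1.0}, "harry potter": {"movie": 1.0}}
labels = propagate_labels(edges, seeds)
# "galaxy s8" inherits "product" via the shared context words "buy"/"case"
```

The bipartite structure is what lets sparse queries help each other: two entities never co-occurring in one query still exchange label mass through shared context words or clicked URLs.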
Identifying and disambiguating entity references in queries is a key enabler for semantic search. The challenges are the limited context a query provides, coupled with the time constraints of an online setting; both hamper the ability to understand the searcher's intent and return a relevant, focused response. Supervised methods are expected to yield high effectiveness but lower efficiency, while for unsupervised approaches it is the other way around. Entity linking typically relies on two kinds of features: (i) contextual similarity between a candidate entity and the text surrounding a mention, and (ii) the interdependence between all entity linking decisions in the text (extracted from the underlying KB). [3] strikes a balance between effectiveness and efficiency by employing supervised learning for entity ranking while tackling disambiguation with a simple unsupervised algorithm. Their experimental analysis shows that high-quality ranking results help disambiguation substantially, whereas entity interdependence contributes very little.
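The two-stage structure can be illustrated with a toy linker: a contextual-similarity score stands in for the learned ranker, and disambiguation is simply keeping the top-scoring candidate per mention. The entity names and descriptions are made up for illustration; [3]'s ranker is supervised and uses richer features:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_mentions(query_tokens, candidates):
    """Toy two-stage linker: score each candidate entity against the
    query context (standing in for a learned ranking model), then
    greedily keep the best candidate per mention (the simple
    unsupervised disambiguation step)."""
    result = {}
    for mention, ents in candidates.items():
        scored = [(cosine(query_tokens, desc), ent) for ent, desc in ents.items()]
        result[mention] = max(scored)[1]
    return result

query = ["new", "york", "times", "subscription"]
cands = {"new york": {
    "New York City": ["city", "usa", "manhattan"],
    "The New York Times": ["newspaper", "times", "subscription", "news"]}}
print(link_mentions(query, cands))  # picks "The New York Times"
```

The greedy per-mention step reflects the paper's finding as summarized above: once ranking is good, modeling interdependence between linking decisions adds little.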
Search engines are the closest available substitute for the world knowledge required to solve complex natural language understanding tasks. [4] piggybacks on such an engine to alleviate the noise and irregularities of query language (misspellings, unreliable tokenization, word order, and capitalization), placing queries in a larger context in which they are easier to make sense of. The key algorithmic idea is to first discover a candidate set of entities and then link those entities back to their mentions in the input query. This confines the possible concepts pertinent to the query to only those actually mentioned in it. Link-back is implemented as a collective disambiguation step based on a supervised ranking model that makes one joint prediction for the annotation of the complete query, directly optimizing the F1 measure. Both known features (e.g., semantic relatedness among entities, word embeddings) and several novel ones (e.g., an approximate distance between mentions and entities, which can handle spelling errors) are evaluated.
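An approximate mention–entity distance that tolerates spelling errors can be built from edit distance; the normalization below is a hypothetical form for illustration, and [4]'s exact feature definition may differ:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (single-row variant)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution
            prev = cur
    return dp[n]

def mention_entity_score(mention, entity_name):
    """Approximate-match feature: 1.0 for an exact match, decaying with
    edit distance, so a misspelled mention still scores close to 1."""
    d = edit_distance(mention.lower(), entity_name.lower())
    return 1.0 - d / max(len(mention), len(entity_name), 1)

print(mention_entity_score("iphon", "iphone"))  # close to 1 despite the typo
```

Exact string matching would score the misspelled mention zero; the soft feature lets the ranker still surface the right entity for noisy query text.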