Yeshwant (Yesh) Dattatreya

San Francisco Bay Area
11K followers 500+ connections

About

As a Senior Applied Science Manager at Amazon Robotics, I lead a team of talented…

Activity

Experience

  • Amazon Fulfillment Technologies & Robotics

    Santa Clara County, California, United States

  • Palo Alto

  • Atlanta, Georgia, United States

  • Sunnyvale, California, United States

  • San Francisco Bay Area

  • San Francisco Bay Area

  • Greater Atlanta Area

  • Greater Atlanta Area

  • Atlanta, GA

  • Greater Atlanta Area

  • Bangalore, India

Education

Licenses & Certifications

Publications

  • Bi-CAT: Improving Robustness of LLM-based Text Rankers to Conditional Distribution Shifts

    WWW '24

    Retrieval and ranking lie at the heart of several applications such as search, question answering, and recommendations. The use of large language models (LLMs) such as BERT in these applications has shown promising results in recent times. Recent work on text-based retrievers and rankers shows promising results by using a bi-encoder (BE) architecture with BERT-like LLMs for retrieval and a cross-attention transformer (CAT) architecture with BERT or other LLMs for ranking the retrieved results. Although the use of the CAT architecture for re-ranking improves ranking metrics, its robustness to data shifts is not guaranteed. In this work we analyze the robustness of CAT-based rankers. Specifically, we show that CAT rankers are sensitive to item distribution shifts conditioned on a query, which we refer to as conditional item distribution shift (CIDS). CIDS naturally occurs in large online search systems as the retrievers keep evolving, making it challenging to consistently train and evaluate rankers on the same item distribution. In this paper, we formally define CIDS and show that while CAT rankers are sensitive to it, BE models are far more robust. We propose a simple yet effective approach, referred to as BI-CAT, which augments CAT rankers with BE model outputs to significantly improve the robustness of CAT rankers without any drop in in-distribution performance. We conducted a series of experiments on two publicly available ranking datasets and one dataset from a large e-commerce store. Our results on datasets with CIDS demonstrate that the BI-CAT model significantly improves the robustness of CAT rankers by roughly 100-1000 bps in F1 without any reduction in in-distribution model performance.

    Other authors
    See publication
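The abstract does not spell out the exact BI-CAT formulation; a minimal sketch of the general idea, blending a robust bi-encoder similarity into a cross-attention ranker's score (the function name and the `alpha` mixing weight are hypothetical), might look like:

```python
import numpy as np

def bi_cat_score(query_vec, item_vec, cat_score, alpha=0.5):
    """Blend a bi-encoder (BE) cosine similarity with a cross-attention
    (CAT) ranker score. `bi_cat_score` and `alpha` are hypothetical; the
    paper's actual augmentation mechanism may differ."""
    cos = float(np.dot(query_vec, item_vec)
                / (np.linalg.norm(query_vec) * np.linalg.norm(item_vec)))
    # the BE term is robust to conditional item distribution shift,
    # the CAT term carries the in-distribution ranking quality
    return alpha * cos + (1 - alpha) * cat_score
```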
  • Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

    ICML '23

    While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data.

    In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices, including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images), we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) when labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) when ranking outlier data.

    Other authors
    See publication
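SimCLR-Rank itself is not detailed in the abstract; the vanilla InfoNCE objective that SimCLR-style pretraining builds on can be sketched in NumPy (a generic illustration under that assumption, not the paper's ranking-specific variant):

```python
import numpy as np

def info_nce_loss(z1, z2, temp=0.1):
    """Vanilla InfoNCE: row i of z1 should match row i of z2 and repel
    every other row. SimCLR-Rank's ranking-specific positives and
    negatives are not reproduced here."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temp                            # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(probs)).mean())         # positives on the diagonal
```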
  • On the Value of Behavioral Representations for Dense Retrieval

    We consider text retrieval within a dense representational space in real-world settings such as e-commerce search, where (a) document popularity and (b) the diversity of queries associated with a document have skewed distributions. Most of the contemporary dense retrieval literature has two shortcomings in these settings. (1) It learns an almost equal number of representations per document, agnostic to the fact that a few head documents are disproportionately more critical to achieving good retrieval performance. (2) It learns purely semantic document representations inferred from intrinsic document characteristics, which may not contain adequate information to determine the queries for which the document is relevant, especially when the document is short. We propose to overcome these limitations by augmenting the semantic document representations learned by bi-encoders with behavioral document representations learned by our proposed approach, MVG. To do so, MVG (1) determines how to divide the total budget for behavioral representations by drawing a connection to the Pitman-Yor process, and (2) simply clusters the queries related to a given document (based on user behavior) within the representational space learned by a base bi-encoder, and treats the cluster centers as its behavioral representations. Our central contribution is the finding that such a simple, intuitive, lightweight approach leads to substantial gains in key first-stage retrieval metrics while incurring only a marginal memory overhead. We establish this via extensive experiments over three large public datasets comparing several single-vector and multi-vector bi-encoders, a proprietary e-commerce search dataset compared against a production-quality bi-encoder, and an A/B test.

    Other authors
    See publication
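The clustering step described above can be sketched as a lightweight k-means over the query embeddings associated with one document; the cluster centers then serve as that document's behavioral representations (a hedged sketch: the function name is hypothetical, and the Pitman-Yor budget allocation from the paper is omitted):

```python
import numpy as np

def behavioral_representations(query_vecs, k, iters=10, seed=0):
    """Toy k-means over the embeddings of queries that engaged with one
    document; the k cluster centers act as its behavioral representations."""
    pts = np.asarray(query_vecs, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each query embedding to its nearest center
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = pts[assign == j]
            if len(members):                      # skip empty clusters
                centers[j] = members.mean(axis=0)
    return centers
```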
  • Massive Text Normalization via an Efficient Randomized Algorithm

    WWW, 2022

    Current machine learning techniques in NLP and data mining rely heavily on high-quality text sources. Nevertheless, real-world text datasets contain a significant amount of spelling errors and improperly punctuated variants, on which the performance of these models quickly deteriorates. Moreover, existing text normalization methods are prohibitively expensive to execute over web-scale datasets, can hardly process noisy texts from social networks, or require annotations to learn the corrections in a supervised manner. In this paper, we present Flan (Fast LSH Algorithm for Text Normalization), a scalable randomized algorithm to clean and canonicalize massive text data. Our approach suggests corrections based on the morphology of the words, where lexically similar words are considered the same with high probability. We efficiently handle the pairwise word-to-word comparisons via locality-sensitive hashing (LSH). We also propose a novel stabilization process to address the issue of hash collisions between dissimilar words, which is a consequence of the randomized nature of LSH and is exacerbated by the massive scale of real-world datasets. Compared with existing approaches, our method is more efficient, both asymptotically and in empirical evaluations, does not rely on feature engineering, and does not require any annotation. Our experimental results on real-world datasets demonstrate the efficiency and efficacy of Flan. Based on recent advances in densified MinHash, our approach requires much less computational time than baseline text normalization techniques on large-scale Twitter and Reddit datasets. In a human evaluation of normalization quality, Flan achieves 5% and 14% improvement over the baselines on the Reddit and Twitter datasets, respectively. Our method also improves performance on Twitter sentiment classification and on perturbed GLUE benchmark datasets, where we introduce random errors into the text.

    Other authors
    See publication
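The LSH idea behind Flan, bucketing lexically similar words via MinHash over character n-grams so that similar words collide with high probability, can be illustrated with a toy sketch (function names are hypothetical; the paper's densified MinHash and stabilization step are not reproduced):

```python
import hashlib

def _gram_hash(salt, gram):
    # deterministic 64-bit hash of (salt, gram)
    digest = hashlib.md5(f"{salt}:{gram}".encode()).hexdigest()
    return int(digest[:16], 16)

def minhash_signature(word, num_hashes=16, n=3):
    """MinHash over character n-grams: the more n-grams two words share,
    the more signature components they agree on (in expectation, the
    Jaccard similarity of their n-gram sets)."""
    grams = {word[i:i + n] for i in range(max(1, len(word) - n + 1))}
    return tuple(min(_gram_hash(s, g) for g in grams) for s in range(num_hashes))

def signature_agreement(sig_a, sig_b):
    # fraction of components on which two signatures agree
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```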
  • A Study of Context Dependencies in Multi-page Product Search

    CIKM '19

    In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be considered implicit feedback that indicates their preferences and can be used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually employed in such scenarios. However, these methods are designed for document retrieval, where relevance is the most important criterion. In contrast, product search engines need to retrieve items that are not only relevant but also satisfactory in terms of customers' preferences. Personalization based on users' purchase history has been shown to be effective in product search. However, this method captures users' long-term interests, which do not always align with their short-term interests, and it does not benefit customers with little or no purchase history. In this paper, we study RF techniques based on both long-term and short-term context dependencies in multi-page product search. We also propose an end-to-end context-aware embedding model which can capture both types of context. Our experimental results show that short-term context leads to much better performance than long-term and no context. Moreover, our proposed model is more effective than state-of-the-art word-based RF models.

    Other authors
    See publication
  • Leverage Implicit Feedback for Context-aware Product Search

    SIGIR 2019

    Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but also should satisfy customers’ preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer’s purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users’ hidden intents, which further act as the basis for re-ranking subsequent result pages for the query. Modeling user preference with implicit feedback has been studied extensively in recommendation tasks; however, there has been little research on modeling users’ short-term interest in product search. We study whether short-term context can help promote a user’s ideal item in the following result pages for a query. Furthermore, we propose an end-to-end context-aware embedding model which can capture long-term and short-term context dependencies. Our experimental results on datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance than long-term and no context. Our results also show that our proposed model is more effective than word-based context-aware models.

    Other authors
    See publication
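The short-term-context idea in the two papers above, using within-session clicks to re-rank subsequent result pages, can be sketched as a simple score blend (a hypothetical baseline under stated assumptions: the papers train an end-to-end embedding model rather than this linear mix, and `beta` is an assumed weight):

```python
import numpy as np

def rerank_with_clicks(item_vecs, base_scores, clicked_vecs, beta=0.5):
    """Re-rank next-page items by blending each item's base relevance
    score with its similarity to the centroid of the session's clicked
    items; returns item indices, best first."""
    base = np.asarray(base_scores, dtype=float)
    if len(clicked_vecs) == 0:
        return list(np.argsort(-base))            # no feedback yet
    intent = np.mean(np.asarray(clicked_vecs, dtype=float), axis=0)
    intent = intent / np.linalg.norm(intent)      # session intent vector
    items = np.asarray(item_vecs, dtype=float)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    blended = (1 - beta) * base + beta * (items @ intent)
    return list(np.argsort(-blended))
```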

Patents

  • Locality-Sensitive Hashing to Clean and Normalize Text Logs

    Issued US 11,244,156

    This patent relates to normalizing text through the use of LSH and graph structures to efficiently generate mappings and normalize the text in a corpus being analyzed. Multiple signatures are generated for each word in the input using a semi-random process, and a graph is generated based on the signatures. This graph is then evaluated to identify related words and generate mappings from each variant to the canonical form.

    Other inventors
    See patent
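The signature-then-graph pipeline described above can be sketched as follows: words that share an LSH signature are linked, and each connected component maps to its most frequent member as the canonical form (a toy union-find illustration; the `signatures` input is assumed to come from an upstream LSH pass, and all names are hypothetical):

```python
from collections import Counter, defaultdict

def canonicalize(words, signatures):
    """Map each word variant to the most frequent word in its
    connected component of the signature-sharing graph."""
    parent = {w: w for w in signatures}

    def find(w):                                  # union-find with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    buckets = defaultdict(list)                   # signature -> words
    for word, sigs in signatures.items():
        for sig in sigs:
            buckets[sig].append(word)
    for group in buckets.values():                # link words sharing a signature
        for word in group[1:]:
            parent[find(word)] = find(group[0])

    counts = Counter(words)
    components = defaultdict(list)
    for word in parent:
        components[find(word)].append(word)
    mapping = {}
    for comp in components.values():
        canonical = max(comp, key=lambda w: counts[w])
        for word in comp:
            mapping[word] = canonical
    return mapping
```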

Courses

  • Artificial Intelligence

    CS 6601

  • Artificial Intelligence for Robotics

    CS 7638

  • Computability, Complexity and Algorithms

    CS 6505

  • Computer Networking

    CS 6250

  • Data and Visual Analytics

    CSE 6242

  • Introduction to Information Security

    CS 6035

  • Machine Learning

    CS 7641

  • Machine Learning for Trading

    CS 7646

  • Network Security

    CS 6262

  • Reinforcement Learning

    CS 7642

Projects

Test Scores

  • GMAT

    Score: 730
