About
As a Senior Applied Science Manager at Amazon Robotics, I lead a team of talented…
Activity
-
I'm excited to be a part of this year's RSS Pioneers cohort! I'm grateful to my mentors and to the RSS community for this opportunity.
Liked by Yeshwant (Yesh) Dattatreya
-
Update: There is a lot of interest in this internship and I have a lot of applicants. To help me, please 1. Provide all three pieces of info I asked…
Posted by Yeshwant (Yesh) Dattatreya
-
I’m hiring an Applied Scientist II Intern for Fall 2026 at Amazon’s Industrial Robotics Group! 🤖 We are looking for someone to conduct research and…
Liked by Yeshwant (Yesh) Dattatreya
Experience
Education
Licenses & Certifications
Publications
-
Bi-CAT: Improving Robustness of LLM-based Text Rankers to Conditional Distribution Shifts
WWW '24
Retrieval and ranking lie at the heart of several applications such as search, question answering, and recommendations. The use of large language models (LLMs) such as BERT in these applications has shown promising results in recent times. Recent works on text-based retrievers and rankers achieve promising results by using a bi-encoder (BE) architecture with BERT-like LLMs for retrieval and a cross-attention transformer (CAT) architecture with BERT or other LLMs for ranking the retrieved results. Although the use of the CAT architecture for re-ranking improves ranking metrics, its robustness to data shifts is not guaranteed. In this work we analyze the robustness of CAT-based rankers. Specifically, we show that CAT rankers are sensitive to item distribution shifts conditioned on a query, which we refer to as conditional item distribution shift (CIDS). CIDS naturally occurs in large online search systems as the retrievers keep evolving, making it challenging to consistently train and evaluate rankers on the same item distribution. In this paper, we formally define CIDS and show that while CAT rankers are sensitive to it, BE models are far more robust to CIDS. We propose a simple yet effective approach referred to as BI-CAT, which augments BE model outputs with CAT rankers to significantly improve the robustness of CAT rankers without any drop in in-distribution performance. We conducted a series of experiments on two publicly available ranking datasets and one dataset from a large e-commerce store. Our results on datasets with CIDS demonstrate that the BI-CAT model significantly improves the robustness of CAT rankers by roughly 100-1000 bps in F1 without any reduction in in-distribution model performance.
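To make the augmentation idea concrete, here is a small hypothetical sketch of blending a bi-encoder similarity with a cross-attention ranker logit. The paper's exact fusion may differ; `be_encode`, `cat_score`, and the additive `bi_cat_score` below are illustrative stand-ins, with the two models stubbed so the snippet runs end to end.

```python
# Hypothetical BI-CAT-style score combination (not the authors' code).
import numpy as np

def be_encode(text: str, dim: int = 64) -> np.ndarray:
    """Stub bi-encoder: pseudo-embedding derived from the string's hash."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    v = local.normal(size=dim)
    return v / np.linalg.norm(v)

def cat_score(query: str, item: str) -> float:
    """Stub cross-attention ranker logit for a (query, item) pair."""
    local = np.random.default_rng(abs(hash((query, item))) % (2**32))
    return float(local.normal())

def bi_cat_score(query: str, item: str, alpha: float = 0.5) -> float:
    """Blend the (shift-robust) BE similarity with the CAT logit.
    A simple additive combination, assumed here for illustration."""
    be_sim = float(be_encode(query) @ be_encode(item))  # cosine of unit vectors
    return alpha * be_sim + (1.0 - alpha) * cat_score(query, item)

query = "wireless noise cancelling headphones"
items = ["bluetooth over-ear headphones", "usb desk fan", "wired earbuds"]
print(sorted(items, key=lambda it: bi_cat_score(query, it), reverse=True))
```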
-
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
ICML '23
While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data.
In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices--including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images)--we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) in the case where labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) in ranking outlier data.
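As a rough illustration of the pretraining setup, here is a generic SimCLR-style NT-Xent sketch over tabular feature vectors with a simple feature-dropout augmentation. The paper's SimCLR-Rank modification and its actual augmentations are not reproduced; the `Encoder`, `augment`, and `nt_xent` names and the synthetic data are assumptions for the example.

```python
# Minimal SimCLR-style pretraining sketch for tabular LTR features (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, n_features: int, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(), nn.Linear(128, dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def augment(x: torch.Tensor, drop_p: float = 0.2) -> torch.Tensor:
    """Randomly zero out features -- one plausible tabular augmentation."""
    return x * (torch.rand_like(x) > drop_p)

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Standard NT-Xent contrastive loss over two augmented views."""
    z = torch.cat([z1, z2], dim=0)                    # (2B, d)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))                 # exclude self-pairs
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

x = torch.randn(256, 40)                              # unlabeled feature rows (synthetic)
enc = Encoder(n_features=40)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for _ in range(5):                                    # a few pretraining steps
    loss = nt_xent(enc(augment(x)), enc(augment(x)))
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```
-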
On the Value of Behavioral Representations for Dense Retrieval
We consider text retrieval within a dense representational space in real-world settings such as e-commerce search, where (a) document popularity and (b) the diversity of queries associated with a document have skewed distributions. Most of the contemporary dense retrieval literature presents two shortcomings in these settings. (1) Existing approaches learn an almost equal number of representations per document, agnostic to the fact that a few head documents are disproportionately more critical to achieving good retrieval performance. (2) They learn purely semantic document representations inferred from intrinsic document characteristics, which may not contain adequate information to determine the queries for which the document is relevant--especially when the document is short. We propose to overcome these limitations by augmenting semantic document representations learned by bi-encoders with behavioral document representations learned by our proposed approach, MVG. To do so, MVG (1) determines how to divide the total budget for behavioral representations by drawing a connection to the Pitman-Yor process, and (2) simply clusters the queries related to a given document (based on user behavior) within the representational space learned by a base bi-encoder and treats the cluster centers as its behavioral representations. Our central contribution is the finding that such a simple, intuitive, lightweight approach leads to substantial gains in key first-stage retrieval metrics while incurring only a marginal memory overhead. We establish this via extensive experiments over three large public datasets comparing several single-vector and multi-vector bi-encoders, a proprietary e-commerce search dataset compared against a production-quality bi-encoder, and an A/B test.
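A lightweight, hypothetical sketch of the behavioral-representation step: cluster the historical queries that engaged with a document (in the bi-encoder's space) and keep the cluster centers as extra document vectors. The Pitman-Yor budget allocation is replaced here by a fixed per-document k, and `embed` is a stub for the base bi-encoder.

```python
# Toy behavioral document representations via query clustering.
import numpy as np
from sklearn.cluster import KMeans

DIM = 32

def embed(text: str) -> np.ndarray:
    """Stub bi-encoder embedding (toy stand-in)."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    v = local.normal(size=DIM)
    return v / np.linalg.norm(v)

# doc -> queries that historically engaged with it (hypothetical behavioral data)
clicks = {
    "doc_red_running_shoes": ["red sneakers", "running shoes", "gym shoes"],
    "doc_rain_jacket": ["waterproof jacket", "rain coat"],
}

doc_vectors = {}
for doc, queries in clicks.items():
    semantic = embed(doc)                              # intrinsic representation
    Q = np.stack([embed(q) for q in queries])
    k = min(2, len(queries))                           # fixed behavioral budget (not Pitman-Yor)
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Q).cluster_centers_
    doc_vectors[doc] = np.vstack([semantic, centers])  # semantic + behavioral vectors

def score(query: str, doc: str) -> float:
    """Max similarity of the query against all of the document's vectors."""
    return float(np.max(doc_vectors[doc] @ embed(query)))

q = "red sneakers"
print(sorted(doc_vectors, key=lambda d: score(q, d), reverse=True))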
-
Massive Text Normalization via an Efficient Randomized Algorithm
WWW, 2022
Current machine learning techniques in NLP and data mining rely heavily on high-quality text sources. Nevertheless, real-world text datasets contain a significant number of spelling errors and improperly punctuated variants, on which the performance of these models quickly deteriorates. Moreover, existing text normalization methods are prohibitively expensive to execute over web-scale datasets, can hardly process noisy texts from social networks, or require annotations to learn the corrections in a supervised manner. In this paper, we present Flan (Fast LSH Algorithm for Text Normalization), a scalable randomized algorithm to clean and canonicalize massive text data. Our approach suggests corrections based on the morphology of the words, where lexically similar words are considered the same with high probability. We efficiently handle the pairwise word-to-word comparisons via locality sensitive hashing (LSH). We also propose a novel stabilization process to address the issue of hash collisions between dissimilar words, which is a consequence of the randomized nature of LSH and is exacerbated by the massive scale of real-world datasets. Compared with existing approaches, our method is more efficient, both asymptotically and in empirical evaluations, does not rely on feature engineering, and does not require any annotation. Our experimental results on real-world datasets demonstrate the efficiency and efficacy of Flan. Based on recent advances in densified MinHash, our approach requires much less computational time than baseline text normalization techniques on large-scale Twitter and Reddit datasets. In a human evaluation of normalization quality, Flan achieves 5% and 14% improvements over the baselines on the Reddit and Twitter datasets, respectively. Our method also improves performance on Twitter sentiment classification and on the perturbed GLUE benchmark datasets, where we introduce random errors into the text.
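For intuition, here is a toy MinHash-LSH sketch in the spirit of the word-bucketing step: words are represented by character trigram sets, MinHash signatures are banded, and words sharing a band land in the same candidate bucket. FLAN's densified MinHash and its stabilization process are not reproduced; the n-gram size, number of hashes, and banding below are arbitrary choices for the example.

```python
# Toy MinHash-LSH bucketing of lexically similar words.
import hashlib
from collections import defaultdict

def trigrams(word: str) -> set:
    padded = f"##{word.lower()}##"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def h(token: str, seed: int) -> int:
    """Deterministic seeded hash via md5 (unlike Python's salted hash())."""
    return int(hashlib.md5(f"{seed}:{token}".encode()).hexdigest(), 16)

def minhash(word: str, n_hashes: int = 8) -> tuple:
    grams = trigrams(word)
    return tuple(min(h(g, seed) for g in grams) for seed in range(n_hashes))

words = ["tomorrow", "tommorow", "tomorow", "banana", "bananna"]
buckets = defaultdict(list)
for w in words:
    sig = minhash(w)
    for band in range(4):                 # 4 bands of 2 hashes each
        buckets[(band, sig[2 * band:2 * band + 2])].append(w)

# Words co-occurring in a bucket are candidate spelling variants of each other.
for key, members in buckets.items():
    if len(members) > 1:
        print(key[0], sorted(set(members)))
```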
-
A Study of Context Dependencies in Multi-page Product Search
CIKM '19
In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be treated as implicit feedback that indicates their preferences and can be used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually employed to handle such scenarios. However, these methods are designed for document retrieval, where relevance is the most important criterion. In contrast, product search engines need to retrieve items that are not only relevant but also satisfactory in terms of customers' preferences. Personalization based on users' purchase history has been shown to be effective in product search. However, this method captures users' long-term interests, which do not always align with their short-term interests, and it does not benefit customers with little or no purchase history. In this paper, we study RF techniques based on both long-term and short-term context dependencies in multi-page product search. We also propose an end-to-end context-aware embedding model which can capture both types of context. Our experimental results show that short-term context leads to much better performance than long-term and no context. Moreover, our proposed model is more effective than state-of-the-art word-based RF models.
-
Leverage Implicit Feedback for Context-aware Product Search
SIGIR 2019
Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but should also satisfy customers’ preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer’s purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users’ hidden intents, which further act as the basis for re-ranking subsequent result pages for the query. Modeling user preference with implicit feedback has been studied extensively in recommendation tasks, but there has been little research on modeling users’ short-term interests in product search. We study whether short-term context can help promote a user’s ideal items on the subsequent result pages for a query. Furthermore, we propose an end-to-end context-aware embedding model which can capture both long-term and short-term context dependencies. Our experimental results on datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance than long-term and no context. Our results also show that our proposed model is more effective than word-based context-aware models.
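Both this paper and the CIKM paper above hinge on using within-session clicks as short-term context. Below is a minimal hypothetical sketch of that signal: a session context vector averaged from clicked-item embeddings is blended with query relevance when re-ranking later result pages. The `embed` stub and the linear blend are assumptions for illustration, not the papers' end-to-end context-aware model.

```python
# Toy short-term, click-based re-ranking of a later result page.
import numpy as np

DIM = 32

def embed(text: str) -> np.ndarray:
    """Stub item/query embedding (toy stand-in)."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    v = local.normal(size=DIM)
    return v / np.linalg.norm(v)

def rerank(query: str, candidates: list, clicked: list, beta: float = 0.5) -> list:
    """Blend query relevance with similarity to the short-term click context."""
    q = embed(query)
    ctx = np.mean([embed(c) for c in clicked], axis=0) if clicked else np.zeros(DIM)
    def s(item):
        e = embed(item)
        return (1 - beta) * float(q @ e) + beta * float(ctx @ e)
    return sorted(candidates, key=s, reverse=True)

clicked_so_far = ["red canvas sneakers", "red running shoes"]   # earlier-page clicks
next_page = ["blue sandals", "crimson trainers", "leather boots"]
print(rerank("shoes", next_page, clicked_so_far))
```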
Patents
-
Locality-Sensitive Hashing to Clean and Normalize Text Logs
Issued US 11,244,156
This patent relates to normalizing text through the use of LSH and graph structures to efficiently generate mappings and normalize the text in a corpus being analyzed. Multiple signatures are generated for each word in the input using a semi-random process, and a graph is generated based on the signatures. This graph is then evaluated to identify related words and generate mappings from each variant to the canonical form.
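To illustrate the graph step described above, here is a small sketch that assumes the LSH signature stage has already produced buckets of lexically similar words: words sharing a bucket are connected, connected components are extracted, and every variant is mapped to a canonical form. The bucket contents, word frequencies, and the "most frequent member" canonicalization rule are all hypothetical choices for the example.

```python
# Toy variant-to-canonical mapping from LSH buckets via connected components.
from collections import Counter, defaultdict

# Hypothetical output of the LSH signature stage: bucket -> member words.
buckets = {
    "b1": ["tomorrow", "tommorow"],
    "b2": ["tommorow", "tomorow"],
    "b3": ["banana", "bananna"],
}
freq = Counter({"tomorrow": 900, "tommorow": 12, "tomorow": 5,
                "banana": 400, "bananna": 9})

# Union-find over words that co-occur in a bucket.
parent = {}
def find(w):
    parent.setdefault(w, w)
    while parent[w] != w:
        parent[w] = parent[parent[w]]   # path halving
        w = parent[w]
    return w
def union(a, b):
    parent[find(a)] = find(b)

for members in buckets.values():
    for other in members[1:]:
        union(members[0], other)

components = defaultdict(list)
for w in freq:
    components[find(w)].append(w)

# Map each variant to its component's most frequent word (the canonical form).
canonical = {w: max(comp, key=freq.get) for comp in components.values() for w in comp}
print(canonical)
```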
Courses
-
Artificial Intelligence
CS 6601
-
Artificial Intelligence for Robotics
CS 7638
-
Computability, Complexity and Algorithms
CS 6505
-
Computer Networking
CS 6250
-
Data and Visual Analytics
CSE 6242
-
Introduction to Information Security
CS 6035
-
Machine Learning
CS 7641
-
Machine Learning for Trading
CS 7646
-
Network Security
CS 6262
-
Reinforcement Learning
CS 7642
Projects
-
AI tutorials
-
A collection of tutorials written by me on various AI topics
Test Scores
-
GMAT
Score: 730
More activity by Yeshwant (Yesh)
-
🚀 MIT Flow Matching and Diffusion Lecture 2026 Released (https://lnkd.in/e6jxXTkn)! We just released our new MIT 2026 course on flow matching and…
Liked by Yeshwant (Yesh) Dattatreya
-
White paper with Emmanuel Dupoux and Yann LeCun
Liked by Yeshwant (Yesh) Dattatreya
-
We’re hiring: Applied Scientist (Amazon Robotics – Foundation Models) in Sunnyvale, CA. If you’re excited about bringing large…
Shared by Yeshwant (Yesh) Dattatreya
-
I’m excited to announce that a paper (https://lnkd.in/gQvxz8KQ) my team worked on last summer was accepted to ICLR 2026. In this work, we introduce…
Shared by Yeshwant (Yesh) Dattatreya
-
My debut book, "Fields of Purpose," was launched by Sri Shivraj Singh Chauhan, Union Minister for Agriculture & Farmers Welfare, with N. Chaluvaraya…
Liked by Yeshwant (Yesh) Dattatreya