Yeshwant (Yesh) Dattatreya

San Francisco Bay Area
11K followers 500+ connections

About

As a Senior Applied Science Manager at Amazon Robotics, I lead a team of talented…

Activity

Experience

  • Amazon Fulfillment Technologies & Robotics

    Santa Clara County, California, United States

  • Palo Alto

  • Atlanta, Georgia, United States

  • Sunnyvale, California, United States

  • San Francisco Bay Area

  • San Francisco Bay Area

  • Greater Atlanta Area

  • Greater Atlanta Area

  • Atlanta, GA

  • Greater Atlanta Area

  • Bangalore, India

Education

Licenses & Certifications

Publications

  • Bi-CAT: Improving Robustness of LLM-based Text Rankers to Conditional Distribution Shifts

    WWW '24

    Retrieval and ranking lie at the heart of several applications such as search, question answering, and recommendations. The use of large language models (LLMs) such as BERT in these applications has shown promising results in recent times. Recent work on text-based retrievers and rankers shows promising results by using a bi-encoder (BE) architecture with BERT-like LLMs for retrieval and a cross-attention transformer (CAT) architecture with BERT or other LLMs for ranking the retrieved results. Although the use of the CAT architecture for re-ranking improves ranking metrics, its robustness to data shifts is not guaranteed. In this work we analyze the robustness of CAT-based rankers. Specifically, we show that CAT rankers are sensitive to item distribution shifts conditioned on a query, which we refer to as conditional item distribution shift (CIDS). CIDS naturally occurs in large online search systems as the retrievers keep evolving, making it challenging to consistently train and evaluate rankers on the same item distribution. In this paper, we formally define CIDS and show that while CAT rankers are sensitive to it, BE models are far more robust. We propose a simple yet effective approach, referred to as BI-CAT, which augments CAT rankers with BE model outputs to significantly improve the robustness of CAT rankers without any drop in in-distribution performance. We conducted a series of experiments on two publicly available ranking datasets and one dataset from a large e-commerce store. Our results on datasets with CIDS demonstrate that the BI-CAT model significantly improves the robustness of CAT rankers by roughly 100-1000 bps in F1 without any reduction in in-distribution model performance.

    Other authors
    See publication
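The abstract does not spell out the exact BI-CAT formulation; a minimal sketch of the general idea, blending a robust bi-encoder similarity into a cross-attention ranker's score (the function name and the `alpha` mixing weight are hypothetical), might look like:

```python
import numpy as np

def bi_cat_score(query_vec, item_vec, cat_score, alpha=0.5):
    """Blend a bi-encoder (BE) cosine similarity with a cross-attention
    (CAT) ranker score. `bi_cat_score` and `alpha` are hypothetical; the
    paper's actual augmentation mechanism may differ."""
    cos = float(np.dot(query_vec, item_vec)
                / (np.linalg.norm(query_vec) * np.linalg.norm(item_vec)))
    # the BE term is robust to conditional item distribution shift,
    # the CAT term carries the in-distribution ranking quality
    return alpha * cos + (1 - alpha) * cat_score
```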
  • Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

    ICML '23

    While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data.

    In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices, including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images), we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) when labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) when ranking outlier data.

    Other authors
    See publication
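SimCLR-Rank itself is not detailed in the abstract; the vanilla InfoNCE objective that SimCLR-style pretraining builds on can be sketched in NumPy (a generic illustration under that assumption, not the paper's ranking-specific variant):

```python
import numpy as np

def info_nce_loss(z1, z2, temp=0.1):
    """Vanilla InfoNCE: row i of z1 should match row i of z2 and repel
    every other row. SimCLR-Rank's ranking-specific positives and
    negatives are not reproduced here."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temp                            # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(probs)).mean())         # positives on the diagonal
```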
  • On the Value of Behavioral Representations for Dense Retrieval

    We consider text retrieval within a dense representational space in real-world settings such as e-commerce search, where (a) document popularity and (b) the diversity of queries associated with a document have skewed distributions. Most of the contemporary dense retrieval literature has two shortcomings in these settings. (1) It learns an almost equal number of representations per document, agnostic to the fact that a few head documents are disproportionately more critical to achieving good retrieval performance. (2) It learns purely semantic document representations inferred from intrinsic document characteristics, which may not contain adequate information to determine the queries for which the document is relevant, especially when the document is short. We propose to overcome these limitations by augmenting the semantic document representations learned by bi-encoders with behavioral document representations learned by our proposed approach, MVG. To do so, MVG (1) determines how to divide the total budget for behavioral representations by drawing a connection to the Pitman-Yor process, and (2) simply clusters the queries related to a given document (based on user behavior) within the representational space learned by a base bi-encoder, and treats the cluster centers as its behavioral representations. Our central contribution is the finding that such a simple, intuitive, lightweight approach leads to substantial gains in key first-stage retrieval metrics while incurring only a marginal memory overhead. We establish this via extensive experiments over three large public datasets comparing several single-vector and multi-vector bi-encoders, a proprietary e-commerce search dataset compared against a production-quality bi-encoder, and an A/B test.

    Other authors
    See publication
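The clustering step described above can be sketched as a lightweight k-means over the query embeddings associated with one document; the cluster centers then serve as that document's behavioral representations (a hedged sketch: the function name is hypothetical, and the Pitman-Yor budget allocation from the paper is omitted):

```python
import numpy as np

def behavioral_representations(query_vecs, k, iters=10, seed=0):
    """Toy k-means over the embeddings of queries that engaged with one
    document; the k cluster centers act as its behavioral representations."""
    pts = np.asarray(query_vecs, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each query embedding to its nearest center
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = pts[assign == j]
            if len(members):                      # skip empty clusters
                centers[j] = members.mean(axis=0)
    return centers
```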
  • Massive Text Normalization via an Efficient Randomized Algorithm

    WWW, 2022

    Current machine learning techniques in NLP and data mining rely heavily on high-quality text sources. Nevertheless, real-world text datasets contain a significant amount of spelling errors and improperly punctuated variants, on which the performance of these models quickly deteriorates. Moreover, existing text normalization methods are prohibitively expensive to execute over web-scale datasets, can hardly process noisy texts from social networks, or require annotations to learn the corrections in a supervised manner. In this paper, we present Flan (Fast LSH Algorithm for Text Normalization), a scalable randomized algorithm to clean and canonicalize massive text data. Our approach suggests corrections based on the morphology of the words, where lexically similar words are considered the same with high probability. We efficiently handle the pairwise word-to-word comparisons via locality-sensitive hashing (LSH). We also propose a novel stabilization process to address the issue of hash collisions between dissimilar words, which is a consequence of the randomized nature of LSH and is exacerbated by the massive scale of real-world datasets. Compared with existing approaches, our method is more efficient, both asymptotically and in empirical evaluations, does not rely on feature engineering, and does not require any annotation. Our experimental results on real-world datasets demonstrate the efficiency and efficacy of Flan. Based on recent advances in densified MinHash, our approach requires much less computational time than baseline text normalization techniques on large-scale Twitter and Reddit datasets. In a human evaluation of normalization quality, Flan achieves 5% and 14% improvement over the baselines on the Reddit and Twitter datasets, respectively. Our method also improves performance on Twitter sentiment classification and on perturbed GLUE benchmark datasets, where we introduce random errors into the text.

    Other authors
    See publication
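The LSH idea behind Flan, bucketing lexically similar words via MinHash over character n-grams so that similar words collide with high probability, can be illustrated with a toy sketch (function names are hypothetical; the paper's densified MinHash and stabilization step are not reproduced):

```python
import hashlib

def _gram_hash(salt, gram):
    # deterministic 64-bit hash of (salt, gram)
    digest = hashlib.md5(f"{salt}:{gram}".encode()).hexdigest()
    return int(digest[:16], 16)

def minhash_signature(word, num_hashes=16, n=3):
    """MinHash over character n-grams: the more n-grams two words share,
    the more signature components they agree on (in expectation, the
    Jaccard similarity of their n-gram sets)."""
    grams = {word[i:i + n] for i in range(max(1, len(word) - n + 1))}
    return tuple(min(_gram_hash(s, g) for g in grams) for s in range(num_hashes))

def signature_agreement(sig_a, sig_b):
    # fraction of components on which two signatures agree
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```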
  • A Study of Context Dependencies in Multi-page Product Search

    CIKM '19

    In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be considered implicit feedback that indicates their preferences and can be used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually employed in such scenarios. However, these methods are designed for document retrieval, where relevance is the most important criterion. In contrast, product search engines need to retrieve items that are not only relevant but also satisfactory in terms of customers' preferences. Personalization based on users' purchase history has been shown to be effective in product search. However, this method captures users' long-term interests, which do not always align with their short-term interests, and it does not benefit customers with little or no purchase history. In this paper, we study RF techniques based on both long-term and short-term context dependencies in multi-page product search. We also propose an end-to-end context-aware embedding model which can capture both types of context. Our experimental results show that short-term context leads to much better performance than long-term and no context. Moreover, our proposed model is more effective than state-of-the-art word-based RF models.

    Other authors
    See publication
  • Leverage Implicit Feedback for Context-aware Product Search

    SIGIR 2019

    Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but also should satisfy customers’ preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer’s purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users’ hidden intents, which further act as the basis for re-ranking subsequent result pages for the query. Modeling user preference with implicit feedback has been studied extensively in recommendation tasks; however, there has been little research on modeling users’ short-term interest in product search. We study whether short-term context can help promote a user’s ideal item in the following result pages for a query. Furthermore, we propose an end-to-end context-aware embedding model which can capture long-term and short-term context dependencies. Our experimental results on datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance than long-term and no context. Our results also show that our proposed model is more effective than word-based context-aware models.

    Other authors
    See publication
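The short-term-context idea in the two papers above, using within-session clicks to re-rank subsequent result pages, can be sketched as a simple score blend (a hypothetical baseline under stated assumptions: the papers train an end-to-end embedding model rather than this linear mix, and `beta` is an assumed weight):

```python
import numpy as np

def rerank_with_clicks(item_vecs, base_scores, clicked_vecs, beta=0.5):
    """Re-rank next-page items by blending each item's base relevance
    score with its similarity to the centroid of the session's clicked
    items; returns item indices, best first."""
    base = np.asarray(base_scores, dtype=float)
    if len(clicked_vecs) == 0:
        return list(np.argsort(-base))            # no feedback yet
    intent = np.mean(np.asarray(clicked_vecs, dtype=float), axis=0)
    intent = intent / np.linalg.norm(intent)      # session intent vector
    items = np.asarray(item_vecs, dtype=float)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    blended = (1 - beta) * base + beta * (items @ intent)
    return list(np.argsort(-blended))
```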

Patents

  • Locality-Sensitive Hashing to Clean and Normalize Text Logs

    Issued US 11,244,156

    This patent relates to normalizing text through the use of LSH and graph structures to efficiently generate mappings and normalize the text in a corpus being analyzed. Multiple signatures are generated for each word in the input using a semi-random process, and a graph is generated based on the signatures. This graph is then evaluated to identify related words and generate mappings from each variant to the canonical form.

    Other inventors
    See patent
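The signature-then-graph pipeline described above can be sketched as follows: words that share an LSH signature are linked, and each connected component maps to its most frequent member as the canonical form (a toy union-find illustration; the `signatures` input is assumed to come from an upstream LSH pass, and all names are hypothetical):

```python
from collections import Counter, defaultdict

def canonicalize(words, signatures):
    """Map each word variant to the most frequent word in its
    connected component of the signature-sharing graph."""
    parent = {w: w for w in signatures}

    def find(w):                                  # union-find with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    buckets = defaultdict(list)                   # signature -> words
    for word, sigs in signatures.items():
        for sig in sigs:
            buckets[sig].append(word)
    for group in buckets.values():                # link words sharing a signature
        for word in group[1:]:
            parent[find(word)] = find(group[0])

    counts = Counter(words)
    components = defaultdict(list)
    for word in parent:
        components[find(word)].append(word)
    mapping = {}
    for comp in components.values():
        canonical = max(comp, key=lambda w: counts[w])
        for word in comp:
            mapping[word] = canonical
    return mapping
```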

Courses

  • Artificial Intelligence

    CS 6601

  • Artificial Intelligence for Robotics

    CS 7638

  • Computability, Complexity and Algorithms

    CS 6505

  • Computer Networking

    CS 6250

  • Data and Visual Analytics

    CSE 6242

  • Introduction to Information Security

    CS 6035

  • Machine Learning

    CS 7641

  • Machine Learning for Trading

    CS 7646

  • Network Security

    CS 6262

  • Reinforcement Learning

    CS 7642

Projects

Test Scores

  • GMAT

    Score: 730
