Publications

Amazon is a great place to practice science and have real business impact, but that's only one part of the story. Our scientists continue to publish, teach, and engage with the worldwide research community, sharing insights across diverse disciplines from machine learning to operations research. Through these contributions, we're advancing scientific knowledge while developing innovations that address complex challenges for customers and society.

DocFormer: End-to-end transformer for document understanding

Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha

ICCV 2021

2021

We present DocFormer - a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts etc.) and layouts. In addition, DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks which encourage multi-modal interaction. DocFormer uses text, vision

Computer vision

Single view physical distance estimation using human pose

Xiaohan Fei, Henry Wang, Lin Lee Cheong, Xiangyu Zeng, Meng Wang, Joe Tighe

ICCV 2021

2021

We propose a fully automated system that simultaneously estimates the camera intrinsics, the ground plane, and physical distances between people from a single RGB image or video captured by a camera viewing a 3-D scene from a fixed vantage point. To automate camera calibration and distance estimation, we leverage priors about human pose and develop a novel direct formulation for pose-based auto-calibration

Computer vision

Detail me more: Improving GAN’s photo-realism of complex scenes

Raghudeep Gadde, Qianli Feng, Aleix M Martinez

ICCV 2021

2021

Generative models can synthesize photo-realistic images of a single object. For example, for human faces, algorithms learn to model the local shape and shading of the face components, i.e., changes in the brows, eyes, nose, mouth, jaw line, etc. This is possible because all faces have two brows, two eyes, a nose and a mouth, approximately in the same location. The modeling of complex scenes is however much

Computer vision

Enabling efficiency-precision trade-offs for label trees in extreme classification

Tavor Z. Baharav, Daniel L. Jiang, Kedarnath Kolluri, Sujay Sanghavi, Inderjit S. Dhillon

CIKM 2021

2021

Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real world e-commerce applications like personalized recommendations and product advertising can be formulated as XMC problems, where the objective is to predict for a user a small subset of items from a catalog of several million products. For such

Machine learning

Tabular data concept type detection using star-transformers

Yiwei Zhou, Siffi Singh, Christos Christodoulopoulos

CIKM 2021

2021

Tabular data is an invaluable information resource for search, information extraction and question answering about the world. It is critical to understand the semantic concept types for table columns in order to fully exploit the information in tabular data. In this paper, we focus on learning-based approaches for column concept type detection without relying on any metadata or queries to existing knowledge

Conversational AI

End-to-end piece-wise unwarping of document images

Sagnik Das, Kunwar Yashraj Singh, Jon Wu, Erhan Bas, Vijay Mahadevan, Rahul Bhotika, Dimitris Samaras

ICCV 2021

2021

Document unwarping attempts to undo physical deformations of the paper and recover a ‘flatbed’ scanned document-image for downstream tasks such as OCR. Current state-of-the-art relies on global unwarping of the document which is not robust to local deformation changes. Moreover, a global unwarping often produces spurious warping artifacts in less warped regions to compensate for severe warps present in

Computer vision

QUEACO: Borrowing treasures from weakly-labeled behavior data for query attribute value extraction

Danni (Danqing) Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang

CIKM 2021

2021

We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms. Such a problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works only focus on the NER phase but neglect equally important

Conversational AI

Quantifying social biases in NLP: A generalization and empirical comparison of extrinsic fairness metrics

Paula Czarnowska, Yogarshi Vyas, Kashif Shah

Transactions of the Association for Computational Linguistics

2021

Measuring bias is key for better understanding and addressing unfairness in NLP/ML models. This is often done via fairness metrics which quantify the differences in a model’s behaviour across a range of demographic groups. In this work, we shed more light on the differences and similarities between the fairness metrics used in NLP. First, we unify a broad range of existing metrics under three generalized

Conversational AI

Learning attribute-driven disentangled representations for interactive fashion retrieval

Yuxin Hou, Eleonora Vig, Michael Donoser, Loris Bazzani

ICCV 2021

2021

Interactive retrieval for online fashion shopping provides the ability to change image retrieval results according to the user feedback. One common problem in interactive retrieval is that a specific user interaction (e.g., changing the color of a T-shirt) causes other aspects to change inadvertently (e.g., the retrieved item has a sleeve type different than the query). This is a consequence of existing

Computer vision

Personalized compatibility metric learning

Meet Taraviya, Anurag Beniwal, Yen-Liang Lin, Larry Davis

KDD 2021 International Workshop on Industrial Recommendation Systems

2021

Recommending sets of items that include both personalized and compatible items is crucial to personalized styling programs such as Amazon’s Personal Shopper. There is both an extensive literature on learning generic fashion compatibility and also on personalization in fashion. However, recommending pairs of items that the customer would like to wear together is still less studied as it involves learning

Search and information retrieval

Session-aware query auto-completion using extreme multi-label ranking

Nishant Yadav, Rajat Sen, Daniel N. Hill, Arya Mazumdar, Inderjit S. Dhillon

KDD 2021

2021

Query auto-completion (QAC) is a fundamental feature in search engines where the task is to suggest plausible completions of a prefix typed in the search bar. Previous queries in the user session can provide useful context for the user’s intent and can be leveraged to suggest auto-completions that are more relevant while adhering to the user’s prefix. Such session-aware QACs can be generated by recent sequence-to-sequence

Related: Applying PECOS to product retrieval and text autocompletion

Search and information retrieval

End-to-end question generation to assist formative assessment design for conceptual knowledge learning

Jinjin Zhao, Weijie Xu, Candace Thille

AETS 2021

2021

Formative assessment can be used by learning designers to evaluate a learner's comprehension, learning needs, and learning progress during a lesson, unit, or course. The general goal of a formative assessment is to collect detailed information that can be used to improve instruction and learning while learning is happening. Designing effective formative assessments for complex or technical knowledge can

Conversational AI

Graphire: Novel Intent Discovery with Pretraining on Prior Knowledge using Contrastive Learning

Xibin Gao, Radhika Arava, Qian Hu, Thahir Mohamed, Wei Xiao, Zheng Gao, Mohammad AbdelHady

KDD 2021 Workshop on Pretraining: Algorithms, Architectures, and Applications

2021

In this paper, we introduce Graphire, an intent discovery system leveraging pretraining on predefined intents to automatically discover novel intents for intelligent personal assistants (IPA). In order to transfer the prior knowledge of predefined intents, Graphire first transforms predefined class memberships into pairwise relationships, and then learns a Siamese Neural Network (SNN) model classifying

Conversational AI

Data mining for discovering cognitive models of learning

Jinjin Zhao, Candace Thille, Dawn Zimmaro

EAIT 2021

2021

A cognitive model is a descriptive account or computational representation of human thinking about a given concept, skill, or domain. A cognitive model of learning includes both a way of organizing knowledge within a subject area and an account of how humans develop accurate and complete knowledge of that subject area. Learning designers engage in a variety of practices to unpack knowledge from subject

Conversational AI

Multi-instance pose networks: Rethinking top down pose estimation

Rawal Khirodkar, Visesh Chari, Amit Agrawal, Ambrish Tyagi

ICCV 2021

2021

A key assumption of top-down human pose estimation approaches is their expectation of having a single person/instance present in the input bounding box. This often leads to failures in crowded scenes with occlusions. We propose a novel solution to overcome the limitations of this fundamental assumption. Our Multi-Instance Pose Network (MIPNet) allows for predicting multiple 2D pose instances within a given

Computer vision
