Search - Amazon Science

WikiWiki - A large scale entity typing dataset

Shuyang Li, Mukund Sridhar, Chandana Satya Prakash, Jin Cao, Wael Hamza

2022

Understanding human language often necessitates understanding entities and their place in a taxonomy of knowledge—their types. Previous methods to learn entity types rely on training classifiers on datasets with coarse, noisy, and incomplete labels. We introduce a method to instill fine-grained type knowledge in language models with text-to-text pre-training on type-centric questions leveraging knowledge

Conversational AI

MaCLR

Fanyi Xiao, Joe Tighe, Davide Modolo

2022

We present MaCLR, a novel method to explicitly perform cross-modal self-supervised video representation learning from visual and motion modalities. Compared to previous video representation learning methods that mostly focus on learning motion cues implicitly from RGB inputs, MaCLR enriches standard contrastive learning objectives for RGB video clips with a cross-modal learning objective between a Motion

Computer vision
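
The abstract describes a cross-modal contrastive objective between RGB clips and a motion modality. As a generic illustration of that kind of objective (a standard InfoNCE loss, not MaCLR's exact formulation), the sketch below scores each RGB clip embedding against every motion embedding in a batch. The loss is low when the motion embedding from the same video is the best match. The function name cross_modal_info_nce, the use of cosine similarity, and the temperature of 0.1 are assumptions.

```python
import math
from typing import List


def _normalize(v: List[float]) -> List[float]:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_modal_info_nce(
    rgb: List[List[float]], motion: List[List[float]], temperature: float = 0.1
) -> float:
    """Hypothetical InfoNCE loss: rgb[i] and motion[i] come from the same video
    (the positive pair); every other motion embedding in the batch is a negative."""
    if len(rgb) != len(motion) or not rgb:
        raise ValueError("need equally sized, non-empty batches")
    rgb_n = [_normalize(v) for v in rgb]
    mot_n = [_normalize(v) for v in motion]
    total = 0.0
    for i, query in enumerate(rgb_n):
        # Similarity of this RGB clip to every motion embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(query, key)) / temperature for key in mot_n]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matching (positive) motion embedding.
        total += log_denom - logits[i]
    return total / len(rgb_n)
```

With aligned pairs the loss is close to zero; shuffling the motion embeddings so that each RGB clip is paired with another video's motion drives it up sharply.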

Ensemble transformer for efficient and accurate ranking tasks: An application to question answering systems

Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti

2022

This is the official CERBERUS model code repository for our long paper in Findings of EMNLP 2022, "Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems".

Conversational AI

Large scale real-world multi-person tracking

Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew Berneshawi, Alyssa Boden, Joe Tighe

2022

This paper presents a new large scale multi-person tracking dataset – PersonPath22, which is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20 datasets. The lack of large scale training and test data for this task has limited the community’s ability to understand the performance of their tracking systems on a wide range

Computer vision

TubeR: Tubelet transformer for video action detection

Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Bing Shuai, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G.M. Snoek, Joe Tighe

2022

We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns a set

Computer vision

DQ-BART: Efficient sequence-to-sequence model via joint distillation and quantization

Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew O. Arnold, Bing Xiang, Dan Roth

2022

Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision

Conversational AI
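
The abstract describes jointly distilling and quantizing the model. The sketch below illustrates only the quantization half, and only in a generic form rather than DQ-BART's actual quantizer. It applies symmetric, per-tensor "fake" quantization: each weight is scaled onto a signed integer grid of the chosen bit width, rounded, clipped, and scaled back to a float. The function name fake_quantize and the max-absolute-value scaling rule are assumptions.

```python
from typing import List


def fake_quantize(weights: List[float], num_bits: int = 8) -> List[float]:
    """Hypothetical symmetric per-tensor quantize-dequantize round trip."""
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / qmax  # float step between adjacent integer levels
    out = []
    for w in weights:
        q = round(w / scale)          # map onto the integer grid
        q = max(-qmax, min(qmax, q))  # clip to the representable range
        out.append(q * scale)         # dequantize back to a float
    return out
```

In quantization-aware training, a round trip like this is typically applied in the forward pass so that the distillation loss is computed on the low-precision student's outputs.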

Amazon Alexa language assisted product search challenge

Xu Zhang, Sanqiang Zhao, Skyler Zheng

2022

The package defines a set of tools to access the annotated data we collected for the 2022 Amazon Alexa language-assisted product search challenge.

Computer vision

MixGen: A new multi-modal data augmentation

Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, Aston Zhang, Wanqian Zhang, Bo Li, Mu Li

2022

Data augmentation is a necessity to enhance data efficiency in deep learning. For vision-language pre-training, data is only augmented either for images or for text in previous works. In this paper, we present MixGen: a joint data augmentation for vision-language representation learning to further improve data efficiency. It generates new image-text pairs with semantic relationships preserved by interpolating

Computer vision
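
The abstract is cut off at "interpolating." The sketch below shows one way a joint image-text augmentation of this kind can work. It assumes that two images are blended pixel-wise with a weight lam and that their captions are concatenated, so that each new pair keeps content from both sources. The function name mixgen_pair, the toy grayscale image representation, and the default lam = 0.5 are illustrative assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]  # hypothetical: a grayscale image as rows of pixel intensities


def mixgen_pair(
    img_a: Image, text_a: str, img_b: Image, text_b: str, lam: float = 0.5
) -> Tuple[Image, str]:
    """Blend two images pixel-wise and concatenate their captions (illustrative sketch)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same shape")
    # Linear interpolation of pixels: lam * a + (1 - lam) * b.
    mixed = [
        [lam * p + (1.0 - lam) * q for p, q in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    # Concatenating captions keeps the semantics of both source pairs in the new text.
    return mixed, f"{text_a} {text_b}"
```

For example, mixing [[0.0, 1.0]] captioned "a dog" with [[1.0, 1.0]] captioned "on grass" yields the image [[0.5, 1.0]] with the caption "a dog on grass".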

Multimodal neural SLAM for interactive instruction following

Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi (QZ) Gao, Govind Thattai, Gaurav Sukhatme

2022

Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration. We focus on a critical bottleneck, namely the performance of

Robotics

Detecting adversarial samples using SageMaker Model Monitor and Debugger

Nathalie Rauschmayr, Sergul Aydore, Yigitcan Kaya, Bilal Zafar

2022

This repository contains the files for the blog post "Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger". Create a SageMaker notebook instance, clone the repository with git clone https://github.com/amazon-research/detecting-adversarial-samples-using-sagemaker.git and open the notebook Detecting_adversarial_samples.ipynb, in which we first train an image classification model (ResNet18

Machine learning

Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering

Priyanka Sen, Alham Fikri Aji, Amir Saffari

2022

We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish for a total of 180,000 samples. Mintaka includes 8 types of complex

Conversational AI

Amazon Berkeley Objects (ABO) dataset

Matthieu Guillaumin, Thomas Dideriksen, Kenan Deng, Himanshu Arora, Arnab Dhua, Xi Zhang, Tomas Yago-Vicente, Jasmine Collins, Shubham Goel, Jitendra Malik

2022

An open-licensed dataset of Amazon products with metadata, catalog images, and artist-created 3D models with complex geometries and physically-based materials that correspond to real household objects. The unique properties of ABO make it possible to benchmark the state of the art on several open problems for real-world 3D object understanding: single-view 3D reconstruction, fine-grained part labelling, material

Computer vision

CaDiCaL simplified satisfiability solver

Armin Biere, Md Solimul Chowdhury, Marijn J.H. Heule, Benjamin Kiesl-Reiter, Mike Whalen

2022

This is a modified version of the original CaDiCaL repository on GitHub. We are releasing this version of CaDiCaL as a fork of the original CaDiCaL repository because this is preferred by the original author, Armin Biere. The version of CaDiCaL presented here can store the most relevant parts of its state (irredundant clauses, redundant clauses, and the reconstruction stack) in files, and it can also read

Security, privacy, and abuse prevention

MDNS facial expression perception dataset

De'Aira Bryant, Tiffany Deng, Nashlie Sephus, Wei Xia, Pietro Perona

2022

This work originated in De'Aira Bryant's internship project at Amazon Web Services (AWS). Humans can perceive multiple expressions in a single face, and may perceive them at varying intensities. By combining annotations from multiple observers, one can also see that some expressions are ambiguous. The MDNS (multi-dimensional, nuanced, subjective) annotation dataset provides a new set of multi-dimensional

Computer vision

Active sampling for min-max fairness

Jacob Abernethy, Pranjal Awasthi, Matthaus Kleindessner, Jamie Morgenstern, Chris Russell, Jie Zhang

2022

We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization. The key intuition behind our approach is to update the model at each time step using a data point from the group that is worst off under the current model. The ease of implementation and the generality of our robust formulation

Machine learning
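
The key step described above is to update the model at each time step with a data point drawn from the group that is currently worst off. The sketch below applies that step to a one-feature logistic regression trained by SGD. The toy data layout, the logistic loss, the learning rate, and the function names are assumptions made for illustration. The paper states that its strategies apply to any classifier or regressor trained by loss minimization.

```python
import math
import random
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (feature x, label y in {0, 1})


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _loss_and_grad(w: float, b: float, x: float, y: int) -> Tuple[float, float, float]:
    """Logistic loss for one sample and its gradient with respect to (w, b)."""
    p = _sigmoid(w * x + b)
    eps = 1e-12  # avoids log(0)
    loss = -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return loss, (p - y) * x, (p - y)


def group_losses(w: float, b: float, groups: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Average loss of the current model on each group."""
    return {
        g: sum(_loss_and_grad(w, b, x, y)[0] for x, y in data) / len(data)
        for g, data in groups.items()
    }


def active_sampling_train(
    groups: Dict[str, List[Sample]], steps: int = 300, lr: float = 0.1, seed: int = 0
) -> Tuple[float, float]:
    """Hypothetical min-max active sampling loop."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        losses = group_losses(w, b, groups)
        worst = max(losses, key=losses.get)  # group that is worst off under the current model
        x, y = rng.choice(groups[worst])     # draw one data point from that group
        _, grad_w, grad_b = _loss_and_grad(w, b, x, y)
        w -= lr * grad_w                      # single SGD step on that point
        b -= lr * grad_b
    return w, b
```

Recomputing every group's loss before each update costs a full pass over the data. That is acceptable for a toy sketch, but a practical version would likely estimate group losses on a subsample or update them incrementally.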

Shopping queries dataset: A large-scale ESCI benchmark for improving product search

Chandan Reddy, Nurendra Choudhary, Lluís Marquez, Fran Valero, Nikhil Rao, Hugo Zaragoza, Sambaran Bandyopadhyay, Arnab Biswas

2022

We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the product to

Search and information retrieval
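
Graded ESCI judgements can feed a ranking metric such as nDCG. The sketch below maps each label to a gain and computes nDCG for the ranked results of a single query. The gain values (E = 1.0, S = 0.1, C = 0.01, I = 0.0) and the function names are assumptions chosen for illustration. They are not necessarily the weights used in any official evaluation of the dataset.

```python
import math
from typing import Dict, List

# Hypothetical gain per ESCI label; adjust to match the evaluation protocol you follow.
ESCI_GAIN: Dict[str, float] = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}


def dcg(labels: List[str]) -> float:
    """Discounted cumulative gain; rank 0 gets discount log2(2) = 1."""
    return sum(ESCI_GAIN[lab] / math.log2(rank + 2) for rank, lab in enumerate(labels))


def ndcg(ranked_labels: List[str]) -> float:
    """nDCG of one query's ranked list of ESCI labels (0.0 if nothing is relevant)."""
    ideal = dcg(sorted(ranked_labels, key=lambda lab: ESCI_GAIN[lab], reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0
```

Because Substitute and Complement receive small but nonzero gains in this sketch, a ranking that puts a Substitute above an Exact match is penalized less than one that puts an Irrelevant result first.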

Causal forecasting: Generalization bounds for autoregressive models

Leena Chennuru Vankadara, Philipp Faller, Michaela Hardt, Lenon Vogel, Debarghya Ghoshdastidar, Dominik Janzing

2022

Here, we study the problem of causal generalization (generalizing from observational to interventional distributions) in forecasting. Our goal is to answer the question: how does the efficacy of a vector autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of causal learning theory for forecasting

Machine learning
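
For reference, the standard textbook form of a vector autoregressive model of order p, VAR(p), is given below. The paper's own notation and assumptions may differ. Each d-dimensional observation is modeled as a linear function of its p predecessors plus zero-mean noise:

```latex
X_t = c + \sum_{i=1}^{p} A_i X_{t-i} + \varepsilon_t,
\qquad X_t,\, c \in \mathbb{R}^{d},\quad A_i \in \mathbb{R}^{d \times d},\quad \mathbb{E}[\varepsilon_t] = 0
```

Estimating the coefficient matrices A_i from observational data captures statistical associations between past and present values. The question posed above is how well those same coefficients predict X_t once some of its components are set by intervention rather than observed.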

Efficient classification of long documents using transformers

Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

2022

Our datasets cover binary, multi-class, and multilabel classification tasks and represent various ways information is organized in a long text (e.g. information that is critical to making the classification decision is at the beginning or toward the end of the document). Our results show that more complex models often fail to outperform simple baselines and yield inconsistent performance across datasets

Conversational AI

Task-agnostic continual RL: In praise of a simple baseline

Massimo Caccia, Jonas Mueller, Taesup Kim, Laurent Charlin, Rasool Fakoor

2022

We study task-agnostic continual reinforcement learning (TACRL), in which standard RL challenges are compounded with partial observability stemming from task agnosticism, as well as the additional difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. Here we compare TACRL methods with their soft upper bounds prescribed by previous literature: multi-task learning (MTL)

Machine learning

MASSIVE

Jack G. M. FitzGerald, Chris Hench, Charith Peris, Scott Mackie, Kay Rottmann, Ana Sanchez, Aaron Nash, Liam Urbach, Vishesh Kakarala, Richa Singh, Swetha Ranganath, Laurie Crist, Misha Britan, Wouter Leeuwis, Gokhan Tur, Prem Natarajan

2022

MASSIVE is a parallel dataset of more than 1M utterances across 52 languages, with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the Spoken Language Understanding Resource Package (SLURP) dataset, which is composed of general intelligent voice assistant single-shot interactions.

Conversational AI
