Search - Amazon Science

Multimodal neural SLAM for interactive instruction following

Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi (QZ) Gao, Govind Thattai, Gaurav Sukhatme

2022

Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration. We focus on a critical bottleneck, namely the performance of

Robotics

Detecting adversarial samples using SageMaker Model Monitor and Debugger

Nathalie Rauschmayr, Sergul Aydore, Yigitcan Kaya, Bilal Zafar

2022

This repository contains the files for the blog post "Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger". Create a SageMaker notebook instance and clone the repository:

git clone https://github.com/amazon-research/detecting-adversarial-samples-using-sagemaker.git

In the notebook Detecting_adversarial_samples.ipynb, we first train an image classification model (ResNet18

Machine learning

Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering

Priyanka Sen, Alham Fikri Aji, Amir Saffari

2022

We introduce MINTAKA, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish for a total of 180,000 samples. Mintaka includes 8 types of complex

Conversational AI

Amazon Berkeley Objects (ABO) dataset

Matthieu Guillaumin, Thomas Dideriksen, Kenan Deng, Himanshu Arora, Arnab Dhua, Xi Zhang, Tomas Yago-Vicente, Jasmine Collins, Shubham Goel, Jitendra Malik

2022

An open-licensed dataset of Amazon products with metadata, catalog images, and artist-created 3D models with complex geometries and physically-based materials that correspond to real household objects. The unique properties of ABO make it possible to benchmark the state of the art on several open problems for real-world 3D object understanding: single-view 3D reconstruction, fine-grained part labelling, material

Computer vision

CaDiCaL simplified satisfiability solver

Armin Biere, Md Solimul Chowdhury, Marijn J.H. Heule, Benjamin Kiesl-Reiter, Mike Whalen

2022

This is a modified version of the original CaDiCaL repository on GitHub. We are releasing this version of CaDiCaL as a fork of the original CaDiCaL repository because this is preferred by the original author, Armin Biere. The version of CaDiCaL presented here can store the most relevant parts of its state (irredundant clauses, redundant clauses, and the reconstruction stack) in files, and it can also read

Security, privacy, and abuse prevention

MDNS facial expression perception dataset

De'Aira Bryant, Tiffany Deng, Nashlie Sephus, Wei Xia, Pietro Perona

2022

This project is credited to De'Aira Bryant's internship project at Amazon AWS. Humans can perceive multiple expressions in a single face. Such expressions may also be perceived at varying intensities. By coupling annotations from multiple observers, it can be observed that some expressions are also ambiguous. The MDNS (multi-dimensional, nuanced, subjective) annotation dataset provides a new set of multi-dimensional

Computer vision

Active sampling for min-max fairness

Jacob Abernethy, Pranjal Awasthi, Matthaus Kleindessner, Jamie Morgenstern, Chris Russell, Jie Zhang

2022

We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization. The key intuition behind our approach is to use at each timestep a datapoint from the group that is worst off under the current model for updating the model. The ease of implementation and the generality of our robust formulation

Machine learning
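
The key intuition stated in the min-max fairness abstract above (update the model at each timestep with a datapoint from the group that is currently worst off) can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar model, the squared loss, the toy data, and every name below (group_loss, active_sampling_minmax, groups) are assumptions chosen only for illustration.

```python
import random


def group_loss(theta, ys):
    """Mean squared error of the constant prediction theta on one group."""
    return sum((theta - y) ** 2 for y in ys) / len(ys)


def active_sampling_minmax(groups, steps=500, lr=0.01, seed=0):
    """Toy active sampling loop (illustrative assumption, not the paper's code).

    At each step: find the group with the highest current loss, draw one
    datapoint from it, and take a single SGD step on the squared loss.
    """
    rng = random.Random(seed)
    theta = 0.0  # hypothetical scalar model: predicts theta for every input
    for _ in range(steps):
        # Group that is worst off under the current model.
        worst = max(groups, key=lambda g: group_loss(theta, groups[g]))
        # Sample one datapoint from that group only.
        y = rng.choice(groups[worst])
        # Gradient of (theta - y)^2 with respect to theta is 2 * (theta - y).
        theta -= lr * 2.0 * (theta - y)
    return theta


# Two hypothetical groups whose targets pull the model in opposite directions.
groups = {"a": [0.0, 0.0, 0.0], "b": [1.0, 1.0]}
theta = active_sampling_minmax(groups)
losses = {g: group_loss(theta, ys) for g, ys in groups.items()}
print(theta, losses)
```

In this toy setup the model settles near the point where both group losses are roughly equal, rather than near the average of all datapoints, which is the behavior a min-max objective is meant to produce.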

Shopping queries dataset: A large-scale ESCI benchmark for improving product search

Chandan Reddy, Nurendra Choudhary, Lluís Marquez, Fran Valero, Nikhil Rao, Hugo Zaragoza, Sambaran Bandyopadhyay, Arnab Biswas

2022

We introduce the “Shopping Queries Data Set”, a large dataset of difficult search queries, released with the aim of fostering research in the area of semantic matching of queries and products. For each query, the dataset provides a list of up to 40 potentially relevant results, together with ESCI relevance judgements (Exact, Substitute, Complement, Irrelevant) indicating the relevance of the product to

Search and information retrieval

Causal forecasting: Generalization bounds for autoregressive models

Leena Chennuru Vankadara, Philipp Faller, Michaela Hardt, Lenon Vogel, Debarghya Ghoshdastidar, Dominik Janzing

2022

Here, we study the problem of causal generalization—generalizing from the observational to interventional distributions—in forecasting. Our goal is to find answers to the question: How does the efficacy of an autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of causal learning theory for forecasting

Machine learning

Efficient classification of long documents using transformers

Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

2022

Our datasets cover binary, multi-class, and multilabel classification tasks and represent various ways information is organized in a long text (e.g. information that is critical to making the classification decision is at the beginning or toward the end of the document). Our results show that more complex models often fail to outperform simple baselines and yield inconsistent performance across datasets

Conversational AI

Task-agnostic continual RL: In praise of a simple baseline

Massimo Caccia, Jonas Mueller, Taesup Kim, Laurent Charlin, Rasool Fakoor

2022

We study task-agnostic continual reinforcement learning (TACRL) in which standard RL challenges are compounded with partial observability stemming from task agnosticism, as well as additional difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. Here we compare TACRL methods with their soft upper bounds prescribed by previous literature: multi-task learning (MTL)

Machine learning

MASSIVE

Jack G. M. FitzGerald, Chris Hench, Charith Peris, Scott Mackie, Kay Rottmann, Ana Sanchez, Aaron Nash, Liam Urbach, Vishesh Kakarala, Richa Singh, Swetha Ranganath, Laurie Crist, Misha Britan, Wouter Leeuwis, Gokhan Tur, Prem Natarajan

2022

MASSIVE is a parallel dataset of > 1M utterances across 52 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the Spoken Language Understanding Resource Package (SLURP) dataset, composed of general intelligent voice assistant single-shot interactions.

Conversational AI

Alexa Teacher Models

Jack G. M. FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tür, Wael Hamza, Jonathan Hueser, Kevin Martin Jose, Haidar Khan, Beiye(Ben) Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, Goekmen Oez, Enrico Palumbo, Charith Peris, Chandana Satya Prakash, Stephen Rawls, Andy Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Harakere, Liz Tan, Fabian Triefenbach, Pan WEI, Haiyang Yu, Shuai Zheng, Gokhan Tur, Prem Natarajan

2022

AlexaTM 20B is a 20B-Parameter sequence-to-sequence transformer model created by the Alexa Teacher Model (AlexaTM) team at Amazon. The model was trained on a mixture of Common Crawl (mC4) and Wikipedia data across 12 languages using denoising and Causal Language Modeling (CLM) tasks. AlexaTM 20B can be used for in-context learning. "In-context learning," also known as "prompting," refers to a method for

Conversational AI

Exploiting invariance in training deep neural networks

Chengxi Ye, Xiong Zhou, Tristan McKinney, Yanfeng Liu, Qinggang Zhou, Fedor Zhdanov

2022

Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks. The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and easily generalizes to different tasks. We enforce scale invariance with local statistics in the data to align similar samples

Computer vision

Combinatorial optimization with graph neural networks

Martin J. A. Schuetz, J. Kyle Brubaker, Helmut Katzgraber

2022

Combinatorial optimization problems are pervasive across science and industry. Modern deep learning tools are poised to solve these problems at unprecedented scales, but a unifying framework that incorporates insights from statistical physics is still outstanding. Here we demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to

Quantum technologies

GLASS: Global to local attention for scene-text spotting

Roi Ronen, Shahar Tsiper, Oron Anschel, Inbal Lavi, Amir Markovitz, R. Manmatha

2022

In recent years, the dominant paradigm for text spotting has been to combine the tasks of text detection and recognition into a single end-to-end framework. Under this paradigm, both tasks are accomplished by operating over a shared global feature map extracted from the input image. Among the main challenges that end-to-end approaches face is the performance degradation when recognizing text across scale variations

Computer vision

Omni-DETR: Omni-supervised object detection with transformers

Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto

2022

We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled, and weakly labeled annotations, such as image tags, counts, points, etc., for object detection. This is enabled by a unified architecture, Omni-DETR, based on recent progress in student-teacher frameworks and end-to-end transformer-based object detection. Under this unified architecture, different types

Computer vision

Real-time fraud detection with graph neural network on DGL

Jian Zhang, Haozhu Wang, Mengxin Zhu

2022

This is an end-to-end blueprint architecture for real-time fraud detection. It uses the graph database Amazon Neptune, Amazon SageMaker, and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions in the IEEE-CIS Fraud detection dataset. See this AWS blog post for more detail.

Machine learning

Intent induction from conversations for task-oriented dialogue

James Gung, Raphael Shu, Jason Krone, Salvatore Romeo, Arshit Gupta, Yassine Benajiba, Saab Mansour, Yi Zhang

2022

This repository contains data, relevant scripts and baseline code for the DSTC11 summer track on Intent Induction from Conversations for Task-Oriented Dialogue. This track aims to evaluate methods for the automatic induction of customer intents in the realistic setting of customer service interactions between human agents and customers. As complete conversations will be provided, participants can make use

Conversational AI

Rekognition stochastic backpropagation

Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia

2022

We propose a memory-efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients from incomplete execution of backpropagation can still effectively train the models with minimal accuracy loss, which is attributable to the high redundancy of video. SBP keeps all forward paths but randomly and independently removes the backward

Computer vision
