Search - Amazon Science

EmBERT: A transformer model for embodied, language-guided visual task completion

Alessandro Suglia, Qiaozi (QZ) Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme

2021

We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion. Additionally, we bridge the gap between successful object-centric navigation models used for non-interactive agents and the language-guided visual task completion benchmark, ALFRED, by introducing object navigation targets

Conversational AI

A statistical extension of byte-pair encoding

David Vilar, Marcello Federico

2021

Sub-word segmentation is currently a standard tool for training neural machine translation (MT) systems and other NLP tasks. The goal is to split words (both in the source and target languages) into smaller units which then constitute the input and output vocabularies of the MT system. The aim of reducing the size of the input and output vocabularies is to increase the generalization capabilities of the

Conversational AI
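
To make the sub-word segmentation idea in the entry above concrete, here is a minimal sketch of classic greedy byte-pair encoding, the baseline that the listed work extends. It is not the statistical extension itself. The function names (`learn_bpe`, `merge_pair`, `segment`), the toy word counts, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} dictionary."""
    # Each word starts as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken lexicographically
        # (an arbitrary choice made here only for determinism).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split `word` into sub-word units by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus: word -> frequency.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_counts, num_merges=3)
print(toy_merges)
print(segment("lowest", toy_merges))
```

The number of merges controls the vocabulary size: fewer merges give smaller units and a smaller vocabulary, which is the trade-off the entry describes.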

Adversarial robustness with non-uniform perturbations

Ecenaz Erdemir, Jeffrey Bickford, Luca Melis, Sergul Aydore

2021

The key idea of our proposed approach is to enable non-uniform perturbations that can adequately represent these feature dependencies during adversarial training. We propose using characteristics of the empirical data distribution, both on correlations between the features and the importance of the features themselves. Using experimental datasets for malware classification, credit risk prediction, and spam

Machine learning

Attention-based contextual language modeling adaptation

Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

2021

This project provides the source to reproduce the main methods of the paper "Attention-based contextual language model adaptation for speech recognition", submitted to ACL 2021. This codebase also implements additional functionality that was not explicitly described in the paper, such as experimental methods for combining multiple types of non-linguistic context together (e.g. geo-location, and datetime

Conversational AI

Gender-filtered self-training (GFST) for NMT

Prafulla Kumar Choubey, Anna Currey, Prashant Mathur, Georgiana Dinu

2021

We propose gender-filtered self-training (GFST) to improve gender translation accuracy on unambiguously gendered inputs. Our GFST approach uses a source monolingual corpus and an initial model to generate gender-specific pseudo-parallel corpora which are then filtered and added to the training data. We evaluate GFST on translation from English into five languages, finding that it improves gender accuracy

Conversational AI

FeatGraph: Sparse kernels for GNNs based on TVM

Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang

2021

This paper proposes FeatGraph to accelerate GNN workloads by co-optimizing graph traversal and feature dimension computation. FeatGraph provides a flexible programming interface to express diverse GNN models by composing coarse-grained sparse templates with fine-grained user-defined functions (UDFs) on each vertex/edge. FeatGraph incorporates optimizations for graph traversal into the sparse templates and

Cloud and systems

Proteno

Shubhi Tyagi, Antonio Bonafonte, Jaime Lorenzo Trueba, Javier Latorre

2021

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on new languages is hard. We propose a novel architecture to facilitate it for multiple languages while using less than 3% of the data used by the state-of-the-art results on English. We treat TN as a sequence classification problem and propose a granular tokenization mechanism that enables the system to learn majority

Conversational AI

LUMINOUS: Indoor scene generation for embodied AI challenges

Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi (QZ) Gao, Govind Thattai, Jesse Thomason, Gaurav Sukhatme

2021

Luminous is a framework for testing the performance of embodied AI (EAI) models in indoor tasks. This repository integrates several kinds of functionality related to evaluating EAI performance on indoor tasks. The Indoor Scene Synthesis module provides different methods for synthesizing randomized indoor scenes that can be visualized in Unity Engine. The Luminous for Alfred offers

Computer vision

TANL: Structured prediction as translation between augmented natural languages

Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto

2021

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative

Conversational AI

CrossCLR: Cross-modal contrastive learning for multi-modal video representations

Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox

2021

Contrastive learning allows us to flexibly define powerful losses by contrasting positive pairs from sets of negative samples. Recently, the principle has also been used to learn cross-modal embeddings for video and text, yet without exploiting its full potential. In particular, previous losses do not take the intra-modality similarities into account, which leads to inefficient embeddings, as the same content

Computer vision
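
As a rough illustration of "contrasting positive pairs from sets of negative samples" in the entry above, the sketch below computes a generic cross-modal InfoNCE-style loss over paired video and text embeddings. It is the standard baseline formulation, not the CrossCLR loss, which per the abstract also accounts for intra-modality similarities. The function names, the temperature value, and the toy embeddings are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(video_embs, text_embs, temperature=0.1):
    """Mean video-to-text InfoNCE loss.

    Video i and text i form the positive pair; every other text in the
    batch acts as a negative for video i.
    """
    n = len(video_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(video_embs[i], text_embs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true partner.
        total += log_denominator - logits[i]
    return total / n


# Hypothetical toy embeddings: two videos and their matching texts.
videos = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(videos, texts_aligned)
loss_shuffled = info_nce(videos, texts_shuffled)
print(loss_aligned < loss_shuffled)
```

The loss is small when each video is most similar to its own text and grows when the pairing is wrong, which is the behaviour a contrastive objective is meant to reward.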

Uniform sampling over episode difficulty

Sébastien M. R. Arnold, Guneet Singh Dhillon, Avinash Ravichandran, Stefano Soatto

2021

Episodic training is a core ingredient of few-shot learning to train models on tasks with limited labelled data. Despite its success, episodic training remains largely understudied, prompting us to ask the question: what is the best way to sample episodes? In this paper, we first propose a method to approximate episode sampling distributions based on their difficulty. Building on this method, we perform

Computer vision

Amazon DenseClus

Charles Frenzel, Baichuan Sun, Eden Duthie, Yin Song

2021

DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN. Allowing for both categorical and numerical data, DenseClus makes it possible to incorporate all features in clustering.

Machine learning

DSTC10 Track 2 - Knowledge-grounded task-oriented dialogue modeling on spoken conversations

Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Behnam Hedayatnia, Karthik Gopalakrishnan, Dilek Hakkani-Tür

2021

A lot of recent work in dialogue modeling has been on written conversations, partly because of available data sets. However, written dialogues are not sufficient to fully capture the nature of spoken conversations as well as the potential effect of speech recognition errors on practical spoken dialogue systems. This challenge track aims to provide a new benchmark on spoken task-oriented conversations. We

Conversational AI

Learning better visual dialog agents with pretrained visual-linguistic representation

Tao Tu, Qing Ping, Govind Thattai, Gokhan Tur, Prem Natarajan

2021

GuessWhat?! is a visual dialog guessing game which incorporates a Questioner agent that generates a sequence of questions, while an Oracle agent answers the respective questions about a target object in an image. Based on this dialog history between the Questioner and the Oracle, a Guesser agent makes a final guess of the target object. While previous work has focused on dialogue policy optimization and

Computer vision

Question answering using web lists

Anoop Katti, Kai Hui, Adrià de Gispert, Hagen Fuerstenau

2021

This repository contains the ListQA datasets described in the paper "Question Answering using Web Lists". The two datasets, NQWebList and GQWebList, use a subset of questions from Natural Questions and GooAQ respectively. To build these datasets, each annotator was shown a question and a relevant URL from the web and was asked to annotate the list answer on the URL, if it exists. For annotating a list, the annotators

Conversational AI

Multi-Agent Experience (MAX) Toolkit

David Henry, Sriram Ramakrishnan, Ryan Sherritt, Alok Bhat, Jayalakshmi Surendran, Aneesh Bhat, Akhilesh Srikakulapu, Ankit Sablok, Teja Theegela, Sam Coulter

2021

The MAX Toolkit provides software which aims to accelerate the development of devices which integrate multiple voice agents. The Toolkit provides guidance to both device makers and agent developers towards this goal.

Conversational AI

Contextual answer sentence selection

Ivano Lauriola, Alessandro Moschitti, Stefano Campese

2021

Coala is a Python package for Contextual Answer Sentence Selection. Answer Sentence Selection (AS2) is a sub-task of question answering, and it aims at finding the sentence containing the answer for a given input question from a pool of possible candidate sentences (e.g. retrieved from a search engine). In our contextual AS2 task, we provide models that leverage contextual information coming from the documents

Search and information retrieval

Hidden biases in unreliable news detection datasets

Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

2021

Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these datasets. The code is tested on Python 3.7 and PyTorch 1.6.0

Conversational AI

WOW++

Mihail Eric, Nicole Burnstein (Chartier), Behnam Hedayatnia, Karthik Gopalakrishnan, Pankaj Rajan, Yang Liu, Dilek Hakkani-Tür

2021

Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task (Dinan et al., 2019), such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate

Conversational AI

VISITRON: Visual semantics-aligned interactively trained object-navigator

Ayush Shrivastava, Karthik Gopalakrishnan, Yang Liu, Robinson Piramuthu, Gokhan Tur, Devi Parikh, Dilek Hakkani-Tür

2021

Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN). In this paper, we present VISITRON, a multi-modal Transformer-based navigator better suited to the interactive regime inherent to Cooperative Vision-and-Dialog Navigation (CVDN). VISITRON

Computer vision
