Publications

Amazon is a great place to practice science and have real business impact, but that's only one part of the story. Our scientists continue to publish, teach, and engage with the worldwide research community, sharing insights across diverse disciplines from machine learning to operations research. Through these contributions, we're advancing scientific knowledge while developing innovations that address complex challenges for customers and society.

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

Alex Sokolov, Tracy Rohlin, Ariya Rastrow

Interspeech 2019

2019

Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" → "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence-based n-gram models. As an alternative, we

Machine learning

Time Masking: Leveraging Temporal Information in Spoken Dialogue Systems

Rylan Conway, Lambert Mathias

SIGDIAL 2019

2019

In a spoken-dialogue system, dialogue state tracker (DST) components track the state of the conversation by updating a distribution of values associated with each of the slots being tracked for the current user turn, using the interactions until then. Much of the previous work has relied on modeling the natural order of the conversation, using distance based offsets as an approximation of time. In this

Conversational AI

Searching for fashion products from images in the wild

Son Tran, R. Manmatha, C. J. Taylor

KDD 2019 Workshop on AI for Fashion

2019

In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits, and their followers are often inspired to buy similar clothes. We propose a system to automatically find the closest visually similar clothes in the online catalog (street-to-shop searching). The problem is challenging since

Computer vision

A fast & simple RFI mitigation method without compromising signal integrity

Jagan Rajagopalan, Deepak Pai, Amit Gaikwad, Chen Chen

DesignCon 2019

2019

In modern wireless consumer electronic devices, there is an increasing need for smaller, more compact, and denser designs. This often requires wireless components such as the transceiver, front end, and antenna to be placed very close to noise sources such as memory, the power supply, and the main processor in the device. Electromagnetic noise from these sources interferes with wireless receiver components, causing radio frequency

GluonTS: Probabilistic Time Series Models in Python

Valentin Flunkert, Alexander Alexandrov, Jasper Schulz, Jan Gasthaus, David Salinas, Danielle Maddix Robinson, Yuyang (Bernie) Wang, Syama Rangapuram, Lorenzo Stella, Michael Bohlke-Schneider, Konstantinos Benidis, Tim Januschowski

ICML 2019 Workshop on Time Series

2019

We introduce Gluon Time Series (GluonTS), a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments and for evaluating

Cloud and systems

Unsupervised 3D Pose Estimation with Geometric Self-Supervision

Dylan Drover, Ching-Hang Chen, Rohith MV, Amit Agrawal, Stefan Stojanov, Ambrish Tyagi

CVPR 2019

2019

We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require any multiview image data, 3D skeletons, correspondences between 2D-3D points, or use previously learned 3D priors during training. A lifting network accepts 2D landmarks as inputs and generates a corresponding 3D skeleton estimate. During training, the

Computer vision

Variational information distillation for knowledge transfer

Sung-soo Ahn, Shell Hu, Andreas Damianou, Neil Lawrence, Zhenwen Dai

CVPR 2019

2019

Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding handcrafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer

Computer vision

Spatial acoustic modeling invariant to multiple microphone array geometries

Kenichi Kumatani, Wu Minhua, Shiva Sundaram, Nikko Ström, Björn Hoffmeister

ICASSP 2019

2019

The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between

Conversational AI

One-click formal methods

John Backes, Byron Cook, Andrew Gacek, Neha Rungta, Mike Whalen

ICST 2020, IEEE Software

2019

Formal methods are mathematically based approaches for specifying, building, and reasoning about software. Despite 50 years of research and development, formal methods have had only limited impact in industry. While we have seen success in such domains as microprocessor design and aerospace (e.g., proofs of security properties for helicopter control systems), we have not seen wide adoption of formal methods

Security, privacy, and abuse prevention

Detecting Sensitive Content in Spoken Language

Rahul Tripathi, Balaji Dhamodharaswamy, Srinivasan Jagannathan, Abhishek Nandi

DSAA 2019, IEEE DSAA 2019

2019

Spoken language can include sensitive topics such as profanity, insults, and political and offensive speech. In order to engage in contextually appropriate conversations, it is essential for voice services such as Alexa, Google Assistant, Siri, etc. to detect sensitive topics in the conversations and react appropriately. A simple approach to detect sensitive topics is to use regular expression or keyword

Conversational AI

Neural Named Entity Recognition from Subword Units

Abdalghani Abujabal, Judith Gaspers

Interspeech 2019

2019

Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g., from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations, which suffer from two main shortcomings. First, the vocabulary size is large, yielding large memory requirements and training time. Second, these models

Conversational AI

Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles

Prabhakar Gupta, Mayank Sharma

International Journal of Semantic Computing

2019

We demonstrate the potential for using aligned bilingual word embeddings in developing an unsupervised method to evaluate machine translations without a need for a parallel translation corpus or reference corpus. We explain different aspects of digital entertainment content subtitles. We share our experimental results for four language pairs (English to French, German, Portuguese, and Spanish) and present

Conversational AI

Dynamic local regret for non-convex online forecasting

Sergul Aydore, Tianhao Zhu, Dean Foster

NeurIPS 2019

2019

We consider online forecasting problems for non-convex machine learning models. Forecasting introduces several challenges such as (i) frequent updates are necessary to deal with concept drift issues since the dynamics of the environment change over time, and (ii) the state-of-the-art models are non-convex models. We address these challenges with a novel regret framework. Standard regret measures commonly

Operations research and optimization

Joint visual-textual embedding for multimodal style search

Gil Sadeh, Lior Fritz, Gabi Shalev, Eduard Oks

CVPR 2019 Workshop on Language and Vision

2019

We introduce a multimodal visual-textual search refinement method for fashion garments. Existing search engines do not enable intuitive, interactive refinement of retrieved results based on the properties of a particular product. We propose a method to retrieve similar items, based on a query item image and textual refinement properties. We believe this method can be leveraged to solve many real-life customer

Computer vision

Generating diverse and informative natural language fashion feedback

Gil Sadeh, Lior Fritz, Gabi Shalev, Eduard Oks

CVPR 2019 Workshop on Language and Vision

2019

Recent advances in multi-modal vision and language tasks enable a new set of applications. In this paper, we consider the task of generating natural language fashion feedback on outfit images. We collect a unique dataset, which contains outfit images and corresponding positive and constructive fashion feedback. We treat each feedback type separately, and train deep generative encoder-decoder models with

Computer vision
