Search - Amazon Science

Eigen

University of California, Berkeley

The UC Berkeley team consists of William, Phillip, Piyush, and James, and is the only fully undergraduate team in the competition.

Magnus

Carnegie Mellon University

This team, consisting of PhD and Master's students from Carnegie Mellon University, brings together experience in chatbot interaction strategies, question answering, neural modeling, and machine learning.

Alquist (2017)

Czech Technical University in Prague

We are the Alquist team from CTU, Prague, Czech Republic.

Edina

University of Edinburgh

We are Edina, from the University of Edinburgh, a world-leading institution in Artificial Intelligence.

What's Up Bot

Heriot-Watt University

Our international team of 6 PhD students and faculty advisors has a wide range of experience from both academic and industrial research.

Wise Macaw

Rensselaer Polytechnic Institute

We are five graduate and undergraduate students of cognitive science, computer science, and applied physics from Rensselaer Polytechnic Institute.

Chatty Chat

Seoul National University

Our team has been developed from a deep learning study group at SNU.

DeisBot

Brandeis University

The DeisBot team consists of seven graduate students in the Computational Linguistics department at Brandeis University.

Model compression applied to small-footprint keyword spotting

George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni

Interspeech 2016

2016

Several consumer speech devices feature voice interfaces that perform on-device keyword spotting to initiate user interactions. Accurate on-device keyword spotting within a tight CPU budget is crucial for such devices. Motivated by this, we investigated two ways to improve deep neural network (DNN) acoustic models for keyword spotting without increasing CPU usage. First, we used low-rank weight matrices

Conversational AI
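
As a rough illustration of the low-rank weight matrices mentioned in the abstract above, the sketch below compares the multiply-accumulate cost of a full layer with that of a rank-r factorization W ≈ A·B. The layer sizes, the rank, and all function names are hypothetical and are not taken from the paper.

```python
def dense_macs(n_in: int, n_out: int) -> int:
    """Multiply-accumulates for one full n_out x n_in weight matrix."""
    return n_in * n_out


def low_rank_macs(n_in: int, n_out: int, rank: int) -> int:
    """Multiply-accumulates when W is replaced by A (n_out x rank) times B (rank x n_in)."""
    return rank * (n_in + n_out)


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def low_rank_forward(a, b, x):
    """Compute y = A (B x) without ever forming the full matrix A·B."""
    return matvec(a, matvec(b, x))


if __name__ == "__main__":
    # Hypothetical 512 x 512 layer factored at rank 64.
    print(dense_macs(512, 512), low_rank_macs(512, 512, 64))
```

The saving only appears when the rank is well below both layer dimensions; how much accuracy a given rank preserves is an empirical question this sketch does not address.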

Max-Pooling Loss Trained Long Short Term Memory Network For Small-Footprint Keyword Spotting

Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Ström, Shiv Vitaladevuni

SLT 2016

2016

We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental

Conversational AI

Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting

Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni

Interspeech 2016

2016

We propose improved Deep Neural Network (DNN) training loss functions for more accurate single keyword spotting on resource-constrained embedded devices. The loss function modifications consist of a combination of multi-task training and weighted cross entropy. In the multi-task architecture, the keyword DNN acoustic model is trained with two tasks in parallel - the main task of predicting the keyword-specific

Conversational AI
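
To make the term "weighted cross entropy" from the abstract above concrete, here is a minimal sketch of a per-class weighted frame loss. The weighting scheme, the class indices, and the function name are assumptions for illustration; the paper's exact formulation is not given in the excerpt.

```python
import math


def weighted_cross_entropy(posteriors, labels, class_weights):
    """Average of -w[label] * log p(label) over frames.

    posteriors: one probability vector per frame.
    labels: target class index per frame.
    class_weights: hypothetical per-class weights (a weight of 1.0 everywhere
    gives ordinary cross entropy).
    """
    total = 0.0
    for probs, label in zip(posteriors, labels):
        total += -class_weights[label] * math.log(probs[label])
    return total / len(labels)


if __name__ == "__main__":
    frames = [[0.9, 0.1], [0.3, 0.7]]
    print(weighted_cross_entropy(frames, [0, 1], [1.0, 2.0]))
```

Raising the weight of one class makes errors on that class cost more during training, which is one way to trade off the two kinds of detection error.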

Optimizing Speech Recognition Evaluation Using Stratified Sampling

Janne Pylkkonen, Thomas Drugman, Max Bisani

Interspeech 2016

2016

Producing large enough quantities of high-quality transcriptions for accurate and reliable evaluation of an automatic speech recognition (ASR) system can be costly. It is therefore desirable to minimize the manual transcription work for producing metrics with an agreed precision. In this paper we demonstrate how to improve ASR evaluation precision using stratified sampling. We show that by altering the

Conversational AI
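
The sketch below illustrates why stratified sampling, as named in the abstract above, can tighten an evaluation metric for a fixed transcription budget. It uses textbook variance formulas and Neyman allocation, which is a standard choice and not necessarily the method of the paper; the strata sizes, mean error rates, and standard deviations are invented for illustration.

```python
def srs_variance(strata, n):
    """Variance of the sample mean under simple random sampling of n items.

    strata: list of (size, mean, std) tuples describing the population.
    Finite-population correction is ignored for simplicity.
    """
    total = sum(size for size, _, _ in strata)
    mean = sum(size * mu for size, mu, _ in strata) / total
    # Population variance = within-stratum part + between-stratum part.
    var = sum(size * (sd ** 2 + (mu - mean) ** 2) for size, mu, sd in strata) / total
    return var / n


def neyman_allocation(strata, n):
    """Split n samples across strata in proportion to size * std."""
    weights = [size * sd for size, _, sd in strata]
    scale = sum(weights)
    return [n * w / scale for w in weights]


def stratified_variance(strata, allocation):
    """Variance of the stratified mean estimator for a given allocation."""
    total = sum(size for size, _, _ in strata)
    return sum(
        (size / total) ** 2 * sd ** 2 / n_h
        for (size, _, sd), n_h in zip(strata, allocation)
    )


if __name__ == "__main__":
    # Hypothetical population: many easy utterances, fewer hard ones.
    strata = [(800, 0.05, 0.1), (200, 0.40, 0.3)]
    alloc = neyman_allocation(strata, 100)
    print(srs_variance(strata, 100), stratified_variance(strata, alloc))
```

With these made-up numbers the stratified estimator has less than half the variance of simple random sampling, because the between-stratum spread no longer contributes to the error.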

Search-based Evaluation from Truth Transcripts for Voice Search Applications

Francois Mairesse, Paul Raccuglia, Shiv Vitaladevuni

SIGIR 2016

2016

Voice search applications are typically evaluated by comparing the predicted query to a reference human transcript, regardless of the search results returned by the query. While we find that an exact transcript match is highly indicative of user satisfaction, a transcript which does not match the reference still produces satisfactory search results a significant fraction of the time. This paper therefore

Conversational AI

Kalman Folding 5: Non-linear models and the EKF

Brian Beckman

ACM 2016

2016

We exhibit a foldable Extended Kalman Filter that internally integrates non-linear equations of motion with a nested fold of generic integrators over lazy streams in constant memory. Functional form allows us to switch integrators easily and to diagnose filter divergence accurately, achieving orders of magnitude better speed than the source example from the literature. As with all Kalman folds, we can move

Cloud and systems

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models

Thomas Drugman, Janne Pylkkonen, Reinhard Kneser

Interspeech 2016

2016

The goal of this paper is to simulate the benefits of jointly applying active learning (AL) and semi-supervised training (SST) in a new speech recognition application. Our data selection approach relies on confidence filtering, and its impact on both the acoustic and language models (AM and LM) is studied. While AL is known to be beneficial to AM training, we show that it also carries out substantial improvements

Conversational AI

LATTICE RNN: Recurrent Neural Networks over Lattices

Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister

Interspeech 2016

2016

We present a new model called LATTICERNN, which generalizes recurrent neural networks (RNNs) to process weighted lattices as input, instead of sequences. A LATTICERNN can encode the complete structure of a lattice into a dense representation, which makes it suitable to a variety of problems, including rescoring, classifying, parsing, or translating lattices using deep neural networks (DNNs). In this paper

Conversational AI

Anchored speech detection

Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister

Interspeech 2016

2016

We propose two new methods of speech detection in the context of voice-controlled far-field appliances. While conventional detection methods are designed to differentiate between speech and nonspeech, we aim at distinguishing desired speech, which we define as speech originating from the person interacting with the device, from background noise and interfering talkers. Our two proposed methods use the first

Conversational AI

Adaptive, personalized diversity for visual discovery

Choon Hui Teo, Houssam Nassif, Daniel N. Hill, Sriram Srinivasan, Mitchell Goodman, Vijai Mohan, S. V. N. Vishwanathan

RecSys 2016

2016

Search queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or if the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios while offering the user an engaging shopping experience. Here we explore extensions in the direction of adaptive personalization and item diversification

Search and information retrieval

Amazon Search: The joy of ranking products

Daria Sorokina, Erick Cantú-Paz

SIGIR 2016

2016

Amazon is one of the world’s largest e-commerce sites and Amazon Search powers the majority of Amazon’s sales. As a consequence, even small improvements in relevance ranking both positively influence the shopping experience of millions of customers and significantly impact revenue. In the past, Amazon’s product search engine consisted of several handtuned ranking functions using a handful of input features

Search and information retrieval

Efficient exploration of text regions in natural scene images using adaptive image sampling

Ismet Zeki Yalniz, Douglas Gray, R. Manmatha

ECCV 2016

2016

An adaptive image sampling framework is proposed for identifying text regions in natural scene images. A small fraction of the pixels actually correspond to text regions. It is desirable to eliminate non-text regions at the early stages of text detection. First, the image is sampled row-by-row at a specific rate and each row is tested for containing text using a 1D adaptation of the Maximally Stable Extremal

Computer vision
