R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection

Weiran Wang, Chieh-Chi Kao, Ming Sun, Chao Wang

Interspeech 2018

2018

This paper proposes a Region-based Convolutional Recurrent Neural Network (R-CRNN) for audio event detection (AED). The proposed network is inspired by Faster-RCNN [1], a well-known region-based convolutional network framework for visual object detection. Different from the original Faster-RCNN, a recurrent layer is added on top of the convolutional network to capture the long-term temporal context from

Conversational AI

Play Duration Based User-entity Affinity Modeling in Spoken Dialog System

Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan

Interspeech 2018

2018

Multimedia streaming services over spoken dialog systems have become ubiquitous. User-entity affinity modeling is critical for the system to understand and disambiguate user intents and personalize user experiences. However, fully voice-based interaction demands quantification of novel behavioral cues to determine user affinities. In this work, we propose using play duration cues to learn a matrix factorization

Conversational AI

Parameter Generation Algorithms for Text-to-speech Synthesis With Recurrent Neural Networks

Viacheslav Klimkov, Alexis Moinet, Adam Nadolski, Thomas Drugman

SLT 2018

2018

Recurrent Neural Networks (RNN) have recently proved to be effective in acoustic modeling for TTS. Various techniques such as the Maximum Likelihood Parameter Generation (MLPG) algorithm have been naturally inherited from the HMM-based speech synthesis framework. This paper investigates in which situations parameter generation and variance restoration approaches help for RNN-based TTS. We explore how their

Conversational AI

Continual learning in practice

Tom Diethe, Tom Borchert, Eno Thereska, Borja de Balle Pigem, Cédric Archambeau, Neil Lawrence

NeurIPS 2018

2018

This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine

Machine learning

A scalable algorithm for higher-order features generation using MinHash

Pooja A, Naveen Nair, Rajeev Rastogi

CIKM 2018

2018

Linear models have been widely used in the industry for their low computation time, small memory footprint and interpretability. However, linear models are not capable of leveraging non-linear feature interactions in predicting the target. This limits their performance. A classical approach to overcome this limitation is to use combinations of the original features, referred to as higher-order features,

Machine learning

Combining Acoustic Embeddings and Decoding Features for End-of-Utterance Detection in Real-Time Far-Field Speech Recognition Systems

Roland Maas, Ariya Rastrow, Chengyuan Ma, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister

ICASSP 2018

2018

We present an end-of-utterance detector for real-time automatic speech recognition in far-field scenarios. The proposed system consists of three components: a long short-term memory (LSTM) neural network trained on acoustic features, an LSTM trained on 1-best recognition hypotheses of the automatic speech recognition (ASR) decoder, and a feedforward deep neural network (DNN) combining embeddings derived

Conversational AI

Privacy amplification by subsampling: Tight analyses via couplings and divergences

Borja de Balle Pigem, Gilles Barthe, Marco Gaboardi

NeurIPS 2018

2018

Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called “privacy amplification by subsampling” principle, which ensures that a differentially private mechanism run on a random subsample of a population provides higher privacy guarantees than when run on the entire population. Several instances of this principle have

Security, privacy, and abuse prevention

The effectiveness of a two-layer neural network for recommendations

Oleg Rybakov, Vijai Mohan, Avishkar Misra, Scott LeGrand, Rejith Joseph, Kiuk Chung, Siddharth Singh, Qian You, Eric Nalisnick, Runfei Luo

ICLR 2018

2018

We present a personalized recommender system using a neural network for recommending products, such as eBooks, audio-books, Mobile Apps, Video and Music. It produces recommendations based on a customer’s implicit feedback history such as purchases, listens or watches. Our key contribution is to formulate the recommendation problem as a model that encodes historical behavior to predict the future behavior using

Search and information retrieval

Multilayer Adaptation Based Complex Echo Cancellation and Voice Enhancement

Jun Yang

ICASSP 2018

2018

The paper proposes an efficient signal processing system mainly consisting of an adaptation-based nonlinear echo cancellation (NLEC) layer and a joint perceptual subband residual echo suppression (SBRES) layer and noise reduction (SBNR) layer. The theoretical analyses, subjective and objective test results show that the proposed signal processing system can offer a significant improvement for automatic

Conversational AI

Multi-Task Learning For Parsing The Alexa Meaning Representation Language

Vittorio Perera, Tagyoung Chung, Thomas Kollar, Emma Strubell

AAAI 2018

2018

The Alexa Meaning Representation Language (AMRL) is a compositional graph-based semantic representation that includes fine-grained types, properties, actions, and roles and can represent a wide variety of spoken language. AMRL increases the ability of virtual assistants to represent more complex requests, including logical and conditional statements as well as ones with nested clauses. Due to this representational

Conversational AI

Can 3D pose be learned from 2D projections alone?

Dylan Drover, Rohith MV, Ching-Hang Chen, Amit Agrawal, Ambrish Tyagi, Cong Phuoc Huynh

ECCV 2018

2018

3D pose estimation from a single image is a challenging task in computer vision. We present a weakly supervised approach to estimate 3D pose points, given only 2D pose landmarks. Our method does not require correspondences between 2D and 3D points to build explicit 3D priors. We utilize an adversarial framework to impose a prior on the 3D structure, learned solely from their random 2D projections. Given

Computer vision

CRAFT: Complementary recommendation by adversarial feature transform

Cong Phuoc Huynh, Arridhana Ciptadi, Ambrish Tyagi, Amit Agrawal

ECCV 2018

2018

We propose a framework that harnesses visual cues in an unsupervised manner to learn the co-occurrence distribution of items in real-world images for complementary recommendation. Our model learns a non-linear transformation between the two manifolds of source and target item categories (e.g., tops and bottoms in outfits). Given a large dataset of images containing instances of co-occurring items, we train

Computer vision

Learning fashion by simulated human supervision

Eli Alshan, Sharon Alpert, Assaf Neuberger, Nathaniel Bubis, Eduard Oks

CVPR 2018

2018

We consider the task of predicting subjective fashion traits from images using neural networks. Specifically, we are interested in training a network for ranking outfits according to how well they fit the user. In order to capture the variability induced by human subjective considerations, each training example is annotated by a panel of fashion experts. Similarly to previous works on subjective data, the

Computer vision

Question type guided attention in visual question answering

Yang Shi, Tommaso Furlanello, Sheng Zha, Animashree Anandkumar

ECCV 2018

2018

Visual Question Answering (VQA) requires integration of feature maps with drastically different structures. Image descriptors have structures at multiple spatial scales, while lexical inputs inherently follow a temporal sequence and naturally cluster into semantically different question types. A lot of previous works use complex models to extract feature representations but neglect to use high-level information

Computer vision

Context encoding for semantic segmentation

Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal

CVPR 2018

2018

Recent work has made significant progress in improving spatial resolution for pixelwise labeling with the Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic

Computer vision

Demand-weighted completeness prediction for a knowledge base

Andrew Hopkinson, Amit Gurdasani, Dave Palfrey, Arpit Mittal

NAACL 2018

2018

In this paper we introduce the notion of Demand-Weighted Completeness, allowing estimation of the completeness of a knowledge base with respect to how it is used. Defining an entity by its classes, we employ usage data to predict the distribution over relations for that entity. For example, instances of person in a knowledge base may require a birth date, name and nationality to be considered complete.

Conversational AI

Learning word embeddings for low-resource languages by PU learning

Chao Jiang, Cho-Jui Hsieh, Hsiang-Fu Yu, Kai-Wei Chang

NAACL 2018

2018

Word embedding is a key component in many downstream applications in processing natural languages. Existing approaches often assume the existence of a large collection of text for learning effective word embedding. However, such a corpus may not be available for some low-resource languages. In this paper, we study how to effectively learn a word embedding model on a corpus with only a few million tokens

Conversational AI

Unsupervised induction of linguistic categories with records of reading, speaking, and writing

Maria Barrett, Lea Frermann, Ana Valeria Gonzalez-Garduño, Anders Søgaard

NAACL 2018

2018

When learning POS taggers and syntactic chunkers for low-resource languages, different resources may be available, and often all we have is a small tag dictionary, motivating type-constrained unsupervised induction. Even small dictionaries can improve the performance of unsupervised induction algorithms. This paper shows that performance can be further improved by including data that is readily available

Conversational AI

Automatic stance detection using end-to-end memory networks

Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluís Marquez, Alessandro Moschitti

NAACL 2018

2018

We present an effective end-to-end memory network model that jointly (i) predicts whether a given document can be considered as relevant evidence for a given claim, and (ii) extracts snippets of evidence that can be used to reason about the factuality of the target claim. Our model combines the advantages of convolutional and recurrent neural networks as part of a memory network. We further introduce a

Conversational AI

The Alexa Meaning Representation Language

Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, Spyros Matsoukas

NAACL 2018

2018

This paper introduces a meaning representation for spoken language understanding. The Alexa meaning representation language (AMRL), unlike previous approaches, which factor spoken utterances into domains, provides a common representation for how people communicate in spoken language. AMRL is a rooted graph, links to a large-scale ontology, supports cross-domain queries, fine-grained types, complex utterances

Conversational AI
