Search - Amazon Science

Mutual information guided distillation for transfer learning

Sung-soo Ahn, Shell Hu, Zhenwen Dai, Andreas Damianou, Neil Lawrence

NeurIPS 2018

2018

We consider the teacher-student framework for knowledge transfer, where the goal is to improve learning of a “student” neural network, given a “teacher” neural network pretrained on the same or a similar task. The majority of existing approaches for distilling knowledge from a teacher network to a student network rely on matching either activations or handcrafted features from the teacher network. Instead

Machine learning

Facilitating Bayesian continual learning by natural gradients and Stein gradients

Yu Chen, Tom Diethe, Neil Lawrence

NeurIPS 2018

2018

Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: (i) posterior

Machine learning

Structured variational auto-encoded optimization

Xiaoyu Lu, Javier González, Zhenwen Dai, Neil Lawrence

ICML 2018

2018

We tackle the problem of optimizing a blackbox objective function defined over a highly structured input space. This problem is ubiquitous in machine learning. Inferring the structure of a neural network or the Automatic Statistician (AS), where the kernel combination for a Gaussian process is optimized, are two of many possible examples. We use the AS as a case study to describe our approach, that can

Machine learning

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, Alex Smola

ICML 2018

2018

Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the

Machine learning

signSGD: compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Animashree Anandkumar

ICML 2018

2018

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients

Machine learning

Efficient deep learning inference on edge devices

Ziheng Jiang, Tianqi Chen, Mu Li

SysML 2018

2018

Deploying deep learning (DL) models on edge devices is increasingly popular. However, the huge diversity of edge devices, with both computation and memory constraints, makes efficient deployment challenging. In this paper, we propose a two-stage pipeline that optimizes DL models on target devices. The first stage optimizes the inference workloads, and the second stage searches optimal kernel implementations

Machine learning

Semi-supervised learning on data streams via temporal label propagation

Tal Wagner, Sudipto Guha, Shiva Kasiviswanathan, Nina Mishra

ICML 2018

2018

We consider the problem of labeling points on a fast-moving data stream when only a small number of labeled examples are available. In our setting, incoming points must be processed efficiently and the stream is too large to store in its entirety. We present a semi-supervised learning algorithm for this task. The algorithm maintains a small synopsis of the stream which can be quickly updated as new points

Machine learning

Learning fashion traits with label uncertainty

Assaf Neuberger, Sharon Alpert, Eli Alshan, Nati Bubis, Eduard Oks

CVPR 2018

2018

We consider the task of predicting subjective fashion traits from images. Specifically, we are interested in understanding which outfit actually better suits the user. Since these traits are highly subjective, they tend to be noisier. One solution is to annotate each example several times, but this makes it hard to collect large amounts of data.

Machine learning

Online sparse linear regression

Dean Foster, Satyen Kale, Howard Karloff

STOC 2014

2018

We consider the online sparse linear regression problem, which is the problem of sequentially making predictions observing only a limited number of features in each round, to minimize regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss. We give an inefficient algorithm that obtains regret bounded by Õ(√T) after T prediction rounds. We complement

Machine learning

Neural Machine Translation For Paraphrase Generation

Alex Sokolov, Denis Filimonov

NeurIPS 2018

2018

Training a spoken language understanding system, such as the one in Alexa, typically requires a large human-annotated corpus of data. Manual annotations are expensive and time consuming. In the Alexa Skills Kit (ASK), user experience with a skill greatly depends on the amount of data provided by the skill developer. In this work, we present an automatic natural language generation system, capable of generating both

Conversational AI

A call for clarity in reporting BLEU Scores

Matt Post

WMT 2018

2018

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric. Although people refer to “the” BLEU score, BLEU is in fact a parameterized metric whose values can vary wildly with changes to these parameters. These parameters are often not reported or are hard to find, and consequently, BLEU scores between papers cannot be

Conversational AI

Detecting offensive content in open-domain conversations using two stage semi-supervision

Chandra Khatri, Behnam Hedayatnia, Rahul Goel, Anushree Venkatesh, Raefer Gabriel, Arindam Mandal

NeurIPS 2018

2018

As open-ended human-chatbot interaction becomes commonplace, sensitive content detection gains importance. In this work, we propose a two stage semi-supervised approach to bootstrap large-scale data for automatic sensitive language detection from publicly available web resources. We explore various data selection methods including 1) using a blacklist to rank online discussion forums by the level of their

Conversational AI

Contextual topic modeling for conversational agents

Behnam Hedayatnia, Chandra Khatri, Rahul Goel, Anushree Venkatesh, Angeliki Metallinou

SLT 2018

2018

Accurate prediction of conversation topics can be a valuable signal for creating coherent and engaging dialog systems. In this work, we focus on context-aware topic classification methods for identifying topics in free-form human-chatbot dialogs. We extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features

Conversational AI

Coupled representation learning for domains, intents and slots in spoken language understanding

Jihwan Lee

SLT 2018

2018

Representation learning is an essential problem in a wide range of applications and it is important for performing downstream tasks successfully. In this paper, we propose a new model that learns coupled representations of domains, intents, and slots by taking advantage of their hierarchical dependency in a Spoken Language Understanding system. Our proposed model learns the vector representation of intents

Conversational AI

Scalable language model adaptation for spoken dialogue systems

Ankur Gandhe, Ariya Rastrow, Björn Hoffmeister

SLT 2018

2018

Language models (LM) for interactive speech recognition systems are trained on large amounts of data and the model parameters are optimized on past user data. New application intents and interaction types are released for these systems over time, imposing challenges to adapt the LMs since the existing training data is no longer sufficient to model the future user interactions. It is unclear how to adapt

Conversational AI

Direct optimization of F-measure for retrieval-based personal question answering

Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-Rahman Mohamed, Ruhi Sarikaya

SLT 2018

2018

Recent advances in spoken language technologies and the introduction of many customer-facing products have given rise to wide customer reliance on smart personal assistants for many of their daily tasks. In this paper, we present a system to reduce users’ cognitive load by extending personal assistants with long-term personal memory where users can store and retrieve by voice, arbitrary pieces of information

Conversational AI

Design Challenges in Robust and Multilingual Named Entity Transliteration

Yuval Merhav, Steve Ash

ICCL 2018

2018

We analyze some of the fundamental design challenges that impact the development of a multilingual state-of-the-art named entity transliteration system, including curating bilingual named entity datasets and evaluation of multiple transliteration methods. We empirically evaluate the transliteration task using the traditional weighted finite state transducer (WFST) approach against two neural approaches:

Conversational AI

Parsing Coordination for Spoken Language Understanding System

Sanchit Agarwal, Rahul Goel, Tagyoung Chung, Abhishek Sethi, Arindam Mandal, Spyros Matsoukas

SLT 2018

2018

Typical spoken language understanding systems provide narrow semantic parses using a domain-specific ontology. The parses contain intents and slots that are directly consumed by downstream domain applications. In this work we discuss expanding such systems to handle compound entities and intents by introducing a domain-agnostic shallow parser that handles linguistic coordination. We show that our model

Conversational AI

Cross-lingual approaches to reference resolution in spoken dialogue

Amr Sharaf, Arpit Gupta, Hancheng Ge, Chetan Naik, Rylan Conway, Lambert Mathias

NeurIPS 2018

2018

In the slot-filling paradigm, where a user can refer back to slots in the context during the conversation, the goal of the contextual understanding system is to resolve the referring expressions to the appropriate slots in the context. In this paper, we build on (Naik et al., 2018), which provides a scalable multi-domain framework for resolving references. However, scaling this approach across languages

Contextual Language Model Adaptation for Conversational Agents

Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anushree Venkatesh, Ariya Rastrow

Interspeech 2018

2018

Statistical language models (LM) play a key role in Automatic Speech Recognition (ASR) systems used by conversational agents. These ASR systems should provide high accuracy under a variety of speaking styles, domains, vocabulary and argots. In this paper, we present a DNN-based method to adapt the LM to each user-agent interaction based on generalized contextual information, by predicting an optimal,

Conversational AI
