SpotLight: Detecting anomalies in streaming graphs

Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra

KDD 2018

2018

How do we spot interesting events from e-mail or transportation logs? How can we detect port scan or denial of service attacks from IP-IP communication data? In general, given a sequence of weighted, directed or bipartite graphs, each summarizing a snapshot of activity in a time window, how can we spot anomalous graphs containing the sudden appearance or disappearance of large dense subgraphs (e.g., near

Information and knowledge management

Leveraging data resources for cross linguistic information retrieval using statistical machine translation

Steve Sloto, Ann Clifton, Greg Hanneman, Patrick Porter, Donna Gates, A. Silja Hil

AMTA 2018

2018

Retail websites may provide customers with a localized user experience by allowing them to use a secondary language of preference. Automatic translation of user search queries is a crucial component of this experience. Several domain-adapted SMT systems for search query translation were trained, including language pairs for which smaller-than-desired parallel resources were available, such as Polish-German

Machine learning

The SOCKEYE neural machine translation toolkit at AMTA 2018

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

AMTA 2018

2018

We describe SOCKEYE, an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). SOCKEYE is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNET, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural

Machine learning

How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures

Tobias Domhan

ACL 2018

2018

With recent advances in network architectures for Neural Machine Translation (NMT), recurrent models have effectively been replaced by either convolutional or self-attentional approaches, such as in the Transformer. While the main innovation of the Transformer architecture is its use of self-attentional layers, there are several other aspects, such as attention with multiple heads and the use of many attention

Machine learning

Mutual information guided distillation for transfer learning

Sung-soo Ahn, Shell Hu, Zhenwen Dai, Andreas Damianou, Neil Lawrence

NeurIPS 2018

2018

We consider the teacher-student framework for knowledge transfer, where the goal is to improve learning of a “student” neural network, given a “teacher” neural network pretrained on the same or a similar task. The majority of existing approaches for distilling knowledge from a teacher network to a student network rely on matching either activations or handcrafted features from the teacher network. Instead

Machine learning

Facilitating Bayesian continual learning by natural gradients and Stein gradients

Yu Chen, Tom Diethe, Neil Lawrence

NeurIPS 2018

2018

Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: (i) posterior

Machine learning

Structured variational auto-encoded optimization

Xiaoyu Lu, Javier González, Zhenwen Dai, Neil Lawrence

ICML 2018

2018

We tackle the problem of optimizing a blackbox objective function defined over a highly structured input space. This problem is ubiquitous in machine learning. Inferring the structure of a neural network or the Automatic Statistician (AS), where the kernel combination for a Gaussian process is optimized, are two of many possible examples. We use the AS as a case study to describe our approach, that can

Machine learning

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, Alex Smola

ICML 2018

2018

Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the

Machine learning

signSGD: compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Animashree Anandkumar

ICML 2018

2018

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients

Machine learning

Efficient deep learning inference on edge devices

Ziheng Jiang, Tianqi Chen, Mu Li

SysML 2018

2018

Deploying deep learning (DL) models on edge devices is becoming increasingly popular. However, the huge diversity of edge devices, with both computation and memory constraints, makes efficient deployment challenging. In this paper, we propose a two-stage pipeline that optimizes DL models on target devices. The first stage optimizes the inference workloads, and the second stage searches optimal kernel implementations

Machine learning

Semi-supervised learning on data streams via temporal label propagation

Tal Wagner, Sudipto Guha, Shiva Kasiviswanathan, Nina Mishra

ICML 2018

2018

We consider the problem of labeling points on a fast-moving data stream when only a small number of labeled examples are available. In our setting, incoming points must be processed efficiently and the stream is too large to store in its entirety. We present a semi-supervised learning algorithm for this task. The algorithm maintains a small synopsis of the stream which can be quickly updated as new points

Machine learning

Learning fashion traits with label uncertainty

Assaf Neuberger, Sharon Alpert, Eli Alshan, Nati Bubis, Eduard Oks

CVPR 2018

2018

We consider the task of predicting subjective fashion traits from images. Specifically, we are interested in understanding which outfit actually better suits the user. Since these traits are highly subjective, they tend to be noisier. One solution is to annotate each example several times, but this makes it hard to collect large amounts of data.

Machine learning

Online sparse linear regression

Dean Foster, Satyen Kale, Howard Karloff

STOC 2014

2018

We consider the online sparse linear regression problem, which is the problem of sequentially making predictions observing only a limited number of features in each round, to minimize regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss. We give an inefficient algorithm that obtains regret bounded by Õ(√T) after T prediction rounds. We complement

Machine learning

Neural Machine Translation For Paraphrase Generation

Alex Sokolov, Denis Filimonov

NeurIPS 2018

2018

Training a spoken language understanding system, such as the one in Alexa, typically requires a large human-annotated corpus of data. Manual annotations are expensive and time-consuming. In the Alexa Skills Kit (ASK), the user experience with a skill depends greatly on the amount of data provided by the skill developer. In this work, we present an automatic natural language generation system, capable of generating both

Conversational AI

A call for clarity in reporting BLEU Scores

Matt Post

WMT 2018

2018

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric. Although people refer to “the” BLEU score, BLEU is in fact a parameterized metric whose values can vary wildly with changes to these parameters. These parameters are often not reported or are hard to find, and consequently, BLEU scores between papers cannot be

Conversational AI

Detecting offensive content in open-domain conversations using two stage semi-supervision

Chandra Khatri, Behnam Hedayatnia, Rahul Goel, Anushree Venkatesh, Raefer Gabriel, Arindam Mandal

NeurIPS 2018

2018

As open-ended human-chatbot interaction becomes commonplace, sensitive content detection gains importance. In this work, we propose a two-stage semi-supervised approach to bootstrap large-scale data for automatic sensitive language detection from publicly available web resources. We explore various data selection methods including 1) using a blacklist to rank online discussion forums by the level of their

Conversational AI

Contextual topic modeling for conversational agents

Behnam Hedayatnia, Chandra Khatri, Rahul Goel, Anushree Venkatesh, Angeliki Metallinou

SLT 2018

2018

Accurate prediction of conversation topics can be a valuable signal for creating coherent and engaging dialog systems. In this work, we focus on context-aware topic classification methods for identifying topics in free-form human-chatbot dialogs. We extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features

Conversational AI

Coupled representation learning for domains, intents and slots in spoken language understanding

Jihwan Lee

SLT 2018

2018

Representation learning is an essential problem in a wide range of applications and it is important for performing downstream tasks successfully. In this paper, we propose a new model that learns coupled representations of domains, intents, and slots by taking advantage of their hierarchical dependency in a Spoken Language Understanding system. Our proposed model learns the vector representation of intents

Conversational AI

Scalable language model adaptation for spoken dialogue systems

Ankur Gandhe, Ariya Rastrow, Björn Hoffmeister

SLT 2018

2018

Language models (LMs) for interactive speech recognition systems are trained on large amounts of data, and the model parameters are optimized on past user data. New application intents and interaction types are released for these systems over time, which makes adapting the LMs challenging, since the existing training data is no longer sufficient to model future user interactions. It is unclear how to adapt

Conversational AI

Direct optimization of F-measure for retrieval-based personal question answering

Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-Rahman Mohamed, Ruhi Sarikaya

SLT 2018

2018

Recent advances in spoken language technologies and the introduction of many customer-facing products have given rise to a wide customer reliance on smart personal assistants for many of their daily tasks. In this paper, we present a system to reduce users’ cognitive load by extending personal assistants with long-term personal memory where users can store and retrieve, by voice, arbitrary pieces of information

Conversational AI
