Search - Amazon Science

A simple transfer-learning extension of Hyperband

Lazar Valkov, Rodolphe Jenatton, Fela Winkelmolen, Cédric Archambeau

NeurIPS 2018

2018

Hyperband has become a popular method to tune the hyperparameters (HPs) of expensive machine learning models, whose performance depends on the amount of resources allocated for training. While Hyperband is conceptually simple, combining random search with a successive halving technique to reallocate resources to the most promising HPs, it often outperforms standard Bayesian optimization when solutions with

Machine learning

Model checking boot code from AWS data centers

Byron Cook, Kareem Khazem, Daniel Kroening, Serdar Tasiran, Michael Tautschnig, Mark R. Tuttle

CAV 2018

2018

This paper describes our experience with symbolic model checking in an industrial setting. We have proved that the initial boot code running in data centers at Amazon Web Services is memory safe, an essential step in establishing the security of any data center. Standard static analysis tools cannot be easily used on boot code without modification owing to issues not commonly found in higher-level code,

Security, privacy, and abuse prevention

ProxQuant: Quantized neural networks via proximal operators

Yu Bai, Yu-Xiang Wang, Edo Liberty

ICLR 2018

2018

Deep neural networks are often desired in environments with limited memory and computational power (such as mobile devices), where it is beneficial to perform model quantization – training networks with low-precision weights. A key mechanism commonly used in training quantized nets is the straight-through gradient method, which enables back-propagation through the quantization mapping. Despite its success

Machine learning

Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising

Borja Balle, Yu-Xiang Wang

ICML 2018

2018

The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations. Our analysis reveals that the variance formula for the original mechanism is far from tight in the high privacy regime (ε → 0) and it cannot be extended to the low

Security, privacy, and abuse prevention

Learning large scale ordinal ranking model via divide-and-conquer technique

Lu Tang, Sougata Chaudhuri, Abraham Bagherjeiran, Ling Zhou

WWW 2018

2018

Structured prediction, where outcomes have a precedence order, lies at the heart of machine learning for information retrieval, movie recommendation, product review prediction, and digital advertising. Ordinal ranking, in particular, assumes that the structured response has a linear ranked order. Due to the extensive applicability of these models, substantial research has been devoted to understanding them

Machine learning

Contextual multi-armed bandits for causal marketing

Neela Sawant, Chitti Babu Namballa, Narayanan Sadagopan, Houssam Nassif

ICML 2018

2018

This work explores the idea of a causal contextual multi-armed bandit approach to automated marketing, where we estimate and optimize the causal (incremental) effects. Focusing on causal effect leads to better return on investment (ROI) by targeting only the persuadable customers who wouldn’t have taken the action organically. Our approach draws on strengths of causal inference, uplift modeling, and multi-armed

Machine learning

Deep learning for forecasting

Tim Januschowski, Jan Gasthaus, Syama Rangapuram, Laurent Callot

arXiv

2018

While the term "deep learning" (DL) has only been coined in the last few years, the techniques it refers to have been in development since the 1950s, namely artificial neural networks (NN or ANN for short). DL has scored major successes in image recognition, natural language processing (e.g. machine translation and speech recognition), and autonomous agents such as Google DeepMind's AlphaGo. It is often

Search and information retrieval

Time-Delayed Bottleneck Highway Networks Using A DFT Feature for Keyword Spotting

Jinxi Guo, Kenichi Kumatani, Ming Sun, Minhua Wu, Anirudh Raju, Nikko Ström, Arindam Mandal

ICASSP 2018

2018

This paper presents a novel deep neural network (DNN) architecture with highway blocks (HWs) using a complex discrete Fourier transform (DFT) feature for keyword spotting. In our previous work, we showed that the feed-forward DNN with a time-delayed bottleneck layer (TDB-DNN) directly trained from the audio input outperformed the model with the log-mel filter bank energy feature (LFBE), given a large amount

Conversational AI

Smoothing model predictions using adversarial training procedures for speech based emotion recognition

Rahul Gupta

ICASSP 2018

2018

Training discriminative classifiers involves learning a conditional distribution p(y_i|x_i), given a set of feature vectors x_i and the corresponding labels y_i, i = 1, ..., N. For a classifier to be generalizable and not overfit to training data, the resulting conditional distribution p(y_i|x_i) is desired to be smoothly varying over the inputs x_i. Adversarial training procedures enforce this smoothness using manifold

Conversational AI

Smooth calibration, leaky forecasts, finite recall, and Nash dynamics

Dean Foster, Sergiu Hart

Games & Economic Behavior

2018

We propose to smooth out the calibration score, which measures how good a forecaster is, by combining nearby forecasts. While regular calibration can be guaranteed only by randomized forecasting procedures, we show that smooth calibration can be guaranteed by deterministic procedures. As a consequence, it does not matter if the forecasts are leaked, i.e., made known in advance: smooth calibration can nevertheless

Economics

Record2Vec: Unsupervised representation learning for structured records

Adelene Sim, Andrew Borthwick

ICDM 2018

2018

Structured records – data with a fixed number of descriptive fields (or attributes) – are often represented by one-hot encoded or term frequency-inverse document frequency (TF-IDF) weighted vectors. These vectors are typically sparse and long, and are inefficient in representing structured records. Here, we introduce Record2Vec, a framework for generating dense embeddings of structured records by training

Machine learning

CERES: Distantly supervised relation extraction from the semi-structured web

Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar

VLDB 2018

2018

The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically generated labels, these

Information and knowledge management

OpenTag: Open attribute extraction from product profiles

Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li

KDD 2018

2018

Extraction of missing attribute values means finding values that describe an attribute of interest in a free-text input. Most related prior work on extraction of missing attribute values works under a closed-world assumption, with the possible set of values known beforehand, or uses dictionaries of values and hand-crafted features. How can we discover new attribute values that we have never seen before? Can we do

Information and knowledge management

SpotLight: Detecting anomalies in streaming graphs

Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra

KDD 2018

2018

How do we spot interesting events from e-mail or transportation logs? How can we detect port scan or denial of service attacks from IP-IP communication data? In general, given a sequence of weighted, directed or bipartite graphs, each summarizing a snapshot of activity in a time window, how can we spot anomalous graphs containing the sudden appearance or disappearance of large dense subgraphs (e.g., near

Information and knowledge management

Leveraging data resources for cross linguistic information retrieval using statistical machine translation

Steve Sloto, Ann Clifton, Greg Hanneman, Patrick Porter, Donna Gates, A. Silja Hil

AMTA 2018

2018

Retail websites may provide customers with a localized user experience by allowing them to use a secondary language of preference. Automatic translation of user search queries is a crucial component of this experience. Several domain-adapted SMT systems for search query translation were trained, including language pairs for which smaller-than-desired parallel resources were available, such as Polish-German

Machine learning

The SOCKEYE neural machine translation toolkit at AMTA 2018

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

AMTA 2018

2018

We describe SOCKEYE, an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). SOCKEYE is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNET, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural

Machine learning

How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures

Tobias Domhan

ACL 2018

2018

With recent advances in network architectures for Neural Machine Translation (NMT), recurrent models have effectively been replaced by either convolutional or self-attentional approaches, such as in the Transformer. While the main innovation of the Transformer architecture is its use of self-attentional layers, there are several other aspects, such as attention with multiple heads and the use of many attention

Machine learning

Mutual information guided distillation for transfer learning

Sung-soo Ahn, Shell Hu, Zhenwen Dai, Andreas Damianou, Neil Lawrence

NeurIPS 2018

2018

We consider the teacher-student framework for knowledge transfer, where the goal is to improve learning of a “student” neural network, given a “teacher” neural network pretrained on the same or a similar task. The majority of existing approaches for distilling knowledge from a teacher network to a student network rely on matching either activations or handcrafted features from the teacher network. Instead

Machine learning

Facilitating Bayesian continual learning by natural gradients and Stein gradients

Yu Chen, Tom Diethe, Neil Lawrence

NeurIPS 2018

2018

Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: (i) posterior

Machine learning

Structured variational auto-encoded optimization

Xiaoyu Lu, Javier González, Zhenwen Dai, Neil Lawrence

ICML 2018

2018

We tackle the problem of optimizing a blackbox objective function defined over a highly structured input space. This problem is ubiquitous in machine learning. Inferring the structure of a neural network or the Automatic Statistician (AS), where the kernel combination for a Gaussian process is optimized, are two of many possible examples. We use the AS as a case study to describe our approach, that can

Machine learning
