Search - Amazon Science

Buy it again: Modeling repeat purchase recommendations

Rahul Bhagat, Srevatsan Muralidharan, Alex Lobzhanidze, Shankar Vishwanath

KDD 2018

2018

Repeat purchasing, i.e., a customer purchasing the same product multiple times, is a common phenomenon in retail. As more customers start purchasing consumable products (e.g., toothpastes, diapers, etc.) online, this phenomenon has also become prevalent in e-commerce. However, in January 2014, when we looked at popular e-commerce websites, we did not find any customer-facing features that recommended products

Search and information retrieval

Scalable Hyperparameter Transfer Learning

Valerio Perrone, Rodolphe Jenatton, Matthias Seeger, Cédric Archambeau

NeurIPS 2018

2018

Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process (GP) regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, GP-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs

Machine learning

Mutual information guided distillation for transfer learning

Sung-soo Ahn, Shell Hu, Zhenwen Dai, Andreas Damianou, Neil Lawrence

NeurIPS 2018

2018

We consider the teacher-student framework for knowledge transfer, where the goal is to improve learning of a “student” neural network, given a “teacher” neural network pretrained on the same or a similar task. The majority of existing approaches for distilling knowledge from a teacher network to a student network rely on matching either activations or handcrafted features from the teacher network. Instead

Machine learning

SeCSeq: Semantic coding for sequence-to-sequence based extreme multi-label classification

Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon, Yiming Wang

NeurIPS 2018

2018

Extreme multi-label classification (XMC) aims at assigning to an instance the most relevant subset of labels from a colossal label set. There have been some success in formulating the multi-label problem as sequence-to-sequence (Seq2Seq) learning, where the positive class labels of each input instance are used as the corresponding output sequence. Seq2Seq methods, nonetheless, have not yet been scalable

Machine learning

Facilitating Bayesian continual learning by natural gradients and Stein gradients

Yu Chen, Tom Diethe, Neil Lawrence

NeurIPS 2018

2018

Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: (i) posterior

Machine learning

Learn to transfer learn by studying task manifolds

Sebastian Flennerhag, Pablo Garcia Moreno, Neil Lawrence, Andreas Damianou

ICLR 2019

2018

In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate each task

Machine learning

Compression of acoustic event detection models with low-rank matrix factorization and quantization

Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

NeurIPS 2018

2018

In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models. Our experimental results show this combined compression approach is very effective. For a threelayer long short-term memory (LSTM) based AED model, the original model size can be reduced

Machine learning

signSGD: compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Animashree Anandkumar

ICML 2018

2018

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. SIGNSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative `1/`2 geometry of gradients

Machine learning

Structured variational auto-encoded optimization.

Xiaoyu Lu, Javier González, Zhenwen Dai, Neil Lawrence

ICML 2018

2018

We tackle the problem of optimizing a blackbox objective function defined over a highly structured input space. This problem is ubiquitous in machine learning. Inferring the structure of a neural network or the Automatic Statistician (AS), where the kernel combination for a Gaussian process is optimized, are two of many possible examples. We use the AS as a case study to describe our approach, that can

Machine learning

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, Alex Smola

ICML 2018

2018

Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets), cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the

Machine learning

Optimal message scheduling for aggregation

Leyuan Wang, Mu Li, Edo Liberty, Alex Smola

SysML 2018

2018

We derive algorithms for producing optimal aggregation schedules for automatically aggregating gradients across di!erent compute units, both CPUs and GPUs, with arbitrary topologies. We show that this can be accomplished by solving a linear program on the spanning tree polytope. We give analytic bounds for the value of the optimal solution for arbitrary graphs. We also propose simple schedules that meet

Machine learning

β-BNN: A rate-distortion perspective on Bayesian neural networks

Shell Hu, Andreas Damianou, Pablo Garcia Moreno

NeurIPS 2018

2018

We propose an alternative training framework for Bayesian neural networks (BNNs), which is motivated by viewing the Bayesian model for supervised learning as an autoencoder for data transmission. Then, a natural objective can be invoked from the rate-distortion theory. Specifically, we end up minimizing the mutual information between the weights and the dataset with a constraint that the negative log-likelihood

Machine learning

Learning to segment inputs for NMT favors character-level processing

Julia Kreutzer, Artem Sokolov

IWSLT 2018

2018

Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity importantly determines the input and output sequence lengths, hence the modeling depth, and source and target vocabularies, which in turn determine model size, computational costs of softmax normalization, and handling of out-of-vocabulary words. However, the current practice is to use static, heuristic-based

Conversational AI

Unsupervised quality estimation without reference corpus for subtitle machine translation using word embeddings

Prabhakar Gupta, Shaktisingh Shekhawat, Keshav Kumar

ICSC 2018

2018

We demonstrate the potential for using aligned bilingual word embeddings to create an unsupervised method to evaluate machine translations without a need for a parallel translation corpus or reference corpus. We explain why movie subtitles differ from other text and share our experimental results conducted on them for four target languages (French, German, Portuguese and Spanish) with English-source subtitles

Conversational AI

A machine learning approach to detecting start reading location of eBooks

Sravan Bodapati, Sriraghavendra Ramaswamy, Gururaj Narayanan

ICDM 2018

2018

Machine Learning and NLP (Natural Language Processing) have aided the development of new and improved user experience features in many applications. We address the problem of automatically identifying the “Start Reading Location” (SRL) of eBooks, i.e. the location of the logical beginning or start of main content. This improves eBook reading experience by taking users automatically to the logical start

Conversational AI

MLZero: Towards zero touch machine learning

Tom Diethe, Tom Borchert, Eno Thereska, Borja de Balle Pigem, Cédric Archambeau, Neil Lawrence

NeurIPS 2018

2018

This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine

Cloud and systems

Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation

Matt Post, David Vilar

NAACL 2018

2018

The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms. Recent work, however, has introduced a new capability: lexically constrained or guided decoding, a modification to beam search that forces the inclusion of pre-specified words and phrases in the output. However, while theoretically sound, existing

Conversational AI

Cross-lingual approaches to reference resolution in spoken dialogue

Amr Sharaf, Arpit Gupta, Hancheng Ge, Chetan Naik, Rylan Conway, Lambert Mathias

NeurIPS 2018

2018

In the slot-filling paradigm, where a user can refer back to slots in the context during the conversation, the goal of the contextual understanding system is to resolve the referring expressions to the appropriate slots in the context. In this paper, we build on (Naik et al., 2018), which provides a scalable multi-domain framework for resolving references. However, scaling this approach across languages

Demand-weighted completeness prediction for a knowledge base

Andrew Hopkinson, Amit Gurdasani, Dave Palfrey, Arpit Mittal

NAACL 2018

2018

In this paper we introduce the notion of Demand-Weighted Completeness, allowing estimation of the completeness of a knowledge base with respect to how it is used. Defining an entity by its classes, we employ usage data to predict the distribution over relations for that entity. For example, instances of person in a knowledge base may require a birth date, name and nationality to be considered complete.

Conversational AI

Learning word embeddings for low-resource languages by PU learning

Chao Jiang, Cho-Jui Hsieh, Hsiang-Fu Yu, Kai-Wei Chang

NAACL 2018

2018

Word embedding is a key component in many downstream applications in processing natural languages. Existing approaches often assume the existence of a large collection of text for learning effective word embedding. However, such a corpus may not be available for some low-resource languages. In this paper, we study how to effectively learn a word embedding model on a corpus with only a few million tokens

Conversational AI

Search results

Work with us