Search - Amazon Science

Record2Vec: Unsupervised representation learning for structured records

Adelene Sim, Andrew Borthwick

ICDM 2018

2018

Structured records – data with a fixed number of descriptive fields (or attributes) – are often represented by one-hot encoded or term frequency-inverse document frequency (TF-IDF) weighted vectors. These vectors are typically sparse and long, and are inefficient in representing structured records. Here, we introduce Record2Vec, a framework for generating dense embeddings of structured records by training

Machine learning

"Deep" learning for missing value imputation in tables with non-numeric data

Felix Biessmann, David Salinas, Dustin Lange, Philipp Schmidt, Sebastian Schelter

CIKM 2018

2018

The success of applications that process data critically depends on the quality of the ingested data. Completeness of a data source is essential in many cases. Yet, most missing value imputation approaches suffer from severe limitations. They are almost exclusively restricted to numerical data, and they either offer only simple imputation methods or are difficult to scale and maintain in production. Here

Information and knowledge management

SpotLight: Detecting anomalies in streaming graphs

Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra

KDD 2018

2018

How do we spot interesting events from e-mail or transportation logs? How can we detect port scan or denial of service attacks from IP-IP communication data? In general, given a sequence of weighted, directed or bipartite graphs, each summarizing a snapshot of activity in a time window, how can we spot anomalous graphs containing the sudden appearance or disappearance of large dense subgraphs (e.g., near

Information and knowledge management

OpenTag: Open attribute extraction from product profiles

Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li

KDD 2018

2018

Extraction of missing attribute values is to find values describing an attribute of interest from a free text input. Most past related work on extraction of missing attribute values works with a closed-world assumption with the possible set of values known beforehand, or uses dictionaries of values and hand-crafted features. How can we discover new attribute values that we have never seen before? Can we do

Information and knowledge management

CERES: Distantly supervised relation extraction from the semi-structured web

Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar

VLDB 2018

2018

The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically generated labels, these

Information and knowledge management

Automating large-scale data quality verification

Sebastian Schelter, Dustin Lange, Philipp Schmidt, Meltem Celikel, Felix Biessmann

VLDB 2018

2018

Modern companies and institutions rely on data to guide every single business process and decision. Missing or incorrect information seriously compromises any decision process downstream. Therefore, a crucial, but tedious task for everyone involved in data processing is to verify the quality of their data. We present a system for automating the verification of data quality at scale, which meets the requirements

Information and knowledge management

How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures

Tobias Domhan

ACL 2018

2018

With recent advances in network architectures for Neural Machine Translation (NMT), recurrent models have effectively been replaced by either convolutional or self-attentional approaches, such as in the Transformer. While the main innovation of the Transformer architecture is its use of self-attentional layers, there are several other aspects, such as attention with multiple heads and the use of many attention

Machine learning

Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models

David Vilar

NAACL 2018

2018

In this paper we explore the use of Learning Hidden Unit Contribution for the task of neural machine translation. The method was initially proposed in the context of speech recognition for adapting a general system to the specific acoustic characteristics of each speaker. Similar in spirit, in a machine translation framework we want to adapt a general system to a specific domain. We show that the proposed

Machine learning

Leveraging data resources for cross linguistic information retrieval using statistical machine translation

Steve Sloto, Ann Clifton, Greg Hanneman, Patrick Porter, Donna Gates, A. Silja Hil

AMTA 2018

2018

Retail websites may provide customers with a localized user experience by allowing them to use a secondary language of preference. Automatic translation of user search queries is a crucial component of this experience. Several domain-adapted SMT systems for search query translation were trained, including language pairs for which smaller-than-desired parallel resources were available, such as Polish-German

Machine learning

Persistent and robust execution of MAPF schedules in warehouses

Wolfgang Hönig, Scott Kiesel, Andrew Tinka, Joseph W. Durham, Nora Ayanian

IEEE Robotics and Automation Letters 2018

2018

Multi-Agent Path Finding (MAPF) is a well-studied problem in Artificial Intelligence that can be solved quickly in practice when using simplified agent assumptions. However, real-world applications, such as warehouse automation, require physical robots to function over long time horizons without collisions. We present an execution framework that can use existing single-shot MAPF planners and ensures robust execution in the presence of unknown or time-varying higher-order dynamic limits, unforeseen robot slow-downs, and unpredictable obstacle appearances.

Robotics

Buy it again: Modeling repeat purchase recommendations

Rahul Bhagat, Srevatsan Muralidharan, Alex Lobzhanidze, Shankar Vishwanath

KDD 2018

2018

Repeat purchasing, i.e., a customer purchasing the same product multiple times, is a common phenomenon in retail. As more customers start purchasing consumable products (e.g., toothpastes, diapers, etc.) online, this phenomenon has also become prevalent in e-commerce. However, in January 2014, when we looked at popular e-commerce websites, we did not find any customer-facing features that recommended products

Search and information retrieval

Scalable Hyperparameter Transfer Learning

Valerio Perrone, Rodolphe Jenatton, Matthias Seeger, Cédric Archambeau

NeurIPS 2018

2018

Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process (GP) regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, GP-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs

Machine learning

Mutual information guided distillation for transfer learning

Sung-soo Ahn, Shell Hu, Zhenwen Dai, Andreas Damianou, Neil Lawrence

NeurIPS 2018

2018

We consider the teacher-student framework for knowledge transfer, where the goal is to improve learning of a “student” neural network, given a “teacher” neural network pretrained on the same or a similar task. The majority of existing approaches for distilling knowledge from a teacher network to a student network rely on matching either activations or handcrafted features from the teacher network. Instead

Machine learning

SeCSeq: Semantic coding for sequence-to-sequence based extreme multi-label classification

Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon, Yiming Wang

NeurIPS 2018

2018

Extreme multi-label classification (XMC) aims at assigning to an instance the most relevant subset of labels from a colossal label set. There has been some success in formulating the multi-label problem as sequence-to-sequence (Seq2Seq) learning, where the positive class labels of each input instance are used as the corresponding output sequence. Seq2Seq methods, nonetheless, have not yet been scalable

Machine learning

Facilitating Bayesian continual learning by natural gradients and Stein gradients

Yu Chen, Tom Diethe, Neil Lawrence

NeurIPS 2018

2018

Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: (i) posterior

Machine learning

Learn to transfer learn by studying task manifolds

Sebastian Flennerhag, Pablo Garcia Moreno, Neil Lawrence, Andreas Damianou

ICLR 2019

2018

In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate each task

Machine learning

Compression of acoustic event detection models with low-rank matrix factorization and quantization

Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

NeurIPS 2018

2018

In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models. Our experimental results show this combined compression approach is very effective. For a three-layer long short-term memory (LSTM) based AED model, the original model size can be reduced

Machine learning

signSGD: compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Animashree Anandkumar

ICML 2018

2018

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients

Machine learning
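
The signSGD abstract above describes transmitting only the sign of each minibatch stochastic gradient. The following is a minimal single-worker sketch of that idea in plain Python, not the authors' implementation. The linear least-squares model and the names `minibatch_gradient` and `signsgd_step` are assumptions made for illustration, and any multi-worker aggregation is omitted.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def minibatch_gradient(weights, batch):
    """Gradient of the mean of 0.5 * (w . x - y)^2 over a minibatch.

    The linear model is a hypothetical stand-in for a neural network.
    """
    grad = [0.0] * len(weights)
    for features, target in batch:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += error * x / len(batch)
    return grad


def signsgd_step(weights, grad, learning_rate):
    """Apply one update that uses only the sign of each gradient coordinate.

    `signs` is what a worker would transmit: one of {-1, 0, 1} per
    coordinate instead of a full floating-point value.
    """
    signs = [sign(g) for g in grad]
    new_weights = [w - learning_rate * s for w, s in zip(weights, signs)]
    return new_weights, signs


if __name__ == "__main__":
    batch = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
    weights = [0.0, 0.0]
    for _ in range(5):
        grad = minibatch_gradient(weights, batch)
        weights, signs = signsgd_step(weights, grad, learning_rate=0.1)
    print(weights, signs)
```

Every coordinate moves by the same step size regardless of gradient magnitude, which is the trade-off the compression introduces.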

Structured variational auto-encoded optimization

Xiaoyu Lu, Javier González, Zhenwen Dai, Neil Lawrence

ICML 2018

2018

We tackle the problem of optimizing a blackbox objective function defined over a highly structured input space. This problem is ubiquitous in machine learning. Inferring the structure of a neural network or the Automatic Statistician (AS), where the kernel combination for a Gaussian process is optimized, are two of many possible examples. We use the AS as a case study to describe our approach, that can

Machine learning

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, Alex Smola

ICML 2018

2018

Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the

Machine learning
