Search - Amazon Science

Blending anti-aliasing into vision transformer

Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

2021

Transformer architectures, based on the self-attention mechanism and a convolution-free design, have recently achieved superior performance and found booming applications in computer vision. However, the discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps, giving rise to the traditional problem of aliasing for vision transformers. The aliasing effect occurs when discrete patterns

Computer vision

Gender profession data

Xing Niu, Georgiana Dinu, Prashant Mathur, Anna Currey

2021

The training data used in NMT is rarely controlled with respect to specific attributes, such as word casing or gender, which can cause errors in translations. We argue that predicting the target word and attributes simultaneously is an effective way to ensure that translations are more faithful to the training data distribution with respect to these attributes. Experimental results on two tasks, uppercased

Conversational AI

SiamMOT

Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joe Tighe

2021

In this paper, we focus on improving online multi-object tracking (MOT). In particular, we introduce a region-based Siamese Multi-Object Tracking network, which we name SiamMOT. SiamMOT includes a motion model that estimates the instance’s movement between two frames such that detected instances are associated. To explore how the motion modelling affects its tracking capability, we present two variants

Computer vision

Multilingual TOP dataset for semantic parsing

Menglin Xia, Emilio Monti

2021

Multilingual semantic parsing is a cost-effective method that allows a single model to understand different languages. However, researchers face a great imbalance in the availability of training data, with English being resource-rich and other languages having much less data. To tackle the data limitation problem, we propose using machine translation to bootstrap multilingual training data from the more abundant

Conversational AI

NL2SQL: Natural language to SQL queries tool

Tesfagabir Meharizghi

2021

This repo contains the UI tool and ML model development process to convert natural language questions to SQL queries.

Conversational AI

Nearest neighbour few-shot learning for cross-lingual classification

M Saiful Bari (Maruf), Batool Haider, Saab Mansour

2021

Even though large pre-trained multilingual models (e.g. mBERT, XLM-R) have led to significant performance gains on a wide range of cross-lingual NLP tasks, success on many downstream tasks still relies on the availability of sufficient annotated data. Traditional fine-tuning of pre-trained models using only a few target samples can cause over-fitting. This can be quite limiting as most languages in the

Conversational AI

EMAN: Exponential moving average normalization for self-supervised and semi-supervised Learning

Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto

2021

We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its statistics by exponential moving average from the BN statistics

Computer vision

Video contrastive learning with global context (VCLR)

Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joe Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li

2021

Contrastive learning has revolutionized the self-supervised image representation learning field and has recently been adapted to the video domain. One of the greatest advantages of contrastive learning is that it allows us to flexibly define powerful loss objectives as long as we can find a reasonable way to formulate positive and negative samples to contrast. However, existing approaches rely heavily on the

Computer vision

Improving factual consistency of abstractive text summarization

Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, Bing Xiang

2021

We provide the code for the papers: "Entity-level factual consistency of abstractive text summarization", EACL 2021. We provide a set of new metrics to quantify the entity-level factual consistency of generated summaries. We also provide code for the two methods in our paper: JAENS: joint entity and summary generation, and Summary-worthy entity classification with summarization (multi-task learning) "Improving

Conversational AI

Generalized fairness metrics

Paula Czarnowska, Yogarshi Vyas, Kashif Shah

2021

Measuring bias is key for better understanding and addressing unfairness in NLP/ML models. This is often done via fairness metrics which quantify the differences in a model's behaviour across a range of demographic groups. In this work, we shed more light on the differences and similarities between the fairness metrics used in NLP. First, we unify a broad range of existing metrics under three generalized

Unified-EPT

Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li

2021

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most literature either focuses on context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts

Computer vision

Long short-term transformer for online action detection

Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto

2021

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data. It consists of an LSTR encoder that dynamically leverages coarse-scale historical information from an extended temporal window (e.g., 2048 frames spanning up to 8 minutes), together with an LSTR decoder that focuses

Computer vision

SCCL: Supporting clustering with contrastive learning

Dejiao Zhang, Feng Nan, Xiaokai Wei, Daniel Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

2021

Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To this end

Conversational AI

GAP-text2SQL: Learning contextual representations for semantic parsing with generation-augmented pre-training

Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, Bing Xiang

2021

Most recently, there has been significant interest in learning contextual representations for various NLP tasks, by leveraging large scale text corpora to train large neural language models with self-supervised learning objectives, such as Masked Language Model (MLM). However, based on a pilot study, we observe three issues of existing general-purpose language models when they are applied to text-to-SQL

Conversational AI

Nlu-slot-constraints

Piyawat Lertvittayakumjorn, Daniele Bonadiman, Saab Mansour

2021

In goal-oriented dialogue systems, users provide information through slot values to achieve specific goals. Practically, some combinations of slot values can be invalid according to external knowledge. For example, a combination of “cheese pizza” (a menu item) and “oreo cookies” (a topping) from an input utterance “Can I order a cheese pizza with oreo cookies on top?” exemplifies such invalid combinations

Conversational AI

Real world noise benchmarks for natural language understanding

Sailik Sengupta, Jason Krone, Saab Mansour

2021

Intent Classification (IC) and Slot Labeling (SL) models, which form the basis of dialogue systems, often encounter noisy data in real-world environments. In this work, we investigate how robust IC/SL models are to noisy data. We collect and publicly release a test-suite for seven common noise types found in production human-to-bot conversations (abbreviations, casing, misspellings, morphological variants

Conversational AI

Information content of samples

Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

2021

We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a linearized

Machine learning

Efficiently summarizing text and graph encodings of multi-document clusters

Ramakanth Pasunuru, Mengwen Liu, Mohit Bansal, Sujith Ravi, Markus Dreyer

2021

This is the implementation of the paper Efficiently summarizing text and graph encodings of multi-document clusters.

Conversational AI

MinimaxFair - Convergent algorithms for (relaxed) minimax group fairness

Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, Aaron Roth

2021

MinimaxFair is a Python package for training ML models for (relaxed) minimax group fairness, as discussed in "Minimax group fairness: Algorithms and experiments". This repository contains Python code for: learning models that achieve minimax group fairness for both regression and classification tasks; learning models that minimize error subject to relaxed group fairness constraints; visualizing tradeoffs between

Machine learning

Symbolic music generation with transformer-GANs

Aashiq Muhamed, Liang Li, Xingjian Shi, Suri Yaddanapudi, Wayne Chi, Dylan Jackson, Rahul Suresh, Zachary Lipton, Alex Smola

2021

Transformers have emerged as the dominant approach in music literature for generating minute-long compositions with compelling musical structure. These models are trained by minimizing the negative log-likelihood (NLL) of the observed sequence autoregressively. Unfortunately, the quality of samples from these models tends to degrade significantly for long sequences, a phenomenon attributed to exposure bias

Conversational AI
