Code and datasets

Sashank Santhanam, Behnam Hedayatnia, Spandana Gella, Aishwarya Padmakumar, Seokhwan Kim, Yang Liu, Dilek Hakkani-Tür

2021

We introduce a human-annotated dataset for factual consistency. Annotations are done on generated responses from different configurations of neural response generators, knowledge snippets, and decoding strategies. In addition, to facilitate the development of a factual consistency detector, we automatically create a new corpus called Conv-FEVER that is adapted from the Wizard of Wikipedia dataset and includes

Conversational AI

FEVEROUS (2021)

Rami Aly, Zhijiang Guo, Michael Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal

2021

Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and thus ignored the wealth of information available in structured formats, such as tables. In this paper we

Conversational AI

Purchase modelling with Amazon SageMaker

Theodore Vasiloudis, Ehsan M. Kermani

2020

When customers visit an ecommerce website, they will perform certain actions and will eventually either make a purchase or end their session without a purchase. Website operators can use the browsing behavior of their customers to build machine learning models that allow them to target customers that are more likely to convert with promotions. In this solution we will demonstrate how one can use SageMaker

Machine learning

Masked language model scoring

Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff

2020

Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one. We show that PLLs outperform scores from autoregressive language models like GPT-2 in a variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an end-to-end LibriSpeech model

Conversational AI
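The pseudo-log-likelihood idea in the entry above (mask each token in turn and sum the log-probabilities the model assigns to the hidden tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: `pseudo_log_likelihood` and `make_toy_scorer` are hypothetical names, and the toy scorer is a smoothed count model that looks only at the left neighbor, standing in for a real masked language model that would use context on both sides.

```python
import math
from collections import Counter

MASK = "[MASK]"

def pseudo_log_likelihood(tokens, masked_log_prob):
    """Sum log P(token_i | sentence with position i masked) over all i."""
    total = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK  # hide exactly one token per pass
        total += masked_log_prob(masked, i, target)
    return total

def make_toy_scorer(corpus, vocab):
    """Hypothetical stand-in for a masked LM: add-one smoothed counts
    conditioned on the left neighbor only (a real MLM sees both sides)."""
    pair_counts = Counter()
    left_counts = Counter()
    for sentence in corpus:
        padded = ["<s>"] + list(sentence)
        for left, tok in zip(padded, padded[1:]):
            pair_counts[(left, tok)] += 1
            left_counts[left] += 1
    vocab_size = len(vocab)

    def scorer(masked, i, target):
        assert masked[i] == MASK  # the target must be hidden from the model
        left = masked[i - 1] if i > 0 else "<s>"
        numerator = pair_counts[(left, target)] + 1
        denominator = left_counts[left] + vocab_size
        return math.log(numerator / denominator)

    return scorer

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab = {"the", "cat", "dog", "sat"}
scorer = make_toy_scorer(corpus, vocab)

fluent = pseudo_log_likelihood(["the", "cat", "sat"], scorer)
scrambled = pseudo_log_likelihood(["sat", "cat", "the"], scorer)
```

Under this toy model the sentence seen in training scores higher than its scrambled version, which is the property that makes such scores usable for ranking candidate hypotheses.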

Alexa skills kit SDK for Java

Jonathan Breedlove, Prashanth Bheemagani, Olivia Sung, Chris Kocel, Mario Doiron, Nikhil Yogendra Murali, Chris Liao, Joaquin Engelmo Moriche, Kaiming Tao, Memo Döring, Nong (Ron) Wang, Sergio del Amo, Xavier Portilla Edo, Jafer Khan, Gert Jan Kamstra, Josh Bean, Pritesh Soni, Rommel Rico

2020

The Alexa Skills Kit SDK for Java helps you get a skill up and running quickly, letting you focus on skill logic instead of boilerplate code.

Conversational AI

Amazon SageMaker debugger RulesConfig

Nathalie Rauschmayr, Vikas Kumar, Rahul Huilgol, Andrea Olgiati, Satadal Bhattacharjee, Nihal Harish, Vandana Kannan, Amol Lele, Anirudh Acharya, Jared Nielsen, Lakshmi Ramachandran, Ishaaq Chandy, Ishan Bhatt, Zhihan Li, Kohen Chia, Neelesh Dodda, Jiacheng Gu, Miyoung Choi, Balajee Nagarajan, Jeffrey Geevarghes, Denis Davydenko, Sifei Li, Lu Huang, Edward Kim, Tyler Hill, Krishnaram Kenthapadi

2020

Amazon SageMaker Debugger is designed to be a debugger for machine learning models. It lets you go beyond just looking at scalars like losses and accuracies during training and gives you full visibility into all tensors 'flowing through the graph' during training or inference. Amazon SageMaker Debugger RulesConfig provides a mapping of built-in rules to their default configurations. These configurations will

Machine learning

QA dataset converter

Priyanka Sen, Amir Saffari

2020

While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of question answering itself. In this paper, we investigate if models are learning reading comprehension from QA datasets by evaluating BERT-based models across five datasets. We evaluate models on their generalizability to out-of-domain examples, responses

Conversational AI

Entity resolution for smart advertising using Amazon SageMaker

Ehsan M. Kermani, Soji Adeshina

2020

This project shows how to use Deep Graph Library (DGL) on Amazon SageMaker to train a graph neural network (GNN) model to perform entity resolution on customer identity graphs. See the project detail page to learn more about the techniques used.

Machine learning

DGL-KE

Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, George Karypis

2020

Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This

Information and knowledge management

Meta-Q-Learning

Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alex Smola

2020

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the training

Machine learning

AWS Well-Architected labs

Nathan Besh, Alee Whitman, Duncan Bell

2020

The Well-Architected framework has been developed to help cloud architects build the most secure, high-performing, resilient, and efficient infrastructure possible for their applications. This framework provides a consistent approach for customers and partners to evaluate architectures, and provides guidance to help implement designs that will scale with your application needs over time. This repository

Cloud and systems

Alexa end-to-end SLU

Markus Müller

2020

This setup lets you train end-to-end neural models for spoken language understanding (SLU). It uses either the Snips SLU dataset or the Fluent Speech Commands (FSC) dataset. The framework is built on PyTorch with torchaudio and the transformers package from HuggingFace. We tested using PyTorch 1.5.0 and torchaudio 0.5.0.

Machine learning
