Code and datasets

Faithful low-resource data-to-text generation through cycle training

Zhuoer Wang, Marcus Collins, Nikhita Vedula, Simone Filice, Shervin Malmasi, Oleg Rokhlenko

2023

Methods to generate text from structured data have advanced significantly in recent years, primarily due to fine-tuning of pre-trained language models on large datasets. However, such models can fail to produce output faithful to the input data, particularly on out-of-domain data. Sufficient annotated data is often not available for specific domains, leading us to seek an unsupervised approach to improve

Conversational AI

AMR - Understanding Disrupted Sentences

Angus Addlesee, Marco Damonte

2023

ASR systems are not always confident about a particular word when the audio is unclear, for example if a door slams or a siren passes in the middle of a customer's utterance. In this project, we want to explore whether we can represent disrupted customer utterances and recover a full representation with one additional turn.

Conversational AI

DFEE: Interactive DataFlow execution and evaluation kit

Han He, Song Feng, Daniele Bonadiman, Yi Zhang, Saab Mansour

2023

We present DFEE, an interactive DataFlow Execution and Evaluation toolkit that supports execution, visualization and benchmarking of semantic parsers given dialogue input and backend database. We demonstrate the system via a complex dialog task: event scheduling that involves temporal reasoning. It also supports diagnosing the parsing results via a friendly interface that allows developers to examine dynamic

Conversational AI

Multilingual robust contrastive pretraining

Asa Cooper Stickland, Sailik Sengupta, Jason Krone, Saab Mansour

2023

To benchmark the performance of pretrained multilingual language models, we construct noisy datasets covering five languages and four NLP tasks and observe a clear gap in the performance between clean and noisy data in the zero-shot cross-lingual setting. After investigating several ways to boost the robustness of multilingual models in this setting, we propose Robust Contrastive Pretraining (RCP). RCP

Conversational AI

Fortuna: A library for uncertainty quantification

Gianluca Detommaso, Alberto Gasparin, Oleg Smirnov, Thomas Pinder, Christian Leibig, Paul Scemama

2023

Proper estimation of predictive uncertainty is fundamental in applications that involve critical decisions. Uncertainty can be used to assess the reliability of model predictions, trigger human intervention, or decide whether a model can be safely deployed in the wild. We introduce Fortuna, an open-source library for uncertainty quantification. Fortuna provides calibration methods, such as conformal prediction

Machine learning

Xtr-WikiQA

Shivanshu Gupta, Yoshitomo Matsubara, Ankit Chadha, Alessandro Moschitti

2023

Xtr-WikiQA is an Answer Sentence Selection (AS2) dataset in 9 non-English languages, proposed in our paper accepted at ACL 2023 (Findings): Cross-Lingual Knowledge Distillation for answer sentence selection in low-resource languages. This dataset is based on an English AS2 dataset, WikiQA (Original, Hugging Face). For translations, we used Amazon Translate.

Conversational AI

TyDi-AS2

Shivanshu Gupta, Yoshitomo Matsubara, Ankit Chadha, Alessandro Moschitti

2023

TyDi-AS2 and Xtr-TyDi-AS2 are multilingual Answer Sentence Selection (AS2) datasets comprising 8 diverse languages, proposed in our paper accepted at ACL 2023 (Findings): Cross-Lingual Knowledge Distillation for answer sentence selection in low-resource languages. Both the datasets were created from TyDi-QA, a multilingual question-answering dataset. TyDi-AS2 was created by converting the QA instances in

Conversational AI

SWING: Balancing coverage and faithfulness for dialogue summarization

Steeve Huang, Siffi Singh, Xiaofei Ma, Wei Xiao, Feng Nan, Nicholas Dingwall, William Yang Wang, Kathleen McKeown

2023

Missing information is a common issue in dialogue summarization, where some information in the reference summaries is not covered in the generated summaries. To address this issue, we propose to utilize natural language inference (NLI) models to improve coverage while avoiding introducing factual inconsistencies. Specifically, we use NLI to compute fine-grained training signals to encourage the model to

Conversational AI

RefChecker

Dongyu Ru, Xiangkun Hu, Lin Qiu

2023

RefChecker provides a standardized assessment framework to identify subtle hallucinations present in the outputs of large language models (LLMs). Highlighted features. Finer granularity: RefChecker breaks down the claims in the LLM’s response into knowledge triplets, as opposed to paragraphs, sentences, or sub-sentences. Detecting at the knowledge-triplet level will test the truthfulness of facts. Importantly

Conversational AI

ProbConserv: Probabilistic framework to enforce conservation laws

Derek Hansen, Danielle Maddix Robinson, Shima Alizadeh, Gaurav Gupta, Michael Mahoney

2023

Recent work in scientific machine learning (SciML) has focused on incorporating partial differential equation (PDE) information into the learning process. Much of this work has focused on relatively “easy” PDE operators (e.g., elliptic and parabolic), with less emphasis on relatively “hard” PDE operators (e.g., hyperbolic). Within numerical PDEs, the latter problem class requires control of a type of volume

Machine learning

Privacy adhering machine un-learning in NLP

Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth

2023

Regulations such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US include provisions on the right to be forgotten that mandate industry applications to remove data related to an individual from their systems. In several real-world industry applications that use machine learning to build models on user data, such mandates require significant

Conversational AI

Rethinking minimax-fairness

Harvineet Singh, Matthaeus Kleindessner, Volkan Cevher, Rumi Chunara, Chris Russell

2023

Minimax-fair machine learning minimizes the error for the worst-off group. However, empirical evidence suggests that when sophisticated models are trained with standard empirical risk minimization (ERM), they often have the same performance on the worst-off group as a minimax-trained model. Our work makes this counterintuitive observation concrete. We prove that if the hypothesis class is sufficiently expressive

Machine learning
