Code and datasets

CrossCLR: Cross-modal contrastive learning for multi-modal video representations

Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox

2021

Contrastive learning allows us to flexibly define powerful losses by contrasting positive pairs from sets of negative samples. Recently, the principle has also been used to learn cross-modal embeddings for video and text, yet without exploiting its full potential. In particular, previous losses do not take the intra-modality similarities into account, which leads to inefficient embeddings, as the same content

Computer vision

Uniform sampling over episode difficulty

Sébastien M. R. Arnold, Guneet Singh Dhillon, Avinash Ravichandran, Stefano Soatto

2021

Episodic training is a core ingredient of few-shot learning to train models on tasks with limited labelled data. Despite its success, episodic training remains largely understudied, prompting us to ask the question: what is the best way to sample episodes? In this paper, we first propose a method to approximate episode sampling distributions based on their difficulty. Building on this method, we perform

Computer vision

Amazon DenseClus

Charles Frenzel, Baichuan Sun, Eden Duthie, Yin Song

2021

DenseClus is a Python module for clustering mixed-type data using UMAP and HDBSCAN. Because it accepts both categorical and numerical data, DenseClus makes it possible to incorporate all features in clustering.

Machine learning

DSTC10 Track 2 - Knowledge-grounded task-oriented dialogue modeling on spoken conversations

Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Behnam Hedayatnia, Karthik Gopalakrishnan, Dilek Hakkani-Tür

2021

A lot of recent work in dialogue modeling has been on written conversations, partly because of available data sets. However, written dialogues are not sufficient to fully capture the nature of spoken conversations as well as the potential effect of speech recognition errors on practical spoken dialogue systems. This challenge track aims to provide a new benchmark on spoken task-oriented conversations. We

Conversational AI

Learning better visual dialog agents with pretrained visual-linguistic representation

Tao Tu, Qing Ping, Govind Thattai, Gokhan Tur, Prem Natarajan

2021

GuessWhat?! is a visual dialog guessing game which incorporates a Questioner agent that generates a sequence of questions, while an Oracle agent answers the respective questions about a target object in an image. Based on this dialog history between the Questioner and the Oracle, a Guesser agent makes a final guess of the target object. While previous work has focused on dialogue policy optimization and

Computer vision

Question answering using web lists

Anoop Katti, Kai Hui, Adrià de Gispert, Hagen Fuerstenau

2021

This repository contains the ListQA datasets described in the paper "Question Answering using Web Lists". The two datasets, NQWebList and GQWebList, use subsets of questions from Natural Questions and GooAQ, respectively. To build these datasets, each annotator was shown a question and a relevant URL from the web and was asked to annotate the list answer on the URL, if one exists. For annotating a list, the annotators

Conversational AI

Multi-Agent Experience (MAX) Toolkit

David Henry, Sriram Ramakrishnan, Ryan Sherritt, Alok Bhat, Jayalakshmi Surendran, Aneesh Bhat, Akhilesh Srikakulapu, Ankit Sablok, Teja Theegela, Sam Coulter

2021

The MAX Toolkit provides software that aims to accelerate the development of devices that integrate multiple voice agents. The Toolkit provides guidance to both device makers and agent developers towards this goal.

Conversational AI

Contextual answer sentence selection

Ivano Lauriola, Alessandro Moschitti, Stefano Campese

2021

Coala is a Python package for Contextual Answer Sentence Selection. Answer Sentence Selection (AS2) is a sub-task of question answering that aims to find the sentence containing the answer to a given input question from a pool of candidate sentences (e.g. retrieved from a search engine). In our contextual AS2 task, we provide models that leverage contextual information coming from the documents

Search and information retrieval

Hidden biases in unreliable news detection datasets

Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

2021

Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these datasets. The code is tested on Python 3.7 and PyTorch 1.6.0

Conversational AI

WOW++

Mihail Eric, Nicole Burnstein (Chartier), Behnam Hedayatnia, Karthik Gopalakrishnan, Pankaj Rajan, Yang Liu, Dilek Hakkani-Tür

2021

Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task (Dinan et al., 2019), such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate

Conversational AI

VISITRON: Visual semantics-aligned interactively trained object-navigator

Ayush Shrivastava, Karthik Gopalakrishnan, Yang Liu, Robinson Piramuthu, Gokhan Tur, Devi Parikh, Dilek Hakkani-Tür

2021

Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN). In this paper, we present VISITRON, a multi-modal Transformer-based navigator better suited to the interactive regime inherent to Cooperative Vision-and-Dialog Navigation (CVDN). VISITRON

Computer vision

GraVL-BERT

Arpit Gupta

2021

GRAVL-BERT is a unified multimodal coreference resolution (MCR) framework which combines visual relationships between objects, background scenes, dialogue, and metadata by integrating graph neural networks with VL-BERT.

Conversational AI
