Customer-obsessed science
Research areas
November 20, 2025 | 4 min read
A new evaluation pipeline called FiSCo uncovers hidden biases and offers an assessment framework that evolves alongside language models.
Featured news
- ACL 2023 Workshop on SustaiNLP, 2023: Popular models for Knowledge Graph Question Answering (KGQA), including semantic parsing and End-to-End (E2E) models, decode into a constrained space of KG relations. Although E2E models accommodate novel entities at test time, this constraint means they cannot access novel relations, requiring expensive and time-consuming retraining whenever a new relation is added to the KG. We propose KG-Flex, a new …
- ACL 2023 Workshop on Matching Entities, 2023: Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks based on the internal knowledge stored in their parameters during pre-training. However, such internalized knowledge may be insufficient or incorrect, which could lead LLMs to generate factually wrong answers. Furthermore, fine-tuning LLMs to update their knowledge is expensive. To this end, we propose …
- ACL 2023: Recent NLP literature pays little attention to the robustness of toxic-language predictors, even though these systems are most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, ToxicTrap, which introduces small word-level perturbations to fool SOTA text classifiers into predicting toxic text samples as benign. ToxicTrap exploits greedy-based search strategies to enable fast and …
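To make the idea of a greedy word-level perturbation attack concrete, here is a minimal, self-contained sketch. It is not the ToxicTrap implementation: the toy keyword classifier, the character-substitution table, and all function names are illustrative assumptions, used only to show how greedily perturbing one word at a time can flip a classifier's prediction from toxic to benign.

```python
# Hypothetical sketch of a greedy word-level adversarial attack in the
# spirit of ToxicTrap. The classifier and substitution table below are
# toy assumptions, not the paper's method or models.

TOXIC_WORDS = {"idiot", "stupid", "trash"}

def toy_classifier(text: str) -> bool:
    """Toy stand-in for a toxicity classifier: True if any flagged word appears verbatim."""
    return any(w in TOXIC_WORDS for w in text.lower().split())

# Character-level substitutions that keep words human-readable.
SUBS = {"i": "1", "s": "5", "a": "@", "o": "0"}

def perturb(word: str) -> str:
    """Apply the first applicable character substitution to a word."""
    for ch, rep in SUBS.items():
        if ch in word:
            return word.replace(ch, rep, 1)
    return word

def greedy_attack(text: str) -> str:
    """Greedily perturb one word at a time until the classifier predicts benign."""
    words = text.split()
    for i, w in enumerate(words):
        if not toy_classifier(" ".join(words)):
            break  # prediction already flipped; stop perturbing
        words[i] = perturb(w)
    return " ".join(words)

adv = greedy_attack("you are a stupid idiot")
print(toy_classifier(adv))  # → False: the perturbed text evades the toy filter
```

A real attack of this family would score candidate substitutions by how much each one lowers the model's toxicity probability and pick the best at every step; the greedy loop structure, however, is the same.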
- ACL Findings 2023; ACL 2023 Workshop on SustaiNLP, 2023: Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes for improving pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it underperforms a Masked Language Modeling (MLM) encoder, particularly on sequence labeling …
- IWSLT 2023: This paper describes the speech translation system submitted as part of the IWSLT 2023 shared task on low-resource speech translation. The low-resource task aids in building models for language pairs where the training corpus is limited. In this paper, we focus on two language pairs, Tamasheq-French (Tmh→Fra) and Marathi-Hindi (Mr→Hi), and implement a speech translation system that is unconstrained …
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.