- 2025: In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets. However, their large size and high computational demands limit their practicality in many applications, especially when further fine-tuning is required. To address these limitations, smaller models are typically preferred for deployment. However, their training is…
- 2025: Recent advancements in speech encoders have drawn attention due to their integration with Large Language Models for various speech tasks. While most research has focused on either causal or full-context speech encoders, there has been limited exploration of how to effectively handle both streaming and non-streaming applications while achieving state-of-the-art performance. We introduce DuRep, a Dual-mode Speech Representation…
- 2025: The use of human speech to train LLMs poses privacy concerns due to these models' ability to generate samples that closely resemble artifacts in the training data. We propose a speaker-privacy-preserving representation learning method through the Universal Speech Codec (USC), a computationally efficient codec that disentangles speech into: (i) privacy-preserving, semantically rich representations, capturing…
- 2025: Language localization is the adaptation of written content to different linguistic and cultural contexts. The ability to localize written content is crucial for global businesses seeking to provide a consistent and reliable customer experience across diverse markets. Traditional methods have approached localization as an application of machine translation (MT), but localization requires more than linguistic conversion…
- AAAI 2025 Workshop on Preventing and Detecting LLM Misinformation, 2025: Unlearning aims to remove copyrighted, sensitive, or private content from large language models (LLMs) without full retraining. In this work, we develop a multi-task unlearning benchmark (LUME) which features three tasks: (1) unlearn synthetically generated creative short novels, (2) unlearn synthetic biographies with sensitive information, and (3) unlearn a collection of public biographies. We further…
Related content
- August 01, 2022: McKeown awarded IEEE Innovation in Societal Infrastructure Award and named a member of the American Philosophical Society.
- July 28, 2022: Donato Crisostomi talks about how his mother helped spark a love of knowledge that led him to two science internships at Amazon.
- July 22, 2022: New EMNLP workshop will feature talks, papers, posters, and a competition built around the 50-plus-language, million-utterance MASSIVE dataset.
- July 15, 2022: New method optimizes the twin demands of retrieving relevant content and filtering out bad content.
- July 14, 2022: To become the interface for the Internet of things, conversational agents will need to learn on their own. Alexa has already started down that path.
- July 13, 2022: Four MIT professors are the recipients of the inaugural call for research projects.