- ICASSP 2022: Recognizing the intents and domains of users' spoken and written language is a key component of Natural Language Understanding (NLU) systems. Real applications, however, encounter dynamic, rapidly evolving environments with newly emerging intents and domains for which no labeled data or prior information is available. For such a setting, we propose a novel framework, ADVIN, to automatically discover novel…
- ACL 2022: We present a complete pipeline to extract characters in a novel and link them to their direct-speech utterances. Our model is divided into three independent components: extracting direct speech, compiling a list of characters, and attributing those characters to their utterances. Although we find that existing systems can perform the first two tasks accurately, attributing characters to direct speech is…
- ICASSP 2022: End-to-end (E2E) spoken language understanding (SLU) systems can infer the semantics of a spoken utterance directly from an audio signal. However, training an E2E system remains a challenge, largely due to the scarcity of paired audio-semantics data. In this paper, we consider an E2E system as a multi-modal model, with audio and text functioning as its two modalities, and use a cross-modal latent space…
- ICASSP 2022: Automatic dubbing (AD) is among the machine translation (MT) use cases where translations should match a given length to allow for synchronicity between source and target speech. For neural MT, generating translations of length close to the source length (e.g., within ±10% in character count) while preserving quality is a challenging task. Controlling MT output length comes at a cost to translation quality…
- ICASSP 2022: Generic pre-trained speech and text representations promise to reduce the need for large labeled datasets on specific speech and language tasks. However, it is not clear how to effectively adapt these representations for speech emotion recognition. Recent public benchmarks show the efficacy of several popular self-supervised speech representations for emotion classification. In this study, we show that…
Related content
- May 2, 2023: ICLR workshop sponsored by Amazon CodeWhisperer features Amazon papers on a novel contrastive-learning framework for causal language models and a way to gauge the robustness of code generation models.
- April 12, 2023: From noisy cars to unreliable signals, researchers have worked to extend the Alexa experience to vehicles on the move.
- April 6, 2023: University teams are competing to help advance the science of conversational embodied AI and robust human-AI interaction.
- April 3, 2023: Combining acoustic and lexical information improves real-time voice sentiment analysis.
- March 31, 2023: Attendees explored new avenues of research in areas including robotics and conversational AI via roundtables moderated by researchers from Amazon.
- March 27, 2023: Initiative will advance artificial intelligence and machine learning research within the speech, language, and multimodal-AI domains.