- IJCNLP-AACL 2023: Prior work in the field of text summarization mostly focuses on generating summaries that are a sentence or two long. In this work, we introduce the task of abstractive short-phrase summarization (PhraseSumm), which aims at capturing the central theme of a document through a generated short phrase. We explore BART- and T5-based neural summarization models, and measure their effectiveness for the task using...
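The teaser names BART and T5 but includes no code; a minimal sketch of phrase-length abstractive summarization with an off-the-shelf BART checkpoint (assuming the Hugging Face transformers library; the checkpoint and the cap on decoder length are illustrative choices, not the paper's fine-tuned PhraseSumm models) could look like:

```python
# Minimal sketch: phrase-length abstractive summarization with BART.
# Assumes the Hugging Face `transformers` library and the generic
# `facebook/bart-large-cnn` checkpoint; the paper's PhraseSumm models
# are fine-tuned variants and are not reproduced here.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def phrase_summary(document: str, max_phrase_tokens: int = 8) -> str:
    """Generate a short phrase by capping the decoder length."""
    inputs = tokenizer(document, return_tensors="pt",
                       truncation=True, max_length=1024)
    ids = model.generate(
        **inputs,
        max_new_tokens=max_phrase_tokens,  # force phrase-length output
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```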
- EMNLP 2023: Sequence-level knowledge distillation reduces the size of Seq2Seq models for more efficient abstractive summarization. However, it often leads to a loss of abstractiveness in summarization. In this paper, we propose a novel approach named DisCal to enhance the level of abstractiveness (measured by n-gram overlap) without sacrificing the informativeness (measured by ROUGE) of generated summaries. DisCal...
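As a rough illustration of the abstractiveness proxy the abstract mentions, here is a hypothetical helper that computes the fraction of summary n-grams not found in the source; the function name and whitespace tokenization are illustrative assumptions, not DisCal's actual metric code:

```python
# Sketch of the abstractiveness proxy named in the abstract: the fraction
# of summary n-grams that do not appear in the source document.
# Function and variable names are illustrative, not from the paper.
def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    def ngrams(text: str) -> set:
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    novel = summary_ngrams - ngrams(source)
    return len(novel) / len(summary_ngrams)
```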
- ASRU 2023: Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model...
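The framework itself is not spelled out in this teaser; the sketch below shows one generic way such zero-shot matching can work, representing each unseen intent by the mean embedding of its few sample sentences and scoring audio against those prototypes by cosine similarity. All names, and the idea that embeddings live in a shared space, are assumptions for illustration, not the paper's method:

```python
# Generic zero-shot intent scoring sketch: an unseen intent is
# represented by the mean embedding of a few sample sentences, and an
# audio embedding is matched to the nearest intent by cosine similarity.
# Assumes audio and text embeddings share a space; not the paper's model.
import numpy as np

def intent_prototypes(samples: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Average a few text-sentence embeddings into one prototype per intent."""
    return {intent: np.mean(vecs, axis=0) for intent, vecs in samples.items()}

def classify(audio_embedding: np.ndarray,
             prototypes: dict[str, np.ndarray]) -> str:
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda i: cos(audio_embedding, prototypes[i]))
```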
- ASRU 2023: Although end-to-end (E2E) automatic speech recognition (ASR) systems excel in general tasks, they frequently struggle to accurately recognize personal rare words. Leveraging contextual information to bias the internal states of an E2E ASR model has proven to be an effective solution. However, most existing work focuses on biasing for a single domain, and it is still challenging to expand such contextualization...
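For readers unfamiliar with contextual biasing, a toy shallow-fusion-style version of the general idea (a generic technique sketch, not the multi-domain method this paper proposes) simply boosts the decoding score of hypotheses that contain a user-specific phrase:

```python
# Illustrative shallow-fusion-style biasing: during decoding, add a
# score bonus when a hypothesis contains one of the user's contextual
# phrases (e.g., contact names). Generic sketch of the technique only;
# the paper biases the model's internal states instead.
def biased_score(base_log_prob: float, hypothesis: str,
                 bias_phrases: list[str], bonus: float = 2.0) -> float:
    for phrase in bias_phrases:
        if phrase in hypothesis:
            return base_log_prob + bonus
    return base_log_prob
```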
- ECML-PKDD 2023 Workshop on Challenges and Opportunities of Large Language Models in Real-World Machine Learning Applications (COLLM): The collection of annotated dialogs for training task-oriented dialog systems has been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident whether similar generative models can be used to generate the large variety of often unexpected user inputs that real dialog systems encounter in practice. Existing data augmentation...
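One generic way to generate varied user inputs, sketched below purely for illustration, is to sample paraphrases of a seed utterance from a seq2seq model; the checkpoint, prompt, and sampling settings are placeholders, not the approach described in the paper:

```python
# Hypothetical user-side data augmentation: sample diverse paraphrases
# of a seed user utterance from a generic seq2seq model. The checkpoint
# name and prompt format are placeholders, not the paper's model.
from transformers import pipeline

paraphraser = pipeline("text2text-generation", model="t5-base")

def augment_user_turn(utterance: str, k: int = 5) -> list[str]:
    outputs = paraphraser(
        f"paraphrase: {utterance}",
        num_return_sequences=k,
        do_sample=True,      # sampling encourages variety in user inputs
        top_p=0.95,
        max_length=48,
    )
    return [o["generated_text"] for o in outputs]
```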
Related content
- May 21, 2019: A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.
- May 16, 2019: Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response, such as “Is 6:30 p.m. okay?” Here, “6:30PM” will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization; its counterpart, converting “five p m” to “5:00PM”, is inverse text normalization.
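A toy rule covering exactly the spoken-time example above shows the flavor of inverse text normalization; production systems use weighted grammars or neural models rather than a single regex:

```python
# Toy inverse-text-normalization rule for the example above: maps
# "five p m" to "5:00PM". Only covers "<hour> (thirty) a/p m";
# real ITN systems handle far more than times.
import re

HOURS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
         "twelve": 12}

def inverse_normalize_time(spoken: str) -> str:
    m = re.fullmatch(r"(\w+)(?: (thirty))? ([ap]) m", spoken.lower())
    if not m or m.group(1) not in HOURS:
        return spoken  # leave anything we can't parse unchanged
    minutes = "30" if m.group(2) else "00"
    return f"{HOURS[m.group(1)]}:{minutes}{m.group(3).upper()}M"

assert inverse_normalize_time("five p m") == "5:00PM"
assert inverse_normalize_time("six thirty p m") == "6:30PM"
```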
- May 13, 2019: Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.
- May 3, 2019: Using cosine similarity rather than the dot product to compare vectors helps prevent “catastrophic forgetting”.
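The distinction is easy to see numerically: the dot product scales with vector magnitude, while cosine similarity depends only on direction, so no embedding can dominate a comparison simply by growing large.

```python
# Dot product vs. cosine similarity on two vectors pointing the same
# way: the dot product grows with the norm, cosine similarity does not.
import numpy as np

def dot(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([10.0, 0.0])  # same direction, 10x the norm
print(dot(a, b))     # 10.0 -- scales with magnitude
print(cosine(a, b))  # 1.0  -- direction only
```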
- May 2, 2019: Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “What’s Going On?” and “Marvin Gaye” should fill the slots SongName and ArtistName.
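Written as a simple data structure (field names are illustrative, not Alexa’s internal schema), that interpretation looks like:

```python
# The intent/slot reading of the example request above, as a plain
# data structure. Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Interpretation:
    intent: str
    slots: dict[str, str]

request = Interpretation(
    intent="PlayMusic",
    slots={"SongName": "What's Going On?", "ArtistName": "Marvin Gaye"},
)
```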
- April 25, 2019: When a customer asks Alexa to play “Hey Jude”, and Alexa responds, “Playing ‘Hey Jude’ by the Beatles,” that response is generated by a text-to-speech (TTS) system, which converts textual inputs into synthetic-speech outputs...