- IJCNLP-AACL 2023: Prior work in the field of text summarization mostly focuses on generating summaries that are a sentence or two long. In this work, we introduce the task of abstractive short-phrase summarization (PhraseSumm), which aims at capturing the central theme of a document through a generated short phrase. We explore BART- and T5-based neural summarization models, and measure their effectiveness for the task using...
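The teaser names BART and T5 but includes no code; a minimal sketch of phrase-length abstractive summarization with an off-the-shelf BART checkpoint (assuming the Hugging Face transformers library; the checkpoint and the cap on decoder length are illustrative choices, not the paper's fine-tuned PhraseSumm models) could look like:

```python
# Minimal sketch: phrase-length abstractive summarization with BART.
# Assumes the Hugging Face `transformers` library and the generic
# `facebook/bart-large-cnn` checkpoint; the paper's PhraseSumm models
# are fine-tuned variants and are not reproduced here.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def phrase_summary(document: str, max_phrase_tokens: int = 8) -> str:
    """Generate a short phrase by capping the decoder length."""
    inputs = tokenizer(document, return_tensors="pt",
                       truncation=True, max_length=1024)
    ids = model.generate(
        **inputs,
        max_new_tokens=max_phrase_tokens,  # force phrase-length output
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```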
- EMNLP 2023: Sequence-level knowledge distillation reduces the size of Seq2Seq models for more efficient abstractive summarization. However, it often leads to a loss of abstractiveness in summarization. In this paper, we propose a novel approach named DisCal to enhance the level of abstractiveness (measured by n-gram overlap) without sacrificing the informativeness (measured by ROUGE) of generated summaries. DisCal...
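As a rough illustration of the abstractiveness proxy the abstract mentions, here is a hypothetical helper that computes the fraction of summary n-grams not found in the source; the function name and whitespace tokenization are illustrative assumptions, not DisCal's actual metric code:

```python
# Sketch of the abstractiveness proxy named in the abstract: the fraction
# of summary n-grams that do not appear in the source document.
# Function and variable names are illustrative, not from the paper.
def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    def ngrams(text: str) -> set:
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    novel = summary_ngrams - ngrams(source)
    return len(novel) / len(summary_ngrams)
```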
- ASRU 2023: Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model...
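The framework itself is not spelled out in this teaser; the sketch below shows one generic way such zero-shot matching can work, representing each unseen intent by the mean embedding of its few sample sentences and scoring audio against those prototypes by cosine similarity. All names, and the idea that embeddings live in a shared space, are assumptions for illustration, not the paper's method:

```python
# Generic zero-shot intent scoring sketch: an unseen intent is
# represented by the mean embedding of a few sample sentences, and an
# audio embedding is matched to the nearest intent by cosine similarity.
# Assumes audio and text embeddings share a space; not the paper's model.
import numpy as np

def intent_prototypes(samples: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Average a few text-sentence embeddings into one prototype per intent."""
    return {intent: np.mean(vecs, axis=0) for intent, vecs in samples.items()}

def classify(audio_embedding: np.ndarray,
             prototypes: dict[str, np.ndarray]) -> str:
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda i: cos(audio_embedding, prototypes[i]))
```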
- ASRU 2023: Although end-to-end (E2E) automatic speech recognition (ASR) systems excel in general tasks, they frequently struggle to accurately recognize personal rare words. Leveraging contextual information to bias the internal states of an E2E ASR model has proven to be an effective solution. However, most existing work focuses on biasing for a single domain, and it is still challenging to expand such contextualization...
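For readers unfamiliar with contextual biasing, a toy shallow-fusion-style version of the general idea (a generic technique sketch, not the multi-domain method this paper proposes) simply boosts the decoding score of hypotheses that contain a user-specific phrase:

```python
# Illustrative shallow-fusion-style biasing: during decoding, add a
# score bonus when a hypothesis contains one of the user's contextual
# phrases (e.g., contact names). Generic sketch of the technique only;
# the paper biases the model's internal states instead.
def biased_score(base_log_prob: float, hypothesis: str,
                 bias_phrases: list[str], bonus: float = 2.0) -> float:
    for phrase in bias_phrases:
        if phrase in hypothesis:
            return base_log_prob + bonus
    return base_log_prob
```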
- ECML-PKDD 2023 Workshop on Challenges and Opportunities of Large Language Models in Real-World Machine Learning Applications (COLLM): The collection of annotated dialogs for training task-oriented dialog systems has been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident whether similar generative models can be used to generate the large variety of often unexpected user inputs that real dialog systems encounter in practice. Existing data augmentation...
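One generic way to generate varied user inputs, sketched below purely for illustration, is to sample paraphrases of a seed utterance from a seq2seq model; the checkpoint, prompt, and sampling settings are placeholders, not the approach described in the paper:

```python
# Hypothetical user-side data augmentation: sample diverse paraphrases
# of a seed user utterance from a generic seq2seq model. The checkpoint
# name and prompt format are placeholders, not the paper's model.
from transformers import pipeline

paraphraser = pipeline("text2text-generation", model="t5-base")

def augment_user_turn(utterance: str, k: int = 5) -> list[str]:
    outputs = paraphraser(
        f"paraphrase: {utterance}",
        num_return_sequences=k,
        do_sample=True,      # sampling encourages variety in user inputs
        top_p=0.95,
        max_length=48,
    )
    return [o["generated_text"] for o in outputs]
```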
Related content
- May 21, 2019: A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.
- May 16, 2019: Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response, such as “Is 6:30 p.m. okay?” Here, “6:30PM” will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization; its counterpart, converting “five p m” to “5:00PM”, is inverse text normalization.
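A toy rule covering exactly the spoken-time example above shows the flavor of inverse text normalization; production systems use weighted grammars or neural models rather than a single regex:

```python
# Toy inverse-text-normalization rule for the example above: maps
# "five p m" to "5:00PM". Only covers "<hour> (thirty) a/p m";
# real ITN systems handle far more than times.
import re

HOURS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
         "twelve": 12}

def inverse_normalize_time(spoken: str) -> str:
    m = re.fullmatch(r"(\w+)(?: (thirty))? ([ap]) m", spoken.lower())
    if not m or m.group(1) not in HOURS:
        return spoken  # leave anything we can't parse unchanged
    minutes = "30" if m.group(2) else "00"
    return f"{HOURS[m.group(1)]}:{minutes}{m.group(3).upper()}M"

assert inverse_normalize_time("five p m") == "5:00PM"
assert inverse_normalize_time("six thirty p m") == "6:30PM"
```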
- May 13, 2019: Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.
- May 3, 2019: Using cosine similarity rather than the dot product to compare vectors helps prevent “catastrophic forgetting”.
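The distinction is easy to see numerically: the dot product scales with vector magnitude, while cosine similarity depends only on direction, so no embedding can dominate a comparison simply by growing large.

```python
# Dot product vs. cosine similarity on two vectors pointing the same
# way: the dot product grows with the norm, cosine similarity does not.
import numpy as np

def dot(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([10.0, 0.0])  # same direction, 10x the norm
print(dot(a, b))     # 10.0 -- scales with magnitude
print(cosine(a, b))  # 1.0  -- direction only
```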
- May 2, 2019: Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “What’s Going On?” and “Marvin Gaye” should fill the slots SongName and ArtistName.
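Written as a simple data structure (field names are illustrative, not Alexa’s internal schema), that interpretation looks like:

```python
# The intent/slot reading of the example request above, as a plain
# data structure. Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Interpretation:
    intent: str
    slots: dict[str, str]

request = Interpretation(
    intent="PlayMusic",
    slots={"SongName": "What's Going On?", "ArtistName": "Marvin Gaye"},
)
```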
- April 25, 2019: When a customer asks Alexa to play “Hey Jude”, and Alexa responds, “Playing ‘Hey Jude’ by the Beatles,” that response is generated by a text-to-speech (TTS) system, which converts textual inputs into synthetic-speech outputs...