Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

Contextual Language Model Adaptation for Conversational Agents

Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anushree Venkatesh, Ariya Rastrow

Interspeech 2018

2018

Statistical language models (LMs) play a key role in Automatic Speech Recognition (ASR) systems used by conversational agents. These ASR systems should provide high accuracy across a variety of speaking styles, domains, vocabularies, and argots. In this paper, we present a DNN-based method to adapt the LM to each user-agent interaction based on generalized contextual information, by predicting an optimal,

Statistical Model Compression for Small-Footprint Natural Language Understanding

Grant Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev

Interspeech 2018

2018

In this paper, we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware-restricted devices and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques: parameter quantization and perfect feature hashing. These

Related: Shrinking machine learning models for offline use

Time-Delayed Bottleneck Highway Networks Using A DFT Feature for Keyword Spotting

Jinxi Guo, Kenichi Kumatani, Ming Sun, Minhua Wu, Anirudh Raju, Nikko Ström, Arindam Mandal

ICASSP 2018

2018

This paper presents a novel deep neural network (DNN) architecture with highway blocks (HWs) using a complex discrete Fourier transform (DFT) feature for keyword spotting. In our previous work, we showed that the feed-forward DNN with a time-delayed bottleneck layer (TDB-DNN) directly trained from the audio input outperformed the model with the log-mel filter bank energy feature (LFBE), given a large amount

Smoothing model predictions using adversarial training procedures for speech based emotion recognition

Rahul Gupta

ICASSP 2018

2018

Training discriminative classifiers involves learning a conditional distribution p(y_i|x_i), given a set of feature vectors x_i and the corresponding labels y_i, i = 1..N. For a classifier to be generalizable and not overfit to training data, the resulting conditional distribution p(y_i|x_i) is desired to be smoothly varying over the inputs x_i. Adversarial training procedures enforce this smoothness using manifold

Multilayer Adaptation Based Complex Echo Cancellation and Voice Enhancement

Jun Yang

ICASSP 2018

2018

The paper proposes an efficient signal processing system consisting mainly of an adaptation-based nonlinear echo cancellation (NLEC) layer and joint perceptual subband residual echo suppression (SBRES) and noise reduction (SBNR) layers. The theoretical analyses and the subjective and objective test results show that the proposed signal processing system can offer a significant improvement for automatic

Amazon Unveils Novel Alexa Dialog Modeling for Natural, Cross-Skill Conversations

Alexa Science Team

June 5, 2019

Today, customer exchanges with Alexa are generally either one-shot requests, like “Alexa, what’s the weather?”, or interactions that require multiple requests to complete more complex tasks.

Using adversarial training to recognize speakers’ emotions

Viktor Rozgic

May 21, 2019

A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.

Should Alexa read “2/3” as “two-thirds” or “February Third”?: The science of text normalization

Ming Sun

May 16, 2019

Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response, say, “Is 6:30 p.m. okay?” Here, “6:30PM” will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart, converting “five p m” to “5:00PM”, inverse text normalization.
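
The two directions described above can be illustrated with a toy Python sketch. It handles only clock times in the "5:00PM" style used in the example; the function names, the small word tables, and the rule-based approach are assumptions made for illustration and are not taken from the article or from Alexa's actual system.

```python
import re

# Hypothetical toy vocabulary: only whole hours and a few minute values.
HOURS = ["one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ten", "eleven", "twelve"]
MINUTES = {"00": "", "15": "fifteen", "30": "thirty", "45": "forty five"}
SPOKEN_TO_MINUTES = {spoken: digits for digits, spoken in MINUTES.items()}


def normalize_time(written: str) -> str:
    """Text normalization: written form to spoken form."""
    match = re.fullmatch(r"(\d{1,2}):(\d{2})(AM|PM)", written)
    if match is None:
        raise ValueError(f"unsupported time format: {written!r}")
    hour, minute, meridiem = match.groups()
    if not 1 <= int(hour) <= 12 or minute not in MINUTES:
        raise ValueError(f"time outside toy vocabulary: {written!r}")
    # "PM" is spoken as two separate letters: "p m".
    words = [HOURS[int(hour) - 1], MINUTES[minute], meridiem[0].lower(), "m"]
    return " ".join(word for word in words if word)


def inverse_normalize_time(spoken: str) -> str:
    """Inverse text normalization: spoken form to written form."""
    tokens = spoken.split()
    if len(tokens) < 3 or tokens[0] not in HOURS:
        raise ValueError(f"unsupported spoken time: {spoken!r}")
    hour = HOURS.index(tokens[0]) + 1
    # Everything between the hour word and the trailing "p m" is the minutes.
    minute = SPOKEN_TO_MINUTES.get(" ".join(tokens[1:-2]))
    if minute is None:
        raise ValueError(f"minutes outside toy vocabulary: {spoken!r}")
    return f"{hour}:{minute}{tokens[-2].upper()}M"
```

With these definitions, `normalize_time("6:30PM")` yields "six thirty p m" and `inverse_normalize_time("five p m")` yields "5:00PM", matching the examples in the text.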

Training a Machine Learning Model in English Improves Its Performance in Japanese

Judith Gaspers

May 13, 2019

Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.

How we add new skills to Alexa’s name-free skill selector

Young-Bum Kim

May 3, 2019

Using cosine similarity rather than dot product to compare vectors helps prevent "catastrophic forgetting".
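
As a rough illustration of how the two measures differ, the Python sketch below compares one query vector against two candidates. The vectors and variable names are made up for this example, and it shows only the general property that cosine similarity ignores vector length; it is not the skill selector's actual scoring code.

```python
import math


def dot(u, v):
    """Dot product: grows with the lengths of both vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by both vector lengths: depends on direction only."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical embeddings, chosen only to make the contrast visible.
query = [1.0, 2.0]
aligned = [2.0, 4.0]   # same direction as the query, modest length
large = [20.0, 0.0]    # different direction, much greater length

# The dot product prefers the long vector even though it points elsewhere.
dot_prefers_large = dot(query, large) > dot(query, aligned)

# Cosine similarity prefers the vector that points the same way as the query.
cosine_prefers_aligned = (
    cosine_similarity(query, aligned) > cosine_similarity(query, large)
)
```

Here both `dot_prefers_large` and `cosine_prefers_aligned` evaluate to `True`, so the two measures rank the same candidates in opposite order.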

“Alexa, Turn Down the Lights and Play Music”: The Science of Handling Compound Requests

Rahul Goel

May 2, 2019

Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “‘What’s Going On?’” and “Marvin Gaye” should fill the slots SongName and ArtistName.
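
One simple way to picture an intent-and-slot interpretation is as a small record, sketched below in Python. The `Interpretation` class is a hypothetical structure invented for this illustration; only the intent name `PlayMusic` and the slot names `SongName` and `ArtistName` come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Interpretation:
    """Hypothetical container: one intent plus the slot values that fill it."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


# "Alexa, play 'What's Going On?' by Marvin Gaye"
request = Interpretation(
    intent="PlayMusic",
    slots={"SongName": "What's Going On?", "ArtistName": "Marvin Gaye"},
)
```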
