Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

Cross-lingual prosody transfer for expressive machine dubbing

Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Patrick Tobing, Ravi chander Vipperla, Vincent Pollet

Interspeech 2023

2023

Prosody transfer is well-studied in the context of expressive speech synthesis. Cross-lingual prosody transfer, however, is challenging and has been underexplored to date. In this paper, we present a novel solution to learn prosody representations that are transferable across languages and speakers for machine dubbing of expressive multimedia contents. Multimedia contents often contain field recordings.

Conversational AI
Contact complexity in customer service

Shu-Ting Pi, Michael Yang, Qun Liu

KDD 2023 Workshop on Decision Intelligence and Analytics for Online Marketplaces

2023

Customers who reach out for customer service support may face a range of issues that vary in complexity. Routing high-complexity contacts to junior agents can lead to multiple transfers or repeated contacts, while directing low-complexity contacts to senior agents can strain their capacity to assist customers who need professional help. To tackle this, a machine learning model that accurately predicts the

Conversational AI
Uncovering customer issues through topological natural language analysis

Shu-Ting Pi, Sidarth Srinivasan, Yuying Zhu, Michael Yang, Qun Liu

KDD 2023 Workshop on Decision Intelligence and Analytics for Online Marketplaces

2023

E-commerce companies deal with a high volume of customer service requests daily. While a simple annotation system is often used to summarize the topics of customer contacts, thoroughly exploring each specific issue can be challenging. This presents a critical concern, especially during an emerging outbreak where companies must quickly identify and address specific issues. To tackle this challenge, we propose

Conversational AI
Teacher-student learning on complexity in intelligent routing

Shu-Ting Pi, Michael Yang, Yuying Zhu, Qun Liu

KDD 2023 Workshop on End-End Customer Journey Optimization

2023

Customer service is often the most time-consuming aspect for e-commerce websites, with each contact typically taking 10-15 minutes. Effectively routing customers to appropriate agents without transfers is therefore crucial for e-commerce success. To this end, we have developed a machine learning framework that predicts the complexity of customer contacts and routes them to appropriate agents accordingly

Conversational AI
Contrastive training improves zero-shot classification of semi-structured documents

Muhammad Khalifa, Yogarshi Vyas, Shuai Wang, Graham Horwood, Sunil Mallya, Miguel Ballesteros

ACL Findings 2023

2023

We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information play a vital role in interpreting such documents. The standard classification setting where categories are fixed during both training and testing falls short in dynamic environments

Conversational AI

Amazon Unveils Novel Alexa Dialog Modeling for Natural, Cross-Skill Conversations

Alexa Science Team

June 5, 2019

Today, customer exchanges with Alexa are generally either one-shot requests, like “Alexa, what’s the weather?”, or interactions that require multiple requests to complete more complex tasks.

Conversational AI
Using adversarial training to recognize speakers’ emotions

Viktor Rozgic

May 21, 2019

A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.

Conversational AI
Should Alexa read “2/3” as “two-thirds” or “February Third”?: The science of text normalization

Ming Sun

May 16, 2019

Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response — say, “Is 6:30 p.m. okay?” Here, 6:30PM will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart — converting “five p m” to “5:00PM” — inverse text normalization.

Conversational AI
Training a Machine Learning Model in English Improves Its Performance in Japanese

Judith Gaspers

May 13, 2019

Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.

Conversational AI
How we add new skills to Alexa’s name-free skill selector

Young-Bum Kim

May 3, 2019

Using cosine similarity rather than dot product to compare vectors helps prevent "catastrophic forgetting".

Conversational AI
“Alexa, Turn Down the Lights and Play Music”: The Science of Handling Compound Requests

Rahul Goel

May 2, 2019

Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “‘What’s Going On?’” and “Marvin Gaye” should fill the slots SongName and ArtistName.

Conversational AI

Conversational AI

Publications

Related content

Work with us