
Repairing interrupted questions makes voice agents more accessible

Learning to represent truncated sentences with semantic graphs improves models’ ability to infer missing content.

By Angus Addlesee, Marco Damonte

August 16, 2023


Everyone’s had the experience of pausing mid-sentence during a conversation, trying to conjure a forgotten word. These pauses can be so pronounced that today’s voice assistants mistake them for the ends of users’ sentences. When this happens, the entire sentence has to be repeated.

This is frustrating for all users, but some groups are affected more than others, and these are often the groups that can benefit the most from voice assistants. During conversations, for example, people with dementia pause more often and for longer durations than other people do.


At Alexa AI, we experimented with several speech-processing pipelines in an attempt to address this problem. Our most successful approach involved a model that learned to “understand” incomplete sentences. To train that model, we adapted two existing datasets, truncating their sentences and pairing each sentence with a graph-based semantic representation.

One of the truncated-sentence datasets, which we presented at the ACM Conference on Conversational User Interfaces (CUI) earlier this year, contains only questions; the other dataset, which we'll present next week at Interspeech, contains more-general sentences.

The graphs in our datasets capture the semantics of each word in each sentence and the relationships between words. When we truncated the original sentences, we also removed the sections of the graphs contributed by the removed words.
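The truncation step can be illustrated with a minimal Python sketch. It assumes a toy graph, loosely in the style of Abstract Meaning Representation (AMR), in which every node records the index of the single word that contributed it; the `SemanticGraph` class, the `truncate_graph` helper, and the relation labels are hypothetical and are not the format of the actual datasets. Cutting a sentence down to its first n words keeps only the nodes contributed by those words, plus the edges whose endpoints both survive.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    # Hypothetical toy format: node id -> (concept label, index of the contributing word)
    nodes: dict[str, tuple[str, int]]
    # Edges as (source node id, relation label, target node id)
    edges: list[tuple[str, str, str]] = field(default_factory=list)


def truncate_graph(graph: SemanticGraph, num_words: int) -> SemanticGraph:
    """Keep only the parts of the graph contributed by the first num_words words."""
    kept = {
        node_id: (concept, word_idx)
        for node_id, (concept, word_idx) in graph.nodes.items()
        if word_idx < num_words
    }
    # An edge survives only if both of its endpoints survive.
    edges = [(s, r, t) for (s, r, t) in graph.edges if s in kept and t in kept]
    return SemanticGraph(kept, edges)


words = "Yesterday Susan ate some crackers with cheese".split()
full = SemanticGraph(
    nodes={
        "y": ("yesterday", 0),
        "p": ("Susan", 1),
        "e": ("eat-01", 2),
        "c": ("cracker", 4),
        "ch": ("cheese", 6),
    },
    edges=[
        ("e", ":time", "y"),
        ("e", ":ARG0", "p"),
        ("e", ":ARG1", "c"),
        ("e", ":accompanier", "ch"),
    ],
)

partial = truncate_graph(full, num_words=5)
print(" ".join(words[:5]))    # Yesterday Susan ate some crackers
print(sorted(partial.nodes))  # ['c', 'e', 'p', 'y']
print(len(partial.edges))     # 3
```

Real alignments are messier than this sketch assumes: a node can be contributed by several words or by none, so a fuller version would need a policy for deciding when such nodes are kept.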

Figure: A color-coded diagram of a sentence and its corresponding graph representation. The colors indicate which sections of the graph are contributed by each word.

We used these datasets to train a model that takes an incomplete sentence as input and outputs the corresponding incomplete semantic graph. The partial graphs, in turn, feed into a model that completes them, and the completed graphs are converted into text strings for downstream processing.
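The three-stage pipeline can be sketched as a simple composition of functions. This is a minimal illustration under stated assumptions: `repair_utterance` and the stage names are hypothetical, and the toy stages are hard-coded stand-ins for the trained text-to-graph, graph-completion, and graph-to-text models, not the models described here. The completion "cheese" is an arbitrary guess chosen for the example.

```python
from typing import Callable, TypeVar

G = TypeVar("G")  # the graph type shared by the three stages (hypothetical)


def repair_utterance(
    fragment: str,
    text_to_graph: Callable[[str], G],
    complete_graph: Callable[[G], G],
    graph_to_text: Callable[[G], str],
) -> str:
    """Parse an incomplete utterance, complete its graph, and render the result as text."""
    partial_graph = text_to_graph(fragment)     # incomplete sentence -> incomplete graph
    full_graph = complete_graph(partial_graph)  # predict the missing part of the graph
    return graph_to_text(full_graph)            # completed graph -> string for downstream use


# Toy stand-ins for the trained models, hard-coded for a single example.
# Graphs here are frozensets of (source, relation, target) triples.
def toy_parse(text: str) -> frozenset:
    if text != "Yesterday Susan ate some crackers with":
        raise ValueError("the toy parser only knows one sentence")
    return frozenset({
        ("eat-01", ":time", "yesterday"),
        ("eat-01", ":ARG0", "Susan"),
        ("eat-01", ":ARG1", "cracker"),
    })


def toy_complete(graph: frozenset) -> frozenset:
    # A real completion model would predict this subgraph; here it is fixed.
    return graph | {("eat-01", ":accompanier", "cheese")}


def toy_render(graph: frozenset) -> str:
    # Linearizes the triples rather than generating a fluent sentence.
    return "; ".join(f"{s} {r} {t}" for s, r, t in sorted(graph))


result = repair_utterance(
    "Yesterday Susan ate some crackers with", toy_parse, toy_complete, toy_render
)
print(result)
```

Passing the stages in as functions keeps the pipeline independent of how each model is implemented, so any one stage can be swapped out or evaluated on its own.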

More-natural conversation

This work is part of a broader effort to make interactions with Alexa more natural and human-like. To get a sense of the problem we’re trying to address, read the following sentence fragment slowly, focusing on how the addition of each word increases your understanding:

Yesterday Susan ate some crackers with…

Maybe Susan ate crackers with cheese, with a fork, or with her aunt … the ending does not matter. You don’t need to read the end of this sentence to understand that multiple crackers were eaten by Susan yesterday, and you built this understanding word by word.

In conversation, when sentences are left incomplete, people typically ask for a clarification, like Amit’s question in this example:

Susan: “Who was the father of …”
Amit: “Sorry, of who?”
Susan: “Prince Harry”
Amit: “Oh, King Charles III”
