March 4 marked the kickoff of the third Alexa Prize Socialbot Grand Challenge, in which university teams build socialbots capable of conversing on a wide range of topics and make them available to millions of Alexa customers through the invitation “Alexa, let’s chat”. Student teams may begin applying to the competition now, and in the next six weeks, the Alexa Prize team will make a series of roadshow appearances at tech hubs in the U.S. and Europe to meet with students and answer questions about the program.
As the third Alexa Prize Socialbot Grand Challenge gears up, the Alexa science blog is reviewing some of the technical accomplishments from the second. An earlier post examined contributions by Amazon’s Alexa Prize team; this one examines innovations from the participating university teams.
The 2018 Alexa Prize featured eight student teams from four countries, each of which adopted distinctive approaches to some of the central technical questions in conversational AI. We survey those approaches in a paper we released late last year, and the teams themselves go into even greater detail in the papers they submitted to the latest Alexa Prize Proceedings. Here, we touch on just a few of the teams’ innovations.
Handling automatic speech recognition (ASR) errors
Conversations with socialbots can cover a wide range of topics and named entities, which makes automatic speech recognition (ASR) more difficult than it is during more task-oriented interactions with Alexa. Consequently, all the teams in the 2018 Alexa Prize competition built computational modules to handle ASR errors.
Several teams built systems that prompted customers for clarification if ASR confidence scores were too low, and several others retained alternative high-scoring ASR interpretations for later re-evaluation.
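As a rough illustration of the two strategies just described, the sketch below asks for clarification when the best hypothesis scores too low and otherwise accepts it while keeping the runners-up for later re-evaluation. The `AsrHypothesis` type, the 0.4 threshold, and the function name are assumptions made for this example; they are not taken from any team's system.

```python
from dataclasses import dataclass


@dataclass
class AsrHypothesis:
    text: str
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical cutoff; a real system would tune this on held-out data.
CLARIFY_THRESHOLD = 0.4


def handle_asr(hypotheses, threshold=CLARIFY_THRESHOLD, max_alternatives=2):
    """Return (action, best_text, alternatives) for a list of hypotheses."""
    if not hypotheses:
        raise ValueError("at least one ASR hypothesis is required")
    # Rank hypotheses from most to least confident.
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    # Retain the next-best interpretations for later re-evaluation.
    alternatives = [h.text for h in ranked[1:1 + max_alternatives]]
    action = "clarify" if best.confidence < threshold else "accept"
    return action, best.text, alternatives
```

A caller would prompt the customer to repeat or rephrase when the action is `"clarify"`, and would otherwise pass the best text, together with the alternatives, to downstream modules.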
Gunrock, the team from the University of California, Davis, that won the 2018 challenge, built an ASR error correction system based on the double-metaphone algorithm, which produces standardized representations of word pronunciations. When the ASR system assigned a word a low confidence score, Gunrock’s error correction module would apply the double-metaphone algorithm to the word and then search a database of metaphone-encoded pronunciations for a partial match. Those pronunciations are grouped according to conversation topic, which lets the module take advantage of contextual information.
So, for instance, the metaphone representation of the phrase “secure holiday” is SKRLT, which doesn’t occur in Gunrock’s database. But SKRLT is a substring of the metaphone representation APSKRLTS, which does occur in the database. So Gunrock’s system would correct SKRLT to APSKRLTS and return the corresponding English phrase: “obscure holidays”.
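The lookup step in this example can be sketched as a substring search over a topic-grouped index. This is a minimal sketch, not Gunrock's implementation: the index layout, the topic names, and the function name are hypothetical, and the codes are assumed to be precomputed by a double-metaphone encoder. Only the `APSKRLTS` entry comes from the example above; the other entry is an illustrative placeholder, not verified encoder output.

```python
# Hypothetical index: topic -> {metaphone code: original phrase}.
PHONETIC_INDEX = {
    "holidays": {"APSKRLTS": "obscure holidays"},
    "movies": {"AFNJRS": "avengers"},  # placeholder code, not verified
}


def correct_phrase(code, topic, index=PHONETIC_INDEX):
    """Return the stored phrase whose code contains `code`, or None."""
    if not code:
        return None
    candidates = index.get(topic, {})
    # An exact match wins outright.
    if code in candidates:
        return candidates[code]
    # Otherwise look for stored codes that contain the query as a substring.
    matches = [(stored, phrase) for stored, phrase in candidates.items()
               if code in stored]
    if not matches:
        return None
    # Prefer the stored code closest in length to the query.
    _, phrase = min(matches, key=lambda m: len(m[0]) - len(code))
    return phrase


print(correct_phrase("SKRLT", "holidays"))  # obscure holidays
```

Restricting the search to the current topic's entries is what lets the lookup use conversational context: the same code searched under `"movies"` finds no match.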
Knowledge graphs
Carrying on a conversation requires knowledge, and most teams chose to encode their socialbots’ knowledge in graphs. A graph is a mathematical object consisting of nodes, usually depicted as circles, and edges, usually depicted as line segments connecting nodes. In a knowledge graph, the nodes might represent objects, and the edges might represent relationships between them. Several teams populated their knowledge graphs with data from open sources such as DBpedia, Wikidata, Reddit, Twitter, and IMDb, and many of the teams built their graphs using the Neptune graph database service from Amazon Web Services.
Alana, the team from Heriot-Watt University in Scotland and the third-place finisher in the 2018 challenge, used Neptune to build a knowledge graph that encodes all the information in the Wikidata knowledge base, plus some additional data from the DBpedia knowledge base. When the Alana socialbot identifies a named entity in a conversation, it begins a context-constrained exploration of the graph, assembling a subgraph of linked concepts.
If someone chatting with the Alana bot mentioned the movie E.T., for example, Alana’s linked-concept generator would follow the Wikidata link from E.T. to the entry for Drew Barrymore, who appeared in the film, but not to the entry for Sweden, which is the second country in which the movie was released. Then, once it has built up a database of linked concepts, the Alana socialbot selects one at random to serve as the basis for a conversational response.
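One way to picture a context-constrained exploration is a breadth-first walk that follows only edges whose relation type is on an allow list. The toy graph, the relation names, and the allow list below are assumptions made for illustration; they are not Wikidata property names, and this is not Alana's actual generator.

```python
import random

# Hypothetical toy graph: entity -> list of (relation, neighbour) edges.
GRAPH = {
    "E.T.": [
        ("cast_member", "Drew Barrymore"),
        ("director", "Steven Spielberg"),
        ("released_in", "Sweden"),
    ],
    "Drew Barrymore": [("cast_member_of", "Charlie's Angels")],
    "Steven Spielberg": [("director_of", "Jaws")],
}

# Relations assumed to be conversationally relevant for a movie context.
ALLOWED = {"cast_member", "director", "cast_member_of", "director_of"}


def linked_concepts(graph, start, allowed_relations, max_depth=2):
    """Collect entities reachable from `start` via allowed relations only."""
    seen = {start}
    frontier = [start]
    found = []
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for relation, neighbour in graph.get(node, []):
                if relation in allowed_relations and neighbour not in seen:
                    seen.add(neighbour)
                    found.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return found


def pick_concept(concepts, rng=random):
    """Choose one linked concept at random, or None if there are none."""
    return rng.choice(concepts) if concepts else None
```

With this allow list, the walk from `"E.T."` reaches `"Drew Barrymore"` but never `"Sweden"`, because the `released_in` edge is filtered out.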
Natural-language understanding (NLU) for open-domain dialogue
Amazon researchers provided the student teams with default modules for doing natural-language understanding (NLU), or extracting linguistic meaning from raw text, but most teams chose to supplement them or, in some cases, supplant them with systems tailored specifically to the demands of conversational AI. Student teams built their own modules to classify utterances according to intent, or the goal the speaker hopes to achieve, and dialogue act, such as asking for information or requesting clarification; to identify the topics of utterances; and to assess the sentiments expressed by particular choices of phrasing, among other things.
Most of the NLU literature focuses on relatively short, goal-directed utterances. But in conversations with socialbots, people will often speak in longer, more complex sentences. So Gunrock built an NLU module that splits longer sentences into smaller, semantically distinct units, which then pass to additional NLU modules.
To train the segmentation module, Gunrock used movie-dialogue data from the Cornell Movie-Quotes Corpus, which had been annotated with a special tag (“<BRK>”) to indicate breaks between semantically distinct units. On a test set, the module was 95.25% accurate, and an informal review indicated that it was accurately segmenting customers’ remarks. For example, the raw ASR output “Alexa that is cool what do you think of the Avengers” was segmented into “Alexa <BRK> that is cool <BRK> what do you think of the Avengers <BRK>”.
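The tagged format in this example is easy to consume downstream. The sketch below covers only the pre- and post-processing around a segmentation model, not the model itself, and the helper names are hypothetical.

```python
BREAK_TAG = "<BRK>"


def split_segments(tagged):
    """Turn a <BRK>-tagged string into a list of segments."""
    return [seg.strip() for seg in tagged.split(BREAK_TAG) if seg.strip()]


def tag_segments(segments):
    """Inverse operation: append the break tag after every segment."""
    return " ".join(f"{seg} {BREAK_TAG}" for seg in segments)


tagged = "Alexa <BRK> that is cool <BRK> what do you think of the Avengers <BRK>"
print(split_segments(tagged))
# ['Alexa', 'that is cool', 'what do you think of the Avengers']
```

Each resulting segment can then be passed on its own to later NLU modules such as intent or topic classifiers.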
Dialogue management
The outputs of the NLU modules, along with any other utterance data the teams deem useful, pass to the dialogue management module, which generates an array of possible responses and selects one to send to Alexa’s voice synthesizer.
Alquist, the team from the Czech Technical University in Prague and the runner-up in the 2018 challenge, used a hybrid code network (HCN) for its dialogue manager. An HCN combines a neural network with handwritten code that reflects the developers’ understanding of the problem space. HCNs can dramatically reduce the amount of training data required to achieve a given level of performance, by sparing the network from having to learn how to perform tasks that are easily coded.
In Alquist’s case, the added code has two main functions: it filters out suggested responses that violate a set of handwritten rules about what types of responses should follow what types of utterances, and it inserts context-specific data into responses selected by the neural net. So, for instance, the neural network might output the response “That movie was directed by {say_director}”, where {say_director} is an instruction to a complementary program that has separately processed data from the NLU modules.
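The two jobs of the handwritten code, rule-based filtering and data insertion, can be sketched as follows. The rule table, the candidate format, and the scores are assumptions made for this example, and the placeholder is filled here with ordinary string formatting, which simplifies the complementary program described above. The neural network is represented only by the scores it would assign.

```python
# Hypothetical rules: user dialogue act -> response types allowed to follow.
ALLOWED_RESPONSE_TYPES = {
    "question": {"answer", "clarification_request"},
    "statement": {"acknowledgement", "question", "answer"},
}


def filter_candidates(candidates, user_act):
    """Drop candidate responses whose type may not follow `user_act`."""
    allowed = ALLOWED_RESPONSE_TYPES.get(user_act, set())
    return [c for c in candidates if c["type"] in allowed]


def fill_slots(template, context):
    """Insert context-specific values into a response template."""
    return template.format(**context)


def respond(candidates, user_act, context):
    """Pick the highest-scoring rule-compliant candidate and fill it in."""
    valid = filter_candidates(candidates, user_act)
    if not valid:
        return None
    best = max(valid, key=lambda c: c["score"])
    return fill_slots(best["template"], context)


candidates = [
    {"type": "question", "template": "Do you like movies?", "score": 0.9},
    {"type": "answer",
     "template": "That movie was directed by {say_director}", "score": 0.7},
]
print(respond(candidates, "question", {"say_director": "Steven Spielberg"}))
# That movie was directed by Steven Spielberg
```

The higher-scoring candidate is rejected because, under these made-up rules, a question may not be answered with another question. This shows how the handwritten code spares the network from learning such constraints.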
Customer experience and personalization
All of the teams had to decide when to switch conversation topics and how to select new ones. Iris, the team from Emory University, built a machine learning model that predicted appealing topics on the basis of conversational history: which topics a customer had previously accepted and rejected and what types of interactions he or she had previously engaged in. Iris trained the model on data from its socialbot’s past interactions with customers.
In tests, Iris compared their model to the simple heuristic of suggesting new topics in order of overall popularity and found that, on average, their model’s recommendations were 62% more likely to be accepted.
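To make the comparison concrete, the sketch below contrasts the popularity baseline with a simple history-aware ranking. The second function is a hand-written scoring rule standing in for a learned model; it is not Iris's model, and all names, weights, and numbers are hypothetical.

```python
def rank_by_popularity(topics, popularity):
    """Baseline: suggest topics in order of overall popularity."""
    return sorted(topics, key=lambda t: popularity.get(t, 0.0), reverse=True)


def rank_personalized(topics, popularity, accepted, rejected, weight=1.0):
    """Adjust popularity using this customer's accept and reject counts."""
    def score(topic):
        history = accepted.get(topic, 0) - rejected.get(topic, 0)
        return popularity.get(topic, 0.0) + weight * history
    return sorted(topics, key=score, reverse=True)


topics = ["movies", "sports", "music"]
popularity = {"movies": 0.5, "sports": 0.3, "music": 0.2}  # made-up values
accepted = {"music": 2}   # this customer accepted music twice
rejected = {"movies": 1}  # and rejected movies once

print(rank_by_popularity(topics, popularity))
# ['movies', 'sports', 'music']
print(rank_personalized(topics, popularity, accepted, rejected))
# ['music', 'sports', 'movies']
```

A learned model would replace the fixed scoring rule with weights estimated from logged interactions, but the inputs and the resulting ranked list would have the same shape.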
It was a pleasure to work with the student teams who competed in the 2018 Alexa Prize and a privilege to witness their innovative approaches to a fundamental problem in artificial-intelligence research. We can’t wait to see what the next group of teams will come up with!