
New Alexa features: Interactive teaching by customers

Deep learning and reasoning enable customers to explicitly teach Alexa how to interpret their novel requests.

By Govind Thattai, Gokhan Tur, Prem Natarajan

September 24, 2020


Today in Seattle, Dave Limp, Amazon’s senior vice president for devices, unveiled the latest lineup of products and services from his organization. During the presentation, Rohit Prasad, Amazon vice president and Alexa head scientist, described three new advances from the Alexa science team. One of those is interactive teaching by customers.

Read Alexa head scientist Rohit Prasad's overview of today's Alexa-related announcements on Amazon's Day One blog.

Last year, we launched a self-learning feature that enables Alexa to automatically correct interpretation errors, based on cues such as customers’ rephrasing of requests or interruptions of Alexa’s responses. Millions of customers today enjoy the benefit of this capability. But what if the customer says something that Alexa doesn’t know how to interpret?

To allow customers to directly help Alexa learn the correct interpretation, we have given Alexa the ability to engage in live interactive teaching sessions with a customer, learn new concepts on the fly, generalize those concepts to new contexts, and associate them with the customer’s account.

For instance, if a customer says, “Alexa, set the living room light to study mode”, Alexa might now respond, “I don't know what study mode is. Can you teach me?” Alexa extracts a definition from the customer’s answer, and when the customer later makes the same request — or a similar request — Alexa responds with the learned action.

Illustration of Alexa engaging in live interactive teaching sessions with a customer.

Unlike Alexa Routines, where customers use the Alexa app to associate actions with verbal triggers (such as turning off the lights when the customer says “good night”), interactive teaching lets Alexa engage in a conversation to ask about unknown or unresolved concepts, in order to complete tasks that would fail otherwise.

Interactive teaching allows Alexa to learn two different types of concepts. One is entity concepts: in the example above, “study mode” is a new entity that Alexa must learn. The other type is declarative concepts. With declarative concepts, Alexa learns how to interpret instructions that are disguised as declarations, such as “Alexa, it’s too dark in this room.”

Interactive teaching is a conversational-AI solution that uses the predictions of multiple deep-learning models to determine its next output during a teaching session. Those models have four chief functions:

understanding-gap detection, or automatically identifying the part of an utterance that Alexa doesn’t understand;
concept interpretation, or eliciting and extracting the definition of a concept from interactions with the customer;
dialogue management, or keeping conversations about new concepts on track; and
declarative reasoning, or evaluating the actions available to Alexa (e.g., controlling smart-home appliances) for the best matches to a declarative instruction (e.g., “It’s dark in here”).

Alexa’s natural-language-understanding models classify customer utterances by domain — broad functional areas such as music or weather — and intent — the action the customer wants performed, such as playing music.

They also identify the slots and slot values in the utterance, that is, the entity types and the specific entities the intent should operate upon. For instance, in the utterance “Alexa, play ‘Blinding Lights’ by the Weeknd”, “Blinding Lights” is the value of the slot Song_Name, and “the Weeknd” is the value of the slot Artist_Name.
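To make the terminology concrete, here is a minimal Python sketch of how such an interpretation might be represented. The class name `Interpretation`, its fields, and the domain and intent labels are illustrative assumptions, not Alexa's actual schema; only the slot names and values come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Interpretation:
    """Hypothetical container for one NLU result (not Alexa's real schema)."""
    domain: str   # broad functional area, such as music or weather
    intent: str   # the action the customer wants performed
    slots: Dict[str, str] = field(default_factory=dict)  # slot name -> slot value


# The example utterance: "Alexa, play 'Blinding Lights' by the Weeknd"
example = Interpretation(
    domain="Music",      # assumed label
    intent="PlayMusic",  # assumed label
    slots={"Song_Name": "Blinding Lights", "Artist_Name": "the Weeknd"},
)

print(example.slots["Song_Name"])  # Blinding Lights
```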

When the probabilities of the top-ranked slots are low, the understanding-gap-detection model recognizes an opportunity to learn new slot concepts (such as “study mode” in the utterance “set the living room light to study mode”). The model is also trained to reject utterances such as “set the lights to, umm, never mind”.
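The idea can be sketched with a simple confidence threshold. This is only a stand-in: the article describes a trained deep-learning model, whereas the cutoff value, the abandonment phrases, the slot names, and the probabilities below are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed cutoff; the article does not give a real threshold.
LOW_CONFIDENCE = 0.5
# Hypothetical stand-in for the learned rejection of abandoned requests.
ABANDON_MARKERS = ("never mind", "forget it")


def find_understanding_gap(
    utterance: str,
    slot_candidates: List[Tuple[str, str, float]],
) -> Optional[str]:
    """Return the phrase to ask about, or None if there is nothing to teach.

    slot_candidates holds (phrase, top_ranked_slot, probability) triples.
    """
    lowered = utterance.lower()
    if any(marker in lowered for marker in ABANDON_MARKERS):
        return None  # the customer gave up; do not start a teaching session
    for phrase, _slot, probability in slot_candidates:
        if probability < LOW_CONFIDENCE:
            return phrase  # the first low-confidence phrase is treated as the gap
    return None


gap = find_understanding_gap(
    "set the living room light to study mode",
    [("living room light", "Device_Name", 0.97), ("study mode", "Setting", 0.12)],
)
print(gap)  # study mode
```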

Once the customer engages in a teaching session, the concept interpretation model elicits and extracts the interpretation of the new concept from the customer’s free-form speech.

For example, the customer could respond to the question “What do you mean by ‘study mode’?” by saying, “Well, you know, I usually study at night by setting the light to 50% brightness”.

The concept interpretation model would extract the phrase “50% brightness” from that utterance and store it as the definition of “study mode”.
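As a rough illustration of that extraction step, the sketch below pulls a brightness value out of free-form text with a regular expression. The real concept interpretation model is a learned model, not a pattern matcher; the function name `extract_definition` and the pattern are assumptions that cover only this one kind of answer.

```python
import re
from typing import Dict, Optional

# Matches a percentage followed by the word "brightness", e.g. "50% brightness".
BRIGHTNESS_PATTERN = re.compile(r"(\d{1,3})\s*%\s*brightness", re.IGNORECASE)


def extract_definition(answer: str) -> Optional[str]:
    """Return a normalized definition, or None if the answer contains none."""
    match = BRIGHTNESS_PATTERN.search(answer)
    if match is None:
        return None
    return f"{match.group(1)}% brightness"


answer = "Well, you know, I usually study at night by setting the light to 50% brightness"
definitions: Dict[str, Optional[str]] = {}
definitions["study mode"] = extract_definition(answer)
print(definitions)  # {'study mode': '50% brightness'}
```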

The dialogue management model checks whether a customer’s answer is within the scope of the question. For example, when Alexa asks, “What do you mean by ‘study mode’?”, the customer might reply, “Set it to a good brightness level for reading”. The model would recognize that this answer doesn’t provide a suitable concept definition.

After every failed attempt to elicit a definition, the dialogue manager reduces the complexity of the follow-up question. For example, if the concept interpretation model fails to extract a definition of “study mode” after one round of questioning, the dialogue manager might ask the more direct question “Can you provide me a value for brightness or color?”
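The fallback behavior can be sketched as a loop over questions ordered from open-ended to direct. The two questions are quoted from the article; the function `run_teaching_session`, the `toy_extract` helper, and the rule of stopping after the last question are assumptions, and the actual dialogue manager is a learned model rather than a fixed list.

```python
from typing import Callable, Iterable, Optional

# Questions ordered from open-ended to direct.
QUESTIONS = [
    "What do you mean by '{concept}'?",
    "Can you provide me a value for brightness or color?",
]


def run_teaching_session(
    concept: str,
    answers: Iterable[str],
    extract: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Ask progressively simpler questions until a definition is extracted."""
    answer_iter = iter(answers)
    for template in QUESTIONS:
        print("Alexa:", template.format(concept=concept))
        answer = next(answer_iter, None)
        if answer is None:
            break  # the customer stopped responding
        definition = extract(answer)
        if definition is not None:
            return definition
    return None  # give up once the most direct question has failed


def toy_extract(answer: str) -> Optional[str]:
    # Hypothetical stand-in: accept only answers that contain a number.
    return answer if any(ch.isdigit() for ch in answer) else None


result = run_teaching_session(
    "study mode",
    ["Set it to a good brightness level for reading", "50% brightness"],
    toy_extract,
)
print(result)  # 50% brightness
```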

Finally, the declarative-reasoning model combines machine learning and machine reasoning to predict actions that correspond to customers’ declarative utterances. The model also helps verify that the chosen action is semantically appropriate in the context of the declarative utterance before deciding to store it for future re-use.

After a successful teaching session, the learned concepts can be reused in relevant contexts. For instance, once a customer has taught Alexa that, in the living room, “study mode” means setting the lights to 50% brightness, Alexa knows to apply the same concept in the office as well.
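The article does not say how this generalization is implemented. One very simple way to picture it is a store keyed by customer and concept, so that the room where the concept was taught does not restrict where it applies. Everything in the sketch below (the names `teach` and `resolve`, the customer ID, the response strings) is hypothetical.

```python
from typing import Dict, Tuple

# (customer_id, concept) -> definition; the room is deliberately not part of the key.
learned: Dict[Tuple[str, str], str] = {}


def teach(customer_id: str, concept: str, definition: str) -> None:
    """Associate a taught concept with the customer's account."""
    learned[(customer_id, concept)] = definition


def resolve(customer_id: str, concept: str, room: str) -> str:
    """Apply a learned concept in any room, or ask to be taught."""
    definition = learned.get((customer_id, concept))
    if definition is None:
        return f"I don't know what {concept} is. Can you teach me?"
    return f"set {room} light to {definition}"


# Taught in the living room, then reused in the office.
teach("customer-1", "study mode", "50% brightness")
print(resolve("customer-1", "study mode", "office"))  # set office light to 50% brightness
```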

Similarly, if the customer has taught Alexa to respond to a declarative utterance such as “It’s dark in here” by turning on a light, Alexa knows that the subsequent utterance “I can’t see anything here” should trigger the same action.

In addition to automatically generalizing taught concepts, the teachable-AI capability will allow the customer to explicitly instruct Alexa to forget either the most recently learned or all learned concepts.
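A minimal sketch of the two forget operations follows, assuming a per-customer store that remembers the order in which concepts were taught. The class `ConceptMemory`, its method names, and the “movie mode” example are hypothetical and not part of any Alexa API.

```python
from typing import Dict, List, Optional


class ConceptMemory:
    """Hypothetical per-customer store of taught concepts."""

    def __init__(self) -> None:
        # Python dicts preserve insertion order, so the last item is the newest.
        self._concepts: Dict[str, str] = {}

    def teach(self, concept: str, definition: str) -> None:
        # Remove first so that a re-taught concept counts as the most recent one.
        self._concepts.pop(concept, None)
        self._concepts[concept] = definition

    def forget_latest(self) -> Optional[str]:
        """Forget the most recently learned concept and return its name."""
        if not self._concepts:
            return None
        concept, _definition = self._concepts.popitem()
        return concept

    def forget_all(self) -> None:
        """Forget every learned concept."""
        self._concepts.clear()

    def known(self) -> List[str]:
        return list(self._concepts)


memory = ConceptMemory()
memory.teach("study mode", "50% brightness")
memory.teach("movie mode", "20% brightness")  # made-up second concept
forgotten = memory.forget_latest()
print(forgotten, memory.known())  # movie mode ['study mode']
```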

At launch, interactive teaching by customers will be available for Alexa smart-home devices, and it will expand to other features over time. This is an exciting step forward not just for Alexa but for AI services that end users can teach explicitly.


About the Authors

Govind Thattai

Govind Thattai is a principal applied scientist in the Alexa AI organization.

Gokhan Tur

Gokhan Tur is a senior principal scientist in the Alexa AI organization.

Prem Natarajan

Prem Natarajan is a former Alexa AI vice president.
