Science innovations power Alexa Conversations dialogue management

A dialogue simulator and a conversations-first modeling architecture enable customers to interact with Alexa in a natural, conversational manner.

Today we announced the public beta launch of Alexa Conversations dialogue management. Alexa developers can now leverage a state-of-the-art dialogue manager powered by deep learning to create complex, nonlinear experiences — conversations that go well beyond today's typical one-shot interactions, such as "Alexa, what's the weather forecast for today?" or "Alexa, set a ten-minute pasta timer".

Alexa’s natural-language-understanding models classify requests according to domain, or the particular service that should handle the intent that the customer wants executed. The models also identify the slot types of the entities named in the requests, or the roles those entities play in fulfilling the request. In the request “Play ‘Rise Up’ by Andra Day”, the domain is Music, the intent is PlayMusic, and the names “Rise Up” and “Andra Day” fill the slots SongName and ArtistName.
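To make the terminology concrete, here is a minimal sketch (with hypothetical field names) of the structured interpretation such a model might produce for that request:

```python
# Illustrative only: field names are hypothetical, not an actual Alexa data format.
interpretation = {
    "domain": "Music",          # which service should handle the request
    "intent": "PlayMusic",      # the action the customer wants executed
    "slots": {                  # the entities named and the roles they play
        "SongName": "Rise Up",
        "ArtistName": "Andra Day",
    },
}
```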

Also at today's Alexa Live event, Nedim Fresko, vice president of Alexa Devices and Developers, announced that Amazon scientists have begun applying deep neural networks to custom skills and are seeing increases in accuracy. Read more here.

Natural conversations don’t follow these kinds of predetermined dialogue paths and often include anaphoric references (such as referring to a previously mentioned song by saying “play it”), contextual carryover of entities, customer revisions of requests, and many other types of interactions.

Alexa Conversations enables customers to interact with Alexa in a natural and conversational manner. At the same time, it relieves developers of the effort they would typically need to expend in authoring complex dialogue management rules, which are hard to maintain and often result in brittle customer experiences. Our dialogue augmentation algorithms and deep-learning models address the challenge of designing flexible and robust conversational experiences.

Dialogue management for Alexa Conversations is powered by two major science innovations: a dialogue simulator for data augmentation that generalizes a small number of sample dialogues provided by a developer into tens of thousands of annotated dialogues, and a conversations-first modeling architecture that leverages the generated dialogues to train deep-learning-based models to support dialogues beyond just the happy paths provided by the sample dialogues.

The Alexa Conversations dialogue simulator

Building high-performing deep-learning models requires large and diverse data sets, which are costly to acquire. With Alexa Conversations, the dialogue simulator automatically generates diverse training dialogues from a few developer-provided sample dialogues that cover the skill's functionality, including difficult or uncommon exchanges that could occur.

The inputs to the dialogue simulator include developer-provided application programming interfaces (APIs), slots and associated catalogues of slot values (e.g., city and state), and response templates (Alexa’s responses in different situations, such as requesting a slot value from the customer). Together with their input arguments and output values, these inputs define the skill-specific schema of actions and slots that the dialogue manager will predict.
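As a rough illustration, these assets might be represented with structures like the following; the names (SearchFlight, request_city, and so on) are placeholders and not the actual Alexa Conversations authoring format:

```python
# Illustrative sketch of developer-provided build-time assets.
slot_catalogues = {
    "city": ["Seattle", "Boston", "Austin"],        # catalogue of example slot values
    "date": ["today", "tomorrow", "next Friday"],
}

apis = {
    # API name -> required input arguments and the output value it returns
    "SearchFlight": {"args": ["city", "date"], "returns": "flight_list"},
    "BookFlight": {"args": ["flight_id"], "returns": "confirmation"},
}

response_templates = {
    # Alexa's responses in different situations, such as eliciting a missing slot
    "request_city": "Which city are you flying to?",
    "request_date": "What day would you like to travel?",
    "offer_flights": "I found {flight_count} flights to {city}.",
}
```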

Figure: The Alexa Conversations dialogue simulator generates tens of thousands of annotated dialogue examples that are used to train conversational models.

The dialogue simulator uses these inputs to generate additional sample dialogues in two steps.

In the first step, the simulator generates dialogue variations that represent different paths a conversation can take, such as different sequences of slot values and divergent paths that arise when a customer changes her mind.

More specifically, we conceive a conversation as a collaborative, goal-oriented interaction between two agents, a customer and Alexa. In this setting, the customer has a goal she wants to achieve, such as booking an airplane flight, and Alexa has access to resources, such as APIs for searching flight information or booking flights, that can help the customer reach her goal.

The simulated dialogues are generated through the interaction of two agent simulators, one for the customer, the other for Alexa. From the sample dialogues provided by the developer, the simulator first samples several plausible goals that customers interacting with the skill may want to achieve.

Conditioned on a sample goal, we generate synthetic interactions between the two simulator agents. The customer agent progressively reveals its goal to the Alexa agent, while the Alexa agent gathers the customer agent’s information, confirms information, and asks follow-up questions about missing information, guiding the interaction toward goal completion.
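The sketch below illustrates this two-agent interaction at a very high level, assuming a goal is simply a set of target slot values and that the Alexa agent elicits missing slots one at a time before calling an API; the real simulator is considerably more sophisticated.

```python
import random

def simulate_dialogue(goal, required_slots):
    """Highly simplified sketch of a customer/Alexa agent interaction.

    `goal` maps slot names to the values the customer wants, e.g.
    {"city": "Seattle", "date": "tomorrow"}; `required_slots` lists the
    slots the API needs (assumed to be covered by the goal)."""
    known = {}   # slots the Alexa agent has gathered so far
    turns = []
    # The customer agent opens by revealing a random subset of its goal.
    revealed = dict(random.sample(sorted(goal.items()), k=random.randint(1, len(goal))))
    known.update(revealed)
    turns.append(("Customer", revealed))
    while True:
        missing = [s for s in required_slots if s not in known]
        if not missing:
            # All required information gathered: move toward goal completion.
            turns.append(("Alexa", {"action": "call_api", "args": dict(known)}))
            break
        slot = missing[0]
        turns.append(("Alexa", {"action": f"request_{slot}"}))   # follow-up question
        known[slot] = goal[slot]                                 # customer answers it
        turns.append(("Customer", {slot: goal[slot]}))
    return turns

print(simulate_dialogue({"city": "Seattle", "date": "tomorrow"}, ["city", "date"]))
```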

In the second step, the simulator injects language variations into the dialogue paths. The variations include alternate expressions of the same customer intention, such as “recommend me a movie” versus “I want to watch a movie”. Some of these alternatives are provided by the sample conversations and Alexa response templates, while others are generated through paraphrasing.

The variations also include alternate slot values (such as “Andra Day” or “Alicia Keys” for the slot ArtistName), which are sampled from slot catalogues provided by the developer. Through these two steps, the simulator generates tens of thousands of annotated dialogue examples that are used for training the conversational models.
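As a simplified sketch of this second step, one can imagine sampling a carrier phrase and slot values independently and composing them into a new annotated utterance; the phrases and catalogue entries below are invented for illustration, and in practice the phrasings come from the sample dialogues, response templates, and paraphrasing.

```python
import random

# Hypothetical alternate expressions of the same customer intention.
carrier_phrases = [
    "play {SongName} by {ArtistName}",
    "can you put on {SongName} by {ArtistName}",
    "I want to hear {SongName} by {ArtistName}",
]

# Hypothetical developer-provided slot catalogues.
slot_catalogue = {
    "SongName": ["Rise Up", "Fallin'", "Halo"],
    "ArtistName": ["Andra Day", "Alicia Keys", "Beyonce"],
}

def inject_variation():
    """Sample one surface form of the same annotated customer turn."""
    values = {slot: random.choice(choices) for slot, choices in slot_catalogue.items()}
    utterance = random.choice(carrier_phrases).format(**values)
    return utterance, values   # the sampled values double as slot annotations

print(inject_variation())
```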

The Alexa Conversations modeling architecture

A natural conversational experience could follow any one of a wide range of nonlinear dialogue patterns. Our conversations-first modeling architecture leverages dialogue-simulator and conversational-modeling components to support dialogue patterns that include carryover of entities, anaphora, confirmation of slots and APIs, and proactively offering related functionality, as well as robust support for a customer changing her mind midway through a conversation.

We follow an end-to-end dialogue-modeling approach, where the models take into account the current customer utterance and context from the entire conversation history to predict the optimal next actions for Alexa. Those actions might include calling a developer-provided API to retrieve information and relaying that information to the customer; asking for more information from the customer; or any number of other possibilities.

The modeling architecture is built using state-of-the-art deep-learning technology and consists of three models: a named-entity-recognition (NER) model, an action prediction (AP) model, and an argument-filling (AF) model. The models are built by combining supervised training techniques on the annotated synthetic dialogues generated by the dialogue simulator and unsupervised pretraining of large Transformer-based components on text corpora.

Figure: The Alexa Conversations modeling architecture, consisting of a named-entity-recognition model, an action prediction model, and an argument-filling model.

First, the NER model identifies slots in each customer utterance, selecting from the slots the developer defined as part of the build-time assets (date, city, etc.). For example, for the request “search for flights to Seattle tomorrow”, the NER model will identify “Seattle” as a city slot and “tomorrow” as a date slot.

The NER model is a sequence-tagging model built using a bidirectional LSTM layer on top of a Transformer-based pretrained sentence encoder. In addition to the current utterance, the NER model takes dialogue context as input, encoded through a hierarchical LSTM architecture that captures the conversational history, including past slots and Alexa actions.
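A compressed PyTorch sketch of this kind of architecture is shown below. It is not the production model: a plain embedding layer stands in for the pretrained Transformer encoder, the dialogue history is assumed to be pre-encoded into one vector per turn, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class NERTagger(nn.Module):
    """Sketch: BiLSTM tagger over a stand-in sentence encoder, with a
    turn-level LSTM summarizing dialogue history."""

    def __init__(self, vocab_size, num_tags, dim=128):
        super().__init__()
        self.token_encoder = nn.Embedding(vocab_size, dim)     # stand-in for the pretrained encoder
        self.turn_lstm = nn.LSTM(dim, dim, batch_first=True)    # hierarchical dialogue-context encoder
        self.tagger_lstm = nn.LSTM(dim * 2, dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(dim * 2, num_tags)                 # per-token slot-tag scores

    def forward(self, current_tokens, history_turns):
        # current_tokens: (batch, seq_len); history_turns: (batch, num_turns, dim)
        tok = self.token_encoder(current_tokens)                 # (batch, seq_len, dim)
        _, (ctx, _) = self.turn_lstm(history_turns)              # summary of past turns
        ctx = ctx[-1].unsqueeze(1).expand(-1, tok.size(1), -1)   # broadcast context to each token
        hidden, _ = self.tagger_lstm(torch.cat([tok, ctx], dim=-1))
        return self.out(hidden)                                  # (batch, seq_len, num_tags)
```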

Next, the AP model predicts the optimal next action for Alexa to take, such as calling an API or responding to the customer to either elicit more information or complete a request. The action space is defined by the APIs and Alexa response templates that the developer provides during the skill-authoring process.

The AP model is a classification model that, like the NER model, uses a hierarchical LSTM architecture to encode the current utterance and past dialogue context; the resulting encoding passes to a feed-forward network that generates the action prediction.
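A corresponding sketch of the AP model, under the same simplifying assumptions as the NER sketch above, might look like this:

```python
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Sketch: a dialogue-level encoding feeds a feed-forward classifier over
    the action space (developer APIs plus Alexa response templates)."""

    def __init__(self, num_actions, dim=128):
        super().__init__()
        self.dialogue_lstm = nn.LSTM(dim, dim, batch_first=True)   # turn-level context encoder
        self.classifier = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, num_actions)
        )

    def forward(self, turn_embeddings):
        # turn_embeddings: (batch, num_turns, dim), one vector per turn so far
        _, (state, _) = self.dialogue_lstm(turn_embeddings)
        return self.classifier(state[-1])        # (batch, num_actions) action scores
```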

Finally, the AF model fills in the argument values for the API and response templates by looking at the entire dialogue for context. Using an attention-based pointing mechanism over the dialogue context, the AF model selects compatible slots from all slot values that the NER model recognized earlier.

For example, suppose the slot values “Seattle” and “tomorrow” exist in the dialogue context for the city and date slots, respectively, and the AP model has predicted the SearchFlight API as the optimal next action. The AF model will fill in the API arguments with the appropriate values, generating a complete API call: SearchFlight(city="Seattle", date="tomorrow").
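The real AF model uses a learned, attention-based pointer over the dialogue context; the sketch below only illustrates its input/output contract, with a naive rule that picks the most recent compatible slot value for each argument.

```python
# Naive illustration of the AF model's input/output contract; not the actual model.
recognized_slots = [    # what NER has extracted over the dialogue so far
    {"type": "city", "value": "Seattle", "turn": 1},
    {"type": "date", "value": "tomorrow", "turn": 1},
]

def fill_arguments(api_name, required_args, recognized_slots):
    args = {}
    for arg in required_args:
        candidates = [s for s in recognized_slots if s["type"] == arg]
        if candidates:
            # Pick the most recently mentioned compatible slot value.
            args[arg] = max(candidates, key=lambda s: s["turn"])["value"]
    return api_name, args

print(fill_arguments("SearchFlight", ["city", "date"], recognized_slots))
# -> ('SearchFlight', {'city': 'Seattle', 'date': 'tomorrow'})
```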

The AP and AF models may also predict and generate more than one action after a customer utterance. For example, they may decide to first call an API to retrieve flight information and then call an Alexa response template to communicate this information to the customer. Therefore, the AP and AF models can make sequential predictions of actions, including the decision to stop predicting more actions and wait for the next customer request.
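A rough sketch of that prediction loop, with hypothetical stand-ins for the AP and AF models, might look like this:

```python
# Sketch of sequential action prediction; `predict_next_action` and
# `fill_arguments` are hypothetical stand-ins for the AP and AF models.
def run_turn(dialogue_context, predict_next_action, fill_arguments, max_actions=5):
    actions = []
    for _ in range(max_actions):
        action = predict_next_action(dialogue_context)
        if action == "STOP":                 # wait for the next customer request
            break
        filled = fill_arguments(action, dialogue_context)
        actions.append(filled)
        dialogue_context.append(filled)      # later predictions see earlier actions
    return actions
```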

The finer points

Consistency-check logic ensures that the resulting predictions are all valid actions, consistent with developer-provided information about their APIs. For example, the system would not generate an API call with an empty input argument if the developer marked that argument as required.
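A minimal sketch of such a check, with an illustrative API schema, could look like the following:

```python
# Minimal sketch: drop predicted API calls that are missing required arguments.
api_schemas = {"SearchFlight": {"required": ["city", "date"]}}   # illustrative schema

def is_valid_call(api_name, args, api_schemas):
    schema = api_schemas.get(api_name)
    if schema is None:
        return False                         # unknown action
    return all(args.get(arg) not in (None, "") for arg in schema["required"])

assert is_valid_call("SearchFlight", {"city": "Seattle", "date": "tomorrow"}, api_schemas)
assert not is_valid_call("SearchFlight", {"city": "Seattle"}, api_schemas)
```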

Because the inputs include the entire dialogue history as well as the latest customer request, the resulting model predictions are contextual, relevant, and not repetitive. For example, if a customer has already provided the date of a trip while searching for a flight, Alexa will not ask for the date again when booking the flight. Instead, the date provided earlier will carry over contextually and pass to the appropriate API.

We leveraged large pretrained Transformer components (BERT) to encode current and past requests in the conversation. To keep model build times and runtime latency low, we performed inference-architecture optimizations such as accelerating embedding computation on GPUs, implementing efficient caching, and leveraging both data- and model-level parallelism.
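As a toy illustration of the caching idea only (one of several optimizations, and greatly simplified), repeated utterances can be encoded once and reused:

```python
from functools import lru_cache

@lru_cache(maxsize=100_000)
def encode_utterance(utterance: str):
    """Toy stand-in for an expensive encoder call. With caching, an utterance
    seen repeatedly across a conversation or across traffic is encoded only
    once; the real system also uses GPU acceleration and data/model parallelism."""
    # ... an expensive Transformer encoding call would go here ...
    return tuple(hash(tok) % 1000 for tok in utterance.split())
```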

We are excited about the advances that enable Alexa developers to build flexible and robust conversational experiences that allow customers to have natural interactions with their devices. Developers interested in learning more about the "how" of building these conversational experiences should read our accompanying developer blog.

For more information about the technical advances behind Alexa Conversations, see our related publications on dialogue systems, dialogue state tracking, and data augmentation.

Acknowledgments: The entire Alexa Conversations team for making the innovations highlighted here possible.

Do you like working on projects that are highly visible and are tied closely to Amazon’s growth? Are you seeking an environment where you can drive innovation leveraging the scalability and innovation with Amazon's AWS cloud services? The Amazon International Technology Team is hiring Applied Scientists to work in our Software Development Center in Sao Paulo. The Intech team builds International extensions and new features of the Amazon.com web site for individual countries and creates systems to support Amazon operations. We have already worked in Germany, France, UK, India, China, Italy, Brazil and more. Key job responsibilities About you You want to make changes that help millions of customers. You don’t want to make something 10% better as a part of an enormous team. Rather, you want to innovate with a small community of passionate peers. You have experience in analytics, machine learning and big data, and a desire to learn more about these subjects. You want a trusted role in strategy and product design. You put the customer first in your thinking. You have great problem solving skills. You research the latest data technologies and use them to help you innovate and keep costs low. You have great judgment and communication skills, and a history of delivering results. Your Responsibilities - Define and own complex machine learning solutions in the consumer space, including targeting, measurement, creative optimization, and multivariate testing. - Influence the broader team's approach to integrating machine learning into business workflows. - Advise senior leadership, both tech and non-tech. - Make technical trade-offs between short-term needs and long-term goals.