How Voice and Graphics Working Together Enhance the Alexa Experience

Last week, Amazon announced the release of both a redesigned Echo Show with a bigger screen and the Alexa Presentation Language, which enables third-party developers to build “multimodal” skills that coordinate Alexa’s natural-language-understanding systems with on-screen graphics.


One way that multimodal interaction can improve Alexa customers’ experiences is by helping resolve ambiguous requests. If a customer says, “Alexa, play Harry Potter”, the Echo Show screen could display separate graphics representing a Harry Potter audiobook, a movie, and a soundtrack. If the customer follows up by saying “the last one”, the system must determine whether that means the last item in the on-screen list, the last Harry Potter movie, or something else.

Alexa’s ability to handle these types of interactions derives in part from research that my colleagues and I presented earlier this year at the annual meeting of the Association for the Advancement of Artificial Intelligence. In our paper, we consider three different neural-network designs that treat query resolution as an integrated problem involving both on-screen data and natural-language understanding.

We find that they consistently outperform a natural-language-understanding network that uses hand-coded rules to factor in on-screen data. And on inputs that consist of voice only, their performance is comparable to that of a system trained exclusively on speech inputs. That means that extending the network to consider on-screen data does not degrade accuracy for voice-only inputs.

The other models we investigated are derivatives of the voice-only model, so I’ll describe it first.

All of our networks were trained to classify utterances according to two criteria, intent and slot. An intent is the action that the customer wants Alexa to perform, such as PlayAction<Movie>. Slot values designate the entities on which the intents act, such as ‘Harry Potter’->Movie.name. We have found, empirically, that training a single network to perform both classifications works better than training a separate network for each.
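One common way to train "a single network to perform both classifications" is to add the intent loss and the per-word slot losses into one objective, so both tasks update the same parameters. The sketch below assumes that formulation. The unweighted sum, the function names, and the toy label sets are illustrative assumptions, since this post does not specify how the two losses were combined.

```python
import math
from typing import List


def cross_entropy(probs: List[float], gold: int) -> float:
    """Negative log-probability the model assigns to the correct label."""
    return -math.log(probs[gold])


def joint_loss(
    intent_probs: List[float],
    gold_intent: int,
    slot_probs: List[List[float]],
    gold_slots: List[int],
) -> float:
    """One objective covering both tasks: intent loss plus per-word slot losses.

    The unweighted sum is an assumption made for illustration.
    """
    intent_loss = cross_entropy(intent_probs, gold_intent)
    slot_loss = sum(cross_entropy(p, g) for p, g in zip(slot_probs, gold_slots))
    return intent_loss + slot_loss


# Hypothetical label sets for the utterance "play harry potter".
INTENTS = ["PlayAction<Movie>", "PlayAction<Book>"]
SLOT_TAGS = ["O", "Movie.name"]

loss = joint_loss(
    intent_probs=[0.5, 0.5],                          # undecided between movie and book
    gold_intent=0,                                    # PlayAction<Movie>
    slot_probs=[[0.9, 0.1], [0.2, 0.8], [0.2, 0.8]],  # one distribution per word
    gold_slots=[0, 1, 1],                             # play=O, harry/potter=Movie.name
)
```

Because both terms depend on the same encoder, a gradient step on `loss` improves intent and slot predictions together instead of in two separate networks.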

As inputs to the network, we use two different embeddings of each utterance. Embeddings represent words as points in a geometric space, such that strings with similar meanings (or functional roles) are clustered together. Our network learns one embedding from the data on which it is trained, so it is specifically tailored to typical Alexa commands. We also use a standard embedding, based on a much larger corpus of texts, which groups words together according to the words they co-occur with.
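The following plain-Python sketch shows one way to use two embeddings of each word: look the word up in both tables and concatenate the two vectors. Several details are hypothetical and chosen for illustration: the tables, their tiny dimensions, the use of concatenation, and the zero-vector fallback for unknown words. Real embeddings have hundreds of dimensions, and the learned table would be updated during training.

```python
from typing import Dict, List

Vector = List[float]


def embed_utterance(
    tokens: List[str],
    learned: Dict[str, Vector],
    pretrained: Dict[str, Vector],
) -> List[Vector]:
    """Represent each token as [learned embedding ; pretrained embedding]."""
    dim_learned = len(next(iter(learned.values())))
    dim_pretrained = len(next(iter(pretrained.values())))
    out = []
    for tok in tokens:
        # Out-of-vocabulary tokens fall back to zero vectors (an assumption).
        a = learned.get(tok, [0.0] * dim_learned)
        b = pretrained.get(tok, [0.0] * dim_pretrained)
        out.append(a + b)  # list concatenation, not element-wise addition
    return out


# Toy tables: a 2-d task-specific embedding and a 3-d "standard" one.
learned = {"play": [0.1, 0.2], "harry": [0.3, -0.1], "potter": [0.0, 0.5]}
pretrained = {"play": [0.9, 0.0, 0.1], "harry": [0.2, 0.8, 0.0]}  # no entry for "potter"

vectors = embed_utterance(["play", "potter"], learned, pretrained)
```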

The embeddings pass to a bidirectional long short-term memory network. A long short-term memory (LSTM) network processes inputs in order, and its judgment about any given input reflects its judgments about the preceding inputs. LSTMs are widely used in both speech recognition and natural-language processing because they can use context to resolve ambiguities. A bidirectional LSTM (bi-LSTM) is a pair of LSTMs: one reads the utterance forward, from first word to last, and the other reads it backward, from last word to first.

Intent classification is based on the final outputs of the forward and backward LSTMs, because each LSTM's confidence in its intent classification should increase the more of the utterance it sees, and its final output reflects the entire utterance. Slot classification is based on the full sequence of LSTM outputs, one per word, because the relevant slot values can occur anywhere in the utterance.
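To make the split between intent and slot features concrete, here is a small pure-Python bi-LSTM with untrained, randomly initialized weights. It is a hypothetical stand-in for a trained model. The intent features join two outputs: the forward LSTM's output after the last word and the backward LSTM's output after the first word. The slot features are the per-word concatenations of both directions. A production system would use a deep-learning framework and put softmax classification layers on top of these features.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """A single-layer LSTM with randomly initialized (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # current input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]
            for gate in "ifog"
        }

    def _gate(self, gate: str, z: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, z)) for row in self.weights[gate]]

    def run(self, xs: List[Vector]) -> List[Vector]:
        """Process inputs in order; return the hidden state after each one."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = x + h + [1.0]
            i = [sigmoid(v) for v in self._gate("i", z)]
            f = [sigmoid(v) for v in self._gate("f", z)]
            o = [sigmoid(v) for v in self._gate("o", z)]
            g = [math.tanh(v) for v in self._gate("g", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm_features(xs: List[Vector], fwd: LSTM, bwd: LSTM) -> Tuple[Vector, List[Vector]]:
    fwd_out = fwd.run(xs)
    # Run the backward LSTM on the reversed utterance, then re-align with word order.
    bwd_out = bwd.run(xs[::-1])[::-1]
    # Intent: each direction's output after it has read the whole utterance.
    intent_features = fwd_out[-1] + bwd_out[0]
    # Slots: one forward+backward vector per word, since slots can appear anywhere.
    slot_features = [f + b for f, b in zip(fwd_out, bwd_out)]
    return intent_features, slot_features


# Toy 3-d embeddings for "play harry potter".
words = [[0.1, 0.2, 0.9], [0.3, -0.1, 0.2], [0.0, 0.5, 0.0]]
intent_vec, slot_vecs = bilstm_features(words, LSTM(3, 4, seed=0), LSTM(3, 4, seed=1))
```

`intent_vec` holds a single 8-dimensional summary of the utterance. `slot_vecs` holds one 8-dimensional vector per word, so a slot classifier can label each word separately.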

A diagram describing the architectures of all four neural models we evaluated. The baseline system, which doesn’t use screen information, received only the (a) inputs. The three multimodal neural systems received, respectively, (a) and (b); (a), (b), and (c); and (a), (b), and (d).

The data on which we trained all our networks was annotated using the Alexa Meaning Representation Language, a formal language that captures more sophisticated relationships between the parts of an input sentence than earlier methods did. A team of Amazon researchers presented a paper describing the language earlier this year at the annual meeting of the North American chapter of the Association for Computational Linguistics.

The other four models we investigated factored in on-screen content in various ways. The first was a benchmark system that modifies the outputs of the voice-only network according to hand-coded rules.

If, for instance, a customer says, “Play Harry Potter,” the voice-only classifier, absent any other information, might estimate a 50% probability that the customer means the audiobook, a 40% probability that she means the movie, and a 10% probability that she means the soundtrack. If, however, the screen is displaying only movies, our rules would boost the probability that the customer wants the movie.
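A rule of this kind can be sketched as a multiplicative boost followed by renormalization. The boost factor of 3.0 and the renormalization step are assumptions made for illustration. The actual rules and factors are not given in this post.

```python
from typing import Dict, Set


def apply_screen_rules(probs: Dict[str, float], on_screen: Set[str], boost: float) -> Dict[str, float]:
    """Multiply hypotheses whose type is on-screen by `boost`, then renormalize to sum to 1."""
    adjusted = {label: p * (boost if label in on_screen else 1.0) for label, p in probs.items()}
    total = sum(adjusted.values())
    return {label: p / total for label, p in adjusted.items()}


voice_only = {"audiobook": 0.5, "movie": 0.4, "soundtrack": 0.1}
rescored = apply_screen_rules(voice_only, on_screen={"movie"}, boost=3.0)
```

Take the 50/40/10 estimates from the example with only movies on-screen. After boosting, the scores are 0.5, 1.2, and 0.1, which sum to 1.8. The movie therefore rises to 1.2 / 1.8 ≈ 0.67 and becomes the top hypothesis.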

The factors by which our rules increase or decrease probabilities were determined by a “grid search” on a subset of the training data, in which an algorithm automatically swept through a range of possible modifications to find those that yielded the most accurate results.
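A grid search of this kind can be sketched in three steps. First, try each candidate boost factor on held-out examples. Second, count how often the top-scoring hypothesis matches the label. Third, keep the factor with the best accuracy. The two-example development set, the candidate factors, and the use of a single shared boost are hypothetical simplifications of the search described above.

```python
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[Dict[str, float], Set[str], str]  # (voice-only probs, on-screen types, gold label)


def top_hypothesis(probs: Dict[str, float], on_screen: Set[str], boost: float) -> str:
    """Best label after boosting on-screen types (renormalizing would not change the winner)."""
    return max(probs, key=lambda label: probs[label] * (boost if label in on_screen else 1.0))


def grid_search_boost(dev_set: List[Example], candidates: Sequence[float]) -> Tuple[float, float]:
    best_boost, best_acc = candidates[0], -1.0
    for boost in candidates:
        correct = sum(top_hypothesis(p, s, boost) == gold for p, s, gold in dev_set)
        acc = correct / len(dev_set)
        if acc > best_acc:  # strict '>' keeps the earliest factor on ties
            best_boost, best_acc = boost, acc
    return best_boost, best_acc


dev_set = [
    ({"audiobook": 0.5, "movie": 0.4, "soundtrack": 0.1}, {"movie"}, "movie"),
    ({"audiobook": 0.6, "movie": 0.3, "soundtrack": 0.1}, {"movie"}, "audiobook"),
]
best = grid_search_boost(dev_set, candidates=[1.0, 1.5, 3.0])
```

Here 1.0 is too weak to fix the first example, and 3.0 is strong enough to override the customer's evident preference in the second. The search therefore settles on 1.5.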

The first of our experimental neural models takes as input both the embeddings of the customer’s utterances and a vector representing the types of data displayed on-screen, such as Onscreen_Movie or Onscreen_Book. Because we assume a fixed set of data types, this input can be a fixed-length binary vector with one bit for each type. It is often loosely called a “one-hot” vector, although several bits can be set at once when several types are on-screen. If data of a particular type is currently displayed on-screen, its bit is set to 1; otherwise, its bit is set to 0.
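Here is a sketch of the screen-type input, assuming a hypothetical inventory of three types. How the vector enters the network is also an assumption: this sketch simply appends it to an utterance-level feature vector.

```python
from typing import List, Set

# Hypothetical fixed inventory of on-screen data types, one bit each.
SCREEN_TYPES = ["Onscreen_Movie", "Onscreen_Book", "Onscreen_Soundtrack"]


def screen_type_vector(displayed: Set[str]) -> List[float]:
    """1.0 for each type currently on-screen, 0.0 otherwise; unknown types are ignored."""
    return [1.0 if t in displayed else 0.0 for t in SCREEN_TYPES]


def add_screen_context(utterance_features: List[float], displayed: Set[str]) -> List[float]:
    """Append the screen-type bits to an utterance representation (an assumed design)."""
    return utterance_features + screen_type_vector(displayed)
```

When a movie and a book are both displayed, two bits are set: `screen_type_vector({"Onscreen_Movie", "Onscreen_Book"})` returns `[1.0, 1.0, 0.0]`.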

The next neural model takes as input not only the types of data displayed on-screen but also the specific name of each data item: not just Onscreen_Movie but also ‘Harry Potter’ or ‘Black Panther’. Those names, too, are embedded, using an embedding that the network learns during training.
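The sketch below shows one simple way, assumed here for illustration, to turn on-screen names into network input. Each name is represented as the average of its word vectors. Those name vectors are then averaged over everything on the screen, so the model sees all names at once. The word-vector table is hypothetical and would be learned during training.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise average; an empty list yields a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed_name(name: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Represent a title such as 'Harry Potter' by averaging its word vectors."""
    words = name.lower().split()
    return mean([word_vectors.get(w, [0.0] * dim) for w in words], dim)


def screen_names_vector(names: List[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Summarize all on-screen names at once by averaging their embeddings."""
    return mean([embed_name(n, word_vectors, dim) for n in names], dim)


word_vectors = {"harry": [1.0, 0.0], "potter": [0.0, 1.0], "black": [2.0, 2.0], "panther": [0.0, 0.0]}
summary = screen_names_vector(["Harry Potter", "Black Panther"], word_vectors, dim=2)
```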

Our third and final neural model factors in the names of on-screen data items as well, but in a more complex way. During training, it uses convolutional filters to, essentially, identify the separate contribution that each name on the screen makes toward the accuracy of the final classification. During operation, it thus bases each of its classifications on the single most relevant name on-screen, rather than all the names at once.
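Here is one reading of this design, sketched under stated assumptions. A convolutional filter whose window spans a single name embedding scores each on-screen name. Max-pooling over the names then keeps only the strongest response, so the classifier is driven by one name rather than an average of all of them. The filter weights, bias, names, and name vectors below are made up. In a trained model the filter would be learned jointly with the rest of the network, and its output combined with the utterance representation.

```python
from typing import List, Tuple

Vector = List[float]


def filter_response(name_vec: Vector, weights: Vector, bias: float) -> float:
    """A convolutional filter whose window covers one name embedding, followed by ReLU."""
    return max(0.0, sum(w * x for w, x in zip(weights, name_vec)) + bias)


def max_pool_names(name_vecs: List[Vector], weights: Vector, bias: float) -> Tuple[int, float]:
    """Apply the filter to every on-screen name and keep only the strongest response.

    Returns the index of the winning name and its response; the other names
    contribute nothing to the pooled output.
    """
    responses = [filter_response(v, weights, bias) for v in name_vecs]
    best = max(range(len(responses)), key=lambda k: responses[k])
    return best, responses[best]


names = ["Harry Potter", "Black Panther", "Harry Potter Soundtrack"]  # hypothetical screen
name_vecs = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]                      # toy embeddings
winner, score = max_pool_names(name_vecs, weights=[0.2, 1.0], bias=-0.1)
```

The filter responses are roughly 0.5, 1.1, and 1.9, so only the third name reaches the classifier.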

So, in all, we built, trained, and evaluated five different networks: the voice-only network; the voice-only network with hand-coded rules; the voice-and-data-type network; the voice, data type, and data name network; and the voice, data type, and convolutional-filter network.

We tested each of the five networks on four different data sets: slots with and without screen information and intents with and without screen information.

We evaluated performance according to two different metrics, micro-F1 and macro-F1. Macro-F1 scores the networks’ performance separately on each intent and slot, then averages the results, so every intent or slot counts equally no matter how often it occurs. Micro-F1, by contrast, pools the correct and incorrect predictions across all intents and slots and computes a single score. Macro-F1 therefore gives more weight to intents and slots that are underrepresented in the data, and micro-F1 gives them less.
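The difference between the two metrics is easiest to see with counts. The sketch below uses the standard definitions (F1 = 2TP / (2TP + FP + FN); macro averages per-class scores, micro pools the counts first). It applies them to a hypothetical intent set with one frequent and one rare intent. The counts are invented for illustration.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Score each class separately, then average: every class counts equally."""
    scores = [f1(*counts) for counts in per_class.values()]
    return sum(scores) / len(scores)


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


# Hypothetical counts: one frequent intent and one rare intent.
counts = {
    "PlayAction<Movie>": (90, 10, 10),  # F1 = 0.9
    "PlayAction<Book>": (1, 4, 4),      # F1 = 0.2
}
```

With these counts, macro-F1 is 0.55 while micro-F1 is 182/210 ≈ 0.87. The rare intent pulls macro-F1 down because it carries the same weight as the frequent one.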

According to micro-F1, all three multimodal neural nets outperformed both the voice-only and the rule-based system across the board. The difference was dramatic on the test sets that included screen information, as might be expected, but the neural nets even had a slight edge on voice-only test sets. On all four test sets, the voice, data type, and data name network achieved the best results.

According to macro-F1, the neural nets generally outperformed the baseline systems, although the voice, data type, and data name network lagged slightly behind the baselines on voice-only slot classification. There was more variation in the top-performing system, too, with each of the three neural nets achieving the highest score on at least one test. Again, however, the neural nets dramatically outperformed the baseline systems on test sets that included screen information.

Acknowledgments: Angeliki Metallinou, Rahul Goel
