Whisper to Alexa, and She’ll Whisper Back

If you’re in a room where a child has just fallen asleep, and someone else walks in, you might start speaking in a whisper, to indicate that you’re trying to keep the room quiet. The other person will probably start whispering, too.

Image of a sleeping child, from the slide deck that Amazon senior vice president Dave Limp used when announcing whisper mode. (Credit: PeopleImages/Getty Images/iStockphoto)

We would like Alexa to react to conversational cues in just such a natural, intuitive way, and toward that end, Amazon last week announced Alexa’s new whisper mode, which will let Alexa-enabled devices respond to whispered speech by whispering back. (The U.S. English version will be available in October.)

At the IEEE Workshop on Spoken Language Technology, in December, my colleagues and I will present a paper that describes the techniques we used to enable whisper mode. The ultimate implementation differs somewhat, but the basic principles are the same.

Whispered speech is predominantly unvoiced, meaning that it doesn’t involve the vibration of the vocal cords, and it has less energy in lower frequency bands than ordinary speech. Previous researchers have sought to exploit these facts by training their classifiers not on raw speech signals but on “features” extracted from the signals, which are designed to capture information that can help discriminate whispers from normal speech.

In our paper, we compare the performance of two different neural nets on the whisper detection task. One is a relatively simple, feed-forward network known as a multilayer perceptron (MLP), and the second is a more sophisticated long short-term memory (LSTM) network.
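To make the contrast between the two architectures concrete, here is a minimal sketch of a multilayer perceptron scoring a single frame. It assumes one ReLU hidden layer and a sigmoid output interpreted as a whisper probability; the layer sizes, weights, and the function name `mlp_whisper_prob` are hypothetical and are not the configuration used in the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mlp_whisper_prob(features: Sequence[float], w1: Matrix, b1: Sequence[float],
                     w2: Matrix, b2: Sequence[float]) -> float:
    """Score one frame: features -> ReLU hidden layer -> sigmoid output.

    The network has no internal state, so each frame is scored
    independently of the frames that came before it.
    """
    hidden = relu(dense(features, w1, b1))
    logit = dense(hidden, w2, b2)[0]
    return sigmoid(logit)
```

Because the perceptron keeps no state between calls, any use of context has to come from the features themselves; the LSTM, by contrast, carries information forward from frame to frame.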

The models are trained on two categories of features. One is log filter-bank energies, a fairly direct representation of the speech signal that records the signal energies in different frequency ranges. The other is a set of features specifically engineered to exploit the signal differences between whispered and normal speech.
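As an illustration of what log filter-bank energies are, the sketch below computes them for one frame. It assumes a 16 kHz sample rate, a 512-point DFT, and 40 triangular mel-spaced filters; these are common textbook choices, not parameters taken from the paper. A naive DFT is used so the example needs only the standard library.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float], n_fft: int) -> List[float]:
    """Power in DFT bins 0..n_fft//2, via a naive O(n^2) DFT for clarity."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))  # zero-pad
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))) ** 2
        for k in range(n_fft // 2 + 1)
    ]


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def log_filterbank_energies(frame: Sequence[float], sample_rate: int = 16000,
                            n_fft: int = 512, n_filters: int = 40,
                            floor: float = 1e-10) -> List[float]:
    """Log energy in each of `n_filters` triangular, mel-spaced frequency bands."""
    spec = power_spectrum(frame, n_fft)
    bin_hz = [k * sample_rate / n_fft for k in range(len(spec))]
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    # n_filters + 2 evenly spaced mel points give each filter a left edge, center, and right edge.
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for left, center, right in zip(edges, edges[1:], edges[2:]):
        e = 0.0
        for f, p in zip(bin_hz, spec):
            if left < f <= center:
                e += p * (f - left) / (center - left)    # rising slope of the triangle
            elif center < f < right:
                e += p * (right - f) / (right - center)  # falling slope of the triangle
        energies.append(math.log(max(e, floor)))  # floor avoids log(0) on silent frames
    return energies
```

For a pure 1 kHz tone, the largest entries land in the filters whose centers sit closest to 1 kHz; an all-zero frame yields `log(floor)` in every band.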

We found that an LSTM network that doesn’t use handcrafted features performs as well as an MLP that does, indicating that LSTMs are capable of learning which signal attributes are most useful for whisper detection. In the paper, we also report an experiment in which the LSTM received the handcrafted features as well as the log filter-bank energies, and its performance improved still further.

After the paper’s acceptance, however, we found that the more data the LSTM saw, the less of an improvement the handcrafted features provided, until the difference evaporated. So the model we moved into production doesn’t use the handcrafted features at all.

There are several advantages to this approach. One is that other components of Alexa’s speech recognition system rely solely on log filter-bank energies. Using the same inputs for different components makes the system as a whole more compact, which is crucial if it is to be used offline, as we envision it will be.

Another advantage is that the handcrafted features are tailored to the data that we’ve seen so far. One of the features we used in our paper, for instance, is the ratio of the energy in the 6,875- to 8,000-hertz frequency band to that in the 310- to 620-hertz band. But it might be that, as we see more training data from more diverse populations, we find that ratios of energies in different frequency bands work better. A network that can learn features on its own is more scalable and can adapt more readily to new data.
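The band-energy ratio mentioned above is simple to compute. The sketch below assumes a 16 kHz sample rate (so that 8,000 Hz is the Nyquist frequency) and sums DFT power over the bins that fall in each band; the function names and the `eps` guard are illustrative assumptions, not the paper's exact feature extractor.

```python
import cmath
import math
from typing import Sequence


def band_energy(frame: Sequence[float], sample_rate: int, low_hz: float,
                high_hz: float, n_fft: int = 512) -> float:
    """Sum of DFT power over bins whose center frequency lies in [low_hz, high_hz]."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))  # zero-pad
    total = 0.0
    for k in range(n_fft // 2 + 1):
        if low_hz <= k * sample_rate / n_fft <= high_hz:
            coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            total += abs(coeff) ** 2
    return total


def high_to_low_energy_ratio(frame: Sequence[float], sample_rate: int = 16000,
                             eps: float = 1e-12) -> float:
    """Energy in 6,875-8,000 Hz divided by energy in 310-620 Hz for one frame."""
    high = band_energy(frame, sample_rate, 6875.0, 8000.0)
    low = band_energy(frame, sample_rate, 310.0, 620.0)
    return high / (low + eps)  # eps guards against division by zero
```

Because whispers carry relatively little low-frequency energy, a whispered frame would tend to push this ratio up; the point of the paragraph above is that the two band edges are hard-coded choices that a learned model does not need.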

LSTMs are widely used in speech recognition and natural-language understanding because they process inputs in sequential order, and their judgments about any given input are conditioned by what they’ve already seen.
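The sequential conditioning described above comes from the LSTM's cell state. The following is a deliberately tiny sketch: a single-unit LSTM with scalar input, using the standard input, forget, and output gates. A real whisper detector would use vector-valued inputs and many units; the class name and weights here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Gates = Tuple[float, float, float, float]  # (input, forget, output, candidate)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ScalarLSTM:
    w_x: Gates  # weight on the current input, per gate
    w_h: Gates  # weight on the previous hidden state, per gate
    b: Gates    # bias, per gate

    def step(self, x: float, h: float, c: float) -> Tuple[float, float]:
        z_i, z_f, z_o, z_g = (wx * x + wh * h + b
                              for wx, wh, b in zip(self.w_x, self.w_h, self.b))
        i, f, o = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)
        g = math.tanh(z_g)
        c_new = f * c + i * g          # cell state carries memory forward
        h_new = o * math.tanh(c_new)   # output depends on that memory
        return h_new, c_new

    def run(self, xs: Sequence[float]) -> List[float]:
        h = c = 0.0
        outputs = []
        for x in xs:                   # strictly left to right, one frame at a time
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs
```

With `ScalarLSTM((1.0,) * 4, (0.0,) * 4, (0.0,) * 4)`, the inputs `[0.0, 0.0]` and `[1.0, 0.0]` produce different outputs at the second step even though the second input is identical, because the first input is still held in the cell state.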

This can pose a problem for whisper detection, however. In our system, before passing to the LSTM, the input utterance is broken into overlapping 25-millisecond segments called “frames”, which the LSTM processes in sequence. Because the LSTM’s output for a given frame reflects its outputs for the preceding frames, its confidence in its classifications tends to increase as the utterance progresses.
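Splitting an utterance into overlapping 25-millisecond frames can be sketched as follows. The 16 kHz sample rate and the 10-millisecond hop between frame starts are assumptions (the text specifies only the frame length); with those values each frame holds 400 samples and consecutive frames overlap by 240.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], sample_rate: int = 16000,
                 frame_ms: float = 25.0, hop_ms: float = 10.0) -> List[Sequence[float]]:
    """Split a signal into overlapping fixed-length frames; a trailing partial frame is dropped."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # 400 samples at 16 kHz
    hop = int(round(sample_rate * hop_ms / 1000.0))          # 160 samples at 16 kHz
    if len(samples) < frame_len:
        return []
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

Under these assumptions, one second of audio yields 98 frames, which is why an utterance of a few seconds gives the LSTM a few hundred steps over which its confidence can build.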

In a process called “end-pointing”, however, Alexa recognizes the end of an utterance by the short period of silence that follows the end of speech, and that silence is part of the input to the whisper detector. When we apply the detector to live data, we typically see that its confidence increases across most of the duration of an utterance, then falls off precipitously in the final 50 or so frames.
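To see why trailing silence ends up in the detector's input, consider a toy energy-based end-pointer. This is a hypothetical illustration, not Alexa's production end-pointer: it treats a frame as speech if its energy reaches a threshold and declares the end of the utterance once a fixed number of consecutive low-energy frames follow some speech.

```python
from typing import Optional, Sequence


def find_endpoint(frame_energies: Sequence[float], threshold: float,
                  min_silence_frames: int) -> Optional[int]:
    """Index of the frame at which trailing silence triggers the end-point, or None.

    Frames at or above `threshold` count as speech; the end-point fires once
    `min_silence_frames` consecutive low-energy frames follow some speech.
    """
    speech_seen = False
    silence_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            speech_seen = True
            silence_run = 0
        elif speech_seen:
            silence_run += 1
            if silence_run >= min_silence_frames:
                return i
    return None
```

With 5 silent frames, 20 speech frames, and then silence, `find_endpoint(energies, 0.5, 50)` returns 74. Everything up to and including that frame is passed downstream, so its last 50 frames are silence by construction.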

A graph of our whisper detector’s confidence in its classification (y-axis), across the duration of a single utterance (x-axis)

In the experiments reported in the paper, we tried to solve this problem in three different ways. The first was to average the LSTM’s outputs for the entire utterance; the second was to drop the last 50 frames and average what was left; and the third was to drop the last 50 frames and average only the preceding 100 frames, when the LSTM’s confidence should be at its peak.
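The three aggregation strategies can be written as small functions over the per-frame scores. This is a sketch assuming the scores are already available as a list of floats; the function names are hypothetical, while the 50-frame tail and 100-frame window follow the numbers given above.

```python
from statistics import fmean
from typing import Sequence


def _drop_tail(scores: Sequence[float], tail: int) -> Sequence[float]:
    if len(scores) <= tail:
        raise ValueError("utterance is shorter than the tail being dropped")
    return scores[:len(scores) - tail]


def average_all(scores: Sequence[float]) -> float:
    """Strategy 1: mean of every frame-level score, trailing silence included."""
    return fmean(scores)


def average_without_tail(scores: Sequence[float], tail: int = 50) -> float:
    """Strategy 2: drop the last `tail` frames and average the rest."""
    return fmean(_drop_tail(scores, tail))


def average_window_before_tail(scores: Sequence[float], tail: int = 50,
                               window: int = 100) -> float:
    """Strategy 3: drop the last `tail` frames, then average only the `window` frames before them."""
    return fmean(_drop_tail(scores, tail)[-window:])
```

For a toy score sequence of 100 frames at 0.2, 100 frames at 0.9, and a 50-frame tail at 0.1, the three strategies give 0.46, 0.55, and 0.9 respectively, showing how strongly the choice of frames shapes the final decision.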

Unexpectedly, averaging the entire signal — including the troublesome final 50 frames — yielded the best results. We suspect, however, that that’s because the samples of whispered speech that we used in our experiments were manually segmented, while the samples of normal speech were automatically segmented, using Alexa’s production end-pointer. There could be some consistent difference between manual and automatic segmentation that the system was actually exploiting to distinguish the two types of input, and dropping the final 50 frames made that difference more difficult to detect.

Nevertheless, in production, where both whispered speech and normal speech are segmented by the end-pointer, we’ve found that dropping the final 50 frames of data is crucial to maintaining performance and that averaging across a subset of the preceding frames, rather than the whole remaining signal, yields the best results.

Acknowledgments: Kellen Gillespie, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister
