Alexa’s spoken-language-understanding research at Interspeech 2022

Methods for learning from noisy data, using phonetic embeddings to improve entity resolution, and quantization-aware training are a few of the highlights.

Interspeech, the world’s largest and most comprehensive conference on the science and technology of spoken-language processing, takes place this week in Incheon, Korea, with Amazon as a platinum sponsor. Amazon Science asked three of Alexa AI’s leading scientists — in the fields of speech, spoken-language-understanding, and text-to-speech — to highlight some of Amazon’s contributions to the conference.

In this installment, senior principal scientist Gokhan Tur selects a few representative papers covering a wide range of topics in spoken-language understanding.

“Learning under label noise for robust spoken language understanding systems”

While deep-learning-based approaches have shown superior results for benchmark evaluation tasks, their performance degrades significantly when the training data is noisy. This is typically due to memorization, in which the model simply learns one-to-one correspondences between specific inputs and specific classifications, and the problem is especially acute for overparameterized models, which are already prone to overfitting. In this paper, the Alexa researchers perform a systematic study introducing various levels of controlled noise to the training data and explore five different label noise mitigation strategies for the task of intent classification:

  • Noise layer learns the noise distribution, adding a final layer to the model.
  • Robust loss uses both active loss (maximizing the probability of being in the labeled class) and passive loss (minimizing the probabilities of being in other classes).
  • LIMIT augments the objective function with the mutual information between model weights and the labels conditioned on data instances, to reduce memorization.
  • Label smoothing regularizes the model by replacing the hard 0 and 1 classification targets with smoothed values.
  • Early stopping aims to prevent overfitting by stopping when the validation error starts to increase.
The accuracy of various mitigation methods on public datasets. Top accuracy scores in bold.
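Two of the ingredients above, controlled label noise and label smoothing, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the noise is assumed to be symmetric (a corrupted label is drawn uniformly from the other classes), and the function names and the smoothing factor `epsilon` are hypothetical.

```python
import random


def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip a fraction `noise_rate` of the labels to a different class.

    Assumption: symmetric noise, i.e. the wrong class is chosen uniformly.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    num_to_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), num_to_flip):
        other_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


def smooth_targets(label, num_classes, epsilon=0.1):
    """Replace a hard one-hot target with a smoothed distribution.

    The labeled class gets 1 - epsilon; the remaining mass is shared
    equally among the other classes.
    """
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if c == label else off_value
            for c in range(num_classes)]


clean_labels = [0] * 10
noisy_labels = inject_label_noise(clean_labels, num_classes=3, noise_rate=0.5)
targets = smooth_targets(label=1, num_classes=3, epsilon=0.1)
```

With three classes and `epsilon = 0.1`, the target for class 1 is approximately `[0.05, 0.9, 0.05]` instead of `[0, 1, 0]`, so a model gains less from fitting a possibly wrong label with full confidence.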

The results table shows the effectiveness of these methods on the well-known language-understanding datasets ATIS, SNIPS, and TOP at different noise levels. First, the researchers show that, on each of the datasets, the accuracy of the baseline model (DistilBERT) degrades by more than 30% at a 50% noise level. The paper reports that all mitigation methods are effective in alleviating this degradation. The LIMIT approach performs best, recovering more than 80% of the lost accuracy at a 50% noise level and more than 96% at a 20% noise level.
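One way to read "recovering a share of the lost accuracy" is as the ratio between the accuracy a mitigation method wins back and the accuracy the noise took away. The short sketch below spells out that arithmetic. The definition is an assumed reading of the article's phrasing, and the accuracies in the example are hypothetical values chosen only for illustration, not figures from the paper.

```python
def recovered_fraction(clean_acc, noisy_acc, mitigated_acc):
    """Share of the noise-induced accuracy drop that mitigation wins back."""
    drop = clean_acc - noisy_acc
    if drop <= 0:
        raise ValueError("noisy accuracy must be below clean accuracy")
    return (mitigated_acc - noisy_acc) / drop


# Hypothetical accuracies, not taken from the paper.
example = recovered_fraction(clean_acc=0.95, noisy_acc=0.60, mitigated_acc=0.90)
```

In this made-up case, noise costs 35 points of accuracy and mitigation restores 30 of them, so the recovered fraction is 30/35, or roughly 86%.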

“Phonetic embedding for ASR robustness in entity resolution”

In Alexa, entity resolution (ER) is the task of retrieving the index of an entity given various ways of describing it in natural language. Phonetic variations are one major category of error, as when “chip and potato” is recognized as “shipping potato”. While lexical and phonetic search methods are a straightforward way to resolve such errors, they are suboptimal, since they cannot tell which pairs of phrases are more likely to be confused.

In this paper, Alexa researchers propose using phonetic embeddings based on the pronunciations of such phrases, where the similarity of pronunciation is directly reflected by the embedding-vector distance. They then employ a neural vector search mechanism using a Siamese network to improve the robustness of the ER task against automatic-speech-recognition (ASR) noise. The phonetic embedding is combined with the semantic embedding from a pretrained BERT model. They also experiment with using the ASR n-best hypotheses as an input during training.

The architecture of the weighted-sum model.
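The idea of combining phonetic and semantic similarity in a vector search can be illustrated with a toy example. The sketch below is one plausible reading of a "weighted sum", in which the two cosine similarities are mixed with a weight `alpha`; the paper's actual architecture may combine the embeddings differently. The vectors are hand-made stand-ins for learned embeddings, and the catalog entries, function names, and `alpha` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combined_score(query, entity, alpha):
    """Weighted sum of phonetic and semantic similarity (assumed form)."""
    phonetic = cosine(query["phonetic"], entity["phonetic"])
    semantic = cosine(query["semantic"], entity["semantic"])
    return alpha * phonetic + (1.0 - alpha) * semantic


def resolve(query, catalog, alpha):
    """Return the catalog entry with the highest combined score."""
    return max(catalog, key=lambda name: combined_score(query, catalog[name], alpha))


# Hand-made toy vectors, not real embeddings.
catalog = {
    "chip and potato": {"phonetic": [0.9, 0.1, 0.0], "semantic": [0.2, 0.8]},
    "shipping news": {"phonetic": [0.1, 0.9, 0.0], "semantic": [0.9, 0.1]},
}
# ASR output "shipping potato": sounds like the first title,
# but its words look semantically closer to the second.
query = {"phonetic": [0.8, 0.3, 0.0], "semantic": [0.8, 0.2]}

with_phonetics = resolve(query, catalog, alpha=0.7)
semantic_only = resolve(query, catalog, alpha=0.0)
```

On this toy data, the semantic-only search (`alpha=0.0`) picks “shipping news”, while giving weight to the phonetic similarity (`alpha=0.7`) recovers “chip and potato”.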

The paper presents results on the Video and Book domains in Alexa. In retrieval tests, the phonetic-embedding-based approach reduces the error rate by 44% in the Video domain and by 35% in the Book domain, relative to the lexical-search baseline. With ASR n-best data augmentation, the error-rate reduction in the Video domain rises to 50%.

“Squashed weight distribution for low bit quantization of deep models”

Large deep-learning models — especially Transformer-based ones — have been shown to achieve state-of-the-art performance on many public benchmark tasks. But their size often makes them impractical for real-world applications with memory and latency constraints. To this end, researchers have proposed various compression methods, such as weight pruning, distillation, and quantization.

Quantization divides a variable’s possible values into discrete intervals, and maps all values in each interval to a single, representative value. It is a straightforward process with “bit-widths” of eight bits or more, meaning that each representative value has an eight-bit (or larger) index. It’s often applied after full-precision training of a model, but to avoid a mismatch between training and testing, researchers are turning to quantization-aware training approaches, where quantization noise is injected in the forward pass.
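The mapping described above can be sketched as a simple uniform quantizer. This is a generic illustration, not the scheme used in the paper: the clipping range `lo` and `hi`, the round-to-nearest rule, and the function name are assumptions made for the example.

```python
def quantize_uniform(values, num_bits, lo, hi):
    """Map each value to one of 2**num_bits evenly spaced representative values.

    Returns the integer indices (what would be stored) and the
    dequantized values (what the model would compute with).
    """
    num_levels = 2 ** num_bits
    step = (hi - lo) / (num_levels - 1)
    indices = []
    dequantized = []
    for v in values:
        clipped = min(max(v, lo), hi)          # values outside the range saturate
        index = round((clipped - lo) / step)   # nearest representative value
        indices.append(index)
        dequantized.append(lo + index * step)
    return indices, dequantized


weights = [-1.0, -0.2, 0.3, 1.0]
indices, approx = quantize_uniform(weights, num_bits=2, lo=-1.0, hi=1.0)
```

With two bits over the range [-1, 1], only four representative values are available (-1, -1/3, 1/3, and 1), so every weight is stored as an index between 0 and 3. In quantization-aware training, a quantize-then-dequantize step of this kind would typically be applied in the forward pass so that the model learns under the same rounding error it will see at inference time.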

In this paper, Alexa researchers present the lowest reported quantization bit-widths for compressed Transformer models. They show only 0.2% relative degradation on public GLUE benchmarks with three-bit quantization and 0.4% relative degradation on Alexa data with only two-bit quantization. They achieve this with a reparameterization of the weights that squashes the distribution and by introducing a regularization term to the training loss to control the mean and variance of the learned model parameters.

The main idea is to optimize the overall distribution of the weights during training with stochastic gradient descent (SGD): a novel weight transformation causes SGD to learn approximately uniformly distributed weights instead of the typical Gaussian distribution.
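A small experiment suggests why a uniform weight distribution suits low-bit quantization: with only four levels, Gaussian weights crowd into the middle bins and leave the outer ones nearly empty, while uniformly distributed weights use every level about equally. The sketch below is not the paper's transformation. It uses the Gaussian cumulative distribution function as a stand-in squashing function, because passing a Gaussian variable through its own CDF is known to yield a uniform one; the bin ranges and function names are assumptions.

```python
import math
import random


def squash_gaussian_to_uniform(w, sigma=1.0):
    """Map w ~ N(0, sigma^2) into (-1, 1) via the Gaussian CDF.

    erf(w / (sigma * sqrt(2))) equals 2 * CDF(w) - 1, which is uniform on (-1, 1).
    """
    return math.erf(w / (sigma * math.sqrt(2.0)))


def bin_fractions(values, lo, hi, num_bins):
    """Fraction of values falling in each of `num_bins` equal-width bins."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        clipped = min(max(v, lo), hi)
        index = min(int((clipped - lo) / width), num_bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


rng = random.Random(0)
gaussian_weights = [rng.gauss(0.0, 1.0) for _ in range(20000)]
squashed_weights = [squash_gaussian_to_uniform(w) for w in gaussian_weights]

# Four bins correspond to the four levels of a two-bit quantizer.
raw_usage = bin_fractions(gaussian_weights, lo=-3.0, hi=3.0, num_bins=4)
squashed_usage = bin_fractions(squashed_weights, lo=-1.0, hi=1.0, num_bins=4)
```

Here `squashed_usage` comes out close to one quarter per bin, whereas the two outer bins of `raw_usage` each hold well under a tenth of the weights, which means two of the four levels are largely wasted on the Gaussian weights.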

“Impact of acoustic event tagging on scene classification in a multi-task learning framework”

This paper explores the use of acoustic event tagging (AET) for improving the task of acoustic scene classification (ASC). Acoustic events represent information at lower levels of abstraction, such as “car engine” and “dog bark”, while scenes are collections of acoustic events in no particular temporal order that represent information at higher levels of abstraction, such as “street traffic” and “urban park”. Previous studies suggest that humans leverage event information for scene classification. For instance, knowledge of the event “jet engine” helps classify a given acoustic scene as “airport” instead of “shopping mall”.

In this paper, Alexa researchers propose jointly training a deep-learning model to perform both AET and ASC, using a multitask-learning approach that uses a weighted combination of the individual AET and ASC losses. They show that this method lowers the ASC error rate by more than 10% relative to the baseline model and outperforms a model pretrained with AET first and then fine-tuned on ASC.

The ASC and AET baselines, along with the multitask network presented in the Amazon researchers’ paper.
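The weighted combination of the two task losses can be sketched as follows. This is a minimal illustration under assumptions, not the paper's training code: scene classification is treated as a single-label problem with a cross-entropy loss, event tagging as a multi-label problem with a binary cross-entropy loss, and the weights are assumed to sum to one. The function names and the default `asc_weight` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Single-label loss: negative log-probability of the true scene."""
    return -math.log(probs[label])


def binary_cross_entropy(probs, targets):
    """Multi-label loss: each event is an independent present/absent decision."""
    total = 0.0
    for p, t in zip(probs, targets):
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(probs)


def multitask_loss(scene_probs, scene_label, event_probs, event_targets,
                   asc_weight=0.7):
    """Weighted combination of the ASC and AET losses (assumed convex mix)."""
    asc_loss = cross_entropy(scene_probs, scene_label)
    aet_loss = binary_cross_entropy(event_probs, event_targets)
    return asc_weight * asc_loss + (1.0 - asc_weight) * aet_loss


# Toy model outputs: three candidate scenes, four candidate events.
scene_probs = [0.7, 0.2, 0.1]        # e.g. airport, shopping mall, urban park
event_probs = [0.9, 0.2, 0.1, 0.6]   # e.g. jet engine, dog bark, car engine, speech
loss = multitask_loss(scene_probs, 0, event_probs, [1, 0, 0, 1])
```

Setting `asc_weight` to 1.0 reduces this to the scene-classification baseline, and lowering it lets the event-tagging signal shape the shared representation more strongly.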

“L2-GEN: A neural phoneme paraphrasing approach to L2 speech synthesis for mispronunciation diagnosis”

For machine learning models that help users learn English as a second language (ESL), mispronunciation detection and diagnosis (MDD) is an essential task. However, it is difficult to obtain non-native (L2) speech audio with fine-grained phonetic annotations. In this paper, Alexa researchers propose a speech synthesis system for generating mispronounced speech mimicking L2 speakers.

The architecture of the L2-GEN framework.

The core of the system is a state-of-the-art Transformer-based sequence-to-sequence machine translation model. The L1 reference phoneme sequence of a word is treated as the source text and its corresponding mispronounced L2 phoneme sequences as “paraphrased” target texts. The researchers’ experiments demonstrate the effectiveness of the L2-GEN system in improving MDD accuracy on public benchmark evaluation sets.
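To make the source and target format concrete, the sketch below pairs an L1 reference phoneme sequence with a mispronounced variant and derives the phoneme-level differences between them. It does not reproduce the Transformer model; it only illustrates, under assumptions, what a training pair might look like and how a diagnosis label could be read off such a pair. The word, the substitution, and the function name are hypothetical, and the alignment uses Python's standard `difflib` rather than anything from the paper.

```python
from difflib import SequenceMatcher


def diagnose(reference, produced):
    """List the phoneme-level edits that turn the reference into the produced sequence."""
    matcher = SequenceMatcher(a=reference, b=produced, autojunk=False)
    return [
        (tag, reference[i1:i2], produced[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical pair in ARPAbet notation for the word "think".
reference = ["TH", "IH", "NG", "K"]   # source: L1 reference pronunciation
produced = ["S", "IH", "NG", "K"]     # target: "paraphrased" L2 pronunciation
edits = diagnose(reference, produced)
```

For this pair, `edits` contains a single replacement of `TH` by `S`, which is the kind of fine-grained annotation that is hard to obtain for real L2 recordings.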
