Alexa’s spoken-language-understanding research at Interspeech 2022

Methods for learning from noisy data, using phonetic embeddings to improve entity resolution, and quantization-aware training are a few of the highlights.

Interspeech, the world’s largest and most comprehensive conference on the science and technology of spoken-language processing, takes place this week in Incheon, Korea, with Amazon as a platinum sponsor. Amazon Science asked three of Alexa AI’s leading scientists, in the fields of speech, spoken-language understanding, and text-to-speech, to highlight some of Amazon’s contributions to the conference.

In this installment, senior principal scientist Gokhan Tur selects a few representative papers covering a wide range of topics in spoken-language understanding.

"Learning under label noise for robust spoken language understanding systems"

While deep-learning-based approaches have shown superior results for benchmark evaluation tasks, their performance degrades significantly when the training data is noisy. This is typically due to memorization, in which the model simply learns one-to-one correspondences between specific inputs and specific classifications, and the problem is especially acute for overparameterized models, which are already prone to overfitting. In this paper, the Alexa researchers perform a systematic study introducing various levels of controlled noise to the training data and explore five different label noise mitigation strategies for the task of intent classification:

  • Noise layer adds a final layer to the model that learns the noise distribution.
  • Robust loss uses both active loss (maximizing the probability of being in the labeled class) and passive loss (minimizing the probabilities of being in other classes).
  • LIMIT augments the objective function with the mutual information between model weights and the labels conditioned on data instances, to reduce memorization.
  • Label smoothing regularizes the model by replacing the hard 0 and 1 classification targets with smoothed values.
  • Early stopping aims to prevent overfitting by stopping when the validation error starts to increase.
The accuracy of various mitigation methods on public datasets. Top accuracy scores in bold.

The results table shows the effectiveness of these methods on the well-known language-understanding datasets ATIS, SNIPS, and TOP at different noise levels. For each of the datasets, the accuracy of the baseline model (DistilBERT) degrades by more than 30% at a 50% noise level. The paper reports that all mitigation methods are effective in alleviating this degradation. The LIMIT approach performs best, recovering more than 80% of the lost accuracy at a 50% noise level and more than 96% at a 20% noise level.
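To make the setup concrete, here is a minimal sketch of controlled label noise and of one of the listed mitigations, label smoothing. It is not the paper's code: the uniform (symmetric) flipping scheme, the function names, and the epsilon value are all assumptions chosen for illustration.

```python
import math
import random

def inject_label_noise(labels, num_classes, noise_level, seed=0):
    """Flip a fraction `noise_level` of labels to a different, random class.

    Assumption: symmetric noise; the paper's noise model may differ.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    num_flips = round(noise_level * len(labels))
    for i in rng.sample(range(len(labels)), num_flips):
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy

def smooth_targets(label, num_classes, epsilon=0.1):
    """Label smoothing: 1 - epsilon on the labeled class, epsilon shared by the rest."""
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if c == label else off_value for c in range(num_classes)]

def cross_entropy(probs, targets):
    """Cross-entropy between predicted probabilities and (possibly soft) targets."""
    return -sum(t * math.log(p) for p, t in zip(probs, targets) if t > 0)

# Toy intent labels over four classes, corrupted at a 50% noise level.
labels = [0, 1, 2, 3] * 25
noisy_labels = inject_label_noise(labels, num_classes=4, noise_level=0.5)
num_changed = sum(a != b for a, b in zip(labels, noisy_labels))

# Soft targets for class 2, and the loss of a confident prediction against them.
targets = smooth_targets(2, num_classes=4, epsilon=0.1)
loss = cross_entropy([0.05, 0.05, 0.85, 0.05], targets)
```

Because the smoothed targets never reach exactly 0 or 1, the model is not pushed toward full confidence in any single label, which limits how strongly it can fit an incorrect one.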

“Phonetic embedding for ASR robustness in entity resolution”

In Alexa, entity resolution (ER) is the task of retrieving the index of an entity given various ways of describing it in natural language. Phonetic variations, such as “chip and potato” being recognized as “shipping potato”, are one large category of errors. While lexical and phonetic search methods are a straightforward way to resolve such errors, they are suboptimal since they cannot tell which pairs of phrases are more likely to be confused.

In this paper, Alexa researchers propose employing phonetic embeddings based on the pronunciations of such phrases, where the similarity of pronunciation is directly reflected by the embedding-vector distance. They then use a neural vector search mechanism based on a Siamese network to improve the robustness of the ER task against automatic speech recognition (ASR) noise. The phonetic embedding is combined with the semantic embedding from a pretrained BERT model. They also experiment with using the ASR n-best hypotheses as an input during training.

The architecture of the weighted-sum model.
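The retrieval step can be sketched as a weighted sum of the two embeddings followed by a nearest-neighbor search. This is only an illustration under stated assumptions: the three-dimensional vectors are made up, both embeddings are assumed to share a dimensionality, the mixing weight is a hypothetical fixed value, and the Siamese network that would produce the embeddings is omitted.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def combine(phonetic, semantic, weight):
    """Weighted sum of the normalized embeddings; `weight` is the phonetic share."""
    p, s = l2_normalize(phonetic), l2_normalize(semantic)
    return l2_normalize([weight * a + (1.0 - weight) * b for a, b in zip(p, s)])

def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def resolve(query, catalog, weight):
    """Return the catalog entity whose combined embedding is closest to the query."""
    q = combine(query[0], query[1], weight)
    scores = {
        name: cosine(q, combine(phonetic, semantic, weight))
        for name, (phonetic, semantic) in catalog.items()
    }
    return max(scores, key=scores.get)

# Hypothetical (phonetic, semantic) embeddings for two catalog entities.
catalog = {
    "chip and potato": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    "shipping news": ([0.6, 0.8, 0.0], [0.0, 0.0, 1.0]),
}
# Misrecognized query "shipping potato": it sounds like the first entity,
# but its semantic embedding drifts toward the second.
query = ([0.95, 0.3, 0.0], [0.0, 0.6, 0.8])

semantic_only = resolve(query, catalog, weight=0.0)
with_phonetics = resolve(query, catalog, weight=0.7)
```

In this toy setup, the semantic embedding alone retrieves the wrong entity, while giving the phonetic embedding most of the weight recovers the intended one.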

The paper presents results using the Video and Book domains in Alexa. In the evaluation of retrieval tests, the researchers see that, compared to the lexical-search baseline, the phonetic-embedding-based approach reduces the error rate by 44% in the Video domain and by 35% in the Book domain. With the ASR n-best data augmentation, they further reduce the error rate to 50% in the Video domain.

“Squashed weight distribution for low bit quantization of deep models”

Large deep-learning models, especially Transformer-based ones, have been shown to achieve state-of-the-art performance on many public benchmark tasks. But their size often makes them impractical for real-world applications with memory and latency constraints. To this end, researchers have proposed various compression methods, such as pruning weights, distillation, and quantization.

Quantization divides a variable’s possible values into discrete intervals, and maps all values in each interval to a single, representative value. It is a straightforward process with “bit-widths” of eight bits or more, meaning that each representative value has an eight-bit (or larger) index. It’s often applied after full-precision training of a model, but to avoid a mismatch between training and testing, researchers are turning to quantization-aware training approaches, where quantization noise is injected in the forward pass.
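The interval mapping described above can be sketched as a simple uniform quantizer. The value range, the choice of the interval midpoint as the representative value, and the example weights are assumptions for illustration, not details from the paper.

```python
def quantize(values, bit_width, lo, hi):
    """Map each value to the midpoint of one of 2**bit_width equal intervals in [lo, hi].

    Returns the interval indices (what would be stored) and the
    representative values (what the model would actually compute with).
    """
    levels = 2 ** bit_width
    step = (hi - lo) / levels
    indices, representatives = [], []
    for v in values:
        # Clamp so that out-of-range values and `hi` itself land in a valid interval.
        idx = min(max(int((v - lo) / step), 0), levels - 1)
        indices.append(idx)
        representatives.append(lo + (idx + 0.5) * step)
    return indices, representatives

weights = [-0.93, -0.41, -0.05, 0.12, 0.48, 0.97]
idx3, deq3 = quantize(weights, bit_width=3, lo=-1.0, hi=1.0)
max_error = max(abs(w - d) for w, d in zip(weights, deq3))
```

With three bits there are eight intervals of width 0.25, so no in-range weight moves by more than 0.125. In quantization-aware training, the forward pass would use values like `deq3` in place of the full-precision weights, so the model learns in the presence of this rounding error.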

In this paper, Alexa researchers present the lowest reported quantization bit-widths for compressed Transformer models. They show only 0.2% relative degradation on public GLUE benchmarks with three-bit quantization and 0.4% relative degradation on Alexa data with only two-bit quantization. They achieve this with a reparameterization of the weights that squashes the distribution and by introducing a regularization term to the training loss to control the mean and variance of the learned model parameters.

The main idea is to optimize the overall distribution of the weights during training with the well-known stochastic-gradient-descent (SGD) approach. A novel weight transformation causes SGD to learn approximately uniformly distributed weights instead of the typical Gaussian distribution.
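To see why a uniform weight distribution helps at low bit-widths, consider the sketch below. It does not reproduce the paper's reparameterization, which this article does not spell out. As a stand-in assumption, it squashes Gaussian samples with the rescaled Gaussian CDF (via `math.erf`), which is known to yield an approximately uniform distribution, and then counts how many weights fall into each of the four two-bit quantization intervals.

```python
import math
import random

def squash(w, sigma=1.0):
    """Rescaled Gaussian CDF: maps N(0, sigma^2) samples to roughly Uniform(-1, 1).

    Assumption: a stand-in transform, not the one used in the paper.
    """
    return math.erf(w / (sigma * math.sqrt(2.0)))

def bucket_counts(values, bit_width, lo, hi):
    """Count how many values land in each of the 2**bit_width equal intervals."""
    levels = 2 ** bit_width
    step = (hi - lo) / levels
    counts = [0] * levels
    for v in values:
        idx = min(max(int((v - lo) / step), 0), levels - 1)
        counts[idx] += 1
    return counts

rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Two-bit quantization of the raw Gaussian weights, clipped to three standard deviations.
raw_counts = bucket_counts(gaussian, bit_width=2, lo=-3.0, hi=3.0)
# The same weights after squashing, quantized over (-1, 1).
squashed_counts = bucket_counts([squash(w) for w in gaussian], bit_width=2, lo=-1.0, hi=1.0)
```

The raw Gaussian weights crowd into the two central intervals and leave the outer ones nearly empty, so two of the four available levels are mostly wasted. After squashing, each interval holds about a quarter of the weights, which uses the limited set of levels far more evenly.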

“Impact of acoustic event tagging on scene classification in a multi-task learning framework”

This paper explores the use of acoustic event tagging (AET) for improving the task of acoustic scene classification (ASC). Acoustic events represent information at lower levels of abstraction, such as “car engine” and “dog-bark”, while scenes are collections of acoustic events in no particular temporal order that represent information at higher levels of abstraction, such as “street traffic” and “urban park”. Previous studies suggest that humans leverage event information for scene classification. For instance, knowledge of the event “jet-engine” helps classify a given acoustic scene as “airport” instead of “shopping mall”.

In this paper, Alexa researchers propose jointly training a deep-learning model to perform both AET and ASC, using a multitask-learning approach that uses a weighted combination of the individual AET and ASC losses. They show that this method lowers the ASC error rate by more than 10% relative to the baseline model and outperforms a model pretrained with AET first and then fine-tuned on ASC.
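A weighted combination of the two losses can be sketched as follows. The specific form is an assumption: the sketch uses cross-entropy for the single-label scene task, binary cross-entropy for the multi-label event-tagging task, and a convex combination controlled by a hypothetical `asc_weight`. The paper's exact losses and weighting may differ.

```python
import math

def cross_entropy(scene_probs, scene_label):
    """ASC loss: negative log-probability of the correct scene."""
    return -math.log(scene_probs[scene_label])

def binary_cross_entropy(event_probs, event_tags):
    """AET loss: mean binary cross-entropy over independent event tags."""
    total = 0.0
    for p, t in zip(event_probs, event_tags):
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(event_probs)

def multitask_loss(scene_probs, scene_label, event_probs, event_tags, asc_weight=0.7):
    """Weighted combination of the ASC and AET losses (convex mix, by assumption)."""
    asc = cross_entropy(scene_probs, scene_label)
    aet = binary_cross_entropy(event_probs, event_tags)
    return asc_weight * asc + (1.0 - asc_weight) * aet

# Toy model outputs for one clip: scenes = [airport, shopping mall, urban park],
# events = [jet-engine, dog-bark, car engine].
scene_probs, scene_label = [0.7, 0.2, 0.1], 0
event_probs, event_tags = [0.9, 0.2, 0.1], [1, 0, 0]

asc_only = multitask_loss(scene_probs, scene_label, event_probs, event_tags, asc_weight=1.0)
aet_only = multitask_loss(scene_probs, scene_label, event_probs, event_tags, asc_weight=0.0)
joint = multitask_loss(scene_probs, scene_label, event_probs, event_tags, asc_weight=0.7)
```

Because both tasks share one network, gradients from the event-tagging term also shape the representation used for scene classification, and the weight controls how much influence each task has.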

The ASC and AET baselines, along with the multitask network presented in the Amazon researchers’ paper.

“L2-GEN: A neural phoneme paraphrasing approach to L2 speech synthesis for mispronunciation diagnosis”

For machine learning models that help users learn English as a second language (ESL), mispronunciation detection and diagnosis (MDD) is an essential task. However, it is difficult to obtain non-native (L2) speech audio with fine-grained phonetic annotations. In this paper, Alexa researchers propose a speech synthesis system for generating mispronounced speech mimicking L2 speakers.

The architecture of the L2-GEN framework.

The core of the system is a state-of-the-art Transformer-based sequence-to-sequence machine translation model. The L1 reference phoneme sequence of a word is treated as the source text and its corresponding mispronounced L2 phoneme sequences as "paraphrased" target texts. The researchers’ experiments demonstrate the effectiveness of the L2-GEN system in improving MDD accuracy on public benchmark evaluation sets.
