The range of AWS's speech research is on display at Interspeech

Katrin Kirchhoff, director of speech processing for Amazon Web Services, on the many scientific challenges her teams are tackling.

Katrin Kirchhoff is the director of speech processing for Amazon Web Services, and her organization has a trio of papers at this year’s Interspeech conference, which begins next week.

“One paper is on novel evaluation metrics for speaker diarization,” Kirchhoff says. “Speaker diarization is the task of determining who speaks when, and errors in that domain can be due to vocal characteristics of speakers, but they can also be due to conversational patterns. So, for instance, speaker diarization is harder when you have a lot of short turns from speakers, very frequent speaker changes, and usually our metrics don't really disentangle those different causes. So this is a new paper that proposes new ways of looking at this and proposes to measure the contributions in different ways.

Another paper is on adversarial learning for accented speech, and the third is on incorporating more contextual information into ASR [automatic speech recognition] for dialogue systems. So in the case where you have an ASR system as a front end for a dialogue system, it's really important to actually model things like dialogue state and the longer conversational history to improve ASR performance. That's the theme of the third paper.”
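To make the diarization point concrete, here is a minimal sketch of one way to separate errors by conversational pattern: it reports a frame-level error rate separately for short and long reference turns. This is an illustration only, not the metric proposed in the paper. The function names, the `short_max` threshold, and the toy label sequences are all assumptions, and the sketch assumes that hypothesis speaker labels have already been mapped onto the reference labels.

```python
from collections import defaultdict

def turns(labels):
    """Split a frame-level label sequence into (start, end, speaker) turns."""
    out, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((start, i, labels[start]))
            start = i
    return out

def error_by_turn_length(reference, hypothesis, short_max=3):
    """Frame error rate, reported separately for short and long reference turns.

    Assumes hypothesis labels are already mapped to reference labels
    (a hypothetical simplification; real scoring needs a speaker mapping step).
    """
    wrong = defaultdict(int)
    total = defaultdict(int)
    for start, end, speaker in turns(reference):
        bucket = "short" if end - start <= short_max else "long"
        for i in range(start, end):
            total[bucket] += 1
            wrong[bucket] += hypothesis[i] != speaker
    return {bucket: wrong[bucket] / total[bucket] for bucket in total}

# Toy data: one label per frame, two speakers "A" and "B".
reference  = list("AAAAAABBAABBBBBBAB")
hypothesis = list("AAAAAAAAAABBBBBBBB")
rates = error_by_turn_length(reference, hypothesis)
print(rates)
```

In this toy example all of the errors fall inside short turns, which a single aggregate error rate would not reveal.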

Speech at AWS

The diversity of those papers’ topics is a good indicator of the breadth of speech research at Amazon Web Services (AWS).

“My teams work on a wide range of science topics relevant to cloud-based spoken language processing, starting with robustness to different audio conditions like noise and reverberation, all the way to different machine learning techniques,” Kirchhoff says. “We look into unsupervised, semi-supervised, and self-supervised learning.

“That's actually a really broad trend these days, and also a trend that I see everywhere at Interspeech this year. Our machine learning models are very data-hungry, and labeled data is difficult to produce for speech. For a lot of tasks and a lot of languages, we simply don't have those kinds of data resources.

“So everybody's training self-supervised representations these days, which means that we use proxy tasks to make models learn something about the input signal without having explicit ground truth labels — by, say, predicting certain frequency bands from others, or by masking time slices and then trying to predict the content from the surrounding signal, or teaching the model which speech segments are from the same signal as opposed to different signals. 
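As a rough illustration of the masking proxy task described above, the following sketch builds one training example by hiding a contiguous time slice of a feature sequence. The hidden frames become the prediction targets, so no human labels are needed. The function name `mask_time_slice`, the zero mask value, and the made-up two-dimensional features are assumptions for illustration; the model that would predict the targets is out of scope here.

```python
import random

def mask_time_slice(frames, span, rng, mask_value=0.0):
    """Hide `span` consecutive frames.

    Returns (masked input, start index of the mask, hidden target frames).
    """
    start = rng.randrange(0, len(frames) - span + 1)
    masked = [list(frame) for frame in frames]
    for t in range(start, start + span):
        masked[t] = [mask_value] * len(frames[t])
    targets = [list(frame) for frame in frames[start:start + span]]
    return masked, start, targets

rng = random.Random(0)
# 8 frames with 2 made-up features each, standing in for acoustic features.
frames = [[float(t), float(t) / 2] for t in range(8)]
masked, start, targets = mask_time_slice(frames, span=3, rng=rng)
print(start, targets)
```

A model trained on such pairs sees the surrounding context in `masked` and is scored on how well it reconstructs `targets`.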

“The question is, is there a single representation that's universally best for various downstream processing tasks? That is, can you use the same representation as a starting point for tasks like ASR, speaker recognition, and language identification? And then taking that one step further, can we actually use that, not only for speech, but for audio processing more generally? So at AWS, we're starting to look into that. 

“Other areas of interest for us are fields like continual learning or few-shot learning, which means, again, ‘How can you learn models without a lot of labeled data?’ But rather than going the completely unsupervised way, we look at what you can do with just a very small number of samples from a given class or from a given task.

“ASR systems often need to process speech collected in vastly different scenarios and domains, which can include proper names or particular phrases, stylistic patterns, et cetera, that are rare overall but frequent in a particular application. You need to figure out how to prime your system to recognize them accurately, and how to do that with just a handful of observed samples.”

Non-autoregressive processing

Some of the research in Kirchhoff’s organization involves real-time processing of short audio snippets, but several AWS products (such as Amazon Transcribe, Amazon Transcribe Medical, and Contact Lens) require transcription of longer audio files, such as movies, lectures, and dictations. In this context, the ASR model has the entire speech signal available to it before it begins transcribing.

This has fueled Kirchhoff’s interest in the topic of non-autoregressive processing. In fact, together with colleagues at Yahoo and Carnegie Mellon University, Kirchhoff is co-organizing a special session at Interspeech titled Non-Autoregressive Sequential Modeling for Speech Processing.

“Traditionally, you have a decoder in an ASR system that combines different knowledge sources and then generates an output hypothesis in a step-by-step fashion, where each step is conditioned on the previous time step,” Kirchhoff explains. “You essentially run over the speech signal in one direction, left to right, and each processing step is conditioned on the previous one. 

“Non-autoregressive processing means that all decoding steps are conducted in parallel. So all steps happen simultaneously, and each step can be conditioned on a context in both directions. This challenges the intuitive notion that speech is generated sequentially in time and that, therefore, decoding should work in the same way. But it also means that the decoding process can be very heavily parallelized, and it can be much more efficient and much faster than traditional decoding approaches. And since it's heavily parallelizable, it can also benefit much more from developments in deep-learning hardware.

“The question is, how do you get the same performance when you're not conditioning each step on all of the previous steps? Because there's clearly information flow that needs to happen across these different time steps. How do you still model that interaction?”
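The contrast Kirchhoff describes can be sketched with a toy example. Below, an autoregressive decoder picks each token conditioned on the previous choice, while a non-autoregressive decoder decides every position independently, which is what makes it parallelizable. The scores, the vocabulary, and the `bigram_bonus` table are invented for illustration and do not come from any real ASR system.

```python
def decode_autoregressive(frame_scores, bigram_bonus):
    """Left to right: each choice is conditioned on the previously chosen token."""
    output, prev = [], None
    for scores in frame_scores:
        best = max(
            scores,
            key=lambda tok: scores[tok] + bigram_bonus.get((prev, tok), 0.0),
        )
        output.append(best)
        prev = best
    return output

def decode_non_autoregressive(frame_scores):
    """Every position is decided independently, so the steps could run in parallel."""
    return [max(scores, key=scores.get) for scores in frame_scores]

# Hypothetical per-position scores for two output positions.
frame_scores = [
    {"new": 0.9, "knew": 0.1},
    {"york": 0.45, "yolk": 0.55},
]
# Hypothetical bonus for a likely word pair.
bigram_bonus = {("new", "york"): 0.3}

print(decode_autoregressive(frame_scores, bigram_bonus))
print(decode_non_autoregressive(frame_scores))
```

The independent decoder picks "yolk" because it never sees that the previous word was "new". That gap is the information flow across time steps that non-autoregressive methods have to recover in some other way.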

Some of the papers at the special Interspeech session will address that question, but Kirchhoff’s group provided one provisional answer to it in June, at the annual meeting of the North American branch of the Association for Computational Linguistics (NAACL), in a paper titled “Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment”.

“That is applying non-autoregressive decoding to speech recognition,” Kirchhoff says. “We call our approach ‘align-refine’. We essentially iterate the process: each iteration takes the decoding hypothesis from the previous iteration and tries to improve and refine it, rather than doing it in a single step. Since all decoding steps happen in parallel for each iteration, there’s still a vast gain in efficiency.”
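The general idea of iterating in parallel can be sketched as follows. This is a toy illustration of iterative refinement, not the Align-Refine algorithm from the paper: each pass re-decides all positions at once, using the previous pass's hypothesis as context in both directions. The scores, the `pair_bonus` table, and the function names are assumptions made up for this example.

```python
def refine_once(hypothesis, frame_scores, pair_bonus):
    """Re-decide every position at once, using only the previous hypothesis as context."""
    refined = []
    for i, scores in enumerate(frame_scores):
        left = hypothesis[i - 1] if i > 0 else None
        right = hypothesis[i + 1] if i + 1 < len(hypothesis) else None

        def total(tok):
            # Local score plus bonuses for agreeing with both neighbors.
            return (scores[tok]
                    + pair_bonus.get((left, tok), 0.0)
                    + pair_bonus.get((tok, right), 0.0))

        refined.append(max(scores, key=total))
    return refined

def iterative_refinement(frame_scores, pair_bonus, max_iterations=4):
    """Start from an independent first pass, then refine until nothing changes."""
    hypothesis = [max(s, key=s.get) for s in frame_scores]
    for _ in range(max_iterations):
        updated = refine_once(hypothesis, frame_scores, pair_bonus)
        if updated == hypothesis:
            break
        hypothesis = updated
    return hypothesis

# Hypothetical per-position scores and word-pair bonuses.
frame_scores = [
    {"new": 0.9, "knew": 0.1},
    {"york": 0.45, "yolk": 0.55},
    {"city": 0.45, "kitty": 0.55},
]
pair_bonus = {("new", "york"): 0.3, ("york", "city"): 0.3}

print(iterative_refinement(frame_scores, pair_bonus))
```

Here the independent first pass yields "new yolk kitty", the first refinement fixes the middle word, and the second fixes the last one. Within each pass no position waits on another, so the work inside a pass stays parallel.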

“What I really liked about the special session is that we had submissions both from ASR and from other areas of speech processing, like TTS [text-to-speech],” Kirchhoff adds. “It's very interesting that you can generalize approaches across different fields, because traditionally they've been quite separate — non-autoregressive decoding originated in machine translation. So there’s increasingly a convergence between natural-language processing, ASR, and TTS. There's a lot of commonality in the approaches that we use.”

About the Author
Larry Hardesty is the editor of the Amazon Science blog. Previously, he was a senior editor at MIT Technology Review and the computer science writer at the MIT News Office.
