A quick guide to Amazon’s 40-plus papers at Interspeech 2022

Speech recognition and text-to-speech predominate, but other topics include audio watermarking, automatic dubbing, and compression.

Of Amazon’s more than 40 papers at this year’s Interspeech, automatic speech recognition and text-to-speech account for about half. But the others cover a range of topics, from acoustic watermarking and automatic dubbing to quantization and fairness.

Acoustic watermarking

Practical over-the-air perceptual acoustic watermarking
Ameya Agaskar

Audio classification

CNN-based audio event recognition for automated violence classification and rating for Prime Video content
Tarun Gupta, Mayank Sharma, Kenny Qiu, Xiang Hao, Raffay Hamid

Impact of acoustic event tagging on scene classification in a multi-task learning framework
Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas

Automatic dubbing

Isochrony-aware neural machine translation for automatic dubbing
Derek Tam, Surafel Melaku Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

Prosodic alignment for off-screen automatic dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

Automatic speech recognition

Compute cost amortized transformer for streaming ASR
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant Strimel

"Compute cost amortized Transformer for streaming ASR" proposes a mechanism that toggles components of Transformer blocks on and off to use computational resources more efficiently.

Content-context factorized representations for automated speech recognition
David M. Chan, Shalini Ghosh

ConvRNN-T: Convolutional augmented recurrent neural network transducers for streaming speech recognition
Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris

Directed speech separation for automatic speech recognition of long-form conversational speech
Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero

Domain prompts: Towards memory and compute efficient domain adaptation of ASR systems
Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

Incremental learning for RNN-Transducer based speech recognition models
Deepak Baby, Pasquale D'Alterio, Valentin Mendelev

Knowledge distillation via module replacing for automatic speech recognition with recurrent neural network transducer
Kaiqi Zhao, Hieu Duy Nguyen, Animesh Jain, Nathan Susanj, Athanasios Mouchtaris, Lokesh Gupta, Ming Zhao

Learning to rank with BERT-based confidence models in ASR rescoring
Ting-Wei Wu, I-Fan Chen, Ankur Gandhe

Reducing geographic disparities in automatic speech recognition via elastic weight consolidation
Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

RefTextLAS: Reference text biased listen, attend, and spell model for accurate reading evaluation
Phani Sankar Nidadavolu, Na Xu, Nick Jutila, Ravi Teja Gadde, Aswarth Abhilash Dara, Joseph Savold, Sapan Patel, Aaron Hoff, Veerdhawal Pande, Kevin Crews, Ankur Gandhe, Ariya Rastrow, Roland Maas

RNN-T lattice enhancement by grafting of pruned paths
Mirek Novak, Pavlos Papadopoulos

Using data augmentation and consistency regularization to improve semi-supervised speech recognition
Ashtosh Sapru

Dialogue

Contextual acoustic barge in classification for spoken dialog systems
Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

The method presented in "Adversarial reweighting for speaker verification fairness" uses an adversarial network to identify underperforming groups in a speaker verification dataset (green) and adjusts their contribution to the training loss (bottom).

Fairness

Toward fairness in speech recognition: Discovery and mitigation of performance disparities
Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

Keyword spotting

Latency control for keyword spotting
Christin Jose, Joe Wang, Grant Strimel, Mohammad Omar Khursheed, Yuriy Mishchenko, Brian Kulis

Language identification

A multimodal strategy for singing language identification
Wo Jae Lee, Emanuele Coviello

Multidevice processing

Challenges and opportunities in multi-device speech processing
Gregory Ciccarelli, Jarred Barber, Arun Nair, Israel Cohen, Tao Zhang

Multiparty speech

Separator-transducer-segmenter: Streaming recognition and segmentation of multi-party speech
Ilya Sklyar, Anna Piunova, Christian Osendorfer

Natural-language understanding

Phonetic embedding for ASR robustness in entity resolution
Xiaozhou Zhou, Ruying Bao, William M. Campbell

Quantization

Squashed weight distribution for low bit quantization of deep models
Nikko Ström, Haidar Khan, Wael Hamza

Sub-8-bit quantization aware training for 8-bit neural network accelerator with on device speech recognition
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow

The training behavior of the algorithm proposed in "Sub-8-bit quantization aware training for 8-bit neural network accelerator with on device speech recognition", in which weights are optimized to lower quantization loss.

Signal processing

Clock skew robust acoustic echo cancellation
Karim Helwani, Erfan Soltanmohammadi, Michael M. Goodwin, Arvindh Krishnaswamy

Real-time packet loss concealment with mixed generative and predictive model
Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy

Speaker identification/verification

Adversarial reweighting for speaker verification fairness
Minho Jin, Chelsea J.-T. Ju, Zeya Chen, Yi Chieh Liu, Jasha Droppo, Andreas Stolcke

Graph-based multi-view fusion and local adaptation: Mitigating within household confusability for speaker identification
Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke

The method proposed in "Graph-based multi-view fusion and local adaptation" propagates labels across a graph whose nodes represent utterances and whose weighted edges quantify the similarity between utterances.

Spoken-language understanding

Learning under label noise for robust spoken language understanding systems
Anoop Kumar, Pankaj Sharma, Aravind Illa, Sriram Venkatapathy, Subhrangshu Nandi, Pritam Varma, Anurag Dwarakanath, Aram Galstyan

On joint training with interfaces for spoken language understanding
Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow

Text-to-speech

Automatic evaluation of speaker similarity
Kamil Deja, Ariadna Sanchez, Julian Roth, Marius Cotescu

CopyCat2: A single model for multi-speaker TTS and many-to-many fine-grained prosody transfer
Sri Karlapati, Penny Karanasou, Mateusz Lajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman

Voices created through the method presented in "Creating new voices using normalizing flows" (green) are spread across the embedding space of voices from the training set (blue), confirming that the method can generate a variety of new voices.

Creating new voices using normalizing flows
Piotr Biliński, Tom Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa

Cross-lingual style transfer with conditional prior VAE and style loss
Dino Ratcliffe, You Wang, Alex Mansbridge, Penny Karanasou, Alexis Moinet, Marius Cotescu

End-to-end LPCNet: A neural vocoder with fully-differentiable LPC estimation
Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

Expressive, variable, and controllable duration modelling in TTS
Ammar Abbas, Tom Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
Magdalena Proszewska, Grzegorz Beringer, Daniel Saez Trigueros, Tom Merritt, Abdelhamid Ezzerg, Roberto Barra-Chicote

L2-GEN: A neural phoneme paraphrasing approach to L2 speech synthesis for mispronunciation diagnosis
Daniel Zhang, Ashwinkumar Ganesan, Sarah Campbell, Daniel Korzekwa

Low data? No problem: low resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabrys, Jaime Lorenzo Trueba

Mix and match: An empirical study on training corpus composition for polyglot text-to-speech (TTS)
Ziyao Zhang, Alessio Falai, Ariadna Sanchez, Orazio Angelini, Kayoko Yanagisawa

Simple and effective multi-sentence TTS with expressive and coherent prosody
Peter Makarov, Ammar Abbas, Mateusz Lajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou

Unify and conquer: How phonetic feature representation affects polyglot text-to-speech (TTS)
Ariadna Sanchez, Alessio Falai, Ziyao Zhang, Orazio Angelini, Kayoko Yanagisawa
