Neural text-to-speech makes speech synthesizers much more versatile

A text-to-speech system, which converts written text into synthesized speech, is what allows Alexa to respond verbally to requests or commands. Through a service called Amazon Polly, text-to-speech is also a technology that Amazon Web Services offers to its customers.

Last year, both Alexa and Polly evolved toward neural-network-based text-to-speech systems, which synthesize speech from scratch, rather than the earlier unit-selection method, which strung together tiny snippets of pre-recorded sounds.

In user studies, people tend to find speech produced by neural text-to-speech (NTTS) systems more natural-sounding than speech produced by unit selection. But the real advantage of NTTS is its adaptability, something we demonstrated last year in our work on changing the speaking style (“newscaster” versus “neutral”) of an NTTS system.

At this year’s Interspeech, two new papers from the Amazon Text-to-Speech group further demonstrate the adaptability of NTTS. One is on prosody transfer, or synthesizing speech that mimics the prosody — shifts in tempo, pitch, and volume — of a recording. In essence, prosody transfer lets you choose whose voice you will hear reading back recorded content, with all the original vocal inflections preserved.

The other paper is on universal vocoding. An NTTS system outputs a series of spectrograms, snapshots of the energies in different audio frequency bands over short periods of time. But spectrograms don’t contain enough information to directly produce a natural-sounding speech signal. A vocoder is required to fill in the missing details.

A typical neural vocoder is trained on data from a single speaker. But in our paper, we report a vocoder trained on data from 74 speakers in 17 languages. In our experiments, for any given speaker, the universal vocoder outperformed speaker-specific vocoders — even when it had never seen data from that particular speaker before.

Our first paper, on prosody transfer, is titled “Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-to-Speech”. Past attempts at prosody transfer have involved neural networks that take speaker-specific spectrograms and the corresponding text as input and output spectrograms that represent a different voice. But these tend not to adapt well to input voices that they haven’t heard before.

We adopted several techniques to make our network more general, including not using raw spectrograms as input. Instead, our system uses prosodic features that are easier to normalize.

First, our system aligns the speech signal with the text at the level of phonemes, the smallest units of speech. Then, for each phoneme, the system extracts prosodic features — such as changes in pitch or volume — from the spectrograms. These features can be normalized, which makes them easy to apply to new voices.

“But Germany thinks she can manage it … ”

Original

Transferred

Synthesized

“I knew of old its little ways ... “

Original

Transferred

Synthesized

“Good old Harry … ”

Original

Transferred

Synthesized

Three different versions of the same three text excerpts. "Original" denotes the original recording of the text by a live speaker. "Transferred" denotes a synthesized voice with prosody transferred from the original recording by our system. And "Synthesized" denotes the synthesis of the same excerpt from scratch, using existing Amazon TTS technology.

This approach works well when the system has a clean transcript to work with — as when, for instance, the input recording is a reading of a known text. But we also examine the case in which a clean transcript isn’t available.

In that instance, we run the input speech through an automatic speech recognizer, like the one that Alexa uses to process customer requests. Speech recognizers begin by constructing multiple hypotheses about the sequences of phonemes that correspond to a given input signal, and they represent those hypotheses as probability distributions. Later, they use higher-level information about word sequence frequencies to decide between hypotheses.

When we don’t have reliable source text, our system takes the speech recognizer’s low-level phoneme-sequence probabilities as inputs. This allows it to learn general correlations between phonemes and prosodic features, rather than trying to force acoustic information to align with transcriptions that may be inaccurate.

In experiments, we find that the difference between the outputs of this textless prosody transfer system and a system trained using highly reliable transcripts is statistically insignificant.

Prosody_transfer_architecture.jpg._CB439112644_.jpg
The architecture of our prosody transfer system, both when speech transcripts are available (top left) and when they're not (top right). "Posteriograms" are sets of phonemic features predicted by an automatic speech recognition system.

Our second paper is titled “Towards Achieving Robust Universal Neural Vocoding”. In the past, researchers have used data from multiple speakers to train neural vocoders, but they didn’t expect their models to generalize to unfamiliar voices. Usually, the input to the model includes some indication of which speaker the voice belongs to.

We investigated whether it is possible to train a universal vocoder to attain state-of-the-art quality on voices it hasn’t previously encountered. The first step: create a diverse enough set of training data that the vocoder can generalize. Our data set comprised about 2,000 utterances each from 52 female and 22 male speakers, in 17 languages.

The next step: extensive testing of the resulting vocoder. We tested it on voices that it had heard before, voices that it hadn’t, topics that it had encountered before, topics that it hadn’t, languages that were familiar (such as English and Spanish), languages that weren’t (Ahmaric, Swahili, and Wolof), and a wide range of unusual speaking conditions, such as whispered or sung speech or speech with heavy background noise.

We compared the output of our vocoder to that of four baselines: natural speech, speaker-specific vocoders, and generalized vocoders trained on less diverse data — three- and seven-speaker data sets. Five listeners scored every output utterance of each vocoder according to the multiple stimuli with hidden reference and anchor (MUSHRA) test. Across the board, our vocoder outperformed the three digital baselines and usually came very close to the scores for natural speech.

Acknowledgments: Thomas Drugman, Srikanth Ronanki, Jonas Rohnke, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

Research areas

Related content

US, CA, Sunnyvale
Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video add-on subscriptions such as Apple TV+, Max, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads. Are you interested in shaping the future of entertainment? Prime Video's technology teams are creating best-in-class digital video experience. As a Prime Video technologist, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people. We’ll look for you to bring your diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. With global opportunities for talented technologists, you can decide where a career Prime Video Tech takes you! We are looking for a self-motivated, passionate and resourceful Applied Science Manager to bring diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. You will lead a strong science team and work closely with other science and engineering leaders, product and business partners together to build the best personalized customer experience for Prime Video. At the end of the day, you will have the reward of seeing your contributions benefit millions of Amazon.com customers worldwide. Key job responsibilities - Lead to develop AI solutions for various Prime Video recommendation and personalization systems using Deep learning, GenAI, Reinforcement Learning, recommendation system and optimization methods; - Work closely with engineers and product managers to design, implement and launch AI solutions end-to-end; - Effectively communicate technical and non-technical ideas with teammates and stakeholders; - Stay up-to-date with advancements and the latest modeling techniques in the field; - Hire and grow a science team working in this exciting video personalization domain. About the team Prime Video Recommendation Science team owns science solution to power recommendation and personalization experience on various devices. We work closely with the engineering teams to launch our solutions in production.
US, CA, San Francisco
We are seeking a Member of Technical Staff Simulation Engineer to join our AI robotics research team developing foundation models for robotics. You will rapidly develop 3D physics-based and photorealistic simulations alongside scientists to enable training large-scale machine learning models. Key job responsibilities - Develop simulations for reinforcement learning, closed-loop simulations and synthetic data generation - Implement essential robotics features, including accurate modeling of sensors, actuators, and controllers - Build real-to-sim workflows for dynamic environments and robotics tasks - Implement simulation features to minimize sim-to-real gaps through domain randomization and system identification - Create asset toolchains supporting industry-standard formats (URDF, MJCF, USD) - Collaborate closely with a team of ML researchers to enable large-scale robotics training pipelines About the team At Frontier AI & Robotics (FAR), we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through frontier foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
Amazon’s Frontier AI & Robotics (FAR) team is seeking a Member of Technical Staff, Infrastructure to build and scale the foundational systems that power our robotics research and development platform. In this role, you will design and operate the distributed infrastructure that enables our researchers and engineers to train foundation models, run large-scale experiments, and deploy intelligent robotic systems at Amazon scale. Join the next revolution in robotics, where you’ll work alongside world-renowned AI pioneers to push the boundaries of what’s possible in robotic intelligence. As a Member of Technical Staff focused on Infrastructure, you’ll build the critical platform layer that accelerates every aspect of FAR’s research — from high-throughput data pipelines and experiment management systems to low-latency model serving and configuration delivery for robotic deployments. This role is deeply technical and focuses on performance, scalability, and reliability at scale. You will design systems that support volumes of training data, operate with strict latency requirements, and provide the compute and data foundation that enables breakthrough research across FAR’s robotics ecosystem. Key job responsibilities - Design and build scalable data infrastructure to support AI robotics research, including automated pipelines for data ingestion, processing, curation, and delivery - Build highly scalable experimentation and analytics infrastructure to support model evaluation, A/B testing, and feature performance monitoring across robotic systems - Design and operate low-latency configuration and model delivery systems powering progressive rollouts across FAR’s robotic platforms - Improve the performance, efficiency, and reliability of FAR’s core compute and storage infrastructure, ensuring systems remain fast and stable as research scales - Develop tooling and frameworks that accelerate research workflows, including dataset management, visualization, and quality assessment systems - Optimize query performance and data availability for experimentation and analytics workflows used by research teams - Collaborate directly with science and robotics teams to support research projects through both infrastructure development and hands-on technical contribution - Lead large technical initiatives and shape the architecture of FAR’s research platform infrastructure
US, CA, East Palo Alto
As part of the AWS Solutions organization, we have a vision to provide business applications, leveraging Amazon’s unique experience and expertise, that are used by millions of companies worldwide to manage day-to-day operations. We will accomplish this by accelerating our customers’ businesses through delivery of intuitive and differentiated technology solutions that solve enduring business challenges. We blend vision with curiosity and Amazon’s real-world experience to build opinionated, turnkey solutions. Where customers prefer to buy over build, we become their trusted partner with solutions that are no-brainers to buy and easy to use. Key job responsibilities Everyone on the team needs to be entrepreneurial, wear many hats and work in a highly collaborative environment that’s more startup than big company. We’ll need to tackle problems that span a variety of domains: computer vision, image recognition, machine learning, real-time and distributed systems. As a Sr. Applied Scientist, you will help solve a variety of technical challenges and mentor other scientists. You will be the thought leader of the team. You will tackle challenging, novel situations every day and given the size of this initiative, you’ll have the opportunity to work with multiple technical teams at Amazon in different locations. You should be comfortable with a degree of ambiguity that’s higher than most projects and relish the idea of solving problems that, frankly, haven’t been solved at scale before - anywhere. Along the way, we guarantee that you’ll learn a ton, have fun and make a positive impact on millions of people. A key focus of this role will be developing and implementing advanced visual reasoning systems that can understand complex spatial relationships and object interactions in real-time. You'll work on designing autonomous AI agents that can make intelligent decisions based on visual inputs, understand customer behavior patterns, and adapt to dynamic retail environments. This includes developing systems that can perform complex scene understanding, reason about object permanence, and predict customer intentions through visual cues. About the team Just Walk Out (JWO) is a new kind of store with no lines and no checkout—you just grab and go! Customers simply use the Amazon Go app to enter the store, take what they want from our selection of fresh, delicious meals and grocery essentials, and go! Our checkout-free shopping experience is made possible by our Just Walk Out Technology, which automatically detects when products are taken from or returned to the shelves and keeps track of them in a virtual cart. When you’re done shopping, you can just leave the store. Shortly after, we’ll charge your account and send you a receipt. Check it out at amazon.com/go. Designed and custom-built by Amazonians, our Just Walk Out Technology uses a variety of technologies including computer vision, sensor fusion, and advanced machine learning. Innovation is part of our DNA! Our goal is to be Earths’ most customer centric company and we are just getting started. We need people who want to join an ambitious program that continues to push the state of the art in computer vision, machine learning, distributed systems and hardware design.
US, NY, New York
We are seeking a Robotics/AI Motor Control Scientist to develop cutting-edge machine learning algorithms for motor control systems in robots. In this role, you will focus on creating and optimizing intelligent motor control strategies to enable robots to perform complex, whole-body tasks. Your contributions will be essential in advancing robotics by enabling fluid, reliable, and safe interactions between robots and their environments. Key job responsibilities - Develop controllers that leverage reinforcement learning, imitation learning, or other advanced AI techniques to achieve natural, robust, and adaptive motor behaviors - Collaborate with multi-disciplinary teams to integrate motor control systems with robotic hardware, ensuring alignment with real-world constraints such as actuator dynamics and energy efficiency - Use simulation and real-world testing to refine and validate control algorithms - Stay updated on advancements in robotics, AI, and control systems to apply advanced techniques to robotic motion challenges - Lead technical projects from conception through production deployment - Mentor junior scientists and engineers - Bridge research initiatives with practical engineering implementation About the team Fauna Robotics, an Amazon company, is building capable, safe, and genuinely delightful robots for everyday life. Our goal is simple: make robots people actually want to live and interact with in everyday human spaces. We believe that future won’t arrive until building for robotics becomes far more accessible. Today, too much effort is spent reinventing the fundamentals. We’re changing that by developing tightly integrated hardware and software systems that make it faster, safer, and more intuitive to create real-world robotic products. Our work spans the full stack: mechanical design, control systems, dynamic modeling, and intelligent software. The focus is not just functionality, but experience. We’re building robots that feel responsive, expressive, and genuinely useful. At Fauna, you’ll work at the frontier of this space, helping define how robots move, manipulate, and interact with people in natural environments. It’s an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you. an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you.
US, MA, N.reading
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. We are seeking a talented Applied Scientist to join our advanced robotics team, focusing on developing and applying cutting-edge simulation methodologies for advanced robotics systems. This role centers on research and development of physics-based simulation techniques, sim-to-real transfer methods, and machine learning approaches that enable rapid development, testing, and validation of robotic systems operating in complex, real-world environments. Key job responsibilities - Advance physics-based simulation fidelity for contact-rich manipulation and locomotion - Design and build high-performance simulation tools integrated into a robotics design stack - Translate research ideas into robust, verifiable data - Develop methods to quantify and reduce simulation-to-reality gaps across design, safety, and control - Architect scalable simulation solutions for rigid and deformable body dynamics - Build simulation pipelines optimized for a digital twin level of fidelity - Establish frameworks for continuous simulation improvement using real-world hardware - Collaborate with engineering, science, and safety teams on simulation requirements and validation About the team Our team is building a comprehensive robot simulation and modeling platform for advanced robotics development, combining locomotion and manipulation capabilities. We operate at the cutting edge of physics simulation, reinforcement learning, hardware-in-the-loop (HIL), and sim-to-real transfer, collaborating with world-class robotics engineers, scientists, and mechanical designers in a fast-paced, innovation-driven environment. This role uniquely combines fundamental research with real-world development. You will pursue core research questions in physics-based simulation while seeing your work translated into real robots, validated on real hardware. Working alongside Robot scientist and designers, you will help transform research ideas into scalable, quantifiable simulation capabilities that directly impact how robots are designed and built.
US, CA, Palo Alto
We are looking for a passionate Applied Scientist to help pioneer the next generation of agentic AI applications for Amazon advertisers. In this role, you will design agentic architectures, develop tools and datasets, and contribute to building systems that can reason, plan, and act autonomously across complex advertiser workflows. You will work at the forefront of applied AI, developing methods for fine-tuning, reinforcement learning, and preference optimization, while helping create evaluation frameworks that ensure safety, reliability, and trust at scale. You will work backwards from the needs of advertisers—delivering customer-facing products that directly help them create, optimize, and grow their campaigns. Beyond building models, you will advance the agent ecosystem by experimenting with and applying core primitives such as tool orchestration, multi-step reasoning, and adaptive preference-driven behavior. This role requires working independently on ambiguous technical problems, collaborating closely with scientists, engineers, and product managers to bring innovative solutions into production. Key job responsibilities - Design and build agents for our autonomous campaigns experience. - Design and implement advanced model and agent optimization techniques, including supervised fine-tuning, instruction tuning and preference optimization (e.g., DPO/IPO). - Curate datasets and tools for MCP. - Build evaluation pipelines for agent workflows, including automated benchmarks, multi-step reasoning tests, and safety guardrails. - Develop agentic architectures (e.g., CoT, ToT, ReAct) that integrate planning, tool use, and long-horizon reasoning. - Prototype and iterate on multi-agent orchestration frameworks and workflows. - Collaborate with peers across engineering and product to bring scientific innovations into production. - Stay current with the latest research in LLMs, RL, and agent-based AI, and translate findings into practical applications. About the team The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through the latest generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. The Autonomous Campaigns team within Sponsored Products and Brands is focused on guiding and supporting 1.6MM advertisers to meet their advertising needs of creating and managing ad campaigns. At this scale, the complexity of diverse advertiser goals, campaign types, and market dynamics creates both a massive technical challenge and a transformative opportunity: even small improvements in guidance systems can have outsized impact on advertiser success and Amazon’s retail ecosystem. Our vision is to build a highly personalized, context-aware campaign creation and management system that leverages LLMs together with tools such as auction simulations, ML models, and optimization algorithms. This agentic framework, will operate across both chat and non-chat experiences in the ad console, scaling to natural language queries as well as proactively delivering guidance based on deep understanding of the advertiser. To execute this vision, we collaborate closely with stakeholders across Ad Console, Sales, and Marketing to identify opportunities—from high-level product guidance down to granular keyword recommendations—and deliver them through a tailored, personalized experience. Our work is grounded in state-of-the-art agent architectures, tool integration, reasoning frameworks, and model customization approaches (including tuning, MCP, and preference optimization), ensuring our systems are both scalable and adaptive.
US, WA, Seattle
Are you interested in leading growth initiatives for one of Amazon’s most significant and fastest growing businesses? Selling Partners offer hundreds of millions of unique products and are a critical to delivering on our vision of offering the Earth’s largest selection and lowest prices. The Amazon Marketplace enables over 2 million third-party selling partners in eleven marketplaces to list their products for sale to Amazon customers across the world. Within our WW Marketplace business, International Seller Services (ISS) oversees the recruiting and development of Selling Partners for all of our international marketplaces (e.g. UK, Germany, Japan, Middle East etc.). ISS also enables global selling, helping Sellers in one country expand and sell internationally. Are you fascinated by the power of Natural Language Processing (NLP) and Large Language Models (LLM) to transform the way we interact with technology? Are you passionate about applying advanced machine learning techniques to solve complex challenges in the e-commerce space? If so, the Central Science Team of Amazon's International Seller Services has an exciting opportunity for you as an Applied Science Manager. We are seeking an experienced science leader who is adept at a variety of skills; especially in generative AI, computer vision, and large language models that will help international sellers succeed as they sell on Amazon. The right candidate will provide science leadership, establish the right direction and vision, build team mechanisms, foster the spirit of collaboration and innovation within the org, and execute against a roadmap. This leader will provide both technical direction as well as manage a sizable team of scientists. They will need to be adept at recruiting, launching AI models into production, writing vision/direction documents, and building team mechanisms that will foster innovation and execution. Additionally, while the position is based in Seattle, this leader will interact with global leaders and teams in Europe, Japan, China, Australia, and other regions. Key job responsibilities Key job responsibilities Responsibilities include: * Drive end-to-end applied science projects that have a high degree of ambiguity, scale, complexity. * Provide technical / science leadership related to NLP, computer vision and large language models. * Research new and innovative machine learning approaches. * Recruit high performing Applied Scientists to the team and provide mentorship. * Establish team mechanisms, including team building, planning, and document reviews. * Communicate complex technical concepts effectively to both technical and non-technical stakeholders, providing clear explanations and guidance on proposed solutions and their potential impact.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.