Varying speaking styles with neural text-to-speech

When people speak, they use different speaking styles depending on context. A TV newscaster, for example, will use a very different style when conveying the day’s headlines than a parent will when reading a bedtime story. Amazon scientists have shown that our latest text-to-speech (TTS) system, which uses a generative neural network, can learn to employ a newscaster style from just a few hours of training data. This advance paves the way for Alexa and other services to adopt different speaking styles in different contexts, improving customer experiences.

To users, synthetic speech produced by neural networks sounds much more natural than speech produced through concatenative methods, which string together short speech snippets stored in an audio database. We present Amazon’s approach to neural TTS (NTTS) in a series of newly released papers:

With the increased flexibility provided by NTTS, we can easily vary the speaking style of synthesized speech. For example, by augmenting a large, existing data set of style-neutral recordings with only a few hours of newscaster-style recordings, we have created a news-domain voice. That would have been impossible with previous techniques based on concatenative synthesis.

The examples below provide a comparison of speech synthesized using concatenative synthesis, NTTS with standard neutral style, and NTTS with newscaster style.

Speech synthesized

Female voice

Concatenative
Standard neutral NTTS
NTTS newscaster

Male voice

Concatenative
Standard neutral NTTS
NTTS newscaster

How NTTS can model different speaking styles

Our neural TTS system comprises two components: (1) a neural network that converts a sequence of phonemes — the most basic units of language — into a sequence of “spectrograms,” or snapshots of the energy levels in different frequency bands; and (2) a vocoder, which converts the spectrograms into a continuous audio signal.

The first component of the system is a sequence-to-sequence model, meaning that it doesn’t compute a given output solely from the corresponding input but also considers its position in the sequence of outputs. The spectrograms that the system outputs are mel-spectrograms, meaning that their frequency bands are chosen to emphasize acoustic features that the human brain uses when processing speech.

When trained on the large data sets used to build general-purpose concatenative-synthesis systems, this sequence-to-sequence approach will yield high-quality, neutral-sounding voices. By design, those data sets lack the distinctive prosodic features required to represent particular speech styles. Although high in quality, the speech generated through this approach displays a limited variety of expression (pitch, breaks, rhythm).

On the other hand, producing a similar-sized data set by enlisting voice talent to read in the desired style is very time consuming and expensive. (Training a sequence-to-sequence model from scratch requires tens of hours of speech).

We find that we can leverage a large corpus of style-neutral data in the training of a style-specific speech synthesizer by making a simple modification to the sequence-to-sequence model. We train the model not only with phoneme sequences and the corresponding sequences of mel-spectrograms but with a “style encoding” that identifies the speaking style employed in the training example.

The sequence-to-sequence model topology used to generate mel-spectrograms.The added speaking-style encoding is highlighted in the red block.
The sequence-to-sequence model topology used to generate mel-spectrograms.The added speaking-style encoding is highlighted in the red block.

With this approach, we can train a high-quality multiple-style sequence-to-sequence model by combining the larger amount of neutral-style speech data with just a few hours of supplementary data in the desired style. This is possible even when using an extremely simple style encoding, such as a “one-hot” vector — a string of zeroes with a single one at the location corresponding to the selected style.

The key technical advantage of this multiple-style sequence-to-sequence model is its ability to separately model aspects of speech that are independent of speaking style and aspects of speech that are particular to a single speaking style. When presented with a speaking-style code during operation, the network predicts the prosodic pattern suitable for that style and applies it to a separately generated, style-agnostic representation. The high quality achieved with relatively little additional training data allows for rapid expansion of speaking styles.

The output of our model passes to a neural vocoder, a neural network trained to convert mel-spectrograms into speech waveforms. In order to be of general practical use, the vocoder must be capable of simulating speech by any speaker in any language in any speaking style. Typically, neural vocoder systems use some form of speaker encoding (either a one-hot encoding or some other vector representation, or “embedding”). Our neural vocoder takes mel-spectrograms from any speaker, regardless of whether he or she was seen during training time, and generates high-quality speech waveforms without the use of a speaker encoding.

Listener perception of style

To determine whether the NTTS newscaster style makes a difference to listeners’ experiences, we conducted a large-scale perceptual test. We asked listeners to rate speech samples on a scale from 0 to 100, according to how suitable their speaking styles were for news reading. We included an audio recording of a real newscaster as a reference.

NTTS bar chart

The results are presented in the figure below. Listeners rated neutral NTTS more highly than concatenative synthesis, reducing the score discrepancy between human and synthetic speech by 46%. They preferred NTTS newscaster style to both other systems, however, shrinking the discrepancy by a further 35%. The preference for the neutral-style NTTS reflects the widely reported increase in general speech synthesis quality due to neural generative methods. The further improvement for the NTTS newscaster voice reflects our system’s ability to capture a style relevant to the text.

Acknowledgments: Adam Nadolski, Roberto Barra-Chicote, Jaime Lorenzo Trueba, Srikanth Ronanki, Viacheslav Klimkov, Nishant Prateek, Vatsal Aggarwal, Alexis Moinet

Related content

US, WA, Bellevue
Conversational AI ModEling and Learning (CAMEL) team is part of Amazon Devices organization where our mission is to build a best-in-class Conversational AI that is intuitive, intelligent, and responsive, by developing superior Large Language Models (LLM) solutions and services which increase the capabilities built into the model and which enable utilizing thousands of APIs and external knowledge sources to provide the best experience for each request across millions of customers and endpoints. We are looking for a passionate, talented, and resourceful Senior Applied Scientist in the field of LLM, Artificial Intelligence (AI), Natural Language Processing (NLP), Recommender Systems and/or Information Retrieval, to invent and build scalable solutions for a state-of-the-art context-aware conversational AI. A successful candidate will have strong machine learning background and a desire to push the envelope in one or more of the above areas. The ideal candidate would also have hands-on experiences in building Generative AI solutions with LLMs, enjoy operating in dynamic environments, be self-motivated to take on challenging problems to deliver big customer impact, moving fast to ship solutions and then iterating on user feedback and interactions. Key job responsibilities As a Senior Applied Scientist, you will leverage your technical expertise and experience to demonstrate leadership in tackling large complex problems, setting the direction and collaborating with other talented applied scientists and engineers to research and develop LLM modeling and engineering techniques to reduce friction and enable natural and contextual conversations. You will analyze, understand and improve user experiences by leveraging Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in artificial intelligence. You will work on core LLM technologies, including Prompt Engineering, Model Fine-Tuning, Reinforcement Learning from Human Feedback (RLHF), Evaluation, etc. Your work will directly impact our customers in the form of novel products and services .
US, CA, Pasadena
The Amazon Web Services (AWS) Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians, on a mission to develop a fault-tolerant quantum computer. We are looking to hire a Research Scientist with fabrication and data analysis experience working on all elements of a superconducting circuit. The position is on-site at our lab, located on the in Pasadena, CA. The ideal candidate will have had prior experience building software tools for data analysis and visualization to enable deep diving into fabrication details, electrical test data. We are looking for candidates with strong engineering principles, resourcefulness and data science experience. Organization and communication skills are essential. Key job responsibilities * Develop and automate data pipeline pertinent to superconducting device fabrication. * Develop analytical tools to uncover new information about established and new processes. * Develop new or contribute to modifying existing data visualization tools. * Utilize machine learning to enable better deeper dives into fabrication and related data. * Interface with various software, design, fabrication and electrical test teams to enable new functionalities. A day in the life The role will be vital to the fabrication team and quantum computing device integration mechanism. The candidate will develop software based analytical tools to enable data driven decisions across projects related to fabrication and supporting infrastructure. Each fabrication run delivers additional data. The candidate will stay close to the details of fabrication providing data analysis and quick feedback to key stakeholders. At the end of fabrication runs custom and standardized reports will be generated by the candidate to provide insights into data generated from the run. This position may require occasional weekend work. About the team AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.
US, WA, Seattle
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to help Amazon provide the best customer experience by preventing eCommerce fraud? Are you excited by the prospect of analyzing and modeling terabytes of data and creating state-of-the-art algorithms to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you enjoy collaborating in a diverse team environment? If yes, then you may be a great fit to join the Amazon Buyer Risk Prevention (BRP) Machine Learning group. We are looking for a talented scientist who is passionate to build advanced algorithmic systems that help manage safety of millions of transactions every day. Key job responsibilities Use machine learning and statistical techniques to create scalable risk management systems Learning and understanding large amounts of Amazon’s historical business data for specific instances of risk or broader risk trends Design, development and evaluation of highly innovative models for risk management Working closely with software engineering teams to drive real-time model implementations and new feature creations Working closely with operations staff to optimize risk management operations, Establishing scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation Tracking general business activity and providing clear, compelling management reporting on a regular basis Research and implement novel machine learning and statistical approaches
CA, ON, Toronto
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses, responsible for defining and delivering a collection of advertising products that drive discovery and sales. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! As an Applied Scientist on this team, you will: - Drive end-to-end Machine Learning projects that have a high degree of ambiguity, scale, complexity. - Perform hands-on analysis and modeling of enormous data sets to develop insights that increase traffic monetization and merchandise sales, without compromising the shopper experience. - Build machine learning models, perform proof-of-concept, experiment, optimize, and deploy your models into production; work closely with software engineers to assist in productionizing your ML models. - Run A/B experiments, gather data, and perform statistical analysis. - Establish scalable, efficient, automated processes for large-scale data analysis, machine-learning model development, model validation and serving. - Research new and innovative machine learning approaches. - Recruit Applied Scientists to the team and provide mentorship. Why you will love this opportunity: Amazon is investing heavily in building a world-class advertising business. This team defines and delivers a collection of advertising products that drive discovery and sales. Our solutions generate billions in revenue and drive long-term growth for Amazon’s Retail and Marketplace businesses. We deliver billions of ad impressions, millions of clicks daily, and break fresh ground to create world-class products. We are a highly motivated, collaborative, and fun-loving team with an entrepreneurial spirit - with a broad mandate to experiment and innovate. Impact and Career Growth: You will invent new experiences and influence customer-facing shopping experiences to help suppliers grow their retail business and the auction dynamics that leverage native advertising; this is your opportunity to work within the fastest-growing businesses across all of Amazon! Define a long-term science vision for our advertising business, driven from our customers' needs, translating that direction into specific plans for research and applied scientists, as well as engineering and product teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. Team video https://youtu.be/zD_6Lzw8raE
US, WA, Seattle
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses, responsible for defining and delivering a collection of advertising products that drive discovery and sales. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! As the Data Science Manager on this team, you will: - Lead of team of scientists, business intelligence engineers, etc., on solving science problems with a high degree of complexity and ambiguity. - Develop science roadmaps, run annual planning, and foster cross-team collaboration to execute complex projects. - Perform hands-on data analysis, build machine-learning models, run regular A/B tests, and communicate the impact to senior management. - Hire and develop top talent, provide technical and career development guidance to scientists and engineers in the organization. - Analyze historical data to identify trends and support optimal decision making. - Apply statistical and machine learning knowledge to specific business problems and data. - Formalize assumptions about how our systems should work, create statistical definitions of outliers, and develop methods to systematically identify outliers. Work out why such examples are outliers and define if any actions needed. - Given anecdotes about anomalies or generate automatic scripts to define anomalies, deep dive to explain why they happen, and identify fixes. - Build decision-making models and propose effective solutions for the business problems you define. - Conduct written and verbal presentations to share insights to audiences of varying levels of technical sophistication. Why you will love this opportunity: Amazon has invested heavily in building a world-class advertising business. This team defines and delivers a collection of advertising products that drive discovery and sales. Our solutions generate billions in revenue and drive long-term growth for Amazon’s Retail and Marketplace businesses. We deliver billions of ad impressions, millions of clicks daily, and break fresh ground to create world-class products. We are a highly motivated, collaborative, and fun-loving team with an entrepreneurial spirit - with a broad mandate to experiment and innovate. Impact and Career Growth: You will invent new experiences and influence customer-facing shopping experiences to help suppliers grow their retail business and the auction dynamics that leverage native advertising; this is your opportunity to work within the fastest-growing businesses across all of Amazon! Define a long-term science vision for our advertising business, driven from our customers' needs, translating that direction into specific plans for research and applied scientists, as well as engineering and product teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. Team video ~ https://youtu.be/zD_6Lzw8raE
US, WA, Seattle
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses, responsible for defining and delivering a collection of advertising products that drive discovery and sales. Our products and solutions are strategically important to enable our Retail and Marketplace businesses to drive long-term growth. We deliver billions of ad impressions and millions of clicks and break fresh ground in product and technical innovations every day! As an Applied Science Manager in Machine Learning, you will: - Directly manage and lead a cross-functional team of Applied Scientists, Data Scientists, Economists, and Business Intelligence Engineers. - Develop and manage a research agenda that balances short term deliverables with measurable business impact as well as long term investments. - Lead marketplace design and development based on economic theory and data analysis. - Provide technical and scientific guidance to team members. - Rapidly design, prototype and test many possible hypotheses in a high-ambiguity environment, making use of both quantitative and business judgment - Advance the team's engineering craftsmanship and drive continued scientific innovation as a thought leader and practitioner. - Develop science and engineering roadmaps, run annual planning, and foster cross-team collaboration to execute complex projects. - Perform hands-on data analysis, build machine-learning models, run regular A/B tests, and communicate the impact to senior management. - Collaborate with business and software teams across Amazon Ads. - Stay up to date with recent scientific publications relevant to the team. - Hire and develop top talent, provide technical and career development guidance to scientists and engineers within and across the organization. Why you will love this opportunity: Amazon is investing heavily in building a world-class advertising business. This team defines and delivers a collection of advertising products that drive discovery and sales. Our solutions generate billions in revenue and drive long-term growth for Amazon’s Retail and Marketplace businesses. We deliver billions of ad impressions, millions of clicks daily, and break fresh ground to create world-class products. We are a highly motivated, collaborative, and fun-loving team with an entrepreneurial spirit - with a broad mandate to experiment and innovate. Impact and Career Growth: You will invent new experiences and influence customer-facing shopping experiences to help suppliers grow their retail business and the auction dynamics that leverage native advertising; this is your opportunity to work within the fastest-growing businesses across all of Amazon! Define a long-term science vision for our advertising business, driven from our customers' needs, translating that direction into specific plans for research and applied scientists, as well as engineering and product teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. Team video ~ https://youtu.be/zD_6Lzw8raE
US, WA, Seattle
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to help Amazon provide the best customer experience by preventing eCommerce fraud? Are you excited by the prospect of analyzing and modeling terabytes of data and creating state-of-the-art algorithms to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you enjoy collaborating in a diverse team environment? If yes, then you may be a great fit to join the Amazon Buyer Risk Prevention (BRP) Machine Learning group. We are looking for a talented scientist who is passionate to build advanced algorithmic systems that help manage safety of millions of transactions every day. Key job responsibilities Use machine learning and statistical techniques to create scalable risk management systems Learning and understanding large amounts of Amazon’s historical business data for specific instances of risk or broader risk trends Design, development and evaluation of highly innovative models for risk management Working closely with software engineering teams to drive real-time model implementations and new feature creations Working closely with operations staff to optimize risk management operations, Establishing scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation Tracking general business activity and providing clear, compelling management reporting on a regular basis Research and implement novel machine learning and statistical approaches
US, CA, San Diego
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to help Amazon provide the best customer experience by preventing eCommerce fraud? Are you excited by the prospect of analyzing and modeling terabytes of data and creating state-of-the-art algorithms to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you enjoy collaborating in a diverse team environment? If yes, then you may be a great fit to join the Amazon Buyer Risk Prevention (BRP) Machine Learning group. We are looking for a talented scientist who is passionate to build advanced algorithmic systems that help manage safety of millions of transactions every day. Key job responsibilities Use machine learning and statistical techniques to create scalable risk management systems Learning and understanding large amounts of Amazon’s historical business data for specific instances of risk or broader risk trends Design, development and evaluation of highly innovative models for risk management Working closely with software engineering teams to drive real-time model implementations and new feature creations Working closely with operations staff to optimize risk management operations, Establishing scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation Tracking general business activity and providing clear, compelling management reporting on a regular basis Research and implement novel machine learning and statistical approaches
US, WA, Seattle
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to help Amazon provide the best customer experience by preventing eCommerce fraud? Are you excited by the prospect of analyzing and modeling terabytes of data and creating state-of-the-art algorithms to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you enjoy collaborating in a diverse team environment? If yes, then you may be a great fit to join the Amazon Buyer Risk Prevention (BRP) Machine Learning group. We are looking for a talented scientist who is passionate to build advanced algorithmic systems that help manage safety of millions of transactions every day. Key job responsibilities Use machine learning and statistical techniques to create scalable risk management systems Learning and understanding large amounts of Amazon’s historical business data for specific instances of risk or broader risk trends Design, development and evaluation of highly innovative models for risk management Working closely with software engineering teams to drive real-time model implementations and new feature creations Working closely with operations staff to optimize risk management operations, Establishing scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation Tracking general business activity and providing clear, compelling management reporting on a regular basis Research and implement novel machine learning and statistical approaches
US, CA, San Diego
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to help Amazon provide the best customer experience by preventing eCommerce fraud? Are you excited by the prospect of analyzing and modeling terabytes of data and creating state-of-the-art algorithms to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you enjoy collaborating in a diverse team environment? If yes, then you may be a great fit to join the Amazon Buyer Risk Prevention (BRP) Machine Learning group. We are looking for a talented scientist who is passionate to build advanced algorithmic systems that help manage safety of millions of transactions every day. Key job responsibilities Use machine learning and statistical techniques to create scalable risk management systems Learning and understanding large amounts of Amazon’s historical business data for specific instances of risk or broader risk trends Design, development and evaluation of highly innovative models for risk management Working closely with software engineering teams to drive real-time model implementations and new feature creations Working closely with operations staff to optimize risk management operations, Establishing scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation Tracking general business activity and providing clear, compelling management reporting on a regular basis Research and implement novel machine learning and statistical approaches