Top row, center: H. E. Guðni Th. Jóhannesson, president of Iceland. Top row, from left of president: Robin Dautricourt, principal product manager, Amazon Polly; Jack FitzGerald, senior applied scientist, Amazon Alexa AI; Nikko Ström, distinguished scientist and VP, Amazon Alexa AI; Michele Butti, director, Amazon Alexa International. Top row, from right of president: Halldór Benjamín Þorbergsson, CEO of the Confederation of Icelandic Enterprise; Björgvin Ingi Ólafsson, member of the Board of Almannarómur, the Icelandic Center for Language Technology; Jón Guðnason, associate professor of signal processing and language technology, Reykjavík University. Middle row, from left of president: Nikulás Hannigan, Iceland’s trade commissioner to North America and consul general in New York; Stefanía G. Halldórsdóttir, chairman of the Board of Almannarómur. Middle row, from right of president: Susan Pointer, VP, public policy, Amazon; Jóhanna Vigdís Guðmundsdóttir, CEO of Almannarómur; Vilhjálmur Þorsteinsson, founder and CEO of Miðeind; Kristrún Heiða Hauksdóttir, specialist at the Ministry for Cultural and Business Affairs. Bottom row: Lilja Dögg Alfreðsdóttir, minister of cultural and business affairs; Nikhil Sharma, senior manager of product management, Amazon Text-to-Speech.

Conversational AI

Amazon scientists welcome Iceland’s presidential delegation

President’s visit part of a mission to preserve the Icelandic language in the digital age.

By Jack G. M. FitzGerald and Nikko Ström

June 29, 2022


Recently, Amazon had the pleasure of hosting Iceland’s president, H. E. Guðni Th. Jóhannesson, at our Seattle headquarters, along with a delegation of Icelandic government officials, business leaders, and academics. It was truly an honor to meet with them.

Resources

Here are some resources provided to us by the Icelandic delegation that you may find useful:

An overview of the program and past work.
Parallel text-speech database for TTS (Talrómur): The first part of the database (Talrómur 1) consists of 220 hours of studio-quality recordings from four female and four male voices. Each voice donor recorded between 10 and 30 hours of data, which should be enough to build a voice that sounds like that donor. The data is available under a Creative Commons Attribution 4.0 (CC BY 4.0) license.
Talrómur 2: 80 hours of studio-quality recordings from 20 female and 20 male voices, with approximately two hours of data per voice donor. Two hours may not be enough to build a voice from scratch for a single donor, but it should be possible to combine the voices in this dataset (and, indeed, in Talrómur 1) into a new voice that is a unique mix of them. The data is available under a Creative Commons Attribution 4.0 (CC BY 4.0) license.
Icelandic pronunciation dictionary: A manually verified pronunciation lexicon containing almost 50,000 unique word forms transcribed in four pronunciation variants, often including both a clear and a less formal transcription (reading pronunciation vs. casual-speech pronunciation). The repository also contains the transcription rules and guidelines followed in the project. The dictionary is available under a Creative Commons Attribution 4.0 (CC BY 4.0) license.
Text normalization corpus: A corpus of 40,000 sentences, manually normalized for TTS. Text normalization rewrites written forms as the words a speaker would actually say, for example, converting “$30” to “thirty dollars”.
Text preprocessing for TTS: A text-preprocessing pipeline connecting standalone modules for text cleaning, text normalization, phrasing, and grapheme-to-phoneme (g2p) conversion. The front-end pipeline and all submodules are available under an Apache 2.0 license.
Recipes for Icelandic TTS: Open-source TTS recipes for Icelandic have been released as part of the Language Technology Programme for Icelandic (LTPI). A traditional unit-selection recipe implemented in Festival is available here under an Apache 2.0 license.
Neural TTS recipe: Based on FastSpeech and available under an Apache 2.0 license.
Talrómur 1 baseline models, train/test splits, and alignments
Parallel text-speech database for ASR (Samrómur): The Samrómur crowdsourcing platform is derived from the Mozilla Common Voice project. It consists of prompts read aloud by volunteers and totals more than 2,300 hours of data. The crowdsourcing statistics can be seen here. A concurrent verification effort has led to publications (under CC BY 4.0 licenses), some of which can be found here. A similar dataset of 152 hours of adult voices was collected around 2011 and is available here.
Parliamentary speech data: 542 hours of clean and verified speeches from the Icelandic parliament.

Other speech databases

Resources for ASR language modeling

The Icelandic Gigaword Corpus

Other tools and recipes for ASR
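The text normalization corpus listed above targets the kind of rewrite illustrated by “$30” becoming “thirty dollars”. As a rough, hedged illustration of what a single rule-based normalization step does, here is a minimal Python sketch. It is English-only, handles only whole-dollar amounts from $0 to $999, and the function names `number_to_words` and `normalize_dollars` are hypothetical; it is not code from the Icelandic pipeline, whose modules must additionally cope with Icelandic number words that inflect for case and gender.

```python
import re

# Hypothetical, English-only illustration of one text-normalization rule:
# expanding a dollar amount such as "$30" into words a TTS engine can read.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize_dollars(text: str) -> str:
    """Replace each "$<1-3 digits>" token with its spoken form."""
    def expand(match: re.Match) -> str:
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    # Amounts with four or more digits are left untouched by this sketch.
    return re.sub(r"\$(\d{1,3})\b", expand, text)


print(normalize_dollars("The ticket costs $30."))
# → The ticket costs thirty dollars.
```

A production normalizer would typically also classify tokens (dates, times, ordinals, abbreviations) before expanding them; this sketch covers only one pattern to show the idea.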

About the Authors

Jack G. M. FitzGerald

Jack G. M. FitzGerald is a senior applied scientist in the Artificial General Intelligence (AGI) organization at Amazon and has a background in machine learning, physics, and tech leadership. He currently focuses on large-language-model pretraining and adaptation, multilingual modeling, optimized distributed training, and efficient modeling operations.

Nikko Ström

Nikko Ström is a vice president and distinguished scientist in the Alexa AI organization.