The theme of this work is improving Computer-Assisted Pronunciation Training (CAPT) for language learning. The central hypothesis is that the accuracy of machine learning (ML) models for detecting pronunciation errors is limited by the small amount of training data available, and that this limitation can be overcome with synthetic speech generation and end-to-end modelling: if a generative model that can mimic non-native speech is available, it can be used to produce training data for pronunciation error detection. The thesis presents three innovative techniques based on phoneme-to-phoneme (P2P), text-to-speech (T2S), and speech-to-speech (S2S) conversion to generate both correctly pronounced and mispronounced synthetic speech. It shows that these techniques not only improve the accuracy of machine learning models for detecting pronunciation errors but also establish a new state of the art in the field.
This PhD was conducted within the framework of the “Implementation Doctorate” Program of the Ministry of Education and Science at Gdańsk University of Technology, under the supervision of Professor Bożena Kostek (Gdańsk University of Technology) and Roberto Barra-Chicote, PhD, TTS Research, Amazon.
Automated detection of pronunciation errors in non-native English speech employing deep learning
2022
Research areas