Acoustic Model Bootstrapping Using Semi-Supervised Learning
2019
This work aims to bootstrap acoustic model training for automatic speech recognition using a small amount of human-annotated speech data and a large amount of unlabeled speech data. Semi-supervised learning techniques were investigated for selecting the automatically transcribed training samples. Two semi-supervised learning methods were proposed: one is a local-global uncertainty-based method, which introduces both the local uncertainty from the current utterance and the global uncertainty from the whole data pool into the data selection; the other is margin-based data selection, which selects the utterances near the decision boundary through language model tuning. Experimental results on a Japanese far-field automatic speech recognition system indicated that the acoustic model trained on the automatically transcribed speech data achieved about 17% relative gain when no in-domain human-annotated data was available for initialization, and a 3.7% relative gain when the initial acoustic model was trained on a small amount of in-domain data.
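The abstract does not give formulas for the local-global uncertainty criterion, so the following is only a minimal sketch of how such a selection could look, under stated assumptions. It assumes that local uncertainty is one minus the mean word confidence of an utterance, and that global uncertainty is the z-score of that value against the whole data pool. The function names, the `max_local` and `max_z` thresholds, and the rule for combining the two criteria are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def local_uncertainty(word_confidences):
    """Assumed definition: 1 minus the mean word confidence of one utterance."""
    return 1.0 - mean(word_confidences)


def select_utterances(pool, max_local=0.2, max_z=0.0):
    """Select automatically transcribed utterances for acoustic model training.

    pool maps an utterance id to the list of word confidences of its
    automatic transcript. An utterance is kept only if it is confident
    on its own (local criterion) and no more uncertain than the pool
    average (global criterion). Both thresholds are hypothetical.
    """
    local = {utt: local_uncertainty(conf) for utt, conf in pool.items()}
    values = list(local.values())
    pool_mean = mean(values)
    pool_std = pstdev(values) or 1.0  # avoid division by zero on a uniform pool

    selected = []
    for utt, score in local.items():
        z = (score - pool_mean) / pool_std  # position relative to the whole pool
        if score <= max_local and z <= max_z:
            selected.append(utt)
    return selected


if __name__ == "__main__":
    pool = {
        "utt_a": [0.90, 0.95],
        "utt_b": [0.50, 0.60],
        "utt_c": [0.85, 0.90],
    }
    print(select_utterances(pool))
```

In this sketch the pool-level statistics are recomputed on every call, so the global term shifts as the pool changes; whether the original method behaves this way is not stated in the abstract.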