-
Interspeech 20212021The success of modern deep learning systems is built on two cornerstones, massive amount of annotated training data and advanced computational infrastructure to support large-scale computation. In recent years, the model size of state-of-the-art deep learning systems has rapidly increased and sometimes reached to billions of parameters. Herein we take a close look into this phenomenon and present an empirical
-
Interspeech 20212021End-to-end automatic speech recognition systems map a sequence of acoustic features to text. In modern systems, text is encoded to grapheme subwords which are generated by methods designed for text processing tasks and therefore don’t model or take advantage of the statistics of the acoustic features. Here, we present a novel method for generating grapheme subwords that are derived from phoneme sequences
-
Interspeech 20212021Fine-tuning transformer-based models have shown to outperform other methods for many Natural Language Understanding (NLU) tasks. Recent studies to reduce the size of transformer models have achieved reductions of > 80%, making on-device inference on powerful devices possible. However, other resource-constrained devices, like those enabling voice assistants (VAs), require much further reductions. In this
-
Interspeech 20212021This paper proposes a general enhancement to the Normalizing Flows (NF) used in neural vocoding. As a case study, we improve expressive speech vocoding with a revamped Parallel Wavenet (PW). Specifically, we propose to extend the affine transformation of PW to the more expressive invertible nonaffine function. The greater expressiveness of the improved PW leads to better-perceived signal quality and naturalness
-
Interspeech 20212021Multi-channel inputs offer several advantages over singlechannel, to improve the robustness of on-device speech recognition systems. Recent work on multi-channel transformer, has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy. However, this approach is characterized by a high computational complexity, which prevents it from being deployed in on-device systems. In this
Related content
-
April 7, 2022The JHU + Amazon Initiative for Interactive AI (AI2AI) will be housed in the Whiting School of Engineering.
-
April 4, 2022Thanks to a set of simple abstractions, models with different architectures can be integrated and optimized for particular hardware accelerators.
-
March 23, 2022Amazon researchers optimize the distributed-training tool to run efficiently on the Elastic Fabric Adapter network interface.
-
March 16, 2022A machine learning model learns representations that cluster devices according to their usage patterns.
-
March 3, 2022As an applied science manager at Amazon, Muthu Chandrasekaran works on new tools to automate and build a risk technology.
-
February 28, 2022Novel pretraining method enables increases of 5% to 14% on five different evaluation metrics.