- Interspeech 2020: Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of these new techniques operate directly in the short-time Fourier transform (STFT) domain, resulting in high computational complexity. In this work, we propose PercepNet, an efficient approach that relies on human perception … (a sketch of the band-gain idea behind this family of methods follows this list)
- Interspeech 2020: We introduce DashHashLM, an efficient data structure that stores an n-gram language model compactly while making minimal trade-offs on runtime lookup latency. The data structure implements a finite state transducer with lossless structural compression and outperforms comparable implementations on lookup speed in the small-footprint setting. DashHashLM introduces several optimizations to … (a sketch of the baseline lookup operation such a structure serves follows this list)
- Interspeech 2020: Spoken language understanding (SLU) refers to the process of inferring semantic information from audio signals. While neural transformers consistently deliver the best performance among state-of-the-art neural architectures in the field of natural language processing (NLP), their merits in the closely related field of SLU have not been investigated. In this paper … (a sketch of a generic transformer SLU classifier follows this list)
- Interspeech 2020: Large end-to-end neural open-domain chatbots are becoming increasingly popular. However, research on building such chatbots has typically assumed that the user input is written text, and it is not clear whether these chatbots would integrate seamlessly with automatic speech recognition (ASR) models to serve the speech modality. We aim to bring attention to this important question by empirically studying … (a sketch of one way to simulate ASR noise in written input follows this list)
- Interspeech 2020: Wake word (WW) spotting is challenging in far-field settings due to the complexity and variability of acoustic conditions and environmental interference in signal transmission. A suite of carefully designed and optimized audio front-end (AFE) algorithms helps mitigate these challenges and provides better-quality audio signals to downstream modules such as the WW spotter. Since the WW model is trained with the … (a sketch of a placeholder AFE chain follows this list)
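The PercepNet teaser hinges on a complexity argument: estimating something for every STFT bin is expensive, while a perceptually motivated representation needs far fewer values per frame. The sketch below illustrates only that generic band-gain idea; the linear band layout, the oracle gains computed from a clean reference, and all function names are assumptions for illustration, not the PercepNet algorithm.

```python
import numpy as np

def triangular_bands(n_bins, n_bands):
    """Overlapping triangular band filters, shape (n_bands, n_bins).
    Linearly spaced to keep the sketch short; a perceptual model
    would use ERB/Bark-style spacing instead."""
    centers = np.linspace(0, n_bins - 1, n_bands)
    width = centers[1] - centers[0]
    bins = np.arange(n_bins)
    return np.clip(1.0 - np.abs(bins - centers[:, None]) / width, 0.0, None)

def enhance_frame(noisy_spec, clean_spec, bands):
    """Apply per-band gains to one complex STFT frame. The 'ideal'
    gains come from a clean reference here; a real enhancer would
    predict them with a neural network instead."""
    eps = 1e-9
    noisy_e = bands @ np.abs(noisy_spec) ** 2      # per-band noisy energy
    clean_e = bands @ np.abs(clean_spec) ** 2      # per-band clean energy
    gains = np.clip(np.sqrt(clean_e / (noisy_e + eps)), 0.0, 1.0)
    # Spread the handful of band gains back out to per-bin gains.
    bin_gains = (bands.T @ gains) / (bands.sum(axis=0) + eps)
    return noisy_spec * bin_gains
```

The payoff is dimensionality: with a few dozen bands, a network predicts tens of values per frame instead of hundreds of bin-wise ones, which is where approaches in this family save computation.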
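The DashHashLM teaser describes a compact structure whose job is fast n-gram lookup; its actual FST-based layout and compression are not reproduced here. Instead, this sketch shows the baseline operation any such structure must serve, longest-match lookup with backoff to shorter histories, using a plain hash table and stupid-backoff scoring. The class and parameter names are hypothetical.

```python
import math

class HashNGramLM:
    """Toy hash-table n-gram model with stupid-backoff scoring.
    Illustrative only: compact LMs like the one described above
    replace the Python dict with carefully packed storage."""

    def __init__(self, ngram_logprobs, backoff_weight=math.log(0.4)):
        # ngram_logprobs maps token tuples, e.g. ("the", "cat"), to log P.
        self.table = dict(ngram_logprobs)
        self.backoff = backoff_weight

    def score(self, tokens):
        """Log-score of tokens[-1] given its history, with backoff."""
        penalty = 0.0
        while tokens:
            lp = self.table.get(tuple(tokens))
            if lp is not None:
                return lp + penalty
            tokens = tokens[1:]          # drop the oldest history word
            penalty += self.backoff      # pay a penalty per backoff level
        return float("-inf")             # token never seen at all

lm = HashNGramLM({("the", "cat"): math.log(0.2), ("cat",): math.log(0.01)})
print(lm.score(["the", "cat"]))   # exact bigram hit
print(lm.score(["a", "cat"]))     # backs off to the unigram
```

A small-footprint LM keeps exactly this query pattern but packs the table into hash-addressed, structurally compressed storage so the whole model fits in little memory without slowing the lookup.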
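For the SLU teaser, a common way to apply a transformer to spoken language understanding is to encode a sequence of acoustic features (e.g., log-mel frames) and pool the output for utterance-level intent classification. The PyTorch sketch below shows that generic pattern under assumed layer sizes; it is not the architecture evaluated in the paper, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class TransformerIntentClassifier(nn.Module):
    """Generic transformer encoder over acoustic frames for intent
    classification; an illustrative SLU baseline, not the paper's model."""

    def __init__(self, n_mels=80, d_model=256, n_heads=4, n_layers=4, n_intents=10):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)      # frame features -> model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_intents)   # pooled utterance -> intent

    def forward(self, mel):                         # mel: (batch, frames, n_mels)
        h = self.encoder(self.proj(mel))            # (batch, frames, d_model)
        return self.head(h.mean(dim=1))             # mean-pool over time

model = TransformerIntentClassifier()
logits = model(torch.randn(2, 300, 80))            # 2 utterances, 300 frames each
print(logits.shape)                                 # torch.Size([2, 10])
```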
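The chatbot teaser asks how well text-trained chatbots survive spoken input. One simple way to probe that empirically, sketched below, is to corrupt written test utterances with ASR-style noise and compare responses before and after. The corruption model here is a deliberately crude assumption for illustration, not the paper's methodology; a real study would use an actual ASR system or measured confusion statistics.

```python
import random

def simulate_asr_noise(text, sub_rate=0.1, del_rate=0.05, rng=None):
    """Corrupt a written utterance with crude ASR-like errors:
    randomly delete words or replace them with a clipped form."""
    rng = rng or random.Random(0)
    out = []
    for word in text.split():
        r = rng.random()
        if r < del_rate:
            continue                        # simulated deletion
        if r < del_rate + sub_rate:
            out.append(word[:-1] or word)   # simulated substitution
        else:
            out.append(word)                # word survives intact
    return " ".join(out)

print(simulate_asr_noise("play the latest episode of my favorite podcast"))
```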
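For the wake-word teaser, the AFE sits between the raw microphone samples and the WW spotter. The sketch below wires up a placeholder chain, DC removal followed by a crude spectral-subtraction-style noise attenuator; the specific stages, parameters, and names are assumptions for illustration, not Amazon's AFE.

```python
import numpy as np

def remove_dc(x, alpha=0.995):
    """One-pole DC-blocking filter: y[n] = x[n] - x[n-1] + alpha * y[n-1]."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    prev_x = prev_y = 0.0
    for n, xn in enumerate(x):
        y[n] = xn - prev_x + alpha * prev_y
        prev_x, prev_y = xn, y[n]
    return y

def suppress_noise(x, frame=256, floor=0.1):
    """Crude spectral-subtraction-style attenuation that treats the
    quietest frame as the noise estimate; stands in for a real AFE
    noise reducer."""
    frames = x[: len(x) // frame * frame].reshape(-1, frame)
    spectra = np.fft.rfft(frames, axis=1)
    mags = np.abs(spectra)
    noise = mags.min(axis=0)                         # rough noise profile
    gain = np.maximum(1.0 - noise / (mags + 1e-9), floor)
    return np.fft.irfft(spectra * gain, n=frame, axis=1).ravel()

def afe_pipeline(samples):
    """Hypothetical audio front end: cleaned audio for the WW spotter."""
    return suppress_noise(remove_dc(samples))
```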
Related content
- April 7, 2021: Technique that lets devices convey information in natural language improves on state of the art.
- March 31, 2021: Throughout the pandemic, the Alexa team has continued to invent on behalf of our customers.
- March 26, 2021: In the future, says Amazon Scholar Emine Yilmaz, users will interact with computers to identify just the information they need, rather than scrolling through long lists of results.
- March 24, 2021: Human-evaluation studies validate metrics, and experiments show evidence of bias in popular language models.
- March 19, 2021: A model that uses both local and global context improves on the state of the art by 6% and 11% on two benchmark datasets.
- March 16, 2021: Amanda Cullen, a PhD candidate in informatics at the University of California, Irvine, wanted to do work that had an impact outside of academia; she found an ideal opportunity at Twitch.