- ICASSP 2023: Self-supervised speech representation learning (S3RL) is revolutionizing the way we leverage the ever-growing availability of data. While S3RL-related studies typically use large models, we employ lightweight networks to comply with the tight memory budgets of compute-constrained devices. We demonstrate the effectiveness of S3RL on a keyword-spotting (KS) problem by using transformers with 330k parameters and propose …
- ICASSP 2023: We present dual-attention neural biasing, an architecture designed to boost wake-word (WW) recognition and improve inference-time latency on speech recognition tasks. This architecture enables a dynamic switch among its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting …
- ICASSP 2023: Tiny Signal-to-Interpretation (TinyS2I) was recently introduced as an ultra-low-footprint end-to-end spoken language understanding (SLU) model. This architecture is capable of running in ultra-resource-constrained environments, such as voice assistant devices, while at the same time reducing latency. In this work, we propose an extension to TinyS2I and train a multilingual system supporting several languages …
- AAAI 2023: A primary objective of news articles is to establish the factual record for an event, frequently achieved by conveying both the details of the specified event (i.e., the 5 Ws: who, what, where, when, and why) and how people reacted to it (i.e., reported statements). However, existing work on news summarization focuses almost exclusively on the event details. In this work, we propose the …
- ICASSP 2023: Audio events have a hierarchical structure in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification. Specifically, MAST employs one-dimensional (and two-dimensional) pooling operators along the time …
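The MAST abstract describes pooling along the time axis of a spectrogram to build progressively coarser, more abstract representations. The snippet below is a minimal sketch of that multiscale idea only, not MAST's actual implementation: the `pool_time` helper and the stride value are illustrative assumptions, and a toy array stands in for a real log-mel spectrogram.

```python
import numpy as np

def pool_time(spec: np.ndarray, stride: int = 2) -> np.ndarray:
    """Average-pool a (time, freq) spectrogram along the time axis."""
    t, f = spec.shape
    t_trim = t - (t % stride)  # drop trailing frames that don't fill a full window
    return spec[:t_trim].reshape(t_trim // stride, stride, f).mean(axis=1)

# Build a three-level time pyramid from a toy 8-frame, 4-bin "spectrogram".
spec = np.arange(32, dtype=float).reshape(8, 4)
pyramid = [spec]
for _ in range(2):
    pyramid.append(pool_time(pyramid[-1]))

print([p.shape for p in pyramid])  # [(8, 4), (4, 4), (2, 4)]
```

Each level halves the temporal resolution while keeping the frequency axis intact, which is the kind of hierarchy a multiscale Transformer can attend over at decreasing cost per level.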
Related content
- July 6, 2023: The program exposes students to computer science as they create their own Alexa skills.
- July 5, 2023: Amazon Research Award recipient Shrikanth Narayanan is on a mission to make human-AI conversational experiences inclusive.
- July 3, 2023: With little training data and no mapping of speech to phonemes, Amazon researchers used voice conversion to generate Irish-accented training data in Alexa's own voice.
- June 26, 2023: How phonetically blended results (PBR) help ensure customers find the content they were actually asking for.
- June 9, 2023: In a top-3% paper at ICASSP, Amazon researchers adapt graph-based label propagation to improve speech recognition on underrepresented pronunciations.
- June 7, 2023: Team earned $500,000 for its performance in a challenge focused on advancing next-generation virtual assistants that help humans complete real-world tasks by continuously learning.