Customer-obsessed science
- December 5, 2025 · 6 min read: A multiagent architecture separates data perception, tool knowledge, execution history, and code generation, enabling ML automation that works with messy, real-world inputs.
- November 20, 2025 · 4 min read
- October 20, 2025 · 4 min read
Featured news
- 35th Picture Coding Symposium (2021): Video live streaming is gaining prevalence among video streaming services, especially for the delivery of popular sporting events. Many objective video quality assessment (VQA) models have been developed to predict the perceptual quality of videos. Appropriate databases that exemplify the distortions encountered in live-streamed video are important for designing and learning objective VQA models. Towards …
- NAACL 2021: Non-autoregressive encoder-decoder models greatly improve decoding speed over autoregressive models, at the expense of generation quality. To mitigate this, iterative decoding models repeatedly infill or refine the proposal of a non-autoregressive model. However, editing at the level of output sequences limits model flexibility. We instead propose iterative realignment, which by refining latent alignments …
- Interspeech 2021: The concept of multi-headed self-attention (MHSA), introduced as a critical building block of the Transformer encoder/decoder module, has had a significant impact in natural language processing (NLP), automatic speech recognition (ASR), and, more recently, sound event detection (SED). The current state-of-the-art approaches to SED employ a shared attention mechanism achieved through a stack …
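The abstract stops short of the mechanism itself. For readers unfamiliar with MHSA, a minimal NumPy sketch of multi-headed self-attention follows; the shapes, weight matrices, and head count here are illustrative, not the paper's actual model:

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Minimal multi-headed self-attention over a sequence x of shape (T, d_model)."""
    T, d_model = x.shape
    d_head = d_model // num_heads
    # Project inputs to queries, keys, and values, then split into heads: (H, T, d_head).
    q = (x @ w_q).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)          # (H, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)      # softmax over keys
    heads = weights @ v                                          # (H, T, d_head)
    # Concatenate heads back to (T, d_model) and apply the output projection.
    out = heads.transpose(1, 0, 2).reshape(T, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
T, d_model, num_heads = 5, 8, 2
x = rng.standard_normal((T, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
y = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(y.shape)  # (5, 8)
```

Splitting the model dimension across heads lets each head attend to a different subspace of the input at no extra cost over single-headed attention of the same width.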
- NAACL 2021: Recent advances in transfer learning have improved the performance of virtual assistants considerably. Nevertheless, creating sophisticated voice-enabled applications for new domains remains a challenge, and meager training data is often a key bottleneck. Accordingly, unsupervised and semi-supervised learning (SSL) techniques continue to be of vital importance. While a number of such methods have …
- TSD 2021: We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AMs), with experiments spanning over 3,000 hours of GPU time, making our study one of the largest of its kind. We discuss SSL for AMs in a small-footprint setting, showing that a smaller-capacity model trained with 1 million hours of unsupervised data can outperform a baseline supervised system by 14.3% word error …
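The abstract does not spell out the SSL recipe, but a common semi-supervised pattern for acoustic models is self-training: a seed model pseudo-labels unlabeled data, and only high-confidence labels are kept for retraining. A toy sketch of that loop, with a nearest-centroid classifier standing in for the acoustic model (the features, functions, and 0.7 confidence threshold are all illustrative, not from the paper):

```python
import numpy as np

def fit_centroids(X, y):
    # One centroid per class; our stand-in "acoustic model".
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(X, centroids):
    # Softmax over negative distances to each class centroid.
    d = -np.linalg.norm(X[:, None, :] - centroids[None], axis=-1)
    e = np.exp(d - d.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
# Toy features: a small labeled seed set and a large unlabeled pool,
# with class 0 shifted along the first dimension.
X_lab = rng.standard_normal((40, 4))
X_lab[:20, 0] += 2.0
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = rng.standard_normal((400, 4))
X_unlab[:200, 0] += 2.0

centroids = fit_centroids(X_lab, y_lab)          # 1. train seed model
probs = predict_proba(X_unlab, centroids)        # 2. pseudo-label the pool
keep = probs.max(axis=1) > 0.7                   # 3. confidence filter
X_aug = np.vstack([X_lab, X_unlab[keep]])
y_aug = np.concatenate([y_lab, probs[keep].argmax(axis=1)])
centroids = fit_centroids(X_aug, y_aug)          # 4. retrain on augmented set
print(len(X_aug) - len(X_lab), "pseudo-labeled examples added")
```

In practice these steps are iterated, and the confidence filter is what keeps label noise from the seed model from swamping the supervised signal.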
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.
View all