Customer-obsessed science
Research areas
-
December 5, 20256 min readA multiagent architecture separates data perception, tool knowledge, execution history, and code generation, enabling ML automation that works with messy, real-world inputs.
-
-
-
November 20, 20254 min read
-
October 20, 20254 min read
Featured news
-
Interspeech 20212021Fine-tuning transformer-based models have shown to outperform other methods for many Natural Language Understanding (NLU) tasks. Recent studies to reduce the size of transformer models have achieved reductions of > 80%, making on-device inference on powerful devices possible. However, other resource-constrained devices, like those enabling voice assistants (VAs), require much further reductions. In this
-
Interspeech 20212021This paper proposes a general enhancement to the Normalizing Flows (NF) used in neural vocoding. As a case study, we improve expressive speech vocoding with a revamped Parallel Wavenet (PW). Specifically, we propose to extend the affine transformation of PW to the more expressive invertible nonaffine function. The greater expressiveness of the improved PW leads to better-perceived signal quality and naturalness
-
Interspeech 20212021Multi-channel inputs offer several advantages over singlechannel, to improve the robustness of on-device speech recognition systems. Recent work on multi-channel transformer, has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy. However, this approach is characterized by a high computational complexity, which prevents it from being deployed in on-device systems. In this
-
Interspeech 20212021End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in commercial
-
Interspeech 20212021We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited
Collaborations
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all