Search - Amazon Science

Pablo Garcia Moreno

Applied Scientist
Thomas Drugman

Senior Manager, Applied Science
Effect of Data Reduction on Seq-to-seq Acoustic Models for Speech Synthesis

Javier Latorre, Jakub Lachowicz, Jaime Lorenzo Trueba, Tom Merritt, Thomas Drugman

ICASSP 2019

2019

Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings. To work properly, these models require large amounts of data. However, they are more efficient at dealing with less homogeneous data, which might make it possible to compensate for the lack of data from one speaker with data from other speakers. This paper evaluates

Conversational AI
Quynh Ngoc Thi Do
Judith Gaspers

Alexa AI Natural Understanding Group

Applied Scientist
Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding

Quynh Ngoc Thi Do, Judith Gaspers

ICASSP 2019, EMNLP 2019

2019

Typically, spoken language understanding (SLU) models are trained on annotated data, which are costly to gather. Aiming to reduce data needs for bootstrapping an SLU system for a new language, we present a simple but effective weight transfer approach using data from another language. The approach is evaluated with our promising multi-task SLU framework developed towards different languages. We evaluate our

Related: Improving cross-lingual transfer learning by filtering training data

Conversational AI
Self-attention networks for connectionist temporal classification in speech recognition

Julian Salazar, Katrin Kirchhoff, Zhiheng Huang

ICASSP 2019

2019

The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for

Conversational AI
Amit S. Chhetri
Carlo Murgia
Reconfigurable Multitask Audio Dynamics Processing Scheme

Jun Yang, Amit S. Chhetri, Carlo Murgia, Philip Hilmes

ICASSP 2019

2019

Automatic speech recognition (ASR), audio quality, and loudness are key performance indicators (KPIs) in smart speakers. To improve all these KPIs, audio dynamics processing is a crucial component in related systems. Unfortunately, single-band and existing multiband dynamics processing (MBDP) schemes not only fail to maximize bass and loudness but even produce unwanted peaks, distortions, and nonlinear echo so that

Related: Signal processor improves Echo’s bass response, loudness, and speech recognition accuracy

Conversational AI
Xing Fan

Alexa AI group, Amazon

Principal, Applied Scientist
I-Fan Chen

Applied Scientist
End-to-end Anchored Speech Recognition

Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Björn Hoffmeister

ICASSP 2019

2019

Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from the “anchored

Related: Using wake word acoustics to filter out background speech improves speech recognition by 15%

Machine learning
Ladislav Mosner
Brian King

Applied Scientist
Arun Krishnan

Contributing writer
Deep Embeddings for Rare Audio Event Detection With Imbalanced Data

Vipul Arora, Ming Sun, Chao Wang

ICASSP 2019

2019

In this paper, we present a method to handle data imbalance for classification with neural networks, and apply it to the acoustic event detection (AED) problem. The common approach to tackling data imbalance is to use class weights in the objective function while training. An existing, more sophisticated approach is to map the input to clusters in an embedding space, so that learning is locally balanced by incorporating

Related: To correct imbalances in training data, don’t oversample: Cluster

Machine learning
Towards better confidence estimation for neural models

Vishal Thanvantri Vasudevan, Abhinav Sethy, Alireza Roshan-Ghias

ICASSP 2019

2019

In this work, we focus on confidence modeling for neural-network-based text classification and sequence-to-sequence models in the context of Natural Language Understanding (NLU) tasks. For most applications, the confidence of a neural network model in its output is computed as a function of the posterior probability, determined via a softmax layer. In this work, we show that such scores can be poorly calibrated

Conversational AI
Vishal Thanvantri Vasudevan
Alireza Roshan-Ghias
