Customer-obsessed science
Research areas
-
February 2, 2026, 10 min read. Every NFL game generates millions of tracking data points from 22 RFID-equipped players. Seventy-five machine learning models running on AWS process that data in under a second, transforming football into a sport where every movement is measured, modeled, and instantly analyzed.
-
January 13, 2026, 7 min read
-
January 8, 2026, 4 min read
-
December 29, 2025, 6 min read
Featured news
-
IJCNLP-AACL 2023 (2023). Evaluation of QA systems is very challenging and expensive, with the most reliable approach being human annotations of correctness of answers for questions. Recent works (AVA, BEM) have shown that transformer LM encoder based similarity metrics transfer well for QA evaluation, but they are limited by the usage of a single correct reference answer. We propose a new evaluation metric: SQuArE (Sentence-level
-
ICCV 2023 Workshop on Closing the Loop Between Vision and Language (2023). Traditional image-to-image and text-to-image search struggle with comprehending complex user intentions, particularly in fashion e-commerce, where users search for similar products with text modifications to a reference image. This paper introduces Progressive Vision-Language Alignment and Multimodal Fusion (ProVLA), a novel approach which utilizes a transformer-based vision and language model to generate
-
IEEE 2023 Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2023). A large scale music catalog contains diverse types of sound recordings. In this paper, we present a methodology to identify instrumental music with high precision and high recall. Our method starts by separating a recording into a vocals track and a background track. Then, we process the vocals track with a singing voice detection model, to estimate the amount of singing voice in a song. Finally, we analyze
-
SIGDIAL 2023 (2023). Dialogue act annotations are important to improve response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way because different datasets and tasks may have incompatible annotations. While alternative methods that utilize latent action spaces or reinforcement learning do not require explicit annotations
-
Interspeech 2023 (2023). To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also introduce auxiliary counters to help the decoder to keep track of the timing information while generating
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.