Customer-obsessed science


Research areas
-
August 11, 2025: Trained on millions of hours of data from Amazon fulfillment centers and sortation centers, Amazon’s new DeepFleet models predict future traffic patterns for fleets of mobile robots.
Featured news
-
2024: Self-supervised learning methods have demonstrated impressive performance across visual understanding tasks, including human behavior understanding. However, there has been limited work for self-supervised learning for egocentric social videos. Visual processing in such contexts faces several challenges, including noisy input, limited availability of egocentric social data, and the absence of pretrained
-
International Workshop on Acoustic Signal Enhancement (IWAENC) 2024: We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation. The method utilizes SNR-adaptive features from time-delay and energy of the directional components after acoustic wave decomposition of the observed sound field to estimate the line-of-sight direction under noisy and reverberant conditions. The effectiveness of the approach
-
2024: In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model addresses both efficient frame sampling and effective cross-modal alignment in a unified way. In our ViLA network, we design a new learnable text-guided Frame-Prompter together with a cross-modal distillation (QFormer-Distiller) module. Pretrained large image-language models have shown promising results on problems
-
Sixth Symposium on Advances in Approximate Bayesian Inference, 2024: With the advances of computational power, there has been a rapid development in complex systems to predict certain outputs for industrial problems. Attributing outputs to input features, or output changes to input or system changes has been a critical and challenging problem in many real world applications. In industrial settings, a system could be a chain of large scale models or simulators, or a combination
-
2024: Cross-language transfer learning from English to a target language has shown effectiveness in low-resourced audiovisual speech recognition (AV-ASR). We first investigate a 2-stage protocol, which performs fine-tuning of the English pre-trained AV encoder on a large audio corpus in the target language (1st stage), and then carries out cross-modality transfer learning from audio to AV in the target language
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.