Customer-obsessed science


Research areas
-
September 2, 2025Audible's ML algorithms connect users directly to relevant titles, reducing the number of purchase steps for millions of daily users.
-
Featured news
-
Predicting customer preferences for each item is a prerequisite module for most recommender systems in e-commerce. However, the sparsity of behavioral data is often a challenge to learn accurate prediction models. Given millions of items, each customer may only be able to interact with a small subset of them over time. This sparse behavioral data is insufficient to represent item-customer and item-item
-
2024We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning. At the core of our approach is a synthetic paired video dataset tailored for video-to-video transfer tasks. Inspired by Instruct Pix2Pix’s image transfer via editing instruction, we adapt this paradigm to the video domain. Extending the Prompt-to-Prompt
-
2024We propose a single-shot approach to determining 6-DoF pose of an object with available 3D computer-aided design (CAD) model from a single RGB image. Our method, dubbed MRC-Net, comprises two stages. The first performs pose classification and renders the 3D object in the classified pose. The second stage performs regression to predict fine-grained residual pose within class. Connecting the two stages is
-
2024Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures across test classes and data distributions. When combined with the common practice of using a fixed threshold to declare a match, this gives rise to significant performance variations in terms of false accept rate (FAR) and false reject rate (FRR)
-
2024We propose Strongly Supervised pre-training with ScreenShots (S4) - a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in using image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localization to
Conferences
Academia
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all