Customer-obsessed science


Research areas
-
August 8, 2025A new philosophy for developing LLM architectures reduces energy requirements, speeds up runtime, and preserves pretrained-model performance.
Featured news
-
In e-commerce applications, vision-language multimodal transformer models play a pivotal role in product search. The key to successfully training a multimodal model lies in the alignment quality of image-text pairs in the dataset. However, the data in practice is often automatically collected with minimal manual intervention. Hence the alignment of image-text pairs is far from ideal. In e-commerce, this
-
CVPR 2024 Workshop on "What is Next in Multimodal Foundation Models?"2024This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs ex-cel in tasks like image captioning, they face challenges in open-world settings. Our benchmarks test VLMs’ consistency in understanding concepts across semantic granularity levels and their response to varying text specificity. Findings show
-
CVPR 2024 Workshop on Computer Vision for Fashion, Art, and Design2024Virtual try-on and product personalization have become increasingly important in modern online shopping, high-lighting the need for accurate body measurement estimation. Although previous research has advanced in estimating 3D body shapes from RGB images, the task is inherently ambiguous as the observed scale of human subjects in the images depends on two unknown factors: capture distance and body dimensions
-
The context of modern smart voice assistants are often multi-modal, where images, audio and video content are consumed by users simultaneously. In such a setup, co-reference resolution is especially challenging, and runs across modalities and dialogue turns. We explore the problem of multi-modal co-reference resolution in multi-turn dialogues and quantify the performance of multi-modal LLMs on a specially
-
E-commerce faces persistent challenges with data quality issue of product listings. Recent advances in Large Language Models (LLMs) offer a promising avenue for automated product listing enrichment. However, LLMs are prone to hallucinations, which we define as the generation of content that is unfaithful to the source input. This poses significant risks in customer-facing applications. Hallucination detection
Academia
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all