Customer-obsessed science


Research areas
July 31, 2025
Using ensembles of agents to generate and refine interactions annotated with chains of thought improves performance on a battery of benchmarks by an average of 29%.
Featured news
CVPR 2024, CVPR 2024 Workshop on What is Next in Multimodal Foundation Models?, CVPR 2024 Workshop on Robustness in Large Language Models (2024)
Generative Vision-Language Models (VLMs) are prone to generating plausible-sounding textual answers that are not always grounded in the input image. We investigate this phenomenon, usually referred to as “hallucination,” and show that it stems from an excessive reliance on the language prior. In particular, we show that as more tokens are generated, the reliance on the visual prompt decreases, and
Efficient retrieval and ranking of relevant products in e-commerce product search relies on accurately mapping queries to product categories. This query classification typically uses a combination of textual and customer behavioral signals. However, new product categories often lack customer interaction data, which leads to poor performance. In this paper, we present a novel approach to mitigate this cold
E-commerce platforms typically store and structure product information and search data in a hierarchy. Efficiently categorizing user search queries into a similar hierarchical structure is paramount for enhancing the user experience, not only on e-commerce platforms but also in news curation and academic research. The significance of this task is amplified when dealing with sensitive query categorization or critical information
2024
The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally heavy
2024
Synthesizing novel views of dynamic scenes from a collection of RGB inputs poses significant challenges due to the inherently under-constrained nature of the problem. To mitigate this ill-posedness, practitioners in the field of neural radiance fields (NeRF) often adopt intricate geometric regularization techniques, including scene flow, depth estimation, or learned perceptual similarity
Academia
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.