Customer-obsessed science
Research areas
-
January 13, 2026 (7 min read). Leveraging existing environment simulators and reward functions based on verifiable ground truth boosts task success rate, even with small models and small training datasets.
-
January 8, 2026 (4 min read)
-
December 29, 2025 (6 min read)
-
December 29, 2025 (9 min read)
-
December 10, 2025 (5 min read)
Featured news
-
CIKM 2025, 2025. Search query understanding (QU) is an important building block of modern e-commerce search engines. QU extracts multiple intents from customer queries, including intended color, brand, etc. One of the most important tasks in QU is predicting which product category the user is interested in. In our work, we address the query product type classification (Q2PT) task. Compared to classification of full-fledged
-
2025. Data perspectivism goes beyond majority vote label aggregation by recognizing various perspectives as legitimate ground truths. However, current evaluation practices remain fragmented, making it difficult to compare perspectivist approaches and analyze their impact on different users and demographic subgroups. To address this gap, we introduce PersEval, the first unified framework for evaluating perspectivist
-
2025. Existing outfit recommendation frameworks focus on outfit compatibility prediction and complementary item retrieval. We present a text-driven outfit generation framework, Text2Outfit, which generates outfits controlled by text prompts. Our framework supports two forms of outfit recommendation: 1) Text-to-outfit generation, where the prompt includes the specification for each outfit item (e.g., product features
-
2025. E-commerce stores increasingly use Large Language Models (LLMs) to enhance catalog data quality through automated regeneration. A critical challenge is accurately predicting missing structured attribute values across multilingual product catalogs, where LLM performance varies significantly by language. While existing approaches leverage general knowledge through prompt engineering and external retrieval
-
VLDB 2025, 2025. Cloud service providers usually leverage standard benchmarks such as TPC-H and TPC-DS to evaluate and optimize the performance of cloud data analytic systems. However, these benchmarks have fixed query patterns and cannot effectively reproduce the statistics of cloud workloads in production. For example, they cannot simulate real workloads with similar performance metrics such as CPU Time and
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.