Customer-obsessed science


Research areas
- July 29, 2025: New cost-to-serve-software metric that accounts for the full software development lifecycle helps determine which software development innovations provide quantifiable value.
Featured news
- 2024: Current instruction-tuned language models are exclusively trained with textual preference data and thus are often not aligned with the unique requirements of other modalities, such as speech. To better align language models with the speech domain, we explore (i) prompting strategies grounded in radio-industry best practices and (ii) preference learning using a novel speech-based preference dataset of 20K samples
- Findings of EMNLP 2024, 2024: Augmenting Large Language Models (LLMs) with information retrieval capabilities (i.e., Retrieval-Augmented Generation (RAG)) has proven beneficial for knowledge-intensive tasks. However, understanding users’ contextual search intent when generating responses is an understudied topic for conversational question answering (QA). This conversational extension leads to additional concerns when compared to single-turn
- RecSys 2024, 2024: Music search encounters a significant challenge as users increasingly rely on catchy lines from lyrics to search for both new releases and other popular songs. Integrating lyrics into an existing lexical search index or using a lyrics vector index poses difficulties due to the length of lyrics text. While lexical scoring mechanisms like BM25 are inadequate and necessitate complex query planning and index schema for
- Findings of EMNLP 2024, 2024: Large Language Models (LLMs) are widely used in both industry and academia for various tasks, yet evaluating the consistency of generated text responses continues to be a challenge. Traditional metrics like ROUGE and BLEU show a weak correlation with human judgment. More sophisticated metrics using Natural Language Inference (NLI) have shown improved correlations but are complex to implement, require domain-specific
- CIKM 2024 Workshop on Generative AI for E-commerce, 2024: Large language models (LLMs) offer substantial potential for automating labeling tasks, showcasing robust zero-shot performance across diverse classification tasks. The LLM-generated reasons that accompany these classifications contain signals about the quality of the classifications. Estimates of the quality of these reasons can, in essence, be used to detect potentially incorrect predictions. Conventional
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.