Customer-obsessed science


Research areas
-
June 25, 2025
With large datasets, directly generating data ID codes from query embeddings is much more efficient than performing pairwise comparisons between queries and candidate responses.
Featured news
-
2024 Conference on Digital Experimentation @ MIT (CODE@MIT), 2024
Online sites typically evaluate the impact of new product features on customer behavior using online controlled experiments (or A/B tests). For many business applications, it is important to detect heterogeneity in these experiments [1], as new features often have a differential impact by customer segment, product group, and other variables. Understanding heterogeneity can provide key insights into causal
-
2024
Code generation models are not robust to small perturbations, which often lead to incorrect generations and significantly degrade the performance of these models. Although improving the robustness of code generation models is crucial to enhancing user experience in real-world applications, existing research efforts do not address this issue. To fill this gap, we propose CodeFort, a framework to improve
-
In-context learning (ICL) is a powerful paradigm where large language models (LLMs) benefit from task demonstrations added to the prompt. Yet, selecting optimal demonstrations is not trivial, especially for complex or multi-modal tasks where input and output distributions differ. We hypothesize that forming task-specific representations of the input is key. In this paper, we propose a method to align representations
-
2024
Hierarchical Text Classification (HTC) is a sub-class of multi-label classification. It is challenging because the hierarchy typically has a large number of diverse topics. Existing methods for HTC fall into two categories: local methods (a classifier for each level, node, or parent) and global methods (a single classifier for everything). Local methods are computationally expensive, whereas global methods
-
2024
Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects and low-quality data points. To address these problems, we propose DATA ADVISOR, an enhanced LLM-based method for generating data that takes into account the characteristics
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.