Customer-obsessed science


Research areas
- June 25, 2025. With large datasets, directly generating data ID codes from query embeddings is much more efficient than performing pairwise comparisons between queries and candidate responses.
Featured news
- Journal of the American Medical Informatics Association, 2024. Objectives: Patients are increasingly being given direct access to their medical records. However, radiology reports are written for clinicians and typically contain medical jargon, which can be confusing. One solution is for radiologists to provide a “colloquial” version that is accessible to the layperson. Because manually generating these colloquial translations would represent a significant burden for
- 2024. In e-commerce, high-consideration search missions typically require careful and elaborate decision making and involve a substantial research investment from customers. We consider the task of automatically identifying such High Consideration (HC) queries. Detecting such missions or searches enables e-commerce sites to better serve user needs through targeted experiences such as curated QA widgets that
- 2024 Conference on Digital Experimentation @ MIT (CODE@MIT), 2024. There are different reasons why experimenters may want to randomize their experiment at a region level. In some cases, treatments cannot be turned on or off at the individual level and therefore require randomization at a group level, for which regions can be a good candidate. In other cases, experimenters may worry about network effects or other types of spillovers within a geographic area, and opt to randomize
- Representation learning is a fundamental aspect of modern artificial intelligence, driving substantial improvements across diverse applications. While self-supervised contrastive learning has led to significant advancements in fields like computer vision and natural language processing, its adaptation to tabular data presents unique challenges. Traditional approaches often prioritize optimizing model architecture
- 2024. Various types of learning rate (LR) schedulers are used today for training or fine-tuning large language models. In practice, several mid-flight changes to the LR schedule are required, made either manually or through careful choices of warmup steps, peak LR, type of decay, and restarts. To study this further, we consider the effect of switching the learning rate at a predetermined time during training
Academia
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.