Customer-obsessed science
Research areas
- January 13, 2026 (7 min read): Leveraging existing environment simulators and reward functions based on verifiable ground truth boosts task success rate, even with small models and small training datasets.
- December 29, 2025 (6 min read)
- December 29, 2025 (9 min read)
- December 8, 2025 (8 min read)
- December 5, 2025 (6 min read)
Featured news
- NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability, 2024. Cybersecurity applications are challenged by constant distribution shifts due to the evolvement of services, users, and threats, degrading pretrained model performance. Fast adaptation is crucial for maintaining reliable security measures. Existing works primarily focus on pretraining models that can quickly adapt to new distributions, yet their fine-tuning relies on a rudimentary strategy that treats each
- 2024. Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution
- 2023 Conference on Digital Experimentation @ MIT (CODE@MIT), NeurIPS 2024, 2024. This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem. As a motivating example, often online services cannot directly assign users to specific control or treatment experiences either for business or practical reasons. In these settings, naively comparing treatment and control groups that may result from self-selection can lead to biased estimates of underlying
- 2024. Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate
- 2024. In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourced LLMs
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.