Customer-obsessed science
Research areas
-
January 13, 2026, 7 min read. Leveraging existing environment simulators and reward functions based on verifiable ground truth boosts task success rate, even with small models and small training datasets.
-
December 29, 2025, 6 min read
-
December 29, 2025, 9 min read
-
December 8, 2025, 8 min read
-
December 5, 2025, 6 min read
Featured news
-
Towards effective genAI multi-agent collaboration: Design and evaluation for enterprise applications. arXiv, 2024. AI agents powered by large language models (LLMs) have shown strong capabilities in problem solving. Through combining many intelligent agents, multi-agent collaboration has emerged as a promising approach to tackle complex, multi-faceted problems that exceed the capabilities of single AI agents. However, designing the collaboration protocols and evaluating the effectiveness of these systems remains a significant
-
2024. Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the
-
ITC 2024, 2024. Recently the semiconductor industry has been alerted by hyperscaler companies reporting impact of field errors in megascale datacenters. They tend to be elusive and very difficult to detect until they affect a particular application several days or months after the IC has been deployed in a fleet. Although the cause of such errors can be manifold, ranging from test escapes and design marginalities to design
-
CIKM 2024 Workshop on GenAI and RAG Systems for Enterprise, 2024. Security controls are mechanisms or policies designed for cloud-based services to reduce risk, protect information, and ensure compliance with security regulations. The development of security controls is traditionally a labor-intensive and time-consuming process. This paper explores the use of Generative AI to accelerate the generation of security controls. We specifically focus on generating Gherkin codes
-
NeurIPS 2024 Workshop on Time Series in the Age of Large Models, 2024. Demand forecasting faces challenges induced by Peak Events (PEs) corresponding to special periods such as promotions and holidays. Peak events create significant spikes in demand followed by demand ramp down periods. Neural networks like MQCNN [12, 6] and MQT [1] overreact to demand peaks by carrying over the elevated PE demand into subsequent Post-Peak-Event (PPE) periods, resulting in significantly over-biased
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.