- IJCAI 2025 Workshop on User-Aligned Assessment of Adaptive AI Systems (2025): Effectively assessing AI systems, particularly those operating in specialized domains or producing dynamic outputs, requires translating nuanced human expertise into scalable, quantitative measures. Traditional metrics often fall short in capturing qualitative requirements that domain experts intuitively grasp. This paper presents a novel framework that systematically transforms qualitative expert feedback …
- 2025: Behavioral therapy notes are important for both legal compliance and patient care. Unlike progress notes in physical health, quality standards for behavioral therapy notes remain underdeveloped. To address this gap, we collaborated with licensed therapists to design a comprehensive rubric for evaluating therapy notes across key dimensions: completeness, conciseness, and faithfulness. Further, we extend …
- ACL 2025 Workshop on Research on Agent Language Models (2025): Developing language model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data simulation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect of task logic diversity at the dialogue level. This paper proposes a novel data simulation method designed …
- Evaluating long-form AI-generated content remains challenging due to the lack of standardized methodologies that robustly align with human judgment across formats such as articles, blogs, and essays. We introduce HALF-Eval, a scalable framework that combines structured, checklist-based evaluation with machine learning aggregation to assess key quality dimensions, including creativity, impact, coherence …
- Humor is a complex yet essential aspect of human communication. It can be defined as a communicative expression that establishes surprising, incongruent relationships or meanings in order to amuse. This paper presents empirical evidence demonstrating the successful application of computational methods to humor recognition in AI-generated textual data, specifically jokes. Through experiments on synthetic and open-source …
Related content
- November 11, 2020: With a new machine learning system, Alexa can infer that an initial question implies a subsequent request.
- November 10, 2020: An Alexa senior applied scientist provides career advice to graduate students considering a research role in industry.
- November 9, 2020: Watch a recording of the EMNLP 2020 session featuring a discussion with Amazon scholars and academics on the state of conversational AI.
- November 6, 2020: Work aims to improve the accuracy of models both on- and off-device.
- November 3, 2020: Fourth challenge features four new teams.
- October 30, 2020: A prosody transfer technique addresses the problem of “source speaker leakage,” while a prosody selection model better matches prosody to semantic content.