- CHI 2025: In this paper, we advance the study of AI-augmented reasoning in the context of Human-Computer Interaction (HCI), psychology, and cognitive science, focusing on the critical task of visual perception. Specifically, we investigate the applicability of Multimodal Large Language Models (MLLMs) in this domain. To this end, we leverage established principles and explanations from psychology and cognitive science…
- Visual agent models for automating human activities on Graphical User Interfaces (GUIs) have emerged as a promising research direction, driven by advances in large Vision Language Models (VLMs). A critical challenge in GUI automation is the precise grounding of interface elements across diverse platforms. Existing vision-only GUI agents ground elements directly from large and cluttered screenshots, requiring…
- 2025: Language models have led to a leap forward in web automation. Current web automation approaches take the current web state, the action history, and a language instruction as inputs to predict the next action, overlooking the importance of history states. However, the highly verbose nature of web page states can result in long input sequences and sparse information, hampering the effective utilization of…
- 2025: Automatic evaluation of retrieval-augmented generation (RAG) systems relies on fine-grained dimensions such as faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural…
- Recent large language models have shown promising capabilities in long-form reasoning, following structured chains of thought before arriving at a final answer. However, we observe that these reasoning paths tend to include substantial redundancy; analyzing attention patterns reveals that attention scores are widely scattered, and incorrect answers in particular exhibit greater attention sparsity. In this paper…
Related content
- March 2, 2021: The newest chapter addresses a problem that often bedevils nonparametric machine learning models.
- March 1, 2021: The Art Museum skill uses Alexa Conversations, an AI-driven dialogue management tool.
- February 8, 2021: Technique that relies on inverse reinforcement learning, or learning by example, improves task completion rate by 14% to 17% in simulations.
- February 8, 2021: Yanagisawa discusses the science behind Alexa's new bilingual Polyglot model, her career in speech research, and more.
- February 3, 2021: Neural text-to-speech enables new multilingual model to use the same voice for Spanish and English responses.
- January 26, 2021: Sneha Rajana is an applied scientist at Amazon today, but she didn't start out that way. Learn how she made the switch, and the advice she has for others considering a similar change.