- CHI 2025: In this paper, we advance the study of AI-augmented reasoning in the context of Human-Computer Interaction (HCI), psychology, and cognitive science, focusing on the critical task of visual perception. Specifically, we investigate the applicability of Multimodal Large Language Models (MLLMs) in this domain. To this end, we leverage established principles and explanations from psychology and cognitive science…
- Visual agent models for automating human activities on Graphical User Interfaces (GUIs) have emerged as a promising research direction, driven by advances in large Vision Language Models (VLMs). A critical challenge in GUI automation is the precise grounding of interface elements across diverse platforms. Existing vision-only GUI agents ground elements directly from large and cluttered screenshots, requiring…
- 2025: Language models have led to a leap forward in web automation. Current web automation approaches take the current web state, the action history, and a language instruction as inputs to predict the next action, overlooking the importance of history states. However, the highly verbose nature of web page states can result in long input sequences and sparse information, hampering the effective utilization of…
- 2025: Automatic evaluation of retrieval-augmented generation (RAG) systems relies on fine-grained dimensions such as faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural…
- Recent large language models have shown promising capabilities in long-form reasoning, following structured chains of thought before arriving at a final answer. However, we observe that these reasoning paths tend to include substantial redundancy; analyzing attention patterns reveals that attention scores are widely scattered, and incorrect answers in particular exhibit greater attention sparsity. In this paper…
Related content
- March 2, 2021: The newest chapter addresses a problem that often bedevils nonparametric machine learning models.
- March 1, 2021: The Art Museum skill uses Alexa Conversations, an AI-driven dialogue management tool.
- February 8, 2021: Technique that relies on inverse reinforcement learning, or learning by example, improves task completion rate by 14% to 17% in simulations.
- February 8, 2021: Yanagisawa discusses the science behind Alexa's new bilingual Polyglot model, her career in speech research, and more.
- February 3, 2021: Neural text-to-speech enables new multilingual model to use the same voice for Spanish and English responses.
- January 26, 2021: Sneha Rajana is an applied scientist at Amazon today, but she didn't start out that way. Learn how she made the switch, and the advice she has for others considering a similar change.