PLAN-Bot: Contextualized and knowledge-grounded multimodal taskbot
We introduce PLAN-Bot, a knowledge-grounded multimodal taskbot that guides users through real-world cooking and do-it-yourself (DIY) tasks in the Alexa TaskBot Challenge 2. A successful task-driven conversational assistant must be both effective and engaging, assisting users from task discovery through step-by-step instructions for the chosen task. To achieve this, PLAN-Bot incorporates several key features. First, it is equipped with a robust and adaptable query extraction system that efficiently searches for specific tasks or suggests interesting and seasonal activities to users. Second, each task is represented as a hierarchical graph, which gives users organized and seamless navigation throughout the process. Third, PLAN-Bot answers contextual questions about the selected task using a knowledge-grounded question-answering module, so that users receive accurate and informative responses. Furthermore, we propose fine-grained recipe embeddings, which improve cross-modal retrieval and ingredient substitution and enhance the overall user experience. To prioritize user safety, PLAN-Bot integrates a robust safety classifier that prevents the bot from providing harmful advice, resulting in more than 98% uptime during the semifinal interaction period. As of June 30, 2023, PLAN-Bot has achieved ratings of 3.5 and 3.58 on the L7d (↑ .27) and L14d (↑ .18) metrics, respectively, while maintaining a completion rate of 44.2% (↑ 8.8) and a conversation resume rate of 19.07%.