Disentangling user conversations with voice assistants for online shopping
2023
Conversation disentanglement aims to identify and group utterances from a conversation into separate threads. Existing methods in the literature primarily focus on disentangling multi-party conversations involving three or more speakers, which enables their models to explicitly or implicitly incorporate speaker-related feature signals while disentangling. Most existing models also require a large amount of human-annotated data for training and focus largely on pairwise relations between utterances, without adequately accounting for the conversational context. In this work, we propose a multi-task learning approach with a contrastive learning objective, DiSC, to disentangle conversations between two speakers, a user and a virtual speech assistant, in the novel domain of e-commerce. We analyze multiple ways and granularities of defining conversation "threads". DiSC jointly learns the relation between pairs of utterances, as well as between utterances and their respective thread context. We train and evaluate our models on multiple multi-threaded conversation datasets that were created automatically, without any human labeling effort. Experimental results on public datasets as well as real-world shopping conversations from a commercial speech assistant show that DiSC outperforms state-of-the-art baselines by at least 3%, across both automatic and human evaluation metrics. We also demonstrate how DiSC improves downstream dialog response generation in the shopping domain.
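The abstract does not give the exact form of the multi-task objective, but a plausible sketch (an illustrative assumption, not the paper's stated formulation) of combining a pairwise utterance-link loss with a thread-context contrastive loss would be:

    L_DiSC = L_pair + λ · L_ctx

where L_pair is a pairwise loss over utterance pairs (e.g., binary cross-entropy on whether two utterances belong to the same thread), L_ctx is a contrastive loss that pulls each utterance representation toward its own thread context and pushes it away from the contexts of other threads, and λ is a hypothetical weighting hyperparameter balancing the two tasks.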