Weakly supervised hierarchical multi-task classification of customer questions
Identifying granular, actionable topics in customer questions (CQ) posted on e-commerce websites helps surface information that customers expect to find on a product detail page before making a purchase but that is currently missing. Insights into this missing information help brands and sellers enrich catalog quality and thereby improve the overall customer experience (CX). In this paper, we propose a weakly supervised Hierarchical Multi-task Classification Framework (HMCF) to identify topics in customer questions at multiple granularities. The main challenges lie in creating a list of granular topics (a taxonomy) for thousands of product categories and in building a scalable classification system. To this end, we introduce a clustering-based Taxonomy Creation and Data Labeling (TCDL) module that creates the taxonomy and labeled data with minimal supervision. With the TCDL module, the effort a subject matter expert spends on creating the taxonomy and labeled data drops from 2 weeks to 2 hours. For classification, we propose a two-level HMCF that performs multi-class classification to identify the coarse level-1 topic and uses a natural language inference (NLI) based, label-aware approach to identify the granular level-2 topic. We show that HMCF (based on BERT and NLI) (a) achieves an absolute improvement of 13% in Top-1 accuracy over single-task, non-hierarchical baselines, (b) learns a generic, domain-invariant function that can adapt to a constantly evolving taxonomy (open label set) without the need for retraining, and (c) significantly reduces model deployment effort, since a single model serves thousands of product categories.
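The abstract does not specify implementation details, so the following is only a minimal sketch of the two-level inference flow it describes, under stated assumptions. Level 1 is an ordinary multi-class classifier over a fixed set of coarse topics. Level 2 verbalises each candidate granular label as a hypothesis and scores it against the question (the premise), which is why new level-2 labels can be added without retraining. Everything named here is hypothetical and illustrative: the taxonomy, `HYPOTHESIS_TEMPLATE`, `L1_KEYWORDS`, and the helper functions. In particular, `toy_level1_logits` (keyword counts) and `toy_entailment` (word overlap) are stand-ins for the BERT classifier head and the NLI model. They are not the authors' models.

```python
import math
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical two-level taxonomy: level-1 topic -> level-2 topics (open label set).
Taxonomy = Dict[str, List[str]]

# Hypothetical template that turns a level-2 label into an NLI hypothesis.
HYPOTHESIS_TEMPLATE = "The question asks about {label}."
STOPWORDS = {"the", "question", "asks", "about", "a", "an", "is", "it", "of"}


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Stand-in for a BERT-based NLI model (NOT a real NLI model): the share
    of the hypothesis's content words that also appear in the premise."""
    premise_words = set(tokens(premise))
    content = [w for w in tokens(hypothesis) if w not in STOPWORDS]
    if not content:
        return 0.0
    return sum(w in premise_words for w in content) / len(content)


# Hypothetical keyword lists standing in for a trained level-1 classifier head.
L1_KEYWORDS = {
    "size and fit": {"size", "fit", "waist", "length", "small", "large"},
    "material": {"fabric", "cotton", "stretch", "wash", "material"},
}


def toy_level1_logits(question: str, topics: List[str]) -> List[float]:
    words = set(tokens(question))
    return [float(len(words & L1_KEYWORDS[t])) for t in topics]


def hierarchical_classify(
    question: str,
    taxonomy: Taxonomy,
    level1_logits: Callable[[str, List[str]], List[float]],
    nli_score: Callable[[str, str], float],
) -> Tuple[str, str]:
    # Level 1: multi-class classification over the fixed coarse topics.
    level1_topics = list(taxonomy)
    probs = softmax(level1_logits(question, level1_topics))
    l1 = level1_topics[max(range(len(probs)), key=probs.__getitem__)]

    # Level 2: label-aware scoring. Each candidate label under the predicted
    # level-1 topic becomes a hypothesis scored against the question, so the
    # candidate list can change without retraining the scorer.
    candidates = taxonomy[l1]
    scores = [nli_score(question, HYPOTHESIS_TEMPLATE.format(label=c)) for c in candidates]
    l2 = candidates[max(range(len(scores)), key=scores.__getitem__)]
    return l1, l2


taxonomy = {
    "size and fit": ["waist size", "length", "true to size"],
    "material": ["fabric type", "stretch", "washing instructions"],
}
print(hierarchical_classify("Is the fabric cotton and does it stretch?",
                            taxonomy, toy_level1_logits, toy_entailment))
# → ('material', 'stretch')

taxonomy["material"].append("machine wash")  # new level-2 label, no retraining
print(hierarchical_classify("Can I machine wash it?",
                            taxonomy, toy_level1_logits, toy_entailment))
# → ('material', 'machine wash')
```

The point of the design is that the level-2 scorer takes the label text as an input rather than owning a fixed output layer. Extending the taxonomy is therefore a data change (appending `"machine wash"`) rather than a model change. In a real system, the toy scorers would be replaced by trained models.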