Mixture of domain experts for language understanding: An analysis of modularity, task performance, and memory tradeoffs
One limitation of large-scale machine learning models is that they are difficult to adjust after deployment without incurring significant re-training costs. In this paper, we focus on natural language understanding (NLU) and the need for virtual assistant systems to continually update over time to support new functionality. Specifically, we consider the tasks of intent classification (IC) and slot filling (SF), which are fundamental to processing user interactions with virtual assistants. We study six architectures with varying degrees of modularity to gain insight into the performance implications of designing models for flexible updates over time. Our experiments on the SLURP dataset, modified to simulate the real-world experience of adding new intents over time, show that a single dense model yields 2.5–3.5 points of average improvement over individual domain models, but suffers a median degradation of 0.4–1.1 points as new intents are incorporated. We present a mixture-of-experts based hybrid system that performs within 2.1 points of the dense model in exact match accuracy while either improving median performance on untouched domains over time or degrading it by at most 0.1 points.