Improving natural language understanding accuracy by pre-adapting to live traffic
2021
Virtual assistants enable users to interact with a large number of services in natural language. Third-party developers building new applications for virtual assistants often have limited annotation resources and find it challenging to procure large amounts of suitable training data, opting instead for limited collections of sample utterance templates annotated with their semantics. We can enrich such collections by synthesizing more examples from the given templates, but the resulting utterance distribution will still differ substantially from the distribution of actual user utterances in the wild. We treat this as a domain adaptation problem from developer-provided sample utterances to live utterances, and we apply adversarial training between the two to narrow the gap. In addition, we show that this adaptation can be performed during the pre-training stage, as pre-adaptation. We demonstrate consistent improvements across different test sets in two different languages.
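The adversarial training mentioned above is commonly implemented with a gradient reversal layer: the domain discriminator is trained to tell synthesized utterances from live ones, while the shared encoder receives the *negated* domain gradient, pushing it toward domain-invariant features. The following is a minimal numeric sketch of that mechanism with a toy linear encoder and discriminator; all names, shapes, and the hand-derived backward pass are illustrative assumptions, not details from the paper.

```python
import numpy as np

def grl_backward(grad, lam=1.0):
    """Gradient reversal layer: identity on the forward pass,
    -lam * grad on the backward pass, so the encoder learns to
    *confuse* the domain discriminator (assumed hyperparameter lam)."""
    return -lam * grad

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 4))   # toy encoder weights (illustrative)
w_dom = rng.normal(size=4)        # toy domain-discriminator weights

x = rng.normal(size=4)            # one utterance embedding
d = 1.0                           # domain label: 1 = synthesized, 0 = live

# Forward pass
h = W_enc @ x                     # encoder features
logit = w_dom @ h                 # domain logit
p = 1.0 / (1.0 + np.exp(-logit))  # P(domain = synthesized)

# Backward pass for the domain loss (binary cross-entropy)
g_logit = p - d                   # dL/dlogit
g_w_dom = g_logit * h             # discriminator gradient (descends normally)
g_h = g_logit * w_dom             # gradient reaching the shared features
g_h_enc = grl_backward(g_h)       # reversed before entering the encoder
g_W_enc = np.outer(g_h_enc, x)    # encoder gradient (ascends the domain loss)

# On the shared path, the encoder update opposes the discriminator update:
assert np.allclose(g_h_enc, -g_h)
```

In a full system the same encoder would also feed the semantic-parsing head, whose gradient passes through unreversed, so the encoder is pulled toward features that are useful for the task yet indistinguishable across the two utterance domains.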