Towards automated distillation: A systematic study of knowledge distillation in natural language processing

Haoyu He; Xingjian Shi; Jonas Mueller; Sheng Zha; Mu Li; George Karypis


2022


The key factors underpinning optimal knowledge distillation (KD) performance remain elusive, because their effects are often confounded in sophisticated distillation algorithms. This makes it hard to choose the best distillation algorithm from the large design space, for existing and new tasks alike, and hinders automated distillation. In this work, we aim to identify how distillation performance across different tasks is affected by the components of the KD pipeline, such as the data augmentation policy, the loss function, and the intermediate knowledge transfer between the teacher and the student. To isolate their effects, we propose Distiller, a meta-KD framework that systematically combines the key distillation techniques as components across different stages of the KD pipeline. Distiller enables us to quantify each component’s contribution and to conduct experimental studies that yield three insights about distillation performance: 1) the approach used to distill the intermediate representations is the most important factor in KD performance, 2) the best-performing distillation algorithms differ considerably across tasks, and 3) data augmentation provides a large boost for small training datasets or small student networks. Based on these insights, we propose a simple AutoDistiller algorithm that can recommend a close-to-optimal KD pipeline for a new dataset or task. This is a first step toward automated KD that can save engineering costs and democratize practical KD applications.
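To make the pipeline components named in the abstract more concrete, here is a minimal sketch of how a prediction-level loss and an intermediate-representation loss might be combined into one distillation objective. It is not the Distiller framework's code, and the abstract does not specify which losses the framework uses: the temperature-scaled KL divergence, the mean squared error on hidden features, and the names `kd_loss`, `temperature`, and `alpha` are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kd_loss(student_logits, teacher_logits,
            student_hidden, teacher_hidden,
            temperature=2.0, alpha=0.5):
    """Hypothetical combined objective (an assumption, not the paper's).

    alpha weights the prediction-level term; (1 - alpha) weights the
    intermediate-representation term.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Multiplying by T^2 keeps gradient magnitudes comparable across
    # temperatures, as in standard soft-label distillation.
    soft_term = kl_divergence(p_teacher, p_student) * temperature ** 2
    hidden_term = mse(student_hidden, teacher_hidden)
    return alpha * soft_term + (1 - alpha) * hidden_term


if __name__ == "__main__":
    loss = kd_loss(
        student_logits=[0.2, 1.5, -0.3],
        teacher_logits=[0.1, 2.0, -1.0],
        student_hidden=[0.4, 0.1],
        teacher_hidden=[0.5, 0.0],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

In this sketch, swapping `mse` for another distance or changing `alpha` corresponds to varying one pipeline component while holding the others fixed, which is the kind of controlled comparison the abstract describes.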
