Self-supervised pre-training and semi-supervised learning for extractive dialog summarization
2023
Language model pre-training has led to state-of-the-art performance in text summarization. While a variety of pre-trained transformer models are available nowadays, they are mostly trained on documents. In this study we introduce self-supervised pre-training to enhance the BERT model’s semantic and structural understanding of dialog texts from social media. We also propose a semi-supervised teacher-student learning framework to address the common issue of limited labeled data in summarization datasets. We empirically evaluate our approach on the extractive summarization task with the TWEETSUMM corpus, a recently introduced dialog summarization dataset of Twitter customer care conversations, and demonstrate that our self-supervised pre-training and semi-supervised teacher-student learning are both beneficial in comparison to other pre-trained models. Additionally, we compare pre-training and teacher-student learning in various low-resource settings and find that pre-training outperforms teacher-student learning, with the differences between the two being more significant when labels are scarce.
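To make the teacher-student idea concrete, the sketch below shows a generic pseudo-labeling loop for sentence-level extractive summarization: a teacher scorer is trained on the small labeled set, it pseudo-labels unlabeled dialogs where its sentence decisions are confident, and a student is then trained on the combined data. The SentenceScorer module, the GRU stand-in for a BERT encoder, the confidence threshold, and the random tensors are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a teacher-student loop for extractive dialog summarization.
# The encoder, thresholds, and data are illustrative placeholders, not the paper's setup.
import torch
import torch.nn as nn

class SentenceScorer(nn.Module):
    """Scores each dialog sentence for inclusion in the extractive summary."""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # stand-in for a BERT encoder
        self.head = nn.Linear(dim, 1)

    def forward(self, sent_embs):            # (batch, n_sents, dim) sentence embeddings
        h, _ = self.encoder(sent_embs)
        return self.head(h).squeeze(-1)      # (batch, n_sents) inclusion logits

def train_step(model, opt, sent_embs, labels):
    loss = nn.functional.binary_cross_entropy_with_logits(model(sent_embs), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# 1) Train the teacher on the small labeled set (random tensors as placeholders).
teacher = SentenceScorer(); t_opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)
labeled = [(torch.randn(1, 10, 128), torch.randint(0, 2, (1, 10)).float()) for _ in range(32)]
for embs, labels in labeled:
    train_step(teacher, t_opt, embs, labels)

# 2) Pseudo-label unlabeled dialogs, keeping only dialogs with confident sentence decisions.
unlabeled = [torch.randn(1, 10, 128) for _ in range(128)]
pseudo = []
with torch.no_grad():
    for embs in unlabeled:
        probs = torch.sigmoid(teacher(embs))
        confident = (probs > 0.9) | (probs < 0.1)   # confidence threshold (assumed)
        if confident.all():
            pseudo.append((embs, (probs > 0.5).float()))

# 3) Train the student on labeled plus pseudo-labeled data.
student = SentenceScorer(); s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for embs, labels in labeled + pseudo:
    train_step(student, s_opt, embs, labels)
```

The filtering step is one common design choice: discarding pseudo-labels the teacher is unsure about limits the noise propagated to the student, which matters most when the labeled seed set is small.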