Relevance under the iceberg: Reasonable prediction for extreme multi-label classification
In the era of big data, eXtreme Multi-label Classification (XMC) has become one of the most essential research tasks for handling enormous label spaces in machine learning applications. Instead of assessing every individual label, most XMC methods rely on label trees or filters to derive short ranked label lists as predictions, thereby reducing computational overhead. Specifically, existing studies use ranked label lists of a fixed length for both prediction and evaluation. However, such predictions are unreasonable because data points have varying numbers of relevant labels. Very small or very large list lengths in evaluation, as in Precision@5 and Recall@100, can also cause other relevant labels to be ignored or many irrelevant labels to be tolerated. In this paper, we aim to provide reasonable predictions for extreme multi-label classification with dynamic numbers of predicted labels. In particular, we propose a novel framework, Model-Agnostic List Truncation with Ordinal Regression (MALTOR), which leverages ranking properties to truncate long ranked label lists for better accuracy. Extensive experiments on six large-scale real-world benchmark datasets demonstrate that MALTOR significantly outperforms statistical baseline methods and conventional ranked list truncation methods developed for ad-hoc retrieval, with both linear and deep XMC models. The results of an ablation study also show the effectiveness of each individual component of MALTOR.
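To make the fixed-length problem concrete, the following minimal sketch compares Precision@k and Recall@k with the F1 score of a truncated list for a single data point. The labels are hypothetical toy values, not data from the paper, and scoring a truncation cutoff with F1 is an illustrative assumption; it is not necessarily the objective MALTOR optimizes.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked labels that are relevant (divides by k)."""
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant labels that appear in the top-k ranked labels."""
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / len(relevant)


def f1_of_truncation(ranked: List[str], relevant: Set[str], cutoff: int) -> float:
    """F1 of the prediction obtained by keeping only the first `cutoff` labels."""
    predicted = ranked[:cutoff]
    hits = sum(1 for label in predicted if label in relevant)
    if hits == 0:
        return 0.0
    p = hits / len(predicted)
    r = hits / len(relevant)
    return 2 * p * r / (p + r)


# Hypothetical ranked list for one data point that has only two relevant labels.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]
relevant = {"a", "b"}

print(precision_at_k(ranked, relevant, 5))   # fixed k=5 penalizes 3 forced irrelevant labels
print(recall_at_k(ranked, relevant, 100))    # k=100 gives perfect recall despite 6 irrelevant labels
print(f1_of_truncation(ranked, relevant, 2)) # truncating at the true count is ideal here
```

Because the number of relevant labels differs across data points, no single k suits them all; a per-instance cutoff, which is what list truncation predicts, avoids both failure modes in this toy case.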