Measuring feature quality for improved ranking performance
2023
Learning-to-rank models are typically evaluated on how well they estimate user behaviour, which makes output metrics such as NDCG the obvious choice. A model is considered to perform well if it predicts the correct ranked ordering; otherwise it is considered to be of poor quality. However, a model's performance depends not only on its predictive power but also on the quality of its input features, so evaluation via output metrics such as NDCG does not fully reflect the underlying problem. In this paper we introduce a simple feature coverage metric (FeCo) that can be used to track feature quality for diagnostic purposes as well as to gain insight into model performance. Our experiments show that the FeCo score is correlated with output metrics such as NDCG. We also found that even a small change in the FeCo score during training can have a significant impact on a feature's contribution to the model. Our findings offer a perspective on a 360-degree evaluation of model performance for ranking in a production setup.
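The abstract does not define FeCo exactly, but one natural reading of a feature coverage score is the fraction of examples for which a feature carries a usable (non-missing) value. The sketch below is only an illustrative assumption of such a per-feature coverage computation, not the paper's actual definition; the function name `feature_coverage` and the `missing_value` sentinel are hypothetical.

```python
import numpy as np

def feature_coverage(X, missing_value=np.nan):
    """Fraction of rows in which each feature has a usable value.

    Hypothetical sketch of a coverage-style score; the paper's exact
    FeCo definition is not given in the abstract.
    X: 2-D array of shape (n_samples, n_features).
    """
    if np.isnan(missing_value):
        present = ~np.isnan(X)          # treat NaN as "feature missing"
    else:
        present = X != missing_value    # treat a sentinel value as missing
    return present.mean(axis=0)         # per-feature coverage in [0, 1]

# Toy example: the second feature is missing in half of the rows.
X = np.array([
    [1.0, np.nan],
    [2.0, 0.5],
    [3.0, np.nan],
    [4.0, 0.7],
])
print(feature_coverage(X))  # -> [1.  0.5]
```

Tracking such a score per feature over time (e.g. per training snapshot or per serving window) would give the kind of diagnostic signal the abstract describes, independently of output metrics like NDCG.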