- RecSys 2024: For many recommender systems, the primary data source is a historical record of user clicks. The associated click matrix is often very sparse, as the number of users × products can be far larger than the number of clicks. Such sparsity is accentuated in cold-start settings, which makes the efficient use of metadata information of paramount importance. In this work, we propose a simple approach to address …
- ICPR 2024: This paper improves upon existing data pruning methods for image classification by introducing a novel pruning metric and a pruning procedure based on importance sampling. The proposed pruning metric explicitly accounts for data separability, data integrity, and model uncertainty, while the sampling procedure is adaptive to the pruning ratio and considers both intra-class and inter-class separation to further … (a generic importance-sampling sketch appears after this list)
- JSM 2024: SHAP (SHapley Additive exPlanations) is widely used for machine learning model explanations, especially for complex, black-box models (deep learning models, ensemble models). SHAP assigns a feature contribution to every record. Users can inspect each individual record's feature contributions or use the mean absolute SHAP values over the entire dataset as the SHAP feature importance. But it is not uncommon … (see the mean-absolute-SHAP sketch after this list)
- AutoML 2024: We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1,310 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it allows performing analyses such as comparing hyperparameter optimization against current AutoML systems while also considering …
- 2024: Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time, completely random selection of the data. However, by employing tailored sampling and estimation strategies, one can obtain more precise estimates and reduce annotation costs. In this paper, we propose a statistical framework … (a toy stratified-sampling sketch appears after this list)
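The ICPR 2024 entry above describes a pruning procedure based on importance sampling. Below is a minimal, generic sketch of importance-sampling-based subset selection given assumed per-example scores; the function name and score definition are illustrative and are not the paper's metric or procedure.

```python
# Generic importance-sampling pruning sketch (NOT the ICPR 2024 paper's
# method): keep a fraction of the data, sampling examples with
# probability proportional to an assumed per-example importance score.
import numpy as np

def prune_by_importance(scores, keep_ratio, seed=0):
    """Return indices of examples to keep.

    scores     -- non-negative importance per example (e.g., model
                  uncertainty); an illustrative stand-in, not the
                  paper's separability/integrity/uncertainty metric.
    keep_ratio -- fraction of the dataset retained after pruning.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    probs = scores / scores.sum()
    return rng.choice(scores.size, size=n_keep, replace=False, p=probs)

# Example: keep 30% of 10,000 examples, favoring high-score ones.
kept = prune_by_importance(np.random.rand(10_000), keep_ratio=0.3)
print(kept.size)  # 3000
```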
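For the JSM 2024 entry, here is a minimal sketch of the mean-absolute-SHAP feature importance it mentions, using the shap library; the tree model and dataset are arbitrary choices for illustration, not from the paper.

```python
# Mean |SHAP| per feature as a global importance score: a small,
# self-contained illustration with an arbitrary model and dataset.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)           # per-record feature contributions
importance = np.abs(shap_values).mean(axis=0)    # mean absolute SHAP per feature

for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")
```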
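And for the last (2024) entry, a toy sketch of the general idea behind tailored sampling for accuracy estimation: stratify the unlabeled pool by model confidence, spend a fixed annotation budget across strata, and combine per-stratum accuracies with population weights. The strata, budget, and synthetic data below are assumptions, not the paper's framework.

```python
# Toy stratified accuracy estimation (NOT the paper's framework), on a
# synthetic pool where correctness correlates with model confidence.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
confidence = rng.beta(5, 2, size=n)          # model confidence per example
correct = rng.random(n) < confidence         # hidden ground truth ("oracle" labels)

budget = 1_000                               # total annotation budget
edges = np.quantile(confidence, [0.25, 0.5, 0.75])
strata = np.digitize(confidence, edges)      # 4 confidence strata

estimate = 0.0
for s in range(4):
    idx = np.flatnonzero(strata == s)
    sample = rng.choice(idx, size=budget // 4, replace=False)
    estimate += (idx.size / n) * correct[sample].mean()

print("stratified estimate:", round(estimate, 4))
print("true accuracy:      ", round(correct.mean(), 4))
```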
Related content
- July 13, 2021: Innovative faculty proposals will explore various aspects of trustworthy machine learning.
- July 7, 2021: James Hensman joins an effort to expand machine learning talent for UN sustainability goals.
- June 29, 2021: How Amazon’s Delivery Experience team acts as a concierge for customers.
- June 28, 2021: Didn't get the opportunity to attend the summit earlier this month? Now available on demand: presentations on the science of machine learning by leading scholars, a fireside chat with Andrew Ng, and more career-growth content.
- June 22, 2021: Scientists describe the use of privacy-preserving machine learning to address privacy challenges in XGBoost training and prediction.
- June 21, 2021: Özer’s paper, published in INFORMS’ Management Science in 2021, explores the dynamics behind “cheap-talk” communications.