Leveraging structural information in tree ensembles for table representation learning

Nikhil Pattisapu; Siva Rajesh Kasa; Sumegh Roychowdhury; Karan Gupta; Anish Bhanushali; Prasanna Srinivasa Murthy

2025

Tabular data is one of the most common data formats on the web and is widely used in domains such as finance, banking, e-commerce, and medicine. Although deep neural networks (DNNs) have demonstrated outstanding performance on homogeneous data such as visual, audio, and textual data, tree ensemble methods such as Gradient Boosted Decision Trees (GBDTs) are often the go-to choice for supervised machine learning problems involving heterogeneous tabular data. However, a major limitation of tree ensembles is the difficulty of plugging in other modalities (such as text and images), which is readily achievable with deep learning (DL) models. To bridge this gap, researchers have proposed a multitude of DL approaches tailored specifically to tabular data. In this work, we propose a new path embedding-based method that harnesses the structural information in tree ensembles to improve tabular data representation. Our approach not only outperforms existing DL models on tabular classification tasks but also outperforms competitive baselines when combined with textual data in multimodal tabular transformers.
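To make "structural information from tree ensembles" concrete, the sketch below shows one plausible reading: for each input row, record the sequence of nodes it visits in every tree. It is an illustration only, not the authors' method. The `Node` class, the `decision_path` and `ensemble_paths` helpers, and the toy trees and thresholds are all hypothetical, and the abstract does not say how paths are actually encoded or embedded.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One node of a hypothetical, already-fitted decision tree."""
    node_id: int
    feature: Optional[int] = None   # index of the feature tested; None marks a leaf
    threshold: float = 0.0          # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def decision_path(root: Node, x: Sequence[float]) -> List[int]:
    """Return the ids of the nodes that sample x visits, from root to leaf."""
    path: List[int] = []
    node: Optional[Node] = root
    while node is not None:
        path.append(node.node_id)
        if node.feature is None:    # reached a leaf
            break
        node = node.left if x[node.feature] <= node.threshold else node.right
    return path


def ensemble_paths(trees: Sequence[Node], x: Sequence[float]) -> List[List[int]]:
    """Collect one root-to-leaf path per tree (assumed input to an embedding step)."""
    return [decision_path(tree, x) for tree in trees]


# Two toy trees over features (age, balance); structure and thresholds are made up.
tree_a = Node(0, feature=0, threshold=30.0,
              left=Node(1),
              right=Node(2, feature=1, threshold=1000.0,
                         left=Node(3), right=Node(4)))
tree_b = Node(0, feature=1, threshold=500.0, left=Node(1), right=Node(2))

paths = ensemble_paths([tree_a, tree_b], x=[45.0, 250.0])
print(paths)
```

Each inner list is a sequence of node ids that could, under this assumption, be treated like tokens and passed to a learned embedding layer alongside text tokens. A leaf index alone would discard the intermediate splits, which is why the full path is kept here.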
