SST: Semantic and structural transformers for hierarchy-aware language models in e-commerce
Hierarchies are common structures for organizing data, such as the product hierarchies associated with e-commerce catalogs. Using these product hierarchies, we aim to learn hierarchy-aware product text embeddings that improve fine-tuning performance on a variety of downstream e-commerce tasks. Existing methods leverage hierarchies either by aligning text embeddings to separate hierarchical embeddings or by encoding the hierarchical information implicitly within a unified text Transformer. Although these models are optimized to predict hierarchy information, fine-tuning them further on new tasks is non-trivial. To bridge this gap, we propose a pre-training architecture that implicitly encodes the hierarchy within the product text, and then directly leverage a subset of the pre-trained model during fine-tuning. Pre-training is done through Semantic and Structural Transformers (SST): the Semantic-Transformer first encodes the product text into a contextual embedding, which the Structural-Transformer then uses to infer the product’s path in the hierarchy. Fine-tuning uses only the initial Semantic-Transformer, since hierarchy-aware text embeddings have already been learned. With this design, we eliminate the need to link each fine-tuning dataset to a corresponding hierarchy. This yields fine-tuning improvements on critical downstream e-commerce tasks over existing state-of-the-art hierarchy models, even when hierarchy data is available during fine-tuning. Moreover, the improvement persists after augmenting our baseline models to support fine-tuning. We conclude by discussing how such implicit structural encodings can be leveraged beyond the e-commerce domain.
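The two-stage pipeline above can be illustrated with a minimal NumPy sketch. All dimensions, pooling choices, and the per-level path classifier below are illustrative assumptions for exposition; the abstract does not specify the actual model internals, so real Transformer layers are replaced with simple stand-in functions.

```python
import numpy as np

# Toy sketch of the SST pre-training pipeline. All components are
# illustrative assumptions, not the paper's exact architecture.
rng = np.random.default_rng(0)
d = 16        # contextual embedding dimension (assumed)
vocab = 100   # token vocabulary size (assumed)
n_nodes = 8   # candidate hierarchy nodes per level (assumed)
depth = 3     # length of a product's path in the hierarchy (assumed)

# Stand-in for the Semantic-Transformer: embeds product-text tokens
# and mean-pools them into one contextual product embedding.
tok_emb = rng.normal(size=(vocab, d))

def semantic_encoder(token_ids):
    return tok_emb[token_ids].mean(axis=0)  # shape (d,)

# Stand-in for the Structural-Transformer: scores hierarchy nodes at
# each level of the path, conditioned on the semantic embedding.
W_path = rng.normal(size=(depth, d, n_nodes))

def structural_decoder(z):
    logits = np.einsum("d,ldn->ln", z, W_path)  # (depth, n_nodes)
    return logits.argmax(axis=1)                # predicted path node ids

# Pre-training would train both stages on the path-prediction signal;
# fine-tuning keeps only semantic_encoder, so downstream tasks need
# no hierarchy labels.
product_tokens = np.array([3, 17, 42, 7])
z = semantic_encoder(product_tokens)
path = structural_decoder(z)
assert z.shape == (d,)
assert path.shape == (depth,)
```

The key design point the sketch mirrors is the asymmetry between pre-training and fine-tuning: the structural head consumes the semantic embedding during pre-training, but is discarded afterward, leaving a plain text encoder whose weights already carry the hierarchy signal.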