Dual attentional higher order factorization machines
Numerous problems of practical significance such as clickthrough rate (CTR) prediction, forecasting, tagging and so on, involve complex interaction of various user, item and context features. Manual feature engineering has been used in the past to model these combinatorial features but it requires domain expertise and becomes prohibitively expensive as the number of features increases. Feedforward neural networks alleviate the need for manual feature engineering to a large extent and have shown impressive performance across multiple domains due to their ability to learn arbitrary functions. Despite multiple layers of non-linear projections, neural networks are limited in their ability to efficiently model functions with higher order interaction terms. In recent years, Factorization Machines and its variants have been proposed to explicitly capture higher order combinatorial interactions. However not all feature interactions are equally important, and in sparse data settings, without a suitable suppression mechanism, this might result into noisy terms during inference and hurt model generalization. In this work we present Dual Attentional Higher Order Factorization Machine (DA-HoFM), a unified attentional higher order factorization machine which leverages a compositional architecture to compute higher order terms with complexity linear in terms of maximum interaction degree. Equipped with sparse dual attention mechanism, DA-HoFM summarizes interaction terms at each layer, and is able to efficiently select important higher order terms. We empirically demonstrate effectiveness of our proposed models on the task of CTR prediction, where our model exhibits superior performance compared to the recent state-of-the-art models, outperforming them by up to 6.7% on the logloss metric.