Two-stream hybrid attention network for multimodal classification

Qipin Chen; Zhenyu Shi; Zhen ZUO; Jinmiao Fu; Yi Sun

2021

As the number of products on modern e-commerce platforms like Amazon grows rapidly, precise and efficient product classification becomes a key lever for a great customer shopping experience. In large-scale product classification, a major challenge is how to leverage multimodal product information (e.g., image and text). One of the most successful directions is attention-based deep multimodal learning, which mainly comprises two types of frameworks: 1) keyless attention, which learns the importance of features within each modality; and 2) key-based attention, which learns the importance of features using other modalities. In this paper, we propose a novel Two-stream Hybrid Attention Network (HANet), which leverages both key-based and keyless attention mechanisms to capture the key information across the product image and title modalities. We experimentally show that HANet achieves state-of-the-art performance on an Amazon-scale product classification problem.
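The distinction between the two attention families can be illustrated with a minimal pure-Python sketch. This is not the HANet architecture from the paper. The scoring functions are illustrative assumptions: a learned query vector for keyless attention, and scaled dot-product scoring against a key pooled from the other modality for key-based attention. The concatenation fusion is also an assumption, and all function and parameter names (`keyless_attention`, `key_based_attention`, `hybrid_fusion`, `image_query`, `title_query`) are hypothetical. The sketch also assumes image-region and title-token features have already been projected to a shared dimension.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(features: List[Vector], weights: List[float]) -> Vector:
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def mean_pool(features: List[Vector]) -> Vector:
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def keyless_attention(features: List[Vector], query: Vector) -> Vector:
    """Weights the features of ONE modality using only a learned vector
    (hypothetical `query`); no information from the other modality is used."""
    weights = softmax([dot(query, f) for f in features])
    return weighted_sum(features, weights)


def key_based_attention(features: List[Vector], key: Vector) -> Vector:
    """Weights the features of one modality using a key derived from ANOTHER
    modality. Scaled dot-product scoring is an assumption for illustration."""
    scale = math.sqrt(len(key))
    weights = softmax([dot(key, f) / scale for f in features])
    return weighted_sum(features, weights)


def hybrid_fusion(image_regions: List[Vector], title_tokens: List[Vector],
                  image_query: Vector, title_query: Vector) -> Vector:
    """Hypothetical two-stream fusion: each modality gets a keyless summary and
    a key-based summary keyed on the other modality's mean-pooled features;
    the four summaries are concatenated into one vector for a classifier."""
    img_keyless = keyless_attention(image_regions, image_query)
    txt_keyless = keyless_attention(title_tokens, title_query)
    img_keyed = key_based_attention(image_regions, mean_pool(title_tokens))
    txt_keyed = key_based_attention(title_tokens, mean_pool(image_regions))
    return img_keyless + img_keyed + txt_keyless + txt_keyed


if __name__ == "__main__":
    image_regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    title_tokens = [[0.5, 0.5], [2.0, 0.0]]
    fused = hybrid_fusion(image_regions, title_tokens,
                          image_query=[1.0, 0.0], title_query=[0.0, 1.0])
    print(len(fused))  # 4 summaries of dimension 2 -> 8
```

The design choice worth noting is where the attention scores come from. In the keyless functions they depend only on the modality's own features and a fixed learned vector. In the key-based functions they change whenever the other modality changes, which is what lets, say, a product title steer which image regions are emphasized.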
