One-stage object referring with gaze estimation

Jianhang Chen; Xu Zhang; Yue (Rex) Wu; Shalini Ghosh; Pradeep Natarajan; Shih-Fu Chang; Jan Allebach

2022

The classic object referring task aims to localize the referred object in an image and requires a reference image and a natural language description as inputs. Given that a gaze signal can be easily obtained by a modern human-computer interaction system with a camera, and that humans tend to look at an object when referring to it, we propose a novel gaze-assisted object referring framework. The formulation not only simplifies the state-of-the-art gaze-assisted object referring system, which requires many input signals besides gaze, but also incorporates the one-stage object detection idea to improve inference efficiency. More importantly, it implicitly considers all object candidates and thus resolves the main pain point of existing two-stage object referring solutions, namely proposing an appropriate number of candidates: the number cannot be too large, otherwise the computational cost becomes prohibitive, and it cannot be too small, otherwise the chance of missing the referred object becomes significant. To utilize the gaze information, we build a gaze heatmap from the anchor position encoding map and the gaze prediction result. The gaze heatmap and the language feature are then merged into the feature pyramid of the object detector to form the final one-stage referring system. On the CityScapes-OR dataset, the proposed method outperforms the state of the art by 7.8% in Acc@1.
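The abstract does not spell out how the gaze heatmap is computed, so the following is only an illustrative sketch of the general idea of combining an anchor position map with a gaze prediction. It assumes a 2D gaze direction and eye position in image coordinates, and it scores each anchor by the cosine similarity between the gaze direction and the direction from the eye to the anchor. The function names, the `stride` and `sharpness` parameters, and the scoring rule are all hypothetical and are not taken from the paper.

```python
import math


def anchor_position_map(height, width, stride):
    """Return the (x, y) image-space centers of anchors on a feature grid.

    Hypothetical helper: one anchor center per cell of a height x width
    feature map whose cells are `stride` pixels apart.
    """
    return [
        [((col + 0.5) * stride, (row + 0.5) * stride) for col in range(width)]
        for row in range(height)
    ]


def gaze_heatmap(anchors, eye_xy, gaze_dir, sharpness=8.0):
    """Score every anchor by how well it lines up with the gaze ray.

    Assumed scoring rule (not from the paper):
        score = max(0, cos(angle between gaze and eye->anchor)) ** sharpness
    Anchors behind the viewer, or exactly at the eye position, score 0.
    """
    gx, gy = gaze_dir
    norm = math.hypot(gx, gy)
    if norm == 0:
        raise ValueError("gaze_dir must be a non-zero vector")
    gx, gy = gx / norm, gy / norm

    heatmap = []
    for row in anchors:
        out_row = []
        for ax, ay in row:
            dx, dy = ax - eye_xy[0], ay - eye_xy[1]
            dist = math.hypot(dx, dy)
            if dist == 0:
                out_row.append(0.0)
                continue
            cos_sim = (dx * gx + dy * gy) / dist
            out_row.append(max(0.0, cos_sim) ** sharpness)
        heatmap.append(out_row)
    return heatmap


# Example: a 4x4 grid with stride 8, viewer at (4, 4) looking to the right.
anchors = anchor_position_map(4, 4, 8)
heat = gaze_heatmap(anchors, eye_xy=(4.0, 4.0), gaze_dir=(1.0, 0.0))
```

In a one-stage detector, a map like `heat` would be computed once per feature pyramid level (each with its own stride) so that it can be combined with the visual and language features at that level. How the combination is done is left open here, since the abstract only states that the features are merged.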
