Multi-object tracking with hallucinated and unlabeled video

Daniel McKee; Bing Shuai; Andrew Berneshawi; Manchen Wang; Davide Modolo; Svetlana Lazebnik; Joe Tighe

2021

In this paper, we explore learning end-to-end deep neural trackers without tracking annotations. This matters because large-scale training data is essential for deep neural trackers, while tracking annotations are expensive to acquire. We first hallucinate videos from images with bounding box annotations, using motion transformations along with simulated video effects to create a diverse tracking dataset. We then use a tracker trained on our hallucinated data to mine hard examples from a pool of unlabeled real videos. We propose an optimization-based connecting process that first identifies and then rectifies hard examples from the unlabeled videos. The output of this process is a set of mined hard examples with refined pseudo labels. We train jointly on hallucinated data and mined hard video examples, and our tracker achieves state-of-the-art performance on the MOT17 and TAO-person datasets.
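
To make the idea of hallucinating video from a still image concrete, the following is a minimal sketch and not the paper's implementation. It assumes the motion transformation is a simple virtual camera that pans and zooms over one annotated image; the function name `hallucinate_tracks` and its parameters `pan` and `zoom` are hypothetical. Each annotated box keeps the same track identity in every generated frame, which is what yields tracking labels without any tracking annotation. The same per-frame transform would also be applied to the image pixels, and the simulated video effects mentioned in the abstract are omitted here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def hallucinate_tracks(
    boxes: List[Box],
    num_frames: int,
    pan: Tuple[float, float] = (3.0, 1.0),  # assumed camera shift per frame (px)
    zoom: float = 1.02,                     # assumed zoom factor per frame
) -> List[List[Tuple[int, Box]]]:
    """Return, for each frame, a list of (track_id, box) pairs.

    Frame t views the still image through a virtual camera that has
    moved by t * pan and zoomed by zoom ** t, so every box is shifted
    and scaled by the same affine transform.
    """
    frames = []
    for t in range(num_frames):
        scale = zoom ** t
        off_x, off_y = pan[0] * t, pan[1] * t
        frame = []
        for track_id, (x1, y1, x2, y2) in enumerate(boxes):
            moved = (
                (x1 - off_x) * scale,
                (y1 - off_y) * scale,
                (x2 - off_x) * scale,
                (y2 - off_y) * scale,
            )
            # The index of the box in the source image serves as its track id.
            frame.append((track_id, moved))
        frames.append(frame)
    return frames


if __name__ == "__main__":
    # Two annotated boxes from one image, expanded into a 4-frame pseudo video.
    for t, frame in enumerate(hallucinate_tracks([(10, 20, 50, 100), (200, 40, 260, 180)], 4)):
        print(t, frame)
```

A real pipeline would likely sample the motion parameters randomly per clip and per object to obtain diverse trajectories; the fixed pan and zoom here only keep the sketch short.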
