Rethinking zero-shot video classification: end-to-end training for realistic applications

Biagio Brattoli; Joe Tighe; Fedor Zhdanov; Pietro Perona; Krzysztof Chalupka

2020

Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state of the art by a wide margin.
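To make the zero-shot setting concrete, the following is a minimal sketch of one common way ZSL inference is set up: a video is mapped to a vector in a shared embedding space, and the predicted label is the unseen class whose embedding is most similar to it. This is a generic illustration, not the authors' implementation; the abstract does not specify the inference procedure. The function names, the cosine-similarity rule, and the toy three-dimensional embeddings are all assumptions made for illustration. In practice the video vector would come from a trained visual model such as the 3D CNN mentioned above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(video_embedding, class_embeddings):
    """Return the class name whose embedding is closest to the video embedding.

    class_embeddings maps class name -> vector. The classes may be ones
    never seen during training; only their embeddings are needed here.
    """
    return max(
        class_embeddings,
        key=lambda name: cosine(video_embedding, class_embeddings[name]),
    )


# Toy, made-up embeddings for three classes absent from training (assumption).
unseen_classes = {
    "archery": [1.0, 0.0, 0.0],
    "surfing": [0.0, 1.0, 0.0],
    "juggling": [0.0, 0.0, 1.0],
}

# Hypothetical output of a visual model for one test video.
video_embedding = [0.1, 0.9, 0.2]

predicted = zero_shot_classify(video_embedding, unseen_classes)
print(predicted)
```

Because classification reduces to a similarity lookup, new classes can be added at test time by supplying their embeddings, without retraining the visual model.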
