Detecting text-rich objects: OCR or object detection? A case study with stopwatch detection
2023
In this paper, we study the problem of detecting objects with rich textual features in images. One example is detecting stopwatch regions in sports videos. We propose a novel approach that combines image features with text features for object detection, and we benchmark it against a traditional OCR-based method and an object detection method that uses image features only. In particular, we modify the Faster R-CNN model to accept input images with more than three channels, where the additional channels carry text features. We demonstrate the effectiveness of the proposed method through extensive experiments on various sports datasets and analyze its accuracy and robustness.
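To make the idea of "more than three channels" concrete, the following is a minimal sketch of how a text-feature channel might be built and stacked onto an RGB image. The abstract does not say how the text features are encoded, so the binary mask rasterized from OCR word boxes is an assumption made for illustration only, and the names `text_mask` and `stack_channels` are hypothetical. The sketch uses plain Python lists so that it runs without any dependencies. It does not show the detector itself, whose first convolution layer would also have to accept the extra channel.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]
Channel = List[List[float]]


def text_mask(height: int, width: int, boxes: List[Box]) -> Channel:
    """Rasterize OCR word boxes into one H x W channel (1.0 inside any box)."""
    mask = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1.0
    return mask


def stack_channels(image: List[Channel], extra: Channel) -> List[Channel]:
    """Append an H x W channel to a C x H x W image, giving (C + 1) x H x W."""
    if len(extra) != len(image[0]) or len(extra[0]) != len(image[0][0]):
        raise ValueError("extra channel must match the image height and width")
    return image + [extra]


# Toy 3 x 4 x 6 RGB image and one assumed OCR box covering a stopwatch readout.
rgb = [[[0.5] * 6 for _ in range(4)] for _ in range(3)]
boxes = [(1, 1, 4, 3)]
image4 = stack_channels(rgb, text_mask(4, 6, boxes))
```

Here `image4` has four channels: the three original colour planes plus the assumed text mask. Richer encodings (for example OCR confidence or character-class maps) would add further channels in the same way.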