Robust actor recognition in entertainment multimedia at scale
Actor identification and localization in movies and TV series seasons can enable deeper engagement with the content. Manually identifying and tagging actors at every time instance in a video is error-prone because it is a highly repetitive, decision-intensive and time-consuming task. The goal of this paper is to accurately label as many faces as possible in a video with actor names. We solve this problem with a multi-step clustering process followed by the selection of face instances that are (a) representative of the clusters they belong to and (b) aesthetically pleasing for visual identification. These face instances can then be matched to actor names, by automated or manual techniques, to complete actor tagging. The solution is further optimized for TV series seasons with recurring cast members, which constitute the majority of entertainment multimedia content. In such titles, face labels from previous episodes are efficiently reused to pre-label faces in each subsequent episode. We guarantee the same level of accuracy when the solution is scaled from individual titles to entire TV series seasons. The solution works in a fully realistic setup in which the only input is the raw video. To our knowledge, this is the first work to demonstrate its robustness on more than 5,000 TV episodes and movies across different genres, languages and runtimes, featuring actors of diverse ethnicity, race, gender identity, age and other attributes. The proposed solution establishes a new state of the art for cluster purity in both movies and TV series seasons, achieving near-perfect cluster homogeneity.
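The abstract does not specify the clustering algorithm, the representativeness and aesthetic criteria, or the rule for carrying labels across episodes. The sketch below is therefore a hypothetical illustration of those three ideas, not the paper's method. It assumes face clusters have already been formed, that each face is described by an embedding vector, and that an aesthetic "quality" score in [0, 1] is available per face. The function names `pick_representative` and `prelabel` and the parameters `alpha` and `threshold` are invented for illustration. `cluster_purity` uses the standard definition of purity (the fraction of faces that carry their cluster's majority identity).

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def pick_representative(embeddings: List[Vector], quality: List[float],
                        alpha: float = 0.5) -> int:
    """Hypothetical selection rule: index of the face that best balances
    closeness to the cluster centroid (representativeness) and its quality
    score (a stand-in for 'aesthetically pleasing'), weighted by alpha."""
    c = centroid(embeddings)
    return max(range(len(embeddings)),
               key=lambda i: alpha * cosine(embeddings[i], c)
               + (1 - alpha) * quality[i])


def prelabel(cluster: List[Vector], known: Dict[str, Vector],
             threshold: float = 0.8) -> Optional[str]:
    """Hypothetical pre-labeling: give a new-episode cluster the name of the most
    similar labeled centroid from earlier episodes, or None if no labeled
    centroid reaches the similarity threshold."""
    c = centroid(cluster)
    best_name, best_sim = None, threshold
    for name, ref in known.items():
        sim = cosine(c, ref)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


def cluster_purity(clusters: List[List[str]]) -> float:
    """Standard purity: sum of each cluster's majority-identity count divided by
    the total number of faces. clusters holds ground-truth identity labels."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(c).values()) for c in clusters if c)
    return majority / total


if __name__ == "__main__":
    faces = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8]]
    print(pick_representative(faces, quality=[0.2, 0.9, 0.5]))
    known = {"Actor A": [1.0, 0.0], "Actor B": [0.0, 1.0]}
    print(prelabel([[0.9, 0.1], [1.0, 0.0]], known))
    print(cluster_purity([["A", "A", "B"], ["B", "B"]]))
```

Clusters that `prelabel` leaves as `None` would be the ones whose representative faces still need automated or manual naming. Weighting centroid similarity against quality through a single `alpha` keeps the trade-off between "typical" and "recognizable" faces explicit and easy to tune.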