Self-supervised incremental learning of object representations from arbitrary image sets
2025
Computing a comprehensive and robust visual representation of an arbitrary object or category of objects is a complex problem. The difficulty increases when one starts from a set of uncalibrated images obtained from different sources. We propose a self-supervised approach, Multi-Image Latent Embedding (MILE), which computes a single representation from such an image set. MILE operates incrementally, considering one image at a time and processing the various depictions of the class through a shared gated cross-attention mechanism. The representation is progressively refined as more images become available, without requiring additional training. Our experiments on Amazon Berkeley Objects (ABO) and iNaturalist demonstrate its effectiveness on two tasks: object- or category-specific image retrieval and unsupervised context-conditioned object segmentation. Moreover, the proposed multi-image input setup opens new frontiers for the task of object retrieval. Our studies indicate that our models capture descriptive representations that better encapsulate the intrinsic characteristics of the objects.
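The incremental refinement described above can be illustrated with a minimal sketch: a latent object embedding attends over each new image's token features through a gated cross-attention step, with the same (shared) weights reused for every image. This is a hypothetical NumPy illustration under assumed shapes and a scalar tanh gate, not the authors' implementation; all names (`GatedCrossAttention`, `update`, `gate`) are invented for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class GatedCrossAttention:
    """Hypothetical sketch of a MILE-style incremental update (not the paper's code).
    A latent object embedding z queries per-image token features; a tanh gate
    controls how much each new image modifies z. Weights are shared across images,
    so new images refine z without any additional training."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.Wq = rng.normal(0.0, s, (dim, dim))  # query projection for the latent
        self.Wk = rng.normal(0.0, s, (dim, dim))  # key projection for image tokens
        self.Wv = rng.normal(0.0, s, (dim, dim))  # value projection for image tokens
        self.gate = 0.0  # scalar gate; tanh(0) = 0 leaves z unchanged
        self.dim = dim

    def update(self, z, tokens):
        # z: (dim,) current object representation; tokens: (n, dim) image features
        q = z @ self.Wq                               # (dim,)
        k = tokens @ self.Wk                          # (n, dim)
        v = tokens @ self.Wv                          # (n, dim)
        attn = softmax(k @ q / np.sqrt(self.dim))     # (n,) attention over tokens
        return z + np.tanh(self.gate) * (attn @ v)    # gated residual update

# Incremental use: one image at a time, shared module, no retraining between images.
dim = 8
mod = GatedCrossAttention(dim)
mod.gate = 1.0  # stand-in for a trained gate so updates are visible
rng = np.random.default_rng(1)
z = np.zeros(dim)
for _ in range(3):  # three depictions of the same object
    tokens = rng.normal(size=(5, dim))  # stand-in per-image features
    z = mod.update(z, tokens)
```

Note the design choice the sketch mirrors: because the update is a gated residual on a single latent, the representation after k images can be refined by a (k+1)-th image with the same forward pass, matching the abstract's claim that refinement needs no further training.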