-
WACV 2024: Data augmentation is vital for object detection tasks that require expensive bounding box annotations. Recent successes in diffusion models have inspired the use of diffusion-based synthetic images for data augmentation. However, existing works have primarily focused on image classification, and their applicability to boosting object detection performance remains unclear. To address this gap, we propose
-
WACV 2024: The asymmetrical retrieval setting is a well-suited solution for resource-constrained applications such as face recognition and image retrieval. In this setting, a large model is used for indexing the gallery while a lightweight model is used for querying. The key principle in such systems is ensuring that both models share the same embedding space. Most methods in this domain are based on knowledge distillation
-
WACV 2024: Grounding-based vision and language models have been successfully applied to low-level vision tasks, aiming to precisely locate objects referred to in captions. The effectiveness of grounding representation learning relies heavily on the scale of the training dataset. Despite being a useful data enrichment strategy, data augmentation has received minimal attention in existing vision and language tasks as augmentation
-
WACV 2024: Deep Metric Learning (DML) methods aim at learning an embedding space in which distances are closely related to the inherent semantic similarity of the inputs. Previous studies have shown that popular benchmark datasets often contain numerous wrong labels, and DML methods are susceptible to them. Intending to study the effect of realistic noise, we create an ontology of the classes in a dataset and use
-
WACV 2024: Object detection is a fundamental problem in computer vision, whose research has primarily focused on unimodal models, solely operating on visual data. However, in many real-world applications, data from multiple modalities may be available, such as text accompanying the visual data. Leveraging traditional models on these multi-modal data sources may lead to difficulties in accurately delineating object
Related content
-
June 24, 2022: Technique that mixes public and private training data can meet differential-privacy criteria while cutting error increase by 60%-70%.
-
June 24, 2022: The field motivated him to pursue a PhD, which eventually led him to Amazon.
-
June 23, 2022: EMVA Young Professional Award honors “outstanding and innovative work of a student or a young professional in the field of machine vision or image processing.”
-
June 22, 2022: CVPR papers examine the recovery of 3-D information from camera movement and learning general representations from weakly annotated data.
-
June 21, 2022: How she moved across the world to discover a passion for (and a career in) machine learning.
-
June 20, 2022: Amazon’s director of applied science in Adelaide, Australia, believes the economic value of computer vision has “gone through the roof”.