Computer vision

Helping devices see and understand our visual world.

What to look at and where: Semantic and spatial refined transformer for detecting human-object interactions

A S M Iftekhar, Hao Chen, Kaustav Kundu, Xinyu (Arthur) Li, Joe Tighe, Davide Modolo

CVPR 2022

2022

We propose a novel one-stage Transformer-based semantic and spatial refined transformer (SSRT) to solve the Human-Object Interaction (HOI) detection task, which requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules

Computer vision
Transform-retrieve-generate: Natural language-centric outside-knowledge visual question answering

Feng Gao, Qing Ping, Govind Thattai, Aishwarya (Aish) Reganti, Ying Nian Wu, Prem Natarajan

CVPR 2022

2022

Outside-knowledge visual question answering (OKVQA) requires the agent to comprehend the image, make use of relevant knowledge from the entire web, and digest all the information to answer the question. Most previous works address the problem by first fusing the image and question in the multi-modal space, which is inflexible for further fusion with a vast amount of external knowledge. In this paper, we

Computer vision
Omni-DETR: Omni-supervised object detection with transformers

Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto

CVPR 2022

2022

We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled, and weakly labeled annotations, such as image tags, counts, points, etc., for object detection. This is enabled by a unified architecture, Omni-DETR, based on recent progress in student-teacher frameworks and end-to-end transformer-based object detection. Under this unified architecture, different types

Computer vision
Leveraging tensor methods in neural architecture search for the automatic development of lightweight convolutional neural networks

Mayur Dhanaraj, Huyen Do, Dinesh Nair, Cong Xu

SPIE DCS 2022 Big Data IV: Learning, Analytics, and Applications

2022

Most state-of-the-art convolutional neural networks (CNNs) are bulky and cannot be deployed on resource-constrained edge devices. To leverage the exceptional generalizability of CNNs on edge devices, they need to be made efficient in terms of memory usage, model size, and power consumption, while maintaining acceptable performance. Neural architecture search (NAS) is a recent approach for developing

Computer vision
MeMOT: Multi-object tracking with memory

Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, Stefano Soatto

CVPR 2022

2022

We propose an online tracking algorithm that performs object detection and data association under a common framework and is capable of linking objects after a long time span. This is realized by preserving a large spatio-temporal memory to store the identity embeddings of the tracked objects, and by adaptively referencing and aggregating useful information from the memory as needed. Our model, called MeMOT

Computer vision

Courtesy Alla Sheffer

Amazon Scholar Alla Sheffer uses computer graphics to drive improvements in garment sizing and fitting

Douglas Gantenbein

February 24, 2021

Complex algorithms promise to fundamentally change a craft that still relies almost entirely on handwork.

Computer vision
Credit: Glynis Condon

Growing generative adversarial networks, layer by layer

Yuting Zhang

February 16, 2021

A new approach that grows networks dynamically promises improvements over GANs with fixed architectures or predetermined growing strategies.

Machine learning
Prime Video's work on sports field registration, recap/intro detection

Raffay Hamid

January 15, 2021

Two papers at WACV propose neural models for enhancing video-streaming experiences.

Computer vision
Credit: Photos courtesy of the speakers

Amazon at WACV: Computer vision is more than labeling pixels

Larry Hardesty

January 8, 2021

Amazon distinguished scientist Gérard Medioni on the complexities of “understanding your environment through visual input.”

Computer vision
Credit: Glynis Condon

The science behind Amazon's new StyleSnap for Home feature

Liz Sheeley

December 22, 2020

The StyleSnap features for fashion and home are made possible by multiple convolutional neural networks.

Search and information retrieval
How a ‘Think Big’ idea helped bring Lookout for Vision to life

Staff writer

December 3, 2020

Learn about the science behind the new machine learning product for manufacturers — and how a unique approach solved a complex problem.

Machine learning

Computer vision
