Computer vision

Helping devices see and understand our visual world.

Selective feature compression for efficient activity recognition inference

Chunhui Liu, Xinyu (Arthur) Li, Hao Chen, Davide Modolo, Joe Tighe

ICCV 2021

2021

Most action recognition solutions rely on dense sampling to precisely cover the informative temporal clip. Extensively searching the temporal region is expensive for real-world applications. In this work, we focus on improving the inference efficiency of current action recognition backbones on trimmed videos and illustrate that an action model can accurately classify an action with a single pass over the…

Computer vision
Long Short-Term Transformer for online action detection

Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu (Arthur) Li, Wei Xia, Zhuowen Tu, Stefano Soatto

NeurIPS 2021

2021

We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data. It consists of an LSTR encoder that dynamically leverages coarse-scale historical information from an extended temporal window (e.g., 2048 frames spanning up to 8 minutes), together with an LSTR decoder that focuses…

Computer vision
Blending anti-aliasing into vision transformer

Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

NeurIPS 2021

2021

Transformer architectures, based on the self-attention mechanism and a convolution-free design, have recently achieved superior performance and found booming applications in computer vision. However, the discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps, giving rise to the traditional problem of aliasing for vision transformers. The aliasing effect occurs when discrete patterns…
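A common, generic remedy for aliasing in signal processing is low-pass filtering, which suppresses the high-frequency components that show up as jagged patterns. The sketch below applies a 3x3 binomial (approximately Gaussian) blur to a 2D map such as an attention map. It is only an illustration of generic anti-aliasing smoothing, not the module proposed in the paper; the function name `binomial_blur`, the pure-Python nested-list representation, and the edge-replicate padding are all assumptions made for this example.

```python
from typing import List

Grid = List[List[float]]

# Separable 1-2-1 binomial kernel (an assumption for illustration);
# its 3x3 outer product has weights summing to 16.
_KERNEL_1D = (1.0, 2.0, 1.0)


def binomial_blur(attn: Grid) -> Grid:
    """Hypothetical helper: 3x3 binomial low-pass filter with edge-replicate padding."""
    h, w = len(attn), len(attn[0])

    def at(r: int, c: int) -> float:
        # Clamp indices so border cells reuse their nearest in-bounds value.
        r = min(max(r, 0), h - 1)
        c = min(max(c, 0), w - 1)
        return attn[r][c]

    out: Grid = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = 0.0
            # Weighted sum over the 3x3 neighbourhood.
            for dr, kr in zip((-1, 0, 1), _KERNEL_1D):
                for dc, kc in zip((-1, 0, 1), _KERNEL_1D):
                    acc += kr * kc * at(r + dr, c + dc)
            row.append(acc / 16.0)  # normalise so a constant map is unchanged
        out.append(row)
    return out


if __name__ == "__main__":
    # A 0/1 checkerboard is the highest-frequency pattern a grid can hold.
    board = [[float((r + c) % 2) for c in range(6)] for r in range(6)]
    for row in binomial_blur(board):
        print(" ".join(f"{v:.3f}" for v in row))
```

On the checkerboard, every interior cell comes out as exactly 0.5: the filter removes the jagged, alternating component entirely, while a constant map passes through unchanged because the weights are normalised to sum to one.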

Computer vision
Progressive coordinate transforms for monocular 3D object detection

Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

NeurIPS 2021

2021

Recognizing and localizing objects in 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, 3D object detection given only a monocular image remains a great challenge. Although different alternatives exist for tackling this problem, they are found to be either equipped with heavy…

Computer vision
CrossCLR: Cross-modal contrastive learning for multi-modal video representations

Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox

ICCV 2021

2021

Contrastive learning allows us to flexibly define powerful losses by contrasting positive pairs against sets of negative samples. Recently, the principle has also been used to learn cross-modal embeddings for video and text, yet without exploiting its full potential. In particular, previous losses do not take intra-modality similarities into account, which leads to inefficient embeddings, as the same content…
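The contrastive principle described above, in which each positive video/text pair is contrasted against the other samples in the batch as negatives, can be sketched with a generic symmetric InfoNCE loss. This is the kind of baseline cross-modal loss the abstract criticizes, not CrossCLR's proposed loss. The function names, the L2-normalised embeddings, the use of in-batch negatives, and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _directional_loss(queries: List[List[float]], keys: List[List[float]],
                      temperature: float) -> float:
    # For query i, keys[i] is the positive; every other key is an in-batch negative.
    total = 0.0
    for i, q in enumerate(queries):
        logits = [_dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(queries)


def cross_modal_info_nce(video: List[Vector], text: List[Vector],
                         temperature: float = 0.07) -> float:
    """Hypothetical baseline: symmetric video/text InfoNCE with in-batch negatives."""
    if len(video) != len(text) or not video:
        raise ValueError("need equally sized, non-empty batches of paired embeddings")
    v = [l2_normalize(x) for x in video]
    t = [l2_normalize(x) for x in text]
    # Average the video-to-text and text-to-video directions.
    return 0.5 * (_directional_loss(v, t, temperature)
                  + _directional_loss(t, v, temperature))


if __name__ == "__main__":
    vids = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_modal_info_nce(vids, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: near 0
    print(cross_modal_info_nce(vids, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large
```

Only cross-modal similarities enter the logits. Video-to-video and text-to-text similarities are never consulted, which is the limitation the abstract attributes to previous losses.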

Computer vision

Courtesy Alla Sheffer

Amazon Scholar Alla Sheffer uses computer graphics to drive improvements in garment sizing and fitting

Douglas Gantenbein

February 24, 2021

Complex algorithms promise to fundamentally change a craft that still relies almost entirely on handwork.

Computer vision
Credit: Glynis Condon

Growing generative adversarial networks, layer by layer

Yuting Zhang

February 16, 2021

A new approach that grows networks dynamically promises improvements over GANs with fixed architectures or predetermined growing strategies.

Machine learning
Prime Video's work on sports field registration, recap/intro detection

Raffay Hamid

January 15, 2021

Two papers at WACV propose neural models for enhancing video-streaming experiences.

Computer vision
Credit: Photos courtesy of the speakers

Amazon at WACV: Computer vision is more than labeling pixels

Larry Hardesty

January 8, 2021

Amazon distinguished scientist Gérard Medioni on the complexities of “understanding your environment through visual input”.

Computer vision
Credit: Glynis Condon

The science behind Amazon's new StyleSnap for Home feature

Liz Sheeley

December 22, 2020

StyleSnap's fashion and home features are made possible by the use of multiple convolutional neural networks.

Search and information retrieval
How a ‘Think Big’ idea helped bring Lookout for Vision to life

Staff writer

December 3, 2020

Learn about the science behind the new machine learning product for manufacturers — and how a unique approach solved a complex problem.

Machine learning

Computer vision
