Computer vision

Helping devices see and understand our visual world.

A no-reference model for detecting audio artifacts using pretrained audio neural networks

David Higham, Ayush Bagla, Veneta Haralampieva

WACV 2022

2022

This work presents a No-Reference model to detect audio artifacts in video. The model, based upon a Pretrained Audio Neural Network, classifies a 1-second audio segment as either No Defect, Audio Hum, Audio Hiss, Audio Distortion or Audio Clicks. The model achieves a balanced accuracy of 0.986 on our proprietary simulated dataset.
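The abstract reports a balanced accuracy but does not define it. The usual definition, assumed here, is the mean of per-class recall, which keeps a dominant "No Defect" class from inflating the score. The sketch below is illustrative only and is not the authors' evaluation code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes that appear in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels drawn from the five classes named in the abstract.
y_true = ["No Defect"] * 8 + ["Audio Hum"] * 2
y_pred = ["No Defect"] * 8 + ["No Defect", "Audio Hum"]
print(balanced_accuracy(y_true, y_pred))
```

Here plain accuracy would be 9/10, but balanced accuracy is (8/8 + 1/2) / 2 = 0.75, because the single missed "Audio Hum" segment counts for half of that class.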

Related: How Prime Video uses machine learning to ensure video quality

Computer vision
MAPS: Multimodal attention for product similarity

Nilotpal Das, Aniket Joshi, Promod Yenigalla, Gourav Agrwal

WACV 2022

2022

Learning to identify similar products in the e-commerce domain has widespread applications such as ensuring consistent grouping of the products in the catalog, avoiding duplicates in the search results, etc. Here, we address the problem of learning product similarity for highly challenging real-world data from the Amazon catalog. We define it as a metric learning problem, where similar products are projected …
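The abstract is cut off before it names the MAPS training objective, so the sketch below is not that objective. It shows a generic triplet margin loss, one common way to set up metric learning: an anchor product embedding is pulled toward a similar product and pushed away from a dissimilar one. The embeddings and the margin value are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that reaches zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: the negative is already far enough away.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 3.0]))
```

In training, the loss would be averaged over many sampled triplets and minimized with respect to the network that produces the embeddings.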

Related: Using computer vision to weed out product catalogue errors

Computer vision
SeeTek: Very large-scale open-set logo recognition with text-aware metric learning

Chenge Li, István Fehérvári, Xiaonan Zhao, Ives Macedo, Srikar Appalaraju

WACV 2022

2022

Recent advances in deep learning and computer vision have set a new state of the art in logo recognition. Logo recognition has mostly been approached as a closed-set object recognition problem and more recently as an open-set retrieval problem. Current approaches struggle to distinguish visually similar logos, especially in open-set retrieval for very large-scale applications with thousands of brands.
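Open-set retrieval, as opposed to closed-set classification, can be sketched as nearest-neighbour search: a query logo embedding is compared against a gallery of reference embeddings, and if no entry is similar enough, the logo is treated as unknown. New brands can then be supported by adding gallery entries rather than retraining a classifier. This is a generic illustration, assuming non-zero embeddings from some trained model; the brand names, vectors, and threshold are hypothetical, and SeeTek's text-aware component is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query, gallery, threshold=0.8):
    """Return the most similar gallery brand, or None if the best
    similarity falls below `threshold` (open-set rejection)."""
    best_brand, best_score = None, -1.0
    for brand, emb in gallery.items():
        score = cosine(query, emb)
        if score > best_score:
            best_brand, best_score = brand, score
    return best_brand if best_score >= threshold else None


gallery = {"brand_a": [1.0, 0.0], "brand_b": [0.0, 1.0]}  # hypothetical
print(retrieve([0.9, 0.1], gallery))  # close to brand_a
print(retrieve([1.0, 1.0], gallery))  # equally far from both
```

At the scale the abstract describes (thousands of brands), a linear scan like this would typically be replaced by an approximate nearest-neighbour index.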

Computer vision
NUTA: Non-uniform temporal aggregation for action recognition

Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joe Tighe

WACV 2022

2022

In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video. These methods typically uniformly sample a segment of an input clip (along the temporal dimension). However, not all parts of a video are equally important to determine the action in the clip. In this work, we focus instead on learning where …
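The uniform temporal sampling that the abstract contrasts against can be sketched as follows: split a clip into equal-length chunks and take the centre frame of each. This reflects a common convention and is an assumption, not code from the paper; NUTA's learned, non-uniform aggregation is not shown.

```python
def uniform_sample(num_frames, num_segments):
    """Pick one frame index from the centre of each of `num_segments`
    equal-length chunks of a clip with `num_frames` frames."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


print(uniform_sample(16, 4))
```

Every chunk gets exactly one frame regardless of content, which is the limitation the abstract points to: the frames that matter most for the action may fall between the sampled positions.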

Related: WACV: Transformers for video and contrastive learning

Computer vision
SSCAP: Self-supervised co-occurrence action parsing for unsupervised temporal action segmentation

Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joe Tighe, Charless Fowlkes

WACV 2022

2022

Temporal action segmentation is the task of classifying each frame in a video with an action label. However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset. Thus in this work we propose an unsupervised method, namely SSCAP, that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the …
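The link between per-frame labels and temporal segments can be made concrete with a small helper that collapses consecutive identical frame labels into (label, start, end) runs. The action names are hypothetical and the end index is exclusive by choice; this illustrates the output format only, not SSCAP itself.

```python
from itertools import groupby


def frames_to_segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


frames = ["pour", "pour", "stir", "stir", "stir", "pour"]  # hypothetical
print(frames_to_segments(frames))
```

A label that reappears later in the video, like "pour" here, becomes a separate segment.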

Related: WACV: Transformers for video and contrastive learning

Computer vision

Johns Hopkins University

René Vidal wins Edward J. McCluskey Technical Achievement Award

Staff writer

February 2, 2022

The Amazon Scholar and Johns Hopkins University professor was honored for “pioneering contributions to subspace clustering”.

Computer vision
Hierarchical representations improve image retrieval

Muhammet Bastan

January 14, 2022

A new metric-learning loss function groups together superclasses and learns commonalities within them.

Computer vision
Using computer vision to weed out product catalogue errors

Nilotpal Das

January 10, 2022

Method uses metric learning to determine whether images depict the same product.

Computer vision
WACV: Transformers for video and contrastive learning

Larry Hardesty

January 6, 2022

Amazon’s Joe Tighe on the major trends he sees in the field of computer vision.

Computer vision
How deep learning is reducing Amazon’s packaging waste

Sean O'Neill

January 4, 2022

A combination of deep learning, natural language processing, and computer vision enables Amazon to home in on the right amount of packaging for each product.

Sustainability
Amazon presents new method for "debugging" machine learning models

Dylan Slack, Nathalie Rauschmayr

December 7, 2021

Synthetic data produced by perturbing test inputs identify error classes and provide additional data for retraining.
