Computer vision

Helping devices see and understand our visual world.

GNHK: A dataset for English handwriting in the wild

Alex W. C. Lee, Jonathan Chung, Marco Lee

ICDAR 2021

2021

In this paper, we present the GoodNotes Handwriting Kollection (GNHK) dataset. The GNHK dataset includes unconstrained camera-captured images of English handwritten text sourced from different regions around the world. The dataset is modeled after scene text datasets, allowing researchers to investigate new localisation and text recognition techniques. We presented benchmark text localisation and recognition

Computer vision
Assessment of subjective and objective quality of live streaming sports videos

Zaixi Shang, Joshua P. Ebenezer, Alan C. Bovik, Yongjun Wu, Hai Wei, Sriram Sethuraman

35th Picture Coding Symposium

2021

Video live streaming is gaining prevalence among video streaming services, especially for the delivery of popular sporting events. Many objective Video Quality Assessment (VQA) models have been developed to predict the perceptual quality of videos. Appropriate databases that exemplify the distortions encountered in live streaming videos are important to designing and learning objective VQA models. Towards

Computer vision
A unified efficient pyramid transformer for semantic segmentation

Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li

ICCV 2021 Workshop on the 1st Video Scene Parsing in the Wild Challenge

2021

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most literature either focuses on context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts

Computer vision
TesseTrack: End-to-end learnable multi-person articulated 3D pose tracking

N Dinesh Reddy, Laurent Guigues, Leonid Pishchulin, Jayan Eledath, Srinivasa G. Narasimhan

CVPR 2021

2021

We consider the task of 3D pose estimation and tracking of multiple people seen in an arbitrary number of camera feeds. We propose TesseTrack, a novel top-down approach that simultaneously reasons about multiple individuals’ 3D body joint reconstructions and associations in space and time in a single end-to-end learnable framework. At the core of our approach is a novel spatio-temporal formulation that

Computer vision
LUMINOUS: Indoor scene generation for embodied AI challenges

Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi (QZ) Gao, Govind Thattai, Jesse Thomason, Gaurav Sukhatme

NeurIPS 2021 Workshop on CtrlGen

2021

Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts. This paper presents LUMINOUS, the first research framework that employs state-of-the-art indoor scene

Computer vision

Courtesy Alla Sheffer

Amazon Scholar Alla Sheffer uses computer graphics to drive improvements in garment sizing and fitting

Douglas Gantenbein

February 24, 2021

Complex algorithms promise to fundamentally change a craft that still relies almost entirely on handwork.

Computer vision
Credit: Glynis Condon

Growing generative adversarial networks, layer by layer

Yuting Zhang

February 16, 2021

A new approach that grows networks dynamically promises improvements over GANs with fixed architectures or predetermined growing strategies.

Machine learning
Prime Video's work on sports field registration, recap/intro detection

Raffay Hamid

January 15, 2021

Two papers at WACV propose neural models for enhancing video-streaming experiences.

Computer vision
Credit: Photos courtesy of the speakers

Amazon at WACV: Computer vision is more than labeling pixels

Larry Hardesty

January 8, 2021

Amazon distinguished scientist Gérard Medioni on the complexities of “understanding your environment through visual input”.

Computer vision
Credit: Glynis Condon

The science behind Amazon's new StyleSnap for Home feature

Liz Sheeley

December 22, 2020

StyleSnap for fashion and home features are made possible by use of multiple convolutional neural networks.

Search and information retrieval
How a ‘Think Big’ idea helped bring Lookout for Vision to life

Staff writer

December 3, 2020

Learn about the science behind the new machine learning product for manufacturers — and how a unique approach solved a complex problem.

Machine learning
