A quick guide to Amazon’s papers at ICCV 2023

From classic problems like image segmentation and object detection to theoretical topics like data representation and “machine unlearning”, Amazon researchers’ ICCV papers showcase the diversity of their work in computer vision.

By Staff writer

September 29, 2023

Amazon’s papers at this year’s International Conference on Computer Vision, organized by topic.

3-D

HAL3D: Hierarchical active learning for fine-grained 3D part labeling
Fenggen Yu, Yiming Qian, Francisca Gil Ureta, Brian Jackson, Eric Bennett, Richard Zhang

ImGeoNet: Image-induced geometry-aware voxel representation for multi-view 3D object detection
Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun

Action recognition

SkeleTR: Towards skeleton-based action recognition in the wild
Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joe Tighe, Alessandro Bergamo

Data representation

Linear spaces of meanings: Compositional structures in vision-language models
Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto

Motion-guided masking for spatiotemporal representation learning
David Fan, Jue Wang, Leo Liao, Yi Zhu, Vimal Bhat, Hector Santos, Rohith Mysore Vijaya Kumar, Xinyu (Arthur) Li

Dubbed-video generation

SIDGAN: High-resolution dubbed video generation via shift-invariant learning
Urwa Muaz, Wondong Jang, Rohun Tripathi, Santhosh Mani, Wenbin Ouyang, Ravi Teja Gadde, Baris Gecer, Sergio Elizondo, Reza Madad, Naveen Nair

Geospatial foundation models

Towards geospatial foundation models via continual pretraining
Matias Mendieta, Boran Han, Xingjian Shi, Yi Zhu, Chen Chen

Graph neural networks

Learning adaptive neighborhoods for graph neural networks
Avi Saha, Oscar Mendez, Chris Russell, Richard Bowden

Image retrieval

FashionNTM: Multi-turn fashion image retrieval via cascaded memory
Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, Henrik I. Christensen

Image segmentation

Coarse-to-fine amodal segmentation with shape prior
Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Figure: From "Coarse-to-fine amodal segmentation with shape prior".

LD-ZNet: A latent diffusion approach for text-based image segmentation
Koutilya PNVR, Bharat Singh, Pallabi Ghosh, Behjat Siddiquie, David Jacobs

Rethinking amodal video segmentation from learning supervised signals with object-centric representation
Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Information extraction

DocTr: Document transformer for structured information extraction in documents
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan

Machine unlearning

SAFE: Machine unlearning with shard graph
Yonatan Dukler, Ben Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto

Object detection

Bidirectional alignment for domain adaptive detection with transformers
Liqiang He, Wei Wang, Albert Chen, Min Sun, Cheng-Hao Kuo, Sinisa Todorovic

Unsupervised open-vocabulary object localization in videos
Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

Object tracking

Object-centric multiple object tracking
Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Scene text recognition

CLIPTER: Looking at the bigger picture in scene text recognition
Aviad Aberdam, David Haim Bensaid, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman

Towards models that can see and read
Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman

Figure: From "Towards models that can see and read".

Transfer learning

PADCLIP: Pseudo-labeling with adaptive debiasing in CLIP for unsupervised domain adaptation
Zhengfeng Lai, Sol Vesdapunt, Ning Zhou, Jun Wu, Cong Phuoc Huynh, Xuelu Li, Kah Kuen Fu, Chen-Nee Chuah

Video retrieval

Audio-enhanced text-to-video retrieval using text-conditioned feature alignment
Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar

Video segmentation

MEGA: Multimodal alignment aggregation and distillation for cinematic video segmentation
Najmeh Sadoughi, Xinyu (Arthur) Li, Avijit Vajpayee, David Fan, Bing Shuai, Hector Santos, Vimal Bhat, Rohith Mysore Vijaya Kumar
