Overview
The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) is the premier annual computer vision event, comprising the main conference and several co-located workshops and short courses.
Sponsorship Details
Accepted publications
Workshops
CVPR 2026 Workshop on Grounded Retrieval and Agentic Intelligence for Vision-Language (GRAIL-V)
June 3
Website: Link
Accepted Papers:
"CoCoA-DVC: Consistency and Concept Aware Training for Dense Video Captioning", presented by Jay Nitin Paranjape, Yue Guo, Sankar Venkataraman, Vishal M. Patel, Nataraj Jammalamadaka
"HTEF: Holistic Brand-Theme Alignment Scoring as a Catalog Gate for Grounded Conversational Recommendation", presented by Mahmudur Rahman, Dhruv Garg, Rishabh Rathod, Sanket Bindle
"Learning to Mix Flat and Curved Representations for Vision-Language Retrieval", presented by Kathy Wu, Sarthak Srivastava
"RAGENT: Robust Optimization for Grounded Vision-Language Retrieval", presented by Kathy Wu, Sarthak Srivastava
"ViSS-R1: Self-Supervised Reinforcement Video Reasoning", presented by Bo Fang, YuXin Song, Haoyuan Sun, Xinyao Zhang, Qiangqiang Wu, Wenhao Wu, Antoni B. Chan
Location: Room 506
Description: A CVPR 2026 workshop for researchers and practitioners building grounded multimodal retrieval, reranking, and verification systems that can be deployed with confidence.
12th International Workshop on Computer Vision in Sports (CVsports) at CVPR 2026
June 4
Website: Link
Poster Presenter: Kevin Song
Location: Room 503
Description: Computer vision has recently started to play an important role in sports, in particular for performance optimization and analytics, and in media productions, where real-time computer vision-based graphics enhance different aspects of the game. The potential of computer vision algorithms in sports is huge, ranging from automatic annotation of broadcast footage to better understanding of sports injuries, coaching, and enhanced viewing. The ambition of this workshop is to bring together practitioners and researchers from different disciplines to share ideas and methods on current and future use of computer vision in sports.
2nd Workshop on Video Large Language Models
June 4
Website: Link
Organizing Committee:
General Chairs: Larry Davis, Rene Vidal, Son Tran, Vimal Bhat, Garin Kessler, Kushan Thakkar, Jayakrishnan Unnikrishnan
Program Chairs: Rohit Gupta, Swetha Sirnam, Bhagyashree Puranik
Location: 3A-3D
Description: The VidLLMs Workshop focuses on the latest advancements and challenges of Video Large Language Models. We aim to bring together researchers and practitioners from academia and industry to discuss open problems, applications, and future directions in this space. Join us at CVPR 2026 for a full-day event exploring the latest in Video Large Language Models, where you can engage with leading experts, participate in challenge tracks, and discover the future of video understanding.
CVPR 2026 Workshop on Medical Reasoning with Vision Language Foundation Models
June 4
Website: Link
Invited Speakers: Maria Xenochristou
Location: Room 110
Description: The CVPR 2026 Workshop on Medical Reasoning with Vision Language Foundation Models (Med-Reasoner) aims to bring together computer vision researchers, medical AI experts, imaging scientists, and practicing clinicians to discuss state-of-the-art advancements, applications, and challenges in reasoning capabilities for medical vision-language models. The workshop will foster discussions that inspire innovation in interpretable medical AI and address real-world deployment challenges, including privacy constraints, workflow integration into healthcare systems, and fairness across patient populations. Through invited talks from leading researchers at Google DeepMind, Stanford, MIT, and the University of Toronto, contributed paper presentations, interactive poster sessions, and expert panel discussions, we will establish reasoning architectures and evaluation frameworks that advance healthcare applications with the potential to impact millions of patients globally.
CVPR 2026 Workshop on Personalization in Generative AI
June 4
Website: Link
Oral Presentation #1: "MakeupMirror: Improving Facial Attribute Preservation in Diffusion Models for Makeup Transfer", presented by Michael Opitz
Location: Room 4CD
Description: The P13N: Personalization in Generative AI workshop aims to unite researchers, practitioners, and artists from academia and industry to explore the challenges and opportunities in personalized generative systems.
Generative AI has revolutionized creativity and problem-solving across domains, yet personalization remains one of the most challenging and underexplored frontiers. Building systems that understand and adapt to individual users’ preferences, identities, or contexts raises profound technical, ethical, and societal questions. Through invited talks, panel discussions, poster sessions, and hands-on challenges, P13N serves as a platform to foster new directions in model design, evaluation, and governance for personalized generative systems.
The Seventh Annual Embodied Artificial Intelligence Workshop
June 4
Website: Link
Challenge Organizers: Xiaofeng Gao
Location: Room 107
Description: The goal of the Embodied AI workshop is to bring together researchers from computer vision, language, graphics, and robotics to share and discuss the latest advances in embodied intelligent agents. EAI 2026's overarching theme is World Models for Embodied AI: embodied AI agents that create models of the world to help them imagine and act, or to help researchers test and evaluate them. This umbrella theme is divided into three topics:
- World Models for Action and Evaluation: Explores both dynamics models that incorporate physics and geometry, and video models in which dynamics are implicit.
- The Resurgence of Classic Methods: Examines new applications of techniques such as reinforcement learning and model-predictive control to embodied AI.
- Long-Horizon Embodied Intelligence: Explores benchmarks and methods for multi-step tasks, robust testing, and, in particular, safe operation.
VAND 4.0 Challenge at CVPR'26
June 4
Website: Link
Co-organizers of VAND 4.0 Challenge: Sebastian Höfer, Dorian Henning, Anton Milan
Invited Speaker: "Visual Defect Detection in Retail Logistics: The Kaputt Dataset and VAND 4.0 Retail Challenge", presented by Sebastian Höfer at 3:30 - 4:00 pm
Location: Room 601
Description: Our workshop challenge aims to showcase current progress in anomaly detection across different practical settings while addressing critical issues in the field. Building on the encouraging results from previous years, including the VAND 3.0 challenge, this edition pushes the boundaries of robust and generalizable anomaly detection models for real-world use cases and, for the first time, includes both industrial and retail-logistics-focused competitions.
Booth Schedule
Friday, June 5
Demos
10:30 - 11:00am - "CompAgent: An Agentic Framework for Visual Compliance Verification" and "MARBLE: Multi-Agent Retrieval via Belief-Propagation and Fine-Grained Language-Vision Evidence", presented by Chun-Hao Liu and Rahul Ghosh
11:00 - 11:30am - "Amazon Photos: Chat with Photos", presented by Raja Bala
1:30 - 2:00pm - "Apollo: Agentic Marketing Creative for E-Commerce", presented by Kathy Wu
2:00 - 2:30pm - "The Kaputt Dataset and the Visual Anomaly and Novelty Detection (VAND) Challenge: Advancing the State of the Art in Visual Defect Detection", presented by Sebastian Höfer and Dorian Henning
Come chat with us about:
10:30 - 11:00am - 3D Reconstruction, 3D Scene Understanding, Computer Vision, Efficient AI, Generative AI, Image Processing, LLM/VLM Post-Training (SFT and RL), Restoration and Enhancement, Sensor Fusion, Vision Language Models
11:00 - 11:30am - Agentic AI, Anomaly Detection, Computer Vision, Fraud Prevention, Large Language Models, Multi-Agent Trajectory Classification and Prediction, Robotics, Sports Analytics, Vision Language Models
1:00 - 1:30pm - AI and Creativity, Computer Vision, Multimodal Large Language Models, Video Machine Learning, Vision Language Action, Vision Language Models
2:00 - 2:30pm - Artificial Intelligence, Computer Vision, Evaluation, Generative AI, Health AI, Large Language Models, Machine Learning, Multimodal Machine Learning, Reinforcement Learning, Robotics, Speech Processing and Synthesis, Trust & Safety, Video Understanding and Synthesis, Vision Language Models
4:00 - 4:30pm - Diffusion, Efficient Vision Language Models, Large Language Models, Multimodal Image, Video Understanding, Unified Vision Language Models, Visual Agents
Saturday, June 6
Demos
1:30 - 2:00pm - "Universal Guideline-Driven Image Clustering via a Hybrid LLM Agent", presented by Wenliang Zhong
2:00 - 2:30pm - "Vulcan Stow: A Robot With a Sense of Touch", presented by Bhavya Goyal
3:30 - 4:00pm - "Science Internships @ Amazon Informational", presented by Ankita Goyal (Science Recruiter for Amazon University Talent Acquisition)
4:00 - 4:30pm - "Demonstrating the Innovation Done in Prime Video", presented by Vimal Bhat
Come chat with us about:
10:00 - 10:30am - Computational Photography, Computer Vision, Diffusion, Large Language Models, Novel Sensing and Camera, Robot Perception, Unified Vision Language Models, Vision Language Model Post-Training
11:30 - 12:00pm - Audio-Visual Modeling, Computer Vision, Multimodal LLMs, Multi-Agent Trajectory Classification and Prediction, Multimodal Foundation Models, Multimodal Understanding and Generation, Post-Training, Speech-to-Speech, Sports Analytics, System Optimization
1:30 - 2:00pm - Computer Vision, Foundational Models, Image Generation, Multimodal Large Language Models, Multimodal Learning, Object Detection/Classification, Video Understanding, Video Vision New Advancements
2:00 - 2:30pm - Biomedical LLMs, Computer Vision, Generative AI, Machine Learning, Metric Learning, Protein Engineering, Robotics
3:00 - 3:30pm - Concept Personalization, Flow Matching/Diffusion Model, Image/Video Generation Editing, Reinforcement Learning
3:30 - 4:00pm - Computer Vision, Diffusion Models, Generative Models
Sunday, June 7
Demos
2:00 - 2:30pm - "Vision at the Edge: The AI Behind Amazon's Smart AR Delivery Glasses", presented by Yelin Kim
Come chat with us about:
10:00 - 10:30am - Computational Photography, Computer Vision, Novel Sensing and Camera, Road User 3D Object Detection/Tracking/Prediction, Robot Perception, Vehicular Bird's-Eye View Models, Vehicular On-Board Machine Learning & Computer Vision