
Novel “Kaputt” dataset sets new benchmark for large-scale visual defect detection

A new dataset with over 238,000 images challenges and advances the state of the art in visual defect detection for complex retail applications.

October 2, 2025


At Amazon, we're constantly working to improve our logistics operations through cutting-edge AI and computer vision. Today, we're excited to announce the public release of Kaputt, a large-scale dataset for visual defect detection in retail logistics. This dataset, which will be presented at the International Conference on Computer Vision (ICCV) 2025, represents a major step forward in our efforts to automate defect detection.

The Kaputt dataset contains 238,421 high-resolution images of 48,376 unique items, including 29,316 defective instances, making it 40 times as large as current state-of-the-art benchmark datasets. It captures the real-world complexities of detecting defects and damage across a vast range of products — minor creases, major spills, and everything in between.

Overview of defect severities and defect types. Our dataset categorizes defective samples into two severity classes: minor *(top two rows)* and major *(bottom two rows)*. Additionally, each defective sample is assigned one or multiple defect types *(columns)*, which characterize the defect(s) an item exhibits in a more fine-grained manner. The figure shows two representative samples per defect type/severity combination.

The challenge of automated defect detection

Developing robust visual defect detection systems for retail logistics presents significant challenges that existing research hasn't fully addressed. Current benchmarks focus mostly on manufacturing and have reached saturation: leading methods achieve near-perfect performance on them, with more than 99.9% AUROC (area under the receiver-operating-characteristic curve, which summarizes the trade-off between true-positive and false-positive rates across all decision thresholds). Manufacturing settings typically involve highly standardized item poses and a limited number of distinct items. Retail logistics, by contrast, handles millions of unique products, most of which have been seen only a handful of times. Without adequate data, it’s extremely difficult for AI systems to learn what constitutes “normal” versus “defective” across such diverse items.
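AUROC has a convenient equivalent reading: it is the probability that a randomly chosen defective sample receives a higher anomaly score than a randomly chosen defect-free one, with ties counted as half. The minimal Python sketch below computes it directly from that pairwise definition, which makes the saturation point concrete: a score above 99.9% means that roughly fewer than 1 in 1,000 defective/defect-free pairs are ordered incorrectly. The scores are made-up toy values, not results from Kaputt or from any evaluated model.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the probability that a random defective sample (label 1)
    scores higher than a random defect-free sample (label 0).
    Ties between a positive and a negative count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one defect-free sample")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every defective/defect-free pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up anomaly scores (higher = more likely defective).
scores = [0.9, 0.8, 0.35, 0.6, 0.1, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))  # 8 of 9 pairs ordered correctly, i.e. 8/9
```

The pairwise loop is quadratic in the number of samples, so at Kaputt's scale a sort-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice; the pairwise version is shown only because it mirrors the definition.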

A novel dataset for real-world applications

Our dataset's structure reflects these real-world challenges and opportunities. For each query image, we provide one to three reference images showing the item in “normal” condition (meaning more than 99% likely, though not guaranteed, to be defect free), mirroring how human inspectors might compare items to spot defects. We've also included detailed annotations for seven distinct defect types and two severity levels, acknowledging the subjective nature of defect assessment.

Each query image is associated with one to three reference images, which may exhibit significant variability. *(1)* All three reference images are defect free and display the same face of the package. *(2)* The reference images demonstrate defects (packages have escaped their wrappers) and pose variability (one image displays the back of the package).
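To make the query/reference layout concrete, here is a minimal Python sketch of one annotated sample. The class name `KaputtSample`, its fields, the `Severity.NONE` value for defect-free queries, and the use of integer indices 0 to 6 for the seven defect types are all hypothetical illustrations, not the dataset's actual release format; the article does not list the defect-type names, so none are invented here. The validation encodes only constraints the article states: one to three reference images per query, two severity classes (minor and major) for defective samples, and one or more defect types per defective sample.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set

NUM_DEFECT_TYPES = 7  # the article reports seven defect types (names not listed)


class Severity(Enum):
    NONE = "none"    # hypothetical label for defect-free queries
    MINOR = "minor"
    MAJOR = "major"


@dataclass
class KaputtSample:  # hypothetical schema, not the official release format
    query_image: str
    reference_images: List[str]
    severity: Severity = Severity.NONE
    defect_types: Set[int] = field(default_factory=set)  # indices 0..6

    def __post_init__(self):
        if not 1 <= len(self.reference_images) <= 3:
            raise ValueError("each query needs one to three reference images")
        if any(not 0 <= t < NUM_DEFECT_TYPES for t in self.defect_types):
            raise ValueError("defect type index out of range")
        # Defective samples carry at least one type; defect-free ones carry none.
        if (self.severity is Severity.NONE) != (not self.defect_types):
            raise ValueError("severity and defect types are inconsistent")

    @property
    def is_defective(self) -> bool:
        return self.severity is not Severity.NONE

    def defect_type_vector(self) -> List[int]:
        """Multi-hot encoding of the defect types, e.g. for multi-label training."""
        return [1 if t in self.defect_types else 0 for t in range(NUM_DEFECT_TYPES)]


sample = KaputtSample(
    query_image="query.jpg",  # hypothetical file names
    reference_images=["ref_1.jpg", "ref_2.jpg"],
    severity=Severity.MINOR,
    defect_types={2, 5},
)
print(sample.is_defective)          # True
print(sample.defect_type_vector())  # [0, 0, 1, 0, 0, 1, 0]
```

Storing defect types as a set and exporting a multi-hot vector reflects the fact that a single item can exhibit several defect types at once, so the fine-grained labels form a multi-label rather than a multi-class problem.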

Understanding model performance

We evaluated several leading methods, and the results reveal both the complexity of the task and the limitations of current technology. We tested four distinct approaches: zero-shot methods using general-purpose vision models, few-shot methods that leverage the reference images, supervised learning, and hybrid methods combining multiple techniques.
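Few-shot methods can exploit the reference images directly. One common pattern, shown here as a minimal pure-Python sketch, embeds the query and its references with some image feature extractor and uses the distance from the query to its closest reference as an anomaly score. The embedding vectors below are toy stand-ins: the feature extractor is assumed rather than shown, and this is not necessarily how any of the evaluated methods works.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def reference_anomaly_score(query: Vector, references: Sequence[Vector]) -> float:
    """Distance from the query to its nearest reference; higher = more likely defective."""
    if not references:
        raise ValueError("need at least one reference embedding")
    return min(cosine_distance(query, ref) for ref in references)


# Toy embeddings standing in for the output of an assumed feature extractor.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(reference_anomaly_score([1.0, 0.05, 0.0], references))  # small: close to a reference
print(reference_anomaly_score([0.0, 0.0, 1.0], references))   # 1.0: orthogonal to both
```

Taking the minimum over references tolerates pose variability, since a query only needs to match one reference that shows the same face of the package. The same choice has a weakness visible in the reference-image figure: if a reference is itself defective, a similarly damaged query can match it closely and receive a misleadingly low score.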

Impact beyond retail operations

The impact of improving visual defect detection extends far beyond operational efficiency. Early detection of defective items helps reduce waste, labor, and resource consumption by preventing defective products from moving further through the supply chain, ultimately supporting sustainability goals. It also helps ensure that customers receive their orders in perfect condition, reducing returns and reshipments — which in turn reduces carbon emissions from transportation.
