The surprisingly subtle challenge of automating damage detection
Why detecting damage is so tricky at Amazon’s scale — and how researchers are training robots to help with that gargantuan task.
With billions of customer orders flowing through Amazon’s global network of fulfillment centers (FCs) every year, it is an unfortunate but inevitable fact that some of those items will suffer accidental damage during their journey through a warehouse.
Amazon associates are always on the lookout for damaged items in the FC, but an extra pair of “eyes” may one day support them in this task, powered by machine-learning approaches being developed by Amazon’s Robotics AI team in Berlin, Germany.
As well as avoiding shipping delays and improving warehouse efficiency, this form of artificial intelligence aims to reduce waste: by catching damaged goods before they ship, it ensures customers have fewer damaged items to return.
For every thousand items that make their way through an FC prior to being dispatched to the customer, fewer than one becomes damaged. That is a tiny proportion, but at Amazon's scale it nevertheless adds up to a challenging problem.
Damage detection is important because damage, a costly problem in itself, becomes even more costly the longer it goes undetected.
Amazon associates examine items on multiple occasions during the fulfillment process, of course, but if damage occurs late in the journey and a compromised item makes it as far as the final packaging station, an associate must sideline it so that a replacement can be requested, potentially delaying delivery. An associate must then examine the sidelined item further to determine its future.
Toward the end of 2020, Sebastian Hoefer, senior applied scientist with the Amazon Robotics AI team, supported by his Amazon colleagues, successfully pitched a novel project to address this problem. The idea: combine computer vision and machine learning (ML) approaches in an attempt to automate the detection of product damage in Amazon FCs.
“You want to avoid damage altogether, but in order to do so you need to first detect it,” notes Hoefer. “We are building that capability, so that robots in the future will be able to utilize it and assist in damage detection.”
Needles in a haystack
Damage detection is a challenging scientific problem, for two main reasons.
The first reason is purely practical — there is precious little data on which to train ML models.
“Damage caused in Amazon FCs is rare, and that’s clearly a good thing,” says Ariel Gordon, a principal applied scientist supporting Hoefer’s team from Seattle. “But that also makes it challenging because we need to find these needles in the haystack, and identify the many forms damage can take.”
The second reason takes us into the theoretical long grass of artificial intelligence more generally.
For an adult human, everyday damage detection feels easy — we cannot help but notice damage, because our ability to do so has been honed as a fundamental life skill. Yet whether something is sufficiently damaged to render it unsellable is subjective, often ambiguous, and depends on the context, says Maksim Lapin, an Amazon senior applied scientist in Berlin. “Is it damage that is tolerable from the customer point of view, like minor damage to external packaging that will be thrown into the recycling anyway?” Lapin asks. “Or is it damage of a similar degree on the product itself, which would definitely need to be flagged?”
In addition, the nature of product damage makes it difficult to even define what damage is for ML models. Damage is heterogeneous — any item or product can be damaged — and it can take many forms, from rips to holes to a single broken part of a larger set. Multiplied over Amazon's massive catalogue of items, the challenge becomes enormous.
In short, do ML models stand a chance?
Off to “Damage Land”
To find out, Hoefer’s team first needed to obtain that data in a standardized format amenable to machine learning. They set about collecting it at an FC near Hamburg, Germany, called HAM2, in a section of the warehouse affectionately known as “Damage Land”. Damaged items end up there while decisions are made on whether such items can be sold at a discount, refurbished, donated or, as a last resort, disposed of.
The team set up a sensor-laden, illuminated booth in Damage Land.
“I’m very proud that HAM2 was picked up as pilot site for this initiative,” says Julia Dembeck, a senior operations manager at HAM2, who set up the Damage Taskforce to coordinate the project’s many stakeholders. “Our aim was to support the project wholeheartedly.”
After workshops with Amazon associates to explain the project and its goals, associates started placing damaged items on a tray in the booth, which snapped images using an array of RGB and depth cameras. They then manually annotated the damage in the images using a linked computer terminal.
“The results were amazing and got even better when associates shared their best practices on the optimal way to place items in the tray,” says Dembeck. Types of damage included things like crushes, tears, holes, deconstruction (e.g., contents breaking out of their container) and spillages.
The associates collected about 30,000 product images in this way, two-thirds of which were images of damaged items.
“We also collected images of non-damaged items because otherwise we cannot train our models to distinguish between the two,” says Hoefer. “Twenty thousand pictures of damage are not a lot in ‘big data’ terms, but it is a lot given the rarity of damage.”
With data in hand, the team first applied a supervised learning ML approach, a workhorse in computer vision. They used the data as a labelled training set that would allow the algorithm to build a generalizable model of what damage can look like. When put through its paces on images of products it had never seen before, the model’s early results were promising.
When analyzing a previously unseen image of a product, the model would ascribe a damage confidence score. The higher the score, the more confident it was that the item was damaged.
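As a rough illustration of how such a score can be produced, here is a toy sketch — not Amazon's actual model — in which synthetic feature vectors stand in for real image data and a minimal logistic-regression classifier outputs a damage confidence between 0 and 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for image features: in this synthetic example, damaged
# items simply have higher feature values on average (hypothetical data).
X = np.concatenate([rng.normal(0.3, 0.1, (200, 4)),   # undamaged items
                    rng.normal(0.7, 0.1, (200, 4))])  # damaged items
y = np.concatenate([np.zeros(200), np.ones(200)])     # 0 = intact, 1 = damaged

# Logistic regression trained with plain gradient descent.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted damage confidence
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * float(np.mean(p - y))

def damage_score(features):
    """Damage confidence in [0, 1] for a new feature vector."""
    return float(1.0 / (1.0 + np.exp(-(features @ w + b))))
```

A real system would of course use a deep vision model trained on the collected images, but the output contract is the same: one scalar confidence per image.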
The researchers had to tune the sensitivity of the model by deciding upon the confidence threshold at which the model would declare a product unfit for sending to a customer. Set that threshold too high, and modest but significant damage could be missed. Set it too low, and the model would declare some undamaged items to be damaged, a false positive.
“We did a back-of-the-envelope calculation and found that if we're sidelining more than a tiny fraction of all items going through this process, then we're going to overwhelm with false positives,” says Hoefer.
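That threshold choice can be framed as a small optimization: sweep candidate thresholds and keep the most sensitive one whose false-positive (sideline) rate stays within budget. The scores and the 20% budget below are made-up numbers for illustration only:

```python
# Toy damage-confidence scores for items whose true status is known
# (all values hypothetical).
scores_damaged   = [0.92, 0.81, 0.74, 0.55, 0.40]
scores_undamaged = [0.05, 0.12, 0.20, 0.33, 0.61]

def rates(threshold):
    """Miss rate on damaged items, false-positive rate on undamaged items."""
    misses = sum(s < threshold for s in scores_damaged) / len(scores_damaged)
    false_pos = sum(s >= threshold for s in scores_undamaged) / len(scores_undamaged)
    return misses, false_pos

# Keep the lowest (most sensitive) threshold whose false-positive rate
# stays under a hypothetical sideline "budget" of 20%.
budget = 0.20
best = min(t / 100 for t in range(1, 100) if rates(t / 100)[1] <= budget)
```

Lowering the budget makes the system flag fewer intact items at the cost of missing more genuine damage, which is exactly the trade-off the back-of-the-envelope calculation describes.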
Since those preliminary results in late 2021, the team has made significant improvements.
“We’re now optimizing the model to reduce its false positive rate, and our accuracy is increasing week to week,” says Hoefer.
Different types of damage
However, the supervised learning approach alone, while promising, has some drawbacks.
For example, what is the model to make of the packaging of a phone protector kit that shows a smashed screen? What is it to make of a cleaning product whose box is awash with apparent spills? What about a blister pack that is entirely undamaged and should hold three razor blades but for some reason contains just two — the “broken set” problem? What about a bag of ground coffee that appears uncompromised but is sitting next to a little puddle of brown powder?
Again, for humans, making sense of such situations is second nature. We not only know what damage looks like, but also quickly learn what undamaged products should look like. We learn to spot anomalies.
Hoefer’s team decided to incorporate this ability into their damage detection system, to create a more rounded and accurate model. Again, more data was needed, because if you want to know what an item should look like, you need standardized imagery of it. This is where recent work pioneered by Amazon’s Multimodal Identification (MMID) team, part of Berlin's Robotics AI group, came in.
The MMID team has developed a computer vision tool that enables the identification of a product purely from images of it. This is useful in cases where the all-important product barcode is smudged, missing, or wrong.
In fact, it was largely the MMID team that developed the sensor-laden photo booth hardware now being put to use by Hoefer’s team. The MMID team needed it to create a gallery of standardized reference images of pristine products.
“Damage detection could also exploit the same approach by identifying discrepancies between a product image and a gallery of reference images,” says Anton Milan, an Amazon senior applied scientist who is working across MMID and damage detection in Berlin. “In fact, our previous work on MMID allowed us to quickly take off exploring this direction in damage detection by evaluating and tweaking existing solutions.”
By incorporating the MMID team’s product image data and adapting that team’s techniques and models to sharpen their own, the damage-detection system now has a fighting chance of spotting broken sets. It is also much less likely to be fooled by damage-like images printed on the packaging of products, because it can check product imagery taken during the fulfillment process against the image of a pristine version of that product.
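One common way to implement such a reference-gallery check — a simplified sketch, in which the embeddings are toy vectors rather than outputs of the team's actual models — is to embed the item image, compare it against embeddings of pristine reference images, and flag the item if it is not close to any of them:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_anomalous(item_emb, gallery, threshold=0.9):
    """Flag the item if it doesn't closely resemble any pristine reference."""
    best = max(cosine(item_emb, ref) for ref in gallery)
    return best < threshold

# Embeddings of pristine reference images of one product (toy vectors).
gallery = [np.array([1.0, 0.0, 0.2]), np.array([0.9, 0.1, 0.3])]

intact  = np.array([0.95, 0.05, 0.25])  # close to a reference image
damaged = np.array([0.1, 1.0, 0.0])     # far from every reference image
```

Because the comparison is against the product's own pristine imagery, a smashed-screen graphic printed on the box matches the references and is not flagged, while a broken set, which looks subtly different from every reference, is.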
“Essentially, we are developing the model’s ability to say ‘something is amiss here’, and that’s a very useful signal,” says Gordon. “It's also problematic, though, because sometimes products change their design. So, the model has to be ‘alive’, continuously learning and updating in accordance with new packaging styles.”
The team is currently exploring how to combine the contributions of both discriminative and anomaly-based ML approaches to give the most accurate assessment of product damage. At the same time, they are developing hardware for trial deployment in an FC, and also collecting more data on damaged items.
The whole enterprise has come together fast, says Hoefer. “We pitched the idea just 18 months ago, and already we have an array of hardware and a team of 15 people making it a reality. As a scientist, this is super rewarding. And if it works as well as we hope, it could be deployed across the network of Amazon fulfillment centers within a couple of years.”
Hoefer anticipates that the project will ultimately improve customer experience while also reducing waste.
“Once the technology matures, we expect to see a decrease in customer returns due to damage, because we will be able to identify and fix damaged products before dispatching them to customers. Not only that, by identifying damage early in the fulfillment chain, we will be able to work with vendors to build more robust products. This will again result in reducing damage overall — an important long-term goal of the project,” says Hoefer.
Also looking to the future, Lapin imagines this technology beyond warehousing.
“We are building these capabilities for the highly controlled environments of Amazon fulfillment centers, but I can see some future version of it being deployed in the wild, so to speak, in more chaotic bricks-and-mortar stores, where customers interact with products in unpredictable ways,” says Lapin.