How Amazon Robotics researchers are solving a “beautiful problem”

Teaching robots to stow items presents a challenge so large it was previously considered impossible — until now.

November 15, 2022

The rate of innovation in machine learning is simply off the chart — what is possible today was barely on the drawing board even a handful of years ago. At Amazon, this has manifested in a robotic system that can not only identify potential space in a cluttered storage bin, but also sensitively manipulate that bin’s contents to create that space before successfully placing additional items inside — a result that, until recently, was impossible.

“Breaking all existing industrial robot thinking”

This stow task requires two high-level capabilities not generally found in robots. One, an excellent understanding of the three-dimensional world. Two, the ability to manipulate a wide range of packaged but sometimes fragile objects — from lightbulbs to toys — firmly, but sensitively: pushing items gently aside, flipping them up, slotting one item at an angle between other items and so on.

A simulation of robotic stowing

For a robotic system to stand a chance at this task, it would need intelligent visual perception, a free-moving robot arm, an end-of-arm manipulator unknown to engineering, and a keen sense of how much force it is exerting. In short: good luck with that.

“Stow fundamentally breaks all existing industrial robotic thinking,” says Siddhartha Srinivasa, director of Amazon Robotics AI. “Industrial manipulators are typically bulky arms that execute fixed trajectories very precisely. It’s very positional.”

When Srinivasa joined Amazon in 2018, multiple robotics programs had already attempted to stow to fabric pods using stiff positional manipulators.

“They failed miserably at it because it's a nightmare. It just doesn't work unless you have the right computational tool: you must not think physically, but computationally.”

Srinivasa knew the science for robotic stow didn’t exist yet, but he knew the right people to hire to develop it. He approached Parker Owan as he completed his PhD at the University of Washington.

A “beautiful problem”

Parker Owan, Robotics AI senior applied scientist, poses next to a robotic arm and in front of a yellow soft-sided storage pod

“At the time I was working on robotic contact, imitation learning, and force control,” says Owan, now a Robotics AI senior applied scientist. “Sidd said ‘Hey, there’s this beautiful problem at Amazon that you might be interested in taking a look at’, and he left it at that.”

The seed was planted. Owan joined Amazon, and then in 2019 dedicated himself to the stow challenge.

“I came at it from the perspective of decision-making algorithms: the perception needs; how to match items to the appropriate bin; how to leverage information of what's in the bin to make better decisions; motion planning for a robot arm moving through free space; and then actually making contact with products and creating space in bins.”

Aaron Parness, Robotics AI senior manager of applied science, poses near a robotic arm

About six months into his exploratory work, Owan was joined by a small team of applied scientists, and hardware expert Aaron Parness, now a Robotics AI senior manager of applied science. Parness admits he was skeptical.

“My initial reaction was ‘Oh, how brave and naïve that this guy, fresh out of his PhD, thinks robots can deal with this level of clutter and physical contact!’”

But Parness was quickly hooked. “Once you see how the problem can be broken down and structured, it suddenly becomes clear that there's something super useful and interesting here.”

“Uncharted territory”

From a hardware perspective, the team needed to find a robot arm with force feedback. They tried several before landing on an effective model. The arm reports, hundreds of times per second, how much force it is applying and any resistance it is meeting. Using this information to control the robot is called compliant manipulation.
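To make the idea of compliant manipulation concrete, here is a minimal one-dimensional sketch of a force-feedback loop. It is not Amazon's controller: the spring model of the item, the 500 Hz loop rate, the gain, and the names `SpringContact` and `compliant_push` are all assumptions chosen for illustration. The tool advances until the measured force matches a target, rather than driving to a fixed position.

```python
from dataclasses import dataclass


@dataclass
class SpringContact:
    """Toy stand-in for a stowed item: a linear spring that resists once touched."""
    surface_m: float = 0.05           # where the tool first touches the item
    stiffness_n_per_m: float = 400.0  # assumed stiffness, not a measured value

    def force(self, position_m: float) -> float:
        penetration = max(0.0, position_m - self.surface_m)
        return self.stiffness_n_per_m * penetration


def compliant_push(target_force_n, contact, rate_hz=500.0, gain=0.02,
                   max_speed_m_s=0.1, duration_s=4.0):
    """Advance the tool until the measured force settles at the target.

    Each cycle reads the force, then commands a velocity proportional to
    the force error (a simple admittance-style law, assumed here).
    """
    dt = 1.0 / rate_hz
    position_m = 0.0
    for _ in range(int(duration_s * rate_hz)):
        measured_n = contact.force(position_m)           # force feedback
        velocity = gain * (target_force_n - measured_n)  # push harder or back off
        velocity = max(-max_speed_m_s, min(max_speed_m_s, velocity))
        position_m += velocity * dt
    return position_m, contact.force(position_m)


final_position, final_force = compliant_push(5.0, SpringContact())
print(f"position={final_position:.4f} m, force={final_force:.2f} N")
```

In this toy model the tool moves at full speed through free space, slows as resistance builds, and comes to rest where the item pushes back with the requested force. A purely positional controller would instead keep driving to its commanded position regardless of what it met on the way.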

“We knew from the beginning that we needed compliant manipulation, and we hadn't seen anybody in industry do this at scale before,” says Owan. “It was uncharted territory.”

Parness got to work on the all-important hardware. The problem of moving the elastics aside to stow an item was resolved using a relatively simple hooking system.

How the band separator works

The end-of-arm tool (EOAT) proved to be a next-level challenge. One reason that stowing is difficult for robots is the sheer diversity of items Amazon sells, and their associated packaging. You might have an uninflated soccer ball next to a book, next to a sports drink, next to a T-shirt, next to a jewelry box. A robot would need to handle this level of variety. The EOAT evolved quickly over two years, with multiple failures and iterations.

Paddles grip an array of items

“In the end, we found that gently squeezing an item between two paddles was the more stable way to hold items than using suction cups or mechanical pinchers,” says Parness.

However, the paddle setup presented a challenge when inserting held items into bins, because the paddles kept getting in the way. Parness and his growing team hit upon an alternative: holding the item next to a bin, then simultaneously opening the paddles and using a plunger to push the item in. This drop-and-push technique proved error-prone because not all items reacted to it in the same way.

The EOAT’s next iteration saw the team put miniature conveyor belts on each paddle, enabling the EOAT to feed items smoothly into the bins without having to enter the bin itself.

The miniature conveyor belt works to bring an item to its designated bin

“With that change, our stowing success rate jumped from about 80% to 99%. That was a eureka moment for us — we knew we had our winner,” says Parness.

Making space with motion primitives

The ability to place items in bins is crucial, but so is making space in cluttered bins. To better understand what would be required of the robot system, the team closely studied how they performed the task themselves. Owan even donned a head camera to record his efforts.

The team was surprised to find that the vast majority of space-making hand movements within a fabric bin could be boiled down to four types, or “motion primitives”: sweeping the bin’s current contents sideways, flipping upright things that are lying flat, stacking, and slotting something at an angle into the gap between other items.

The process of making space

The engineers realized that the EOAT’s paddles could not get involved with this bin-manipulation task, because they would get in the way. The solution, in the end, was surprisingly simple: a thin metal sheet that could extend from the EOAT, dubbed “the spatula”. The extended spatula can firmly, but sensitively, push items to one side, flip them up, and generally be used to make room in a bin, before the paddles eject an item into the space created.

But how does the system know how full the pod’s bins are, and how does it decide where, and how, it will make space for the next item to be stowed? This is where visual perception and machine learning come into play.

Deciding where to attempt to stow an item requires a good understanding of how much space, in total, is available in each fabric bin. In an ideal world, this is where 3D sensor technologies such as LiDAR would be used. However, because the elastic cords across the front of every bin partially block the view inside, this option isn’t feasible.

A robot arm executes motion primitives

Instead, the system’s visual perception is based on cameras pointed at the pod that feed their image data to a machine learning system. Based on what it can see of each bin’s contents, the system “erases” the elastics and models what is lying unseen in the bin, and then estimates the total available space in each of the pod’s bins.

Often there is space available in a cluttered bin, but it is not contiguous: there are pockets of space here and there. The ML system — based in part on existing models developed by the Amazon Fulfillment Technologies team — then predicts how much contiguous space it can create in each bin, given the motion primitives at its disposal.
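The difference between scattered pockets of space and contiguous space can be illustrated with a deliberately simplified one-dimensional model. This is only a sketch: the real system predicts space from camera images with machine learning, whereas the functions below (`free_gaps`, `largest_gap_now`, `gap_after_sweep`, all hypothetical names) treat items as rigid intervals along the width of a bin and assume a sideways sweep can pack them against one wall.

```python
def free_gaps(bin_width_cm, items):
    """Return the widths of the empty pockets in a bin.

    `items` is a list of (left, right) intervals in centimeters,
    measured from one wall of the bin.
    """
    gaps = []
    cursor = 0.0
    for left, right in sorted(items):
        if left > cursor:
            gaps.append(left - cursor)   # pocket before this item
        cursor = max(cursor, right)
    if bin_width_cm > cursor:
        gaps.append(bin_width_cm - cursor)  # pocket after the last item
    return gaps


def largest_gap_now(bin_width_cm, items):
    """Widest pocket available without moving anything."""
    return max(free_gaps(bin_width_cm, items), default=0.0)


def gap_after_sweep(bin_width_cm, items):
    """Optimistic bound: a sweep merges every pocket into one."""
    return sum(free_gaps(bin_width_cm, items))


bin_items = [(5, 15), (22, 30)]          # two items in a 40 cm wide bin
print(largest_gap_now(40, bin_items))    # widest single pocket
print(gap_after_sweep(40, bin_items))    # contiguous space after sweeping
```

In this example the bin has pockets of 5, 7, and 10 cm, so an 18 cm item does not fit anywhere as things stand, but it would fit once the contents are swept to one side.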

How the perception system "sees" available space

“These primitives, each of which can be varied as needed, can be chained in infinitely many ways,” Srinivasa explains. “It can, say, flip it over here, then push it across and drop the item in. Humans are great at identifying these primitives in the first place, and machine learning is great at organizing and orchestrating them.”
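As a rough illustration of how primitives chain, the sketch below enumerates every sequence of the four primitives up to a given length and picks the one a scoring function prefers. The `Primitive` enum, `chains`, and `best_chain` are hypothetical names, and the `score` argument is a placeholder for whatever learned model ranks candidate plans; none of this is taken from Amazon's implementation.

```python
from enum import Enum
from itertools import product


class Primitive(Enum):
    SWEEP = "sweep"      # push contents sideways
    FLIP_UP = "flip_up"  # stand a flat item upright
    STACK = "stack"      # place one item on top of another
    SLOT = "slot"        # slide an item at an angle into a gap


def chains(max_length):
    """Yield every ordered sequence of primitives up to `max_length` steps."""
    for length in range(1, max_length + 1):
        yield from product(Primitive, repeat=length)


def best_chain(max_length, score):
    """Return the chain that maximizes `score`, a caller-supplied ranking function."""
    return max(chains(max_length), key=score)


print(sum(1 for _ in chains(3)))  # 4 + 16 + 64 candidate chains
```

Even with parameter-free primitives, the number of chains grows as a power of four with each added step. Once each primitive can also be varied continuously (how far to sweep, at what angle to slot), exhaustive enumeration stops being practical, which is where a learned model that proposes and ranks chains becomes useful.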

When the system has a firm idea of the options, it considers the items in its buffer — an area near the robot arm’s gantry in which products of various shapes and sizes wait to be stowed — and decides which items are best placed in which bins for maximum efficiency.

“For every potential stow, the system will predict its likelihood of success,” says Parness. “When the best prediction of success falls to about 96%, which happens when a pod is nearly full, we send that pod off and wheel in a new one.”
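The decision Parness describes can be sketched as a small selection routine: score every item and bin pairing, take the best one, and signal a pod swap when even the best falls below the threshold. Only the 96% figure comes from the article. The function `toy_success_estimate` is an invented stand-in for the learned predictor, and the data layout (item widths and creatable space in centimeters) is an assumption made to keep the example runnable.

```python
POD_SWAP_THRESHOLD = 0.96  # "about 96%", as quoted in the article


def toy_success_estimate(item_width_cm, free_space_cm):
    """Hypothetical stand-in for the learned predictor: more slack, higher score."""
    if item_width_cm > free_space_cm:
        return 0.0
    slack = (free_space_cm - item_width_cm) / free_space_cm
    return min(0.999, 0.90 + 0.2 * slack)


def choose_stow(buffer, bins, threshold=POD_SWAP_THRESHOLD):
    """Pick the most promising (item, bin) pairing.

    `buffer` maps item names to widths; `bins` maps bin names to the
    contiguous space that could be created in each. Returns
    (item, bin, predicted_success), or None to signal a pod swap.
    """
    candidates = [
        (toy_success_estimate(width, space), item, bin_id)
        for item, width in buffer.items()
        for bin_id, space in bins.items()
    ]
    if not candidates:
        return None
    predicted, item, bin_id = max(candidates)
    if predicted < threshold:
        return None  # best option is too risky: send the pod off
    return item, bin_id, predicted


buffer = {"book": 4.0, "ball": 20.0}
print(choose_stow(buffer, {"A": 22.0, "B": 6.0}))  # a roomy pod
print(choose_stow(buffer, {"A": 4.1, "B": 4.2}))   # a nearly full pod
```

With the roomy pod the routine pairs the book with bin A. With the nearly full pod no pairing clears the threshold, so it returns `None`, the cue to wheel in a new pod.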

“Robots and people work together”

At the end of summer 2021, with its potential feasibility and value becoming clearer, the senior leadership team at Amazon gave the project their full backing.

“They said ‘As fast as you can go; whatever you need’. So this year has been a wild, wild ride. It feels like we’re a start-up within Amazon,” says Parness, who noted that the approach has significant advantages for fulfillment center (FC) employees as well.