Amazon builds first foundation model for multirobot coordination

Trained on millions of hours of data from Amazon fulfillment centers and sortation centers, Amazon’s new DeepFleet models predict future traffic patterns for fleets of mobile robots.

Large language models and other foundation models have introduced a new paradigm in AI: large models trained in a self-supervised fashion — no data annotation required — on huge volumes of data can learn general competencies that allow them to perform a variety of tasks. The most prominent examples of this paradigm are in language, image, and video generation. But where else can it be applied?

At Amazon, one answer to that question is in managing fleets of robots. In June, we announced the development of a new foundation model for predicting the interactions of mobile robots on the floors of Amazon fulfillment centers (FCs) and sortation centers, which we call DeepFleet. We still have a lot to figure out, but DeepFleet can already help assign tasks to our robots and route them around potential congestion, increasing the efficiency of our robot deployments by 10%. That lets us deliver packages to customers more rapidly and at lower costs.

Robots laden with storage pods at a fulfillment center (left) and with packages at a sortation center (right).

One question I get a lot is why we would need a foundation model to predict robots’ locations. After all, we know exactly what algorithms the robots are running; can’t we just simulate their interactions and get an answer that way?

There are two obstacles to this approach. First, accurately simulating the interactions of a couple thousand robots faster than real time is prohibitively resource intensive: our fleet already uses all available computation time to optimize its plans. In contrast, a learned model can quickly infer how traffic will likely play out.

Second, we see predicting robot locations as, really, a pretraining task, which we use to teach an AI to understand traffic flow. We believe that, just as pretraining on next-word prediction enabled chatbots to answer a diverse range of questions, pretraining on location prediction can enable an AI to generate general solutions for mobile-robot fleets.

The success of a foundation model depends on having adequate training data, which is one of the areas where Amazon has an advantage. At the same time that we announced DeepFleet, we also announced the deployment of our millionth robot to Amazon FCs and sortation centers. We have literally billions of hours of robot navigation data that we can use to train our foundation models.

And of course, Amazon is also the largest provider of cloud computing resources, so we have the computational capacity to train and deploy models large enough to benefit from all that training data. One of our paper’s key findings is that, like other foundation models, a robot fleet foundation model continues to improve as the volume of training data increases.

In some ways, it’s natural to adapt LLM architectures to the problem of predicting robot location. An LLM takes in a sequence of words and projects that sequence forward, one word at a time. Similarly, a robot navigation model would take in a sequence of robot states or floor states and project it forward, one state at a time.
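The analogy can be made concrete with a minimal autoregressive rollout loop. `rollout` and the constant-velocity stand-in for `predict_next` are hypothetical illustrations, not DeepFleet code; a trained model would replace the toy rule.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")

def rollout(history: List[State],
            predict_next: Callable[[List[State]], State],
            steps: int) -> List[State]:
    """Project a state sequence forward one step at a time, feeding each
    prediction back in as input (the same loop an LLM runs over tokens)."""
    sequence = list(history)
    for _ in range(steps):
        sequence.append(predict_next(sequence))
    return sequence[len(history):]

# Toy stand-in for a learned model: a robot's x-coordinate keeps moving
# at its most recent velocity.
def constant_velocity(seq):
    if len(seq) < 2:
        return seq[-1]
    return seq[-1] + (seq[-1] - seq[-2])

print(rollout([0, 1], constant_velocity, steps=3))  # [2, 3, 4]
```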

In other ways, the adaptation isn’t so straightforward. With LLMs, it’s clear what the inputs and outputs should be: words (or more precisely word parts, or tokens). But how about with robot navigation? Should the input to the model be the state of a single robot, and you produce a floor map by aggregating the outputs of multiple models? Or should the inputs and outputs include the state of the whole floor? And if they do, how do you represent the floor? As a set of features relative to the robot location? As an image? As a graph? And how do you handle time? Is each input to the model a snapshot taken at a regular interval? Or does each input represent a discrete action, whenever it took place?

We experimented with four distinct models that answer these questions in different ways. The basic setup is the same for all of them: we model the floor of an FC or sortation center as a grid whose cells can be occupied by robots, obstacles, or storage or drop-off locations. Robots are either laden (with storage pods in an FC, packages in a sortation center) or unladen, and they have fixed orientations. Unoccupied cells make up travel lanes.
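As a minimal sketch of this grid abstraction, the code below stores static cell types separately from robot occupancy. All names (`Floor`, `CellType`, `Robot`, `can_enter`) are hypothetical; the heading encoding and the rule for which cells a robot may enter are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple

Pos = Tuple[int, int]  # (x, y) grid coordinates

class CellType(Enum):
    LANE = "lane"          # unoccupied cell; part of a travel lane
    OBSTACLE = "obstacle"
    STORAGE = "storage"
    DROP_OFF = "drop_off"

@dataclass
class Robot:
    robot_id: int
    heading: str   # assumed: one of "N", "E", "S", "W"
    laden: bool    # carrying a pod (FC) or a package (sortation center)

@dataclass
class Floor:
    width: int
    height: int
    layout: Dict[Pos, CellType] = field(default_factory=dict)  # static; absent cells are lanes
    robots: Dict[Pos, Robot] = field(default_factory=dict)     # dynamic occupancy

    def cell_type(self, pos: Pos) -> CellType:
        return self.layout.get(pos, CellType.LANE)

    def can_enter(self, pos: Pos) -> bool:
        """Assumed rule: in bounds, not an obstacle, not holding another robot."""
        x, y = pos
        return (0 <= x < self.width and 0 <= y < self.height
                and self.cell_type(pos) is not CellType.OBSTACLE
                and pos not in self.robots)

floor = Floor(4, 3, layout={(1, 1): CellType.OBSTACLE, (3, 0): CellType.DROP_OFF})
floor.robots[(0, 0)] = Robot(robot_id=7, heading="E", laden=True)
print(floor.can_enter((1, 0)), floor.can_enter((1, 1)), floor.can_enter((0, 0)))
# True False False
```

Keeping the static layout and the robot occupancy in separate maps mirrors the split between static and dynamic features that the models below exploit.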

Sample models of a fulfillment center (top) and a sortation center (bottom).

Like most machine learning systems of the past 10 years, our models produce embeddings of input data, or vector representations that capture data features useful for predictive tasks. All of our models make use of the Transformer architecture that is the basis of today’s LLMs. The Transformer’s characteristic feature is the attention mechanism: when determining its next output, the model determines how much it should attend to each data item it’s already seen — or to supplementary data. One of our models also uses a convolutional neural network, the standard model for image processing, while another uses a graph neural network to capture spatial relationships.
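For readers unfamiliar with attention, here is a minimal pure-Python sketch of scaled dot-product attention for a single query. It omits the learned query, key, and value projections and the multiple heads a real Transformer uses; the function names are our own.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Score each key against the query, turn the scores into weights that
    say how much to attend to each item, and return the weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# The query matches the first key, so the output leans toward the first value.
out = attention([1.0, 0.0], keys=[[4.0, 0.0], [0.0, 4.0]], values=[[1.0, 0.0], [0.0, 1.0]])
print([round(x, 3) for x in out])
```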

DeepFleet is the collective name for all of our models. Individually, they are the robot-centric model, the robot-floor model, the image-floor model, and the graph-floor model.

1. The robot-centric model

The robot-centric model focuses on one robot at a time — the “ego robot” — and builds a representation of its immediate environment. The model’s encoder produces an embedding of the ego robot’s state — where it is, what direction it’s facing, where it’s headed, whether it’s laden or unladen, and so on. The encoder also produces embeddings of the states of the 30 robots nearest the ego robot; the 100 nearest grid cells; and the 100 nearest objects (drop-off chutes, storage pods, charging stations, and so on).
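The context-gathering step can be sketched as a k-nearest-neighbor lookup around the ego robot. The counts (30, 100, 100) come from the description above; the Manhattan distance metric, the position-only representation, and the function names are assumptions, and the real encoder embeds far richer state than positions.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]

def manhattan(a: Pos, b: Pos) -> int:
    # Assumed metric: grid (Manhattan) distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nearest(ego: Pos, candidates: Sequence[Pos], k: int) -> List[Pos]:
    """Return the k candidates closest to the ego robot, nearest first."""
    return heapq.nsmallest(k, candidates, key=lambda p: manhattan(ego, p))

def ego_context(ego: Pos, robots: Sequence[Pos], cells: Sequence[Pos],
                objects: Sequence[Pos]) -> Dict[str, List[Pos]]:
    others = [r for r in robots if r != ego]     # exclude the ego robot itself
    return {
        "robots": nearest(ego, others, 30),      # 30 nearest robots
        "cells": nearest(ego, cells, 100),       # 100 nearest grid cells
        "objects": nearest(ego, objects, 100),   # 100 nearest chutes, pods, chargers, ...
    }

grid = [(x, y) for x in range(20) for y in range(20)]
ctx = ego_context((0, 0), robots=[(0, 0), (5, 5), (1, 0)], cells=grid, objects=[(3, 3), (0, 2)])
print(ctx["robots"], ctx["objects"])  # [(1, 0), (5, 5)] [(0, 2), (3, 3)]
```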

A Transformer combines these embeddings into a single embedding, and a sequence of such embeddings — representing a sequence of states and actions the ego robot took — passes to a decoder. On the basis of that sequence, the decoder predicts the robot’s next action. This process happens in parallel for every robot on the floor. Updating the state of the floor as a whole is a matter of sequentially applying each robot’s predicted action.
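The article does not say how conflicts are resolved when predicted actions are applied one robot at a time, so the sketch below assumes a simple rule: a move into a cell that is occupied at the moment it is applied is dropped, and the robot waits. The action set and `apply_actions` are hypothetical. The example shows why the application order can matter.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0), "WAIT": (0, 0)}  # assumed action set

def apply_actions(positions: Dict[int, Pos], actions: Dict[int, str],
                  order: List[int]) -> Dict[int, Pos]:
    """Update the floor by applying each robot's predicted action in turn.
    Assumed conflict rule: a move into a currently occupied cell is dropped."""
    new_positions = dict(positions)              # leave the input untouched
    occupied = set(new_positions.values())
    for robot_id in order:
        dx, dy = MOVES[actions[robot_id]]
        x, y = new_positions[robot_id]
        target = (x + dx, y + dy)
        if target != (x, y) and target not in occupied:
            occupied.remove((x, y))
            occupied.add(target)
            new_positions[robot_id] = target
    return new_positions

# Robot 1 moves east into the cell robot 2 is vacating. If robot 2 is applied
# first, both moves succeed; in the reverse order, robot 1 has to wait.
pos = {1: (0, 0), 2: (1, 0)}
acts = {1: "E", 2: "E"}
print(apply_actions(pos, acts, order=[2, 1]))  # {1: (1, 0), 2: (2, 0)}
print(apply_actions(pos, acts, order=[1, 2]))  # {1: (0, 0), 2: (2, 0)}
```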

Architecture of the robot-centric model.

2. The robot-floor model

With the robot-floor model, separate encoders produce embeddings of the robot states and of the fixed features of the floor cells. Because the floor cells themselves never change (any change on the floor is the result of robot motion, which the robot embeddings capture), the floor needs to be embedded only once rather than at every time step.

At decoding time, we use cross-attention between the robot embeddings and the floor state embedding to produce a new embedding for each robot that factors in floor state information. Then, for each robot, we use cross-attention between its updated embedding and those of each of the other robots to produce a final embedding, which captures both robot-robot and robot-floor relationships. The last layer of the model — the output head — uses these final embeddings to predict each robot’s next action.
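The two attention stages can be sketched in pure Python as follows, assuming residual connections (standard Transformer practice) and omitting learned projections, multiple heads, and the output head. `robot_floor_layer` is a hypothetical name, and the tiny vectors stand in for learned embeddings.

```python
import math
from typing import List

Vec = List[float]

def attend(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    # Scaled dot-product attention for one query.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(len(values[0]))]

def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]

def robot_floor_layer(robots: List[Vec], floor_cells: List[Vec]) -> List[Vec]:
    # Stage 1: each robot embedding (query) cross-attends to the floor-cell
    # embeddings, which are computed once because the floor layout is static.
    with_floor = [add(r, attend(r, floor_cells, floor_cells)) for r in robots]
    # Stage 2: each floor-aware robot embedding attends to the other robots,
    # capturing robot-robot relationships on top of robot-floor ones.
    final = []
    for i, r in enumerate(with_floor):
        others = [o for j, o in enumerate(with_floor) if j != i]
        final.append(add(r, attend(r, others, others)) if others else r)
    return final

robots = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
floor = [[1.0, 1.0], [0.0, -1.0]]
out = robot_floor_layer(robots, floor)
print(len(out), len(out[0]))  # 3 2
```

In the real model, an output head would map each final embedding to a predicted next action.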

The architecture of the robot-floor model.

3. The image-floor model

Convolutional neural networks step through an input image, applying different filters to fixed-size blocks of pixels. Each filter establishes a separate processing channel through the network. Typically, the filters are looking for different image features, such as contours with particular shapes and orientations.

In our case, however, the “pixels” are cells of the floor grid, and each channel is dedicated to a separate cell feature. There are static features, such as fixed objects in particular cells, and dynamic features, such as the locations of the robots and their states.
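A minimal sketch of the multi-channel encoding, assuming a hypothetical set of five channels (the article does not list the actual features). Each channel is a grid with one value per floor cell, indexed as `[y][x]`; `encode_floor` is our own name.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
Grid = List[List[float]]

# Hypothetical channel set: static features first, then dynamic ones.
CHANNELS = ["obstacle", "storage", "drop_off", "robot", "laden"]

def encode_floor(width: int, height: int, static: Dict[str, List[Pos]],
                 robots: Dict[Pos, bool]) -> Dict[str, Grid]:
    """Return one height x width grid per feature channel; the 'pixels' are cells."""
    planes = {c: [[0.0] * width for _ in range(height)] for c in CHANNELS}
    for feature, cells in static.items():       # static features: fixed objects
        for x, y in cells:
            planes[feature][y][x] = 1.0
    for (x, y), laden in robots.items():        # dynamic features: robots and their state
        planes["robot"][y][x] = 1.0
        planes["laden"][y][x] = 1.0 if laden else 0.0
    return planes

planes = encode_floor(3, 2, {"obstacle": [(1, 1)]}, {(0, 0): True, (2, 1): False})
print(planes["robot"])  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```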

In each channel, representations of successive states of the floor are flattened — converted from 2-D grids to 1-D vectors — and fed to a Transformer. The Transformer’s attention mechanism can thus attend to temporal and spatial features simultaneously. The Transformer’s output is an encoding of the next floor state, which a convolutional decoder converts back to a 2-D representation.
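The flattening step amounts to a reshape. The sketch below, with hypothetical helper names, shows how T frames of an H x W grid become one sequence of T·H·W positions, so that a single attention window covers every (time, cell) pair. In the model itself the values would be learned features rather than raw occupancy, and the convolutional decoder does more than the plain inverse reshape shown here.

```python
from typing import List, Tuple

Grid = List[List[float]]

def flatten_frames(frames: List[Grid]) -> List[float]:
    """Flatten each 2-D frame row by row, then concatenate the frames in time
    order, so one sequence covers every (time, cell) position."""
    return [value for frame in frames for row in frame for value in row]

def token_position(index: int, height: int, width: int) -> Tuple[int, int, int]:
    """Recover (t, y, x) for a token in the flattened sequence."""
    t, rest = divmod(index, height * width)
    y, x = divmod(rest, width)
    return t, y, x

def unflatten_frame(vector: List[float], height: int, width: int) -> Grid:
    """Inverse reshape of one frame back to a 2-D floor map."""
    return [vector[r * width:(r + 1) * width] for r in range(height)]

# Two 2 x 2 frames: a robot moves from cell (x=1, y=0) to cell (x=0, y=1).
frames = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
seq = flatten_frames(frames)
print(seq, token_position(6, 2, 2))  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0] (1, 1, 0)
```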

4. The graph-floor model

A natural way to model the FC or sortation center floor is as a graph whose nodes are floor cells and whose edges encode the available movements between cells (for example, a robot may not move into a cell occupied by another object). We convert such a spatial graph into a spatiotemporal graph by adding temporal edges that connect each node to itself at a later time step.
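A sketch of the spatiotemporal graph construction, assuming 4-connected movement, a fixed set of blocked cells (in practice occupancy changes over time, so spatial edges could differ from step to step), and temporal edges that link each cell only to itself at the next step. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Pos = Tuple[int, int]
Node = Tuple[int, Pos]  # (time step, cell)

def spatial_edges(width: int, height: int, blocked: Set[Pos]) -> List[Tuple[Pos, Pos]]:
    """Directed edges between 4-neighboring cells a robot could move between.
    Assumed rule: no edges into or out of blocked cells."""
    edges = []
    for x in range(width):
        for y in range(height):
            if (x, y) in blocked:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in blocked:
                    edges.append(((x, y), (nx, ny)))
    return edges

def spatiotemporal_edges(width: int, height: int, blocked: Set[Pos],
                         steps: int) -> List[Tuple[Node, Node]]:
    """Copy the spatial graph at every time step, then add temporal edges
    linking each cell to itself at the next step."""
    spatial = spatial_edges(width, height, blocked)
    edges: List[Tuple[Node, Node]] = []
    for t in range(steps):
        edges += [((t, a), (t, b)) for a, b in spatial]
        if t + 1 < steps:
            edges += [((t, (x, y)), (t + 1, (x, y)))
                      for x in range(width) for y in range(height)]
    return edges

print(len(spatiotemporal_edges(2, 2, set(), steps=2)))  # 20 (8 spatial x 2 steps + 4 temporal)
```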

Next, in the approach made standard by graph neural networks, we use a Transformer to iteratively encode the spatiotemporal graph as a set of node embeddings. With each iteration, a node’s embedding factors in information about nodes farther away from it in the graph. In parallel, the model also builds up a set of edge embeddings.
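The growing receptive field can be illustrated with plain mean-aggregation message passing, a simpler stand-in for the Transformer-based encoder the paper uses. The sketch tracks which original nodes have influenced each node after k rounds; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def message_passing(neighbors: Dict[str, List[str]], features: Dict[str, float],
                    rounds: int) -> Tuple[Dict[str, float], Dict[str, Set[str]]]:
    """Each round, a node's value becomes the mean of its own value and its
    neighbors' values; `reach` records which original nodes have influenced it."""
    values = dict(features)
    reach: Dict[str, Set[str]] = {n: {n} for n in neighbors}
    for _ in range(rounds):
        new_values, new_reach = {}, {}
        for node, nbrs in neighbors.items():
            group = [node] + nbrs
            new_values[node] = sum(values[g] for g in group) / len(group)
            new_reach[node] = set().union(*(reach[g] for g in group))
        values, reach = new_values, new_reach
    return values, reach

# A path graph a - b - c - d: information from d reaches a only after 3 rounds.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
feats = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 1.0}
for k in (1, 2, 3):
    _, reach = message_passing(path, feats, k)
    print(k, sorted(reach["a"]))
# 1 ['a', 'b']
# 2 ['a', 'b', 'c']
# 3 ['a', 'b', 'c', 'd']
```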

Each encoding block also includes an attention mechanism that uses the edge embeddings to compute attention scores between node embeddings. The output embedding thus factors in information about the distances between nodes, so it can capture long-range effects.
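One common way to let edge information shape attention, used for example in Graphormer-style graph Transformers, is to add a distance-dependent bias to each dot-product score. The sketch below reduces each edge embedding to a scalar bias indexed by hop distance; this is an assumption for illustration, and the paper's exact mechanism may differ. Because every reachable node receives a score, a single layer can relate distant nodes directly.

```python
import math
from collections import deque
from typing import Dict, List

def hop_distances(neighbors: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def edge_biased_weights(query: List[float], keys: Dict[int, List[float]],
                        dist: Dict[int, int], bias_per_hop: List[float]) -> Dict[int, float]:
    """Attention weights of one node over all reachable nodes: the usual
    dot-product score plus a bias looked up from the hop distance."""
    scale = math.sqrt(len(query))
    scores = {}
    for node, key in keys.items():
        if node in dist:
            hop = min(dist[node], len(bias_per_hop) - 1)  # clip long distances
            scores[node] = sum(q * k for q, k in zip(query, key)) / scale + bias_per_hop[hop]
    m = max(scores.values())
    exps = {n: math.exp(s - m) for n, s in scores.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
keys = {n: [1.0, 0.0] for n in line}   # identical keys, so only the bias differs
w = edge_biased_weights([1.0, 0.0], keys, hop_distances(line, 0), [0.0, 0.0, -1.0, -2.0])
print({n: round(v, 3) for n, v in w.items()})
```

Here the bias values are fixed for illustration; in a trained model they (or the edge embeddings they stand in for) would be learned.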

From the final set of node embeddings, we can decode a prediction of where each robot is, whether it is moving, what direction it is heading, etc.

The architecture of the graph-floor model.
The architecture of the graph-floor model.

Evaluation

We used two metrics to evaluate all four models’ performance. The first is dynamic-time-warping (DTW) distance between predictions and the ground truth across multiple dimensions, including robot position, speed, state, and the timing of load and unload events. The second metric is congestion delay error (CDE), or the relative error between delay predictions and ground truth.
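Both metrics can be sketched briefly. `dtw_distance` is the textbook dynamic-time-warping recurrence over a scalar sequence (a multidimensional version would swap in a different `cost`, such as a distance between positions). `congestion_delay_error` assumes CDE is |predicted - actual| / actual for a single delay; the paper may aggregate or normalize it differently.

```python
from typing import Callable, Sequence

def dtw_distance(pred: Sequence, truth: Sequence,
                 cost: Callable = lambda a, b: abs(a - b)) -> float:
    """Dynamic-time-warping distance: the minimum total cost of aligning two
    sequences when either may be locally stretched in time."""
    n, m = len(pred), len(truth)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(pred[i - 1], truth[j - 1])
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def congestion_delay_error(predicted_delay: float, actual_delay: float) -> float:
    """Relative error of a delay prediction (assumed definition)."""
    return abs(predicted_delay - actual_delay) / actual_delay

# A prediction that is right but one step late: DTW absorbs the lag,
# whereas a point-by-point comparison would penalize it.
print(dtw_distance([0, 0, 1, 2], [0, 1, 2, 2]))  # 0.0
print(congestion_delay_error(12.0, 10.0))         # 0.2
```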

Overall, the robot-centric model performed best, with the top scores on both CDE and the DTW distance on position and state predictions, but the robot-floor model achieved the top score on DTW distance for timing estimation. The graph-floor model didn’t fare quite as well, but its results were still strong at a significantly lower parameter count — 13 million, versus 97 million for the robot-centric model and 840 million for the robot-floor model.

The image-floor model didn’t work well. We suspect that this is because the convolutional filters of a convolutional neural network are designed to abstract away from pixel-level values to infer larger-scale image features, like object classifications. We were trying to use convolutional neural networks for pixel-level predictions, which they may not be suited for.

We also conducted scaling experiments with the robot-centric and graph-floor models, which showed that, indeed, model performance improved with increases in the volume of training data — an encouraging sign, given the amount of data we have at our disposal.

On the basis of these results, we are continuing to develop the robot-centric, robot-floor, and graph-floor models, initially using them to predict congestion, with the longer-term goal of using them to produce outputs like assignments of robots to specific retrieval tasks and target locations. You can read the full paper on arXiv.
