Teaching household robots where to find requested objects

Leveraging a large vision-language foundation model enables state-of-the-art performance in remote-object grounding.

Remote-object grounding is the task of automatically determining where in the local environment to find an object specified in natural language. It is an essential capability for household robots, which need to be able to execute commands like “Bring me the pair of glasses on the counter in the kids’ bathroom.”

In a paper we are presenting at the International Conference on Intelligent Robots and Systems (IROS), my colleagues and I describe a new approach to remote-object grounding that leverages a foundation model — a large, self-supervised model that learns joint representations of language and images. By treating remote-object grounding as an information retrieval problem and using a “bag of tricks” to adapt the foundation model to this new application, we enable a 10% improvement over the state of the art on one benchmark dataset and a 5% improvement on another.

A new approach treats remote-object grounding as an information retrieval problem, in which a model must match candidate objects against a natural-language request.

Language-and-vision models

In recent years, foundation models — such as large language models — have revolutionized several branches of AI. Foundation models are usually trained through masking: elements of the input data — whether text or images — are masked out, and the model must learn to fill in the gaps. Since masking requires no human annotation, it enables the models to be trained on huge corpora of publicly available data. Our approach to remote-object grounding is based on a vision-language (VL) model — a model that has learned to jointly represent textual descriptions and visual depictions of the same objects.

We consider the scenario in which a household robot has had adequate time to build up a 3-D map of its immediate environment, including visual representations of the objects in that environment. We treat remote-object grounding as an information retrieval problem, meaning that the model takes linguistic descriptions — e.g., “the glasses on the counter in the kids’ bathroom” — and retrieves the corresponding object in its representation of its visual environment.
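The retrieval framing can be illustrated with a minimal sketch. It assumes that the request and every candidate object have already been embedded in a shared vector space, and it uses cosine similarity as the scoring function. The embeddings, the object names, and the `retrieve` helper are all hypothetical; the model in the paper scores candidates with a vision-language model, not with hand-made vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vec, candidates, top_k=1):
    """Return the names of the top_k candidates most similar to the query.

    candidates maps a (hypothetical) object name to its embedding.
    """
    ranked = sorted(
        candidates,
        key=lambda name: cosine(query_vec, candidates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy, hand-made embeddings standing in for model outputs.
query = [0.9, 0.1, 0.0]  # "the glasses on the counter in the kids' bathroom"
objects = {
    "glasses_kids_bathroom": [0.8, 0.2, 0.1],
    "mirror_master_bathroom": [0.1, 0.9, 0.2],
    "mug_kitchen": [0.0, 0.2, 0.9],
}
best = retrieve(query, objects)
print(best)
```

In this toy setup the glasses embedding lies closest to the query, so it is the object retrieved.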

Adapting a VL model to this problem poses two major challenges. The first is the scale of the problem. A single household might contain 100,000 discrete objects; it would be prohibitively time-consuming to use a large foundation model to score that many candidates at once. The second is that VL models are typically trained on 2-D images, whereas a household robot builds up a 3-D map of its environment.

Gunnar A. Sigurdsson on adapting vision-language foundation models to the problem of remote-object grounding.

Bag of tricks

In our paper, we present a “bag of tricks” that help our model surmount these and other challenges.

1. Negative examples

The obvious way to accommodate the scale of the retrieval problem is to break it up, separately scoring the candidate objects in each room, say, and then selecting the most probable candidates from each list of objects.

The problem with this approach is that the scores of the objects in each list are relative to each other. A high-scoring object is one that is much more likely than the others to be the correct referent for a command; relative to candidates on a different list, however, its score might drop. To improve consistency across lists, we augment the model’s training data with negative examples — viewpoints from which the target objects are not visible. This prevents the model from getting overconfident in its scoring of candidate objects.
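As a rough sketch of the data augmentation described above, the snippet below builds training examples for one command: viewpoints from which the target is visible are labeled with the target, and a sample of viewpoints from which it is not visible is labeled `None`. The data layout (a dictionary from viewpoint name to the set of visible objects), the `build_examples` helper, and the `None` label are assumptions made for illustration, not the paper's actual training format.

```python
import random


def build_examples(target, visible_by_viewpoint, num_negatives, seed=0):
    """Pair a target object with positive and negative viewpoints.

    visible_by_viewpoint maps a viewpoint name to the set of objects
    visible from it. Negative examples carry the label None, meaning
    "the requested object is not visible from here".
    """
    rng = random.Random(seed)
    positives = [
        (vp, target)
        for vp, objs in visible_by_viewpoint.items()
        if target in objs
    ]
    negative_pool = [
        vp for vp, objs in visible_by_viewpoint.items() if target not in objs
    ]
    k = min(num_negatives, len(negative_pool))
    negatives = [(vp, None) for vp in rng.sample(negative_pool, k)]
    return positives + negatives


viewpoints = {
    "vp_kids_bathroom": {"glasses", "mirror", "sink"},
    "vp_kitchen": {"mug", "kettle"},
    "vp_hallway": {"coat_rack"},
    "vp_master_bathroom": {"mirror", "towel"},
}
examples = build_examples("glasses", viewpoints, num_negatives=2)
```

Training on both kinds of example gives the model a chance to learn that the right answer for a given list of candidates can be "none of these", which is what keeps scores from different lists comparable.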

2. Distance-limited exploration

Our second trick for addressing the problem of scale is to limit the radius in which we search for candidate objects. During training, the model learns not only what objects best correspond to what requests but how far it usually has to go to find them. Limiting search radius makes the problem much more tractable with little loss of accuracy.
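The effect of a search radius can be sketched as a simple filter over candidate positions. The sketch assumes straight-line distance in the robot's 3-D map and a fixed `max_radius` parameter, both of which are illustrative choices; in the paper, the model learns during training how far it usually has to go.

```python
import math


def within_radius(robot_pos, candidates, max_radius):
    """Keep only the candidates whose position lies within max_radius.

    candidates maps a (hypothetical) object name to an (x, y, z) position
    in the same coordinate frame as robot_pos.
    """
    return {
        name: pos
        for name, pos in candidates.items()
        if math.dist(robot_pos, pos) <= max_radius
    }


robot = (0.0, 0.0, 0.0)
candidates = {
    "glasses": (1.0, 2.0, 0.0),       # about 2.2 units away
    "garage_toolbox": (10.0, 0.0, 0.0),  # 10 units away
}
nearby = within_radius(robot, candidates, max_radius=5.0)
```

Only the objects that survive the filter need to be scored by the larger model, which is where the savings come from.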

3. 3-D representations

To address the mismatch between the 2-D data used to train the VL model and the 3-D data that the robot uses to map its environment, we convert the 2-D coordinates of the “bounding box” surrounding an object (the rectangular demarcation of the object’s region of the image) to a set of 3-D coordinates: the three spatial coordinates of the center of the bounding box and a radius, defined as half the length of the bounding box’s diagonal.
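The center-plus-radius representation can be sketched in a few lines. The sketch assumes that two opposite corners of the bounding box have already been projected into the robot's 3-D map, for instance using depth readings; that projection step and the `box_to_sphere` helper are assumptions for illustration and are not described in this article.

```python
import math


def box_to_sphere(corner_a, corner_b):
    """Convert two opposite box corners, given as 3-D points, to (x, y, z, r).

    The center is the midpoint of the diagonal, and the radius is half
    the diagonal's length.
    """
    center = tuple((a + b) / 2 for a, b in zip(corner_a, corner_b))
    radius = math.dist(corner_a, corner_b) / 2
    return center + (radius,)


# Corners 3 units apart in x and 4 in y: the diagonal has length 5.
sphere = box_to_sphere((0.0, 0.0, 2.0), (3.0, 4.0, 2.0))
print(sphere)  # (1.5, 2.0, 2.0, 2.5)
```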

4. Context vectors

Finally, we employ a trick to improve the model’s overall performance. For each viewpoint — that is, each location from which the robot captures multiple images of the immediate environment — our model produces a context vector, which is an average of the vectors corresponding to all of the objects visible from that viewpoint. Adding the context vector to the representations of particular candidate objects enables the robot to, say, distinguish the mirror above the sink in one bathroom from the mirror above the sink in another.
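A minimal sketch of the context vector follows. It averages the object vectors for one viewpoint and combines the result with each object vector by element-wise addition; addition is an assumption about how the two are combined, and the two-dimensional vectors are toy values.

```python
def context_vector(object_vecs):
    """Average the vectors of all objects visible from one viewpoint."""
    n = len(object_vecs)
    return [sum(column) / n for column in zip(*object_vecs)]


def add_context(object_vecs):
    """Add the viewpoint's context vector to each object vector."""
    ctx = context_vector(object_vecs)
    return [[x + c for x, c in zip(vec, ctx)] for vec in object_vecs]


mirror = [1.0, 0.0]  # the same raw mirror vector in both bathrooms
bathroom_a = [mirror, [0.0, 2.0]]  # mirror plus a towel
bathroom_b = [mirror, [4.0, 0.0]]  # mirror plus a bathtub

mirror_a = add_context(bathroom_a)[0]
mirror_b = add_context(bathroom_b)[0]
print(mirror_a, mirror_b)  # [1.5, 1.0] [3.5, 0.0]
```

Although the two mirrors start from identical vectors, their contextualized representations differ because the surrounding objects differ.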

An overview of the “bag of tricks” deployed, both during training and at inference time, to adapt a vision-language model to the problem of remote-object grounding.

We tested our approach on two benchmark datasets, each of which contains tens of thousands of commands and the corresponding sets of sensor readings, and found that it significantly outperformed the previous state-of-the-art model. To test our algorithm’s practicality, we also deployed it on a real-world robot and found that it was able to execute commands in real time with high accuracy.

At inference time, if the robot has no prior knowledge of its environment, it can use frontier-based exploration to map the locations of candidate objects for remote-object grounding.
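Frontier-based exploration is usually described in terms of an occupancy grid: a frontier is a known-free cell that borders unexplored space, and the robot repeatedly drives toward frontiers to extend its map. The sketch below shows only the frontier-detection step on a toy 2-D grid. The cell encoding and the `find_frontiers` helper are assumptions for illustration, not the implementation used on the robot.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid):
    """Return the (row, col) of every free cell with an unknown 4-neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                in_bounds = 0 <= nr < rows and 0 <= nc < cols
                if in_bounds and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
frontiers = find_frontiers(grid)
print(frontiers)  # [(0, 1), (2, 2)]
```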
