Vision-language models that can handle multi-image inputs

Attention-based representation of multi-image inputs improves performance on downstream vision-language tasks.

Vision-language models, which map images and text to a common representational space, have demonstrated remarkable performance on a wide range of multimodal AI tasks. But they’re typically trained on text-image pairs: each text input is associated with a single image.

This limits the models’ applicability. You might, for instance, want a vision-language model to take two input images and identify differences between them, or you might want to make inferences from a 3-D fusion of ultrasound or x-ray cross sections. In the Amazon Store, multiple images are frequently associated with a single product, and you might want to execute a query that factors in several of those images.

The standard way around this limitation is to concatenate a set of images and feed them to a model as, essentially, one enormous image. But this misses an opportunity to create a richer representation — or embedding — that systematically draws on complementary information from multiple images.

Related content
Novel architectures and carefully prepared training data enable state-of-the-art performance.

At this year’s Winter Conference on Applications of Computer Vision (WACV), we presented a new method for producing an aggregated embedding of multiple images, which improves performance on several multimodal AI tasks.

We considered four methods of fusing multiple images: one computes an element-wise average of the embeddings of the individual images; one uses max pooling, which records the highest value for each image feature across all images; and the other two use neural-network attention mechanisms, one with a gate on the attention values and one without.

We tested our approach on three different tasks: product categorization, product information inference, and image captioning. As a baseline, we used a model that took concatenated images, fine-tuned on each task, and we used three metrics to measure the results: accuracy, precision, and recall.

Across the board, the model using an ungated attention mechanism outperformed the others, sometimes by a considerable margin. On the image-captioning task, for instance, it was 6.4% better than baseline, and on the product attribute inference task, its precision and recall were 6.9% and 7.9% better than baseline, respectively.

Model architecture

Vision-language models typically involve an image encoder, which produces an embedding of an input image, and a projection layer, which learns to project the image embedding into the representational space of a trained large language model (LLM).

Sometimes, a query embedding generator intervenes between the image encoder and the projection layer. The query embedding generator is trained on a combination of image embeddings and the associated image captions, so it learns linguistic representations of the image embeddings that can help the projection layer better navigate the LLM’s representational space.

Two parallel image-embedding pipelines, taking as input the same image of a white sofa with pictures hanging above it. One pipeline includes a query embedding generator and one does not.
In a typical vision-language model, an image embedding passes to a projection layer that projects the embedding into the representational space of a trained LLM. Sometimes, a query embedding generator intervenes between the image encoder and the projection layer.

We introduce a multiple-instance visual component (MIVC) that, in either architecture, receives the output of the visual encoder, creating a unified representation of multiple input images.

The two parallel pipelines from earlier, each alongside a modified version of itself in which the MIVC layer, depicted as a green bubble with an arrow extending to a green array, is added, for a total of four pipelines.
Both vision-language model architectures, with and without the addition of the multiple-instance visual component (MIVC).

Permutation-invariant attention

The visual encoder learns to recognize features of the input data — which might be low-level properties like color gradients across image patches or higher-level properties like particular shapes — and assigns each input a value along each feature dimension.

Related content
Two methods presented at CVPR achieve state-of-the-art results by imposing additional structure on the representational space.

Our first MIVC method simply averages the feature values of the input images, while max pooling selects the highest value for each feature across all the images.

The attention mechanism is fine-tuned on particular tasks and learns which features of which images are most important for those tasks. We want the representation of multiple images to be invariant to the order in which the images pass to the visual encoder, so we devised an attention mechanism whose attention values for each image feature are the result of not only that image’s embedding but the embeddings of the other images as well.

A diagram with three arrays, depicting image embeddings, at left, with arrows extending to a neural network and additional arrows extending to nodes labeled "a" (for attention) on the other side of the network. The outputs of the network also connects to all three attention nodes, so that each attention node receives two arrows.
To ensure attention-based MIVCs' invariance to image order, we devised an attention mechanism whose attention values for each image (a1 – an) depend on not only that image’s embedding but the embeddings of the other input images, too.

The gated attention mechanism is like the basic attention mechanism, except that it learns an additional sigmoid function that boosts higher attention values and reduces lower ones, in an attempt to isolate the most crucial features of the input signal. In our experiments, however, it didn’t work as well as the basic attention mechanism.

Related content
Novel “checkpointing” scheme that uses CPU memory reduces the time wasted on failure recovery by more than 92%.

Because we fine-tuned the attention mechanism on the target task, we fine-tuned the baseline model, too, to ensure fair comparison. But on the attribute inference and captioning tasks, fine-tuning actually diminished the baseline model’s performance. If we use the zero-shot concatenated-image model as the baseline, the improvements offered by our method shrink slightly: on the image-captioning task, our advantage contracts to 5.6%, and on the product attribute inference task, the advantages on precision and recall contract to 5.5% and 7%. But that’s still a significant difference.

At present, the attention mechanism applies only to the visual encoding pipeline, and it operates under the assumption that all images are independently and identically distributed. In ongoing work, we’re investigating whether cross-modal attention and incorporating correlations across images offer any further improvements.

Research areas

Related content

US, CA, San Francisco
Join Amazon's Frontier AI & Robotics team and help shape the future of intelligent robotic systems from the inside out. As a Member of Technical Staff - Firmware Engineer, Electronics, you will develop the low-level firmware that brings our in-house robotic actuators to life—writing the embedded code that bridges sophisticated hardware and the high-level AI control systems that power our next-generation robots. Your work will directly enable our robots to see, reason, and act in real-world warehouse environments, making you a critical contributor to one of the most ambitious robotics programs in the world. Key job responsibilities • Develop, test, and optimize embedded firmware for custom in-house robotic actuators, including motor control algorithms (FOC, commutation, current/torque/speed/position loops) running on microcontrollers and DSPs • Design and implement real-time firmware for actuator state estimation, fault detection, and protection logic, ensuring robust and safe operation across all actuator variants deployed in FAR's robotic systems • Collaborate with electronics engineers and motor design engineers to define firmware requirements, hardware interfaces (SPI, I2C, CAN, EtherCAT, RS-485), and actuator bring-up procedures for new hardware revisions • Develop and maintain firmware for field-oriented control (FOC) and sensored/sensorless motor commutation, including tuning current regulators, velocity controllers, and position controllers for high-performance robots • Build and maintain firmware test frameworks and hardware-in-the-loop (HIL) test environments to validate firmware behavior across actuator operating conditions, edge cases, and failure modes • Partner with controls engineers and AI researchers to ensure firmware-level interfaces support high-bandwidth, low-latency communication required by whole-body control and motion planning algorithms • Contribute to actuator firmware architecture decisions, define software-hardware interface standards, and maintain firmware documentation and version control practices to enable scalable multi-actuator development • Support rapid hardware bring-up and debugging of new actuator prototypes, leveraging oscilloscopes, logic analyzers, and custom diagnostic tools to characterize and validate firmware behavior on novel hardware A day in the life Your day is rooted in the intersection of hardware and software where you’ll be wiring firmware from scratch to control custom motors. You might start your morning reviewing firmware behavior logs from the previous night's actuator characterization runs, then spend time working alongside motor design and electronics engineers to debug a torque ripple issue in the motor control loop. In the afternoon, you could be writing and validating embedded firmware for a new actuator variant, tuning (field-oriented control) FOC algorithms, and collaborating with the controls team to ensure firmware interfaces align with high-level motion planning requirements. Beyond the bench, you'll participate in architecture reviews with hardware and software engineers, contribute to code reviews, and document firmware specifications that enable smooth hardware handoffs. You'll be working on actuator variants—each with unique power, torque, and speed requirements—and you'll be the firmware voice in cross-functional design discussions that shape how our actuators are built and controlled. The pace is fast, the problems are novel, and the impact is direct. About the team Frontier AI & Robotics (FAR) is the team at Amazon building the next generation of embodied intelligence. FAR drives the development and implementation of advanced AI models within Amazon’s operations that enable robots to see, reason, and act on the world around them, supporting a number of different warehouse automation tasks.
US, CA, San Francisco
Join our Frontier AI & Robotics team to support the hardware integration of next-generation robotic systems that will transform how robots perceive and interact with the world. You'll take ownership of hands-on hardware assembly, software integration, and system validation tasks across advanced actuators, precision sensors, and robotic subsystems — ensuring they work seamlessly together to support breakthrough AI research and real-world deployment. Key job responsibilities - Assembly, Integration & DFx — Assemble and integrate robotic hardware (actuators, sensors, vision systems, machined components). Execute assembly processes and test protocols developed with engineering. Provide DFM/DFA feedback and perform simple mechanical/electrical/software design tasks; support integration/debug and partner with engineers to optimize manufacturability and testability. - R&D Prototype Test & Validation — Validate hardware revisions, verify mechanical assemblies, power sequencing, communication interfaces, and peripherals during bring-up. - Debugging & Failure Analysis — Troubleshoot and root-cause issues across the robotic platform (power, compute, comms, actuators, sensors). Conduct failure analysis from component to system level. Reproduce critical failures, interpret schematics, and bridge communication between the lab and engineering teams. - Technical Documentation — Author and maintain runbooks, failure analysis reports, assembly guides, and troubleshooting guides; uphold consistent documentation standards across the lab. - Mechanical Design Support — Perform simple R&D design tasks and test fixture design in CAD, ensuring quality and alignment with engineering priorities. - Lab Operations Support — Support machine shop capabilities, equipment maintenance, inventory management, vendor coordination, and safety/regulatory compliance. - Test Capability Development — Develop test methodologies, design jigs/fixtures, support hardware-in-the-loop (HIL) testing, and streamline failure-to-resolution workflows. A day in the life Your focus centers on the hardware and software that powers our advanced robotic platforms. You'll execute high degree-of-freedom (DoF) robotic prototype assembly and validation, working alongside engineers and fellow technicians. Your responsibilities include building, debugging, validating prototype, performing critical component and assembly quality assessments, providing DFM/DFA feedback to engineers, and designing test jigs and fixtures. Throughout the day, you balance complex assemblies and integration testing while handling urgent prototyping requests, documentation updates, and preparation for upcoming milestones. You're switching between working at the bench, collaborating in design reviews with engineers, and ensuring lab safety and equipment maintenance. About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through frontier foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
Join Amazon's Frontier AI & Robotics team as a Member of Technical Staff, this Technical Program Manager will become the driving force behind breakthrough robotics innovation. You'll orchestrate complex, cross-functional programs that bridge AI research, software, hardware, and production deployment—managing the technical workstreams that enable robots to see, reason, and act in Amazon's warehouse environments. Your program leadership will directly accelerate our mission to build the next generation of embodied intelligence. Key job responsibilities · Establish and drive program management mechanisms and cadence for complex robotics and AI development initiatives spanning research, software engineering, hardware, and operations · Manage end-to-end program execution across the full robotics stack—including AI models, software engineering, and hardware deployment · Drive decision-making velocity by facilitating tradeoff discussions when there are conflicting priorities; determine whether decisions are one-way or two-way doors · Own program-level risk management, proactively identifying technical, schedule, and resource risks; escalate where necessary and drive mitigation strategies · Manage dependencies and scope changes across internal teams and partner organizations, ensuring alignment on commitments, timelines, and technical requirements · Create transparency through clear RACI frameworks, program dashboards, and communication mechanisms that keep stakeholders aligned on status, risks, and decisions · Exercise strong technical judgment to influence program-level decisions on deployment methodology, scalability requirements, and technical feasibility—acting as the voice back to research and engineering teams · Build sustainable program management processes that scale as our organization grows, adapting agile frameworks to the unique challenges of AI robotics A day in the life Your focus centers on driving velocity and alignment across our robotics programs. You might start your morning facilitating tradeoff decisions between AI researchers and software engineers on a critical prototype milestone, then transition to managing dependencies across hardware and operations teams to keep timelines on track. In the afternoon, you could be conducting risk assessments on supply chain constraints that impact our development roadmap, updating program dashboards to provide leadership visibility, or working with partner teams to align on deployment strategies. You'll establish the mechanisms and cadence that keep our fast-moving organization synchronized—from sprint planning rituals to cross-functional design reviews. Throughout the day, you balance hands-on program execution with strategic escalation, ensuring technical decisions align with our long-term vision while removing obstacles that slow teams down. You're the connective tissue that enables researchers, engineers, and operations specialists to move fast together. About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through frontier foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
We are seeking a hands-on Electrical Engineer to lead the design and integration of electrical systems or subsystems for high-degree-of-freedom robotic platforms. This role involves architecting the robot’s power distribution, sensor wiring, and embedded electrical infrastructure. You will be responsible for designing across the full electrical system for advanced robotics platforms including power distribution, sensing, compute, motor controllers, communication infrastructure, battery system and power electronics in close collaboration with mechanical, controls and software engineers. You’ll play a key role in ensuring high-performance, reliable operation of complex electromechanical systems under real-world conditions. Key job responsibilities * Electrical system architect / owner for power electronics, actuation, PCBAs, battery, ware harness specs and high speed electrical/communications protocols * Design, develop and integrate power distribution, embedded electronics, motor controllers and safety-critical circuits for complex robotic systems * Own board layout of PCBAs including SoCs, microcontrollers, sensors, power devices, etc. using Cadence OrCAD/Allegro or equivalent tools. Oversee bring-up and validation * Determine appropriate high speed electrical and communication protocols (e.g., CAN, EtherCAT, USB, etc) for reliable and efficient system operation * Specify and design custom power electronics and power distribution boards to meet performance, thermal, and safety requirements * Design and route all cabling and wire harnesses across the robotic platform, considering EMI, signal integrity, serviceability, and integration with mechanical structures * Architect and integrate the robot’s battery system, including protection circuitry, battery management, charging systems, and thermal considerations * Define and implement wiring and electrical interfaces for sensors (e.g., lidar, stereo cameras, IMUs, tactile) and compute modules * Ownership over prototyping and bringing up electrical designs and creation of test & validation rigs About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through innovative foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
Join our Frontier AI & Robotics team to lead the hardware integration of next-generation robotic systems that will transform how robots perceive and interact with the world. You'll take ownership of critical hardware components, from advanced actuators to precision sensors, ensuring they work seamlessly together to support breakthrough AI research and real-world deployment. Key job responsibilities - Prototype Lab Leadership — Lead & develop a cross-functional technician team supporting robotic prototype hardware; own daily priorities, team KPIs, and risk communication to FAR leadership. Serve as the technical escalation point for the lab. - Assembly, Integration & DFx ownership — Assemble & integrate robotic hardware (actuators, sensors, vision, machined components). Build assembly processes and test protocols with hardware engineering. Drive DFM/DFA feedback and own simple mechanical/electrical design tasks, lead integration/debug, and partner with engineers to optimize manufacturability and testability. - Own R&D prototype test & validation — Validate hardware revisions, verify mechanical assemblies, power sequencing, comms interfaces, and peripherals during bring-up. - Build a strong debugging & failure analysis function — Troubleshoot & root-cause across the full robot platform (power, compute, comms, actuators, sensors); hands-on for complex issues, directing the team on routine ones. Conduct failure analysis from component to system level using oscilloscopes, logic analyzers, and multimeters; train technicians on diagnostic techniques. Reproduce critical failures, interpret schematics, and bridge communication between the lab and engineering teams. - Own lab technical documentation — Own documentation & quality - author runbooks, FA reports, assembly guides and troubleshooting guides; mentor the team to maintain consistent standards. - Own mechanical design for the lab — Own mechanical design technician output. Oversee technicians performing simple R&D design tasks and test fixture design, ensuring quality and alignment with engineering priorities. - Manage prototyping lab operations — oversee machine shop capabilities and quality, equipment/inventory, vendor coordination, and safety/regulatory compliance. - Build additional lab capabilities — develop test methodologies, design jigs/fixtures, implement HIL testing, and streamline failure-to-resolution workflows. A day in the life Your focus centers on the hardware that powers our advanced robotic platforms. You'll lead a strong robotics technician and lab engineering team to support high degrees of freedom (DoF) robotic hardware prototype assembly and validation. Your team will be responsible for building, debugging and validating prototype hardware, critical component and assembly quality assessments, providing DFM/DFA feedback to engineers and designing test jigs and test set-ups. You’ll manage responsibilities like quality inspections of incoming parts, one-on-ones with technicians, and coordinating machine shop operations. Throughout the day, you balance leading your team through complex assemblies and integration testing while also handling urgent prototyping requests, documentation updates, and planning for upcoming milestones. You're switching between working at the bench alongside your technicians, collaborating in design reviews with engineers, and ensuring lab safety and equipment maintenance. About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through frontier foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
Join Amazon's Frontier AI & Robotics team and take ownership of the electronics that make our robots move. As a Member of Technical Staff - Electronics Engineer, Actuators & Drives, you will conceptualize, design, and test the motor drive electronics that power our in-house robotic actuators—from the gate drivers and power stages that command motor current to the sensing circuits and communication interfaces that give our robots proprioceptive awareness. Your printed circuit board (PCB) designs will live inside each of our next-generation robotic systems, directly enabling the embodied intelligence that is central to FAR's mission. Key job responsibilities • Conceptualize, design, and validate motor drive electronics for in-house robotic actuators, including inverter power stages, gate driver circuits, current and position sensing, and power management subsystems from concept through prototype and production • Lead PCB-level design of compact, high-power-density motor drive boards, including schematic capture, component selection, and collaboration with PCB layout engineers to achieve signal integrity, thermal, and EMC requirements in constrained actuator form factors • Characterize and optimize inverter switching performance, efficiency, and thermal behavior across the full operating envelope of FAR's actuator variants, using bench measurements and simulation to guide design decisions • Define and implement current sensing architectures (shunt-based, Hall-effect, or integrated IC-based) and position/velocity sensing interfaces (encoder, resolver, Hall sensor) to support high-bandwidth FOC firmware on microcontrollers and DSPs • Partner with firmware engineers to define hardware-software interfaces for motor drive control loops, fault detection logic, and communication protocols (CAN, EtherCAT, SPI), ensuring electronics designs support the real-time control requirements of robotic actuation • Collaborate with motor design and mechanical engineers to specify the electrical characteristics of custom BLDC and PMSM motors, align inverter design to motor parameters, and validate the integrated actuator electro-mechanical system • Lead hardware bring-up, functional testing, and failure analysis for new actuator electronics prototypes, developing test plans and characterization setups that systematically validate design performance and identify failure modes • Define electronics design standards, review processes, and design-for-manufacturability (DFM) guidelines for FAR's actuator drive portfolio, and mentor junior engineers in motor drive electronics design best practices A day in the life Your day centers on the full electronics development cycle for our custom actuator drive systems. You might start by reviewing simulation results for a new inverter topology, then transition to the lab to characterize switching losses and thermal performance on a prototype motor drive board. Later in the day, you could be collaborating with motor design engineers on back-EMF waveform analysis, refining gate drive timing to optimize inverter efficiency, or working with firmware engineers to define current sensing interfaces and hardware abstraction layers. Across the week, you'll be involved in schematic capture and PCB layout reviews with your design team, participating in design review gates, and iterating on hardware based on test findings. You'll navigate the challenge of fitting high-performance drive electronics into compact, thermally constrained actuator packages—designing for the power density, reliability, and robustness our robots demand. Your work will span from concept and architecture through silicon bring-up, and you'll play a key role in defining the electronics roadmap for FAR's actuator portfolio. About the team Frontier AI & Robotics (FAR) is the team at Amazon building the next generation of embodied intelligence. FAR drives the development and implementation of advanced AI models within Amazon’s operations that enable robots to see, reason, and act on the world around them, supporting a number of different warehouse automation tasks.
US, CA, San Francisco
About the Role: We are looking for a Member of Technical Staff - Mechanical Engineer with a passion for building complex robotic systems from the ground up. This role is ideal for someone with a deep understanding of structural and electromechanical design, who thrives in hands-on environments and has experience taking high-performance robots from concept to production. You will work on the mechanical and system architecture of advanced robotics platforms, including high degree-of-freedom systems, where considerations such as actuator selection, thermal constraints, cabling, sensing integration, and manufacturability are critical. This is a cross-disciplinary role requiring close collaboration with electrical, software, and AI research teams. Beyond day-to-day hardware development, this role also provides exciting avenues to contribute to innovative research projects. Whether you’re interested in mechatronics, sensor integration, or novel actuation methods, you’ll find opportunities to explore your research interests while building real-world systems that advance in the field of high degree-of-freedom robotics. What You Bring: * A systems-thinking mindset with a strong grasp of cross-domain engineering tradeoffs. * A bias toward action: comfortable building, testing, and iterating rapidly. * A collaborative and communicative working style — especially in multi-disciplinary research environments. * A passion for robotics and advancing the state of the art in intelligent, capable machines. Key job responsibilities * Lead mechanical design of robotic subsystems and full platforms, including structures, joints, enclosures, and mechanisms for a research environment. * Own kinematic, dynamic, and structural analyses to guide the design and optimization of full systems and subsystems of high-DoF robots * Specify and integrate actuators and motors for high-torque density applications in high-degree-of-freedom systems. * Contribute to thermal management strategies for motors, sensors, and embedded compute hardware. * Integrate sensors such as lidar, stereo cameras, IMUs, tactile sensors, and compute modules into compact, functional assemblies. * Design and route cabling and wire harnesses, ensuring reliability, serviceability, and thermal/electrical integrity. * Prototype and test mechanical systems; support hands-on builds, debug sessions, and field testing. * Conduct root cause analysis on system-level failures or performance issues and implement design improvements. * Apply Design for Manufacturing (DFM) and Design for Assembly (DFA) principles to transition prototypes into scalable builds (10s–100s of units). * Collaborate with cross-functional teams in electrical engineering, controls, perception, and research to meet research and product goals. About the team Frontier AI & Robotics (FAR) is the team at Amazon building the next generation of embodied intelligence. FAR drives the development and implementation of advanced AI models within Amazon’s operations that enable robots to see, reason, and act on the world around them, supporting a number of different warehouse automation tasks.
US, MA, N.reading
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. Key job responsibilities - Design and implement whole body control methods for balance, locomotion, and dexterous manipulation - Utilize state-of-the-art in methods in learned and model-based control - Create robust and safe behaviors for different terrains and tasks - Implement real-time controllers with stability guarantees - Collaborate effectively with multi-disciplinary teams to co-design hardware and algorithms for loco-manipulation - Mentor junior engineer and scientists
US, CA, San Francisco
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As an Applied Scientist, you will develop and improve machine learning systems that help robots perceive, reason, and act in real-world environments. You will leverage state-of-the-art models (open source and internal research), evaluate them on representative tasks, and adapt/optimize them to meet robustness, safety, and performance needs. You will invent new algorithms where gaps exist. You’ll collaborate closely with research, controls, hardware, and product-facing teams, and your outputs will be used by downstream teams to further customize and deploy on specific robot embodiments. Key job responsibilities As an Applied Scientist in the Foundations Model team, you will: - Leverage state-of-the-art models for targeted tasks, environments, and robot embodiments through fine-tuning and optimization. - Execute rapid, rigorous experimentation with reproducible results and solid engineering practices, closing the gap between sim and real environments. - Build and run capability evaluations/benchmarks to clearly profile performance, generalization, and failure modes. - Contribute to the data and training workflow: collection/curation, dataset quality/provenance, and repeatable training recipes. - Write clean, maintainable, well commented and documented code, contribute to training infrastructure, create tools for model evaluation and testing, and implement necessary APIs - Stay current with latest developments in foundation models and robotics, assist in literature reviews and research documentation, prepare technical reports and presentations, and contribute to research discussions and brainstorming sessions. - Work closely with senior scientists, engineers, and leaders across multiple teams, participate in knowledge sharing, support integration efforts with robotics hardware teams, and help document best practices and methodologies. About the team We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities
US, CA, San Francisco
Amazon is seeking an exceptional Sr. Applied Scientist to lead the development of perception systems that harness the power of radar and thermal imaging — enabling robots to perceive and operate reliably in conditions where conventional vision alone falls short. In this role, you will develop ML-driven perception pipelines for non-traditional sensing modalities, pushing the boundaries of what robots can see, understand, and act upon in challenging real-world environments. At Amazon, we leverage advanced robotics, machine learning, and artificial intelligence to solve some of the most complex operational challenges at a scale unlike anywhere else in the world. Our fleet of robots spans hundreds of facilities globally, working in sophisticated coordination to deliver on our promise of customer excellence. As a Sr. Applied Scientist in Multi-Modal Perception, you will apply deep computer vision expertise alongside classical signal processing techniques for radar and thermal imaging — modalities that provide robustness in adverse conditions and sensing capability beyond the visible spectrum. You will develop ML-based methods to extract semantic and geometric information from radar point clouds, radar tensors, and thermal imagery, and fuse these with camera and depth data to build perception systems that are reliable, comprehensive, and ready for deployment at scale. Your work will unlock new capabilities for our robots — enabling reliable detection, classification, and scene understanding in low-visibility conditions, cluttered environments, and scenarios where traditional RGB-based perception is insufficient. You will lead research that translates cutting-edge advances in deep learning and computer vision to these underexplored but high-impact sensing modalities. Join us in building the next generation of multi-modal perception systems that will define the future of autonomous robotics at scale. Key job responsibilities - Lead the research, design, and development of ML-based perception pipelines for radar and thermal/infrared imaging modalities - Develop deep learning models for object detection, classification, segmentation, and tracking using radar data (point clouds, range-Doppler maps, radar tensors) and thermal imagery - Design and implement multi-modal fusion architectures that combine radar, thermal, camera, and depth data for robust, all-condition perception - Develop novel representations and feature extraction methods tailored to the unique characteristics of radar and thermal sensors (sparsity, noise profiles, spectral properties) - Build end-to-end perception systems — from raw sensor data processing and calibration to model training, evaluation, and real-time deployment - Collaborate closely with Hardware, Navigation, Planning, and Controls teams to define sensor configurations and deliver integrated autonomy solutions - Establish benchmarks, datasets, and evaluation frameworks for radar and thermal perception - Mentor scientists and engineers; foster a culture of scientific rigor, innovation, and high-impact delivery - Publish research findings in top-tier venues (CVPR, ICCV, ECCV, ICRA, NeurIPS, etc.) and contribute to patents A day in the life - Train ML models for deployment in simulation and real-world robots, identify and document their limitations post-deployment - Drive technical discussions within your team and with key stakeholders to develop innovative solutions to address identified limitations - Actively contribute to brainstorming sessions on adjacent topics, bringing fresh perspectives that help peers grow and succeed — and in doing so, build lasting trust across the team - Mentor team members while maintaining significant hands-on contribution to technical solutions About the team Our team is a diverse group of scientists and engineers passionate about building intelligent machines. We value curiosity, rigor, and a bias for action. We believe in learning from failure and iterating quickly toward solutions that matter.