Fitzgerald keynote.png
Amazon senior applied scientist Jack FitzGerald, delivering a keynote talk at the joint Language Intelligence @ Work and SEMANTiCS conference in Vienna, Austria.

Scaling multilingual virtual assistants to 1,000 languages

Self-supervised training, distributed training, and knowledge distillation have delivered remarkable results, but they’re just the tip of the iceberg.

Yesterday at the joint Language Intelligence @ Work and SEMANTiCS conference in Vienna, Austria, Amazon senior applied scientist Jack FitzGerald delivered a keynote talk on multilingual virtual assistants and the path toward a massively multilingual future. This is an edited version of his talk.

The evolution of human-computer interaction paradigms

In the past 50 years, computing technology has progressed from text-based terminal inputs, to graphical user interfaces, to predominantly web-based applications, through the mobile era, and finally into the era of a voice user interface and ambient computing.

Interface timeline.png
A brief history of computing interfaces.

Each of these paradigms has its own challenges with respect to multilingualism, whether it was the migration from ASCII to Unicode or proper character rendering on a website. However, I would argue that a voice AI system is the most difficult paradigm yet with respect to massive multilingualism.

The first reason is that the input space for voice interface commands is unbounded: the user can phrase each command in hundreds of different ways, all of which are valid. Another reason is that even within a single language, there can be many different dialects and accents.

Related content
Amazon Visiting Academic Barbara Poblete helps to build safer, more-diverse online communities — and to aid disaster response.

Most important, the coupling between language and culture is inescapable. Whether it’s the level of formality used, preferred activities, or religious differences, there isn’t a one-size-fits-all solution. Instead, we must adapt the virtual assistant to understand cultural context and say only things that are appropriate for a given locale.

Voice AI systems today

A typical voice AI system includes automatic-speech-recognition models, which convert raw audio into text; natural-language understanding models, which determine the user’s intent and recognize named entities; a central service for arbitration and dialogue management, which routes commands to the proper services or skills; and finally, a text-to-speech model, which issues the output. Additional tasks might include expansion of the underlying knowledge graph and semantic parsing, localization of touch screen content, or local information services.

Alexa overview.png
An overview of Alexa’s design.

Let’s look at some of the operational considerations for supporting multiple languages in such models. One is the training data: they must be topically exhaustive, meaning that they cover the full spectrum of possible user utterances, and they must be culturally exhaustive — for instance, covering all of the holidays a user might celebrate. They must also remain up-to-date, and it’s not always easy to add something new to the model without regression on existing functionalities.

A second consideration is in-house testing. Though in many cases one can get away with synthetic or otherwise artificial data for model training, for testing it’s important to have realistic utterances. Those typically need to come from humans, and collecting them can be a major expense. It’s also useful to perform live, interactive testing, which requires people who can speak and understand each language that the system supports.

Related content
New approach corrects for cases when average improvements are accompanied by specific regressions.

Finally, it’s important to have the ability to support users and process their feedback. In most cases, this again requires staff who understand each of the supported languages.

Ultimately, human-based processes are not very scalable if our goal is to support thousands of languages. Instead, we must turn to technology to the greatest extent possible.

Multilingual modeling today

One of the leading reasons for the current success of multilingual text models is self-supervision.

In traditional supervised learning, a model would be trained from scratch on the desired task. If we wanted a model that would classify the sentiment of a product review, for example, we would manually annotate a bunch of product reviews, and we would use that dataset to train the model.

Today, however, we make use of transfer learning, in which text models are pretrained on terabytes of text data that don’t require manual annotation. Instead, the training procedure leverages the structure inherent to the text itself.

Self-supervision signals.png
Self-supervised-training objectives.

We’ll call this self-supervised pretraining With the masked-language-modeling training objective, for instance, the model is fed the input “for [MASK] out loud!”, and it must predict that “[MASK]” should be filled with the word “crying”. Other objectives, such as causal language modeling, span filling, deshuffling, and denoising can also be used.

Because the datasets required for self-supervised pretraining are unlabeled and monolingual, we can leverage troves of data, such as Common Crawl web scrapes, every Wikipedia page in existence, thousands of books and news articles, and more. Couple these large datasets with highly parallelizable architectures such as transformers, which can be trained on over a thousand GPUs with near linear scaling, and we can build models with tens or hundreds of billions of dense parameters. Such has been the focus for many people in the field for the past few years, including the Alexa Teacher Model team.

One incredible consequence of the transfer learning paradigm is called zero-shot learning. In the context of multilingual modeling, it works like this: the modeler begins by pretraining the model on some set of languages, using self-supervision. As an example, suppose that the modeler trains a model on English, French, and Japanese using every Wikipedia article in those three languages.

Related content
New end-to-end approach to zero-shot video classification dramatically outperforms predecessors.

The next step is to adapt the model to a particular task using labeled data. Suppose that the modeler has a labeled dataset for intent classification, but only in English. The modeler can go ahead and fine-tune the model on the English data, then run it on the remaining languages.

Despite the fact that the model was never trained to do intent classification with French or Japanese data, it can still classify intents in those languages, by leveraging what it learned about those languages during pretraining. Given that the acquisition of labeled data is often a bottleneck, this property of language models is highly valuable for language expansion. Of course, zero-shot learning is just the extreme end of a continuum: transfer learning helps even out performance when the labeled data in different languages is imbalanced.

Zero-shot multilingual.png
Zero-shot learning for multilingual adaptation.

The next step up the data efficiency ladder is performing tasks without any additional training or fine tuning, using only a couple of labeled records or none at all. This is possible through “in-context learning,” which was popularized in the GPT-3 paper.

To perform in-context learning, simply take a pretrained model and feed it the appropriate prompts. Think of a prompt is a hint to the model about the task it should perform. Suppose that we want the model to summarize a passage. We might prefix the passage with the word “Passage” and a colon and follow it with the word “Summary” and a colon. The model would then generate a summary of the passage.

Related content
In the past few years, advances in artificial intelligence have captured our imaginations and led to the widespread use of voice services on our phones and in our homes.

This is the zero-shot in-context learning case, meaning that no fine-tuning is performed, and no labeled data are needed. To improve task performance, we can feed a few examples to the model before asking it to perform the task. Though this does require some labeled data, the amount is small, usually in the tens of examples only.

Our Alexa Teacher Model team recently trained and tested a 20-billion-parameter sequence-to-sequence model that was multilingual and showed nice performance for in-context learning. For example, we showed state-of-the-art performance on machine translation with in-context learning. The model can achieve competitive BLEU scores even for some low-resource languages, which is incredible given that no parallel data was used during pretraining, and no labeled data besides a single example was used at any step in the process.

We were particularly proud of the relatively small size of this model, which could compete with much larger models because it was trained on more data. (The Chinchilla model from OpenAI showed a similar result.) Though a large model trained on a smaller dataset and a smaller model trained on a larger dataset may use the same total compute at training time, the smaller model will require less compute and memory during inference, which is a key factor in real applications.

Given that models demonstrate multilingual understanding even without labeled data or parallel data, you may be wondering what’s happening inside of the model. Since the days of word2vec and earlier, we’ve represented characters, words, sentences, documents, and other inputs as vectors of floats, also known as embeddings, hidden states, and representations. Concepts cluster in certain areas of the representational space.

Related content
Training a product discovery system on many languages at once improves performance in all of them.

As humans, we can think only in three dimensions, whereas these representations are high-dimensional, but you can visualize this clustering in two or three dimensions as a reductive approximation. All the languages the model supports would cluster the concept of sitting in a chair in one region of the representational space; the concept of the ocean would inhabit a different cluster; and so forth.

Indeed, Pires et al. have shown that synonymous words across languages cluster together in the mBERT model. When examining 5,000 sentence pairs from the WMT16 dataset, they found that, given a sentence and its embedding in one language, the correct translation from another language is the closest embedding to the source embedding up to 75% of the time.

This manner of clustering can also be manipulated by changing the objective function. In their work on speech-to-text-modeling, Adams et al., from Johns Hopkins, were seeing undesirable clustering by language, rather than by phonemes, in the representational space. They were able to correct by adding training objectives around phoneme prediction and language identification.

The Alexa Teacher Model distillation pipeline

Once we have multilingual models, how do we adapt them to a real system? At the recent KDD conference, we presented a paper describing the Alexa Teacher Model pipeline, consisting of the following steps.

First, a multilingual model with billions of parameters is trained on up to a trillion tokens taken from Common Crawl web scrapes, Wikipedia articles, and more. Second, the models are further trained on in-domain, unlabeled data from a real system. Third, the model is distilled into smaller sizes that can be used in production. The final models can then be fine-tuned using labeled data and deployed.

ATM pipeline.png
The Alexa Teacher Model (AlexaTM) pipeline. The Alexa Teacher Model is trained on a large set of GPUs (left), then distilled into smaller variants (center), whose size depends on their uses. The end user adapts a distilled model to its particular use by fine-tuning it on in-domain data (right).

In tests, we found that our model was more accurate than a publicly available pretrained model fine-tuned on labeled data, and it significantly reduced customer dissatisfaction relative to a model trained by a smaller teacher model (85 million parameters, say, instead of billions). In short, we’ve verified that we can leverage the additional learning capacity of large, multilingual models for production systems requiring low latency and low memory consumption.

Scaling to 1,000 languages

I mentioned the fascinating ability of language models to learn joint representations of multiple languages without labeled or parallel data. This ability is crucial for us to scale to many languages. However, as we scale, we need test data that we can trust so that we can evaluate our progress.

Related content
MASSIVE dataset and Massively Multilingual NLU (MMNLU-22) competition and workshop will help researchers scale natural-language-understanding technology to every language on Earth.

Toward this end, my team at Amazon recently released a new benchmark for multilingual natural-language understanding called MASSIVE, which is composed of one million labeled records spanning 51 languages, 18 domains, 60 intents, and 55 slots. All of the data were created by native speakers of the languages. We also released a GitHub repository with code that can be used as a baseline for creating multilingual NLU models, as well as leaderboards on eval.ai.

Now, you may retort that 51 languages is still a long ways from 1,000 languages. This is true, but we purposefully chose our languages in order to maximize typological diversity while staying within our budget. Our languages span 29 language genera, 14 language families, and 21 distinct scripts or alphabets. The diversity of the chosen languages allows a modeler to test technology that should scale to many more languages within each represented genus, family, and script.

That said, we certainly have some major gaps in language coverage, including across native North and South American languages, African languages, and Australian languages. Yet we are optimistic that our fellow researchers across the field will continue to produce new labeled benchmark datasets for the world’s thousands of low-resource languages.

Massive languages.cropped.png
The 51 languages of MASSIVE, including scripts and genera.

Another difficulty with our current modeling approaches is that they rely on data sources such as web scrapes, encyclopedic articles, and news articles, which are highly skewed toward a small set of languages. Wang, Ruder, and Neubig recently presented some fascinating work leveraging bilingual lexicons — corpora consisting of word-level translations — to improve language model performance for low-resource languages. Lexicons cover a far greater portion of the world’s languages than our typical data sources for language modeling, making this an exciting approach.

Related content
Self-learning system uses customers’ rephrased requests as implicit error signals.

Researchers, missionaries, and businesspeople have been created fundamental linguistic resources for decades, from Bible translations to the Unimorph corpus. The Unimorph datasets are used for the SIGMORPHON shared task, in which a model must predict the correct formulation of word given that word’s root and certain morphological transformations, such as part of speech, tense, and person. We must find more ways to leverage such resources when creating massively multilingual voice AI systems.

As a final technique for scaling to many more languages, we can consider what we in Alexa call “self-learning.” Some of my Alexa colleagues published a paper showing that we can mine past utterances to improve overall system performance. For example, if a user rephrases a request as part of a multiturn interaction, as shown on the left in the figure below, or if different users provide variations for the same desired goal, as shown on the right, then we can make soft assumptions that the different formulations are synonymous.

All of these cases can be statistically aggregated to form new training sets to update the system, without the need to manually annotate utterances. In a multilingual system, such technology is particularly valuable after the initial launch of a language, both to improve performance generally and to adapt to changes in the lexicon.

Self-learning.png
Alexa’s self-learning mechanism.

The road ahead

I hope that you share my wonder at the current state of the art — the scale of language-model training, the magic of zero-shot learning, and the distillation of knowledge into compact models that can run in latency-sensitive systems. All of this is incredible, but we’ve only scratched the surface of supporting the world’s 7,000 languages.

To move into the next era of massive multilingualism, we must build new and increasingly powerful models that can take advantage of low-cost data, particularly unlabeled monolingual data. We must also build models that can leverage existing and upcoming linguistic resources, such as bilingual lexicons and morphological-transformation databases. And finally, we must expand available language resources across more languages and domains, including more unlabeled monolingual corpora, more parallel resources, and more realistic, labeled, task-specific datasets.

Increased multilingualism is a win for all people everywhere. Each language provides a unique perspective on the world in which we live. A rich plurality of perspectives leads to a deeper understanding of our fellow people and of all creation.

Keep building.

Research areas

Related content

US, WA, Seattle
The Sponsored Products and Brands (SPB) team at Amazon Ads is re-imagining the advertising landscape through state-of-the-art generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Curious about our advertising solutions? Discover more about Sponsored Products and Sponsored Brands to see how we’re helping businesses grow on Amazon.com and beyond! Key job responsibilities This role will redesign how ads create personalized, relevant shopping experiences with customer value at the forefront. Key responsibilities include: - Design and develop solutions using GenAI, deep learning, multi-objective optimization and/or reinforcement learning to transform ad retrieval, auctions, whole-page relevance, and shopping experiences. - Partner with scientists, engineers, and product managers to build scalable, production-ready science solutions. - Apply industry advances in GenAI, Large Language Models (LLMs), and related fields to create innovative prototypes and concepts. - Improve the team's scientific and technical capabilities by implementing algorithms, methodologies, and infrastructure that enable rapid experimentation and scaling. - Mentor junior scientists and engineers to build a high-performing, collaborative team. A day in the life As an Applied Scientist on the Sponsored Products and Brands Off-Search team, you will contribute to the development in Generative AI (GenAI) and Large Language Models (LLMs) to revolutionize our advertising flow, backend optimization, and frontend shopping experiences. This is a rare opportunity to redefine how ads are retrieved, allocated, and/or experienced—elevating them into personalized, contextually aware, and inspiring components of the customer journey. You will have the opportunity to fundamentally transform areas such as ad retrieval, ad allocation, whole-page relevance, and differentiated recommendations through the lens of GenAI. By building novel generative models grounded in both Amazon’s rich data and the world’s collective knowledge, your work will shape how customers engage with ads, discover products, and make purchasing decisions. If you are passionate about applying frontier AI to real-world problems with massive scale and impact, this is your opportunity to define the next chapter of advertising science. About the team The Off-Search team within Sponsored Products and Brands (SPB) is focused on building delightful ad experiences across various surfaces beyond Search on Amazon—such as product detail pages, the homepage, and store-in-store pages—to drive monetization. Our vision is to deliver highly personalized, context-aware advertising that adapts to individual shopper preferences, scales across diverse page types, remains relevant to seasonal and event-driven moments, and integrates seamlessly with organic recommendations such as new arrivals, basket-building content, and fast-delivery options. To execute this vision, we work in close partnership with Amazon Stores stakeholders to lead the expansion and growth of advertising across Amazon-owned and -operated pages beyond Search. We operate full stack—from backend ads-retail edge services, ads retrieval, and ad auctions to shopper-facing experiences—all designed to deliver meaningful value.
US, CA, Palo Alto
The Sponsored Products and Brands (SPB) team at Amazon Ads is re-imagining the advertising landscape through state-of-the-art generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Curious about our advertising solutions? Discover more about Sponsored Products and Sponsored Brands to see how we’re helping businesses grow on Amazon.com and beyond! Key job responsibilities This role will be pivotal in redesigning how ads contribute to a personalized, relevant, and inspirational shopping experience, with the customer value proposition at the forefront. Key responsibilities include, but are not limited to: - Contribute to the design and development of GenAI, deep learning, multi-objective optimization and/or reinforcement learning empowered solutions to transform ad retrieval, auctions, whole-page relevance, and/or bespoke shopping experiences. - Collaborate cross-functionally with other scientists, engineers, and product managers to bring scalable, production-ready science solutions to life. - Stay abreast of industry trends in GenAI, LLMs, and related disciplines, bringing fresh and innovative concepts, ideas, and prototypes to the organization. - Contribute to the enhancement of team’s scientific and technical rigor by identifying and implementing best-in-class algorithms, methodologies, and infrastructure that enable rapid experimentation and scaling. - Mentor and grow junior scientists and engineers, cultivating a high-performing, collaborative, and intellectually curious team. A day in the life As an Applied Scientist on the Sponsored Products and Brands Off-Search team, you will contribute to the development in Generative AI (GenAI) and Large Language Models (LLMs) to revolutionize our advertising flow, backend optimization, and frontend shopping experiences. This is a rare opportunity to redefine how ads are retrieved, allocated, and/or experienced—elevating them into personalized, contextually aware, and inspiring components of the customer journey. You will have the opportunity to fundamentally transform areas such as ad retrieval, ad allocation, whole-page relevance, and differentiated recommendations through the lens of GenAI. By building novel generative models grounded in both Amazon’s rich data and the world’s collective knowledge, your work will shape how customers engage with ads, discover products, and make purchasing decisions. If you are passionate about applying frontier AI to real-world problems with massive scale and impact, this is your opportunity to define the next chapter of advertising science. About the team The Off-Search team within Sponsored Products and Brands (SPB) is focused on building delightful ad experiences across various surfaces beyond Search on Amazon—such as product detail pages, the homepage, and store-in-store pages—to drive monetization. Our vision is to deliver highly personalized, context-aware advertising that adapts to individual shopper preferences, scales across diverse page types, remains relevant to seasonal and event-driven moments, and integrates seamlessly with organic recommendations such as new arrivals, basket-building content, and fast-delivery options. To execute this vision, we work in close partnership with Amazon Stores stakeholders to lead the expansion and growth of advertising across Amazon-owned and -operated pages beyond Search. We operate full stack—from backend ads-retail edge services, ads retrieval, and ad auctions to shopper-facing experiences—all designed to deliver meaningful value.
US, CA, Sunnyvale
Industrial Robotics Group is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine innovative AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As an Applied Scientist, you will develop and improve machine learning systems that help robots perceive, reason, and act in real-world environments. You will leverage state-of-the-art models (open source and internal research), evaluate them on representative tasks, and adapt/optimize them to meet robustness, safety, and performance needs. You will invent new algorithms where gaps exist. You’ll collaborate closely with research, controls, hardware, and product-facing teams, and your outputs will be used by downstream teams to further customize and deploy on specific robot embodiments. Key job responsibilities As an Applied Scientist in the Foundations Model team, you will: - Leverage state-of-the-art models for targeted tasks, environments, and robot embodiments through fine-tuning and optimization. - Execute rapid, rigorous experimentation with reproducible results and solid engineering practices, closing the gap between sim and real environments. - Build and run capability evaluations/benchmarks to clearly profile performance, generalization, and failure modes. - Contribute to the data and training workflow: collection/curation, dataset quality/provenance, and repeatable training recipes. - Write clean, maintainable, well commented and documented code, contribute to training infrastructure, create tools for model evaluation and testing, and implement necessary APIs - Stay current with latest developments in foundation models and robotics, assist in literature reviews and research documentation, prepare technical reports and presentations, and contribute to research discussions and brainstorming sessions. - Work closely with senior scientists, engineers, and leaders across multiple teams, participate in knowledge sharing, support integration efforts with robotics hardware teams, and help document best practices and methodologies.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are seeking an experienced Applied Science Manager to build and lead a new team of scientists in India dedicated to Alexa Conversational Ads and Personalization. As the leader of this team, you will shape both the scientific roadmap and the product strategy, working closely with global product stakeholders to ensure your team is delivering high-impact, scalable solutions. Key job responsibilities - Hire, develop, and mentor a high-performing team of applied scientists. - Partner with product management and engineering leadership to define the mid-to-long-term scientific roadmap for conversational ads and personalization. - Manage the execution of complex ML projects, ensuring rigorous experimental design, high modeling standards, and on-time delivery. - Bridge the gap between science, engineering, and product, translating business metrics into scientific goals and vice versa. - Establish best practices for ML lifecycle management, code quality, and technical documentation within the team.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are looking for a Senior Applied Scientist to provide technical leadership for our Alexa Conversational Ads and Personalization initiatives. You will be responsible for tackling our most ambiguous scientific challenges, setting the technical architecture for new ML systems, and pushing the boundaries of what is possible in voice-based advertising. Key job responsibilities - Define the scientific vision and lead the technical execution for complex, multi-quarter ML projects in conversational ads and personalization. - Architect end-to-end machine learning systems that operate at Alexa's massive scale. - Mentor and guide junior scientists on modeling techniques, experimental design, and best practices. - Partner closely with product and engineering stakeholders to translate ambiguous business requirements into rigorous scientific problem statements. - Contribute to the broader scientific community through internal technical papers and external publications.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are seeking an Applied Scientist to join our newly expanding team in India focused on Alexa Conversational Ads and Personalization. In this role, you will build machine learning models that seamlessly and naturally integrate relevant advertising into the Alexa experience while deeply personalizing user interactions. You will work closely with other scientists, engineers, and product managers to take models from conception to production. Key job responsibilities - Design, develop, and evaluate innovative machine learning and deep learning models for natural language processing (NLP), recommendation systems, and personalization. - Conduct hands-on data analysis and build scalable ML pipelines. - Design and run A/B experiments to measure the impact of new models on customer experience and ad performance. - Collaborate with software development engineers to deploy models into high-scale, real-time production environments.
US, CA, San Francisco
The Amazon Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians, all working to innovate in quantum computing for the benefit of our customers. We are looking to hire an Applied Scientist to design and model novel superconducting quantum devices (including qubits), readout and control schemes, and advanced quantum processors. The ideal candidate will have a track record of original scientific contributions, strong engineering principles, and/or software development experience. Resourcefulness, as well as strong organizational and communication skills, is essential. About the team About the team The Amazon Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians, on a mission to develop a fault-tolerant quantum computer. Inclusive Team Culture Here at Amazon, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Export Control Requirement Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a US export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a U.S export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility.
US, CA, Sunnyvale
Amazon Industrial Robotics Group is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine innovative AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As a Senior Applied Scientist, you will lead the development of machine learning systems that help robots perceive, reason, and act in real-world environments. You will set technical direction for adapting and advancing state-of-the-art models (open source and internal research) into robust, safe, and high-performing “robot brain” capabilities for our target tasks, environments, and robot embodiments. You will drive rigorous capability profiling and experimentation, lead targeted innovation where gaps exist, and partner across research, controls, hardware, and product teams to ensure outputs can be further customized and deployed on specific robots. Key job responsibilities - Lead technical initiatives for foundation-model capabilities (e.g., visuomotor / VLA / video-action worldmodel-action policies), from problem definition through validated model deliverables. - Own model readiness for our embodiment class: drive adaptation, fine-tuning, and optimization (latency/throughput/robustness), and define success criteria that downstream teams can build on. - Establish and evolve capability evaluation: define benchmark strategy, metrics, and profiling methodology to quantify performance, generalization, and failure modes; ensure evaluations drive clear roadmap decisions. - Drive the data + training strategy needed to close key capability gaps, including data requirements, collection/curation standards, dataset quality/provenance, and repeatable training recipes (sim + real). - Invent and validate new methods when leveraging SOTA is insufficient—new training schemes, model components, supervision signals, or sim↔real techniques—backed by strong empirical evidence. - Influence cross-team technical decisions by collaborating with controls/WBC, hardware, and product teams on interfaces, constraints, and integration plans; communicate results via design docs and technical reviews. - Mentor and raise the bar: guide junior scientists/engineers, set best practices for experimentation and code quality, and drive a culture of rigor and reproducibility.
US, CA, Sunnyvale
Amazon Industrial Robotics Group is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine innovative AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As a Senior Applied Scientist, you will lead the development of machine learning systems that help robots perceive, reason, and act in real-world environments. You will set technical direction for adapting and advancing state-of-the-art models (open source and internal research) into robust, safe, and high-performing “robot brain” capabilities for our target tasks, environments, and robot embodiments. You will drive rigorous capability profiling and experimentation, lead targeted innovation where gaps exist, and partner across research, controls, hardware, and product teams to ensure outputs can be further customized and deployed on specific robots. Key job responsibilities - Lead technical initiatives for foundation-model capabilities (e.g., visuomotor / VLA / video-action worldmodel-action policies), from problem definition through validated model deliverables. - Own model readiness for our embodiment class: drive adaptation, fine-tuning, and optimization (latency/throughput/robustness), and define success criteria that downstream teams can build on. - Establish and evolve capability evaluation: define benchmark strategy, metrics, and profiling methodology to quantify performance, generalization, and failure modes; ensure evaluations drive clear roadmap decisions. - Drive the data + training strategy needed to close key capability gaps, including data requirements, collection/curation standards, dataset quality/provenance, and repeatable training recipes (sim + real). - Invent and validate new methods when leveraging SOTA is insufficient—new training schemes, model components, supervision signals, or sim↔real techniques—backed by strong empirical evidence. - Influence cross-team technical decisions by collaborating with controls/WBC, hardware, and product teams on interfaces, constraints, and integration plans; communicate results via design docs and technical reviews. - Mentor and raise the bar: guide junior scientists/engineers, set best practices for experimentation and code quality, and drive a culture of rigor and reproducibility.
US, WA, Seattle
We are looking for a passionate Applied Scientist to help pioneer the next generation of agentic AI applications for Amazon advertisers. In this role, you will design agentic architectures, develop tools and datasets, and contribute to building systems that can reason, plan, and act autonomously across complex advertiser workflows. You will work at the forefront of applied AI, developing methods for fine-tuning, reinforcement learning, and preference optimization, while helping create evaluation frameworks that ensure safety, reliability, and trust at scale. You will work backwards from the needs of advertisers—delivering customer-facing products that directly help them create, optimize, and grow their campaigns. Beyond building models, you will advance the agent ecosystem by experimenting with and applying core primitives such as tool orchestration, multi-step reasoning, and adaptive preference-driven behavior. This role requires working independently on ambiguous technical problems, collaborating closely with scientists, engineers, and product managers to bring innovative solutions into production. Key job responsibilities - Design and build agents to guide advertisers in conversational and non-conversational experience. - Design and implement advanced model and agent optimization techniques, including supervised fine-tuning, instruction tuning and preference optimization (e.g., DPO/IPO). - Curate datasets and tools for MCP. - Build evaluation pipelines for agent workflows, including automated benchmarks, multi-step reasoning tests, and safety guardrails. - Develop agentic architectures (e.g., CoT, ToT, ReAct) that integrate planning, tool use, and long-horizon reasoning. - Prototype and iterate on multi-agent orchestration frameworks and workflows. - Collaborate with peers across engineering and product to bring scientific innovations into production. - Stay current with the latest research in LLMs, RL, and agent-based AI, and translate findings into practical applications. About the team The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through the latest generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. The Campaign Strategies team within Sponsored Products and Brands is focused on guiding and supporting 1.6MM advertisers to meet their advertising needs of creating and managing ad campaigns. At this scale, the complexity of diverse advertiser goals, campaign types, and market dynamics creates both a massive technical challenge and a transformative opportunity: even small improvements in guidance systems can have outsized impact on advertiser success and Amazon’s retail ecosystem. Our vision is to build a highly personalized, context-aware agentic advertiser guidance system that leverages LLMs together with tools such as auction simulations, ML models, and optimization algorithms. This agentic framework, will operate across both chat and non-chat experiences in the ad console, scaling to natural language queries as well as proactively delivering guidance based on deep understanding of the advertiser. To execute this vision, we collaborate closely with stakeholders across Ad Console, Sales, and Marketing to identify opportunities—from high-level product guidance down to granular keyword recommendations—and deliver them through a tailored, personalized experience. Our work is grounded in state-of-the-art agent architectures, tool integration, reasoning frameworks, and model customization approaches (including tuning, MCP, and preference optimization), ensuring our systems are both scalable and adaptive.