Fitzgerald keynote.png
Amazon senior applied scientist Jack FitzGerald, delivering a keynote talk at the joint Language Intelligence @ Work and SEMANTiCS conference in Vienna, Austria.

Scaling multilingual virtual assistants to 1,000 languages

Self-supervised training, distributed training, and knowledge distillation have delivered remarkable results, but they’re just the tip of the iceberg.

Yesterday at the joint Language Intelligence @ Work and SEMANTiCS conference in Vienna, Austria, Amazon senior applied scientist Jack FitzGerald delivered a keynote talk on multilingual virtual assistants and the path toward a massively multilingual future. This is an edited version of his talk.

The evolution of human-computer interaction paradigms

In the past 50 years, computing technology has progressed from text-based terminal inputs, to graphical user interfaces, to predominantly web-based applications, through the mobile era, and finally into the era of a voice user interface and ambient computing.

Interface timeline.png
A brief history of computing interfaces.

Each of these paradigms has its own challenges with respect to multilingualism, whether it was the migration from ASCII to Unicode or proper character rendering on a website. However, I would argue that a voice AI system is the most difficult paradigm yet with respect to massive multilingualism.

The first reason is that the input space for voice interface commands is unbounded: the user can phrase each command in hundreds of different ways, all of which are valid. Another reason is that even within a single language, there can be many different dialects and accents.

Related content
Amazon Visiting Academic Barbara Poblete helps to build safer, more-diverse online communities — and to aid disaster response.

Most important, the coupling between language and culture is inescapable. Whether it’s the level of formality used, preferred activities, or religious differences, there isn’t a one-size-fits-all solution. Instead, we must adapt the virtual assistant to understand cultural context and say only things that are appropriate for a given locale.

Voice AI systems today

A typical voice AI system includes automatic-speech-recognition models, which convert raw audio into text; natural-language understanding models, which determine the user’s intent and recognize named entities; a central service for arbitration and dialogue management, which routes commands to the proper services or skills; and finally, a text-to-speech model, which issues the output. Additional tasks might include expansion of the underlying knowledge graph and semantic parsing, localization of touch screen content, or local information services.

Alexa overview.png
An overview of Alexa’s design.

Let’s look at some of the operational considerations for supporting multiple languages in such models. One is the training data: they must be topically exhaustive, meaning that they cover the full spectrum of possible user utterances, and they must be culturally exhaustive — for instance, covering all of the holidays a user might celebrate. They must also remain up-to-date, and it’s not always easy to add something new to the model without regression on existing functionalities.

A second consideration is in-house testing. Though in many cases one can get away with synthetic or otherwise artificial data for model training, for testing it’s important to have realistic utterances. Those typically need to come from humans, and collecting them can be a major expense. It’s also useful to perform live, interactive testing, which requires people who can speak and understand each language that the system supports.

Related content
New approach corrects for cases when average improvements are accompanied by specific regressions.

Finally, it’s important to have the ability to support users and process their feedback. In most cases, this again requires staff who understand each of the supported languages.

Ultimately, human-based processes are not very scalable if our goal is to support thousands of languages. Instead, we must turn to technology to the greatest extent possible.

Multilingual modeling today

One of the leading reasons for the current success of multilingual text models is self-supervision.

In traditional supervised learning, a model would be trained from scratch on the desired task. If we wanted a model that would classify the sentiment of a product review, for example, we would manually annotate a bunch of product reviews, and we would use that dataset to train the model.

Today, however, we make use of transfer learning, in which text models are pretrained on terabytes of text data that don’t require manual annotation. Instead, the training procedure leverages the structure inherent to the text itself.

Self-supervision signals.png
Self-supervised-training objectives.

We’ll call this self-supervised pretraining With the masked-language-modeling training objective, for instance, the model is fed the input “for [MASK] out loud!”, and it must predict that “[MASK]” should be filled with the word “crying”. Other objectives, such as causal language modeling, span filling, deshuffling, and denoising can also be used.

Because the datasets required for self-supervised pretraining are unlabeled and monolingual, we can leverage troves of data, such as Common Crawl web scrapes, every Wikipedia page in existence, thousands of books and news articles, and more. Couple these large datasets with highly parallelizable architectures such as transformers, which can be trained on over a thousand GPUs with near linear scaling, and we can build models with tens or hundreds of billions of dense parameters. Such has been the focus for many people in the field for the past few years, including the Alexa Teacher Model team.

One incredible consequence of the transfer learning paradigm is called zero-shot learning. In the context of multilingual modeling, it works like this: the modeler begins by pretraining the model on some set of languages, using self-supervision. As an example, suppose that the modeler trains a model on English, French, and Japanese using every Wikipedia article in those three languages.

Related content
New end-to-end approach to zero-shot video classification dramatically outperforms predecessors.

The next step is to adapt the model to a particular task using labeled data. Suppose that the modeler has a labeled dataset for intent classification, but only in English. The modeler can go ahead and fine-tune the model on the English data, then run it on the remaining languages.

Despite the fact that the model was never trained to do intent classification with French or Japanese data, it can still classify intents in those languages, by leveraging what it learned about those languages during pretraining. Given that the acquisition of labeled data is often a bottleneck, this property of language models is highly valuable for language expansion. Of course, zero-shot learning is just the extreme end of a continuum: transfer learning helps even out performance when the labeled data in different languages is imbalanced.

Zero-shot multilingual.png
Zero-shot learning for multilingual adaptation.

The next step up the data efficiency ladder is performing tasks without any additional training or fine tuning, using only a couple of labeled records or none at all. This is possible through “in-context learning,” which was popularized in the GPT-3 paper.

To perform in-context learning, simply take a pretrained model and feed it the appropriate prompts. Think of a prompt is a hint to the model about the task it should perform. Suppose that we want the model to summarize a passage. We might prefix the passage with the word “Passage” and a colon and follow it with the word “Summary” and a colon. The model would then generate a summary of the passage.

Related content
In the past few years, advances in artificial intelligence have captured our imaginations and led to the widespread use of voice services on our phones and in our homes.

This is the zero-shot in-context learning case, meaning that no fine-tuning is performed, and no labeled data are needed. To improve task performance, we can feed a few examples to the model before asking it to perform the task. Though this does require some labeled data, the amount is small, usually in the tens of examples only.

Our Alexa Teacher Model team recently trained and tested a 20-billion-parameter sequence-to-sequence model that was multilingual and showed nice performance for in-context learning. For example, we showed state-of-the-art performance on machine translation with in-context learning. The model can achieve competitive BLEU scores even for some low-resource languages, which is incredible given that no parallel data was used during pretraining, and no labeled data besides a single example was used at any step in the process.

We were particularly proud of the relatively small size of this model, which could compete with much larger models because it was trained on more data. (The Chinchilla model from OpenAI showed a similar result.) Though a large model trained on a smaller dataset and a smaller model trained on a larger dataset may use the same total compute at training time, the smaller model will require less compute and memory during inference, which is a key factor in real applications.

Given that models demonstrate multilingual understanding even without labeled data or parallel data, you may be wondering what’s happening inside of the model. Since the days of word2vec and earlier, we’ve represented characters, words, sentences, documents, and other inputs as vectors of floats, also known as embeddings, hidden states, and representations. Concepts cluster in certain areas of the representational space.

Related content
Training a product discovery system on many languages at once improves performance in all of them.

As humans, we can think only in three dimensions, whereas these representations are high-dimensional, but you can visualize this clustering in two or three dimensions as a reductive approximation. All the languages the model supports would cluster the concept of sitting in a chair in one region of the representational space; the concept of the ocean would inhabit a different cluster; and so forth.

Indeed, Pires et al. have shown that synonymous words across languages cluster together in the mBERT model. When examining 5,000 sentence pairs from the WMT16 dataset, they found that, given a sentence and its embedding in one language, the correct translation from another language is the closest embedding to the source embedding up to 75% of the time.

This manner of clustering can also be manipulated by changing the objective function. In their work on speech-to-text-modeling, Adams et al., from Johns Hopkins, were seeing undesirable clustering by language, rather than by phonemes, in the representational space. They were able to correct by adding training objectives around phoneme prediction and language identification.

The Alexa Teacher Model distillation pipeline

Once we have multilingual models, how do we adapt them to a real system? At the recent KDD conference, we presented a paper describing the Alexa Teacher Model pipeline, consisting of the following steps.

First, a multilingual model with billions of parameters is trained on up to a trillion tokens taken from Common Crawl web scrapes, Wikipedia articles, and more. Second, the models are further trained on in-domain, unlabeled data from a real system. Third, the model is distilled into smaller sizes that can be used in production. The final models can then be fine-tuned using labeled data and deployed.

ATM pipeline.png
The Alexa Teacher Model (AlexaTM) pipeline. The Alexa Teacher Model is trained on a large set of GPUs (left), then distilled into smaller variants (center), whose size depends on their uses. The end user adapts a distilled model to its particular use by fine-tuning it on in-domain data (right).

In tests, we found that our model was more accurate than a publicly available pretrained model fine-tuned on labeled data, and it significantly reduced customer dissatisfaction relative to a model trained by a smaller teacher model (85 million parameters, say, instead of billions). In short, we’ve verified that we can leverage the additional learning capacity of large, multilingual models for production systems requiring low latency and low memory consumption.

Scaling to 1,000 languages

I mentioned the fascinating ability of language models to learn joint representations of multiple languages without labeled or parallel data. This ability is crucial for us to scale to many languages. However, as we scale, we need test data that we can trust so that we can evaluate our progress.

Related content
MASSIVE dataset and Massively Multilingual NLU (MMNLU-22) competition and workshop will help researchers scale natural-language-understanding technology to every language on Earth.

Toward this end, my team at Amazon recently released a new benchmark for multilingual natural-language understanding called MASSIVE, which is composed of one million labeled records spanning 51 languages, 18 domains, 60 intents, and 55 slots. All of the data were created by native speakers of the languages. We also released a GitHub repository with code that can be used as a baseline for creating multilingual NLU models, as well as leaderboards on eval.ai.

Now, you may retort that 51 languages is still a long ways from 1,000 languages. This is true, but we purposefully chose our languages in order to maximize typological diversity while staying within our budget. Our languages span 29 language genera, 14 language families, and 21 distinct scripts or alphabets. The diversity of the chosen languages allows a modeler to test technology that should scale to many more languages within each represented genus, family, and script.

That said, we certainly have some major gaps in language coverage, including across native North and South American languages, African languages, and Australian languages. Yet we are optimistic that our fellow researchers across the field will continue to produce new labeled benchmark datasets for the world’s thousands of low-resource languages.

Massive languages.cropped.png
The 51 languages of MASSIVE, including scripts and genera.

Another difficulty with our current modeling approaches is that they rely on data sources such as web scrapes, encyclopedic articles, and news articles, which are highly skewed toward a small set of languages. Wang, Ruder, and Neubig recently presented some fascinating work leveraging bilingual lexicons — corpora consisting of word-level translations — to improve language model performance for low-resource languages. Lexicons cover a far greater portion of the world’s languages than our typical data sources for language modeling, making this an exciting approach.

Related content
Self-learning system uses customers’ rephrased requests as implicit error signals.

Researchers, missionaries, and businesspeople have been created fundamental linguistic resources for decades, from Bible translations to the Unimorph corpus. The Unimorph datasets are used for the SIGMORPHON shared task, in which a model must predict the correct formulation of word given that word’s root and certain morphological transformations, such as part of speech, tense, and person. We must find more ways to leverage such resources when creating massively multilingual voice AI systems.

As a final technique for scaling to many more languages, we can consider what we in Alexa call “self-learning.” Some of my Alexa colleagues published a paper showing that we can mine past utterances to improve overall system performance. For example, if a user rephrases a request as part of a multiturn interaction, as shown on the left in the figure below, or if different users provide variations for the same desired goal, as shown on the right, then we can make soft assumptions that the different formulations are synonymous.

All of these cases can be statistically aggregated to form new training sets to update the system, without the need to manually annotate utterances. In a multilingual system, such technology is particularly valuable after the initial launch of a language, both to improve performance generally and to adapt to changes in the lexicon.

Self-learning.png
Alexa’s self-learning mechanism.

The road ahead

I hope that you share my wonder at the current state of the art — the scale of language-model training, the magic of zero-shot learning, and the distillation of knowledge into compact models that can run in latency-sensitive systems. All of this is incredible, but we’ve only scratched the surface of supporting the world’s 7,000 languages.

To move into the next era of massive multilingualism, we must build new and increasingly powerful models that can take advantage of low-cost data, particularly unlabeled monolingual data. We must also build models that can leverage existing and upcoming linguistic resources, such as bilingual lexicons and morphological-transformation databases. And finally, we must expand available language resources across more languages and domains, including more unlabeled monolingual corpora, more parallel resources, and more realistic, labeled, task-specific datasets.

Increased multilingualism is a win for all people everywhere. Each language provides a unique perspective on the world in which we live. A rich plurality of perspectives leads to a deeper understanding of our fellow people and of all creation.

Keep building.

Research areas

Related content

IN, KA, Bengaluru
Alexa International is looking for passionate, talented, and inventive Senior Applied Scientists to help build industry-leading technology with Large Language Models (LLMs) and multimodal systems, requiring strong deep learning and generative models knowledge. Senior applied scientists will drive cross-team scientific strategy, influence partner teams, and deliver solutions that have broad impact across Alexa's international products and services. Key job responsibilities As a Applied Scientist with II the Alexa International team, you will work with talented peers to develop novel algorithms and modeling techniques to advance the state of the art with LLMs, particularly delivering industry-leading scientific research and applied AI for multi-lingual applications — a challenging area for the industry globally. Your work will directly impact our global customers in the form of products and services that support Alexa+. You will leverage Amazon's heterogeneous data sources and large-scale computing resources to accelerate advances in text, speech, and vision domains. The ideal candidate possesses a solid understanding of machine learning, speech and/or natural language processing, modern LLM architectures, LLM evaluation & tooling, and a passion for pushing boundaries in this vast and quickly evolving field. They thrive in fast-paced environment, like to tackle complex challenges, excel at swiftly delivering impactful solutions while iterating based on user feedback, and are able to influence and align multiple teams around a shared scientific vision. A day in the life * Analyze, understand, and model customer behavior and the customer experience based on large-scale data. * Build novel online & offline evaluation metrics and methodologies for multimodal personal digital assistants. * Fine-tune/post-train LLMs using advanced and innovative techniques like SFT, DPO, Reinforcement Learning (RLHF and RLAIF) for supporting model performance specific to a customer’s location and language. * Quickly experiment and set up experimentation framework for agile model and data analysis or A/B testing. * Contribute through industry-first research to drive innovation forward. * Drive cross-team scientific strategy and influence partner teams on LLM evaluation frameworks, post-training methodologies, and best practices for international speech and language systems. * Lead end-to-end delivery of scientifically complex solutions from research to production, including reusable science components and services that resolve architecture deficiencies across teams. * Serve as a scientific thought leader, communicating solutions clearly to partners, stakeholders, and senior leadership. * Actively mentor junior scientists and contribute to the broader internal and external scientific community through publications and community engagement.
US, NY, New York
About the Role In this role, you will own the science strategy and technical vision for this intelligence layer, leading a team of applied scientists working across GenAI and predictive modeling. You will shape how heterogeneous signals — text, behavioral, network, temporal — come together to power talent applications at Amazon scale, from workforce forecasting to personalized development to compensation strategy. You will identify opportunities where science investment can have material impact on long-term objectives or annual goals and build consensus around needed investments, working comfortably across different modeling paradigms and data modalities to guide principal and senior scientists in their most challenging and strategic decisions while serving as the strategic science advisor to PXT leaders operating at the Director, VP, and SVP levels. As a hands-on leader, you will personally own development and delivery of the most complex science problems at the intersection of multiple ML disciplines, stay current with emergent AI/ML science and engineering trends to influence focus areas in a rapidly evolving landscape, and participate in organizational planning, hiring, mentorship, and leadership development. Key job responsibilities • Lead technical initiatives in people science models, driving breakthrough approaches through hands-on research and development in areas like foundation models for predictive modeling, efficient multi-modal LLMs, and zero-shot learning • Design and implement novel ML architectures that push the boundaries of how workforce signals are represented, fused, and predicted at scale • Guide technical direction for research initiatives across the team, ensuring robust performance in production environments serving hundreds of thousands of employees • Mentor and develop senior scientists while maintaining strong individual technical contributions on the most complex cross-domain problems • Collaborate with engineering teams to optimize and scale models for real-world talent applications • Influence technical decisions and implementation strategies across teams, shaping the long-term platform architecture About the team The People eXperience and Technology (PXT) Core Science Team uses science, engineering, and customer-obsessed problem solving to proactively identify mechanisms, process improvements, and products that simultaneously improve Amazon and Amazonians' lives, wellbeing, and value of work. As an interdisciplinary team combining talents from machine learning, statistics, economics, behavioral science, engineering, and product development, the Core Science team develops and delivers measurable solutions through innovation and rapid prototyping to accelerate informed, accurate, and reliable decision-making backed by science and data.
US, MA, N.reading
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. Key job responsibilities - Design and implement whole body control methods for balance, locomotion, and dexterous manipulation - Utilize state-of-the-art in methods in learned and model-based control - Create robust and safe behaviors for different terrains and tasks - Implement real-time controllers with stability guarantees - Collaborate effectively with multi-disciplinary teams to co-design hardware and algorithms for loco-manipulation - Mentor junior engineer and scientists
IN, KA, Bengaluru
Have you ever ordered a product on Amazon and when that box with the smile arrived you wondered how it got to you so fast? Have you wondered where it came from and how much it cost Amazon to deliver it to you? If so, the WW Amazon Logistics, Business Analytics team is for you. We manage the delivery of tens of millions of products every week to Amazon’s customers, achieving on-time delivery in a cost-effective manner. We are looking for an enthusiastic, customer obsessed, Applied Scientist with good analytical skills to help manage projects and operations, implement scheduling solutions, improve metrics, and develop scalable processes and tools. The primary role of an Operations Research Scientist within Amazon is to address business challenges through building a compelling case, and using data to influence change across the organization. This individual will be given responsibility on their first day to own those business challenges and the autonomy to think strategically and make data driven decisions. Decisions and tools made in this role will have significant impact to the customer experience, as it will have a major impact on how the final phase of delivery is done at Amazon. Ideal candidates will be a high potential, strategic and analytic graduate with a PhD in (Operations Research, Statistics, Engineering, and Supply Chain) ready for challenging opportunities in the core of our world class operations space. Great candidates have a history of operations research, and the ability to use data and research to make changes. This role requires robust program management skills and research science skills in order to act on research outcomes. This individual will need to be able to work with a team, but also be comfortable making decisions independently, in what is often times an ambiguous environment. Responsibilities may include: - Develop input and assumptions based preexisting models to estimate the costs and savings opportunities associated with varying levels of network growth and operations - Creating metrics to measure business performance, identify root causes and trends, and prescribe action plans - Managing multiple projects simultaneously - Working with technology teams and product managers to develop new tools and systems to support the growth of the business - Communicating with and supporting various internal stakeholders and external audiences
GB, London
Come build the future of entertainment with us. Are you interested in shaping the future of movies and television? Do you want to define the next generation of how and what Amazon customers are watching? Prime Video is a premium streaming service that offers customers a vast collection of TV shows and movies - all with the ease of finding what they love to watch in one place. We offer customers thousands of popular movies and TV shows including Amazon Originals and exclusive licensed content to exciting live sports events. We also offer our members the opportunity to subscribe to add-on channels which they can cancel at anytime and to rent or buy new release movies and TV box sets on the Prime Video Store. Prime Video is a fast-paced, growth business - available in over 200 countries and territories worldwide. The team works in a dynamic environment where innovating on behalf of our customers is at the heart of everything we do. If this sounds exciting to you, please read on. The Insights team is looking for an Applied Scientist for our London office experienced in generative AI and large models. This is a wide impact role working with development teams across the UK, India, and the US. This greenfield project will deliver features that reduce the operational load for internal Prime Video builders and for this, you will need to develop personalized recommendations for their services. You will have strong technical ability, excellent teamwork and communication skills, and a strong motivation to deliver customer value from your research. Our position offers opportunities to grow your technical and non-technical skills and make a global impact immediately. Key job responsibilities - Develop machine learning algorithms for high-scale recommendations problems - Rapidly design, prototype and test many possible hypotheses in a high-ambiguity environment, making use of both quantitative analysis and business judgement - Collaborate with software engineers to integrate successful experimental results into Prime Video wide processes - Communicate results and insights to both technical and non-technical audiences, including through presentations and written reports A day in the life You will lead the design of machine learning models that scale to very large quantities of data across multiple dimensions. You will embody scientific rigor, designing and executing experiments to demonstrate the technical effectiveness and business value of your methods. You will work alongside other scientists and engineering teams to deliver your research into production systems. About the team Our team owns Prime Video observability features for development teams. We consume PBs of data daily which feed into multiple observability features focussed on reducing the customer impact time.
CN, 31, Shanghai
You will be working with a unique and gifted team developing exciting products for consumers. The team is a multidisciplinary group of engineers and scientists engaged in a fast paced mission to deliver new products. The team faces a challenging task of balancing cost, schedule, and performance requirements. You should be comfortable collaborating in a fast-paced and often uncertain environment, and contributing to innovative solutions, while demonstrating leadership, technical competence, and meticulousness. Your deliverables will include development of thermal solutions, concept design, feature development, product architecture and system validation through to manufacturing release. You will support creative developments through application of analysis and testing of complex electronic assemblies using advanced simulation and experimentation tools and techniques. Key job responsibilities * Evaluate and optimize thermal solution requirements of consumer electronic products * Use simulation tools like Star-CCM+ or FloTherm XT/EFD for analysis and design of products * Validate design modifications for thermal concerns using simulation and actual prototypes * Establish temperature thresholds for user comfort level and component level considering reliability requirements * Have intimate knowledge of various materials and heat spreaders solutions to resolve thermal issues * Use of programming languages like Python and Matlab for analytical/statistical analyses and automation * Collaborate as part of device team to iterate and optimize design parameters of enclosures and structural parts to establish and deliver project performance objectives * Design and execute of tests using statistical tools to validate analytical models, identify risks and assess design margins * Create and present analytical and experimental results * Develop and apply design guidelines based on project learnings
US, CA, San Francisco
MULTIPLE POSITIONS AVAILABLE Employer: AMAZON DEVELOPMENT CENTER U.S., INC., Offered Position: Research Scientist II Job Location: San Francisco, California Job Number: AMZ9674001 Position Responsibilities: Design research studies to obtain scientific information. Develop theories or models of physical phenomena encountered in quantum computing, superconducting qubit device physics, materials or process development and characterization. Collaborate with others to determine design specifications, including of superconducting quantum processor chips, microwave chip packages, and associated electrical and mechanical components. Develop scientific or mathematical models to predict physical device behavior and performance, and verify the implementation of computational models. Apply mathematical principles or statistical approaches to solve problems, for example to validate modeling predictions under experimental uncertainty using statistical methods. Operate laboratory or field equipment and scientific instrumentation for device fabrication, device characterization, or advanced materials research. Develop new algorithms or methods for designing, simulating, or measuring quantum computers. Develop performance metrics or standards related to quantum information technology. Recommend technical design or process changes to improve quality or performance of superconducting quantum processors and efficiency of their design, manufacture, and testing. Collaborate on research activities with scientists or technical specialists. Prepare scientific or technical reports or presentations and present research results to others. 40 hours / week, 8:00am-5:00pm, Salary Range $168,126/year to $212,800/year. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, visit: https://www.aboutamazon.com/workplace/employee-benefits. Amazon.com is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation.#0000
US, WA, Seattle
This role leads the science function in WW Stores Finance as part of the IPAT organization (Insights, Planning, Analytics and Technology), driving transformative innovations in financial analytics through AI and machine learning across the global Stores finance organization. The successful candidate builds and directs a multidisciplinary team of data scientists, applied scientists, economists, and product managers to deliver scalable solutions that fundamentally change how finance teams generate insights, automate workflows, and make decisions. As part of the WW Stores Finance leadership team, this leader partners with engineering, product, and finance stakeholders to translate emerging AI capabilities into production systems that deliver measurable improvements in speed, accuracy, and efficiency. The role's outputs directly inform VP/SVP/CFO/CEO leadership decisions and drive impact across the entire Stores P&L. Success requires translating complex technical concepts for finance domain experts and business leaders while maintaining deep technical credibility with science and engineering teams. The role demands both strategic vision—identifying high-impact opportunities where AI can transform finance operations—and execution excellence in coordinating project planning, resource allocation, and delivery across multiple concurrent initiatives. This leader establishes methodologies and models that enable Amazon finance to achieve step-change improvements in both the speed and quality of business insights, directly supporting critical processes including month-end reporting, quarterly guidance, annual planning cycles, and financial controllership. Key job responsibilities Transformation of Finance Workflows — Lead development of agentic AI solutions that automate routine finance tasks and transform how teams communicate business insights. Deploy these solutions across financial analysis, narrative generation, and dynamic table creation for month-end reporting and planning cycles. Partner with engineering and product teams to integrate these capabilities into production systems that directly support Stores Finance and FGBS automation goals, delivering measurable reductions in manual effort and cycle time. Science-Based Forecasting — Develop and deploy machine learning forecasts that integrate into existing planning processes including OP1, OP2, and quarterly guidance cycles. Partner with finance teams across WW Stores to iterate on forecast accuracy, applying these models either as alternative viewpoints to complement bottoms-up forecasts or as hands-off replacements for manual forecasting processes. Establish evaluation frameworks that demonstrate forecast performance against business benchmarks and drive adoption across critical planning workflows. Financial Controllership — Scale AI capabilities across controllership workstreams to improve reporting accuracy and automate manual processes. Leverage generative AI to identify financial risk through systematic pattern recognition in transaction data, account reconciliations, and variance analysis. Develop production systems that enhance decision-making speed and quality in financial close, audit preparation, and compliance reporting, delivering quantifiable improvements in error detection rates and process efficiency. About the team IPAT (Insights, Planning, Analytics, and Technology) is a team in the Worldwide Amazon Stores Finance organization composed of leaders across engineering, finance, product, and science. Our mission is to reimagine finance using technology and science to provide fast, efficient, and accurate insights that drive business decisions and strengthen governance. We are dedicated to improving financial operations through innovative applications of technology and science. Our work focuses on developing adaptive solutions for diverse financial use cases, applying AI to solve complex financial challenges, and conducting financial data analysis. Operating globally, we strive to develop adaptable solutions for diverse markets. We aim to advance financial science, continually improving accuracy, efficiency, and insight generation in support of Amazon's mission to be Earth's most customer-centric company.
US, NY, New York
Do you want to lead the Ads industry and redefine how we measure the effectiveness of Amazon Ads business? Are you passionate about causal inference, Deep Learning/DNN, raising the science bar, and connecting leading-edge science research to Amazon-scale implementation? If so, come join Amazon Ads to be an Economist leader within our Advertising Incrementality Measurement science team! Our work builds the foundations for providing customer-facing experimentation tools, furthering internal research & development on Econometrics, and building out Amazon's advertising measurement offerings. Incrementality is a lynchpin for the next generation of Amazon Advertising measurement solutions and this role will play a key role in the release and expansion of these offerings. Key job responsibilities As an Economist leader within the Advertising Incrementality Measurement (AIM) science team, you are responsible for defining and executing on key workstreams within our overall causal measurement science vision. In particular, you can lead the development of experimental methodologies to measure ad effectiveness, and also build observational models that lay the foundations for understanding the impact of individual ad touchpoints for billions of daily ad interactions. You will work on a team of Applied Scientists, Economists, and Data Scientists, alongside a dedicated Engineering team, to work backwards from customer needs and translate product ideas into concrete science deliverables. You will be a thought leader for inventing scalable causal measurement solutions that support highly accurate and actionable insights--from defining and executing hundreds of thousands of RCTs, to developing an exciting science R&D agenda. You will be working with massive data and industry-leading partner scientists, while also interfacing with leadership to define our future vision. Your work will help shape the future of Amazon Advertising. About the team AIM is a cross disciplinary team of engineers, product managers, economists, data scientists, and applied scientists with a charter to build scientifically-rigorous causal inference methodologies at scale. Our job is to help customers cut through the noise of the modern advertising landscape and understand what actions, behaviors, and strategies actually have a real, measurable impact on key outcomes. The data we produce becomes the effective ground truth for advertisers and partners making decisions affecting millions in advertising spend.
US, NY, New York
The Measurement Intelligence Science Team (MIST) in the Measurement, Ad Tech, and Data Science (MADS) organization of Amazon Ads serves a centralized role developing solutions for a multitude of performance measurement products. We create solutions which measure the comprehensive impact of their ad spend, including sales impacts both online and offline and across timescales, and provide actionable insights that enable our advertisers to optimize their media portfolios. We leverage a host of scientific technologies to accomplish this mission, including Generative AI, classical ML, Causal Inference, Natural Language Processing, and Computer Vision. As an Applied Science Manager on the team, you will lead a team of scientists to define and execute a transformative vision for holistic measurement and reporting insights for ad effectiveness. Your team will own the science solutions for foundational experimentation platforms, foundational customer journey understanding technologies, state of the art attribution algorithms to measure the role of advertising in driving observed retail outcomes, and/or agentic AI solutions that help advertisers get quick access to custom insights that inform how to get the most out of their ad spend. Key job responsibilities You independently manage a team of scientists. You identify the needs of your team and effectively grow, hire, and promote scientists to maintain a high-performing team. You have a broad understanding of scientific techniques, several of which may fall out of your specific job function. You define the strategic vision for your team. You establish a roadmap and successfully deliver scientific solutions that execute that vision. You define clear goals for your team and effectively prioritize, balancing short-term needs and long-term value. You establish clear and effective metrics and scientific process to enforce consistent, high-quality artifact delivery. You proactively identify risks and bring them to the attention of your manager, customers, and stakeholders with plans for mitigation before they become roadblocks. You know when to escalate. You communicate ideas effectively, both verbally and in writing, to all types of audiences. You author strategic documentation for your team. You communicate issues and options with leaders in such a way that facilitates understanding and that leads to a decision. You work successfully with customers, leaders, and engineering teams. You foster a constructive dialogue, harmonize discordant views, and lead the resolution of contentious issues. About the team We are a team of scientists across Applied, Research, Data Science and Economist disciplines. You will work with colleagues with deep expertise in ML, NLP, CV, Gen AI, and Causal Inference with a diverse range of backgrounds. We partner closely with top-notch engineers, product managers, sales leaders, and other scientists with expertise in the ads industry and on building scalable modeling and software solutions.