20B-parameter Alexa model sets new marks in few-shot learning

With an encoder-decoder architecture rather than a decoder-only one, the Alexa Teacher Model outperforms other large language models on few-shot tasks such as summarization and machine translation.

Most major advances in AI have come from supervised learning, in which machine learning models are trained on annotated data. But as commercial AI models continue to increase in scale, relying on data annotation is becoming unsustainable.

At Alexa AI, we are moving to the new paradigm of generalizable intelligence, in which models can learn new concepts and transfer knowledge from one language or task to another with minimal human input. Such models allow us to efficiently develop new features and improve Alexa in multiple languages at the same time.

As part of this move, we have introduced Transformer-based large-scale multilingual language models we call Alexa Teacher Models (AlexaTM). Given only a few examples of a task in a new language, AlexaTM can transfer what it knows to the new language with no extra human supervision.

In a paper we’re presenting at this year’s Knowledge Discovery and Data Mining Conference (KDD), we showed that 10-billion- and two-billion-parameter AlexaTM models can improve on state-of-the-art cross-lingual transfer learning and increase Alexa’s accuracy in different locales.

In a follow-up paper, which we've published on arXiv, we have taken this line of research a step further, with a 20-billion-parameter generative model called AlexaTM 20B. The experiments reported in the paper — which use only publicly available data — show that AlexaTM 20B can not only transfer what it learns across languages but also learn new tasks from just a handful of examples (few-shot learning).

In the example below, the model is provided with three examples of different intents, or tasks that the customer wants executed: book-restaurant, play-music, and get-weather. The model can generalize from these to the unfamiliar intent get-news-update and generate utterances corresponding to that intent in different languages. This allows us to develop new features more rapidly, and in multiple languages, simultaneously.

Multilingual annotation.png
Using AlexaTM 20B to generate annotated data for a new intent in different languages.
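
As a rough illustration of the setup described above, the few-shot input can be thought of as a short list of labeled examples followed by an unfinished line for the new intent. The sketch below only builds such a prompt as a string. The prompt layout, the function name `build_few_shot_prompt`, and the sample utterances are assumptions made for illustration; they are not the format used in the paper, and no model is called.

```python
from typing import List, Tuple

# Hypothetical labeled examples: the intent names come from the article,
# the utterances are invented for illustration.
EXAMPLES: List[Tuple[str, str, str]] = [
    ("book-restaurant", "English", "book a table for two at an Italian place tonight"),
    ("play-music", "English", "play some jazz in the kitchen"),
    ("get-weather", "English", "what is the weather like in Seattle tomorrow"),
]


def build_few_shot_prompt(
    examples: List[Tuple[str, str, str]], new_intent: str, language: str
) -> str:
    """Format labeled examples, then leave an open slot for the new intent."""
    lines = [
        f"Intent: {intent}; Language: {lang}; Utterance: {utterance}"
        for intent, lang, utterance in examples
    ]
    # The last line is left unfinished so a generative model can complete it.
    lines.append(f"Intent: {new_intent}; Language: {language}; Utterance:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(EXAMPLES, "get-news-update", "German")
print(prompt)
```

Changing the `language` argument is all it takes to request utterances for the same unfamiliar intent in another language, which mirrors the multilingual use described above.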

Our work is inspired by recent work by OpenAI and the development of the GPT-3 model. However, where other large language models use decoder-only architectures, the AlexaTM 20B model is a sequence-to-sequence (seq2seq) encoder-decoder.

In an encoder-decoder architecture, the encoder produces a representation of an input text using a bidirectional encoding, and the decoder uses that representation to perform some task — historically, generating a translation of the input.

20B-encoder-decoder.gif
In a language model with an encoder-decoder architecture, the encoder produces a representation of an input text using a bidirectional encoding, and the decoder uses that representation to predict the next tokens (such as words and punctuation) in the sequence.

By contrast, the decoder-only model uses left-to-right (unidirectional) encoding of the input text. This works well for language modeling, in which the task is to predict the next token in a sequence based on those that precede it, but it’s less effective for machine translation and text summarization, the tasks on which AlexaTM 20B outperforms GPT-3.

Decoder-only.final.jpeg
A decoder-only language model uses left-to-right (unidirectional) encoding of the input text.
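
One common way to picture the difference between bidirectional and left-to-right encoding is through attention masks: a table that says which input positions each position is allowed to look at. The following is a minimal sketch of that idea in plain Python. It illustrates the general concept only and is not taken from the AlexaTM 20B implementation; the function names are hypothetical.

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


tokens = ["play", "some", "jazz", "music"]
masks = [
    ("bidirectional", bidirectional_mask(len(tokens))),
    ("causal", causal_mask(len(tokens))),
]
for name, mask in masks:
    print(name)
    for token, row in zip(tokens, mask):
        # Each row lists which tokens (1) the current token can see.
        print(f"  {token:>5}: {row}")
```

In the causal mask, the first token sees only itself, so its representation cannot reflect words that come later in the input. In the bidirectional mask, every token sees the whole input, which is the property the encoder contributes in a seq2seq model.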

AlexaTM 20B also tops GPT-3 by being multilingual, supporting Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu. And its carbon footprint during training is only one-fifth of GPT-3’s, thanks to its lower parameter count and internal improvements to our training engine.

To train AlexaTM 20B, we break with convention, training on a mix of denoising and causal-language-modeling (CLM) tasks. On the denoising task, the model is required to find dropped spans and generate the complete version of the input. This is similar to how other seq2seq models like T5 and BART are trained. On the CLM task, the model is required to meaningfully continue the input text. This is similar to how decoder-only models like GPT-3 and PaLM are trained.

Training on a mix of these two pretraining tasks enables AlexaTM 20B to generalize based on the given input and generate new text (the CLM task), while also performing well on tasks that seq2seq models are particularly good at, such as summarization and machine translation (the denoising task).

Pre-training objectives.png
AlexaTM 20B pretraining objectives. During pretraining, the model is trained on the denoising task 80% of the time and on causal language modeling (CLM) 20% of the time.
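
The mix of objectives can be sketched as a sampler that, for each training text, picks the denoising task 80% of the time and the CLM task otherwise. This is a simplified illustration, not the training code. The 80/20 split comes from the caption above; the single fixed-length dropped span, the `[CLM]` marker, the random split point, and all function names are assumptions made for the example.

```python
import random
from typing import List, Tuple

DENOISING_PROB = 0.8  # from the caption: denoising 80%, CLM 20%
CLM_TAG = "[CLM]"     # hypothetical marker telling the model to continue the text


def denoising_example(
    tokens: List[str], rng: random.Random, span_len: int = 3
) -> Tuple[List[str], List[str]]:
    """Drop one contiguous span from the input; the target is the complete text."""
    span_len = min(span_len, len(tokens))
    start = rng.randrange(len(tokens) - span_len + 1)
    corrupted = tokens[:start] + tokens[start + span_len:]
    return corrupted, list(tokens)


def clm_example(
    tokens: List[str], rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Give the model a prefix; the target is the continuation (needs >= 2 tokens)."""
    split = rng.randrange(1, len(tokens))
    return [CLM_TAG] + tokens[:split], tokens[split:]


def sample_example(
    tokens: List[str], rng: random.Random
) -> Tuple[str, List[str], List[str]]:
    """Pick an objective at random and build one (task, source, target) triple."""
    if rng.random() < DENOISING_PROB:
        source, target = denoising_example(tokens, rng)
        return "denoising", source, target
    source, target = clm_example(tokens, rng)
    return "clm", source, target


rng = random.Random(0)
text = "the model learns to continue text and to restore dropped spans".split()
counts = {"denoising": 0, "clm": 0}
for _ in range(1000):
    task, source, target = sample_example(text, rng)
    counts[task] += 1
print(counts)  # roughly an 80/20 split between the two tasks
```

In both cases the encoder receives `source` and the decoder is trained to produce `target`, so a single seq2seq model can be exposed to both objectives within one training run.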

For example, we demonstrated that, given a single article-summarization pair, AlexaTM 20B can generate higher-quality summaries in English, German, and Spanish than the much larger PaLM 540B can (see example, below).

Moreover, AlexaTM 20B achieves state-of-the-art performance in few-shot machine translation (MT) across almost all language pairs supported by the model on the Flores-101 dataset. The gains in translating to and from low-resource languages like Marathi, Tamil, and Telugu are particularly significant (e.g., 21.8 Arabic-to-Tamil sentence-piece BLEU score compared to 0.9 for the supervised M2M-124 615M model).

These results suggest that large-scale seq2seq-style pretraining, as formulated in our work, improves MT for languages with few training pairs, especially when a large amount of monolingual data is available for the target language. AlexaTM 20B has no difficulty translating directly between different languages, in contrast to many-to-many MT systems that require parallel translation data for training.

News summarization.png
News summarization by AlexaTM 20B when given only a single example. The input to the encoder is in the yellow box, the decoder’s output in the pink box.

AlexaTM 20B is the largest multilingual seq2seq model to date that is also capable of few-shot learning. We will be releasing the model publicly for non-commercial use to aid the development and evaluation of multilingual large language models (LLMs). We have also implemented a function to enable loading the model on up to eight GPUs with limited GPU memory for running inference on instances of Amazon Web Services’ EC2 computation service. This provides a more flexible way for researchers to use AlexaTM 20B in their own work.
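
To see why splitting the model across several GPUs helps with limited GPU memory, a back-of-the-envelope estimate of the weight footprint is useful. The calculation below assumes 16-bit weights (2 bytes per parameter) and an even split across devices. Both are assumptions made for illustration, not details from the release, and the estimate ignores activations and other runtime overhead.

```python
PARAMS = 20e9        # 20 billion parameters, from the article
BYTES_PER_PARAM = 2  # assumption: 16-bit weights (for example fp16 or bf16)
NUM_GPUS = 8         # upper bound mentioned in the article


def weight_memory_gb(params: float, bytes_per_param: int, num_gpus: int = 1) -> float:
    """Approximate weight memory per GPU in GB (10**9 bytes), assuming an even split."""
    return params * bytes_per_param / num_gpus / 1e9


total = weight_memory_gb(PARAMS, BYTES_PER_PARAM)
per_gpu = weight_memory_gb(PARAMS, BYTES_PER_PARAM, NUM_GPUS)
print(f"weights in total: {total:.0f} GB")   # 40 GB
print(f"weights per GPU:  {per_gpu:.0f} GB")  # 5 GB
```

Under these assumptions, the weights alone would not fit on a single GPU with less than 40 GB of memory, while an eight-way split brings the per-device share down to about 5 GB.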

In an analysis reported in our paper, we found that AlexaTM 20B, like other LLMs, has some likelihood of reproducing toxic language, social biases, and harmful stereotypes found in its training data. Therefore, we recommend that, before using the model, users conduct a full task-specific fairness-and-bias analysis to understand and address any potential harm that might arise from its use. Depending on the downstream application, one or more of the techniques proposed in the literature might be used to detoxify and debias the model. We reiterate the importance of task-specific fairness auditing and emphasize the need for more research on bias measurement and mitigation from the community.

All in all, we demonstrated in our work that the proposed style of pretraining enables seq2seq models that outperform much larger decoder-only LLMs across different tasks, both in a few-shot setting and with fine-tuning. We hope our work presents a compelling case for seq2seq models as a powerful alternative to decoder-only models for LLM training.
