Scaling to trillion-parameter model training on AWS

Contiguous parameter management and prefetched activation offloading expand the MiCS tool kit.

In an Amazon Science blog post earlier this summer, we presented MiCS, a method that significantly improves the training efficiency of machine learning models with up to 175 billion parameters. But there is a continuing push to scale natural-language-processing models to a trillion parameters, to enable reliable few-shot learning for new tasks.

In this post, we introduce two new augmentations to MiCS, to allow AWS customers to train and fine-tune models at the trillion-parameter scale: (1) contiguous parameter management and (2) prefetched activation offloading.

Trillion-parameter training.png
Illustration of forward and backward passes of ZeRO-3 data parallelism on a cluster with four devices; a two-layer deep learning neural network model is used here for illustration purposes.

The figure above illustrates the process of parameter gathering during forward and backward passes for a two-layer deep-learning neural-network model. Before we start the forward step, each worker (rank) holds only a part of the model parameters. In order to compute the activations for the first layer, we use the all-gather operation to gather its parameters.

Related content
A new distributed-training library achieves near-linear efficiency in scaling from tens to hundreds of GPUs.

Once we obtain the output of the first layer, we immediately partition its parameters to release memory and proceed to the next neural-network layer. These two steps are repeated in a reverse order when we compute the gradient.

Repeated all-gather and partitioning processes result in heavy use of collective communication, which causes severe memory fragmentation and cache flush in Pytorch. To address this issue, we pre-allocate a contiguous parameter buffer to hold the complete parameter tensors after the gathering and to self-manage the tensor liveness and defragmentation without affecting the behavior of the Pytorch memory allocator. We observed that this approach greatly improved the performance of the memory-bounded tasks.

In addition, we have developed prefetched activation offloading to further save GPU memory, which we enable in conjunction with activation checkpointing. Each checkpointed activation is offloaded to CPU memory and prefetched when needed during backpropagation, using a dedicated stream opened using the CUDA parallel-computing platform. Since the data transfer is asynchronous, we observed only about a 1-2% speed loss using prefetched activation offloading.

Related content
Amazon researchers optimize the distributed-training tool to run efficiently on the Elastic Fabric Adapter network interface.

AWS recently announced a preview of Amazon EC2 P4de GPU instances powered by 400 Gbps networking and 80GB GPU memory. P4de provides twice as much GPU memory as P4d, enabling fewer nodes to hold a large model in GPU memory and thus lowering communication overhead. The new hardware enables us to even more efficiently scale to larger models with MiCS.

Our experimental results show that we achieve a best ever 176 teraflops per GPU (56.4% of the theoretical peak) for training a 210-layer 1.06-trillion-parameter model on 64 P4de instances in the public cloud. In this setting, the model has a hidden size of 20,480 and a vocabulary size of 50,264. We use a sequence length of 1,024, a batch size of eight per GPU, bfloat16 precision for forward and backward passes, and a float32 Adam optimizer.

EC2 ultraclusters.png
EC2 UltraClusters.

A preview of MiCS is now available in AWS Pytorch DLC and in SageMaker as sharded data parallelism. By leveraging the new techniques, AWS customers can break the GPU memory barrier and empower trillion-parameter model training with one-fourth as much networking bandwidth as an on-premise DGX-A100 cluster.

Acknowledgements: Yida Wang, RJ

Research areas

Related content

US, VA, Arlington
The People eXperience and Technology Central Science Team (PXTCS) uses economics, behavioral science, statistics, and machine learning to proactively identify mechanisms and process improvements which simultaneously improve Amazon and the lives, wellbeing, and the value of work to Amazonians. We are an interdisciplinary team that combines the talents of science and engineering to develop and deliver solutions that measurably achieve this goal. We are looking for economists who are able to apply economic methods to address business problems. The ideal candidate will work with engineers and computer scientists to estimate models and algorithms on large scale data, design pilots and measure their impact, and transform successful prototypes into improved policies and programs at scale. We are looking for creative thinkers who can combine a strong technical economic toolbox with a desire to learn from other disciplines, and who know how to execute and deliver on big ideas as part of an interdisciplinary technical team. Ideal candidates will work in a team setting with individuals from diverse disciplines and backgrounds. They will work with teammates to develop scientific models and conduct the data analysis, modeling, and experimentation that is necessary for estimating and validating models. They will work closely with engineering teams to develop scalable data resources to support rapid insights, and take successful models and findings into production as new products and services. They will be customer-centric and will communicate scientific approaches and findings to business leaders, listening to and incorporate their feedback, and delivering successful scientific solutions. Key job responsibilities Use causal inference methods to evaluate the impact of policies on employee outcomes. Examine how external labor market and economic conditions impact Amazon's ability to hire and retain talent. Use scientifically rigorous methods to develop and recommend career paths for employees. A day in the life Work with teammates to apply economic methods to business problems. This might include identifying the appropriate research questions, writing code to implement a DID analysis or estimate a structural model, or writing and presenting a document with findings to business leaders. Our economists also collaborate with partner teams throughout the process, from understanding their challenges, to developing a research agenda that will address those challenges, to help them implement solutions. About the team We are a multidisciplinary team that combines the talents of science and engineering to develop innovative solutions to make Amazon Earth's Best Employer.
US, WA, Seattle
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Knowledge of econometrics, as well as basic familiarity with Python (or R, Matlab, or equivalent) is necessary, and experience with SQL would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis at Internet speed collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of previous cohorts have converted to full time scientist employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com.
US, WA, Virtual Contact Center-WA
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Some knowledge of econometrics, as well as basic familiarity with Python is necessary, and experience with SQL and UNIX would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis at Internet speed collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of previous cohorts have converted to full time scientist employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com. About the team The Selling Partner Fees team owns the end-to-end fees experience for two million active third party sellers. We own the fee strategy, fee seller experience, fee accuracy and integrity, fee science and analytics, and we provide scalable technology to monetize all services available to third-party sellers. Within the Science team, our goal is to understand the impact of changing fees on Seller (supply) and Customers (demand) behavior (e.g. price changes, advertising strategy changes, introducing new selection etc.) as well as using this information to optimize our fee structure and maximizing our long term profitability.
US, WA, Seattle
This is a unique opportunity to build technology and science that millions of people will use every day. Are you excited about working on large scale Natural Language Processing (NLP), Machine Learning (ML), and Deep Learning (DL)? We are embarking on a multi-year journey to improve the shopping experience for customers globally. Amazon Search team creates customer-focused search solutions and technologies that makes shopping delightful and effortless for our customers. Our goal is to understand what customers are looking for in whatever language happens to be their choice at the moment and help them find what they need in Amazon's vast catalog of billions of products. As Amazon expands to new geographies, we are faced with the unique challenge of maintaining the bar on Search Quality due to the diversity in user preferences, multilingual search and data scarcity in new locales. We are looking for an applied researcher to work on improving search on Amazon using NLP, ML, and DL technology. As an Applied Scientist, you will lead our efforts in query understanding, semantic matching (e.g. is a drone the same as quadcopter?), relevance ranking (what is a "funny halloween costume"?), language identification (did the customer just switch to their mother tongue?), machine translation (猫の餌を注文する). This is a highly visible role with a huge impact on Amazon customers and business. As part of this role, you will develop high precision, high recall, and low latency solutions for search. Your solutions should work for all languages that Amazon supports and will be used in all Amazon locales world-wide. You will develop scalable science and engineering solutions that work successfully in production. You will work with leaders to develop a strategic vision and long term plans to improve search globally. We are growing our collaborative group of engineers and applied scientists by expanding into new areas. This is a position on Global Search Quality team in Seattle Washington. We are moving fast to change the way Amazon search works. Together with a multi-disciplinary team you will work on building solutions with NLP/ML/DL at its core. Along the way, you’ll learn a ton, have fun and make a positive impact on millions of people. Come and join us as we invent new ways to delight Amazon customers.
US, WA, Seattle
This is a unique opportunity to build technology and science that millions of people will use every day. Are you excited about working on large scale Natural Language Processing (NLP), Machine Learning (ML), and Deep Learning (DL)? We are embarking on a multi-year journey to improve the shopping experience for customers globally. Amazon Search team creates customer-focused search solutions and technologies that makes shopping delightful and effortless for our customers. Our goal is to understand what customers are looking for in whatever language happens to be their choice at the moment and help them find what they need in Amazon's vast catalog of billions of products. As Amazon expands to new geographies, we are faced with the unique challenge of maintaining the bar on Search Quality due to the diversity in user preferences, multilingual search and data scarcity in new locales. We are looking for an applied researcher to work on improving search on Amazon using NLP, ML, and DL technology. As an Applied Scientist, you will lead our efforts in query understanding, semantic matching (e.g. is a drone the same as quadcopter?), relevance ranking (what is a "funny halloween costume"?), language identification (did the customer just switch to their mother tongue?), machine translation (猫の餌を注文する). This is a highly visible role with a huge impact on Amazon customers and business. As part of this role, you will develop high precision, high recall, and low latency solutions for search. Your solutions should work for all languages that Amazon supports and will be used in all Amazon locales world-wide. You will develop scalable science and engineering solutions that work successfully in production. You will work with leaders to develop a strategic vision and long term plans to improve search globally. We are growing our collaborative group of engineers and applied scientists by expanding into new areas. This is a position on Global Search Quality team in Seattle Washington. We are moving fast to change the way Amazon search works. Together with a multi-disciplinary team you will work on building solutions with NLP/ML/DL at its core. Along the way, you’ll learn a ton, have fun and make a positive impact on millions of people. Come and join us as we invent new ways to delight Amazon customers.
US, WA, Seattle
The retail pricing science and research group is a team of scientists and economists who design and implement the analytics powering pricing for Amazon’s on-line retail business. The team uses world-class analytics to make sure that the prices for all of Amazon’s goods and services are aligned with Amazon’s corporate goals. We are seeking an experienced high-energy Economist to help envision, design and build the next generation of retail pricing capabilities. You will work at the intersection of economic theory, statistical inference, and machine learning to design new methods and pricing strategies to deliver game changing value to our customers. Roughly 85% of previous intern cohorts have converted to full time scientist employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com. Key job responsibilities Amazon’s Pricing Science and Research team is seeking an Economist to help envision, design and build the next generation of pricing capabilities behind Amazon’s on-line retail business. As an economist on our team, you will work at the intersection of economic theory, statistical inference, and machine learning to design new methods and pricing strategies with the potential to deliver game changing value to our customers. This is an opportunity for a high-energy individual to work with our unprecedented retail data to bring cutting edge research into real world applications, and communicate the insights we produce to our leadership. This position is perfect for someone who has a deep and broad analytic background and is passionate about using mathematical modeling and statistical analysis to make a real difference. You should be familiar with modern tools for data science and business analysis. We are particularly interested in candidates with research background in applied microeconomics, econometrics, statistical inference and/or finance. A day in the life Discussions with business partners, as well as product managers and tech leaders to understand the business problem. Brainstorming with other scientists and economists to design the right model for the problem in hand. Present the results and new ideas for existing or forward looking problems to leadership. Deep dive into the data. Modeling and creating working prototypes. Analyze the results and review with partners. Partnering with other scientists for research problems. About the team The retail pricing science and research group is a team of scientists and economists who design and implement the analytics powering pricing for Amazon’s on-line retail business. The team uses world-class analytics to make sure that the prices for all of Amazon’s goods and services are aligned with Amazon’s corporate goals.
US, CA, San Francisco
The retail pricing science and research group is a team of scientists and economists who design and implement the analytics powering pricing for Amazon's on-line retail business. The team uses world-class analytics to make sure that the prices for all of Amazon's goods and services are aligned with Amazon's corporate goals. We are seeking an experienced high-energy Economist to help envision, design and build the next generation of retail pricing capabilities. You will work at the intersection of statistical inference, experimentation design, economic theory and machine learning to design new methods and pricing strategies for assessing pricing innovations. Roughly 85% of previous intern cohorts have converted to full time scientist employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com. Key job responsibilities Amazon's Pricing Science and Research team is seeking an Economist to help envision, design and build the next generation of pricing capabilities behind Amazon's on-line retail business. As an economist on our team, you will will have the opportunity to work with our unprecedented retail data to bring cutting edge research into real world applications, and communicate the insights we produce to our leadership. This position is perfect for someone who has a deep and broad analytic background and is passionate about using mathematical modeling and statistical analysis to make a real difference. You should be familiar with modern tools for data science and business analysis. We are particularly interested in candidates with research background in experimentation design, applied microeconomics, econometrics, statistical inference and/or finance. A day in the life Discussions with business partners, as well as product managers and tech leaders to understand the business problem. Brainstorming with other scientists and economists to design the right model for the problem in hand. Present the results and new ideas for existing or forward looking problems to leadership. Deep dive into the data. Modeling and creating working prototypes. Analyze the results and review with partners. Partnering with other scientists for research problems. About the team The retail pricing science and research group is a team of scientists and economists who design and implement the analytics powering pricing for Amazon's on-line retail business. The team uses world-class analytics to make sure that the prices for all of Amazon's goods and services are aligned with Amazon's corporate goals.
US, WA, Bellevue
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Some knowledge of econometrics, as well as basic familiarity with Python is necessary, and experience with SQL and UNIX would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis at Internet speed collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of interns from previous cohorts have converted to full time economics employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com.
US
The Amazon Supply Chain Optimization Technology (SCOT) organization is looking for an Intern in Economics to work on exciting and challenging problems related to Amazon's worldwide inventory planning. SCOT provides unique opportunities to both create and see the direct impact of your work on billions of dollars’ worth of inventory, in one of the world’s most advanced supply chains, and at massive scale. We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. We are looking for a PhD candidate with exposure to Program Evaluation/Causal Inference. Knowledge of econometrics and Stata/R/or Python is necessary, and experience with SQL, Hadoop, and Spark would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis at Internet speed collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of previous cohorts have converted to full time scientist employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com.
US, WA, Seattle
The Selling Partner Fees team owns the end-to-end fees experience for two million active third party sellers. We own the fee strategy, fee seller experience, fee accuracy and integrity, fee science and analytics, and we provide scalable technology to monetize all services available to third-party sellers. We are looking for an Intern Economist with excellent coding skills to design and develop rigorous models to assess the causal impact of fees on third party sellers’ behavior and business performance. As a Science Intern, you will have access to large datasets with billions of transactions and will translate ambiguous fee related business problems into rigorous scientific models. You will work on real world problems which will help to inform strategic direction and have the opportunity to make an impact for both Amazon and our Selling Partners.