Simplifying BERT-based models to increase efficiency, capacity

New method would enable BERT-based natural-language-processing models to handle longer text strings, run in resource-constrained settings — or sometimes both.

In recent years, many of the best-performing models in the field of natural-language processing (NLP) have been built on top of BERT language models. Pretrained on large corpora of (unlabeled) public texts, BERT models encode the probabilities of sequences of words. Because a BERT model begins with extensive knowledge of a language as a whole, it can be fine-tuned on a more targeted task — like question answering or machine translation — with relatively little labeled data.

BERT models, however, are very large, and BERT-based NLP models can be slow — even prohibitively slow, for users with limited computational resources. Their complexity also limits the length of the inputs they can take, as their memory footprint scales with the square of the input length.

A simplified illustration of the Pyramid-BERT architecture.

At this year’s meeting of the Association for Computational Linguistics (ACL), my colleagues and I presented a new method, called Pyramid-BERT, that reduces the training time, inference time, and memory footprint of BERT-based models, without sacrificing much accuracy. The reduced memory footprint also enables BERT models to operate on longer text sequences.

BERT-based models take sequences of sentences as inputs and output vector representations — embeddings — of both each sentence as a whole and its constituent words individually. Downstream applications such as text classification and ranking, however, use only the complete-sentence embeddings. To make BERT-based models more efficient, we progressively eliminate redundant individual-word embeddings in intermediate layers of the network, while trying to minimize the effect on the complete-sentence embeddings.

We compare Pyramid-BERT to several state-of-the-art techniques for making BERT models more efficient and show that we can speed inference up 3- to 3.5-fold while suffering an accuracy drop of only 1.5%, whereas, at the same speeds, the best existing method loses 2.5% of its accuracy.

Moreover, when we apply our method to Performers — variations on BERT models that are specifically designed for long texts — we can reduce the models’ memory footprint by 70%, while actually increasing accuracy. At that compression rate, the best existing approach suffers an accuracy dropoff of 4%.

A token’s progress

Each sentence input to a BERT model is broken into units called tokens. Most tokens are words, but some are multiword phrases, some are subword parts, some are individual letters of acronyms, and so on. The start of each sentence is demarcated by a special token called — for reasons that will soon be clear — CLS, for classification.

Each token passes through a series of encoders — usually somewhere between four and 12 — each of which produces a new embedding for each input token. Each encoder has an attention mechanism, which decides how much each token’s embedding should reflect information carried by other tokens.

For instance, given the sentence “Bob told his brother that he was starting to get on his nerves,” the attention mechanism should pay more attention to the word “Bob” when encoding the word “his” but to “brother” when encoding the word “he.” Because the attention mechanism must compare every token in an input sequence to every other token, a BERT model’s memory footprint scales with the square of the input length.
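To make the quadratic scaling concrete, here is a rough back-of-the-envelope sketch. The function name, the 12 attention heads, and the 4 bytes per score are illustrative assumptions, and the estimate covers only the attention scores of a single layer for a single sequence, not a full model's memory use.

```python
def attention_score_bytes(seq_len, num_heads=12, bytes_per_value=4):
    """Bytes needed to hold one layer's attention scores for one sequence.

    Hypothetical helper: the head count and value size are assumed defaults.
    """
    # Each head stores a seq_len x seq_len matrix of pairwise token scores.
    return num_heads * seq_len * seq_len * bytes_per_value


for n in (128, 256, 512, 1024):
    mib = attention_score_bytes(n) / 2**20
    print(f"{n:5d} tokens -> {mib:7.2f} MiB of attention scores per layer")
```

Under these assumptions, doubling the input length quadruples the memory needed for the scores, which is why dropping tokens in intermediate layers pays off so quickly.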

As tokens pass through the series of encoders, their embeddings factor in more and more information about other tokens in the sequence, since they’re attending to other tokens that are also factoring in more and more information. By the time the tokens pass through the final encoder, the embedding of the CLS token ends up representing the sentence as a whole (hence the CLS token’s name). But its embedding is also very similar to those of all the other tokens in the sentence. That’s the redundancy we’re trying to remove.

The basic idea is that, in each of the network’s encoders, we preserve the embedding of the CLS token but select a representative subset — a core set — of the other tokens’ embeddings.

Embeddings are vectors, so they can be interpreted as points in a multidimensional space. To construct core sets we would, ideally, sort embeddings into clusters of equal diameter and select the center point — the centroid — of each cluster.

Ideally, for each encoder in the network, we would construct a representative subset of token embeddings (green dots) by selecting the centroids (red dots) of token clusters (circles). The centroids would then pass to the next layer of the network.

Unfortunately, the problem of constructing a core set that spans a layer of a neural network is NP-hard, meaning that it’s impractically time consuming.

As an alternative, our paper proposes a greedy algorithm that selects n members of the core set at a time. At each layer, we take the embedding of the CLS token, and then we find the n embeddings farthest from it in the representational space. We add those, along with the CLS embedding, to our core set. Then we find the n embeddings whose minimum distance from any of the points already in our core set is greatest, and we add those to the core set.

We repeat this process until our core set reaches the desired size. This is provably an adequate approximation of the optimal core set.
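The greedy selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `greedy_core_set`, its parameters, the use of Euclidean distance, and the toy two-dimensional points are all hypothetical.

```python
import math


def greedy_core_set(embeddings, target_size, n=1, cls_index=0):
    """Return indices of a greedily built core set, with the CLS index first.

    Hypothetical helper: Euclidean distance and the argument names are
    illustrative assumptions, not taken from the paper.
    """
    target_size = min(target_size, len(embeddings))
    selected = [cls_index]
    remaining = [i for i in range(len(embeddings)) if i != cls_index]
    # min_dist[i]: distance from embedding i to its nearest core-set member.
    min_dist = {i: math.dist(embeddings[i], embeddings[cls_index]) for i in remaining}
    while len(selected) < target_size:
        batch = min(n, target_size - len(selected))
        # Take the `batch` remaining embeddings farthest from the core set.
        picked = sorted(remaining, key=lambda i: min_dist[i], reverse=True)[:batch]
        selected.extend(picked)
        remaining = [i for i in remaining if i not in picked]
        # Refresh each remaining embedding's distance to the enlarged core set.
        for i in remaining:
            for c in picked:
                min_dist[i] = min(min_dist[i], math.dist(embeddings[i], embeddings[c]))
    return selected


# Toy example: index 0 plays the role of the CLS embedding.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (0.0, 4.0)]
print(greedy_core_set(points, target_size=3, n=1))  # [0, 3, 4]
print(greedy_core_set(points, target_size=3, n=2))  # [0, 3, 2]
```

In this toy example, selecting one point at a time (n=1) picks two well-separated points, while selecting two at a time (n=2) picks two near-duplicates in the same batch. Larger values of n mean fewer selection rounds, but they can yield a less diverse core set.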

Finally, in our paper, we consider the question of how large the core set of each layer should be. We use an exponential-decay function to determine the degree of attenuation from one layer to the next, and we investigate the trade-offs between accuracy and speedups or memory reduction that result from selecting different rates of decay.
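As a rough illustration of how an exponential schedule shrinks the core set from layer to layer, consider the sketch below. The function name, the `keep_ratio` parameterization (equivalent to exp(-rate) for a decay rate), the `min_tokens` floor, and the choice to keep every token in the first layer are all assumptions made for the example. The exact schedule used in the paper may differ.

```python
def core_set_sizes(seq_len, num_layers, keep_ratio, min_tokens=1):
    """Number of token embeddings kept at each layer under geometric decay.

    Hypothetical helper: layer 0 keeps all tokens, and each later layer
    keeps `keep_ratio` times as many as the one before it (rounded down).
    """
    sizes = []
    for layer in range(num_layers):
        kept = int(seq_len * keep_ratio ** layer)
        # Never drop below the floor, so the CLS embedding always survives.
        sizes.append(max(min_tokens, kept))
    return sizes


print(core_set_sizes(128, 6, keep_ratio=0.5))  # [128, 64, 32, 16, 8, 4]
```

A smaller `keep_ratio` prunes more aggressively, which saves more time and memory but leaves later layers with fewer embeddings to work from.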

Acknowledgements: Ashish Khetan, Rene Bidart, Zohar Karnin
