Taming Transformers for text classification with millions of classes

New approach scales manageably while achieving state-of-the-art results.

Text classification is the most basic task in the field of natural-language understanding. Customer requests to Amazon Alexa, for example, are classified by domain — weather, music, smart home, information, and so on — and many natural-language-processing applications rely on parsers that classify words according to parts of speech.

For tasks in which the text classes are relatively few, the best-performing text classification systems use pretrained Transformer models such as BERT, XLNet, and RoBERTa.

But Transformer-based models scale quadratically with the input sequence length and linearly with the number of classes. For tasks with large numbers of classes — hundreds of thousands or even millions — they become impractically large.

In a paper we’re presenting this year at the Association for Computing Machinery’s annual conference on Knowledge Discovery and Data Mining (KDD), my colleagues and I describe a new method for applying Transformer-based models to the problem of text classification with large numbers of classes, or “extreme classification”.

Our model scales reasonably with the number of classes, and in experiments, we show that on the task of selecting relevant classes for a given input, it outperforms the state-of-the-art system on four different data sets.

Exploding classifiers

Typically, for natural-language-processing tasks, Transformer-based models are pretrained on large, general text corpora, learning embeddings for the words of the language, or vector representations such that associated words cluster together in the vector space. Those embeddings then serve as inputs to a new classifier, which is trained on the particular task.

Regardless of the task, the size of the Transformer model itself is fixed, usually at somewhere around 350 million parameters, where a parameter is the weight of a single connection (edge) in a neural network.

In our paper, we consider one component of the General Language Understanding Evaluation (GLUE) benchmark, the Multi-Genre Natural-Language Inference (MNLI) corpus, which contains sentence pairs that have three possible logical relationships: entailment, contradiction, or neutrality. A classifier trained to recognize these three types of relationships adds another 2,000 parameters to the model, a negligible difference.

We also consider an in-house system that suggests possible keywords for new items being added to the Amazon Store, based on their product titles (for instance, for a black digital kitchen timer, it suggests “black timer”, “kitchen timer”, “black digital timer”, and so on).

That system has about a million product categories. Training a classifier to sort product names into those categories adds more than a billion parameters to the model, almost quadrupling its size and making it much less efficient to train and operate.
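To make the scaling concrete, here is a back-of-the-envelope sketch of how a linear classification head grows with the number of classes. The hidden size of 1,024 is an assumption (it is typical of "large" pretrained Transformers, but the text does not state it), bias terms are ignored, and `head_params` is a hypothetical helper name.

```python
# Rough size of a linear classification head: one weight per (hidden unit, class) pair.
# HIDDEN_SIZE = 1024 is an assumption, not a figure from the text; biases are ignored.
HIDDEN_SIZE = 1024
ENCODER_PARAMS = 350_000_000  # approximate Transformer size quoted in the text


def head_params(num_classes: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Number of weights in a dense layer mapping hidden_size -> num_classes."""
    return hidden_size * num_classes


for name, num_classes in [("three-way inference", 3), ("one million categories", 1_000_000)]:
    extra = head_params(num_classes)
    ratio = (ENCODER_PARAMS + extra) / ENCODER_PARAMS
    print(f"{name}: {extra:,} head weights, total is {ratio:.2f}x the encoder")
```

Under these assumptions the three-class head is a few thousand weights, while the million-class head comes to roughly a billion, which is consistent with the "almost quadrupling" described above.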

We address this problem by training the Transformer-based model to assign each input to a cluster of classes instead of a single class. Then we use a simple linear classifier to select one class from the cluster. This drastically reduces the size of the Transformer-based model while preserving classification accuracy.

To preserve Transformers' advantages while scaling reasonably, our method uses a Transformer model to assign each input to a cluster of labels. A simpler, linear classifier then selects a single label from the cluster.
Credit: Stacy Reilly

We experimented with two different methods for clustering classes. One used embeddings produced by a pretrained XLNet model, clustering class names that were near each other in the vector space. To embed a multiword class name, we averaged the embeddings of its component words.

Another method embedded sample inputs from each class, not just the names of the classes. Again, we averaged the embeddings of individual words to produce a single embedding for each input text; then we averaged the embeddings of input texts to produce an embedding for a particular class.

In our experiments, combining both of these approaches to cluster classes worked better than using either in isolation. But our model is agnostic as to which class-clustering technique we use.
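The two embedding strategies can be illustrated with a small sketch. Everything here is a stand-in: the two-dimensional word vectors are made-up values rather than XLNet embeddings, concatenation is only one assumed way of combining the two representations, and plain k-means is used purely for illustration, since the text does not name a clustering algorithm.

```python
from math import dist

# Toy 2-D word vectors standing in for pretrained embeddings (hypothetical values).
WORD_VECS = {
    "black": [0.9, 0.1], "kitchen": [0.8, 0.3], "timer": [1.0, 0.2],
    "rock": [0.1, 0.9], "jazz": [0.2, 1.0], "music": [0.0, 0.8],
}


def mean(vectors):
    """Element-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed_text(text):
    """Average the vectors of the known words in a text (unknown words are skipped)."""
    return mean([WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS])


def embed_class(name, sample_inputs):
    """Concatenate the class-name embedding with the mean embedding of sample inputs."""
    return embed_text(name) + mean([embed_text(t) for t in sample_inputs])


def kmeans(points, k, iters=10):
    """Plain k-means seeded with the first k points; returns one cluster id per point."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign


classes = {
    "kitchen timer": ["black digital kitchen timer"],
    "rock music": ["jazz rock"],
    "black timer": ["black digital timer"],
    "jazz music": ["jazz music rock"],
}
labels = kmeans([embed_class(n, s) for n, s in classes.items()], k=2)
print(dict(zip(classes, labels)))
```

On this toy data the two timer classes land in one cluster and the two music classes in the other; with real embeddings, the number of clusters and the clustering method would be tuning choices.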

To select one class from a cluster, we use a one-versus-all classifier, which, for each class in a cluster, learns to partition members of that class from non-members in the embedding space. The partition for each class is fairly inexact, but the intersection of multiple partitions can accurately identify a single class.
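A minimal sketch of the resulting two-stage prediction follows. The weights are hand-set, hypothetical values, the cluster scorer is a simple dot product standing in for the Transformer-based model, and `predict` and `top_clusters` are illustrative names rather than anything from the paper.

```python
# Hypothetical, hand-set weights; in practice both stages would be learned.
CLUSTER_WEIGHTS = {  # stand-in for the Transformer's cluster scores
    "timers": [1.0, 0.0, 0.0],
    "music": [0.0, 0.0, 1.0],
}
CLASS_WEIGHTS = {  # one linear (weights, bias) scorer per class, grouped by cluster
    "timers": {
        "black timer": ([0.2, 1.0, 0.0], 0.0),
        "kitchen timer": ([0.2, -1.0, 0.0], 0.0),
    },
    "music": {
        "jazz": ([0.0, 1.0, 0.2], 0.0),
        "rock": ([0.0, -1.0, 0.2], 0.0),
    },
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict(x, top_clusters=1):
    """Rank clusters, then score only the classes inside the best-ranked clusters."""
    ranked = sorted(CLUSTER_WEIGHTS, key=lambda c: dot(CLUSTER_WEIGHTS[c], x), reverse=True)
    candidates = []
    for cluster in ranked[:top_clusters]:
        for label, (weights, bias) in CLASS_WEIGHTS[cluster].items():
            candidates.append((dot(weights, x) + bias, label))
    return max(candidates)[1]  # label with the highest one-versus-all score


print(predict([0.9, 0.8, 0.1]))
print(predict([0.1, -0.7, 0.9]))
```

The point of the design is that only the classifiers in the selected clusters are evaluated for a given input, so the per-input cost depends on cluster size rather than on the total number of classes.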

Negative examples

Learning partitions requires negative training examples as well as positive. We use two different methods to construct negative examples.

First, for each class in a cluster, we draw negative examples from the other classes in the same cluster. Because the classes in a given cluster are semantically related, this ensures that the negative examples will be challenging and therefore more informative to the classifier than easy examples.

We also use the Transformer-based clustering model to identify challenging negative examples. For each input, the Transformer-based model produces a list of possible cluster assignments, ranked by probability. For positive training examples in a particular class, we identify the incorrect clusters that the model consistently predicts with high probability. We then use those clusters as the basis for additional negative examples, weighted according to probability scores.
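Both sources of negative examples can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the function names, the toy data, and the `min_prob` threshold on the average probability (used here as a proxy for "consistently predicts with high probability") are all hypothetical.

```python
from collections import defaultdict


def sibling_negatives(label, cluster_of, examples_by_class):
    """Training texts from the other classes in the same cluster as `label`."""
    cluster = cluster_of[label]
    return [
        text
        for other, texts in examples_by_class.items()
        if other != label and cluster_of[other] == cluster
        for text in texts
    ]


def confusable_clusters(label, cluster_of, matcher_probs, min_prob=0.2):
    """Mean probability assigned to each wrong cluster over `label`'s positive examples."""
    totals = defaultdict(float)
    for probs in matcher_probs:  # one {cluster: probability} dict per positive example
        for cluster, p in probs.items():
            if cluster != cluster_of[label]:
                totals[cluster] += p / len(matcher_probs)
    return {c: p for c, p in totals.items() if p >= min_prob}


cluster_of = {
    "black timer": "timers", "kitchen timer": "timers", "egg timer": "timers",
    "wall clock": "clocks", "jazz": "music",
}
examples_by_class = {
    "black timer": ["black digital timer"],
    "kitchen timer": ["magnetic kitchen timer"],
    "egg timer": ["sand egg timer"],
    "wall clock": ["silent wall clock"],
    "jazz": ["jazz trio album"],
}
# Cluster probabilities for two positive examples of "kitchen timer" (made-up numbers).
matcher_probs = [
    {"timers": 0.6, "clocks": 0.35, "music": 0.05},
    {"timers": 0.7, "clocks": 0.25, "music": 0.05},
]

easy_to_confuse = sibling_negatives("kitchen timer", cluster_of, examples_by_class)
hard_clusters = confusable_clusters("kitchen timer", cluster_of, matcher_probs)
# Weighted negatives: every text from a confusable cluster, paired with that cluster's weight.
weighted = [
    (text, weight)
    for cluster, weight in hard_clusters.items()
    for other, texts in examples_by_class.items()
    if cluster_of[other] == cluster
    for text in texts
]
print(easy_to_confuse, weighted)
```

In this toy setup, the "clocks" cluster is retained as a source of weighted negatives for "kitchen timer" because the model keeps assigning it substantial probability, while "music" falls below the threshold and is ignored.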

In experiments, we compared our system to nine benchmark systems on four different data sets. On the task of identifying the single best classification label for a given input, our system was the most accurate across the board.

The margin of improvement over the second-place finisher, a recent system called AttentionXML, was narrow — around 1% — and on one data set, AttentionXML was slightly more accurate on the tasks of identifying the top three labels and the top five labels. But some of the techniques that AttentionXML uses are complementary to our system’s, and it would be interesting to see whether combining the two approaches could improve performance still further.
