Using supervised learning to train models for image clustering

Approach that uses a hierarchical graph neural network improves F-score by 49% relative to predecessors.

Most machine learning models use supervised learning, meaning they’re trained on annotated data, which is costly and time-consuming to acquire.

The chief method for doing unsupervised learning, which doesn’t require annotated data, is clustering, or grouping data points together by salient characteristics. The idea is that each cluster represents some category, such as photos of the same person or the same species of animal.

To decide where to draw boundaries between clusters, clustering algorithms typically rely on heuristics, such as a threshold distance between cluster centers or the shape of the clusters’ distributions. In a paper we’re presenting at the International Conference on Computer Vision (ICCV), we propose, instead, to learn from data how to draw boundaries.

We first represent the visual data as a graph, then use a graph neural network (GNN) to produce vector representations of the graph’s nodes. So far, this follows previous work.

Instead of relying on heuristics, however, we use labeled data to learn how to cluster the vectors and, crucially, to decide how fine-grained those clusters should be. We call the labeled data meta-training data, since the goal is to learn a general clustering technique, not a specific classification model. 

In particular, we propose a hierarchical GNN, meaning that it creates clusters by adding edges between nodes of a graph, then adds edges between the clusters to create still larger clusters, and so on, iterating until it decides that no more edges should be added.

A schematic of our graph-based hierarchical clustering approach. The colors of the image borders and of the graph nodes indicate data types (in this case, photos of the same actor). Our approach is hierarchical, iteratively treating small clusters generated at one level as the units of clustering for the next level. We call our base model LANDER, for link approximation and density estimation refinement, and our hierarchical clustering method Hi-LANDER.

Finally, we apply our hierarchical clustering technique to test sets whose classification categories are disjoint with those of the meta-training data. In our experiments we found that, compared to previous GNN-based supervised and unsupervised approaches, ours increased the F-score — which factors in both false positives and false negatives — by an average of 49% and 47%, respectively.
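
The F-score for clustering is often computed pairwise: every pair of samples is treated as a binary decision (same cluster or not), and precision and recall are tallied over all pairs. The sketch below shows that formulation; the paper may use this or a closely related variant (such as BCubed), so treat it as illustrative.

```python
from itertools import combinations

def pairwise_f_score(pred_labels, true_labels):
    """Pairwise F-score: every pair of samples is a binary decision
    (same cluster vs. different cluster); precision and recall are
    computed over all pairs and combined with a harmonic mean."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred_labels)), 2):
        same_pred = pred_labels[i] == pred_labels[j]
        same_true = true_labels[i] == true_labels[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```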

Constructing the graph

In our paper, we investigate the case in which we are training a model to cluster visual data that is similar to the meta-training data but has no class overlaps with it. For instance, the meta-training data might be faces of movie stars, while the target application is to cluster faces of politicians, athletes, or other public figures.

The first step in our process is to use the meta-training data to build a supervised classifier: if the meta-training data is faces of movie stars, the classifier labels input images with names of movie stars.

The classifier is an encoder-decoder model: the encoder produces a fixed-length vector representation of the input, or feature vector, and the decoder uses that vector to predict a label. Once we’ve trained the classifier, however, we use only the encoder for the rest of the process.
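
As a concrete illustration, here is a minimal sketch of that step in PyTorch. The backbone (a ResNet-50) and the number of meta-training classes are assumptions made for the example, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical setup: backbone and class count are illustrative choices.
num_meta_training_classes = 1000          # e.g., number of movie stars
classifier = models.resnet50(weights=None)
classifier.fc = nn.Linear(classifier.fc.in_features, num_meta_training_classes)
# ... supervised training on the meta-training data would happen here ...

# Keep only the encoder: drop the classification head so the model
# outputs a fixed-length feature vector for each image.
encoder = nn.Sequential(*list(classifier.children())[:-1])
encoder.eval()

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)   # dummy batch of images
    features = encoder(images).flatten(1)  # shape: (8, 2048)
```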

The feature vectors define points in a multidimensional space. On the basis of the vectors’ locations, we construct a graph, in which each node represents an image, and each image’s k nearest neighbors in the feature space are connected to it (share edges with it) in the graph.
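
Here is a minimal sketch of that graph-construction step, using scikit-learn’s exact nearest-neighbor search. At the scale of the paper’s experiments, an approximate-nearest-neighbor library would more likely be used, and the value of k below is just a placeholder.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(features: np.ndarray, k: int = 5):
    """Connect each node to its k nearest neighbors in feature space."""
    nn_index = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, neighbors = nn_index.kneighbors(features)   # shape: (N, k + 1)
    # Position 0 is the query point itself, so skip it.
    return [(i, int(j)) for i in range(len(features)) for j in neighbors[i][1:]]

features = np.random.randn(100, 256).astype(np.float32)  # dummy feature vectors
edges = build_knn_graph(features, k=5)
```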

This graph will serve as the input to the clustering model, which is also an encoder-decoder model. The encoder is a GNN, which produces a vector representation of each node in the graph, based on that node’s feature vector and those of the nodes it’s connected to. Call this vector the node embedding.
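
The sketch below shows one simple message-passing layer of the kind such an encoder might stack: each node’s embedding combines its own feature vector with the mean of its neighbors’. It is illustrative only; the GNN used in the paper differs in its details.

```python
import torch
import torch.nn as nn

class MeanAggregationLayer(nn.Module):
    """One message-passing step: update each node from its own features
    and the mean of its neighbors' features. Illustrative only."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, in_dim) node feature matrix
        # adj: (N, N) 0/1 adjacency matrix of the k-NN graph
        degree = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        neighbor_mean = adj @ x / degree
        return torch.relu(self.linear(torch.cat([x, neighbor_mean], dim=1)))
```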

The clustering model

We adopt a hierarchical approach to clustering. Based on the node embeddings, the clustering model predicts edges between nodes. A cluster is defined as a group of nodes each of which shares an edge with at least one other node in the group and none of which shares an edge with any node outside the group.
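
In other words, once the model has committed to a set of edges, the clusters are simply the connected components of the resulting graph. A minimal union-find sketch of that grouping step:

```python
def connected_components(num_nodes: int, edges):
    """Group nodes into clusters: two nodes share a cluster if they are
    connected, directly or indirectly, by predicted edges (union-find)."""
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    return [find(i) for i in range(num_nodes)]   # one cluster label per node

labels = connected_components(6, [(0, 1), (1, 2), (4, 5)])
# nodes 0-2 share a label, node 3 is a singleton, nodes 4-5 share another
```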

Note that the goal of the clustering model is not just to reproduce the nearest-neighbor graph but to link nodes that represent data of the same type. The nearest-neighbor linkages are useful for predicting clustering linkages, but they are not identical with them.

After the first pass through the data, we aggregate each cluster into a single, representative “supernode” and repeat the whole process. That is, we create edges between each supernode and its k nearest neighbors, pass the resulting graph through the same GNN, and predict edges based on the supernode embeddings. We repeat this process until the clustering model predicts no edges between nodes.
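
Putting the pieces together, the hierarchy can be sketched as the loop below. It reuses the build_knn_graph and connected_components helpers from the earlier sketches; predict_edges stands in for the trained GNN link predictor, and collapsing a cluster to the mean of its members’ features is just one simple choice of aggregation, not necessarily the paper’s.

```python
import numpy as np

def aggregate_clusters(node_features, members, labels):
    """Collapse each cluster into a single supernode whose feature vector
    is the mean of its members' features (one simple aggregation choice)."""
    new_features, new_members = [], []
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        new_features.append(np.mean([node_features[i] for i in idx], axis=0))
        new_members.append([img for i in idx for img in members[i]])
    return np.stack(new_features), new_members

def hierarchical_cluster(features, predict_edges, k=5):
    """Iterate: build a k-NN graph, let the learned model decide which
    edges to keep, merge connected components into supernodes, and stop
    when no edges are predicted. `predict_edges` is a hypothetical
    interface to the trained clustering model."""
    node_features, members = features, [[i] for i in range(len(features))]
    while True:
        candidate_edges = build_knn_graph(node_features,
                                          k=min(k, len(node_features) - 1))
        kept_edges = predict_edges(node_features, candidate_edges)
        if not kept_edges:
            break
        labels = connected_components(len(node_features), kept_edges)
        node_features, members = aggregate_clusters(node_features, members, labels)
    return members   # each entry lists the original images in one final cluster
```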

We train our clustering model on two different objectives. One is to correctly predict links between nodes, where a correct link is one that picks out two representatives of the same data type in the meta-training data (say, two photos of the same actor).

We also train the model to correctly predict the density of a given data type in a given graph neighborhood. That is, for each node, the model should predict the proportion of nearby neighbors of the same data type.
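
A sketch of what such a joint objective might look like: a binary cross-entropy term on per-edge link predictions plus a regression term on per-node density predictions. The exact loss functions and weighting in the paper may differ; this is an illustration of the idea, not its precise formulation.

```python
import torch
import torch.nn as nn

class JointLinkDensityLoss(nn.Module):
    """Illustrative joint objective: link prediction (binary cross-entropy)
    plus per-node density regression (L1). The weighting is a free choice."""
    def __init__(self, density_weight: float = 1.0):
        super().__init__()
        self.link_loss = nn.BCEWithLogitsLoss()
        self.density_loss = nn.L1Loss()
        self.density_weight = density_weight

    def forward(self, link_logits, link_labels, density_pred, density_target):
        # link_logits, link_labels: (num_candidate_edges,) -- a label of 1 means
        #   the two endpoints belong to the same ground-truth class.
        # density_pred, density_target: (num_nodes,) -- the target is the fraction
        #   of a node's neighbors that share its class.
        return (self.link_loss(link_logits, link_labels.float())
                + self.density_weight * self.density_loss(density_pred, density_target))
```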

Past research on clustering has shown that factoring in data density improves results. Previously, however, link prediction and data density prediction were handled by separate models. By using a single model to jointly predict both, we significantly increase computational efficiency. We believe that the combination also contributes to our increase in accuracy.

The other novelty of our approach is that, because of our hierarchical processing scheme, we optimize clustering across the entire input graph. Previous approaches would first divide the graph into subgraphs, then perform inference within each subgraph separately. That prevents the natural parallelization of inference across the full graph, which improves runtime, and it limits how effectively information can flow through the graph. Full graph-wide processing is thus another reason for our model’s improved efficiency.

In experiments, we considered two different sets of meta-training data. One consisted of closeups of human faces, the other of images of particular animal species. We tested the model trained on human faces on two other datasets, whose data categories had zero or very little overlap with those of the meta-training set — 0% and less than 2%. We tested the model trained on animal species on a dataset of previously unseen species. Across both models and the three test sets, our average improvements over previous GNN-based clustering models and unsupervised clustering methods were 49% and 47%, respectively.

In ongoing work, we are investigating the possibility of training a more general clustering model, whose performance at inference time will be more transferable across different data types — accurately clustering both faces and animal species, for instance.

Acknowledgements: Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zhang Zheng, Stefano Soatto

About the Authors
Yifan Xing is an applied scientist with Amazon Web Services.
Tong He is an applied scientist with Amazon Web Services.
