KDD: Graph neural networks and self-supervised learning
Amazon Scholar Chandan Reddy on the trends he sees in knowledge discovery research and their implications for his own work.
As a Senior Program Committee member at this year’s Knowledge Discovery and Data Mining Conference (KDD), with a wide perspective on paper submissions, Chandan Reddy noticed two major research trends: work on graph neural networks and on self-supervised learning.
“Graph neural networks has been an extremely hot topic of research in recent years, and at this year’s KDD conference as well,” says Reddy, an Amazon Scholar and a professor of computer science at Virginia Tech. “In machine learning, you often assume that the different data samples are independent of each other. But in the real world, you always have more information about relationships between two entities. If you have two people, there are connections between them. Knowing about your neighbor, we can start to predict something about you. So naturally you have a lot of data that is being collected that can be represented in the form of graphs.”
In the context of knowledge discovery, the nodes of the graph usually represent entities, and the edges usually represent relationships between them. Graph neural networks provide a way to represent nodes as vectors in a multidimensional space, such that nodes’ locations in the space encode information about their relationships to each other. Graph neural networks can, for instance, help identify missing edges in a graph — that is, previously unnoticed relationships between entities.
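To make the idea of nodes as vectors concrete, here is a minimal sketch in plain Python. It is not a trained graph neural network: it performs a single round of neighbor averaging, the simplest form of the message passing that graph neural networks build on, with no learned weights. The toy friendship graph, the node names, and the one-hot starting features are all assumptions made for illustration.

```python
import math

# Hypothetical toy graph: an undirected friendship network as an adjacency list.
graph = {
    "ana": ["ben", "cal"],
    "ben": ["ana", "cal"],
    "cal": ["ana", "ben", "dee"],
    "dee": ["cal", "eli"],
    "eli": ["dee"],
}

# Hypothetical starting features: a one-hot vector per node.
nodes = sorted(graph)
features = {
    name: [1.0 if i == j else 0.0 for j in range(len(nodes))]
    for i, name in enumerate(nodes)
}


def propagate(graph, features):
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, neighbors in graph.items():
        group = [node] + neighbors
        dim = len(features[node])
        updated[node] = [
            sum(features[member][k] for member in group) / len(group)
            for k in range(dim)
        ]
    return updated


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


embeddings = propagate(graph, features)

# Score two candidate edges that are absent from the graph.
score_close = cosine(embeddings["ana"], embeddings["dee"])  # share neighbor "cal"
score_far = cosine(embeddings["ana"], embeddings["eli"])    # no shared neighbor
```

After one round, nodes that share a neighbor end up with overlapping vectors, so the candidate edge between "ana" and "dee" scores higher than the one between "ana" and "eli". A real graph neural network would stack several such rounds and learn the weights applied at each one.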
With self-supervised learning, a machine learning model is trained, using unlabeled data, on a proxy task that is related to its target task but not identical to it. Then it’s fine-tuned on labeled data. If the proxy task is well chosen, this can dramatically reduce the need for labeled data.
Self-supervised learning “was introduced in natural-language processing about three years back through this BERT model and some other masked language-modeling approaches,” Reddy explains. “It has now become kind of a mainstream topic in the data-mining community.”
BERT is a language model, meaning that it encodes the probabilities of different sequences of words in a particular language. It’s trained on unlabeled texts in which individual words have been randomly masked out, and its proxy task is to fill in the missing words.
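The fill-in-the-blank proxy task can be sketched in a few lines of Python. This is a simplified illustration of how masked training examples might be built from unlabeled text, not BERT's actual preprocessing; the function name, the `mask_prob` parameter, and the whitespace tokenization are assumptions made for the example.

```python
import random

MASK = "[MASK]"


def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Hide random tokens; return the masked sequence and the hidden answers.

    The answers map each masked position to its original token, which is
    what a model would be trained to predict.
    """
    if rng is None:
        rng = random.Random()
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


# Hypothetical usage on one unlabeled sentence, split on whitespace.
sentence = "graph neural networks represent nodes as vectors".split()
masked, targets = make_masked_example(sentence, mask_prob=0.3, rng=random.Random(0))
```

No human labeling is involved: the text itself supplies both the input (the masked sentence) and the answer key (the hidden words).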
“In graph neural networks, the analogy is that you remove an edge and you try to predict whether there was an edge or not,” Reddy explains. “Based on that, you can then use that information to learn the dependencies between the nodes.”
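The graph version of the proxy task can be set up in a similar way. The sketch below is one plausible construction, not a method taken from any particular paper: it hides a fraction of the edges, pairs them with an equal number of randomly chosen node pairs that were never connected, and returns the result as labeled examples. The function name, `hide_fraction`, and `seed` are hypothetical.

```python
import itertools
import random


def make_edge_prediction_task(edges, hide_fraction=0.2, seed=0):
    """Split an undirected graph into visible edges and labeled node pairs.

    Returns (visible_edges, examples), where each example is
    ((node_a, node_b), label): label 1 for a hidden true edge,
    label 0 for a pair that was never connected.
    """
    rng = random.Random(seed)
    # Normalize each undirected edge to a sorted tuple and drop duplicates.
    edges = sorted({tuple(sorted(edge)) for edge in edges})
    nodes = sorted({node for edge in edges for node in edge})
    existing = set(edges)

    rng.shuffle(edges)
    n_hidden = max(1, int(len(edges) * hide_fraction))
    hidden, visible = edges[:n_hidden], edges[n_hidden:]

    # Negative examples: node pairs with no edge in the original graph.
    non_edges = [
        pair for pair in itertools.combinations(nodes, 2) if pair not in existing
    ]
    negatives = rng.sample(non_edges, min(n_hidden, len(non_edges)))

    examples = [(pair, 1) for pair in hidden] + [(pair, 0) for pair in negatives]
    return visible, examples
```

A model would see only the visible edges and be trained to tell the hidden true edges from the never-connected pairs. As with masked words, the labels come from the data itself.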
But, Reddy explains, while the same basic BERT model has proved useful for a wide range of problems in natural-language processing (NLP), the ideal vector representation of a node in a knowledge network is very much dependent on the ultimate application. In part, this is because knowledge networks can have heterogeneous data types. A graph depicting online shoppers’ buying preferences, for example, could have nodes representing classes of products, nodes representing specific products, and nodes representing product features, such as battery capacity or fabric type.
“When you have a link prediction model, where you want to predict whether a link can be formed between these two nodes, you don't want to learn a single representation for a particular node,” Reddy explains. “If a person has to be recommended a book, the representation has to be different from the same person being recommended a movie. You would want a book representation that is different when it is being recommended to a group of people who are interested in this genre of books or if it's being recommended to a person who's interested in a different genre of books. In some sense you have to have a multiaspect or a multiview representation of this node.”
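One simple way to picture a multiview representation is a lookup keyed by both the node and the aspect, rather than by the node alone. The sketch below is only an illustration of that data structure: the person, the items, and every vector value are made up, and a real system would learn these vectors rather than hard-code them.

```python
# Hypothetical aspect-specific views of the same person node.
person_views = {
    ("pat", "book"): [0.9, 0.1],
    ("pat", "movie"): [0.2, 0.8],
}

# Hypothetical item vectors, grouped by the aspect they belong to.
item_vectors = {
    ("book", "mystery_novel"): [1.0, 0.0],
    ("book", "cookbook"): [0.0, 1.0],
    ("movie", "thriller"): [0.0, 1.0],
    ("movie", "documentary"): [1.0, 0.0],
}


def dot(u, v):
    """Dot product, used here as a simple link score."""
    return sum(a * b for a, b in zip(u, v))


def recommend(person, aspect):
    """Pick the best-scoring item using the person's view for that aspect."""
    view = person_views[(person, aspect)]
    scores = {
        name: dot(view, vector)
        for (kind, name), vector in item_vectors.items()
        if kind == aspect
    }
    return max(scores, key=scores.get)


book_pick = recommend("pat", "book")
movie_pick = recommend("pat", "movie")
```

With a single vector for "pat", the book and movie scores would be tied to the same point in space; separate views let the same node sit in different places depending on what is being recommended.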
In his own research, Reddy frequently works on knowledge discovery for health care, where the problem of data heterogeneity is particularly acute.
“Some of these lab values, for example, you are monitoring over time,” he explains. “The patient is admitted to an ICU, and blood pressure, blood work is done on a regular basis every 12 hours. So you have a time series data, which is sequential in nature. You have demographic data, which is static in nature. And then you have clinical notes, which are again sequential, but they’re not temporal, whereas in time series it is temporal. And you have image data in the form of x-rays and CT scans.”
“Now we have to come up with a deep-learning model that can leverage all these different forms of data. Health care is just one application, but you can think of so many other applications where leveraging such multimodal data is becoming an important problem. In real-world data, you don't just see data in one particular form. You have multiple heterogeneous forms of data that are collected about any particular entity.”
Self-supervised learning is, fundamentally, a technique for doing machine learning more efficiently: labeling data is inefficient, and leveraging unlabeled data reduces dependence on labeled data. In addition to serving as a Senior Program Committee member at KDD, Reddy is also one of the organizers of the conference’s Workshop on Data-Efficient Machine Learning, together with Amazon’s Nikhil Rao and Sumeet Khatariya.
“People talk a lot about domain adaptation in the presence of limited data,” Reddy says. “There are different topics related to it like few-shot or zero-shot learning, transfer learning, meta-learning, multitask learning, et cetera. Some people talk about out-of-domain distribution. There are several concepts that try to achieve data-efficient learning in real-world applications. We wanted to have all these discussions in a more coherent manner in this workshop, so we can share knowledge, we can see what works, what doesn't. We tried to bring people from different communities so that they can learn both success and failure stories of different approaches in various domains.
“Some of these graph papers that were published last year are basically inspired by a simple technique that was borrowed from the NLP and computer vision communities. We are trying to see if we can share more recent trends and knowledge from these domains.”