3 questions with George Karypis: Making learning from data embedded in graphs easy and scalable

Karypis is a featured speaker at the first virtual Amazon Web Services Machine Learning Summit on June 2.

By Staff writer

May 31, 2021

The first Amazon Web Services (AWS) Machine Learning Summit on June 2 will bring together customers, developers, and the science community to learn about advances in the practice of machine learning (ML). The event, which is free to attend, will feature four audience-focused tracks, including Science of Machine Learning.

The science track is focused on the data science and advanced practitioner audience and will highlight the work AWS and Amazon scientists are doing to advance machine learning. The track will comprise six sessions, each lasting 30 minutes, and a 45-minute fireside chat.

Amazon Science is featuring interviews with speakers from the Science of Machine Learning track.

For the sixth and final edition of the series, we spoke to George Karypis, a Distinguished McKnight University Professor at the University of Minnesota Twin Cities and an AWS senior principal scientist. Throughout his career, Karypis has focused his research on developing novel algorithms in data mining, recommender systems, learning analytics, high-performance computing, information retrieval, and bioinformatics. Karypis is also an Amazon Scholar, one of a group of academics who work on large-scale technical challenges while continuing to teach and conduct research at their universities.

Q. What is the subject of your talk going to be at the ML Summit?

I am going to talk about how deep graph neural networks (GNNs) are used to develop machine learning models that solve problems in domains where the underlying data is represented as graphs.

The adoption of GNNs has exploded in recent years, as data scientists move from developing deep learning models for two-dimensional signals, such as images, and three-dimensional signals, such as video, to learning from structured and related data embedded in graphs.

At Amazon, my team has been working on the Deep Graph Library (DGL), an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is a framework that allows developers to program graph neural networks, and it supplements existing tensor-based frameworks such as TensorFlow, PyTorch, and MXNet. In my talk, I’ll explain how DGL can make the development, training, and use of GNNs easy and scalable.
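For readers new to GNNs, the sketch below shows, in plain Python, the kind of neighborhood "message passing" step that a framework like DGL lets developers express and scale. It is not DGL's API and not the author's method: the function `mean_aggregate_layer`, the toy graph, and the identity weights are hypothetical, chosen only for illustration. Each node averages the feature vectors of its in-neighbors, combines that average with its own features through two weight matrices, and applies a ReLU, which resembles the mean-aggregator variant of GraphSAGE.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix w (out_dim x in_dim) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def mean_aggregate_layer(
    edges: List[Tuple[int, int]],
    feats: Dict[int, Vector],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[int, Vector]:
    """Hypothetical single GNN layer (illustration only, not DGL code):
    h_v' = ReLU(W_self h_v + W_neigh * mean of h_u over edges u -> v)."""
    # Collect the source node of every edge that points at each destination.
    in_neighbors: Dict[int, List[int]] = {v: [] for v in feats}
    for src, dst in edges:
        in_neighbors[dst].append(src)

    dim = len(next(iter(feats.values())))
    out: Dict[int, Vector] = {}
    for v, h_v in feats.items():
        nbrs = in_neighbors[v]
        if nbrs:
            # Message passing: average the features sent by in-neighbors.
            agg = [sum(feats[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            # A node with no in-neighbors receives a zero message.
            agg = [0.0] * dim
        self_part = matvec(w_self, h_v)
        neigh_part = matvec(w_neigh, agg)
        out[v] = relu([a + b for a, b in zip(self_part, neigh_part)])
    return out


# Toy example: three users, where edge (u, v) means u sends a message to v.
edges = [(0, 1), (2, 1), (1, 0)]
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mean_aggregate_layer(edges, feats, identity, identity))
```

Stacking several such layers lets information flow across multi-hop neighborhoods. On graphs with millions of nodes, a Python loop like this one becomes far too slow, which is the kind of problem that graph-aware frameworks address with batched, sparse, and distributed execution.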

Q. Why is this topic especially relevant within the science community today?

GNNs are used in a number of fields today. For example, they play an increasingly important role in social networks, where graphs show connections among related people.

Organizations such as Marinus Analytics are using GNNs to help victims of human trafficking. In the medical field, GNNs are helping researchers find candidate drugs for new diseases. At Amazon, they are used to develop recommender systems, build mechanisms for fraud and abuse detection, and develop Alexa's conversational capabilities, among other applications.

There are a variety of problems for which we can develop accurate machine learning solutions by leveraging the linked data that is inherent to graphs.

GNNs are increasingly delivering state-of-the-art results in graph learning across a wide range of domains. They are now among the most actively researched topics in deep learning, even as we transition from academic and industrial research to actively powering products and services.

Q. What are some breakthroughs in the world of GNN development that are particularly exciting to you?

I’m excited about the development of both new algorithms and infrastructure that will help us scale the training of GNNs.

This is particularly important for deploying GNNs in real-world applications, where graphs can be very large. To give just one example, a social network can contain millions of nodes (users) and billions of links.

For tackling large graphs, the development of large-scale benchmarks allows us to zero in on viable architectures and validate approaches that can scale to more complex datasets.

In addition, the development of hardware accelerators for GNN workloads is going to be instrumental in allowing us to harness the power of machine learning to solve problems with graph-structured inputs.

(You can learn more about George Karypis’s research and watch his free talk at the virtual AWS Machine Learning Summit on June 2 by registering for the event.)