3 questions with Ryan Tibshirani: The science behind COVIDcast and pandemic tracking

Tibshirani is a featured speaker at the first virtual Amazon Web Services Machine Learning Summit on June 2.

By Staff writer

May 18, 2021

The first Amazon Web Services (AWS) Machine Learning Summit on June 2 will bring together customers, developers, and the science community to learn about advances in the practice of machine learning (ML). The event, which is free to attend, will feature four audience-focused tracks, including Science of Machine Learning.

The science track is focused on the data science and advanced practitioner audience, and will highlight the work AWS and Amazon scientists are doing to advance machine learning. The track will comprise six sessions, each lasting 30 minutes, and a 45-minute fireside chat.

Amazon Science is featuring interviews with speakers from the Science of Machine Learning track. For the fourth edition of the series, we spoke to Ryan Tibshirani, associate professor of statistics and machine learning at Carnegie Mellon University (CMU), and an Amazon Scholar.

Tibshirani is a co-principal investigator (co-PI) of the CMU Delphi Group. In March 2020, the Delphi Group launched an effort called COVIDcast, a large repository of richly detailed, real-time indicators of COVID activity in the U.S. The American Statistical Association recently presented the 2021 Statistical Partnerships Among Academe, Industry, and Government (SPAIG) Award to Tibshirani, the Delphi Group team, and their COVIDcast partners. The team was recognized for its “commitment to the theory and practice of epidemic tracking and forecasting through building and modeling unique public health data streams.”

Q. What is the subject of your talk going to be at the ML Summit?

I will talk about the digital ecosystem that we are developing for epidemic tracking and forecasting at the Delphi Group.

[Video: Ryan Tibshirani, "Epidemiological Forecasting Tools for COVID-19"]

Before the pandemic, the group’s work focused on seasonal influenza. Over the years, we've supported the Influenza Division of the Centers for Disease Control and Prevention (CDC) in advancing and growing a scientific community around flu forecasting. When the pandemic hit, everything changed. We pivoted to focus on COVID.

Since March 2020, our main area of focus has been developing and maintaining COVIDcast, which is the nation's largest repository of diverse, geographically detailed, real-time indicators of COVID activity.

Our work on COVIDcast covers the entire data pipeline behind these key indicators. Several of the underlying data sources on which the indicators are built would not exist, or for that matter be publicly available, without our efforts. The indicators are also used to power our nowcasting and forecasting models.

Today, COVIDcast is used regularly by public health officials, government agencies, journalists, healthcare companies, financial firms, and fellow modelers. We also work closely with the CDC on COVID forecasting. That collaboration has produced an ensemble model built from the various submissions the agency receives through the COVID Forecast Hub. In my talk, I will discuss COVIDcast and COVID forecasting, and the important lessons we have learned over the last year.

Q. Why is this topic especially relevant within the science community today?

The obvious answer to this question: we are in the midst of a global pandemic. The World Health Organization said last year that a global pandemic required a world effort to end it. Each and every one of us should be paying attention to the fight against COVID, and helping as best we can. This is especially true of scientists: not just virologists and the like, but also computational scientists, who can play a hugely important role as well.

This pandemic has produced some of the richest data we have ever seen in terms of infectious disease tracking. The data is of a far higher resolution than anything that we had before for the seasonal flu. That is not to say that the data is perfect. On the contrary, it can be hugely challenging to deal with, and bridging the gap between the challenges and the opportunities will be a major theme of my talk.

Beyond COVID, I want to promote epidemic tracking and forecasting as key areas of research for the scientific community — even for scientists working in areas outside epidemic tracking and forecasting.

Q. What does the future of epidemic tracking and forecasting look like?

The future is bright. Never before has the global community paid this much attention to epidemic tracking and forecasting. The gears appear to be in motion to set up more substantial and permanent structures that will help us deal with epidemics and pandemics, and mitigate their effects on public health as much as possible, in the years to come. After all, in the event of an emergency such as a pandemic, you want to be able to use systems that have already been in place for many years. You don’t want to rely on systems that have been set up on the fly.

At Delphi, our long-term mission has always been to make epidemic forecasting as widely accepted and useful as weather forecasting is today. Our work over the next several years will broaden in scope, stretching beyond COVID and influenza to other fast-moving epidemics that could place a significant burden on the public health system.

(You can learn more about Ryan Tibshirani’s research here and watch his free talk at the virtual AWS Machine Learning Summit on June 2 by registering at the link below.)