Machine learning helps unlock data related to COVID-19 epidemic
AWS’s Cord-19 Search website allows researchers to easily query tens of thousands of scientific and medical papers on the deadly virus.
COVID-19 has taken a devastating toll. According to the World Health Organization, by late May, COVID-19 had infected more than 4.9 million people worldwide, with more than 320,000 lives lost.
As the world grapples with this disease, researchers and scientists have launched a determined effort to understand COVID-19 and find effective vaccines and treatments. Since the virus was first identified in late 2019, a huge amount of cutting-edge research on ways to fight COVID-19 has been published, with more appearing every day.
But the tsunami of potentially life-saving information is coming so fast that researchers can’t keep up with it.
“It’s amazing to see the amount of data being shared,” says Taha Kass-Hout, director of machine learning and chief medical officer for AWS. “We started with 10,000 or 15,000 papers earlier in the year, and that has tripled.”
The result: important clues or research paths could be overlooked as scientists search for the signal through the noise.
To address the challenge, Amazon Web Services in late April released Cord-19 Search, a new website powered by machine learning that helps researchers search tens of thousands of research papers and documents quickly and easily using natural language questions. Cord-19 Search combs the data set released by the Allen Institute for AI, and is an outgrowth of a White House remote roundtable held in March with technology company representatives.
Cord-19 Search helps researchers navigate the fast-growing body of coronavirus literature to efficiently find relevant and up-to-date information. Cord-19 Search provides a simple interface where researchers can ask questions using natural language such as, “When is the salivary viral load highest for Covid-19?” and “Is convalescent plasma therapy a precursor to vaccine?” Cord-19 Search produces precise answers as well as source documents.
For example, the answer to the viral load question reads: “Salivary viral load was highest during the first week after symptom onset and subsequently declined with time.” The response to the plasma therapy query reads: “In the absence of vaccine would provide a stopgap measure, ideally consider to give to those who are at risk of exposure or early in showing symptoms (as a preparedness measure)” and is accompanied by related scientific articles from past trials during SARS and Ebola.
Cord-19 Search also provides evidence-based topics on incubation, transmission, therapeutics, and risk factors. This is of enormous value to scientists, who can quickly query the literature, validate their research, and advance their investigations.
“One of the great things about how Cord-19 works under the covers is that it enhances the data set in response to a query depending on how you want to slice and dice the data,” says Ben Snively, an AWS principal solutions data science architect. “It doesn’t just attach a keyword to a bunch of documents.”
Cord-19 Search is built on AWS machine learning services. Its original data set was enriched with Amazon Comprehend Medical, which uses machine learning to extract information from unstructured text, including diseases, treatments, and timelines. The data is then mapped to clinical models and medical topics associated with COVID-19.
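To make the enrichment step concrete, here is a minimal sketch of entity extraction with Amazon Comprehend Medical through the boto3 SDK. It is not the Cord-19 Search code, which the article does not show. The function name `extract_entities` and the `min_score` threshold are hypothetical; `detect_entities_v2` is the real boto3 call, and the client is passed in so the logic can be tried against a stub without AWS credentials.

```python
from collections import defaultdict


def extract_entities(text, client, min_score=0.8):
    """Group the entities Comprehend Medical finds in `text` by category.

    `client` is expected to behave like boto3.client("comprehendmedical").
    `min_score` is an illustrative confidence cutoff, not a documented default.
    """
    response = client.detect_entities_v2(Text=text)
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        # Each entity carries a category such as MEDICAL_CONDITION,
        # MEDICATION, or TIME_EXPRESSION, plus a confidence score.
        if entity.get("Score", 0.0) >= min_score:
            grouped[entity["Category"]].append(entity["Text"])
    return dict(grouped)


# With real credentials and a configured region, usage might look like:
#   import boto3
#   client = boto3.client("comprehendmedical")
#   extract_entities("Salivary viral load was highest during the "
#                    "first week after symptom onset.", client)
```

The grouped output is the kind of structured metadata that could then be attached to each paper before indexing; how Cord-19 Search maps it to clinical models is not described in detail here.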
That information is then indexed in Amazon Kendra, a highly accurate enterprise-search service powered by machine learning. Kendra's natural-language query capabilities make it easier to find and rank related articles. Both the Amazon Comprehend Medical-enriched data and the Amazon Kendra search index are built from data available in the public AWS COVID-19 data lake, where anyone can experiment with and analyze data.
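A natural-language question against a Kendra index can be sketched along the following lines. This is an illustration under stated assumptions, not the Cord-19 Search implementation: the function name `ask`, the `max_results` parameter, and the index ID are hypothetical, while `query` with `IndexId` and `QueryText` is the real boto3 Kendra call. As above, the client is injected so the code can be exercised with a stub.

```python
def ask(question, client, index_id, max_results=3):
    """Send a natural-language question to a Kendra index.

    `client` is expected to behave like boto3.client("kendra").
    `index_id` is the ID of an existing index (a placeholder in this sketch).
    Returns a list of small dicts with the fields most useful to a reader.
    """
    response = client.query(IndexId=index_id, QueryText=question)
    results = []
    for item in response.get("ResultItems", [])[:max_results]:
        results.append({
            # Kendra labels each hit, e.g. ANSWER for an extracted passage
            # or DOCUMENT for a ranked source document.
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return results


# With real credentials, usage might look like:
#   import boto3
#   kendra = boto3.client("kendra")
#   ask("When is the salivary viral load highest for Covid-19?",
#       kendra, index_id="<your-index-id>")
```

Returning both the result type and the source URI mirrors the behavior described earlier, where a question yields a direct answer together with its source documents.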
“We think that Cord-19 will really help researchers connect the dots and make real progress against the virus,” says Kass-Hout. “The combination of Kendra and Comprehend Medical helps researchers advance their understanding of Covid-19 and how they might discover a drug or vaccine to fight it. That’s not just like finding a needle in a haystack – it’s like finding a needle at the bottom of the ocean.”
Adds Snively: “From my perspective, Cord-19 Search provides very complete answers to research questions with a very simple interface. Most clinicians and researchers are not deep technologists, but they want to be able to dive deeply into things. Cord-19 Search gives them that opportunity.”
By mid-May, Cord-19 Search had answered thousands of questions.
COVID-19 isn’t going away anytime soon, of course. Researchers expect that we may see a second wave of the virus, and perhaps even a third. So, work on Cord-19 will continue. AWS’s long-term vision is to expand the Cord-19 Search architecture to incorporate even more data resources. This will allow researchers to uncover patterns of disease progression, make data-driven decisions, and help improve patient outcomes.
“The evidence is going to continue evolving over the coming months,” says Kass-Hout. “We’re in this fight against coronavirus for the long term.”