AI-Powered Coronavirus Literature Database Set Up at Stanford

Researchers report the establishment of CoronaCentral, a daily-updated resource that uses machine learning to process the research literature on SARS-CoV-2, SARS-CoV, and MERS-CoV. The system sorts papers by article type and by topic, such as therapeutics, disease forecasting, and long COVID. Its aim is to help the scientific community and science communicators keep up with the rapidly growing coronavirus literature.

A spokesperson for the project stated: "As of 3 March 2021, CoronaCentral covers 128,921 papers. The top topic, Clinical Reports, covers articles describing patients and their symptoms, including case reports."

"The second top topic, the Effect on Medical Specialties, covers how specific specialties (e.g., oncology) must adapt to the pandemic. While other approaches have focused on viral biology, we have made a specific effort to also identify papers that discuss societal impacts, including the psychological aspects, the inequality highlighted by the pandemic, and the long-term effects of COVID."

"This final topic, also known as 'long COVID,' is covered by the Long Haul topic, which currently includes 362 papers. We find the first Long Haul COVID papers appeared in April 2020, and there has been a slow, steady increase in publications since then, with ∼30 papers per month recently. While all of the annotated Long Haul documents used to train our system focus on SARS-CoV-2, our system finds 12 papers for the long-term consequences of SARS-CoV and one for MERS-CoV."

"Our approach also identifies the article type, which is important, given our estimate that 24.7% of publications are comments or editorials and not original research."

This project has been supported by the Chan Zuckerberg Biohub and by a National Library of Medicine grant.



