Snowflake Powers Largest Anticipated Coronavirus Data Set for Health

Author: Todd Crosslin


A coalition of leading healthcare companies has joined forces, and will pool its data, to create what could be the largest healthcare data repository ever. The COVID-19 Research Database will be made available for free to public health and policy researchers. With this much data in one location, health professionals will be able to uncover insights that were not possible before to help combat the global COVID-19 pandemic and, hopefully, other serious afflictions.

More than 30 healthcare and technology companies are behind the effort to deliver a highly secure repository of anonymized and HIPAA-compliant data. The group includes Advarra, Aetion, AnalyticsIQ, Arcadia.io, Berkeley Research Group, BHE, Change Healthcare, Datavant, Elsevier, Glooko, Health Care Cost Institute, Healthjump, Helix, Medidata (a Dassault Systèmes company), Mirador Analytics, Munich Re Life US, Office Ally, OMNY, Parexel, Prognos Health, QIAGEN, SAS, Snowflake, Sumitomo Dainippon Pharma, Symphony Health, Veradigm, and Verana Health.

The COVID-19 Research Database will comprise anonymized longitudinal data on medical claims, pharmacy claims, electronic healthcare records, laboratory data, and demographic data, with more sources to be added as additional healthcare companies contribute their data. Researchers will be able to draw on this data to help understand and combat the COVID-19 virus and others like it.

Specifically, they’ll be able to study the effect of certain drugs on the coronavirus. They’ll also be able to determine which demographic factors and pre-existing conditions are associated with patients who required a ventilator or who died from the virus. In addition, they’ll be able to look at the impact that quarantine and social distancing measures in different geographic areas have had on containing COVID-19.

“The first challenge many researchers have run into with this crisis is the difficulty of accessing high-quality health data that can be used to answer pressing questions such as drug and non-drug treatment effects, factors that drive differential risk of catching the disease and very different outcomes in those who do,” said Dr. Mark Cullen, chair of the coalition’s scientific steering committee and Professor of Medicine at Stanford University.

Researchers will access the COVID-19 Research Database via Snowflake Cloud Data Platform, so they can easily and securely conduct large-scale research. All sensitive data that could identify someone is anonymized within Snowflake. A number of the healthcare companies involved already use Snowflake Cloud Data Platform to store, integrate, analyze, and securely share live data in real time with partner organizations, and some make anonymized versions of this data available via Snowflake Data Marketplace. The database will be available for free to researchers in a highly governed and secure way that promotes data privacy and compliance with data regulations. Researchers can register to request research access to the COVID-19 Research Database here.

The combination of the Snowflake Cloud Data Platform and Snowflake Data Marketplace gives researchers a one-stop shop for accessing and analyzing this data. More importantly, the data will be analytics-ready, which removes the complex work of assembling these data sets, loading them into one repository, and integrating them with other data sets available on the marketplace or with data from the researcher’s own organization. All of this has been made possible by the coalition of healthcare and technology companies that have partnered to provide this data.

This effort, together with the COVID Alliance, which helps municipalities and state governments in the U.S. leverage data to navigate their steady return to normalcy, and the Starschema COVID-19 data set, which is available free of charge on the Snowflake Data Marketplace, demonstrates the power of data and, more importantly, the power of live data shared between organizations: insights that historically took months or years to deliver now take hours or days.