Data for Humanity
Apr 28, 2020
Author: Benoit Dageville
I considered other titles for this blog post. At first, “Data for Good” seemed appropriate. But COVID-19 has affected nearly all of humanity. It has killed more than 200,000 people, infected more than three million, and disrupted the lives of billions, both emotionally and financially. We’re in new territory, not as individual nations or continents but as all of humanity. At the center of everyone’s efforts is data. Snowflake has been busy enabling solutions and free data sets to help fight the coronavirus and prevent new outbreaks as communities around the world begin to relax social distancing measures.
The data challenges we must overcome
Thousands of organizations, including governments, healthcare providers, and businesses, are asking the same questions as the spread of the virus slows: Which social distancing measures should we relax, when, and how quickly? How do we prevent future outbreaks? If one does occur, what resources will a local area need to contain it?
Data will answer these questions, but it also raises new ones for these organizations: How difficult will it be to acquire this data? Is it analytics-ready? How often is it updated? How much will it cost? Is there one place where we can find this data and acquire new data sets as they emerge? Can we easily combine it with our own data to reveal insights that were previously unavailable to us?
Data providers and data analytics service providers continue to step up. They are offering a wide range of solutions and data sets covering infection rates, population densities, the impact of social distancing measures, and even weather patterns. Every day, new data sets become available for free to help keep society safe in the months and years after we bring COVID-19, and viruses like it, under control.
But these data and solution providers have questions of their own: How do we give consumers access to these data sets, and quickly? What data security measures do we need to take? What about data governance and data privacy? How much information can we share? Which parts can’t we share, and how do we keep them protected?
Snowflake Cloud Data Platform has emerged as the platform of choice to build these solutions and store these data sets. Snowflake Data Marketplace has emerged as the marketplace to share these data sets for free. Together, our platform and our marketplace are the ideal combination to load, store, integrate, and securely share any amount or type of data to prevent future outbreaks of coronavirus as communities around the world relax social distancing policies. Read on to learn who’s involved and how it all works.
The data analytics problem Snowflake solves
When my co-founders and I started Snowflake in 2012, our vision was to make more data more easily accessible for analysis. We committed to creating an architecture and technology that would disrupt the data analytics industry, which is now four decades old. What emerged was Snowflake Cloud Data Platform, which enables any organization to analyze data much faster than other solutions. We also made our platform global, so organizations can seamlessly and securely work with and share data across regions and cloud providers. Today, Snowflake spans about 20 cloud regions worldwide on the three major cloud infrastructure providers: AWS, Microsoft Azure, and Google Cloud. Data can be easily replicated to any of those regions, regardless of the region or cloud where it is currently located. What inspired us was the impact this could have on business, and on healthcare, science, and other endeavors that serve humanity. More than 4,000 organizations around the world rely on Snowflake Cloud Data Platform, and that number continues to grow.
Snowflake Data Marketplace emerged from the work we did on Snowflake Cloud Data Platform. It is where data providers offer data stored in our platform to data consumers. But we wanted our marketplace to be as revolutionary as our platform, so we made live, governed, secure, and instant data sharing its foundation. Data providers share access to read-only views of the data sets they list on the marketplace. This means data doesn’t have to be copied or moved for consumers to access it. The data is always live, so consumers see the provider’s updates immediately. For COVID-19, Snowflake Data Marketplace enables data consumers to use any of these data sets and even combine them with their own data to gain insights that were previously out of reach. Throughout, Snowflake’s data security, governance, and privacy features help data providers and consumers comply with industry and regional data regulations.
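Snowflake implements sharing inside its platform and exposes it through SQL; the sketch below is not that API. It is a minimal Python analogy, assuming an in-memory dictionary stands in for a provider’s table, of the two properties described above: the consumer reads through a read-only view rather than a copy, so the provider’s updates show up immediately, and the consumer can combine that view with its own data. The names ProviderTable, share, and join_with_own, and all values, are hypothetical.

```python
from __future__ import annotations

from types import MappingProxyType


class ProviderTable:
    """Hypothetical provider-side table of case counts keyed by region."""

    def __init__(self) -> None:
        self._rows: dict[str, int] = {}

    def upsert(self, region: str, cases: int) -> None:
        # The provider updates its own data in place.
        self._rows[region] = cases

    def share(self) -> MappingProxyType:
        # Hand out a read-only proxy over the same dict: nothing is copied,
        # and later upserts are visible through the proxy immediately.
        return MappingProxyType(self._rows)


def join_with_own(shared_view, own_data: dict[str, int]) -> dict[str, tuple[int, int]]:
    # Consumer-side "join": pair shared rows with the consumer's own rows by region.
    return {r: (shared_view[r], own_data[r]) for r in shared_view if r in own_data}


provider = ProviderTable()
provider.upsert("region_a", 100)  # made-up numbers for illustration
shared = provider.share()

hospital_beds = {"region_a": 40, "region_b": 25}  # consumer's own (made-up) data
print(join_with_own(shared, hospital_beds))  # {'region_a': (100, 40)}

provider.upsert("region_b", 30)  # provider publishes an update
print(join_with_own(shared, hospital_beds))  # {'region_a': (100, 40), 'region_b': (30, 25)}

try:
    shared["region_c"] = 5  # consumers cannot write to the shared view
except TypeError:
    print("read-only")
```

The second join picks up region_b without the consumer fetching anything new, because the view and the provider’s table are the same underlying object.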
How Snowflake is helping to stop another outbreak of coronavirus
Snowflake is a COVID Alliance partner, supplying Snowflake Cloud Data Platform as the foundation for tools the Alliance is building to help governments and healthcare organizations predict, detect, and contain future outbreaks. Our platform and marketplace let these organizations assemble data sets and applications, build tools, and deliver them in days rather than weeks or months, so they can have an impact now. They also connect to one another’s data and tools through Snowflake Data Marketplace, making their solutions stronger than any of them could build alone.
In addition to what Snowflake is enabling for the COVID Alliance, a similar solution is rolling out in the European Union (EU). Snowflake customer Keboola has built its Smart Quarantine solution on Snowflake Cloud Data Platform to help countries gradually relax quarantine measures and restart their economies while avoiding a resurgence of the epidemic. The Czech and Slovak republics have already deployed the solution. Keboola has partnered with the global technology services provider Capgemini to help deploy it in other EU countries.
Snowflake is also part of a coalition of more than 30 healthcare and technology companies working to deliver a highly secure repository of anonymized, HIPAA-compliant data. The COVID-19 Research Database includes anonymized longitudinal data on medical claims, pharmacy claims, electronic health records, laboratory results, demographics, and other sources, with more added as additional healthcare companies contribute their data. Researchers will also have access to tools for analyzing this data set to better understand COVID-19 and similar diseases. With this much data in one location, health professionals will be able to uncover insights that weren’t possible before, helping combat the COVID-19 pandemic and, we hope, other serious diseases that may arise in the future.
In addition, Snowflake and its partners have made several analytics-ready COVID-19 data sets and dashboards available free of charge on Snowflake Data Marketplace. They include:
- Starschema COVID-19 – Anonymized epidemiological data, population densities, and geolocation data collected from multiple sources and combined into a single, analytics-ready data set. More than 2,000 organizations have already requested access to this data set.
- COVID-19 Weather Data Set – Weather Source combines hourly global weather data with epidemiological data to determine the climates in which COVID-19 is more or less active in local areas.
Some of the other COVID-19 data sets powered by Snowflake include:
- COVID-19 Critical Risk Index – Created by Carrot Health from more than 100 data sources that provide anonymized health, lifestyle, and other data about more than 260 million adults living in the U.S. The data set is designed to assess the risk to local communities, and the medical care they would need, if an outbreak does emerge.
- AirDNA – This Denver-based data and analytics services company tracks the daily activity of more than 10 million short-term rentals across more than 800,000 rental markets worldwide. Its anonymized data set helps determine movement to and from rural, suburban, and urban areas that are experiencing a decline or rise in COVID-19 cases.
Protecting humanity, protecting data
To help contain COVID-19 now and in the future, we need easily accessible data and tools that will have an immediate impact at local, state, national, and global levels. We also know that protecting the data required to achieve this goal is equally important. That is why all anonymized COVID-19 data sets stored on Snowflake Cloud Data Platform and listed on Snowflake Data Marketplace are managed by the organizations that compile them, together with the third-party partners that review these initiatives for data accuracy, consent, anonymity, governance, and transience. Technology companies should not own this data. Our role is to enable these solutions and make the data sets and tools readily available.
Fully defeating COVID-19 will take much more than data. But until we have a vaccine, data will remain at the center of this effort. Just as important, these data sets and tools will help us fight the next potential outbreak. Snowflake is very proud to play its part in this fight alongside so many others. Let’s not forget this is humanity’s fight. Stay safe!