The team at Starschema, a data services and technology company, believes data has the power to change the world. When the COVID-19 pandemic hit the world last March, Starschema felt an obligation to act. The company quickly responded by packaging and sharing data to help inform better decision-making. The Starschema team created a robust COVID-19 epidemiological data set and made it available, for free, to thousands of companies via Snowflake Data Marketplace.
The data set has served as a single source of truth for COVID-19 incidence and mortality data throughout the pandemic, helping organizations build contingency plans and make more-informed decisions in response to the global health crisis. To date, hundreds of organizations have leveraged the data in Snowflake Data Marketplace, including healthcare companies, retailers, and even financial services providers. For example, Harvard Business Review reports that Capital One uses Starschema’s data to forecast and plan response scenarios for its workforce and customers.1
“Everyone is dealing with the effects of COVID-19 in one way or another,” Tamas Foldi, CEO at Starschema, told Business Wire when the data was first made available.2 “Our goal is to deliver the highest quality data, sound enough to stake lives on, with the utmost transparency.”
Taking a closer look at what made this data set so powerful, a few best practices emerge.
Put your data where users are
Instead of housing the data exclusively on its website and trying to bring potential users to it, Starschema also made it immediately available to all Snowflake customers through Snowflake Data Marketplace, which spurred quick adoption. Because the Starschema data set was available with one-click simplicity on Snowflake Data Marketplace, our customers could begin using it right away, greatly reducing time to value.
Remove barriers to access
Given the critical importance of this data set to organizations across industries and geographies, Starschema made it available to Snowflake customers for free. While not every business can afford to build and maintain such a data set, lowering the cost of access encourages more organizations to use the data. Additionally, because the data is available on Snowflake Data Marketplace, Snowflake customers save on data integration costs. No ETL is required. Instead, the live, ready-to-query data is always at their fingertips.
Keep it fresh
With COVID-19, every day tells a new story. The Starschema team works tirelessly to ensure its data set always includes the highest quality, most-recent data. In this way, they empower users to leverage it for real-time decision-making. With Snowflake Data Marketplace, Snowflake customers always have access to the most-current data.
From day one, the Starschema team has used a public GitHub repository to house and document any data transformations. The team is transparent about where data originated and when it was last refreshed, and they are careful to include any caveats. “When it comes to data that will support critical decisions, we cannot simply offer a black box,” said Chris von Csefalvay, Starschema’s VP of Special Projects. “We need to show every step of the way for how we arrive at the numbers.”
By keeping the data up to date and making its origins transparent, Starschema has built trust with users, encouraging greater usage over time.
Bust data silos
The COVID-19 data set combines information from a variety of sources. These contributing data sets typically exist in a wide variety of formats, each with its own structure and fields, making them difficult to integrate and normalize. By partnering with Snowflake, Starschema broke down silos and brought these data sets together, increasing their value to users.
The Starschema team leveraged automation to reduce errors and make processing more reliable. By writing a script once to normalize and unify the data, the team can update the data set automatically. “Without automation and a streamlined execution model, we could not have sustained the growing workload coming from an increasing number of data sets,” von Csefalvay explained.
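To make the idea of normalizing and unifying sources concrete, here is a minimal Python sketch. It is not Starschema's actual script: the two source layouts, the field names, the place names, and the sample values are all invented for illustration. It assumes each source gets one small normalizer that maps its rows onto a shared schema, so adding a source means adding one function rather than changing the pipeline.

```python
from datetime import date, datetime

# Hypothetical raw rows from two sources with different field names,
# types, and date formats. All values are made up for illustration.
SOURCE_A = [{"Country": " Samplestan ", "Date": "2020-03-15", "Confirmed": "10", "Deaths": "1"}]
SOURCE_B = [{"country_name": "Examplia", "report_dt": "15/03/2020", "cases": 20, "deaths": 2}]


def normalize_a(row):
    """Map a source A row (ISO dates, string counts) onto the shared schema."""
    return {
        "country": row["Country"].strip(),
        "date": date.fromisoformat(row["Date"]),
        "cases": int(row["Confirmed"]),
        "deaths": int(row["Deaths"]),
        "source": "source_a",
    }


def normalize_b(row):
    """Map a source B row (day/month/year dates, integer counts) onto the shared schema."""
    return {
        "country": row["country_name"].strip(),
        "date": datetime.strptime(row["report_dt"], "%d/%m/%Y").date(),
        "cases": int(row["cases"]),
        "deaths": int(row["deaths"]),
        "source": "source_b",
    }


def unify(sources):
    """Apply each source's normalizer and return all rows sorted by date, then country."""
    unified = []
    for normalizer, rows in sources:
        unified.extend(normalizer(row) for row in rows)
    return sorted(unified, key=lambda r: (r["date"], r["country"]))


if __name__ == "__main__":
    for record in unify([(normalize_a, SOURCE_A), (normalize_b, SOURCE_B)]):
        print(record)
```

Each record keeps a `source` field so that a reader of the unified table can still trace a number back to where it came from.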
The Starschema team continuously maintains the data set, which includes monitoring it for issues, fixing breaks, and replacing data sources if one stops publishing data or a better source becomes available—while always letting users know what’s changed.
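One part of this kind of maintenance, noticing when a source stops publishing, can be sketched in a few lines of Python. This is an illustrative assumption, not a description of Starschema's monitoring: the function name `stale_sources`, the source names, and the two-day threshold are hypothetical.

```python
from datetime import date, timedelta


def stale_sources(last_published, today, max_age=timedelta(days=2)):
    """Return the names of sources whose latest publication is older than max_age."""
    return sorted(
        name for name, last in last_published.items() if today - last > max_age
    )


if __name__ == "__main__":
    # Hypothetical last-publication dates per source.
    last_seen = {"source_a": date(2021, 1, 10), "source_b": date(2021, 1, 5)}
    print(stale_sources(last_seen, today=date(2021, 1, 10)))
```

A check like this could run on a schedule and flag sources that need attention or replacement.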
Listen to your users
Starschema understands that the most valuable data sets are not static; rather, they should grow and evolve over time as use cases and user needs change. The team actively solicits feedback on its data set from users, asking how they leverage the data, what data types would be useful to add, and more. In response to user feedback, the team has added more inputs over time, most recently COVID-19 vaccination rates.
Starschema takes inbound feedback seriously, responding to the majority of specific customer requests for data sets, clarifications, or bug fixes within 24 hours. By involving users in the data set’s evolution, Starschema has turned them into collaborative partners.
“Good things happen when people share data, and few things confirm this more than this project,” von Csefalvay said.