Isaac Cabrera, State of California Geographic Information Officer and Manager of Data and Geospatial Services at the California Department of Technology (CDT), met with a Snowflake Public Sector User Group in February 2021 to share how his organization relies on Snowflake in its role as the guardian of public data during the COVID-19 pandemic.
CDT is a leader in information technology services and solutions and oversees all aspects of technology across the state. As the manager of data and geospatial services, Cabrera works on critical cross-state agency crises such as wildfires, earthquakes, health emergencies, and more. In addition, he represents the state in all geographic information system (GIS) matters, helping to bridge the gap between the state, local governments, and the public and private sectors.
In early January 2020, before Cabrera joined, the CDT team began exploring the idea of building a virtual data warehouse. When the COVID-19 pandemic hit shortly thereafter, it became apparent that California needed a centralized and authoritative location from which to share COVID-19 data. It was a timely coincidence that the team had just completed a Snowflake proof of concept and was impressed by the result.
As the pandemic grew, it was helpful that Snowflake was already up and running with the needed tooling and unlimited cloud storage space. Snowflake met the state’s requirements for securely aggregating and storing data about positive COVID-19 cases, deaths, and testing, as well as California Hospital Association data such as the numbers of hospital beds available in the state.
Snowflake also integrated easily with Esri GIS software and Tableau, CDT’s chosen dashboard and analytics solutions. The data for the dashboards and analytics flowed in from Snowflake, either as an extract or a direct-connect query.
With Snowflake in place, CDT quickly established the authoritative data repository on COVID-19 for the state. Leveraging Snowflake Data Marketplace, CDT was then able to securely share data with state agencies and departments, county health departments and agencies, and health partners such as hospitals, vaccine providers, and others. CDT was also able to create a state COVID-19 website, an open data portal that the public could access for updates at any time.
“Our rollout of the COVID-19 data warehouse on Snowflake went very well,” Cabrera explains. “We had rapid adoption, and the various users were able to scale up without issue even as the data increased rapidly.”
Informing Everyone: Urgent Need for Reliable, Accurate Data
The COVID-19 data warehouse involved many technologies, information sources, ingestion points, and ETL loads. Once collected, data was curated in different formats and distributed. As the pandemic progressed, reporting needs expanded to include COVID-19 test reporting and, eventually, vaccine availability, distribution points, and progress reporting.
“The Snowflake platform enabled us to consolidate data from multiple sources and then push a single message out to many locations where people could access the data,” Cabrera says. “Having a single source of truth in Snowflake helped us ensure there was one consistent message going out to all users.”
“The Snowflake team spent a lot of time with CDT to make sure this was successful,” Cabrera adds. “We really appreciate the Snowflake team.”
Enabling COVID-19 Test Reporting and Public Notification
CDT helped the California Department of Public Health (CDPH) create an auditing system that showed how many COVID-19 test results were coming in. The team created a mirrored solution using Snowflake that logged what was coming in and created a dashboard that would make it easy to watch for anything out of sync. The effort was successful.
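The kind of out-of-sync check described above can be illustrated with a minimal Python sketch. It is not CDT's or CDPH's actual code: the function name `find_out_of_sync`, the `SyncIssue` record, and the idea of comparing per-day counts from an intake point against counts in a mirrored log are all assumptions made for illustration, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncIssue:
    """One day on which the two counts disagree (hypothetical record type)."""
    day: str
    received: int  # results counted at the intake point
    mirrored: int  # results logged in the mirrored audit table

    @property
    def gap(self) -> int:
        # Positive: the mirror is missing results; negative: it has extras.
        return self.received - self.mirrored


def find_out_of_sync(
    received: dict[str, int],
    mirrored: dict[str, int],
    tolerance: int = 0,
) -> list[SyncIssue]:
    """Return the days whose counts differ by more than `tolerance`.

    A day that appears on only one side is treated as a count of 0
    on the other side, so missing days are flagged as well.
    """
    issues = []
    for day in sorted(set(received) | set(mirrored)):
        r = received.get(day, 0)
        m = mirrored.get(day, 0)
        if abs(r - m) > tolerance:
            issues.append(SyncIssue(day, r, m))
    return issues
```

For example, with illustrative counts `{"2021-01-01": 100, "2021-01-02": 250}` at intake and `{"2021-01-01": 100, "2021-01-02": 240, "2021-01-03": 5}` in the mirror, the function flags the second and third days. A dashboard could then surface only those rows instead of every day's totals.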
The data that CDT collected with Snowflake was also funneled into CA Notify, a phone app developed by Google and Apple that lets people who have been tested for COVID-19 anonymously collect data via Bluetooth as they move about. Snowflake data fed the app’s opt-in notifications, which let people know if they had been in contact with someone who tested positive.
Open Source Development for Easy Sharing
“A key advantage of the solution we implemented with Snowflake is that we were able to turn around and make it open source, allowing other states to benefit from the work we did,” says Cabrera. “The code is in Python. We are working on a DevOps template that you can use to play around with the code. Everything rolls out from Snowflake warehouses to the Azure serverless functions. We are now applying this technology to vaccination reporting.”
With Snowflake technology, CDT can virtually connect the state of California’s COVID-19 data warehouse directly to a vendor system that CDPH uses, with no data duplication. Previously, duplicated data could be a problem, or data could quickly become out of date.
“Snowflake’s data sharing has made things a lot easier, especially during the pandemic,” Cabrera notes.
A Toolset for the Future
Snowflake is now a part of a diversified set of tools used by the department. Through this set of tools, CDT looks to enable self-service access to data analytics and is working on a metadata model that includes data governance.
“We’ve created an amazing data platform with Snowflake. We had the first version deployed in just a few days and had it fully integrated into our agile processes within a month. I’m proud to be a part of this project and proud of everyone who worked together to roll it out so quickly,” says Cabrera. “The question we have now is not whether something is possible, but rather how do we get the most out of our data, get people access to it, and then build in more automation.”