Open data promised to change the business of government, but ended up changing business, full stop. In the aftermath of the 2007 financial crisis, some of the first open data initiatives provided access to public sector spending data. Websites like Where Does My Money Go? in the U.K. and USASpending.gov in the U.S. provided visibility into government spending. Over time, however, governments came to recognize that the data served a wider purpose.
Open data fuels innovation
Open data strategies, initially driven by goals of increased transparency and accountability in government spending and performance, shifted to supporting innovation and creating economic value. The history of weather and GPS data had already demonstrated the value of open data; both types of data, made public in the ’70s and ’80s, created multibillion-dollar industries.
The most recent wave of open data sought to drive similar innovation and create even more economic value in the private sector. New mandates for open data at all levels of government sparked an explosion of data availability. National, state, and local governments all over the world now publish data on government assets, operations, and performance; examples include the London Datastore and Analyze Boston. International organizations such as the World Bank and the United Nations publish open data, as do research institutes and other NGOs. Other open data sets, such as OpenStreetMap and Wikidata, are crowdsourced.
What does this new open data innovation look like? A well-known example is Zillow, which began pulling data in 2006 from county records on real estate transactions, property values, and taxes, among other sources. Zillow’s market cap is now over $8 billion. Another example is Climate Corporation, which uses open data from the U.S. National Weather Service, U.S. Geological Survey, and even NASA to help farmers make better-informed operating and financing decisions. Monsanto acquired the company for $1.1 billion in October 2013. Today, many companies use weather data to forecast demand or optimize routing, or use data on property sales or new business licenses to identify and qualify prospective customers.
Open data hasn’t always meant free or easy
The promise of open data was (and still is) huge, and there is a wealth of it out there. However, the data is not always complete, and it is not always accessible within the tools that data analysts and data scientists use regularly. Much of it just isn’t available at all. Take the U.S. federal government, for example. One open source project, Project Open Data Dashboard, assesses federal agencies’ progress on implementing the open data policy. One of its measures is whether the data’s download URL actually works. We’re all familiar with those maddening 404 errors, and it turns out that many open data download links end up there. While agencies like the Social Security Administration have an impressive rate of 98% working URLs, others like the Department of Labor Statistics have a rate of less than 2%.
SOURCE: https://data.gov/metrics.html# Based on data through December 31, 2021
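To make the "working URL" measure concrete, here is a minimal Python sketch of how such a rate could be computed. It is not the Project Open Data Dashboard's actual method; the function names `url_is_working` and `working_url_rate` are hypothetical, and treating any HTTP status below 400 as "working" is an assumption made for illustration.

```python
from http.client import HTTPException
from typing import Callable, Iterable
from urllib.request import Request, urlopen


def url_is_working(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds with a status below 400.

    Assumption: "working" means the server answers without an error status.
    """
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, HTTPException, ValueError):
        # OSError covers URLError/HTTPError (such as a 404) and timeouts;
        # ValueError covers malformed URLs.
        return False


def working_url_rate(
    urls: Iterable[str],
    check: Callable[[str], bool] = url_is_working,
) -> float:
    """Return the percentage of URLs for which `check` reports success."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    working = sum(1 for url in url_list if check(url))
    return 100.0 * working / len(url_list)
```

Calling `working_url_rate(download_urls)` on an agency's list of download links would give a percentage comparable in spirit to the figures above. The `check` parameter lets you swap in a different test, for example one that issues a GET request for servers that reject HEAD.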
In addition to the 404 errors, those who try to use open data might also find that:
- It can be hard to find the data needed in data portals that lack structure, and in data sets not always tagged for use cases or industry.
- It can be hard to make sense of the data without proper business definitions, or with poor quality and inconsistencies across the data.
- It can be hard to query and combine the data with other data sets, particularly without documentation or with a lack of knowledge of the data sets.
- It can be hard to access the data because of obscure formats or undocumented APIs, or non-working download URLs.
The result? Dealing with complex maintenance, change management, data integration, and poor data quality ultimately increases costs. As with open source software, open data doesn’t mean easy or free. And worse, data quality issues could lead to poor decisions and reduce confidence in future analysis.
Open data on Snowflake Marketplace
Fortunately, all is not lost. Data providers have stepped in to make “open” data more accessible and usable. I recently sat down (virtually) with Uli Bethke, the founder and CEO of Sonra, a data provider on Snowflake Marketplace. Our discussion in the webinar How to Get Started Monetizing Your Data centered on how Sonra recognized this new opportunity. Having started by providing data engineers with tools to convert data to readable formats and to facilitate visualization, optimization, formatting, and data lineage capture, the company found that these tools were particularly suited to resolving some of the challenges of open data. Sonra began to provide not only the tools but the data itself, particularly reference data to enhance its customers’ analytics and insights.
One of the reasons data providers are able to offer curated open data sets is Snowflake Marketplace itself, which provides:
- Direct access to near real-time data without the need for ETL or APIs
- Templates for creating listings with titles, tags, sample queries, and pricing options
- Integrated governance including granular, revocable access and usage monitoring
- Auto-fulfillment across clouds and geographies for broader reach
- Managed billing directly via Snowflake
- Self-service discovery and access with use case and industry tagging of data sets
- Try-and-buy for better customer due diligence, lowering the barrier to data acquisition
These features enable data providers to offer a better customer experience with greater margins for their data products, to reach new markets with self-serve products, and to bring these new products to market faster. Also, as covered in a previous blog post, Snowflake enables companies to monetize their data, data services, or applications with usage-based pricing.
Some examples of direct access to open data sources on Snowflake Marketplace include:
- REWORTH ANALYTICS: Mexican Credit Card Data by State – Reworth reformats data, made available by Banxico, the central bank of Mexico, on the number of ATMs and POS (TPV) terminals available in each state. Reworth also combines these data with basic spatial and demographic data.
- CYBERSYN, INC: FHFA: Single Family Home Appraisals and Values – Cybersyn reformats data from the Federal Housing Finance Agency (FHFA) to enable analysis and make it compatible with other data sets.
- SONRA: Canada Postal Codes – Sonra curates postal code data from Canada Post, including addresses and geographical boundaries for use as reference data.
Tips and tricks
Finally, our recent webinar provided a few lessons for potential providers of open data. While Snowflake Marketplace makes it easier for data providers, they still need to:
- Hire people with good data engineering skills to build the data pipelines and publish to Snowflake Marketplace.
- Understand how much it will cost to process, store, and replicate the data, particularly across clouds and geographies.
- Focus on paid listings rather than custom listings, particularly to leverage Snowflake’s auto-fulfillment, usage monitoring, and payment capabilities.
- Ensure appropriate, consistent, and balanced legal terms with professional legal advice.
- Provide thorough documentation on how to use the data, FAQs, and sample queries to save time answering questions.
- Obfuscate any PII in all the data sets to ensure regulatory compliance.
- Understand use cases and consumption to enable product and customer experience improvements.
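As one illustration of the PII tip above, the following Python sketch replaces the values in designated columns with keyed hashes before a data set is published. This is only one possible approach (pseudonymization with HMAC-SHA256), and whether it satisfies a given regulation is a question for legal advice. The function names, the column names, and the row-as-dictionary layout are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List


def obfuscate_value(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest so the raw value is not published."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def obfuscate_rows(
    rows: Iterable[Dict[str, object]],
    pii_columns: Iterable[str],
    secret_key: bytes,
) -> List[Dict[str, object]]:
    """Return copies of the rows with the PII columns replaced by digests."""
    pii = set(pii_columns)
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's data is left unchanged
        for column in pii:
            if new_row.get(column) is not None:
                new_row[column] = obfuscate_value(str(new_row[column]), secret_key)
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample row; the column names are illustrative only.
sample = [{"email": "jane@example.com", "postal_code": "K1A 0B1"}]
published = obfuscate_rows(sample, pii_columns=["email"], secret_key=b"replace-me")
```

Because the same input and key always produce the same digest, consumers can still join or count on the obfuscated column without seeing the underlying value. The key must stay private to the provider.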