Introducing the Snowflake Data Warehouse Challenge
Sep 22, 2015
Author: Jon Bock
Market News, Snowflake Technology
When we founded Snowflake, our goal wasn’t just to make a data warehouse that was 10% or 20% better than existing data warehouses. Our goal was to build the dream data warehouse for the new cloud world, a data warehouse orders of magnitude better than existing solutions.
The dream data warehouse we set out to create needed to be extremely easy to use, have unlimited scalability, provide instant elasticity, and be a fully managed service that transparently provided availability (including across data centers), data protection, and optimized performance. But it wouldn’t stop there. It would also handle semi-structured data (like JSON, Avro, and XML) as a first-class citizen: all of the techniques the data warehouse uses to handle and optimize structured data (columnar storage, intelligent pruning, vectorized processing, etc.) would be applied to semi-structured data at scale.
However, we quickly realized that to make the dream data warehouse a reality, we needed to completely reinvent the data warehouse from scratch.
Going that route took a lot of time, development effort, and perseverance. After three years of development, however, we can proudly say that we have delivered on that dream and beyond. Snowflake’s data warehouse delivers features and capabilities that were previously impossible or impractical.
To illustrate what becomes possible when you reinvent the data warehouse, we’re introducing the “Snowflake Challenge”: a series on this blog that will take on challenges that would be impossible for existing data warehouses and demonstrate how Snowflake makes them possible.
Here are a few of the themes that we’ll be demonstrating in the Snowflake Challenge:
- Faster is free: in traditional data warehousing, performance is limited by resources, meaning that more performance requires spending money to get more resources. However, Snowflake’s unique architecture removes that tax, making it possible to do things like run your workload 10x faster at the same cost, or update a half billion rows in minutes without spending a fortune.
- Capacity planning as we know it is dead: rather than force you to determine your needs years in advance, Snowflake makes it possible to scale storage, compute, and users independently, on the fly and without disruption. For example, you never need to worry about running out of storage, and you can scale up concurrency without affecting response time.
- Stop moving and copying data: until now you needed different systems for different types of data, uses, and users: a system for semi-structured data, a separate system for each group of users to avoid performance contention, a system for testing, and more. With Snowflake you can put semi-structured and structured data in a single place without sacrificing performance or flexibility, instantly clone data to create a testing environment, and securely share data inside and outside your organization.
- Focus on using your data, not on managing your data warehouse: because Snowflake is a data warehouse as a service, you can focus on loading and querying your data, and Snowflake takes care of the rest. Data protection, cross-datacenter availability, online upgrades, encryption, and more are provided automatically by the Snowflake data warehousing service.
We’re kicking off the Snowflake Challenge with one of the oldest challenges in data warehousing: loading data without impacting query performance. That’s something Snowflake customers do every day. You can read more in this post.
We’re interested in hearing your feedback, and in hearing the challenges that you’re looking to solve. Join the conversation.