Support One to One Hundred Data Warehouse Workloads with Snowflake
Nov 02, 2018
Author: Michael Nixon
In a previous blog post, Breaking Free From Complexity Saves Time and Money, I explained why Snowflake’s data warehouse-as-a-service is unique for its abilities to automatically handle nearly all database operations while eliminating complexity. In this blog, I’ll continue the theme by discussing workload concurrency and eliminating time delays caused by resource contention.
Why Multiple Workloads Create Bottlenecks
Understanding why workload concurrency can be challenging requires a quick review of the hardware structure of typical enterprise data warehouse platforms. Whether cloud-based or on-premises, nearly all data warehouse platforms (including hardware appliances) operate as either a single-node system or a multi-node cluster. Although a multi-node cluster comprises many nodes, it technically remains a single system; to support parallel processing, workloads are spread across the nodes.
Illustration 1 represents a typical multi-node data warehouse platform. Because the platform is a single system, all workloads share cluster resources. Sharing resources creates resource contention (also known as competition or thrashing) when differing workloads require varying amounts of compute resources.
Workarounds to prevent thrashing are possible, but they involve manually assigning workloads to specific nodes of the cluster and partitioning (segmenting) the data so it sits alongside the related compute resources. All of this takes time and energy. Carving up compute also hurts performance, because each workload ends up with a smaller share of the cluster's resources. As a result, most data warehouse teams avoid the headaches associated with carving up compute resources. Throwing more hardware into the mix is an option, but it only adds more cost while increasing complexity.
Queueing Up for Delays
To prevent resource contention and added costs (while maintaining performance), most data warehouse shops use the entire cluster for a workload, then schedule workloads to run at different times throughout the day (also known as queueing). Some teams will take the chance and allow different workloads to run simultaneously, but they must actively monitor the platform to ensure a workload is neither starved for resources nor over-consuming them at the expense of another workload.
The trade-off with queueing is time delays. It’s common to push large data jobs, including data science workloads, to late in the day (or to another day) when resources are more readily available. Similarly, when multiple workloads run concurrently, the highest-priority workload, such as executive dashboards, gets the resources while other, competing jobs are booted off the platform.
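The cost of queueing can be shown with simple arithmetic. The sketch below (not Snowflake code, and with purely hypothetical job durations) compares a shared cluster, where workloads run one after another, against isolated compute, where each workload runs on its own resources:

```python
# On a shared cluster, serialized workloads finish after the SUM of their
# durations; with isolated compute (one set of resources per workload),
# they run in parallel and finish after the MAX of their durations.

# Hypothetical job durations in minutes (illustrative numbers only).
jobs = {"etl_load": 45, "exec_dashboards": 10, "data_science": 90}

queued_elapsed = sum(jobs.values())    # shared cluster: jobs wait in line
isolated_elapsed = max(jobs.values())  # isolated compute: jobs run in parallel

print(f"queued: {queued_elapsed} min, isolated: {isolated_elapsed} min")
```

In this toy example, queueing makes the last job finish after 145 minutes instead of 90, and the gap widens with every workload added to the queue.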
Unlimited Workload Concurrency
Snowflake enables you to achieve faster time-to-value by accommodating any number of concurrent users and workloads while avoiding complexity and queueing. Because workloads never compete for resources, you are free to create and size as many virtual warehouses as needed, one for each workload. Isolating workloads this way (shown in illustration 3) also eliminates the need to manually partition data.
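As a concrete sketch, each workload can get its own independently sized virtual warehouse through Snowflake's CREATE WAREHOUSE DDL. The warehouse names and sizes below are hypothetical, and the helper simply builds the statements a Snowflake client would submit:

```python
# Build CREATE WAREHOUSE statements, one per workload, so each workload
# runs on its own isolated compute. The names and sizes are illustrative.

def create_warehouse_ddl(name: str, size: str) -> str:
    """Build a CREATE WAREHOUSE statement for an isolated workload."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
    )

# One warehouse per workload -- separate compute means no contention.
workloads = {"ETL_WH": "LARGE", "BI_WH": "MEDIUM", "DS_WH": "XLARGE"}
for wh, size in workloads.items():
    print(create_warehouse_ddl(wh, size))
```

AUTO_SUSPEND and AUTO_RESUME mean each warehouse only consumes credits while its workload is actually running, so isolation does not require paying for idle compute.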
With Snowflake, you can set up individual warehouses to automatically scale, without user or operator intervention. Through Snowflake’s data ingestion service, Snowpipe, data ingestion runs in the background without tapping into a single warehouse.
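The two capabilities above can be sketched as the DDL a client would submit. Setting MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT turns a warehouse into a multi-cluster warehouse (an Enterprise Edition feature) that scales out and back automatically, while CREATE PIPE defines a Snowpipe that loads data in the background without a user-managed warehouse. All object names here (BI_WH, EVENTS_PIPE, RAW_EVENTS, EVENTS_STAGE) are hypothetical:

```python
# Auto-scaling: let BI_WH grow from 1 to 4 clusters as concurrency demands,
# then shrink back, with no operator intervention.
autoscale_ddl = (
    "ALTER WAREHOUSE BI_WH SET "
    "MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4 "
    "SCALING_POLICY = 'STANDARD'"
)

# Background ingestion: a Snowpipe that copies staged files into a table
# using Snowflake-managed compute rather than one of your warehouses.
snowpipe_ddl = (
    "CREATE PIPE EVENTS_PIPE AUTO_INGEST = TRUE AS "
    "COPY INTO RAW_EVENTS FROM @EVENTS_STAGE "
    "FILE_FORMAT = (TYPE = 'JSON')"
)

print(autoscale_ddl)
print(snowpipe_ddl)
```

With AUTO_INGEST enabled, the pipe loads new files as they land in the stage, so ingestion never competes with the warehouses serving queries.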
Empower Your Teams and Accelerate Results
The sky’s the limit with Snowflake. With virtual warehouses and concurrent operation, the usual constraints disappear. Create as many virtual warehouses as you need, dynamically adapt to performance requirements, be more responsive to users, and never experience a fight for resources. With Snowflake, teams get faster access to data by executing workloads on the timing that works best for them.