May 28, 2026 / 8 min read / Core Platform

Generalized Skew Handling: How Snowflake Automatically Mitigates Data Skew

Generalized Skew Handling (GSH) is a Snowflake query execution feature that automatically detects data skew at runtime and redistributes work to idle threads, with no user configuration. GSH processes tens of millions of queries daily. Across affected queries it reduces execution time by 3.46% on average, and in one customer workload it delivered a 27x speedup.

GSH provides:

  • Faster queries: Reduces the skew bottlenecks that leave most threads idle and make parallel queries run effectively serially
  • Lower cost potential: Queries may finish sooner, which can reduce warehouse credit consumption in some workloads
  • Zero configuration: Activates automatically at runtime — no tuning or hints required

What is data skew, and why does it matter?

In a massively parallel system, a query's speed is ultimately dictated by its slowest worker. Data skew, where one worker processes a disproportionate amount of data compared to others, effectively converts a parallel execution into a serial one. The rest of the warehouse remains idle while a single thread struggles through an overloaded workload.

Most optimizations focus on reducing the total amount of work. GSH instead aims to make better use of available resources while doing the same work: it deliberately redistributes rows to idle compute capacity, even if that slightly increases the total work done.
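To make the slowest-worker argument concrete, here is a toy cost model in Python. It is an illustration under simplifying assumptions, not Snowflake code: work is measured in abstract time units, it can be split perfectly evenly, and redistribution adds a fixed fractional overhead (the hypothetical `overhead` parameter) to the total.

```python
from typing import Sequence


def makespan(work_per_thread: Sequence[float]) -> float:
    """Wall-clock time of a parallel stage: it ends when the slowest thread ends."""
    return max(work_per_thread)


def rebalanced_makespan(work_per_thread: Sequence[float], overhead: float) -> float:
    """Wall-clock time if the total work, inflated by a fractional
    redistribution overhead, is spread evenly across every thread."""
    total = sum(work_per_thread) * (1.0 + overhead)
    return total / len(work_per_thread)


# Hypothetical 8-thread stage: one thread holds almost all of the work.
skewed = [65.0] + [1.0] * 7

print(makespan(skewed))                    # 65.0
print(rebalanced_makespan(skewed, 0.125))  # 10.125
```

In this model, total work grows from 72 to 81 units, yet the stage finishes in about 10 units instead of 65: more total work, far less waiting.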

Customers routinely encounter skew originating from various sources:

  • Tables with fewer files than the number of worker threads.
  • Filter predicates that are more selective on some workers than others.
  • Certain aggregate and window functions that have limited parallelization.

Even minor skew can be amplified into a major performance problem by downstream operators. For example:

  • Exploding joins multiply the rows on an already overloaded thread, turning minor initial skew into a significant performance issue.
  • INSERT and COPY statements can leave a query bottlenecked on the IO of a single thread.
  • Expensive functions and user-defined functions (UDFs) amplify small differences in row count into large differences in processing time.

Until now, Snowflake addressed skew on a case-by-case basis through operator-specific mitigations, such as work stealing during table scans. GSH extends this by adding general skew mitigation for operators that do not require a specific data distribution.

When does Generalized Skew Handling help?

GSH automatically activates when Snowflake detects skew at runtime. No user configuration is needed. Workloads that benefit most include:

  • Queries scanning tables with fewer files than available threads
  • Filter predicates that are highly selective on some partitions but not others
  • Joins that explode row counts unevenly across workers
  • INSERT/COPY operations that bottleneck on IO in a single thread
  • Expensive UDFs applied to the skewed output of aggregates or window functions

How does Snowflake execute queries?

Snowflake's execution engine utilizes a query plan structured as a directed acyclic graph of operators. These operators are connected by dataflow edges that govern how and whether data moves between servers. The plan runs across multiple threads on multiple servers. For efficiency and maximum parallelism, data is typically kept on the same thread throughout the query, which keeps threads largely independent.

Skew arises when data from a skewed source — such as a table scan that only reads a few files — stays on the same thread. All downstream operators inherit this skew, making the execution serial even if the operators themselves are theoretically capable of running in parallel. In these cases, rebalancing data across threads immediately after the skewed source can significantly improve performance.

Consider a simple query plan with a table scan feeding into a filter, which then feeds into an expensive UDF. Most rows are filtered out, but the few that remain are processed by the expensive UDF, creating a massive skew bottleneck in a single thread. The diagram below assumes a two-server warehouse for simplicity, but the same principle can lead to even worse slowdowns on larger warehouses.

Figure 1: Generalized Skew Handling: How Snowflake Automatically Mitigates Data Skew

In this example, the filter is highly effective at a global level — it reduces input from 2,000 rows to just 100. However, due to the data distribution across the two servers, all 100 surviving rows happen to be on Server 1, leaving Server 2 idle for the expensive UDF evaluation.
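The Figure 1 scenario can be worked through numerically. This sketch assumes the 2,000 input rows are split evenly across the two servers and that the UDF costs a fixed, made-up `UDF_COST` per row; the helper names are hypothetical.

```python
from typing import Dict

UDF_COST = 1.0  # assumed time units per row; real UDF cost varies


def udf_stage_time(rows_per_server: Dict[str, int]) -> float:
    """Servers run the UDF in parallel, so the stage lasts as long as the busiest one."""
    return max(n * UDF_COST for n in rows_per_server.values())


def rebalance(rows_per_server: Dict[str, int]) -> Dict[str, int]:
    """Spread rows as evenly as possible across all servers."""
    servers = list(rows_per_server)
    base, extra = divmod(sum(rows_per_server.values()), len(servers))
    return {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}


# 2,000 rows in (assumed 1,000 per server); the filter keeps 100,
# and all survivors happen to live on Server 1.
after_filter = {"server1": 100, "server2": 0}

print(udf_stage_time(after_filter))             # 100.0
print(udf_stage_time(rebalance(after_filter)))  # 50.0
```

Rebalancing the 100 surviving rows halves the UDF stage on two servers. With N servers the same stage would ideally shrink about N-fold, which is why larger warehouses suffer more from this pattern.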

How does Generalized Skew Handling work?

The core idea behind GSH is the introduction of adaptive dataflow edges that decide at runtime whether data should stay on the same thread or be redistributed to idle workers. Instead of the query planner making a static decision at compile time, GSH defers the distribution strategy until execution, when the actual data distribution is known.

  • Compile time: The query planner identifies specific dataflow edges in the plan where skew is a possibility and marks them as candidates for adaptive redistribution.
  • Runtime: Each instance of a GSH edge operates using an independent state machine that continuously monitors the flow of data.
  • Adaptation: If the edge detects that other worker threads are idle, it transitions to load balancing mode and begins sending rows to the underutilized workers. If all workers are busy, it remains in local mode to prevent unnecessary network overhead.
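One way to picture the compile-time step is as a filter over plan edges. The sketch below follows the rule that GSH only applies where the consuming operator does not require a specific data distribution, but the plan model and its two per-operator flags, `requires_distribution` and `may_introduce_skew`, are hypothetical simplifications; Snowflake's actual criteria are not spelled out in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Operator:
    name: str
    # True if the operator needs rows placed in a specific way
    # (for example, all rows sharing a key on the same thread).
    requires_distribution: bool
    # True if the operator can leave its output spread unevenly across
    # threads (few-file scans, selective filters, exploding joins).
    may_introduce_skew: bool


Edge = Tuple[Operator, Operator]


def candidate_edges(edges: List[Edge]) -> List[Edge]:
    """Keep edges where skew is possible upstream and the consumer does not
    care which thread its input rows arrive on."""
    return [
        (src, dst)
        for src, dst in edges
        if src.may_introduce_skew and not dst.requires_distribution
    ]


scan = Operator("TableScan", requires_distribution=False, may_introduce_skew=True)
filt = Operator("Filter", requires_distribution=False, may_introduce_skew=True)
udf = Operator("ExpensiveUDF", requires_distribution=False, may_introduce_skew=False)
agg = Operator("HashAggregate", requires_distribution=True, may_introduce_skew=False)

plan = [(scan, filt), (filt, udf), (udf, agg)]
for src, dst in candidate_edges(plan):
    print(f"{src.name} -> {dst.name}")
# TableScan -> Filter
# Filter -> ExpensiveUDF
```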

The state machine transitions through: INITIAL → DECIDING → LOCAL or LOAD_BALANCING. Key transitions are triggered by sending rowsets, detecting skew and receiving end-of-file signals. Crucially, each instance's state machine is completely independent and requires no cross-worker coordination, enabling adaptation to happen within milliseconds.
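The state machine can be sketched as a small transition table plus a routing rule. The states and triggers come from the description above, but which trigger moves which state, and the round-robin routing in LOAD_BALANCING, are assumptions made for illustration rather than Snowflake's implementation.

```python
from enum import Enum, auto
from itertools import cycle


class State(Enum):
    INITIAL = auto()
    DECIDING = auto()
    LOCAL = auto()
    LOAD_BALANCING = auto()


class Event(Enum):
    ROWSET_SENT = auto()
    SKEW_DETECTED = auto()
    END_OF_FILE = auto()


# Assumed pairing of states and triggers (illustrative, not Snowflake's table).
TRANSITIONS = {
    (State.INITIAL, Event.ROWSET_SENT): State.DECIDING,
    (State.DECIDING, Event.SKEW_DETECTED): State.LOAD_BALANCING,
    (State.DECIDING, Event.END_OF_FILE): State.LOCAL,
}


class AdaptiveEdge:
    """One instance of a GSH-style edge. It keeps its own state and never
    consults other instances before changing mode."""

    def __init__(self, local_worker: int, num_workers: int) -> None:
        self.state = State.INITIAL
        self.local_worker = local_worker
        # Round-robin over all workers once load balancing starts (assumed policy).
        self._targets = cycle(range(num_workers))

    def on_event(self, event: Event) -> State:
        # Pairs missing from the table leave the state unchanged, so LOCAL and
        # LOAD_BALANCING behave as terminal states in this sketch.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

    def route(self) -> int:
        """Return the worker that should receive the next rowset."""
        if self.state is State.LOAD_BALANCING:
            return next(self._targets)
        return self.local_worker


edge = AdaptiveEdge(local_worker=0, num_workers=4)
edge.on_event(Event.ROWSET_SENT)          # INITIAL -> DECIDING
print(edge.route())                       # 0 (rows still stay local)
edge.on_event(Event.SKEW_DETECTED)        # DECIDING -> LOAD_BALANCING
print([edge.route() for _ in range(5)])   # [0, 1, 2, 3, 0]
```

Because each `AdaptiveEdge` reads only its own state, switching modes needs no locking or messaging between instances.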

Figure 2: Generalized Skew Handling: How Snowflake Automatically Mitigates Data Skew

How do we detect skew?

The central challenge for GSH is that skew is easy to diagnose after a query finishes, but difficult to detect during execution. Redistributing data too eagerly wastes CPU cycles and network bandwidth, while being too conservative leaves CPUs idle. Finding the appropriate balance called for some experimentation.

  • Row percentage model (initial approach): If a thread processes more rows than a constant multiple of the average, it switches to load balancing. However, this model assumes a uniform per-row cost, which is often false: with an expensive UDF, a thread holding fewer rows can still take 10x longer.
  • Idle time model (production default): After a grace period, GSH checks whether worker instances have been idle (no rows received) for a specified duration. If enough instances are idle, the system switches to load balancing. This captures row-count skew, compute skew and UDF skew by measuring what matters most: whether workers are available.
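The two detection models can be compared side by side as predicates. All thresholds (`multiple`, `grace_period`, `idle_threshold`, `min_idle_workers`) are hypothetical parameters, and the sample numbers are invented to reproduce the expensive-UDF case where row counts look balanced but the work is not.

```python
from typing import Sequence


def row_percentage_skewed(rows_per_worker: Sequence[int], multiple: float) -> bool:
    """Initial approach: flag skew if any worker has processed more than
    `multiple` times the average row count. Implicitly assumes equal per-row cost."""
    avg = sum(rows_per_worker) / len(rows_per_worker)
    return any(n > multiple * avg for n in rows_per_worker)


def idle_time_skewed(
    now: float,
    start: float,
    last_row_at: Sequence[float],
    grace_period: float,
    idle_threshold: float,
    min_idle_workers: int,
) -> bool:
    """Idle-time approach: once the grace period has passed, flag skew if at
    least `min_idle_workers` workers have received no rows for `idle_threshold`."""
    if now - start < grace_period:
        return False  # too early to tell skew from a slow start
    idle = sum(1 for t in last_row_at if now - t >= idle_threshold)
    return idle >= min_idle_workers


# Worker 0 has seen slightly fewer rows so far, but each one feeds an
# expensive UDF, so it is still receiving rows after the others went quiet.
rows = [50, 60, 60, 60]
last_row_at = [9.9, 1.0, 1.2, 0.8]  # seconds since the query started

print(row_percentage_skewed(rows, multiple=2.0))  # False: row counts look balanced
print(idle_time_skewed(now=10.0, start=0.0, last_row_at=last_row_at,
                       grace_period=2.0, idle_threshold=5.0,
                       min_idle_workers=2))       # True: three workers are idle
```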

What performance improvements does GSH deliver?

Generalized Skew Handling processes tens of millions of queries per day across the Snowflake platform. In internal evaluations of affected queries, Snowflake observed an average execution time reduction of 3.46%.

  • In one production workload evaluated internally, execution time improved from 192 seconds to 7 seconds after GSH parallelized a heavily skewed join.
  • In another production workload evaluated internally, execution time decreased from 53 seconds to 6.6 seconds when GSH redistributed a skewed insert operation.
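The reported timings translate into speedups with simple arithmetic. The sketch uses only the figures quoted in this post; the 192 s to 7 s case works out to about 27x, matching the speedup cited in the introduction.

```python
def speedup(before_s: float, after_s: float) -> float:
    return before_s / after_s


def reduction_pct(before_s: float, after_s: float) -> float:
    return 100.0 * (before_s - after_s) / before_s


for label, before, after in [("skewed join", 192.0, 7.0), ("skewed insert", 53.0, 6.6)]:
    print(f"{label}: {speedup(before, after):.1f}x faster, "
          f"{reduction_pct(before, after):.1f}% less time")
# skewed join: 27.4x faster, 96.4% less time
# skewed insert: 8.0x faster, 87.5% less time

# For comparison, a 3.46% average reduction is roughly a 1.036x speedup.
print(f"{1 / (1 - 0.0346):.3f}x")  # 1.036x
```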

How did we test for correctness?

Many operators assume specific properties regarding the distribution of data across servers. Such operators must not have their inputs redistributed. To achieve robust testing at scale, the engineering team built a test hook that forces GSH onto every qualifying edge in all test queries, followed by exhaustive testing of state machine transitions across a wide variety of query shapes.
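The core correctness property behind such a hook can be sketched as a property test: for an operator that does not depend on data placement, routing rows to arbitrary workers must not change the multiset of results (row order may change, so outputs are compared as counts). `run_stage` and its `force_redistribute` flag are hypothetical stand-ins, not Snowflake's test harness.

```python
import random
from collections import Counter
from typing import Callable, List


def run_stage(
    partitions: List[List[int]],
    op: Callable[[int], int],
    force_redistribute: bool,
    seed: int = 0,
) -> Counter:
    """Apply a per-row operator on every worker and collect all outputs.

    `force_redistribute` plays the role of a hypothetical test hook: when set,
    each row is first sent to a randomly chosen worker, as if GSH had switched
    the edge into load-balancing mode."""
    if force_redistribute:
        rng = random.Random(seed)
        moved: List[List[int]] = [[] for _ in partitions]
        for part in partitions:
            for row in part:
                moved[rng.randrange(len(partitions))].append(row)
        partitions = moved
    out: Counter = Counter()
    for part in partitions:
        out.update(op(row) for row in part)
    return out


def expensive_udf(x: int) -> int:
    # Stand-in for a costly per-row function; it depends only on its input row.
    return x * x % 97


# Heavily skewed input: one worker holds nearly every row.
data = [list(range(1000)), [], [7, 7, 7], []]
baseline = run_stage(data, expensive_udf, force_redistribute=False)

# Placement must not change the multiset of results, for any random routing.
for seed in range(20):
    assert run_stage(data, expensive_udf, True, seed) == baseline
print("results match with and without forced redistribution")
```

An operator that does depend on placement, such as one assuming all rows with the same key share a thread, would fail this kind of check, which is why such inputs must never be redistributed.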

The development process also led to the creation of several powerful debugging tools:

  • Graphviz rendering of test and production logs, to quickly and easily visualize state transitions.
  • Perfetto trace file generation, which enables timeline visualization of state transitions across all instances.
  • Integration with internal query profiling tools for post-hoc analysis of skew.
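A rough sketch of how a state-transition log could be turned into both kinds of artifact. The record layout `(instance, state, start_us, end_us)` and the sample data are assumptions, and this is not Snowflake's tooling; the output formats themselves are standard: a Graphviz DOT digraph, and Chrome Trace Event Format JSON, which the Perfetto UI can open.

```python
import json
from typing import List, Tuple


def to_dot(transitions: List[Tuple[str, str, str]]) -> str:
    """Render (from_state, to_state, trigger) triples as a Graphviz digraph."""
    lines = ["digraph gsh {"]
    for src, dst, trigger in transitions:
        lines.append(f'  {src} -> {dst} [label="{trigger}"];')
    lines.append("}")
    return "\n".join(lines)


def to_chrome_trace(records: List[Tuple[int, str, int, int]]) -> str:
    """Render (instance, state, start_us, end_us) records in the Chrome Trace
    Event Format. Each record becomes a complete ("X") event, with timestamps
    in microseconds, on the track (tid) of its edge instance."""
    events = [
        {"name": state, "ph": "X", "ts": start, "dur": end - start,
         "pid": 1, "tid": instance}
        for instance, state, start, end in records
    ]
    return json.dumps({"traceEvents": events})


print(to_dot([("INITIAL", "DECIDING", "rowset sent"),
              ("DECIDING", "LOAD_BALANCING", "skew detected")]))

# Hypothetical per-instance state history, in microseconds.
records = [
    (0, "INITIAL", 0, 120), (0, "DECIDING", 120, 900), (0, "LOAD_BALANCING", 900, 5000),
    (1, "INITIAL", 0, 80), (1, "DECIDING", 80, 5000),
]
# Save this string as a .json file to load it in the Perfetto UI.
trace_json = to_chrome_trace(records)
print(len(json.loads(trace_json)["traceEvents"]))  # 5
```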

A careful, incremental rollout proved essential — it uncovered various edge cases, such as interactions with specific file formats used in COPY statements, and allowed the team to fix them before expanding the feature to a broader population.

What's next?

Future work on GSH is focused on greater adaptability and broader applicability:

  • Continuously deciding model: Load balancing is currently a terminal state. Future work will allow GSH to transition freely between local and load-balancing modes as the query progresses.
  • Extended applicability: Applying GSH to an expanded set of operator types and query patterns, including TopK queries and window functions with skewed partition keys.
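One possible shape for a continuously deciding model is a two-threshold (hysteresis) policy. This is purely a hypothetical sketch: the `enter` and `leave` thresholds and the idle-fraction signal are assumptions, not a description of Snowflake's planned design.

```python
from enum import Enum


class Mode(Enum):
    LOCAL = "local"
    LOAD_BALANCING = "load_balancing"


def next_mode(mode: Mode, idle_fraction: float,
              enter: float = 0.5, leave: float = 0.1) -> Mode:
    """Switch to load balancing once at least `enter` of the workers look idle;
    fall back to local mode only after the idle fraction drops below `leave`."""
    if mode is Mode.LOCAL and idle_fraction >= enter:
        return Mode.LOAD_BALANCING
    if mode is Mode.LOAD_BALANCING and idle_fraction < leave:
        return Mode.LOCAL
    return mode


mode = Mode.LOCAL
history = []
for idle_fraction in [0.0, 0.6, 0.3, 0.05, 0.2]:
    mode = next_mode(mode, idle_fraction)
    history.append(mode.value)
print(history)
# ['local', 'load_balancing', 'load_balancing', 'local', 'local']
```

Separate thresholds for entering and leaving load balancing keep an edge from flapping between modes when the idle signal hovers near a single cutoff.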

Conclusion

Generalized Skew Handling does not attempt to avoid work, but rather helps ensure that available compute resources are always utilized when there is useful work to be done. By employing adaptive runtime decisions at the dataflow edge, GSH solves a broad and fundamental class of skew problems without requiring operator-specific knowledge, providing performance benefits to tens of millions of queries daily across the Snowflake fleet.

Learn more about the author

Corbin McElhanney

Software Engineer
