Query cost and performance are important to Snowflake customers, so we have been building an efficient, zero-maintenance engine that delivers industry-leading performance without requiring customers to constantly monitor and tune their queries. Customers can simply rely on our engine and don’t need to develop an expert understanding of query execution.
That being said, many customers love visibility into query execution and embrace fine-grained query control. Snowflake’s Query Profile provides a graphical representation of query execution, along with query execution statistics. Based on the Query Profile, customers can take actions such as changing their cluster key, altering their warehouse size, or rewriting their query. This feature has been available in Snowsight, and programmatic access to Query Profile is now available in public preview. Customers can now use get_query_operator_stats to analyze query performance at scale.
The table below summarizes the new table function that enables programmatic access to Query Profile:
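As a minimal starting point, the function can be invoked directly after running a query; `last_query_id()` returns the ID of the most recent query in the session, and a specific query ID string can be passed instead:

```sql
-- Retrieve operator-level statistics for the most recent query in the session.
select *
from table(get_query_operator_stats(last_query_id()));
```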
When optimizing Snowflake, customers tend to focus on the most expensive queries. Customers can use query_history, an account usage view that contains query data from the past year, to rank queries by execution time or disk spilling. They can focus on the queries with the highest execution time or those with the largest disk spilling, and use Query Profile to delve more deeply into query execution.
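A sketch of this ranking query, assuming a seven-day window is of interest (the time range and limit are illustrative choices, not part of the original):

```sql
-- Top 10 queries from the last 7 days by elapsed time,
-- with disk-spilling volumes shown for context.
select query_id,
       query_text,
       total_elapsed_time,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
from snowflake.account_usage.query_history
where start_time >= dateadd(day, -7, current_timestamp())
order by total_elapsed_time desc
limit 10;
```

Ordering by `bytes_spilled_to_local_storage` or `bytes_spilled_to_remote_storage` instead surfaces the queries with the largest disk spilling.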
Once customers have identified the queries that they want to optimize, they can use programmatic access to Query Profile to bring more visibility to where improvements can be made, including finding inefficient pruning, detecting “exploding” joins, and identifying patterns in expensive queries.
Identifying inefficient pruning
In Snowflake, query pruning can improve query performance by minimizing the amount of data queries need to scan. Customers can improve query pruning using features such as clustering and Search Optimization Service.
To diagnose inefficient pruning, customers can use get_query_operator_stats to compare the partitions_scanned and partitions_total values within TableScan operators. Customers can also check for a narrow filter above the TableScan in order to identify potential query performance improvements from better pruning.
The output of get_query_operator_stats includes the column names used by each query operator, so columns used in the filter can be good candidates for a cluster key or search optimization index.
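One way to sketch this comparison, assuming the pruning statistics sit under the `pruning` key of the `operator_statistics` variant column as shown in Snowflake's documentation (`last_query_id()` stands in for the query ID being analyzed):

```sql
-- Compare partitions scanned vs. total for each TableScan operator.
-- A scanned/total ratio near 1 suggests pruning is ineffective.
select operator_id,
       operator_attributes:table_name::string              as table_name,
       operator_statistics:pruning:partitions_scanned::int as partitions_scanned,
       operator_statistics:pruning:partitions_total::int   as partitions_total
from table(get_query_operator_stats(last_query_id()))
where operator_type = 'TableScan';
```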
Visibility into “exploding” joins
When a user joins two tables with a condition where rows from one table match multiple rows from the other, the result is an “exploding” join whose output can run to millions of rows even from modest inputs. An “exploding” join can also occur if a user omits a join condition.
select table1.c1, table2.c2
from table1
join table2
  on table1.c1 = table2.c1
 and table1.c2 = table2.c2;
In the above query, if the columns c1 and c2 have low cardinality, it’s highly likely that this query will output an excessive number of rows, even if the source tables table1 and table2 are relatively small. If the query is expensive, customers can use get_query_operator_stats to analyze the input_rows and output_rows counts for the join operator. If output_rows is excessively large compared to input_rows, the customer can attribute the poor query performance to the “exploding” join and, if possible, rewrite the query.
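A sketch of that check, assuming `input_rows` and `output_rows` are top-level keys of the `operator_statistics` variant (per the function's documentation; `last_query_id()` again stands in for the query ID under investigation):

```sql
-- Flag Join operators that emit far more rows than they consume.
select operator_id,
       operator_statistics:input_rows::number  as input_rows,
       operator_statistics:output_rows::number as output_rows,
       operator_statistics:output_rows::number /
           nullif(operator_statistics:input_rows::number, 0) as row_multiple
from table(get_query_operator_stats(last_query_id()))
where operator_type = 'Join'
order by row_multiple desc;
```

A large row_multiple points to the join condition that deserves a closer look.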
Identifying patterns in expensive queries
In most cases, expensive queries aren’t executed just once. The same query may be repeated thousands of times, each time with different parameter values. Customers can use query_history and get_query_operator_stats to identify queries with a similar shape and analyze them all at once, as opposed to troubleshooting each one individually in the Query Profile UI.
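One way to sketch this grouping, assuming the account's query_history view exposes the query_parameterized_hash column, which treats queries differing only in literal values as the same shape (the seven-day window and limit are illustrative):

```sql
-- Aggregate recent queries by parameterized shape to find the
-- repeated patterns that cost the most in total.
select query_parameterized_hash,
       count(*)                as executions,
       sum(total_elapsed_time) as total_elapsed_ms,
       any_value(query_text)   as sample_query_text
from snowflake.account_usage.query_history
where start_time >= dateadd(day, -7, current_timestamp())
group by query_parameterized_hash
order by total_elapsed_ms desc
limit 10;
```

A representative query_id from each group can then be fed to get_query_operator_stats to profile the whole pattern at once.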
Programmatic access to Query Profile is now in public preview. To get started, review the documentation for get_query_operator_stats.