10 Best Practices Every Snowflake Admin Can Do to Optimize Resources
Oct 30, 2020 | 10 Min Read
Author: David A. Spezia | Contributing Authors: Mike Klaczynski
As we covered in part 1 of this blog series, Snowflake’s platform is architecturally different from almost every traditional database system and cloud data warehouse. Snowflake has completely separate compute and storage, and both tiers of the platform are near instantly elastic. The need to do advanced resource planning, agonize over workload schedules, and prevent new workloads on the system due to the fear of disk and CPU limitations just go away with Snowflake. As a cloud data platform, Snowflake can near instantly scale to meet planned, ad hoc, or surprise growth. This means instead of paying for a fixed, limited amount of storage and compute, the amount of storage and compute grows and shrinks as your needs change over time.
By taking advantage of a core tenet of the cloud, elasticity and compute can be dynamically scaled to workloads throughout the day as concurrency needs or raw compute power fluctuate to meet demand. Storage will grow and shrink over time for databases, tables, and meta-data. There are a few optimizations every Snowflake account administrator should make and some more-advanced methods they should consider as their Snowflake compute footprint grows. Because compute and storage are separated and they are elastic, these resources should be monitored for consumption, surprise growth, and resource efficiency.
Snowflake is virtually unlimited by default, and account administrators can put in place minor account-level and resource-level restrictions to defend against rogue users or suboptimal use of resources and credits. For example, they can proactively control compute at the individual virtual warehouse level, at the user level, or at the account and organization level through resource monitors. Users, databases, tables, queries, and workloads can be monitored through the ACCOUNT_USAGE schema shared with all Snowflake accounts. The following figure shows the status and basic configuration of virtual warehouses in an account:
Given all this, here are 10 best practices Snowflake account administrators should be doing.
Best Practice #1: Enable Auto-Suspend
Make sure all virtual warehouses are set to auto-suspend. This way, when they are done processing queries, auto-suspend will turn off your virtual warehouses when they are done processing queries, and thus stop credit consumption. Run the following query to identify all the virtual warehouses that do not have auto-suspend enabled:
SHOW WAREHOUSES; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "auto_suspend" IS NULL;
Best Practice #2: Enable Auto-Resume
Make sure all virtual warehouses are set to auto-resume. If you are going to implement auto-suspend and set appropriate timeout limits, enabling auto-resume is a must; otherwise, users will not be able to query the system. Run the following to identify all the virtual warehouses that will not auto-resume when they are queried:
SHOW WAREHOUSES; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "auto_resume" = FALSE;
Best Practice #3: Set Timeouts Appropriately for Workloads
All virtual warehouses should have an appropriate timeout for their particular workload:
- For task, data loading, and ETL/ELT warehouses, set the timeout for suspension immediately upon completion.
- For BI and SELECT query warehouses, set the suspension timeout to 10 minutes in most situations to keep data caches warm for frequent access by end users.
- For DevOps, DataOps, and data science warehouses, set the suspension timeout to 5 minutes because having a warm cache is not as important for ad hoc and highly unique queries.
Here’s an example configuration:
SHOW WAREHOUSES; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "auto_suspend" > 600; --Your Threshold in Seconds --SQL Pro, just do all 3 SHOW WAREHOUSES; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "auto_resume" = FALSE OR "auto_suspend" IS NULL OR "auto_suspend" > 600;
Best Practice #4: Set Account Statement Timeouts
Use the STATEMENT_QUEUED_TIMEOUT_IN_SECONDS and STATEMENT_TIMEOUT_IN_SECONDS parameters to automatically stop queries that are taking too long to execute, either due to a user error or a frozen cluster. Customize warehouse, account, session, and user timeout-level statements according to your data strategy for long-running queries.
Here’s an example:
ALTER WAREHOUSE LOAD_WH SET STATEMENT_TIMEOUT_IN_SECONDS= 3600; SHOW PARAMETERS IN WAREHOUSE LOAD_WH;
Best Practice #5: Identify Warehouses Deviating from the Seven-Day Average
Here’s a handy tip that came from a direct interaction I had with a customer who set a warehouse to a larger size to do a task but did not put it back the way he found it. I made the following query for him to run every morning to identify warehouse credit usage that deviates from the seven-day average. The following figure shows the results of running the query.
SELECT WAREHOUSE_NAME, DATE(START_TIME) AS DATE, SUM(CREDITS_USED) AS CREDITS_USED, AVG(SUM(CREDITS_USED)) OVER (PARTITION BY WAREHOUSE_NAME ORDER BY DATE ROWS 7 PRECEDING) AS CREDITS_USED_7_DAY_AVG, (TO_NUMERIC(SUM(CREDITS_USED)/CREDITS_USED_7_DAY_AVG*100,10,2)-100)::STRING || '%' AS VARIANCE_TO_7_DAY_AVERAGE FROM"SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY" GROUP BY DATE, WAREHOUSE_NAME ORDER BY DATE DESC;
Best Practice #6: Monitor Warehouses That Are Approaching the Cloud Service Billing Threshold
The following query looks at warehouses where cloud services costs comprise a high percentage of the workload. Overall for an account (and outside of serverless features), Snowflake will charge for cloud services only if they exceed 10% of the daily virtual warehouse credit consumption. Cloud services tasks are useful for meta-data operations such as BI tool discovery queries, heartbeat queries, SHOW commands, cache usage, and several other service optimizing features. So if you use 100 compute credits in a day, but you use 15 additional credits for cloud services (unlikely), you will be charged an additional 5 credits for that day for the 5 cloud service credits that were over the 10% allowance. This means 105 credits total would be billed for the day with Snowflake providing 10 free credits of cloud services usage. This query helps you figure out which warehouses are nearing or exceed that 10% threshold so that you can investigate.
WITH cloudServices AS (SELECT WAREHOUSE_NAME, MONTH(START_TIME) AS MONTH, SUM(CREDITS_USED_CLOUD_SERVICES) AS CLOUD_SERVICES_CREDITS, COUNT(*) AS NO_QUERYS FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" GROUP BY WAREHOUSE_NAME,MONTH ORDER BY WAREHOUSE_NAME,NO_QUERYS DESC), warehouseMetering AS (SELECT WAREHOUSE_NAME, MONTH(START_TIME) AS MONTH, SUM(CREDITS_USED) AS CREDITS_FOR_MONTH FROM"SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HI TORY" GROUP BY WAREHOUSE_NAME,MONTH ORDER BY WAREHOUSE_NAME,CREDITS_FOR_MONTH DESC) SELECT *, TO_NUMERIC(CLOUD_SERVICES_CREDITS/NULLIF(CREDITS_FOR_MONTH,0)*100,10,2) AS PERCT_CLOUD FROM cloudServices JOIN warehouseMetering USING(WAREHOUSE_NAME,MONTH) ORDER BY PERCT_CLOUD DESC;
Best Practice #7: Drop Unused Tables
You might have unused tables that are candidates to be dropped. Just make sure no one is querying these tables. In fact, you might want to make it mandatory to check all tables before deletion. (If you have Time Travel set up, you can undrop a table if you make an error.) This is specific to the database context, so be sure to look at tables across your databases. Also, be mindful of tables used only in view DDLs. Here’s an example query:
--DML from the Information Schema to identify Table sizes and Last Updated Timestamps SELECT TABLE_CATALOG || '.' || TABLE_SCHEMA || '.' || TABLE_NAME AS TABLE_PATH, TABLE_NAME, TABLE_SCHEMA AS SCHEMA, TABLE_CATALOG AS DATABASE, BYTES, TO_NUMBER(BYTES / POWER(1024,3),10,2) AS GB, LAST_ALTERED AS LAST_USE, DATEDIFF('Day',LAST_USE,CURRENT_DATE) AS DAYS_SINCE_LAST_USE FROM INFORMATION_SCHEMA.TABLES WHERE DAYS_SINCE_LAST_USE < 90 --Use your Days Threshold ORDER BY BYTES DESC; -- Last DML on Object SELECT (SYSTEM$LAST_CHANGE_COMMIT_TIME( 'DATABASE.SCHEMA.TABLE_NAME')/1000)::TIMESTAMP_NTZ; -- Queries on Object in Last NN Days SELECT COUNT(*) FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" WHERE CONTAINS(QUERY_TEXT,'TABLE_NAME') AND DATEDIFF('Day',START_TIME,CURRENT_DATE) < 90; -- Last Query on Object in Last NN Days SET TABLE_NAME = 'MY_TABLE'; SELECT START_TIME, QUERY_ID, QUERY_TEXT FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" WHERE CONTAINS(QUERY_TEXT,$TABLE_NAME) AND QUERY_ID = ( SELECT TOP 1 QUERY_ID FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" WHERE CONTAINS(QUERY_TEXT,$TABLE_NAME) ORDER BY START_TIME DESC) AND DATEDIFF('Day',START_TIME,CURRENT_DATE) < 90; -- Object Used in a View Definition SELECT * FROM INFORMATION_SCHEMA.VIEWS WHERE CONTAINS(VIEW_DEFINITION,'TABLE_NAME') AND DATEDIFF('Day',LAST_ALTERED,CURRENT_DATE) < 90;
Best Practice #8: Purge Dormant Users
It’s a good idea to purge from your account dormant users or users that never logged in to Snowflake. Here’s an example that generates a list of both types of users:
--Never Logged In Users SHOW USERS; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "last_success_login" IS NULL AND DATEDIFF('Day',"created_on",CURRENT_DATE) > 30; --Stale Users SHOW USERS; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE DATEDIFF('Day',"last_success_login",CURRENT_DATE) > 30
Guardrails for Automatic Scaling
Real-life usage of a data platform varies significantly from hour to hour, day to day, and month to month. By default, Snowflake is designed to scale automatically and provide maximum performance and efficiency. However, some workloads are better served by static and highly predictable resources, and Snowflake can easily be configured to provide that consistent consumption model for every day of the year. By implementing a few account-level and resource-level restrictions, account admins can prevent unexpected usage by careless users or suboptimal scaling profiles:
- Through resource monitors, admins can receive proactive alerts and control compute at the account level, virtual warehouse level, and even the user level.
- On a reactive basis, admins can monitor users, databases, tables, queries, and workloads through the ACCOUNT_USAGE schema shared with all Snowflake accounts. This data is commonly used to forecast usage trends and provide showback and chargeback billing for departments, teams, and workloads. Daily usage metrics are built into the platform for both individual users, account administrators, and organization administrators. This figure shows the built-in dashboard providing an hourly breakdown of credits for both compute and cloud services:
Best Practice #9: Find Warehouses That Don’t Have Resource Monitors
Resource monitors are a great way to proactively control workload budgets and prevent unexpected resource spikes. Resource monitors can help monitor both user usage and service account usage in Snowflake. First, you should have dedicated virtual warehouses for each of your loading, ELT, BI, reporting, and data science workloads as well as for other workloads. Accounts and warehouses can have total, yearly, monthly, weekly, and daily credit quotas.
The following query will identify all warehouses that do not have a resource monitor:
SHOW WAREHOUSES; SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) WHERE "resource_monitor" = 'null';
Best Practice #10: Apply Resource Monitors
You can use the UI or SQL to apply your resource monitor policy. Based on account preference settings, resource monitors can notify you when consumption reaches a lower threshold, and then suspend the warehouse or account at a higher threshold.
Considerations for Resource Monitoring
- We recommend setting monitors to notify you when a certain threshold of consumption is reached.
- When consumption approaches the maximum budgeted level, set the resource monitor to auto-suspend the warehouse or the entire account, allowing queries to complete but preventing future requests.
- Resource monitors can also be used to terminate all currently running queries and immediately suspend the resource or account. This setting is usually reserved for situations where a hard quota is exceeded.
- For customers that do not want to set hard limits, it’s still always a good idea to have notification monitors set on all warehouses in case usage unexpectedly spikes. That way, all admins within the account will get an email or on-screen notification when thresholds are reached.
The following figure shows the resource monitor configuration screen:
The SQL below can be used to programmatically create resource monitors:
CREATE RESOURCE MONITOR "WAREHOUSE_RM" WITH CREDIT_QUOTA = 150 TRIGGERS ON 75 PERCENT DO NOTIFY ON 90 PERCENT DO NOTIFY; ALTER WAREHOUSE "WAREHOUSE_WH" SET RESOURCE_MONITOR = "WAREHOUSE_RM";
Bonus Best Practice: Use BI Partner Dashboards
As a bonus eleventh best practice, use the dashboards created by the Snowflake enthusiasts at some of our BI and analytics partners to help you monitor Snowflake usage. Because Snowflake’s usage is shared back to every account with a standard schema, these are plug-and-play dashboards. One of the best parts of Snowflake is that it uses data sharing at petabyte scale to share pipelines of usage history back to all Snowflake accounts. Please see the documentation for more details.
Tableau has put together a variety of dashboards showing credit usage, performance, and user adoption for the platform. The Compute Cost Overview dashboard, shown below, can be used to understand credit burn, plan budget allocation, and identify peak outliers to reduce the impact of “buy times” spend. Tableau’s Enterprise Analytics team uses these dashboards to uncover emerging usage patterns and optimize warehouse cost efficiency.
The team at Sigma put together a series of dashboards looking at compute costs, user uptick, and performance. Sigma Analytics offers easy-to-use templates to create presentation-ready dashboards displaying your Snowflake account usage data. These dashboards can be used to quickly dive into understanding and sharing your organization’s Snowflake usage. The figure below shows the pre-built usage dashboard available in Sigma:
Follow the links below to see the various usage dashboards pre-built by our BI partners:
With Snowflake’s highly elastic compute and per-second billing model, account administrators should monitor usage, growth, and resource efficiency on an ongoing basis to make sure they match performance requirements and budgets. Even though Snowflake can help to optimize resources automatically, there are opportunities for account administrators to further tune their deployment, especially as their compute footprint grows. We recommend these basic best practices for monitoring and optimizing resources to avoid common pitfalls that are easy to overlook.