Using the Snowflake Information Schema
Nov 01, 2016
Author: Kent Graziano
Engineering, How to Use Snowflake, Snowflake Technology
Like any good database, Snowflake has a data dictionary that we expose to users. We call it the Information Schema. In fact it is a set of views against our metadata layer that make it easy for you to examine some of the information about the databases, schemas, and tables you have built in Snowflake.
What objects can I see?
There are 18 views in the Information Schema that you can query directly. You can see the full list in the documentation here.
It is important to note that, for every database in Snowflake, there is a separate Information Schema so that queries only return data about your current database. Additionally, when writing the SQL, the view names in the Info Schema must be fully-qualified, particularly with ‘information_schema’ as you will see in the examples.
For ease of reference I have reverse engineered the schema of the Information Schema into a data model diagram and added in the appropriate PKs and FKs. You can use this diagram as a guide when writing SQL against the schema. (Download a PDF with descriptions here)
How might I use the Information Schema?
For those that may not have written queries against a data dictionary before, let me give you a few examples.
1. List the Tables and Views in a Schema
A very simple place to start is to list the tables and views in one of your database schemas:
SELECT table_name, table_type FROM kent_db.information_schema.tables WHERE table_schema = 'PUBLIC' ORDER BY 1;
Note that this SQL example (and all the examples in this post) specifies a particular schema (i.e., PUBLIC). If you have multiple schemas in your database, it would be wise to include a schema specification in the predicate whenever possible (unless you really do want to see everything in the database). Also keep in mind that the values in the information schema views are usually strings and are case sensitive so be sure to enclose them in single quotes when referencing in the predicate clause.
The results should look like this:
2. Build a Data Dictionary
Suppose you want to generate a data dictionary type listing about your tables for a report. If you have recorded COMMENTs against your table, you could run this query to create that listing:
SELECT 'Comment on table.'||table_name||' is '''||nvl(tables.comment||'''', 'No Comment Recorded''') FROM kent_db.information_schema.tables tables WHERE table_schema = 'PUBLIC';
In addition to showing you comments on each table in the PUBLIC schema, this SQL also points out where you have not added comments to your tables.
NB: Most BI tools can reverse engineer comments from a data dictionary, so this information can be used to build out the metadata in your BI tool to let users know the meaning of the various tables in the system.
Another possibility is you may want a list of all the constraints you have defined in a particular schema. You could run a SQL like this:
SELECT table_name, constraint_type, constraint_name FROM sales.information_schema.table_constraints WHERE constraint_schema = 'TPC_DV_OPT' ORDER BY table_name;
The results should look like this:
3. Dynamically Generate SQL
Dynamic SQL is a method for using data from the information schema to generate SQL statements. For example, suppose you need to clean up a database and drop most of the tables so you can regression test the CREATE script. There are many ways you can do this. If you want to use a SQL script to do it, you could write the script by hand, which is fine if you only have a few tables, but it would be better to generate the script.
Using the Information Schema in Snowflake, you can do something like this:
SELECT 'drop table '||table_name||' cascade;' FROM kent_db.information_schema.tables tables WHERE table_schema = 'PUBLIC' ORDER BY 1;
The output should be a set of SQL commands that you can then execute. And as the schema evolves and more tables are added, this script will pick up the new tables the next time you run it so you don’t even have to remember to edit it (hence the “dynamic” part of “dynamic SQL”).
With the Snowflake UI, you can export these results, save to a script, then execute.
With access to this kind of metadata, the possibilities are endless.