Automating Snowflake’s Semi-Structured JSON Data Handling: Part 2

Author: Craig Warman

This blog post presents a technique for automatically building database views based on semi-structured JSON data stored in Snowflake tables. It's a real time-saver, and you'll find the complete code plus a usage example near the bottom of this post. Feel free to jump there if you're in need of a quick solution. Or, read on if you'd like to dig a little deeper – you'll find details here on exactly how everything works!

Overview

The first of this two-part blog post series described a technique for automating creation of a database view over JSON data that has been loaded into a table column of type VARIANT.  This is typically done to hide the complexity associated with SQL SELECT statements that must reference JSON document elements by their hierarchical paths.  But, it’s a manual process to both create and maintain these views, so having an automated process can help reduce errors and save time.
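To make that concrete, here's the kind of query a generated view hides, using the colors dataset we'll revisit below (the table and view names are just for illustration):

-- Without a view, every query must spell out the JSON paths and casts:
select json_data:"code"."hex"::STRING as code_hex
from mydatabase.public.colors;

-- With a generated view, the paths disappear behind ordinary columns:
select code_hex from colors_vw;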

Background

The stored procedure outlined in the first blog post was pretty basic in its construction. Its primary purpose was to illustrate how the overall process works. Here are some topics we'll consider in this blog post to enhance that stored procedure so it's ready for regular use in the real world:

  • Column case – We should have an option that allows the generated view's column case to match that of the JSON elements, rather than being uppercased (which is Snowflake's default behavior).
  • Column types – We should also have an option to disable matching each view column's datatype to that of the underlying data. This will allow us to handle cases where multiple JSON documents have attributes with the same name but contain data of different datatypes. I'll provide more details on this in the sections below.
  • Arrays – JSON documents can contain both simple and object arrays, and the existing stored procedure simply returns these as view columns of datatype ARRAY.  It would be desirable to have the contents of these arrays exposed as separate view columns.

Column Case

Here’s an example of one of the documents in the sample JSON dataset from the first blog post:

{
  "ID": 1,
  "color": "black",
  "category": "hue",
  "type": "primary",
  "code": {
    "rgb": "255,255,255",
    "hex": "#000"
  }
}

Now, compare that with the Data Definition Language generated by the existing stored procedure:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::FLOAT as ID, 
json_data:"category"::STRING as category, 
json_data:"code"."hex"::STRING as code_hex, 
json_data:"code"."rgb"::STRING as code_rgb, 
json_data:"color"::STRING as color, 
json_data:"type"::STRING as type
FROM mydatabase.public.colors

When we look at the view’s description, we see that the column names are uppercased (which is Snowflake’s default behavior):

desc view colors_vw;
+----------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------+
| name     | type              | kind   | null? | default | primary key | unique key | check | expression | comment |
|----------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------|
| ID       | FLOAT             | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| CATEGORY | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| CODE_HEX | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| CODE_RGB | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| COLOR    | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| TYPE     | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
+----------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------+

But what if we want the case of the generated view column names to match that of the JSON document elements? It turns out that the fix is relatively easy. What we need to do is enclose the view column name aliases within double quotes, like so:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::FLOAT as "ID", 
json_data:"category"::STRING as "category", 
json_data:"code"."hex"::STRING as "code_hex", 
json_data:"code"."rgb"::STRING as "code_rgb", 
json_data:"color"::STRING as "color", 
json_data:"type"::STRING as "type"
FROM mydatabase.public.colors;

The case of the generated view's columns will now match that of the JSON document elements.

We’ll implement this by adding a COLUMN_CASE parameter to the stored procedure definition:

create or replace procedure create_view_over_json (TABLE_NAME varchar, COL_NAME varchar, VIEW_NAME varchar, COLUMN_CASE varchar)

I happen to like self-describing parameters, so we'll give the user who calls this stored procedure the ability to specify either of these two settings for this parameter:

  • 'uppercase cols' : The generated view column names will be uppercased (the current behavior)
  • 'match col case' : The generated view column names will match those of the JSON document elements

Since these parameter settings differ in their first character, we can simply check to see if the first character of the COLUMN_CASE parameter is set to 'M' and, if it is, add double quotes around the column aliases that get generated for the view creation DDL.

First, here’s the parameter check, with assignment to the alias_dbl_quote variable:

var alias_dbl_quote = "";
if (COLUMN_CASE.toUpperCase().charAt(0) == 'M') {
   alias_dbl_quote = "\""; }

Later, as we loop through the elements in the JSON documents and build column expressions, we’ll include a reference to the alias_dbl_quote variable:

while (element_res.next()) {
      col_list += COL_NAME + ":" + element_res.getColumnValue(1);                       // Path name
      col_list += "::" + element_res.getColumnValue(2);                                 // Datatype
      col_list += " as " + alias_dbl_quote + element_res.getColumnValue(3) + alias_dbl_quote;  // Alias
     }

It turns out that this ability to enclose view column aliases with double quotes solves another potentially thorny problem.  Sometimes JSON documents contain elements whose names are actually reserved words, which can cause SQL compilation errors to be thrown during the CREATE VIEW execution. Here’s an example of what that might look like:

{
  "ID": 1,
  "color": "black",
  "category": "hue",
  "table": "primary",
  "code": {
    "rgb": "255,255,255",
    "hex": "#000"
  }
}

We’d get an error if we tried to generate a view that had TABLE as a column name:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::FLOAT as ID, 
json_data:"category"::STRING as category, 
json_data:"code"."hex"::STRING as code_hex, 
json_data:"code"."rgb"::STRING as code_rgb, 
json_data:"color"::STRING as color, 
json_data:"table"::STRING as table
json_data:"ID"::STRING as ID
FROM mydatabase.public.colors;

The easiest work-around in this case is to enclose those column aliases with double quotes – and we now have that option with our new COLUMN_CASE parameter:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::FLOAT as "ID", 
json_data:"category"::STRING as "category", 
json_data:"code"."hex"::STRING as "code_hex", 
json_data:"code"."rgb"::STRING as "code_rgb", 
json_data:"color"::STRING as "color", 
json_data:"table"::STRING as "table"
FROM mydatabase.public.colors;

Column Types

Our existing stored procedure sets the datatype of each view column so it matches that of the underlying data. However, sometimes there are cases where multiple JSON documents have attributes with the same name but actually contain data with different datatypes. For example, one document might have an attribute that contains the value "1" while another document has the same attribute with a value of "none". Here's what that might look like:

{
  "ID": 1,
  "color": "black",
  "category": "hue",
  "type": "primary",
  "code": {
    "rgb": "255,255,255",
    "hex": "#000"
  }
},{
  "ID": "none",
  "color": "white",
  "category": "value",
  "code": {
    "rgb": "0,0,0",
    "hex": "#FFF"
  }
}

The current stored procedure would try to generate a view containing two columns of the same name, but with different datatypes:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::FLOAT as ID, 
json_data:"ID"::STRING as ID,
json_data:"category"::STRING as category, 
json_data:"code"."hex"::STRING as code_hex, 
json_data:"code"."rgb"::STRING as code_rgb, 
json_data:"color"::STRING as color, 
json_data:"type"::STRING as type
FROM mydatabase.public.colors

That obviously won't work. There are probably a couple of different ways around this problem. For the sake of simplicity (or maybe because I'm just basically lazy) I've chosen to add a parameter that disables matching each view column's datatype to that of the underlying data, instead forcing all datatypes to STRING (or ARRAY, as we'll discuss later). We'll start off by adding a COLUMN_TYPE parameter to the stored procedure definition:

create or replace procedure create_view_over_json (TABLE_NAME varchar, COL_NAME varchar, VIEW_NAME varchar, COLUMN_CASE varchar, COLUMN_TYPE varchar)

Next, we'll give the user who calls this stored procedure the ability to specify either of these settings for this parameter:

  • 'match datatypes' : The generated view column datatypes will match those of the corresponding JSON data attributes (the current behavior)
  • 'string datatypes' : The generated view column datatypes will be set to STRING or ARRAY

Once again, since these parameter settings differ in their first character, we can simply check to see if the first character of the COLUMN_TYPE parameter is set to 'S' and – if so – set the datatypes of the columns that get generated for the view creation DDL to STRING or ARRAY.

First, here’s the parameter check, with assignment to the attribute_type variable:

if (COLUMN_TYPE.toUpperCase().charAt(0) == 'S') {
   attribute_type = "DECODE (typeof(f.value),'ARRAY','ARRAY','STRING')"; }

You might recall from the first blog post that this variable was originally defined like so:

var attribute_type = "DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING')";

So, what we’re doing is overriding that definition with this new one, which simply returns either STRING or ARRAY for the column datatype.  Here’s what that looks like for our current example:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::STRING as ID,
json_data:"category"::STRING as category, 
json_data:"code"."hex"::STRING as code_hex, 
json_data:"code"."rgb"::STRING as code_rgb, 
json_data:"color"::STRING as color, 
json_data:"type"::STRING as type
FROM mydatabase.public.colors

Yes, it’s a lazy solution, but it effectively solves the problem!

Arrays

JSON documents can contain both simple and object arrays, and the current stored procedure simply returns these as view columns of datatype ARRAY.  Here’s an example of what a simple array might look like – I’ll simply modify our current example dataset like so:

{
  "ID": 1,
  "color": "black",
  "category": "hue",
  "type": "primary",
  "code": {
    "rgb": [255,255,255],
    "hex": "#000"
  }
}

The existing procedure will generate the corresponding view column with a datatype of ARRAY:

desc view colors_vw;
+----------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------+
| name     | type              | kind   | null? | default | primary key | unique key | check | expression | comment |
|----------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------|
| ID       | FLOAT             | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| CATEGORY | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| CODE_HEX | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| CODE_RGB | ARRAY             | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| COLOR    | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| TYPE     | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
+----------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------+

This could present problems for end users, because when they access the data in this view they'll get each array back as a single column value (something like [255,255,255]) rather than as individual elements.

It would be desirable to have the contents of these arrays exposed as separate view columns; that way, end users can look at individual array elements as columnar data rather than being forced to dig around within an array construct.  In order to do this, we’ll need a second query that can extract these individual array elements whenever we come across an element of datatype ARRAY. Here’s how we’ll build it:

var array_query = "SELECT DISTINCT \n"+
                 path_name + " AS path_name, \n" +
                 attribute_type + " AS attribute_type, \n" +
                 alias_name + " AS attribute_name, \n" +
                 "f.index \n" +
                 "FROM \n" + 
                 TABLE_NAME + ", \n" +
                 "LATERAL FLATTEN(" + COL_NAME + ":" + element_res.getColumnValue(1) + ", RECURSIVE=>true) f \n" +
                 "WHERE REGEXP_REPLACE(f.path, '.+(\\\\w+\\\\[.+\\\\]).+', 'SubArrayEle') != 'SubArrayEle' ";  // Don't return elements of nested arrays

Note that this second query looks a lot like the query that returns the elements and their datatypes that we explored in the first blog post.  Here are the main differences between this query and that one:

  • The text of this query is assigned to array_query, while the text of the other query is assigned to element_query.  The statement object and result sets will be similarly named in order to keep everything separate.
  • This query includes a reference to index, which is a column returned by the LATERAL FLATTEN that provides the index of each array element returned by the query. We’ll use this to generate both the element path and the view column alias.
  • The LATERAL FLATTEN in this query includes a reference to the current (outer) element whose datatype was found to be ARRAY – it’s contained within element_res.getColumnValue(1).  That’s the key to being able to seek out the elements of this particular array.

Notice that this query includes a WHERE clause that effectively prevents the return of elements that might be present in any nested arrays. Suffice it to say that handling nested arrays gets pretty complicated, so we'll be forced to codify them as columns of type ARRAY (in other words, we're "exploding out" only the first-level arrays here).
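To see how that filter behaves, here's a quick illustration using a couple of hypothetical f.path values. A first-level array element survives the filter, while an element sitting inside a nested array gets screened out:

select REGEXP_REPLACE('[2]', '.+(\\w+\\[.+\\]).+', 'SubArrayEle');                -- Returns '[2]' (kept)
select REGEXP_REPLACE('phone[0].lines[1]', '.+(\\w+\\[.+\\]).+', 'SubArrayEle');  -- Returns 'SubArrayEle' (filtered out)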

We’ll execute this query in a similar fashion to how we executed the element_query in the first blog post:

var array_stmt = snowflake.createStatement({sqlText:array_query});
var array_res = array_stmt.execute();

We can now loop through the array elements returned by this query and build view column expressions that look like this:  

col_name:"code"."rgb"[0]::FLOAT as code_rgb_0

The mechanism we’ll use is very similar to that which was described for the element query in the first blog post.  Here it is:

while (array_res.next()) {
     simple_array_col_list += COL_NAME + ":" + element_res.getColumnValue(1);    // Path name
     simple_array_col_list += "[" + array_res.getColumnValue(4) + "]";           // Array index
     simple_array_col_list += "::" + array_res.getColumnValue(2);                // Datatype
     simple_array_col_list += " as " + alias_dbl_quote + element_res.getColumnValue(3) + "_" + array_res.getColumnValue(4) + alias_dbl_quote;   // Alias
     }

Note the reference to element_res.getColumnValue(1), which gives us the current (outer) element whose datatype was found to be ARRAY. Again, this is the key to being able to seek out the elements of this particular array. Note also that the current array element's index (returned by that index column) is referenced as array_res.getColumnValue(4) in the path definition, and is also used to generate the column alias. Incidentally, that column alias incorporates the outer array element's alias (element_res.getColumnValue(3)) as a prefix, which helps ensure the resulting view column name is unique.

Once the loop completes, the contents of simple_array_col_list are appended to the view column list contained by the col_list string.
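In code, that append might look something like this (a sketch; the exact housekeeping differs in the full listing, but the comma handling is the important part):

if (simple_array_col_list != "") {
   col_list += ", \n" + simple_array_col_list; }   // Append the array columns to the main column list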

Now that we've got the necessary logic in place, the stored procedure is capable of creating a view that exposes the contents of the array's elements as separate columns. Here's what that would look like for our current example:

CREATE OR REPLACE VIEW mydatabase.public.colors_vw AS 
SELECT 
json_data:"ID"::FLOAT as ID, 
json_data:"category"::STRING as category, 
json_data:"code"."hex"::STRING as code_hex, 
json_data:"code"."rgb"[0]::FLOAT as code_rgb_0, 
json_data:"code"."rgb"[1]::FLOAT as code_rgb_1, 
json_data:"code"."rgb"[2]::FLOAT as code_rgb_2, 
json_data:"color"::STRING as color, 
json_data:"type"::STRING as type
FROM mydatabase.public.colors;

When we describe the view, the CODE_RGB_0, CODE_RGB_1, and CODE_RGB_2 columns now appear as separate FLOAT columns in place of the single CODE_RGB array column we saw earlier.

As a side note, you may remember from the first blog post that I mentioned that view column datatypes will essentially be limited to ARRAY, BOOLEAN, FLOAT, and STRING.  The reason for doing this is to prevent cases where the same element is returned with multiple datatypes similar to what we explored in the Column Types section above.  For example, suppose another instance of this array elsewhere in the JSON document set looked like this: [10.3, 255, 255]. That would cause us to end up with two separate view columns of the same name (CODE_RGB_0) but with datatypes of INTEGER and FLOAT, thus causing an error to be thrown when the stored procedure tries to run the view creation DDL.  So, the best (and, once again, easiest) solution is to cast all numeric datatypes to FLOAT – and that’s why you see FLOAT as the datatype for the numerics in this example’s array.

Finally, here’s what the end result looks like when queried by an end user:

select * from colors_vw;
+----+----------+----------+------------+------------+------------+--------+-----------+
| ID | CATEGORY | CODE_HEX | CODE_RGB_0 | CODE_RGB_1 | CODE_RGB_2 | COLOR  | TYPE      |
|----+----------+----------+------------+------------+------------+--------+-----------|
|  1 | hue      | #000     |        255 |        255 |        255 | black  | primary   |
|  2 | value    | #FFF     |          0 |          0 |          0 | white  | NULL      |
|  3 | hue      | #FF0     |        255 |          0 |          0 | red    | primary   |
|  4 | hue      | #00F     |          0 |          0 |        255 | blue   | primary   |
|  5 | hue      | #FF0     |        255 |        255 |          0 | yellow | primary   |
|  6 | hue      | #0F0     |          0 |        255 |          0 | green  | secondary |
+----+----------+----------+------------+------------+------------+--------+-----------+

Object Arrays

Object arrays add a level of complexity to the mix. Here's an example of what they might look like in some JSON source data (this sample corresponds to the contacts dataset whose view and query results appear below):
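
{
  "ID": 1,
  "primary": false,
  "name": {
    "first": "Jack",
    "last": "Doe"
  },
  "contact": {
    "email": "jsmith@home.com",
    "phone": [
      {
        "type": "work",
        "number": "404-555-1234"
      },
      {
        "type": "mobile",
        "number": "770-555-1234"
      }
    ]
  }
}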

The difference between object arrays and (what I’m referring to as) simple arrays is that object array elements are referenced by name rather than by index. Consequently, we have to construct the view column expressions accordingly. For example:

a1.value:"type"::STRING as "phone_type",
a1.value:"number"::STRING as "phone_number"

Now compare this to the view column expression for the simple array (above) and you'll notice two important differences:

  • The element values are referenced by name ("type" and "number", in this case)
  • An a1.value reference has been introduced in place of the name of the VARIANT column in the JSON source table 

It turns out that we’ll need to include a LATERAL FLATTEN in the generated view definition for each object array that we find in the source JSON structure.  We’ll then alias each of those LATERAL FLATTENs as well – in this case, that alias happens to be a1 since it’s the first (and, in this case, only) object array in the source JSON structure.  Had there been two object arrays then we would have two LATERAL FLATTENs in the generated view definition, with the first aliased as a1 and the second aliased as a2.  Likewise for additional object arrays.

Consequently, the mechanism we use to build view column expressions for object arrays looks a little different from that for simple arrays. Here's a sketch of what that loop looks like (treat the array_num alias counter and the exact column positions as illustrative rather than the verbatim procedure code):
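
while (array_res.next()) {
     if (object_array_col_list != "") {
        object_array_col_list += ", \n"; }
     object_array_col_list += "a" + array_num + ".value:";           // Alias of this array's LATERAL FLATTEN (a1, a2, ...)
     object_array_col_list += array_res.getColumnValue(1);           // Element name within the object array
     object_array_col_list += "::" + array_res.getColumnValue(2);    // Datatype
     object_array_col_list += " as " + alias_dbl_quote + element_res.getColumnValue(3) + "_" + array_res.getColumnValue(3) + alias_dbl_quote;  // Alias, prefixed for uniqueness (e.g. "contact_phone_number")
     }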

Once the loop completes, the contents of object_array_col_list are appended to the view column list contained by the col_list string. What's different, though, is the inclusion of the LATERAL FLATTENs discussed earlier. Rather than get into the intricacies of the code, I'll just show you what gets generated for our example JSON dataset:

CREATE OR REPLACE VIEW mydatabase.public.contacts_vw AS 
SELECT 
json_data:"ID"::FLOAT as "ID", 
json_data:"primary"::BOOLEAN as "primary", 
json_data:"val"::FLOAT as "val", 
json_data:"name"."first"::STRING as "name_first", 
json_data:"name"."last"::STRING as "name_last",
a1.value:"number"::STRING as "contact_phone_number", 
a1.value:"type"::STRING as "contact_phone_type", 
json_data:"contact"."email"::STRING as "contact_email"
FROM mydatabase.public.contacts,
 LATERAL FLATTEN(json_data:"contact"."phone") a1;

Notice how that LATERAL FLATTEN got added at the bottom of the generated view definition, along with the a1 alias.  Again, had there been additional object arrays in the JSON structure then they’d get their own LATERAL FLATTENs (along with corresponding aliases).  
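For example, had the documents also contained a hypothetical second object array (say, an "address" array at the document root), the generated FROM clause would simply gain a second FLATTEN:

FROM mydatabase.public.contacts,
 LATERAL FLATTEN(json_data:"contact"."phone") a1,
 LATERAL FLATTEN(json_data:"address") a2;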

Here’s what the resulting view looks like:

desc view contacts_vw;
+----------------------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------+
| name                 | type              | kind   | null? | default | primary key | unique key | check | expression | comment |
|----------------------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------|
| ID                   | FLOAT             | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| primary              | BOOLEAN           | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| val                  | FLOAT             | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| name_first           | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| name_last            | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| contact_phone_number | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| contact_phone_type   | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
| contact_email        | VARCHAR(16777216) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    |
+----------------------+-------------------+--------+-------+---------+-------------+------------+-------+------------+---------+

And the data:

select * from contacts_vw;
+----+---------+----------+------------+-----------+----------------------+--------------------+----------------------+
| ID | primary |      val | name_first | name_last | contact_phone_number | contact_phone_type | contact_email        |
|----+---------+----------+------------+-----------+----------------------+--------------------+----------------------|
|  1 | False   |     NULL | Jack       | Doe       | 404-555-1234         | work               | jsmith@home.com      |
|  1 | False   |     NULL | Jack       | Doe       | 770-555-1234         | mobile             | jsmith@home.com      |
|  2 | True    | 12342.56 | Diane      | Smith     | 678-555-5678         | home               | jane.doe@company.com |
|  2 | True    | 12342.56 | Diane      | Smith     | 404-555-5678         | work               | jane.doe@company.com |
+----+---------+----------+------------+-----------+----------------------+--------------------+----------------------+

It's worth noting here that we're getting two records for each of the contacts in our example – one for each object array record. That's intentional, because object arrays can be composed of an arbitrary number of elements (with unique names) and records. A sufficiently large number of individual object array records – if we were to represent them as individual view columns – could result in the creation of a view with an untenable number of columns. Thus, the approach I've taken with this stored procedure is to represent unique instances of object array elements as view columns, with each object array record materialized as a corresponding view record. The result is effectively a Cartesian product. I suppose this is one of the rare instances where one could make the case that a Cartesian product is the preferable outcome. In any case, that's the approach I ultimately decided to take, and my reasoning behind it.

Multi-schema Support

An important thing to keep in mind is that there are cases where multiple JSON document schemas exist within a given semi-structured JSON dataset. A common solution in this case is to build multiple views over such a dataset, each having a WHERE clause that returns the records for the schema of interest. Note that this stored procedure does not, however, handle those cases. In other words, the views it generates reflect "flattened" versions of all the underlying JSON document schemas found within the given dataset.

If this describes your use case, all is not lost. You can still use this stored procedure as a starting point. One solution you might consider would be implementing a pre-process step whereby JSON document records are written to separate columns (or tables) based on their schema prior to generating corresponding views with this procedure, possibly through the use of materialized views. An alternative approach would be to add WHERE clauses to this procedure’s element and array queries so as to isolate the proper JSON schema for each execution (note that this would require inclusion of corresponding WHERE clauses in the generated views as well). Either way, it can be done. I hope this stored procedure helps you get there faster.
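To illustrate the second approach, a hand-tuned view for one schema might look something like this (with the "contact" attribute serving as a hypothetical schema discriminator):

CREATE OR REPLACE VIEW mydatabase.public.contacts_vw AS 
SELECT 
json_data:"ID"::FLOAT as "ID", 
json_data:"contact"."email"::STRING as "contact_email"
FROM mydatabase.public.contacts
WHERE json_data:"contact" IS NOT NULL;   -- Only documents matching this schema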

What About Other Semi-Structured Data Formats?

Snowflake provides excellent support for multiple semi-structured data formats, including Avro, Parquet, ORC, and XML. The stored procedure presented here should be able to work with most of these, although I haven't rigorously tested it for those use cases as of this writing. If changes are required to support those formats, they'd likely need to be made in the regular expressions that build the path name, datatype, and alias portions of the queries.

Putting It All Together

If you created the version of the stored procedure from the first blog post of this series, you might consider removing that version before creating the updated version below. Otherwise, you’ll actually end up with two separate versions of this stored procedure that are differentiated by the number of parameters each requires.  Here’s how to get rid of that prior version:

drop procedure if exists CREATE_VIEW_OVER_JSON(VARCHAR, VARCHAR, VARCHAR);

And now here's the code for the finished stored procedure, incorporating the functionality described in both blog posts. What follows is a condensed rendering that stitches together the snippets we've walked through; the object-array handling follows the same pattern sketched above and is omitted here to keep the listing short:
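create or replace procedure create_view_over_json (TABLE_NAME varchar, COL_NAME varchar, VIEW_NAME varchar, COLUMN_CASE varchar, COLUMN_TYPE varchar)
returns varchar
language javascript
as
$$
// Expressions used by the element query (as defined in the first blog post):
// a double-quoted path, a coarse datatype (ARRAY/BOOLEAN/FLOAT/STRING), and a column alias.
var path_name = "regexp_replace(regexp_replace(f.path,'\\\\[(.+)\\\\]'),'(\\\\w+)','\"\\\\1\"')";
var attribute_type = "DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING')";
var alias_name = "REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\\\[(.+)\\\\]'),'[^a-zA-Z0-9]','_')";

// COLUMN_CASE: 'match col case' => double-quote aliases so their case is preserved.
var alias_dbl_quote = "";
if (COLUMN_CASE.toUpperCase().charAt(0) == 'M') {
   alias_dbl_quote = "\""; }

// COLUMN_TYPE: 'string datatypes' => force all datatypes to STRING (or ARRAY).
if (COLUMN_TYPE.toUpperCase().charAt(0) == 'S') {
   attribute_type = "DECODE (typeof(f.value),'ARRAY','ARRAY','STRING')"; }

// Query the distinct element paths, datatypes, and aliases found in the JSON documents.
var element_query = "SELECT DISTINCT \n" +
                    path_name + " AS path_name, \n" +
                    attribute_type + " AS attribute_type, \n" +
                    alias_name + " AS alias_name \n" +
                    "FROM \n" +
                    TABLE_NAME + ", \n" +
                    "LATERAL FLATTEN(" + COL_NAME + ", RECURSIVE=>true) f \n" +
                    "WHERE TYPEOF(f.value) != 'OBJECT' \n" +
                    "AND NOT contains(f.path, '[') ";   // Don't traverse down into arrays here

var element_res = snowflake.createStatement({sqlText:element_query}).execute();
var col_list = "";

while (element_res.next()) {
   if (element_res.getColumnValue(2) == 'ARRAY') {
      // Simple array: expose each first-level element as its own view column.
      var array_query = "SELECT DISTINCT \n" +
                        path_name + " AS path_name, \n" +
                        attribute_type + " AS attribute_type, \n" +
                        alias_name + " AS attribute_name, \n" +
                        "f.index \n" +
                        "FROM \n" +
                        TABLE_NAME + ", \n" +
                        "LATERAL FLATTEN(" + COL_NAME + ":" + element_res.getColumnValue(1) + ", RECURSIVE=>true) f \n" +
                        "WHERE REGEXP_REPLACE(f.path, '.+(\\\\w+\\\\[.+\\\\]).+', 'SubArrayEle') != 'SubArrayEle' ";  // Don't return elements of nested arrays
      var array_res = snowflake.createStatement({sqlText:array_query}).execute();
      while (array_res.next()) {
         if (col_list != "") { col_list += ", \n"; }
         col_list += COL_NAME + ":" + element_res.getColumnValue(1);   // Path name
         col_list += "[" + array_res.getColumnValue(4) + "]";          // Array index
         col_list += "::" + array_res.getColumnValue(2);               // Datatype
         col_list += " as " + alias_dbl_quote + element_res.getColumnValue(3) + "_" + array_res.getColumnValue(4) + alias_dbl_quote;  // Alias
         }
      }
   else {
      if (col_list != "") { col_list += ", \n"; }
      col_list += COL_NAME + ":" + element_res.getColumnValue(1);      // Path name
      col_list += "::" + element_res.getColumnValue(2);                // Datatype
      col_list += " as " + alias_dbl_quote + element_res.getColumnValue(3) + alias_dbl_quote;  // Alias
      }
   }

// Assemble and execute the view creation DDL.
var view_ddl = "CREATE OR REPLACE VIEW " + VIEW_NAME + " AS \n" +
               "SELECT \n" + col_list + "\n" +
               "FROM " + TABLE_NAME;
snowflake.createStatement({sqlText:view_ddl}).execute();
return "Created view: " + VIEW_NAME;
$$;

And here's a usage example, run against the colors table from our earlier examples:

call create_view_over_json('mydatabase.public.colors', 'json_data', 'mydatabase.public.colors_vw', 'match col case', 'string datatypes');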

Take a look at the History tab of the Snowflake web interface after calling this stored procedure. You'll see at least three SQL statements being executed: the stored procedure call itself, then beneath that the query that returns the JSON document elements and their datatypes. You'll also see one or more queries associated with the arrays in your JSON document structure (if any are present). Finally, you should see the view creation DDL that was generated; it'll start with "CREATE OR REPLACE VIEW".

Final Thoughts

My primary goal for these blog posts was to provide an automated process for generating views over semi-structured JSON data. But my secondary goal was to demonstrate how SQL statements can be dynamically generated and executed by Snowflake stored procedures. I hope the code I’ve provided here will prove helpful to the Snowflake community for both of these scenarios.