PLEASE NOTE: This post was originally published in July 2018. It has been updated to reflect currently available products, features, and functionality.

This blog post is part 2 in our 2-part series on how to easily query XML with Snowflake SQL. Picking up where we left off with Part 1, once you have the XML data loaded, you can query the data in a fully relational manner, expressing queries with robust ANSI SQL. You can then easily issue SQL queries to gain insight into the data without transforming or pre-processing the XML.

This post outlines a few sample queries to easily analyze the data using the various XML functions and the semi-structured data querying feature called LATERAL FLATTEN. Powerful LATERAL FLATTEN capabilities enable you to access the inherent hierarchical structures within the XML data.

Using the XML element and attribute functions

Snowflake provides a number of SQL functions to access XML constructs such as elements and attributes of XML documents. The following queries show some usage examples of these functions. For a quick glance at all of the XML documents loaded into our table, execute the following query:

SELECT src_xml FROM treasury_auction_xml;

  • This query returns the full XML document in the XML column.
    • Example output:
XML element and attribute functions

To access the attribute of an element, leverage the “@” operator in the following query:

SELECT src_xml:”@” FROM treasury_auction_xml;

  • This query returns the name of the root element of the XML document in the XML column.
    • Example output:

“td:AuctionData”

The following query leverages the “$” operator to query the contents of the root element of the XML document:

SELECT src_xml:”$” FROM treasury_auction_xml;

  • This query returns the contents of the root element of the XML document in the XML column.
    • Example output:
Flexible XMLGET function
Using the flexible XMLGET function

The following query retrieves the first element contained within the root element of the XML document:

SELECT XMLGET(src_xml, ‘AuctionAnnouncement’, 0) FROM treasury_auction_xml;

  • This query returns the contents of the root element of the XML document in the XML column.
    • Example output:

The following query retrieves all the contents of the first “AuctionAnnouncement” element:

SELECT XMLGET(src_xml, ‘AuctionAnnouncement’, 0):”$” FROM treasury_auction_xml;

  • This query returns all the contents of the first element under the root element.
    • Example output:
Lateral Flattening
Using the powerful LATERAL FLATTEN capabilities

The powerful LATERAL FLATTEN table-value function is one of the most fundamental mechanisms offered by Snowflake for querying and exploring semi-structured data. If you are familiar with the concept of the SQL UNNEST function in relational database management, then you can think of FLATTEN along the same lines. Essentially, it returns a “virtual table” of {zero, one or many} values that are subsequently joined against the current row to yield {zero, one or many} rows that contain all the attributes of the original row plus the value from the virtual table.

In the following queries, we offer some examples on how to leverage this powerful functionality.

Let’s consider a query that we can use to return a row for each element included in the announcement of a Treasury security. Examples of elements include security term, offering amount, issue date, amount awarded, discount rate, or interest rate.

SELECT

auction_announcement.index as auction_contents_index,

auction_announcement.value as auction_contents_value

FROM treasury_auction_xml,

LATERAL FLATTEN(to_array(treasury_auction_xml.src_xml:”$” )) xml_doc,

LATERAL FLATTEN(to_array(xml_doc.VALUE:”$” )) auction_announcement;

  • In this query, LATERAL FLATTEN takes an XML input and explodes the list of sub-elements into its own (denormalized) row.
    • Example output (Snowflake web interface):
Auction Announcement Element

This example combines the usage of XMLGET with the LATERAL function:

SELECT XMLGET(value, ‘SecurityType’ ):”$” as “Security Type”,

XMLGET( value, ‘MaturityDate’ ):”$” as “Maturity Date”,

XMLGET( value, ‘OfferingAmount’ ):”$” as “Offering Amount”,

XMLGET( value, ‘MatureSecurityAmount’ ):”$” as “Mature Security Amount”

FROM treasury_auction_xml,

LATERAL FLATTEN(to_array(treasury_auction_xml.src_xml:”$” )) auction_announcement;

  • In this query, LATERAL FLATTEN iterates through the child-elements of the XML document and XMLGET maps the child element “SecurityType,” or “MaturityDate,” or “OfferingAmount,” or “MatureSecurityAmount” to its own column.
    • Example output (Snowflake web interface):
XMLGet with LATERAL function
Summary

With the steps and query examples we’ve shared in this 2-part series, we have demonstrated how straightforward it is to ingest XML data. Snowflake natively ingests semi-structured data and enables you to immediately query the data with robust ANSI SQL, without the need to first transform or pre-process the data. This approach also dramatically simplifies the process to work with semi-structured data by eliminating data preparation steps, and simplifies your data architecture by eliminating the need for a second data processing platform.

Consolidate your diverse data onto Snowflake and receive an easy to use, software-driven SQL data warehouse as a service, to boot, for your analytics.

Start Your 30-Day Free Trial during which you can receive $400 of credits to try Snowflake’s features.