How to Easily Load and Query XML Data with Snowflake (Part 2)

Author: Seeling Cheung | Contributing Authors: Michael Nixon

Engineering, How to Use Snowflake

With this blog, we conclude our two-part series on how to easily query XML with Snowflake SQL.

Picking up where we left off with Part 1, with the XML data loaded, you can query the data in a fully relational manner, expressing queries with robust ANSI SQL. We can then easily issue SQL queries to gain insight into the data without transforming or pre-processing the XML.

This blog outlines a few sample queries to easily analyze the data using the various XML functions and the semi-structured data querying feature called LATERAL FLATTEN. Powerful FLATTEN LATERAL capabilities enable you to access the inherent hierarchical structures within the XML data.

Using XML element and attribute functions

Snowflake provides a number of SQL functions to access XML constructs like element and attributes of XML documents. The following queries show some usage examples of these functions. For a quick glance of all of the XML documents loaded into our table, execute the following query:

  • SELECT src_xml FROM treasury_auction_xml
  • This query returns the full XML document in the XML column
    • Example output:

XML element and attribute functions

To access the attribute of an element, leverage the “@” operator in the following query:

  • SELECT src_xml:”@” FROM treasury_auction_xml;
  • This query returns the name of root element of the XML document in XML column
    • Example output: “td:AuctionData”
  • The following query leverages the “$” operator to query the contents of the root element of an
    XML document

    • SELECT src_xml:”$” FROM treasury_auction_xml;
    • This query returns the contents of the root element of the XML document in the XML column
    • Example output:

Flexible XMLGET function

Using the flexible XMLGET function

The following query retrieves the first element contained within the root element of the XML document:

  • Select XMLGET(src_xml, ‘AuctionAnnouncement’, 0) FROM treasury_auction_xml.
  • This query returns the contents of the root element of the XML document in the XML column
    • Example output:

  • The following query retrieves all the contents of the first “AuctionAnnouncement” element:
    • SELECT XMLGET(src_xml, ‘AuctionAnnouncement’, 0):”$” FROM treasury_auction_xml;
    • This query returns all the contents of the first element under the root element
    • Example output:

Lateral Flattening

Using the powerful FLATTEN LATERAL capabilities

The powerful FLATTEN LATERAL table-value function is one of the most fundamental mechanism offered by Snowflake for querying and exploring semi-structured data. If you are familiar with the unnest concept function in relational database management, then you can think of FLATTEN along the same lines. Essentially, it returns a “virtual table” of {zero, one or many} values that are subsequently joined against the current row to yield {zero, one or many} rows that contain all the attributes of the original row plus the value from the virtual table.

In the following queries, we offer some examples on how to leverage this powerful functionality:

  • Let’s consider a query that we can use to return a row for each element included in the announcement of a Treasury security. Examples of element may include security term, offering amount, issue date, amount awarded, discount rate or interest rate.
    • SELECT
      auction_announcement.index as auction_contents_index,
      auction_announcement.value as auction_contents_value
      FROM treasury_auction_xml,
      LATERAL FLATTEN(to_array(treasury_auction_xml.src_xml:”$” )) xml_doc,
      LATERAL FLATTEN(to_array(xml_doc.VALUE:”$” )) auction_announcement
    • In this query, LATERAL FLATTEN takes an XML input and explode the list of sub-elements into its own (denormalized) row
    • Example output:

Auction Announcement Element

 

  • This example combines the usage of XMLGet with LATERAL function. LATERAL FLATTEN iterates through the child-elements of the XML document and XMLGet maps the child element “SecurityType”, or “MaturityDate”, or “OfferingAmount”, or “MatureSecurityAmount” to its own column.
    • Select XMLGet(value, ‘SecurityType’ ):”$” as “Security Type”,
      XMLGET( value, ‘MaturityDate’ ):”$” as “Maturity Date”,
      XMLGET( value, ‘OfferingAmount’ ):”$” as “Offering Amount”,
      XMLGET( value, ‘MatureSecurityAmount’ ):”$” as “Mature Security Amount” from treasury_auction_xml,
      LATERAL FLATTEN(to_array(treasury_auction_xml.src_xml:”$” )) auction_announcement;
    • In this query, LATERAL FLATTEN iterates through the child-elements of the XML document and XMLGet maps the child element “SecurityType”, or “MaturityDate”, or “OfferingAmount”, or “MatureSecurityAmount” to its own column.
    • Example output:

XMLGet with LATERAL function

Summary

With these steps and query examples over both blogs in this series, we demonstrate how straightforward it is to ingest XML data. Snowflake natively ingests semi-structured data and enables you to immediately query the data with robust ANSI SQL, without the need to first transform or pre-process the data. This approach also dramatically simplifies the process to work with semi-structured data by eliminating data preparation steps. This approach also dramatically simplifies your data architecture by eliminating the need for a second data processing platform, such as Hadoop.

Consolidate your diverse data onto Snowflake and receive a cloud-built, easy to use, software-driven SQL data warehouse-as-a-service, to boot, for your analytics.

Start Your 30-Day Free Trial. Receive $400 of credits to try all Snowflake features.