Product and Technology

How to Easily Load and Query XML Data with Snowflake (Part 2)

How to Easily Load and Query XML Data with Snowflake (Part 2)

PLEASE NOTE: This post was originally published in July 2018. It has been updated to reflect currently available products, features, and functionality.

This blog post is part 2 in our 2-part series on how to easily query XML with Snowflake SQL. Picking up where we left off with Part 1, once you have the XML data loaded, you can query the data in a fully relational manner, expressing queries with robust ANSI SQL. You can then easily issue SQL queries to gain insight into the data without transforming or pre-processing the XML.

This post outlines a few sample queries to easily analyze the data using the various XML functions and the semi-structured data querying feature called LATERAL FLATTEN. Powerful LATERAL FLATTEN capabilities enable you to access the inherent hierarchical structures within the XML data.

Using the XML element and attribute functions

Snowflake provides a number of SQL functions to access XML constructs such as elements and attributes of XML documents. The following queries show some usage examples of these functions. For a quick glance at all of the XML documents loaded into our table, execute the following query:

SELECT src_xml FROM treasury_auction_xml;

  • This query returns the full XML document in the XML column.
    • Example output:
XML element and attribute functions

To access the attribute of an element, leverage the “@” operator in the following query:

SELECT src_xml:"@" FROM treasury_auction_xml;

  • This query returns the name of the root element of the XML document in the XML column.
    • Example output:

"td:AuctionData"

The following query leverages the “$” operator to query the contents of the root element of the XML document:

SELECT src_xml:"$" FROM treasury_auction_xml;

  • This query returns the contents of the root element of the XML document in the XML column.
    • Example output:
Flexible XMLGET function
Using the flexible XMLGET function

The following query retrieves the first element contained within the root element of the XML document:

SELECT XMLGET(src_xml, 'AuctionAnnouncement', 0) FROM treasury_auction_xml;

  • This query returns the contents of the root element of the XML document in the XML column.
    • Example output:

The following query retrieves all the contents of the first “AuctionAnnouncement” element:

SELECT XMLGET(src_xml, 'AuctionAnnouncement', 0):"$" FROM treasury_auction_xml;

  • This query returns all the contents of the first element under the root element.
    • Example output:
Lateral Flattening
Using the powerful LATERAL FLATTEN capabilities

The powerful LATERAL FLATTEN table-value function is one of the most fundamental mechanisms offered by Snowflake for querying and exploring semi-structured data. If you are familiar with the concept of the SQL UNNEST function in relational database management, then you can think of FLATTEN along the same lines. Essentially, it returns a “virtual table” of {zero, one or many} values that are subsequently joined against the current row to yield {zero, one or many} rows that contain all the attributes of the original row plus the value from the virtual table.

In the following queries, we offer some examples on how to leverage this powerful functionality.

Let’s consider a query that we can use to return a row for each element included in the announcement of a Treasury security. Examples of elements include security term, offering amount, issue date, amount awarded, discount rate, or interest rate.

SELECT

auction_announcement.index as auction_contents_index,

auction_announcement.value as auction_contents_value

FROM treasury_auction_xml,

LATERAL FLATTEN(to_array(treasury_auction_xml.src_xml:"$" )) xml_doc,

LATERAL FLATTEN(to_array(xml_doc.VALUE:"$" )) auction_announcement;

  • In this query, LATERAL FLATTEN takes an XML input and explodes the list of sub-elements into its own (denormalized) row.
    • Example output (Snowflake web interface):
Auction Announcement Element

This example combines the usage of XMLGET with the LATERAL function:

SELECT XMLGET(value, 'SecurityType' ):"$" as "Security Type",

XMLGET( value, 'MaturityDate' ):"$" as "Maturity Date",

XMLGET( value, 'OfferingAmount' ):"$" as "Offering Amount",

XMLGET( value, 'MatureSecurityAmount' ):"$" as "Mature Security Amount"

FROM treasury_auction_xml,

LATERAL FLATTEN(to_array(treasury_auction_xml.src_xml:"$" )) auction_announcement;

  • In this query, LATERAL FLATTEN iterates through the child-elements of the XML document and XMLGET maps the child element “SecurityType,” or “MaturityDate,” or “OfferingAmount,” or “MatureSecurityAmount” to its own column.
    • Example output (Snowflake web interface):
XMLGet with LATERAL function
Summary

With the steps and query examples we’ve shared in this 2-part series, we have demonstrated how straightforward it is to ingest XML data. Snowflake natively ingests semi-structured data and enables you to immediately query the data with robust ANSI SQL, without the need to first transform or pre-process the data. This approach also dramatically simplifies the process to work with semi-structured data by eliminating data preparation steps, and simplifies your data architecture by eliminating the need for a second data processing platform.

Consolidate your diverse data onto Snowflake and receive an easy to use, software-driven SQL data warehouse as a service, to boot, for your analytics.

Start Your 30-Day Free Trial during which you can receive $400 of credits to try Snowflake’s features.

Cloud Data Engineering for Dummies

Share Article

Handle Unpredictable or Ad Hoc Queries with the Query Acceleration Service

Our Query Acceleration Service is now generally available—which is great for customers handling mixed workloads, unpredictable query data volumes, and ad hoc queries.

New Approaches to Visualizing Snowflake Query Statistics with Snowflake Technology Partners

In this post, we showcase our amazing Snowflake technology partners and their ways of using the new query function within their respective products.

Load XML with SQL Review Part One | Snowflake Blog

Review how to load XML data into Snowflake and query with ANSI-standard SQL. Doing so will enable you to have full relational database capabilities for XML.

Understanding Snowflake Utilization Part 3 - Query Profiling

In this post, we will deep-dive into understanding query profiling. To do this, we will show you examples using the QUERY_HISTORY family of functions.

Synthetic Data Generation at Scale – Part 2

In the first part of this blog series, I showed how to automate data generation in Snowflake using standard SQL statements.

How to Connect to Snowflake with JDBC

JDBC lets you connect any custom or packaged Java based applications to Snowflake and access information in SQL databases using standard SQL queries.

Continued Investments in Price Performance and Faster Top-K Queries

Our optimization delivers fast results, with some customer queries reaching 99.8%.

Concurrent Load and Query | Snowflake Data Warehousing Blog

In most data warehouses, workloads compete to use the same resources. Snowflake’s unique data architecture fixes this issue with concurrent load and query.

The Dream Data Warehouse Development Environment

Leveraging Snowflake’s SAMPLE clause considerably reduces the amount of test dataset for a developer without losing data integrity.

Subscribe to our blog newsletter

Get the best, coolest and latest delivered to your inbox each week

Where Data Does More

  • 30-day free trial
  • No credit card required
  • Cancel anytime