Modern analytical workloads often require complex transformations or augmentations that require using custom code or third-party services. However, using external services and libraries can often complicate data pipelines. To simplify using remote services, Snowflake created External Functions, which enables users to invoke external APIs and custom code from within Snowflake and blend the results into their query results.

This blog post provides an overview of the External Functions feature,and covers the latest announcements and updates.

Overview of External Functions

As shown in Figure 1, the Snowflake External Functions feature enables customers to securely call third-party applications and external custom code from within Snowflake via a cloud-native proxy, thus simplifying access to external API services such as geocoders, machine learning models, and custom code running outside of Snowflake.

Figure 1: Using external functions to call a remote service

External Functions support for AWS API Gateway & Azure API Management is now GA

Earlier this month Snowflake announced the general availability (GA) of external functions for AWS API Gateway and Azure API Management. 

External functions feature was initially released into public preview in June 2020 and it is one of the key features for building extensible data pipelines in the Snowflake Data Cloud. We are thrilled to see all the varied and interesting use cases our customers and partners are using external functions for.

Customer Example: How Sainsbury’s Uses External Functions

Sainsbury’s is the UK’s second largest retailer with over 1,400 stores and a large digital presence. The Sainsbury’s teams have leveraged External Functions to enable different use cases. In this blog, we will walk through two use cases at a high level. 

Sainsbury’s “Banana” squad have built continuous data pipelines using Snowpipe, tasks and streams with Snowflake. In order to make more robust pipelines and to take immediate actions when needed, the team wanted to monitor task executions and notify the operations team in case of a failure. This has helped reduce remediation time; crucial in a world where Sainsbury’s is ingesting millions of online events a day, billions per week; the aggregation of which contributes to key business metrics and subsequent decisioning. With External Functions, the team can now capture the reason for the task failure.  The TASK_HISTORY table is also monitored to check whether the task itself fails. This External Function, calls a processing function through an API endpoint, which sends a notification to Slack.

“For example, when we were loading some transaction data from an external stage, into Snowflake, the task failed. This was a result of the definition of the underlying table having changed. Our notification system, based on External Functions, immediately notified us of the failure and we were able to fix the issue”, explained by Joan Fuerte, Data Engineering Manager.

“External Functions provide Sainsbury’s with an excellent way of communicating with the external world from Snowflake. This has opened up a range of possibilities regarding how we are able to integrate Snowflake more seamlessly into the wider data environment and improve our data workflows”, said Fuerte.

External Functions also allow the Sainsbury’s team to implement a HTTPS asynchronous query pattern and enable an event-based orchestrated workflow around Snowflake operations. External Functions are used to implement a call-back mechanism which is triggered once a query run against Snowflake completes.

“We wanted to provide a mechanism for users to submit queries using a HTTPS API” said Ian Fawcus, Engineering Manager. A query is submitted using a stored procedure that contains a Javascript try/catch statement. Upon completion of the query, the stored procedure executes an External Function, passing the query ID and outcome (success/fail) as parameters. The External Function calls an API endpoint, integrated with a processing function, which in turn notifies the user who submitted the query. “Both the submission of the query and the External Function API call use private networks so that we can secure the end-to-end flow.”, continued Fawcus.

Here are a few other common use cases on how customers are leveraging external functions:

Machine Learning Scoring

Customers can train machine learning (ML) models using a platform of their choice and deploy the trained model as a REST service, making it available as a remote service to Snowflake. Using external functions, the model endpoint can then be invoked from within Snowflake to score data. All of this is executed with familiar SQL statements and commands. With the use of external functions, customers no longer need to export data from Snowflake to score data. By calling the external function in SQL using the Snowflake user interface, customers can update tables with predictions directly in Snowflake, simplifying ML scoring pipelines.

See these blog posts for examples of using external functions with different ML providers:

Geocoding

Geocoding provides geographical context for a string that represents a location. The string input may be broad (“California”) or specific (“450 Concar Drive, San Mateo, CA, USA”). 

A geocoder generally does three things: 

  1. Parses the location string and attempts to “understand” what it means
  2. Maps the string to a geographic location (latitude/longitude)
  3. Returns location and geographic contextual information such as the city, county, and state

Geocoders typically have sophisticated logic and are complex to implement, so customers commonly use third-party APIs such as Google’s Geocoding API, Mapbox API, HERE Geocoder API etc. Using external functions, customers can call out to these third-party geocoding services and directly append the results to database tables from a Snowflake worksheet. 

See these blog posts for examples of using external functions for geocoding:

External Tokenization

Tokenization is a process of replacing sensitive elements (for example, social security numbers and credit card numbers) with unique symbols, referred to as tokens. Tokens by themselves don’t have any exploitable value, and can be mapped back to sensitive elements by the tokenization system or service. External tokenization enables organizations to tokenize sensitive data externally using partner solutions or in-house customer-developed solutions before loading that data into Snowflake. Organizations can then dynamically detokenize data at query runtime for authorized users through masking policies that call the external tokenization service using external functions. This provides enhanced data security to the most sensitive data in an organization.

Multiple partners, including Protegrity and Microfocus Voltage, have integrated their solutions with Snowflake to provide external tokenization services.
See the Snowflake documentation or Protegrity for Snowflake solution brief for more information.

Custom business logic 

Customers can write and call their own remote services with external functions. These remote services can be written using any HTTP server stack, including cloud serverless compute services such as AWS Lambda and Azure Functions, and the programming language of choice, including Python, C#, Go, and others, making it easier to write custom business logic.

External Functions Support for Amazon API Gateway Private Endpoints Now in Public Preview

In January 2021, Snowflake announced the public preview of external functions support for Amazon API Gateway Private Endpoints. This enables Snowflake customers with higher security needs to communicate between virtual private clouds via PrivateLink. This feature requires Business Critical edition (or higher). 
For more information about Private Endpoints support see the Snowflake documentation.

Support for Asynchronous External Functions

In December 2020, Snowflake released support for asynchronous external functions, which enables users to write remote services that process requests without blocking on the initial request, and return the results in a future response. 

Asynchronous functions use polling to reduce timeout errors, which can occur when remote services are heavily loaded, data volumes are large, or remote computations take a long time to run. This feature is particularly useful for users who might call long-running remote services, such as translation services, and run into the cloud-native gateway’s service timeouts. For example, AWS API Gateway has a maximum 30-second response timeout limit. 
For more information about asynchronous external functions, see the Snowflake Documentation.

Templates and Examples

Several templates and examples are available to help you get started with External Features, as described in this section.

Cloud deployment templates

External functions require you to set up the cloud-native proxy service, security roles, and often either a Lambda or Azure function for JSON data formatting before requests can be forwarded to an external API. To make it easier to get started with external functions, we have added documentation and sample templates for deploying resources on AWS and Azure.

Snowflake External Functions github repository

In addition to the basic examples provided as part of the Snowflake documentation, see the Snowflake External Functions repository on github for additional real-world examples to help you get started. Bookmark the link and check back as the External Functions team will continue to add new examples. 

Conclusion

The Snowflake External Functions feature effectively eliminates the need to export and reimport data when using third-party services, making your data pipelines more effective. 

This blog post provided the latest updates and just a few examples of how external functions can benefit your organization. Contact your Snowflake representative to learn more, or see the following links for more information.

General information about external functions: