As you heard in our latest product launch on June 2 (Get Ready for the Data Cloud), we’ve just released external functions to public preview. External functions open up some exciting new possibilities for Snowflake, and we are looking forward to seeing what the community does with them.

External functions allow you to create SQL functions that securely invoke externally implemented HTTPS endpoints via an AWS API Gateway. This means your functions can be implemented in any language and use any libraries, all the while using your data in Snowflake, giving you more options for processing your data.

Some example uses of external functions include:

  • Calling out to third-party services for geocoding, text analysis, or tokenization 
  • Accessing machine-learned models you have created and hosted
  • Creating complex custom code in whatever language you’d like

In the rest of this post, we’d like to give you a deeper peek at what external functions are and how you can get started.

External Function Architecture

Snowflake securely accesses your AWS API Gateway endpoints through temporary credentials based on AWS Identity and Access Management (IAM) roles. This is done through the use of API integration objects, which also handle whitelisting and blacklisting, effectively giving administrators the power to gate the use of external functions and control what endpoints their data can go to.

Once an external function is set up, the basic operation is as follows:

  1. A Snowflake user calls an external function with data by using the function in a query.
  2. Snowflake batches the data and sends it to the designated endpoint in a customer’s AWS API Gateway.
  3. The AWS API Gateway can apply transformations to the HTTPS request and then forwards it to the back-end service.
  4. The back end processes the batch of records and then returns the result to the AWS API Gateway.
  5. Snowflake receives the data back from the AWS API Gateway and continues processing the query.

Here is a visual representation of the architecture.

Simplify Your Workflow

Before external functions, if users wanted to process their Snowflake data with an external library or service, they would have to export their data, process it, and then reimport it. This made it hard for SQL users to process their data, and it encouraged complicated, expensive data pipelines. External functions greatly simplify this workflow.

Example

Imagine that you have a table with all your invoice records written in English, but you need them to be translated to German for your accounting department. Using AWS Lambda as the back end, the following code translates English to German, assuming your executing role has the proper permissions.

import json
import boto3
translate = boto3.client('translate')
def lambda_handler(event, context):
   translated = []
   body = json.loads(event["body"])
   status_code = 200
   for row in body["data"]:
       try:
           translated_text = translate.translate_text(
               Text = row[1],
               SourceLanguageCode = 'en',
               TargetLanguageCode = 'de')["TranslatedText"]
       except Exception as e:
           print(e);
           translate_text = "ERROR"
           # status code of 400 implies an error
           status_code = 400
       translated.append([row[0], translated_text])
   json_compatible_string_to_return = json.dumps({"data" : translated})
   # return data according to Snowflake's specified result format
   return {
       'statusCode': status_code,
       'body': json_compatible_string_to_return
   }

To set up the AWS API Gateway endpoint, you need to set up a resource with the POST method and choose “Lambda” as the integration type. Later, you can set the authentication method to be IAM and restrict access to Snowflake. Finally, you can deploy the endpoint and use it to configure an external function.

First, create a child resource for the endpoint, as shown in the following screenshot:

 

Set up the calling method. In this case, you want to make sure you’re using the Lambda integration:

Check that you’re using AWS_IAM authentication. You have to manually set authentication to use IAM. You also need to manually set a resource policy for the endpoint to describe who has access to it, as described here:

Click Actions -> Deploy API to deploy your endpoint. You’ll be able to see the endpoint that the external function should call out to, labeled as the “invoke URL.”

 

After setting up permissions on AWS to enable Snowflake to securely generate access credentials to access your AWS API Gateway endpoint, use the following code to create the API integration and external function:

-- create API integration
create or replace api integration external_api_integration
   api_provider=aws_api_gateway
   api_aws_role_arn='arn:aws:iam::<aws_account_id>:role/<role_name>'
   api_allowed_prefixes=('https://gr8i78j7ei.execute-api.us-east-2.amazonaws.com/test/translate-en-de')
   enabled=true;

-- create external function
create or replace external function translate_en_de(input string)
   returns string
   api_integration = external_api_integration
   as 'https://gr8i78j7ei.execute-api.us-east-2.amazonaws.com/test/translate-en-de';

 Here is the table that contains your information:

-- create table with english invoices

create or replace table invoices(invoice string);

All that remains is to call the external function on the table! In the following screenshot, you can see that the call to the external function goes through, translating your English invoice to German.

Limitations to Keep in Mind

Currently in the public preview, external functions need to go through an AWS API Gateway instance, although that gateway may connect to any HTTPS endpoint. We’re planning to extend this to support the API gateways native to Microsoft Azure and Google Cloud Platform, so stay tuned.

Get More Out of Snowflake

Interested in trying out external functions? Take a look at the documentation here to get started, and let us know what fun applications you come up with!