At Snowflake, data governance is all about providing our customers native capabilities to easily and efficiently govern data at scale. Previously, we launched capabilities such as Object Tagging, Dynamic Data Masking, Row Access Policies, and Access History. Together they help you keep track of sensitive data by tagging it, protect columns that contain sensitive data from unauthorized access by assigning masking policies, and audit access to sensitive columns.
For security and compliance, it’s critical that every column containing personally identifiable information (PII) or other sensitive data is protected by a masking policy. A commonly used approach is to periodically scan for data that is tagged as sensitive and apply the appropriate masking policy. However, this approach can be cumbersome, and it may leave data exposed until the policy is applied.
Tag-based masking addresses this challenge by automatically applying the designated policies from the moment that the column is tagged. For example, if you have tagged columns with phone numbers in your account as PII=‘Phone Number’, you can assign a masking policy to the PII tag, and Snowflake will automatically mask all phone number columns as specified in the masking policy, thereby preventing access by those without the proper authorization.
Tag-based masking offers a scalable, uniform, and automated solution for sensitive data protection by:
- Making it easier to manage data at scale
- Applying policies uniformly to corresponding tagged columns
- Instantly enforcing a policy as soon as sensitive data is tagged
How tag-based masking works
Tag-based masking combines the object tagging and masking policy features: you set a masking policy on a tag with an ALTER TAG command. When the data type in the masking policy signature matches the data type of a tagged column, the column is automatically protected by the conditions in the masking policy. A tag can support one masking policy for each data type that Snowflake supports. The masking policy conditions can be written to protect the column data based on the tag name or the tag value assigned to the column.
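As a minimal sketch of the "one masking policy for each data type" rule, the statements below attach a string policy and a number policy to the same tag. The names GOVERNANCE.TAGS.PII, MASK_STRING, and MASK_NUMBER are hypothetical and are not part of the walkthrough that follows; the sketch assumes the GOVERNANCE database and its TAGS and POLICIES schemas already exist.

```sql
-- Hypothetical objects for illustration only.
create tag governance.tags.pii;

-- One policy per data type: the signature type decides which columns it protects.
create masking policy governance.policies.mask_string as (val string) returns string ->
  case when is_role_in_session('PII_ALLOWED') then val else '***masked***' end;

create masking policy governance.policies.mask_number as (val number) returns number ->
  case when is_role_in_session('PII_ALLOWED') then val else -1 end;

-- Attach both policies to the tag in a single ALTER TAG statement.
alter tag governance.tags.pii set
  masking policy governance.policies.mask_string,
  masking policy governance.policies.mask_number;
```

With this in place, a tagged STRING column would be protected by MASK_STRING and a tagged NUMBER column by MASK_NUMBER.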
Tag-based masking in action
Let’s take a look at a simple example to understand how tag-based masking works. This example demonstrates how tagged columns can be protected without directly assigning a masking policy to those columns. We will use the following setup:
- DATA_GOVERNOR role for administering data governance responsibilities
- PII_ALLOWED role for users who can access PII data unmasked
- A table HR.PRODUCT.EMPLOYEES with columns EMAIL and SSN that must be protected
use role accountadmin;

// data governor role setup
create role data_governor;
grant role data_governor to user alice; -- assuming user alice is available in the account
grant apply masking policy on account to role data_governor;
grant apply tag on account to role data_governor;

// create a database data_governance and grant the ownership of the database to data_governor
create database data_governance;
grant ownership on database data_governance to role data_governor;

// table setup
create database hr;
create schema hr.product;
create table hr.product.employees(email string, ssn string);
insert into hr.product.employees values
  ('[email protected]', 'aaa-aa-aaaa'),
  ('[email protected]', 'bbb-bb-bbbb');

// pii_allowed role setup
create role pii_allowed;
grant role pii_allowed to user alice; -- assuming user alice is available in the account

// grant appropriate access on the table to the pii_allowed role and public
grant usage on database hr to role pii_allowed;
grant usage on schema hr.product to role pii_allowed;
grant select on table hr.product.employees to role pii_allowed;
grant usage on database hr to role public;
grant usage on schema hr.product to role public;
grant select on table hr.product.employees to role public;
The DATA_GOVERNOR performs the following steps to protect the data with a tag-based masking policy:
- Creates a tag DATA_GOVERNANCE.TAGS.PII_DATA
- Creates a masking policy DATA_GOVERNANCE.POLICIES.PII_MASK_STRING
- Assigns PII_MASK_STRING to the tag PII_DATA
- Assigns the PII_DATA tag to the columns EMAIL and SSN
use role data_governor;
use database data_governance;

// create the tag
create schema data_governance.tags;
create tag data_governance.tags.pii_data;

// create the policy
create schema data_governance.policies;
create masking policy data_governance.policies.pii_mask_string as (data string) returns string ->
  case
    when is_role_in_session('PII_ALLOWED') then data
    else '***masked***'
  end;

// assign masking policy to tag
alter tag data_governance.tags.pii_data set masking policy data_governance.policies.pii_mask_string;

// assign tag to sensitive columns
use role data_governor;
alter table hr.product.employees modify column email set tag data_governance.tags.pii_data = 'EMAIL';
alter table hr.product.employees modify column ssn set tag data_governance.tags.pii_data = 'SSN';
Finally, let’s test that only users with the PII_ALLOWED role can view the columns tagged as PII_DATA unmasked.
// role authorized to view pii_data unmasked
use role pii_allowed;
// EMAIL and SSN columns are unmasked
select * from hr.product.employees;

// role unauthorized to view pii_data
use role public;
// EMAIL and SSN columns are masked
select * from hr.product.employees;
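Beyond comparing query results, you may want to confirm which tag and policy are attached to each column. The following is a sketch rather than part of the original walkthrough: it uses the SYSTEM$GET_TAG function and the POLICY_REFERENCES table function, and it assumes the ACCOUNTADMIN role so that both the table and the policy are visible. Adjust the role to whatever has the required privileges in your account.

```sql
-- Assumption: ACCOUNTADMIN is used here only because it can see all objects.
use role accountadmin;

-- Return the tag value set on a single column.
select system$get_tag('data_governance.tags.pii_data', 'hr.product.employees.email', 'column');

-- List the masking policies that apply to the table's columns,
-- including the tag through which each policy was assigned.
select policy_name, ref_column_name, tag_name, policy_status
from table(hr.information_schema.policy_references(
  ref_entity_name => 'hr.product.employees',
  ref_entity_domain => 'table'));
```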
In addition to assigning a masking policy to a tag, you can look up the value of the tag associated with the column at query execution time with the new SYSTEM$GET_TAG_ON_CURRENT_COLUMN system function. The tag value can be used to determine authorization to access the data unmasked. The following masking policy is a simple example of a scenario where columns with the SENSITIVE_COLUMN tag and a tag value of ‘YES’ are always masked.
create or replace masking policy mask_confidential_string as (data string) returns string ->
  case
    when system$get_tag_on_current_column('data_governance.tags.sensitive_column') = 'YES' then '***masked***'
    else data
  end;
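To see the tag value drive the outcome, the policy has to be attached to the SENSITIVE_COLUMN tag and the tag set on columns with different values. The sketch below is one way to wire that up. The table HR.PRODUCT.REVIEWS and its columns are hypothetical, and the sketch assumes the policy above was created in the DATA_GOVERNANCE.POLICIES schema.

```sql
-- Hypothetical table used only to illustrate value-based masking.
use role accountadmin;
create table hr.product.reviews(reviewer string, comments string);
grant select on table hr.product.reviews to role public;

use role data_governor;
-- The tag referenced by the policy; created here because the example above does not create it.
create tag if not exists data_governance.tags.sensitive_column;

-- Assumption: mask_confidential_string lives in data_governance.policies.
alter tag data_governance.tags.sensitive_column
  set masking policy data_governance.policies.mask_confidential_string;

-- Same tag, different values: only the 'YES' column satisfies the masking condition.
alter table hr.product.reviews modify column comments
  set tag data_governance.tags.sensitive_column = 'YES';
alter table hr.product.reviews modify column reviewer
  set tag data_governance.tags.sensitive_column = 'NO';
```

Under these assumptions, COMMENTS should come back masked for every role, while REVIEWER is returned as stored, because the policy checks the tag value and not the querying role.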
How to get started with tag-based masking
You need to perform only three simple steps to get started with tag-based masking to accomplish automated protection of sensitive columns such as PII:
- Define masking policies for each data type you want to protect.
- Define a tag (for example, PII_DATA) and assign the masking policies to the tag.
- Assign the tag to any column with PII data.
If you have already tagged columns with an appropriate tag, you can protect all columns by simply assigning the masking policies to the tag.
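For the already-tagged case, a reasonable first step is to list the columns that carry the tag, so you know what will become protected, and then attach the policy. This sketch reuses the PII_DATA tag and PII_MASK_STRING policy from the example above. It queries the SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES view, which requires access to the SNOWFLAKE database and can lag behind recent changes.

```sql
-- Review which columns already carry the tag (Account Usage views have some latency).
select object_database, object_schema, object_name, column_name, tag_value
from snowflake.account_usage.tag_references
where tag_name = 'PII_DATA'
  and domain = 'COLUMN';

-- Attaching the policy to the tag protects every matching tagged column at once.
alter tag data_governance.tags.pii_data
  set masking policy data_governance.policies.pii_mask_string;

-- To detach the policy from the tag later:
-- alter tag data_governance.tags.pii_data
--   unset masking policy data_governance.policies.pii_mask_string;
```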
Start using tag-based masking today
Tag-based masking is now generally available. You can start using it right away to:
- Easily manage sensitive data protection at scale.
- Eliminate the need for periodic scanning of tagged columns for policy assignment.
- Protect columns as soon as they are tagged appropriately.