One year after implementation, the European Union’s General Data Protection Regulation (GDPR) continues to be a hot regulatory topic. As organizations work to bring their data practices into compliance with the new law, one question comes up repeatedly: How does Snowflake, the data warehouse built for the cloud, enable my organization to be GDPR compliant?
My answer tends to surprise people. Simply put, compliance is not a function of your database but rather a function of the design you choose. Although Snowflake provides the cloud-based technology and tools that enable compliance, each organization maintains sole responsibility for designing an architecture that is, in fact, GDPR compliant.
With that said, Snowflake offers some powerful features that don’t exist in other databases. Therefore, it behooves database architects to have a working knowledge of Snowflake’s data protection and recovery features when designing their cloud-based data warehouse.
How Time Travel and Fail-Safe work
Snowflake provides continuous data protection (CDP) with two features called Time Travel and Fail-Safe. These unique features eliminate traditional data warehousing challenges (costly backups, time-consuming rollbacks) and enable teams to experiment with their data with confidence, knowing that it will never get lost accidentally.
System administrators can use Time Travel to revert to any point in the last 24 hours. This feature is useful whenever a mistake is made (for example, a table or schema is dropped in production) or a failed release requires a database rollback (for example, a new ETL operation corrupts the data). Through a simple SQL interface, data can be restored based on a point in time or a query ID, at the database, schema, and table level. By default, Time Travel is always on for Snowflake customers and is set to 24 hours, although enterprise customers can set Time Travel to any window up to 90 days.
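As a rough illustration of that SQL interface, the sketch below builds three kinds of Time Travel statement: a query against a past state, a clone taken just before a given query ID, and an undrop. The helper functions and the object names (`orders`, `orders_restored`) are hypothetical and exist only for this example; the code produces statement text and does not connect to Snowflake.

```python
# Sketch: helper functions that build Snowflake Time Travel statements.
# Object names passed in are hypothetical; nothing here connects to Snowflake.


def select_at_offset(table: str, seconds_ago: int) -> str:
    """Query a table as it existed `seconds_ago` seconds in the past."""
    if seconds_ago < 0:
        raise ValueError("seconds_ago must be non-negative")
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago});"


def clone_before_statement(source: str, target: str, query_id: str) -> str:
    """Clone a table as it was just before the statement with this query ID ran."""
    return (
        f"CREATE TABLE {target} CLONE {source} "
        f"BEFORE(STATEMENT => '{query_id}');"
    )


def undrop(object_type: str, name: str) -> str:
    """Restore a dropped table, schema, or database still inside the window."""
    kind = object_type.upper()
    if kind not in {"TABLE", "SCHEMA", "DATABASE"}:
        raise ValueError(f"unsupported object type: {object_type}")
    return f"UNDROP {kind} {name};"


if __name__ == "__main__":
    print(select_at_offset("orders", 3600))
    print(clone_before_statement("orders", "orders_restored", "<query_id>"))
    print(undrop("table", "orders"))
```

Cloning into a new table rather than overwriting the original keeps the damaged version available for comparison until you are sure the restore is correct.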
If you accidentally drop a table or database and the Time Travel window has passed, Snowflake offers a “get out of jail free” card called Fail-Safe. This data recovery feature provides seven days in which you can contact the Snowflake Support Team to bring your data back. A Snowflake administrator must complete this restoration, because the data is inaccessible to an end user. Once the seven-day Fail-Safe window passes, data is removed permanently from Snowflake and the cloud, so it’s important to act quickly.
Best practices for GDPR compliance with CDP
GDPR compliance can be extremely challenging if you don’t have a well-thought-out database architecture, especially for handling the “right to erasure (right to be forgotten)” in GDPR Article 17. Once an individual requests erasure of their personally identifiable information (PII), organizations have 30 days, or up to 90 days under extenuating circumstances, to delete that individual’s PII from their database.
Two questions often arise at this point:
- How do you ensure that an individual’s PII is completely and permanently removed from your database?
- What do you need to account for in the data architecture, given the automated recovery measures that exist in Snowflake for CDP?
Based on strong data management principles, here are three best practices that alleviate concerns around GDPR compliance while you use Snowflake’s CDP features.
#1: Build a data model that segregates PII data
Arguably the most important data management decision you can make is to build a data model that segregates PII data into a separate table or set of tables. By creating an inventory, you can identify and account for every type of PII data you hold. This best practice is key for adhering to privacy regulations because it makes PII data simpler to find and delete.
The pitfalls of alternative strategies demonstrate why PII data segregation is your strongest option:
- The risk of losing peripheral data: If PII data is interspersed in a big table with, say, 100 columns, and 20 of those columns are PII data, what happens when you need to delete PII for a single individual? You will likely end up deleting the entire row, which also eliminates 80 columns of non-PII data that could be valuable for analytical and business purposes.
- Reliance on costly update operations: You can run an operation that obfuscates all the PII data in a table by scrambling the targeted information and leaving the other data intact. However, that procedure is prone to errors and amounts to a much more expensive methodology than simply deleting data from a separate PII table from the get-go.
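The segregated model can be sketched in a few lines. The example below uses an in-memory SQLite database as a stand-in for Snowflake so that it runs anywhere. The table names (`customer_pii`, `orders`) and the sample rows are hypothetical, and the sketch assumes the surrogate `customer_id` left behind in `orders` does not identify a person on its own.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a PII table and a separate non-PII table linked by a surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_pii (
            customer_id INTEGER PRIMARY KEY,
            full_name   TEXT,
            email       TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER,   -- surrogate key only, no PII columns
            amount      REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customer_pii VALUES (?, ?, ?)",
        [(1, "Ada Example", "ada@example.com"),
         (2, "Bob Example", "bob@example.com")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
    )
    conn.commit()
    return conn


def erase_pii(conn: sqlite3.Connection, customer_id: int) -> int:
    """Delete one individual's PII row; the orders table is not touched."""
    cur = conn.execute(
        "DELETE FROM customer_pii WHERE customer_id = ?", (customer_id,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo()
    print("PII rows deleted:", erase_pii(conn, 1))
    print("orders kept:", conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Because the erasure is a single-row delete on a narrow table, the analytical data in `orders` survives and no column-by-column update is needed.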
#2: Conduct batch deletions and apply Time Travel parameters
Rather than carry out PII deletions as requests come in, borrow a best practice from HIPAA (Health Insurance Portability and Accountability Act) and use batch deletions. By adding a GDPR delete flag and date to your data management process, you can execute a batch process once a month within the 30-day GDPR window.
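A minimal sketch of the flag-and-batch approach follows, again with in-memory SQLite standing in for Snowflake. The column names `gdpr_delete_flag` and `gdpr_delete_requested_on`, the table name, and the sample data are assumptions made for this example.

```python
import sqlite3
from datetime import date


def make_pii_table() -> sqlite3.Connection:
    """Create a PII table that carries a GDPR delete flag and request date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customer_pii (
            customer_id              INTEGER PRIMARY KEY,
            email                    TEXT,
            gdpr_delete_flag         INTEGER NOT NULL DEFAULT 0,
            gdpr_delete_requested_on TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO customer_pii (customer_id, email) VALUES (?, ?)",
        [(1, "ada@example.com"), (2, "bob@example.com"), (3, "cy@example.com")],
    )
    conn.commit()
    return conn


def flag_for_erasure(conn: sqlite3.Connection, customer_id: int, requested_on: date) -> None:
    """Record an erasure request without deleting anything yet."""
    conn.execute(
        "UPDATE customer_pii SET gdpr_delete_flag = 1, gdpr_delete_requested_on = ? "
        "WHERE customer_id = ?",
        (requested_on.isoformat(), customer_id),
    )
    conn.commit()


def run_batch_erasure(conn: sqlite3.Connection, run_date: date) -> int:
    """Delete every flagged row requested on or before the batch run date."""
    cur = conn.execute(
        "DELETE FROM customer_pii "
        "WHERE gdpr_delete_flag = 1 AND gdpr_delete_requested_on <= ?",
        (run_date.isoformat(),),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_pii_table()
    flag_for_erasure(conn, 1, date(2019, 5, 3))
    print("rows erased:", run_batch_erasure(conn, date(2019, 5, 31)))
```

Storing the request date as an ISO string keeps the comparison in the batch query a plain string comparison, which sorts the same way as the dates themselves.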
For PII erasure requests, you must consider Time Travel and its setting. For example, GDPR provides 30 days to delete PII (and up to 90 days under extenuating circumstances), which means that in Snowflake’s enterprise version, you should set Time Travel for PII-specific tables to no more than 30 days; otherwise, deleted PII could remain recoverable, and be inadvertently restored, after the deadline has passed.
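To make the retention arithmetic concrete, the sketch below builds the statement that sets a table’s Time Travel window and adds up the longest time erased data could stay recoverable. The helper names and the table name are hypothetical. The calculation simply assumes the batch interval, the Time Travel window, and the seven-day Fail-Safe period described above run back to back; how recoverable copies count against the GDPR deadline is a question for your legal team, not something this code decides.

```python
FAIL_SAFE_DAYS = 7          # Fail-Safe period described in this article
MAX_TIME_TRAVEL_DAYS = 90   # upper limit for enterprise customers


def retention_statement(table: str, days: int) -> str:
    """Build the statement that sets a table's Time Travel window."""
    if not 0 <= days <= MAX_TIME_TRAVEL_DAYS:
        raise ValueError("Time Travel window must be between 0 and 90 days")
    return f"ALTER TABLE {table} SET DATA_RETENTION_TIME_IN_DAYS = {days};"


def worst_case_days_recoverable(batch_interval_days: int, time_travel_days: int) -> int:
    """Days from request to unrecoverable data, assuming the periods run in sequence."""
    return batch_interval_days + time_travel_days + FAIL_SAFE_DAYS


if __name__ == "__main__":
    print(retention_statement("customer_pii", 30))
    print(worst_case_days_recoverable(30, 30))
```

With a monthly batch and a 30-day Time Travel window the total is 30 + 30 + 7 = 67 days, which stays under the 90-day outer limit; a 90-day Time Travel window on the same table would not.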
Conversely, another useful aspect of Time Travel is that if you inadvertently delete the wrong person’s data, you can easily do a point-in-time restore of just those records (if you are still within the Time Travel window). This strategy allows recovery from a mistake without violating GDPR.
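One way such a record-level restore might look is sketched below: the helper builds an `INSERT ... SELECT` that reads the table as it was just before the mistaken delete and copies back only the affected keys. The function, table, and column names are hypothetical, the query ID is a placeholder you would take from your own query history, and the code only produces statement text.

```python
from typing import Iterable


def restore_rows_statement(
    table: str, key_column: str, keys: Iterable[int], query_id: str
) -> str:
    """Build a statement that re-inserts selected rows from before a given query."""
    # Coerce keys to int so only numeric literals reach the statement text.
    key_list = ", ".join(str(int(k)) for k in keys)
    if not key_list:
        raise ValueError("at least one key is required")
    return (
        f"INSERT INTO {table} "
        f"SELECT * FROM {table} BEFORE(STATEMENT => '{query_id}') "
        f"WHERE {key_column} IN ({key_list});"
    )


if __name__ == "__main__":
    print(restore_rows_statement("customer_pii", "customer_id", [42], "<query_id>"))
```

Restoring by key rather than rolling back the whole table leaves every correctly erased record erased.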
#3: Implement tracking
Another best practice is to maintain a table where you track PII erasure requests and PII deletions. This tracking approach also helps you avoid any rollback issues, which is an important safety concern when using Time Travel. For instance, if you happen to restore to a time before a batch deletion was executed, you’ll know to query the tracking table so you can delete the PII data again.
The same holds true with Fail-Safe, which allows the restoration of all your “lost” or deleted data. As such, you may need to use your list of PII erasure requests to delete those individuals’ PII again. The good news is that Fail-Safe operates within a seven-day period, so you’ll always be within the 90-day GDPR window if you do monthly batch deletions.
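The tracking idea can be sketched as a small log table plus a routine that re-applies every completed erasure after a restore. In-memory SQLite stands in for Snowflake here, and the table name `erasure_log`, its columns, and the helper functions are all hypothetical. The sketch assumes the log stores only a surrogate key (no PII) and that the log itself is not rolled back together with the PII table.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a PII table and a separate log of erasure requests and deletions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_pii (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT
        );
        CREATE TABLE erasure_log (
            customer_id  INTEGER PRIMARY KEY,  -- surrogate key only, no PII
            requested_on TEXT NOT NULL,
            deleted_on   TEXT                  -- NULL until the deletion has run
        );
        """
    )
    return conn


def record_request(conn: sqlite3.Connection, customer_id: int, requested_on: str) -> None:
    """Log that an individual asked for erasure."""
    conn.execute(
        "INSERT INTO erasure_log (customer_id, requested_on) VALUES (?, ?)",
        (customer_id, requested_on),
    )
    conn.commit()


def erase_and_log(conn: sqlite3.Connection, customer_id: int, deleted_on: str) -> None:
    """Delete the PII row and mark the request as completed in the log."""
    conn.execute("DELETE FROM customer_pii WHERE customer_id = ?", (customer_id,))
    conn.execute(
        "UPDATE erasure_log SET deleted_on = ? WHERE customer_id = ?",
        (deleted_on, customer_id),
    )
    conn.commit()


def reapply_erasures(conn: sqlite3.Connection) -> int:
    """After a restore, delete again any PII whose erasure was already completed."""
    cur = conn.execute(
        "DELETE FROM customer_pii WHERE customer_id IN "
        "(SELECT customer_id FROM erasure_log WHERE deleted_on IS NOT NULL)"
    )
    conn.commit()
    return cur.rowcount
```

Running `reapply_erasures` as a standard step after any Time Travel or Fail-Safe restore means a rollback cannot silently bring erased PII back into the warehouse.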
At the heart of the EU law is the mandate for organizations to take full responsibility for the data they hold. This regulation is putting a much-needed focus on database architecture and management principles that ultimately make companies better at safeguarding data.
If you design your database architecture with the most-restrictive privacy policies and regulations in mind, you can avoid heavy refactoring in the future. Today, that means adhering to GDPR and implementing a database design that keeps all your PII ducks in a row while still benefiting from Snowflake’s CDP.