Product and Technology

Unlock the Value of Your Sensitive Data with Differential Privacy, Now Generally Available

Digital illustration of a database stack with a security icon and wavy lines emerging from it.

The Snowflake AI Data Cloud has democratized data for thousands of customers, removing data silos and powering data sharing and collaboration use cases. Many customers have been able to unlock enormous value from their data with Snowflake, including safely collaborating on sensitive data using Snowflake Data Clean Rooms and Data Governance features. However, some highly sensitive data has remained off-limits due to regulatory requirements and privacy concerns — until now.

To address these challenges and truly democratize even highly sensitive data, we are excited to announce the general availability of differential privacy policies in Snowflake. These are built on a technology called differential privacy, which is regarded in academic literature as the gold standard for private data analytics. Differential privacy brings mathematical rigor to privacy protection, enabling customers to leverage previously inaccessible data. With the data unlocked, Snowflake customers can use it to power collaboration use cases like cross-organization, cross-geography data sharing, and even create new revenue streams with data monetization.

What is differential privacy?

Differential privacy is a privacy-enhancing technology that helps minimize the risk of sensitive information leakage by protecting the identity of individual entities in a data set, like people, organizations, location, etc. It has been implemented in many high-profile use cases with sensitive data, including the 2020 U.S. Census Data Release and Apple’s user data collection, and is highlighted in the 2023 AI Executive Order

With differential privacy, data consumers can run analytical queries on the full dataset, but they cannot see the row-level data nor can they reverse engineer sensitive information. It complements data privacy methods that protect data at rest, in motion and in use. 

To do this, Snowflake differential privacy policies dynamically add noise to query results. The amount of noise added depends on how sensitive the query is, as determined by the mathematical techniques of differential privacy. For example, if the query is calculating broad aggregates, the amount of noise will be relatively small, potentially negligible. If the query concerns a small group or even one individual, the noise will be large enough to obscure their identities and protect your most sensitive data against privacy attacks.

Typically, implementations of differentially private systems require significant investment and expertise; open source differential privacy libraries are not only not differentially private end-to-end, but they also may not implement features that make differential privacy useful for real-world use cases. Snowflake differential privacy policies, however, are ready to use out of the box, without these downsides.

Differential privacy unlocks rather than devalues data

Differential privacy is a significant improvement compared to existing approaches, which do much more to devalue data. To illustrate this, let’s examine a use case from the healthcare industry that uses a dataset of patient visits to healthcare providers. In this example, the data provider needs to protect the patient identities to comply with privacy regulations.

For the basic version of this use case, let’s assume we have a dataset where each row represents one visit between a patient and a healthcare provider. Without differential privacy, the data provider would typically mask the fields that could be used to identify the patient, like the date of the visit. Fields like these would be masked to a coarser level of granularity, such as removing the month and day and leaving just the year. While this approach may appear to make sense from a privacy point of view, it drastically reduces the value of the data. For example, data consumers can no longer ask questions like, “On average, how long is drug X taken for condition Y?”

With differential privacy, the data provider does not need to mask or remove any fields, enabling data consumers to ask these kinds of detailed questions and receive useful answers.

While data masking removes some of data's value by hiding information, differential privacy makes that data available while still preventing data consumers from seeing row-level data.
Data masking versus differential privacy: Masking makes data unusable for granular questions, while differential privacy allows the data to be analyzed while preventing data consumers from seeing row-level data.

The difference between existing approaches and differential privacy becomes even more stark when we look at a more realistic version of this use case. Often in datasets like patient-doctor visits, the condition and drug prescribed is not cleanly available as a column. Instead, the data has an unstructured text field that contains the doctor’s notes from the visit. 

It is difficult to safely redact sensitive data from fields like this, and in many cases the redaction will devalue the data. For example, the notes field often contains identifying information like the patient’s name because it includes a copy-and-paste of all the fields from the provider’s intake system.

In many cases, approaches to redacting this information also redacts personal health information like the condition diagnosed or the drug prescribed, which — while sensitive on a per-visit or per-patient basis — also contains the data’s main analytical value. Because of the devaluation, these types of use cases are often not possible with today’s commonly used privacy techniques.

Differential privacy unlocks these use cases and enables data consumers to query against unstructured text fields in aggregate. Analysts and researchers can now ask and get answers to questions like, “Based on the doctor’s notes for drug Y, what percentage of patients are experiencing side effects?”

Use cases for differential privacy span all industries

While the previous example involves a use case in healthcare, we have seen the same story over and over again in all industries. Here is a small sampling of use cases:

  • Advertising, media and entertainment: Protect individuals in event-level data on ad outcomes for ad targeting, optimization and measurement

  • Financial services: Consumer banking data monetization; capital markets data provisioning including Central Risk Book; prime brokerage; asset management

  • Healthcare and life sciences: Unlock datasets like electronic medical records, social determinants of health, genomic data, medical and pharmacy claims, and clinical trials for applications in research, data monetization and drug development

  • Manufacturing: Product telemetry on equipment for predictive maintenance

  • Public sector: Cross-agency/cross-office data sharing; publishing statistics outside of classification zones or to the public; broaden access to PII and PHI for research

  • Retail and consumer goods: Cross-organization Customer 360, cross-outlet consumer behavior data monetization, and supply chain benchmarking

  • Technology: External data sharing for research and policy makers; product telemetry analytics

  • Telecom: User device movement data monetization

While some of these use cases are possible today, they are not reaching their full potential because information is redacted from data in an attempt to protect privacy. With differential privacy, these use cases are fully unlocked, and the data remains protected from privacy leaks and targeted privacy attacks.

Realize the full value of your sensitive data

Differential privacy policies are now available to all Snowflake customers with Enterprise edition accounts or higher. To get started and learn more about differential privacy in Snowflake, check out this demo video and read the Snowflake documentation. And don’t miss our on-demand webinar on privacy-enhancing technologies that can help you balance the trade-offs between data privacy and utility.

Photo illustration of a man looking at a laptop with a flow-chart style diagram behind him
Report

The Definitive Guide to Governance in Snowflake

Learn why a strong governance foundation is critical not just for managing data, but also for successful applications and AI assets.
Authors
Share Article

Subscribe to our blog newsletter

Get the best, coolest and latest delivered to your inbox each week

Start your 30-DayFree Trial

Try Snowflake free for 30 days and experience the AI Data Cloud that helps eliminate the complexity, cost and constraints inherent with other solutions.