During the process of turning data into insights, the most compelling data often comes with an added responsibility—the need to protect the people whose lives are caught up in that data. Plenty of data sets include sensitive information, and it’s the duty of every organization, down to each individual, to ensure that sensitive information is handled appropriately. A foundational part of protecting people’s data is putting in place policies that govern how that data may be used, and how it may not. It’s the data platform’s job, then, to provide the technical features that make it possible for organizations to live up to their policy requirements. In the Data Cloud, the technical challenges multiply, and the ability to easily blend multiparty data requires innovative approaches to data protection.
In this blog post, I’ll explore how organizations have made the journey to processing PII in the Snowflake Data Cloud. What challenges did they face? What solutions did they use to overcome those challenges? And what does it look like to use the Snowflake Data Cloud to process the world’s most valuable assets?
But before continuing, note the following:
- First, sensitive information has many monikers. An abbreviated list would include PII, PHI, classified data, and privileged information. For the purposes of this discussion, I will use these terms interchangeably. It is absolutely true that the specific needs of PHI are very different from those of trade secrets. But I’ll be discussing the things that are common to both and to all the other forms of sensitive information, too. So when you see any one of these terms, read it as shorthand for all of them.
- Second, the goal of this exercise is to understand how the Snowflake Data Cloud can be, and is being, used to process sensitive information today. I won’t go into any specific organization’s detailed use case. But if you’re trying to harness Snowflake’s capabilities, and you’re wondering how you can use PII in your workloads or how others have built a framework to do that, this post is for you. You should walk away with a good understanding of how the Data Cloud lets you get the value you require while still meeting your duty as a data steward.
The Challenges of Using Sensitive Information in Analytical Workloads
Let’s go straight to the elephant in the room: the “cloud” in the Data Cloud is often an immediate obstacle for organizations considering processing their sensitive information using Snowflake. In many IT security groups, protecting information is synonymous with locking it behind layers and layers of firewalls and specialized systems that are deeply embedded in privately owned data centers. The notion of removing the crown jewels from that heavily fortified vault and moving them to what IT thinks of as the Wild West of the cloud is seen as absurd. In fact, the organization’s challenge can be simply stated as follows: We need to apply the same standards and scrutiny to Snowflake’s security practices that we do to our own operations. Many security, governance, and compliance professionals cannot imagine that a SaaS operation in the cloud could measure up to their standards and practices. Some may have had bad experiences with services in the past that led them to feel this way, while others may have read headlines that made them hesitant to put their own data into the cloud.
Another challenge when dealing with PII is that there are often well-established processes and policies dictating how it may be used. This isn’t the simple “we’ve always done it this way” challenge that any new, transformative technology such as the Data Cloud may face. It’s not simple technical momentum; to prove that they are meeting their duty to protect sensitive data, organizations create policies and follow specific procedures with auditing baked in. Even if a new technology turns the data processing world upside down, it’s not allowed to sidestep these important processes. Insights may represent new, exciting opportunities, but in many cases failing to meet information security standards and laws, such as PCI DSS or HIPAA, is an existential threat to the organization as a whole. New revenue streams don’t mean anything to firms that have been sued out of existence, or that have lost the right to handle the very data that powers these workloads because of negligence. So the Data Cloud must be able to fit into these existing architectures, with no exceptions!
Perhaps the most fascinating challenge many face is adapting their processes after years of living with exceptions. Those layers and layers of custom-built protections can hide a lot of technical debt. Exploring why organizations are reluctant to move to the cloud often reveals that they’ve been running under a series of complex risk exceptions that don’t make sense outside of their very specialized systems. Oddly enough, in these cases, the security folks are usually quite happy to move to the cloud. That’s not because they are particularly enamored with the idea of putting data in the cloud, but because the move is a forcing function that sweeps away the wacky custom stuff in favor of standard, well-tested approaches to security and governance. If that sounds more like a solution than a challenge, consider what it entails: a complete overhaul of the way sensitive data is handled. It turns a migration into a transformation, and transformations are challenging.
Communication is the most overlooked challenge when undertaking a transformational move to the Data Cloud that involves processing sensitive data workloads. Many processes that are critical for meeting the duty to protect PII happen on a longer time scale than IT changes. It’s easy to overlook the annual, biannual, and even quarterly cadences of governance and compliance when you’re on an agile cadence measured in weeks and days. But this can be a critical error. These processes can arrest plans and ruin schedules. It’s easy to blame the messenger—and it’s one way security people get a bad rep—but everyone shares the duty to protect people by protecting their data. So in the rush to extract value ahead of the competition, it’s important not to sacrifice the need to be transparent, document fully, and check in with security partners. Remember: Security partners are not partners in crime but partners in not committing crimes.
How the Snowflake Data Cloud Overcomes Challenges Processing Sensitive Information
The first surprise many security and compliance professionals get when evaluating Snowflake is the long and growing list of compliance benchmarks that the Data Cloud has achieved and maintains across all three of our cloud provider platforms: ISO/IEC 27001, HITRUST/HIPAA, and PCI DSS are available in the Data Cloud, and you can have FedRAMP Moderate levels of assurance in our government deployments.
Compliance, of course, is not a replacement for security, but auditors do make you write down your controls, policies, and practices in painstaking detail. So you can read our SOC 2 Type 2 report, or any of the others, to learn about all the ways Snowflake runs a tight ship. It’s also worth pointing out that all of these certifications and attestations were achieved by Snowflake’s Data Cloud itself. Some SaaS vendors “pass through” the reports of their cloud providers. It’s very good that our cloud provider partners have also achieved many of these benchmarks, but everything listed here covers the Data Cloud and is specific to how Snowflake runs things. For security, governance, and compliance professionals who cannot imagine that a SaaS operation in the cloud could measure up to their current standards and practices, seeing how well Snowflake is built and how stringent its security is often comes as a big surprise.
The way Snowflake runs its side of the shared responsibility of protecting data is only half the solution. The Data Cloud must also be able to fit into existing security architectures. Snowflake supports both common security standards and traditional approaches used for securing data platforms. Security architectures that can integrate with any other data platform will find integrating with Snowflake just as easy. It starts with authentication integrated with your authoritative identity sources, such as Azure Active Directory, Okta, Ping, and many more.
Anything that can speak SAML or OAuth can authenticate to Snowflake. From there:
- Authorization is done using an RBAC structure that will fit into existing role-management programs.
- SCIM (System for Cross-domain Identity Management) support is provided to link Snowflake roles to your authoritative sources directly.
Customers will also hook up:
- Privileged identity management solutions to control administrative rights
- SIEM (security information and event management) platforms to monitor security-related events
- Secrets management solutions to automate credential lifecycles for users and service accounts
In other words, the Data Cloud can be every bit as locked down as you would need to manage sensitive information, and likely using the same tools you already have.
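The RBAC structure mentioned above can be illustrated with a small, generic sketch. This is not Snowflake’s API (in Snowflake, roles and privileges are managed with SQL `GRANT` statements); it is a hypothetical in-memory Python model that assumes the common pattern in which privileges are granted to roles and roles can be granted to other roles, so a parent role inherits everything granted to its children. The class, role, and object names are invented for illustration.

```python
from collections import defaultdict


class RoleHierarchy:
    """Hypothetical in-memory RBAC model (illustration only, not Snowflake's API).

    Privileges are granted to roles, and roles can be granted to other roles;
    a parent role inherits every privilege held by the roles granted to it.
    """

    def __init__(self) -> None:
        self._privileges: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self._child_roles: dict[str, set[str]] = defaultdict(set)

    def grant_privilege(self, role: str, privilege: str, obj: str) -> None:
        self._privileges[role].add((privilege, obj))

    def grant_role(self, child: str, parent: str) -> None:
        # The parent role inherits everything granted to the child role.
        self._child_roles[parent].add(child)

    def effective_privileges(self, role: str) -> set[tuple[str, str]]:
        # Walk the hierarchy downward; the `seen` set guards against cycles.
        seen: set[str] = set()
        stack = [role]
        result: set[tuple[str, str]] = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            result |= self._privileges[current]
            stack.extend(self._child_roles[current])
        return result

    def is_allowed(self, role: str, privilege: str, obj: str) -> bool:
        return (privilege, obj) in self.effective_privileges(role)


# Hypothetical roles: PII access lives in its own narrowly granted role.
rbac = RoleHierarchy()
rbac.grant_privilege("PII_READER", "SELECT", "customers.pii")
rbac.grant_privilege("ANALYST", "SELECT", "sales.orders")
rbac.grant_role("ANALYST", "PII_ANALYST")
rbac.grant_role("PII_READER", "PII_ANALYST")

print(rbac.is_allowed("ANALYST", "SELECT", "customers.pii"))      # False
print(rbac.is_allowed("PII_ANALYST", "SELECT", "customers.pii"))  # True
```

Keeping PII access in a dedicated role, such as the hypothetical `PII_READER`, and granting it only to the roles that need it is one way an existing role-management program could map onto this kind of hierarchy.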
Tackling the challenges of transforming your security approach to protect sensitive information takes more than technology, and more than keeping pace with security benchmarks across a compliance lifecycle measured in quarters or years. If sensitive data protection has been accomplished through years of exceptions and customizations, moving to even a robust set of built-in, standardized security features is going to take experience. Luckily, Snowflake’s consultants and partners have been doing that since the first days of the Data Cloud. Data isn’t the only part of the tech world going through a digital transformation; security and compliance are doing the same. Using the move to Snowflake’s security approach as a catalyst, these projects are not only meeting the need to protect sensitive information, they are also aligning these new security measures with the new approaches that security and compliance teams are taking to these problems.
The Journey to Processing Sensitive Information in the Data Cloud
Finally, here’s the question at hand: How have organizations made the journey to processing PII in the Data Cloud? The foregoing discussion provides part of the answer. In every case, the first step to putting sensitive information into the Data Cloud is helping people see that the concept is sound. Every organization that takes its duty to protect PII seriously must ask hard questions. These organizations have held their own IT operations and platforms to high standards, so naturally Snowflake has to meet those standards as well.
The previous section walked through that first step, showing that Snowflake has what it takes to be a platform for running even the most sensitive workloads.
The next step is determining where you really are at the moment. Not every organization starts from the same place. One major U.S. retailer came to Snowflake with pain around processing its largest customer data workloads. The data was loaded with PII, required a patchwork of national and international privacy protection assurances, and was being processed on premises using a platform that was more than 10 years old and buried in compliance-related technical debt (loads of bespoke patches to technology and process). The company spent quite a while on that first step before becoming convinced the journey was even possible. Once it stepped over the threshold, the questions became more practical. How could this be done for its organization specifically? There is no magic wand to wave for this. What it took was sitting down and making careful plans. Snowflake’s implementation partner and consulting team brought experience to the table, and the retailer brought its team’s encyclopedic knowledge of its data, security policies, compliance burdens, and past technical approaches. Every day the company remained on its old platform was costing money and opportunities, so there was pressure to act fast. As usual, the security conversations were portrayed as friction that was delaying things. Luckily, the executives involved understood that without the right time invested in the security architecture, the whole project would fail and deliver zero benefit. They saw the truth of it: The Data Cloud’s ability to meet their security needs was precisely what would let them reap any of the other benefits.
That retailer’s journey is not typical. Most organizations will have a journey that happens in stages. More typical is the journey of a global financial institution that has been moving slowly but steadily to the Data Cloud for all its needs. After the first step, it discovered it had some work to do on its side. And it’s fair to note that Snowflake did, too. The customer had needs that the Data Cloud had to stretch to meet. So Snowflake and the company jointly came up with a list of needs and then assessed the risks associated with each item on that list. Now the company has an internal process. As each line of business assesses whether it’s ready to move to the Data Cloud, it evaluates the risk of the current state. When this process started more than 18 months ago, the list was longer and the risk was seen as much higher. Of course, when you’re a major multinational financial institution, nothing you assess has zero risk. So when a data leader from a line of business makes an assessment, that leader is weighing costs and benefits as well as risks and advantages. Over time, as both Snowflake and this institution have checked off items on the list, the risk has gone down, and lines of business with more sensitive information have come on board. Now the CDO’s goal is to have the whole institution on the Data Cloud. This type of partnership is what most Snowflake journeys look like. And the whole Data Cloud benefits from these results, because these advancements aren’t made for just one organization. When the Snowflake security tide rises, all boats floating on the Data Cloud rise, too (to borrow a common phrase).
Some challenges are not as simple as technology, processes, and people. Policies in organizations that handle large amounts of sensitive information are designed to be very hard to change, much like the laws of a nation. And in some cases, they literally are the laws of some nations. So some organizations face a step in the journey that looks like a dead end: Their policy states plainly that they may not put sensitive data into the cloud. It doesn’t matter what the business wants when the policy is that clear. Of course, they can work to change that policy, but how can they reap any benefit in the meantime? This is why an extremely common step in this journey is bringing in third parties to handle the tokenization or encryption of the sensitive data. This is not a new pattern for most of these organizations. An EMEA-based healthcare provider was already applying tokenization to its on-premises data warehouse. When it wanted to move to the Data Cloud to expand what it could get from its information, it was a hard requirement that Snowflake support the same control. In this case, it literally had to be the same control, because the exact approach and solution the provider used had found its way into the policy itself. Luckily, the provider wasn’t the first organization whose journey required that step, so Snowflake could accommodate it on day one. People often ask whether tokenization is a good thing, and the answer is always going to be that it depends on your policy. What is certain is that tokenization is an excellent way to get the benefits of the Data Cloud today while you wait for policy to change.
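The third-party tokenization pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the healthcare provider’s actual product or any Snowflake feature. It assumes a keyed HMAC-SHA256 for deterministic tokens, so the same identifier always yields the same token and tokenized columns can still be joined and counted in the cloud, and it uses an in-memory dictionary as a stand-in for the on-premises token vault. The class name, field names, and sample rows are invented for the example.

```python
import hashlib
import hmac
import secrets


class OnPremTokenizer:
    """Hypothetical tokenization service that runs behind the firewall.

    Only tokens are sent to the cloud; the secret key and the token vault
    (token -> original value) never leave the premises.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._vault: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # A keyed HMAC makes tokens deterministic: the same value always maps
        # to the same token, so tokenized columns remain joinable in the cloud.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "tok_" + digest[:32]
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Reversal only works where the vault lives (on premises).
        return self._vault[token]


# Illustrative rows; only the identifier column is tokenized before upload.
tokenizer = OnPremTokenizer(secrets.token_bytes(32))
rows = [
    {"patient_id": "P-1001", "visits": 3},
    {"patient_id": "P-1002", "visits": 1},
    {"patient_id": "P-1001", "visits": 2},
]
cloud_rows = [{**row, "patient_id": tokenizer.tokenize(row["patient_id"])} for row in rows]

for row in cloud_rows:
    # Token values differ on every run because the key is random.
    print(row)
```

Deterministic tokens trade some protection for analytic utility: equal values produce equal tokens, which keeps joins working but reveals which rows share an identifier. Whether that trade-off is acceptable is, again, a question for your policy.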
Is It Time for Your Sensitive Information to Be in the Data Cloud?
If you’ve read this far, you know the answer to this question: It depends on where your organization is in its journey. And it depends on the policy your organization must adhere to. That policy may come from your own security and compliance groups or from your government. It depends on what your current data platform is and how much technical debt you’ve piled on it to meet these policy needs today. It depends on how convinced your security and compliance folks are that Snowflake really has made the Data Cloud a secure place to act as a home for sensitive information. It also depends on how willing your organization is to partner with Snowflake to work on completing your security digital transformation.
There are things I can say for sure, though. The Data Cloud is ready for sensitive data workloads today. Snowflake has invested considerable effort at every level of the platform to make sure of that. And many organizations have acknowledged that, and they are reaping the benefits as you read this. The journey each one of them took to get to that point will echo many of the themes from this discussion. When you’re ready to start your own journey, you will see each of these themes as well. Snowflake is here to partner with you every step of the way.
Here are useful links to implementation partners and Snowflake security partners:
Snowflake services partners