Foundational Guide

Data Security: Risks, Controls and the Governance Foundation

Modern data security has to account for how data is stored, accessed, shared and used across increasingly complex environments. Protecting sensitive data across its lifecycle requires core controls, clear ownership, consistent policies and visibility into how data moves.

DATA SECURITY DEFINED

Data security is the discipline of protecting digital data from unauthorized access, corruption, theft or loss across its lifecycle using controls such as access management, encryption, masking and monitoring — supported by governance that defines ownership, classification and policy.

The data estate has changed faster than most security programs have. Data now moves through pipelines, partner shares, AI agents and downstream applications — and these movements often create control points that traditional security models weren’t built to handle at scale. Encryption, access control and monitoring are still essential, but data security now depends on governance: knowing what data exists, who owns it, how sensitive it is and what rules apply to it.

Data security and governance are distinct disciplines, but when they operate separately, policies are difficult to enforce as information moves between systems, teams and environments. This guide covers what data security means now, why the stakes have increased, the major risks and regulations shaping enterprise programs, and the controls organizations use to protect data across its lifecycle.

What is data security?

Data security is the set of practices, controls and technologies that protect digital data from unauthorized access, corruption, theft or loss across its lifecycle. It has to account for data at rest, data in transit and data in use — and for the identities, policies and systems that interact with that data at each stage. The control points differ depending on where data sits and how it moves, which is what makes the discipline complex in practice.

The foundational framework is the CIA triad: confidentiality, integrity and availability. Confidentiality means only authorized users, services and workloads can access sensitive data. Integrity means data remains accurate, complete and trustworthy as it is created, transformed, queried, shared or used in downstream applications. Availability means data remains accessible to authorized users when the business needs it, including during failures, disruptions or attacks.

Data security is related to but distinct from data privacy and data governance. Data privacy defines how personal data should be collected, used, retained and disclosed, typically in response to regulations such as GDPR or CCPA. Data governance defines who owns data, how it is classified, what rules apply and who or what is permitted to use it. Data security enforces those rules — through access controls, encryption, masking, monitoring, retention policies and incident response.

Why data security matters now

Sensitive data no longer sits in a small number of controlled systems. It moves through SaaS applications, data lakes, warehouses, partner environments, collaboration workflows and AI pipelines. A single column containing account numbers, protected health information or financial details can be copied, transformed, embedded, shared or retrieved in multiple places — which means security has to follow the data rather than assume the network boundary will hold.

Regulatory pressure reflects the evolution of the data estate. GDPR, CCPA, HIPAA, PCI DSS and SOX continue to shape how organizations protect personal, payment, healthcare and financial data. DORA entered into application on Jan. 17, 2025, for financial entities in the EU, while EU Member States were required to transpose NIS2 into national law by Oct. 17, 2024. India’s Digital Personal Data Protection Act, 2023, established the framework for digital personal data, and the Digital Personal Data Protection Rules, 2025, set phased implementation timelines.

Compliance, however, is merely the baseline. A company can satisfy a narrow audit requirement and still have over-permissioned roles, stale copies of sensitive data, unmonitored partner shares or AI workloads that retrieve more context than the user should see.

Effective data security requires defense in depth: encryption to protect data at rest and in transit, access control to restrict who can use it, masking and row-level policies to limit exposure inside queries, data loss prevention to reduce inappropriate movement, posture management to surface risk and continuous monitoring to detect unusual behavior.

COMMON PITFALL

Focusing only on AI guardrails to secure AI systems can miss the underlying data risk: sensitive information may still be exposed through training data, embeddings, retrieval systems, agents or outputs.

AI has made data security more urgent. Training data, vector embeddings, retrieval-augmented generation systems, agent-accessed context and model outputs create leakage paths that did not exist in the same form five years ago. The access-control question is no longer just “Can this employee query the table?” but also “Can this agent retrieve the row, use the embedding, call the tool and produce an output that exposes sensitive data?”
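One way to frame the agent question is to treat an agent acting for a user as its own principal, whose effective access is the intersection of what the user holds and what the agent is scoped to. The sketch below assumes that model; the `Principal` class, the `ACTION:object` privilege strings and the scope sets are hypothetical illustrations, not any platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    privileges: frozenset  # hypothetical "ACTION:object" strings


def effective_privileges(user: Principal, agent_scope: frozenset) -> frozenset:
    """An agent acting for a user gets only what the user holds AND the agent is scoped to."""
    return user.privileges & agent_scope


def agent_can(user: Principal, agent_scope: frozenset, privilege: str) -> bool:
    return privilege in effective_privileges(user, agent_scope)


analyst = Principal("analyst", frozenset({"SELECT:orders", "SELECT:customers_masked"}))
# Hypothetical agent tool scope that is broader than the analyst's own access.
support_agent_scope = frozenset({"SELECT:orders", "SELECT:customers_raw", "CALL:send_email"})

print(agent_can(analyst, support_agent_scope, "SELECT:orders"))         # True
print(agent_can(analyst, support_agent_scope, "SELECT:customers_raw"))  # False: the user lacks it
```

Intersecting the two sets is what prevents overreach in this sketch: a broadly scoped agent cannot retrieve raw customer rows for a user who could only see the masked view.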

Many organizations are lagging on AI security. According to Bedrock Data’s 2025 Enterprise Data Security Confidence Index, 79% of security teams struggle to classify sensitive data used in AI and machine learning systems, and fewer than half report high confidence in controlling sensitive data used for AI training.

Access control

Access control is the foundational data security mechanism: the rules and enforcement that determine who can read, write, modify or share specific data. Access management defines the operational lifecycle — how access is requested, approved, provisioned, reviewed and revoked — while access control enforces those decisions through roles, attributes, privileges, policies and audit trails.

Most enterprise programs combine role-based access control (RBAC), attribute-based access control (ABAC) and fine-grained policies. RBAC grants privileges through roles tied to job functions or responsibilities. ABAC evaluates attributes such as department, region, clearance, data sensitivity or workload type. Row- and column-level rules then refine access inside a table, so users can work from shared data without receiving the same view of every record.
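A minimal sketch of how the RBAC and ABAC layers can combine in a single decision, assuming a hypothetical role-to-privilege map, hypothetical numeric sensitivity levels and a region attribute. Real platforms evaluate these through their own policy engines; this only illustrates the logic.

```python
from dataclasses import dataclass

# Hypothetical RBAC layer: role -> set of (action, object) privileges.
ROLE_PRIVILEGES = {
    "analyst": {("SELECT", "sales")},
    "finance": {("SELECT", "sales"), ("SELECT", "payroll")},
}

# Hypothetical object sensitivity levels (higher = more sensitive).
SENSITIVITY = {"sales": 1, "payroll": 3}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    clearance: int
    region: str


def is_allowed(user: User, action: str, obj: str, obj_region: str) -> bool:
    # RBAC: at least one of the user's roles must grant the action on the object.
    rbac_ok = any((action, obj) in ROLE_PRIVILEGES.get(r, set()) for r in user.roles)
    # ABAC: user attributes must satisfy the object's attributes.
    # Unknown objects get infinite sensitivity, so the check fails closed.
    abac_ok = user.clearance >= SENSITIVITY.get(obj, float("inf")) and user.region == obj_region
    return rbac_ok and abac_ok


alice = User("alice", frozenset({"analyst"}), clearance=2, region="EU")
print(is_allowed(alice, "SELECT", "sales", "EU"))    # True
print(is_allowed(alice, "SELECT", "sales", "US"))    # False: region attribute mismatch
print(is_allowed(alice, "SELECT", "payroll", "EU"))  # False: no role grants it
```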

Role-based access control

Role-based access control is the most widely deployed access-control model because it maps permissions to the way organizations already work. Instead of granting privileges directly to each user, administrators grant privileges to roles, assign users to those roles and use a role hierarchy to inherit permissions where appropriate.
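Role hierarchies are where inherited permissions accumulate, so it helps to compute a role's effective privileges rather than only its direct grants. The sketch below assumes a hypothetical role graph in which a role inherits the privileges of every role granted to it; the role names and privilege strings are illustrative only.

```python
# Hypothetical role graph: each role lists the roles granted to it (it inherits their privileges).
GRANTED_ROLES = {
    "sysadmin": ["data_engineer", "analyst"],
    "data_engineer": ["analyst"],
    "analyst": [],
}
DIRECT_PRIVILEGES = {
    "analyst": {"SELECT:sales"},
    "data_engineer": {"INSERT:sales", "CREATE:staging"},
    "sysadmin": {"CREATE:warehouse"},
}


def effective_privileges(role, seen=None):
    """Union of a role's direct privileges and everything inherited through granted roles."""
    seen = set() if seen is None else seen
    if role in seen:  # already counted, or a cycle in a misconfigured hierarchy
        return set()
    seen.add(role)
    privs = set(DIRECT_PRIVILEGES.get(role, set()))
    for granted in GRANTED_ROLES.get(role, []):
        privs |= effective_privileges(granted, seen)
    return privs


print(sorted(effective_privileges("data_engineer")))
# ['CREATE:staging', 'INSERT:sales', 'SELECT:sales']
```

Comparing effective privileges against what a role's job function actually needs is one practical way to surface the role sprawl the next paragraph warns about.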

The principle of least privilege should shape the role model from the start. Roles should grant the minimum access required for a person, service account or workload to perform its job, with elevated privileges separated, reviewed and time-bound when possible. As environments grow, teams must monitor for role sprawl, inherited permissions and exception-based access.
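Time-bound elevation can be modeled by giving every elevated grant an expiry and evaluating grants against the current time. This is a minimal sketch with a hypothetical `Grant` record and hypothetical privilege names; it is not how any particular platform stores grants.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Grant:
    principal: str
    privilege: str
    expires_at: Optional[datetime]  # None = standing least-privilege access


def grant_elevated(principal: str, privilege: str, hours: int, now: datetime) -> Grant:
    """Elevated access is always issued with an expiry."""
    return Grant(principal, privilege, now + timedelta(hours=hours))


def active_privileges(grants, principal: str, now: datetime) -> set:
    return {
        g.privilege
        for g in grants
        if g.principal == principal and (g.expires_at is None or now < g.expires_at)
    }


now = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
grants = [
    Grant("bob", "SELECT:sales", None),
    grant_elevated("bob", "OWNERSHIP:sales", hours=4, now=now),
]
print(sorted(active_privileges(grants, "bob", now)))                       # ['OWNERSHIP:sales', 'SELECT:sales']
print(sorted(active_privileges(grants, "bob", now + timedelta(hours=5))))  # ['SELECT:sales']
```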

Data encryption

Data encryption transforms readable data into ciphertext that only authorized parties can decrypt. It protects data at rest in storage, data in transit across networks and, depending on the architecture, data in use during processing. Encryption reduces exposure if storage is accessed improperly, traffic is intercepted or a backup leaves its intended environment.

Encryption at rest protects stored data such as tables, files, snapshots and backups. Encryption in transit protects data as it moves between clients, services, applications and platform components, typically through TLS. Encryption in use is the harder category because data often has to be processed in readable form, but modern platforms increasingly reduce exposure through secure execution environments, policy enforcement and carefully controlled processing paths.

Organizations need to have an operating model for key management to determine who controls the keys, how keys are rotated, how access to keys is logged and whether regulatory or internal requirements call for customer-managed keys. For organizations with strict control requirements, customer-managed keys provide additional separation between the platform operator and the organization’s cryptographic control model.
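Part of a key management operating model is simply knowing which keys have reached their rotation period. The sketch below assumes a hypothetical key inventory and a 365-day rotation period chosen for illustration; the right period depends on your regulatory and internal requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class KeyRecord:
    key_id: str   # hypothetical identifier from a key inventory
    created: date


def keys_due_for_rotation(keys, today: date, max_age_days: int = 365) -> list:
    """Return IDs of keys that have reached the assumed rotation period."""
    cutoff = today - timedelta(days=max_age_days)
    return [k.key_id for k in keys if k.created <= cutoff]


keys = [
    KeyRecord("cmk-finance", date(2024, 5, 1)),
    KeyRecord("cmk-marketing", date(2025, 1, 1)),
]
print(keys_due_for_rotation(keys, today=date(2025, 6, 1)))  # ['cmk-finance']
```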

Encryption protects the data layer, while governance determines which users, workloads and policies are permitted to use the decrypted data.

Data masking

Data masking obscures sensitive values from users who should not see them while preserving enough structure for analysis, testing or operations. For example, a support analyst may need to know that a customer record exists without seeing the full Social Security number, credit card number or diagnosis code. A data scientist may need geographic or demographic patterns without direct identifiers.

Static masking creates a sanitized copy of data, often for lower environments such as development, testing or training. Dynamic data masking applies policy at query time, so the same column can return different values depending on the user’s role, attributes or context. Tokenization is related: sensitive values are replaced with non-sensitive tokens that can be reversed only under controlled conditions. Format-preserving encryption can also protect sensitive values while retaining the structure that applications expect.
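The difference between dynamic masking and tokenization can be sketched in a few lines. This assumes hypothetical role names and an in-memory token vault; a production tokenization service would persist the vault, protect it separately and audit every detokenization. The random tokens here are not format-preserving.

```python
import secrets


def mask_ssn(value: str, role: str) -> str:
    """Dynamic masking: the same stored value renders differently per role at query time."""
    if role == "compliance_officer":  # hypothetical role allowed to see full values
        return value
    if role == "support_analyst":     # hypothetical role that needs only the last four digits
        return "***-**-" + value[-4:]
    return "***-**-****"


class TokenVault:
    """Minimal tokenization sketch: random tokens, reversible only through the vault."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, value: str) -> str:
        if value not in self._by_value:
            token = "tok_" + secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str, caller_role: str) -> str:
        if caller_role != "payments_service":  # hypothetical authorized caller
            raise PermissionError("detokenization not permitted for this role")
        return self._by_token[token]


print(mask_ssn("123-45-6789", "support_analyst"))  # ***-**-6789
print(mask_ssn("123-45-6789", "marketing"))        # ***-**-****
```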

Effective masking depends on accurate data classification. A policy cannot protect the email_address column or an unstructured file containing payroll data unless the organization can identify the sensitive data in the first place. Classification, tagging and masking work best as one loop: discover sensitive data, apply a tag, attach the appropriate policy and monitor whether new assets require the same treatment.
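The discover, tag and attach loop can be illustrated with a toy classifier. This sketch assumes simple regular-expression detectors over sampled values and a hypothetical tag-to-policy mapping; production classifiers use much richer signals, so treat it only as a picture of the loop.

```python
import re

# Hypothetical detectors; real classification uses far more than regexes on samples.
DETECTORS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}
# Hypothetical mapping from tag to the masking policy that should be attached.
POLICY_FOR_TAG = {"EMAIL": "mask_email", "US_SSN": "mask_ssn"}


def classify_column(samples, threshold=0.8):
    """Return a tag if at least `threshold` of sampled values match one detector."""
    for tag, pattern in DETECTORS.items():
        hits = sum(1 for v in samples if pattern.match(v))
        if samples and hits / len(samples) >= threshold:
            return tag
    return None


def attach_policies(table):
    """table: column name -> sample values. Returns column -> policy for tagged columns."""
    attached = {}
    for column, samples in table.items():
        tag = classify_column(samples)
        if tag is not None:
            attached[column] = POLICY_FOR_TAG[tag]
    return attached


customers = {
    "email_address": ["a@x.com", "b@y.org"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "notes": ["called twice", "prefers email"],
}
print(attach_policies(customers))  # {'email_address': 'mask_email', 'ssn': 'mask_ssn'}
```

Rerunning the same loop whenever new tables or columns appear is what keeps coverage from drifting.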

Row-level security

Row-level security restricts which rows a user can see in a table based on their identity, role, attributes or relationship to the data. It’s especially useful when multiple teams, regions, business units or tenants need to work from the same table without receiving the same records.

A multi-tenant analytics application is the classic example: the application may serve hundreds of customers from a shared table, but each customer should see only its own rows. Without row-level security, teams often create duplicate views, extracts or application-specific filters that are hard to audit and easy to misconfigure.

Row-level security is more durable when it’s declarative and attached to the data object rather than buried in application logic. If the policy sits at the table level, it applies consistently across queries and downstream use cases — reducing the chance that a new dashboard, notebook, agent or application bypasses the intended filter.
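The benefit of attaching the policy to the data object can be shown with a toy table whose only read path applies the policy. This is a minimal sketch assuming a hypothetical `GovernedTable` class and a tenant-isolation rule for the multi-tenant example above; it is not any platform's row access policy syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# (current_tenant, row) -> should this row be visible?
RowPolicy = Callable[[str, dict], bool]


def tenant_isolation(tenant: str, row: dict) -> bool:
    return row["tenant_id"] == tenant


@dataclass
class GovernedTable:
    rows: list
    row_policy: Optional[RowPolicy] = None  # attached to the table, not to any one caller

    def select(self, tenant: str, where: Callable[[dict], bool] = lambda row: True) -> list:
        """Every read path goes through here, so callers cannot skip the row policy."""
        return [
            r for r in self.rows
            if (self.row_policy is None or self.row_policy(tenant, r)) and where(r)
        ]


orders = GovernedTable(
    rows=[{"tenant_id": "acme", "amount": 10}, {"tenant_id": "globex", "amount": 20}],
    row_policy=tenant_isolation,
)
print(orders.select("acme"))                                    # [{'tenant_id': 'acme', 'amount': 10}]
print(orders.select("acme", where=lambda r: r["amount"] > 15))  # []
```

A new dashboard or agent that calls `select` inherits the filter automatically, which is the durability argument in the paragraph above.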

Data loss prevention

Data loss prevention (DLP) is the discipline of detecting and preventing sensitive data from being exfiltrated, leaked or shared inappropriately. The cause may be malicious, including credential theft and data exfiltration, or negligent, such as an analyst exporting more data than needed, a pipeline writing sensitive data to an unmanaged location, or a partner share that remains active after a business relationship changes.

Modern DLP combines classification, policy enforcement, monitoring and response. Classification identifies sensitive data. Sensitivity labels or tags attach meaning to that data. Policies determine whether the data can be queried, copied, exported, shared or used by a workload. Monitoring then looks for access patterns, egress activity or policy exceptions that require investigation.
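The label-policy-monitoring chain can be reduced to a small decision function. The sketch below assumes hypothetical sensitivity labels and a hypothetical allowed-actions matrix; denied attempts are appended to an audit list so monitoring has something to investigate, and unknown labels fail closed.

```python
# Hypothetical policy matrix: actions permitted for each sensitivity label.
ALLOWED_ACTIONS = {
    "public": {"query", "copy", "export", "share"},
    "internal": {"query", "copy"},
    "restricted": {"query"},
}


def evaluate(label: str, action: str, audit_log: list) -> bool:
    """Allow the action only if the label permits it; record every denial for monitoring."""
    allowed = action in ALLOWED_ACTIONS.get(label, set())  # unknown labels fail closed
    if not allowed:
        audit_log.append((label, action))
    return allowed


log = []
print(evaluate("restricted", "export", log))  # False
print(evaluate("internal", "query", log))     # True
print(log)                                    # [('restricted', 'export')]
```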

Traditional DLP often operated as a network or endpoint control, which made sense when sensitive files moved through email, endpoints and managed network channels. In a modern data environment, sensitive data moves through pipelines, shares, APIs, notebooks, applications and AI systems. DLP has to become more data-centric: policies should follow the data object, not just the network path.

Data security posture management

Data security posture management (DSPM) is the continuous discovery, classification and risk assessment of sensitive data across an organization’s environments. It answers three practical questions: Where is sensitive data? Who or what can access it? What risk exists because of that access, location or policy state?

DSPM emerged because many security tools were built around infrastructure, not data. A cloud security posture management (CSPM) tool can identify an exposed bucket, an open port or a weak identity and access management policy — but it does not necessarily tell a security team whether the exposed resource contains customer records, source code, health data, payment data or high-risk operational logs. DSPM focuses on the data itself: sensitivity, location, access paths, policy coverage, usage patterns and risk.

Platform-native DSPM reduces some of the gaps that arise when a third-party tool has to crawl every data store independently. Because the platform already knows what objects exist, which users and roles can access them, what tags and policies apply, and how data is being queried or shared, discovery and risk scoring have a more continuous foundation than periodic scans alone.
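DSPM's three questions (where the data is, who can reach it, what risk that creates) can be combined into a simple ranking. This sketch assumes a hypothetical asset inventory and arbitrary illustrative weights; real DSPM tools use their own scoring models.

```python
from dataclasses import dataclass

# Hypothetical weights chosen for illustration only.
SENSITIVITY_WEIGHT = {"public": 0, "internal": 1, "confidential": 3, "restricted": 5}


@dataclass(frozen=True)
class Asset:
    name: str
    sensitivity: str
    roles_with_access: int
    shared_externally: bool
    masked: bool


def risk_score(a: Asset) -> float:
    exposure = a.roles_with_access + (10 if a.shared_externally else 0)
    if a.masked:
        exposure /= 2  # policy coverage reduces, but does not remove, exposure
    return SENSITIVITY_WEIGHT[a.sensitivity] * exposure


def top_risks(assets, n=3):
    return sorted(assets, key=risk_score, reverse=True)[:n]


assets = [
    Asset("payroll", "restricted", 4, shared_externally=False, masked=True),
    Asset("customers", "confidential", 12, shared_externally=True, masked=False),
    Asset("web_logs", "internal", 30, shared_externally=False, masked=False),
]
print([a.name for a in top_risks(assets)])  # ['customers', 'web_logs', 'payroll']
```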

Data security risks and threats

Data security risks fall into three broad categories: external threats, internal threats and AI-specific risks.

  • External threats include ransomware, phishing, credential theft, supply-chain compromise and attacks that use stolen or abused identities to reach data systems. These remain active because attackers do not need direct database exploits if they can obtain credentials, compromise a service account or move laterally from a connected system.
  • Internal threats include malicious insiders, negligent users, shadow IT, misconfiguration and over-privileged accounts. A user with broad access can query sensitive data, export it, share it or route it into an unmanaged workflow without triggering the same alarms as an external attacker — which is why least privilege, separation of duties, access reviews and monitoring are data security controls, not just identity administration tasks.
  • AI-specific risks add a newer layer. Prompt injection can cause a retrieval-augmented system to access or summarize sensitive context in ways the user did not intend. Training-data leakage can surface sensitive records through model behavior or outputs. Agentic overreach can occur when autonomous agents receive broader tool or data access than the user should have. Model output exfiltration can happen when sensitive results leave the governed environment through tool calls, plugins or connected applications.

Regulations and compliance

Regulations affecting data security define the obligations that security controls must satisfy. A regulation may specify safeguards, breach notification rules, audit expectations, retention limits or rights related to personal data, and organizations subject to that law must translate the requirements into working controls across their data estate.

  • GDPR is the EU’s rights-based framework for personal data. It requires a lawful basis for processing, gives individuals rights over their personal data, and creates obligations around protection, accountability and breach notification.
  • CCPA applies to personal information of California consumers. It defines consumer privacy rights and, in certain breach scenarios, includes a private right of action.
  • HIPAA governs protected health information in the U.S. healthcare system. It requires administrative, physical and technical safeguards for covered entities and business associates.
  • PCI DSS defines technical and operational requirements for protecting cardholder data environments. The PCI Security Standards Council describes its standards as a framework of specifications, tools, measurements and support resources for safe handling of cardholder information.
  • SOX focuses on controls over financial reporting. For data security teams, the practical concern is the integrity, access control and auditability of financial data used in reporting processes.
  • DORA applies to financial entities in the EU and focuses on digital operational resilience, including the ability to withstand, respond to and recover from information and communication technology disruptions. It entered into application on Jan. 17, 2025.
  • NIS2 expands cybersecurity and incident-reporting obligations for critical sectors across the EU.
  • India’s DPDP Act governs digital personal data in India. The Act describes its purpose as providing for the processing of digital personal data in a way that recognizes both the right of individuals to protect personal data and the need to process it for lawful purposes.

The common operational problem is not simply knowing which rules apply, but proving that access, masking, retention, monitoring and incident response controls are applied consistently across the systems where regulated data lives. Data governance helps support regulatory compliance by connecting regulatory obligations to the data assets they cover — so teams know what is protected, what policy applies and where the gaps are.

QUICK TIP

Identify your most sensitive data assets and their owners before expanding security controls. Clear ownership and classification make it easier to enforce least-privilege access, masking and AI governance across the entire data lifecycle.

Data security best practices and controls

Data security programs vary by industry, architecture and risk profile. The practices below cover the foundational areas that most enterprise programs need to get right. They’re not exhaustive, but they are the ones that matter most for data estates that are becoming more distributed and taking on AI workloads.

Classify and inventory sensitive data

Start with the objects that need protection: tables, views, files, stages, data sets, embeddings, shares, applications and AI-accessible context. Classification identifies whether those objects contain PII, protected health information, cardholder data, credentials, intellectual property or other sensitive data. Inventory work then connects that classification to owners, lineage, retention rules, access paths and downstream usage.

Apply least privilege access across users, services and workloads

Each user, service account, application and AI workload should receive only the access it needs to perform a defined task. Access should have a lifecycle: a new user receives access through an approved workflow, elevated access expires when the task ends, dormant users are removed and service accounts are reviewed when pipelines or applications change. Zero trust principles reinforce this model by treating every request as something to verify based on identity, context, device, workload and sensitivity.
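Part of the access lifecycle is finding accounts that no longer need access. The sketch below assumes a hypothetical map of account name to last login time and a 90-day dormancy window picked for illustration; accounts that have never logged in are flagged too.

```python
from datetime import datetime, timedelta


def accounts_to_review(last_login: dict, now: datetime, dormant_after_days: int = 90) -> list:
    """Return accounts with no login inside the dormancy window, or no login at all."""
    cutoff = now - timedelta(days=dormant_after_days)
    return sorted(user for user, ts in last_login.items() if ts is None or ts < cutoff)


last_login = {
    "alice": datetime(2025, 5, 20),
    "bob": datetime(2025, 1, 2),
    "svc_etl": None,  # hypothetical service account that never authenticated
}
print(accounts_to_review(last_login, now=datetime(2025, 6, 1)))  # ['bob', 'svc_etl']
```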

Protect sensitive data with policy-based controls

Dynamic data masking can hide sensitive column values at query time. Row access policies can restrict records by region, tenant, account or role. Tokenization can preserve useful structure while reducing exposure of the original value. These controls are especially important when multiple teams use the same governed data. Rather than creating many copies of a customer table for each region or function, teams apply policies to the shared object and let the platform enforce the appropriate view.

Secure data movement, sharing and collaboration

Each time data moves — through a pipeline, a partner share, an export or a model — it creates a control point. Is the destination approved? Is the data still classified? Does the same masking or row-level policy apply? Is the share still needed? Secure collaboration works best when teams can share governed data without creating unmanaged extracts.

Monitor access and investigate unusual activity

Query history, access history, login history, object changes, failed access attempts and policy changes all help security teams detect misuse, investigate incidents and prove control effectiveness. Monitoring should combine policy context, identity context and data sensitivity — a single query against a protected table may be normal for one role and suspicious for another.
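The point that the same query can be normal for one role and suspicious for another can be sketched with a per-role baseline. This assumes hypothetical access-history tuples and a hypothetical set of sensitive tables; a real detector would also weigh volume, time of day and other context.

```python
from collections import defaultdict


def build_baseline(history):
    """history: iterable of (role, table) pairs from past access logs."""
    baseline = defaultdict(set)
    for role, table in history:
        baseline[role].add(table)
    return baseline


def flag_unusual(events, baseline, sensitive_tables):
    """Flag access to sensitive tables that the role has not touched before."""
    return [
        (user, role, table)
        for user, role, table in events
        if table in sensitive_tables and table not in baseline.get(role, set())
    ]


baseline = build_baseline([("finance", "payroll"), ("marketing", "campaigns")])
events = [
    ("u1", "finance", "payroll"),
    ("u2", "marketing", "payroll"),
    ("u3", "marketing", "campaigns"),
]
print(flag_unusual(events, baseline, {"payroll"}))  # [('u2', 'marketing', 'payroll')]
```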

Govern data across its full lifecycle

Temporary tables, development copies, test data, derived features, model training data, vector indexes, reports and archived data can all contain sensitive information. Lifecycle controls should define retention, deletion, archival, backup, recovery, dev/test use and approved AI use. Sensitive production data should not be copied into lower environments without masking or synthetic-data controls.

Prepare for incident response and recovery

Incident response plans should define who investigates suspicious access, who can revoke privileges, who communicates with legal or compliance teams and how evidence is preserved. Backups, replication, failover, protected recovery points and restoration testing belong in the data security conversation because availability is part of the CIA triad.

Audit controls and manage exceptions

Access reviews, policy exceptions, masking coverage, classification status, incident records, key rotation, retention enforcement and monitoring findings should produce artifacts that auditors and internal risk teams can review. Exceptions should be explicit, approved and time-bound — a permanent exception is usually a policy failure in disguise.
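The rule that exceptions should be explicit, approved and time-bound lends itself to an automated check that produces audit findings. The sketch below assumes a hypothetical exception register; the IDs and approver names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyException:
    exception_id: str
    approved_by: Optional[str]
    expires_on: Optional[date]


def exception_findings(exceptions, today: date) -> list:
    """Flag exceptions that are unapproved, open-ended or past their expiry date."""
    findings = []
    for e in exceptions:
        if e.approved_by is None:
            findings.append((e.exception_id, "unapproved"))
        if e.expires_on is None:
            findings.append((e.exception_id, "no expiry"))
        elif e.expires_on < today:
            findings.append((e.exception_id, "expired but still present"))
    return findings


register = [
    PolicyException("EX-1", "ciso", date(2025, 12, 31)),
    PolicyException("EX-2", None, None),
    PolicyException("EX-3", "ciso", date(2025, 1, 1)),
]
for finding in exception_findings(register, today=date(2025, 6, 1)):
    print(finding)
```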

Data security with Snowflake

Snowflake brings data security and data governance into the same platform control plane, so organizations can define policies, classify data, manage access, monitor activity and govern AI workloads without maintaining separate enforcement models for each use case.

Snowflake Horizon Catalog is central to this model. It provides governance, discovery and catalog capabilities for the data platform, helping teams manage data, apps and AI assets with shared context — including classification, masking policies, row access policies, lineage and data quality monitoring.

RBAC, masking, row access policies, tagging and data protection policies work from the same governed data model, reducing duplicated effort across governance and security teams. Snowflake data protection policies let teams define granular permissions once and enforce them consistently at query time, without needing to create additional roles or views as data and teams grow.

Snowflake Trust Center gives teams a way to monitor Snowflake account security posture, surface scanner findings and evaluate data security and authentication readiness from within Snowflake, including detection findings focused on potentially suspicious activity.

Finally, AI workloads rely on governed data but introduce new access patterns through agents, tools, retrieval systems and model outputs. Horizon serves as a universal AI catalog that unifies context and governance across data and AI assets, while Snowflake’s Bedrock Data partnership addresses visibility into AI agents and the data they can access.

Data governance is the foundation of data security

Data security controls are only as effective as the foundation beneath them. Access policies, masking rules, row-level restrictions and monitoring all require the same inputs to work reliably: an accurate picture of where sensitive data lives, how it is classified and who is responsible for it. Without that foundation, teams make reasonable decisions with incomplete information, and the gaps tend to show up at the worst moments, during an incident investigation or a compliance audit.

Governance provides that foundation, helping teams determine where and how data security controls are applied. That is why the most defensible data security programs treat classification, ownership and policy not as governance deliverables but as security prerequisites.

KEY TAKEAWAY

Data security works best when controls such as access management, encryption, masking and monitoring are connected to clear ownership, classification and policy context.

Frequently Asked Questions

Your common questions about data security, answered by Snowflake experts.

Data security is the technical and procedural protection of data assets. Data privacy is the framework of rights, policies and obligations that governs how personal data is collected, used, shared and retained. Security enforces what privacy promises.

Data security and data governance overlap closely. Governance defines many of the rules (ownership, classification, access intent and policy), while security enforces those rules through controls such as encryption, access policies, masking, monitoring and incident response.

The major categories include encryption, access control, data masking, tokenization, row- and column-level policies, data loss prevention, backup and recovery, posture management and continuous monitoring. Each category protects a different part of the data lifecycle.

Data security posture management is the continuous discovery, classification and risk assessment of sensitive data. It focuses on the data itself: where sensitive data lives, who can access it, how it is protected and where risk exists. That makes it different from cloud security posture management, which focuses primarily on cloud infrastructure configuration.

AI expands the sensitive-data attack surface. Training data, vector embeddings, retrieval-augmented systems, agent-accessed context and model outputs can all expose data if access controls and policies are not applied consistently. Securing AI means treating models, agents and tools as new participants in the access-control model, not adding AI guardrails on top of an unchanged data security posture.
