Today’s organizations generate more and more data from a growing number of sources. Many struggle to keep pace as they try to get more value from their data, more quickly. One solution to emerge in recent years is the data mesh. This decentralized approach to organizing data and data teams relieves many of the growing pains that occur when an organization sets out to become more data-driven.
In this article, we unpack what a data mesh is and how this approach can remove many of the barriers to implementing a successful self-service data program at scale. We cover the four core principles of the data mesh concept and show how you can use Snowflake to empower your data mesh.
What Is a Data Mesh?
The concept of a data mesh was first proposed in 2019 by Zhamak Dehghani in a groundbreaking blog post. A data mesh emphasizes a domain-oriented, self-service design. It represents a new way of organizing data teams that seeks to solve some of the most significant challenges that often come with rapidly scaling a centralized data approach relying on a data warehouse or enterprise data lake.
In a data mesh, distributed domain teams are responsible for the data in their respective business domains and for the pipelines that turn that data into data products for consumers across the organization. Consumption, storage, transformation, and output of data are all decentralized, with each domain data team handling its own specific data. Foundational to this new level of autonomy is a commitment to universal governance standards that ensure interoperability and consistent data standards across all domains and data products.
4 Core Principles of a Data Mesh Approach
The data mesh approach represents a major paradigm shift, and successful implementation relies on four guiding principles.
Domain-driven data ownership
A traditional data architecture with a centralized data warehouse typically places data ownership with the data warehouse team. With the data mesh approach, data ownership is transferred to the domain teams. Along with ownership come additional responsibilities: ingesting, cleaning, transforming, managing, and governing data to create finished data products that can be easily accessed and readily shared with other teams when needed.
The reasoning for this arrangement is that the domain team is most familiar with the data in their business area and is therefore in the best position to manage the data expeditiously. As a result, placing data ownership with the domain teams increases the data agility in the organization.
A data mesh avoids data silos (and prevents domains from becoming silos) by embracing the concept of data as a product in each domain and through federated governance, which requires domains and data products to adhere to interoperability standards.
Data as a product
Organizations are encouraged to think about data in terms of “products.” Data sets have “customers” who will use them, and the domain teams are responsible not only for creating them but also for maintaining them so that they remain accurate, up to date, and of high quality. Data products need to be easily accessible and ready for use by other domain teams or data consumers within an organization.
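One way to make the product mindset concrete is to publish a small descriptor with every data set. The following is a minimal sketch in plain Python, not a Snowflake API: the `DataProduct` class, its fields, and the `daily_orders` example are all hypothetical names chosen for illustration. It shows a domain team stating who owns the product and what freshness promise consumers can rely on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes with a data set."""
    name: str
    owner_domain: str          # team accountable for the product
    description: str           # what consumers need to know to use it
    max_staleness: timedelta   # freshness promise made to consumers
    last_refreshed: datetime   # when the owning team last updated the data

    def is_fresh(self, now: datetime) -> bool:
        """Return True if the product still meets its freshness promise."""
        return now - self.last_refreshed <= self.max_staleness


# Illustrative values only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
orders = DataProduct(
    name="daily_orders",
    owner_domain="sales",
    description="One row per order, refreshed daily.",
    max_staleness=timedelta(hours=24),
    last_refreshed=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
)
print(orders.is_fresh(now))  # True: refreshed 9 hours ago, promise is 24 hours
```

A consumer, or an automated check run by the owning team, could call `is_fresh` before relying on the product, which turns "up to date" from an intention into something testable.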
Self-service infrastructure platform
A successful data mesh approach must rest on a common platform and set of tools that are easy to use, even for those without a technical data infrastructure background. Domain teams must be able to independently build and maintain their own data products. Without a self-service infrastructure in place, domain teams will be forced to rely on limited infrastructure resources and will lack the tools needed to truly own their data.
A data mesh supports scalability more effectively than a traditional framework because it doesn’t rely on a centralized data engineering team to have sufficient domain knowledge. Domain teams are empowered to contribute their expertise. Implementing a decentralized approach makes rapid scaling more feasible and enables quick access to actionable data.
Federated governance
Maintaining consistent access controls and data protections remains vital in a decentralized data mesh approach. Under the traditional, centralized approach, data warehouse teams are responsible for data quality. This arrangement presents potential problems because those teams are often not as familiar with the data as the source teams are. Pivoting to a decentralized data mesh can improve the quality of the data by placing responsibility for maintaining high-quality data with those most familiar with it.
Federated governance can set metadata and documentation standards that each domain needs to apply to its data products. Governance also ensures that data products from different domains can be combined easily. It’s important to strike a balance between upholding global governance policy standards and allowing individual domain teams the freedom to decide how those standards are implemented when creating and sharing their data products.
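The balance between global standards and domain freedom can be illustrated with a small metadata check. This is a sketch under stated assumptions: the tag names in `GLOBAL_REQUIRED_TAGS`, the `missing_tags` helper, and the `region` extension are hypothetical and do not come from any particular governance tool. The global standard fixes a minimum set of tags, and a domain may require more but never fewer.

```python
# Hypothetical global standard: every data product must carry these tags.
GLOBAL_REQUIRED_TAGS = frozenset({"owner", "description", "sensitivity"})


def missing_tags(product_tags: dict[str, str],
                 domain_required: frozenset = frozenset()) -> list[str]:
    """List required tags that are absent or empty on a data product.

    `domain_required` lets a domain extend the global standard with its
    own tags; the global set is always enforced.
    """
    required = GLOBAL_REQUIRED_TAGS | domain_required
    return sorted(tag for tag in required if not product_tags.get(tag))


# A sales data product that also has to declare a region (domain rule).
tags = {"owner": "sales", "description": "Daily orders", "region": "EMEA"}
print(missing_tags(tags, domain_required=frozenset({"region"})))
# ['sensitivity']
```

Because the global set is combined with the domain set rather than replaced by it, a domain can tighten the standard without being able to opt out of it.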
How Snowflake Can Empower Your Data Mesh
The Snowflake Data Cloud connects organizations and data teams with the data they need, when they need it. Snowflake’s solution eliminates complexity and data silos that keep actionable data out of reach. Here’s how Snowflake can help your organization benefit from implementing a data mesh approach.
Enabling distributed domain teams with a powerful, self-service platform
True data ownership is only possible with the right technology for a self-service data platform. Domain teams need on-demand access to the resources and tools that support them at each phase of the data product lifecycle. Snowflake provides a rich set of capabilities for implementing automated data transformation pipelines and for creating and governing data products. Snowflake’s platform is designed for ease of use, near-zero maintenance, and instantaneous scaling of resources to enable a true self-service experience. Each domain team can deploy and scale its own resources according to its needs without impacting other teams, which frees it from reliance on an infrastructure team.
Sharing and discovering data products
Snowflake’s platform allows domain teams to operate independently and yet easily share data products with each other. Each domain can designate which data objects to share and then publish product descriptions in a Snowflake Data Exchange, which serves as an inventory of all data products in the data mesh. Other teams can search that inventory to discover data products that meet their requirements. Access to data products can be obtained instantaneously or optionally through a request-and-approval process between the data producer and the consumer. Either way, consumers get live access to data products without ETL-ing or copying data between domains. Each domain can easily track who is using their data products and how often.
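The publish, discover, request, and track flow described above can be modeled in a few lines. The sketch below is a toy in-memory model in plain Python, not the Snowflake Data Exchange API: `Listing`, `Catalog`, and every method name are hypothetical. It only illustrates the workflow: instant or approval-based access, reads against the producer's object rather than a copy, and per-consumer usage counts.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical catalog entry describing one data product."""
    name: str
    owner_domain: str
    description: str
    requires_approval: bool = False
    granted: set = field(default_factory=set)        # consumers with access
    pending: set = field(default_factory=set)        # requests awaiting the owner
    usage: Counter = field(default_factory=Counter)  # reads per consumer


class Catalog:
    """Toy inventory of data products shared across domains."""

    def __init__(self):
        self._listings = {}

    def publish(self, listing):
        self._listings[listing.name] = listing

    def search(self, keyword):
        """Return names of listings whose name or description matches."""
        kw = keyword.lower()
        return sorted(
            item.name for item in self._listings.values()
            if kw in item.name.lower() or kw in item.description.lower()
        )

    def request_access(self, name, consumer):
        """Grant access at once, or queue the request for the owner."""
        listing = self._listings[name]
        if listing.requires_approval:
            listing.pending.add(consumer)
            return "pending"
        listing.granted.add(consumer)
        return "granted"

    def approve(self, name, consumer):
        listing = self._listings[name]
        listing.pending.discard(consumer)
        listing.granted.add(consumer)

    def read(self, name, consumer):
        """Return the producer's own listing (no copy) and record the read."""
        listing = self._listings[name]
        if consumer not in listing.granted:
            raise PermissionError(f"{consumer} has no access to {name}")
        listing.usage[consumer] += 1
        return listing

    def usage(self, name):
        return dict(self._listings[name].usage)


catalog = Catalog()
catalog.publish(Listing("daily_orders", "sales", "One row per order"))
catalog.publish(Listing("customer_profiles", "marketing",
                        "Customer attributes", requires_approval=True))

print(catalog.search("customer"))                              # ['customer_profiles']
print(catalog.request_access("daily_orders", "finance"))       # granted
print(catalog.request_access("customer_profiles", "finance"))  # pending
catalog.approve("customer_profiles", "finance")
catalog.read("daily_orders", "finance")
catalog.read("daily_orders", "finance")
print(catalog.usage("daily_orders"))                           # {'finance': 2}
```

Note that `read` hands back the same object the producer published, which mirrors the idea of live access without copying data between domains.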
Supporting federated governance
Snowflake also provides many of the native cross-cloud governance controls needed to support federated governance. These include tracking of object dependencies, data lineage, metadata tags for data products, row-level access control, dynamic data masking for sensitive information, and other controls. In Snowflake, the definition of governance controls such as tags or access policies is separate from the application of these controls to data objects. This separation lets organizations define common governance standards for the data mesh while allowing individual domain teams to extend and apply these standards to the data in their domain as they see fit, so that federated governance can be implemented with the desired balance between global standards and local autonomy in the domains.
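The separation between defining a control and applying it can be sketched as follows. This is a plain-Python illustration of the idea, not Snowflake syntax: the `mask_email` policy, the `PII_READER` role, the `POLICIES` registry, and the `sales_bindings` mapping are all hypothetical. The policy is defined once, and each domain decides which of its columns the policy is attached to.

```python
# Defined once, centrally: how an email address is masked for a given role.
def mask_email(value: str, role: str) -> str:
    if role == "PII_READER":           # hypothetical privileged role
        return value
    _, _, domain = value.partition("@")
    return "***@" + domain


# Hypothetical registry of globally defined policies.
POLICIES = {"mask_email": mask_email}

# Applied locally: the sales domain chooses which columns get which policy.
sales_bindings = {"customer_email": "mask_email"}


def read_row(row: dict, bindings: dict, role: str) -> dict:
    """Return the row with each bound column passed through its policy."""
    return {
        col: POLICIES[bindings[col]](val, role) if col in bindings else val
        for col, val in row.items()
    }


row = {"order_id": 42, "customer_email": "ana@example.com"}
print(read_row(row, sales_bindings, role="ANALYST"))
# {'order_id': 42, 'customer_email': '***@example.com'}
print(read_row(row, sales_bindings, role="PII_READER"))
# {'order_id': 42, 'customer_email': 'ana@example.com'}
```

Changing the masking rule means editing `mask_email` in one place, while another domain can reuse the same policy simply by declaring its own bindings.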
A data mesh approach eliminates many of the obstacles organizations face as they scale up their data collection and analysis capabilities. Snowflake eliminates technical barriers to entry, making data easy to access by those who understand it best.
To see how Snowflake can support your data mesh firsthand, sign up for a free trial.