Over the last couple of years, the data mesh architecture has emerged as a new framework to help solve many of the challenges that have plagued organizations, especially as they’ve scaled their data and data teams and tried to deliver more value, faster. Removing these barriers to data and delivering value at scale is a lofty goal, and one that Snowflake is passionate about helping our customers achieve. As with any architectural pattern, succeeding with a data mesh is not simply a technology problem to solve; it’s also about having the right technology to set up your teams for success and even catalyze change throughout your organization.
Let’s break down the key principles of the data mesh and how the Snowflake Data Cloud, backed by its unique platform, can give you the right foundation to build on as you embark on your data mesh journey.
The Four Principles of a Data Mesh
The term data mesh was coined by Zhamak Dehghani, director of emerging technologies at Thoughtworks, in her seminal articles How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh1 and Data Mesh Principles and Logical Architecture.2 The idea of a data mesh was a reaction to the trade-offs organizations were being forced to make as they scaled their data into less-governed and less-structured monolithic data lakes. As the number of data sources and data consumers grew, so did the number of data pipelines needed to connect them all. This pushed more and more of the work onto specialized teams who had the skills to work with these notoriously challenging technologies but were disconnected from the domain experts who needed the data to do their jobs. The result was the all-too-common scenario of downstream data consumers waiting on complex pipelines and loosely stitched-together technologies to get the data they needed, while overworked engineering teams struggled to keep up with demand.
Figure 1, from Data Mesh Principles and Logical Architecture by Zhamak Dehghani,3 shows the four core principles that define a data mesh architecture:
- Domain-driven ownership
- Data as a product
- Self-service infrastructure
- Federated governance
Figure 1: The four core principles of a data mesh architecture
Principle 1: Domain-driven ownership and architecture
The first principle of a data mesh is shifting the power and ownership of data into the hands of the domain teams. They own the data end to end: from ensuring they have the right sources or ingested data to work with, to building and maintaining any processing pipelines necessary, to serving the data out for other domain teams to tap into as products (more on that later) with the right quality guarantees and governance controls in place. Domain teams can be defined by department, business unit, or other similarly motivated groupings. In a properly implemented mesh, it should be easy to add new domain teams, especially when data is being combined into new data products.
Principle 2: Data as a product
As alluded to in the first principle, domain teams aren’t just responsible for the data; they are also responsible for the resulting data products. And data products need to be treated like any other product. They need to be discoverable and usable by consumers and other domain teams, and the domain owner is responsible for maintaining and updating (or deprecating) these products to ensure quality and accuracy. What can this look like in practice? Imagine a supply chain team creating an inventory data product that a marketing team can tap into to develop new discount campaigns or that regional teams can use for placing new orders.
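To make the inventory example more concrete, here is a minimal Python sketch of how a domain team might describe and publish a data product so that other teams can discover it. The `DataProduct` and `Catalog` classes, their fields, and the sample values are all hypothetical; they illustrate ownership, discoverability, and deprecation in general, not any particular catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product owned by one domain."""
    name: str
    owner_domain: str
    description: str
    tags: set = field(default_factory=set)
    deprecated: bool = False


class Catalog:
    """Hypothetical in-memory catalog that other domains can search."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        self._products[product.name] = product

    def deprecate(self, name, requesting_domain):
        product = self._products[name]
        # Only the owning domain may retire its own product.
        if product.owner_domain != requesting_domain:
            raise PermissionError(f"{requesting_domain} does not own {name}")
        product.deprecated = True

    def search(self, tag):
        # Deprecated products are hidden from discovery.
        return [p.name for p in self._products.values()
                if tag in p.tags and not p.deprecated]


catalog = Catalog()
catalog.publish(DataProduct(
    name="inventory_levels",
    owner_domain="supply_chain",
    description="Current stock by SKU and warehouse",
    tags={"inventory", "sku"},
))
```

A marketing team could then call `catalog.search("inventory")` to find the product, while only the supply chain team is allowed to deprecate it.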
Principle 3: Self-service infrastructure as a platform
The third principle is to make all this self-service and easy for the domain teams. Complex technologies and niche skills are simply not sustainable in a data mesh design. There needs to be a common platform and set of tools that any domain team can tap into at any time to build and serve their data products, without getting bogged down in infrastructure maintenance or resource limitations.
Principle 4: Federated governance
The final piece of a successful data mesh is governance. A data mesh architecture cannot come at the expense of access controls and data protections. There needs to be a balance between having global governance policies and controls, and ensuring each domain team maintains the ability to define and implement these policies when developing and sharing their data products. This federated governance is critical not only for ensuring data privacy and compliance but also for aiding discovery at scale.
Data Mesh Success with Snowflake
Connecting organizations and data teams to the most relevant data when they need it, without silos or complexity, is what the Snowflake Data Cloud is designed to do. How does it deliver on that? It’s backed by Snowflake’s platform, which is uniquely built for performance at scale, ease of use, and governed data sharing and collaboration; and it’s well suited to support both the centralized standards and the decentralized ownership necessary for a successful data mesh deployment.
Delivering self-service infrastructure as a platform
Building a self-service infrastructure is the most obvious data mesh principle where the right technology can help. It’s critical that domain teams can access the resources and tools they need on demand to support them at every stage of the data product lifecycle—from accessing the right data, to processing and preparing it, to analyzing it or creating models. Snowflake’s platform gives these teams a one-stop shop, while supporting a wide range of skills.
With Snowflake’s elastic performance engine, domain teams can power large-scale pipelines, ad hoc exploration, BI reporting, feature engineering, interactive applications, and much more, all with a single engine. This allows organizations to simplify their architectures without sacrificing speed or usability. Regardless of whether the teams work in SQL, code (such as Java, Scala, or Python), or a mix, the performance engine supports them all the same. And with elastic scalability and isolated multi-cluster compute, each domain team gets access to the dedicated resources they need without impacting performance or concurrency for other teams.
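As a rough illustration of isolated compute per domain, the Python sketch below builds one `CREATE WAREHOUSE` statement for each domain team. The `<DOMAIN>_WH` naming convention, the default size, and the cluster and suspend settings are assumptions made for this example, and multi-cluster settings depend on the Snowflake edition in use. The function only produces SQL text; running it against an account is left out.

```python
def warehouse_ddl(domain, size="XSMALL", max_clusters=2, auto_suspend_s=60):
    """Build a CREATE WAREHOUSE statement for one domain team.

    The naming convention and default values are assumptions for this sketch.
    """
    # Reject anything that is not a plain identifier before interpolating it.
    if not domain.isidentifier():
        raise ValueError(f"unsafe domain name: {domain!r}")
    name = f"{domain.upper()}_WH"
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH "
        f"WAREHOUSE_SIZE = '{size}' "
        f"MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = {max_clusters} "
        f"AUTO_SUSPEND = {auto_suspend_s} AUTO_RESUME = TRUE"
    )


# One dedicated warehouse per (hypothetical) domain team.
statements = [warehouse_ddl(d) for d in ("supply_chain", "marketing")]
```

Because each domain gets its own warehouse, a heavy pipeline in one domain does not compete for compute with dashboards or ad hoc queries in another.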
Delivering domain-driven ownership and data as a product
This last concept of scalable, dedicated resources has allowed Snowflake customers to implement a distributed domain-driven design logically, while maintaining a standard central platform backing it all. This central platform can incorporate a wide range of data types and file formats, and even support access to external data for comprehensive coverage of the data landscape. And as a fully managed service with built-in automations, the central Snowflake platform makes it easy for domain teams to self-serve. IT teams don’t need to worry about provisioning, maintenance, upgrades, or downtime. And domain teams operate as distinct units that can scale to practically any number of users who can work with virtually any amount of data on demand, with no infrastructure expertise or tuning required.
However, even with this design, a data mesh still runs the risk of turning into a bunch of domain silos. And silos are the killer of any organization. This is where Snowflake is especially well suited to help ensure success with a data mesh, enabling the domain teams to seamlessly connect and share data products without copying them or moving them through ETL pipelines between domain teams.
Leveraging a unique set of technologies called Snowgrid, Snowflake changes what data sharing and collaboration can look like not only within an organization but even with partners and third parties. Through Snowgrid, domain teams can securely share a single copy of data that other domain teams can discover and access immediately—eliminating the need for any ETL. All data is live, with any updates automatically propagated to other teams. Teams can tap into the broad ecosystem of third-party data on Snowflake Data Marketplace to enrich their data products, without lengthy procurement or FTP cycles. And teams are not even limited to just data as the product. They can publish and share pre-developed models or functions as a product, thereby providing additional value by sharing their expertise with other domain teams.
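To give a feel for what sharing a data product without copying it can look like, the Python sketch below assembles the statements a providing domain might run to expose one table through a share. The share, database, schema, table, and account names are placeholders invented for this example, and the sketch covers only a simple direct share of a single table; it does not represent the full set of Snowgrid or Marketplace capabilities described above.

```python
def share_ddl(share, database, schema, table, consumer_accounts):
    """Return the statements a provider might run to share one table.

    All object and account names passed in are placeholders for this sketch.
    """
    fq_schema = f"{database}.{schema}"
    return [
        f"CREATE SHARE {share}",
        # Consumers need usage on the containers before they can read the table.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {fq_schema}.{table} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumer_accounts)}",
    ]


share_statements = share_ddl(
    share="INVENTORY_SHARE",
    database="SUPPLY_CHAIN",
    schema="PRODUCTS",
    table="INVENTORY_LEVELS",
    consumer_accounts=["MYORG.MARKETING"],
)
```

Nothing in these statements moves or duplicates data: the grants expose the existing table, so consumers always read the provider's current version.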
What’s especially powerful is that Snowgrid spans the globe, seamlessly connecting domain teams that may be separated by region or even by cloud. This means organizations can implement a data mesh without needing to standardize on a single cloud vendor or operate with regional silos. Each domain team can operate locally, running on its preferred cloud or region, with those differences abstracted away from the domains themselves. They can share data products as easily with a domain team on the other side of the world as they can with a team in the same office. And the organization can replicate data between clouds or regions to operate without disruption and maintain new levels of business continuity and regulatory protections.
Delivering federated governance
Within Snowgrid are all of the native cross-cloud governance controls that act as the foundational building blocks for enabling federated governance. Organizations can strike the right balance between allowing domain owners to easily define and apply their own fine-grained policies and having centrally managed governance processes. Policies can be defined at the data and role level, and they follow the data for consistent enforcement, even as data is shared between clouds, regions, or workloads. Domain teams can discover and query the same data, and their resulting views change based on their role and the data sensitivity, drastically simplifying governance at scale while still allowing teams to get value from their data. Organizations can also integrate these governance controls with their existing governance and catalog tools, such as Alation, to further enhance quality, discoverability, and data protection across their domain teams.
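The idea that two teams can query the same data and see different results depending on their role can be modeled in a few lines of plain Python. This is a conceptual sketch only, not Snowflake's API: the column classification, role names, mask string, and sample row are all assumptions. In practice a platform would enforce this kind of rule with its own masking and access policies attached to the data.

```python
# Which columns are sensitive is decided centrally (assumption for this sketch).
SENSITIVE_COLUMNS = {"email"}
# Which roles may see sensitive values is decided by the owning domain.
ROLES_WITH_FULL_ACCESS = {"SUPPORT_ANALYST"}


def apply_policy(row, role):
    """Return the view of a row that the given role is allowed to see."""
    if role in ROLES_WITH_FULL_ACCESS:
        return dict(row)
    # Every other role gets sensitive columns replaced by a fixed mask.
    return {col: ("***MASKED***" if col in SENSITIVE_COLUMNS else val)
            for col, val in row.items()}


row = {"customer_id": 42, "email": "ana@example.com", "region": "EMEA"}
marketing_view = apply_policy(row, "MARKETING_ANALYST")
support_view = apply_policy(row, "SUPPORT_ANALYST")
```

The split between the two sets mirrors the federated model: a central team classifies what is sensitive, and each domain decides which roles are entitled to see it.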
At Snowflake, we have helped customers break apart their monolithic approaches to data and move to a more fluid and dynamic model of connecting teams with the right data they need as they need it, while removing the technical barriers to entry. As one of our customers put it, “With Snowflake, DPG Media can focus more on enabling their domains in using the platform than on keeping it running.” (For more details on how DPG Media implemented a data mesh on Snowflake, check out their posts: Data Mesh — A Self-service Infrastructure at DPG Media with Snowflake4 and Data mesh at DPG Media.5) Are you embarking on a data mesh journey? Let us know! We’d love to hear about your experience.