Data Mesh for Self-Service Data

Data mesh is a decentralized approach to organizing data that relieves many of the growing pains an organization experiences when it sets out to become more data-driven.

  • Overview
  • What Is a Data Mesh?
  • 4 Core Principles of a Data Mesh Approach
  • Resources

Overview

Today’s organizations generate ever more data from a growing number of sources, and many have struggled to keep pace as they try to get more value from that data, more quickly. One solution that has emerged in recent years is the data mesh, a decentralized way of organizing data that eases many of the growing pains organizations face as they become more data-driven.

Let’s explore what a data mesh is and how this approach can remove many of the barriers to running a successful self-service data program at scale.

What is a data mesh?

A data mesh emphasizes a domain-oriented, self-service design for data management. It offers a new approach to organizing data teams, addressing key challenges in scaling centralized data architectures such as data warehouses and data lakes.

In a data mesh, teams actively manage the data within their own business domains. They also build and maintain the pipelines that deliver data products to consumers throughout the organization. Each domain data team independently handles the ingestion, storage, transformation and output of its own data. This autonomy depends on a firm commitment to organization-wide governance standards, which keep data consistent and interoperable across all domains and data products.
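To make this division of labor concrete, here is a minimal Python sketch that assumes a toy, in-memory setup. The `DomainPipeline` and `DataProduct` classes and the `REQUIRED_METADATA` standard are hypothetical names chosen for illustration, not part of any data mesh product or library. Each domain team supplies its own ingestion and transformation logic, while the metadata check stands in for the organization-wide standard that every domain shares.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict[str, object]

# Hypothetical organization-wide standard: every published data product must
# carry these metadata keys, regardless of which domain produced it.
REQUIRED_METADATA = frozenset({"owner", "description", "schema"})


@dataclass
class DataProduct:
    name: str
    domain: str
    metadata: dict[str, object]
    rows: list[Row]


@dataclass
class DomainPipeline:
    """A single domain team's pipeline (hypothetical): the team owns every stage."""

    domain: str
    ingest: Callable[[], Iterable[Row]]  # the domain's own source systems
    transform: Callable[[Row], Row]      # domain-specific business logic

    def publish(self, name: str, metadata: dict[str, object]) -> DataProduct:
        # Enforce the shared standard before anything leaves the domain.
        missing = REQUIRED_METADATA.difference(metadata)
        if missing:
            raise ValueError(f"{self.domain}/{name} is missing metadata: {sorted(missing)}")
        stored = list(self.ingest())                    # stand-in for domain-owned storage
        rows = [self.transform(row) for row in stored]  # transformation step
        return DataProduct(name, self.domain, dict(metadata), rows)  # output step


# Usage: the sales domain publishes an "orders" data product on its own.
sales = DomainPipeline(
    domain="sales",
    ingest=lambda: [{"order_id": 1, "amount_cents": 1250}],
    transform=lambda row: {"order_id": row["order_id"], "amount": row["amount_cents"] / 100},
)
orders = sales.publish(
    "orders",
    {"owner": "sales-data-team", "description": "Completed orders", "schema": ["order_id", "amount"]},
)
print(orders.rows)  # [{'order_id': 1, 'amount': 12.5}]
```

The governance check runs inside the domain's own pipeline: the domain keeps control over how it builds the product, while the standard the product must meet is shared across the organization.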

4 core principles of a data mesh approach

The data mesh approach represents a major paradigm shift, and successful implementation relies on four guiding principles. 

1. Domain-driven ownership: In a traditional, centralized data warehouse architecture, data ownership rests with the data warehouse team. The data mesh approach instead transfers ownership to domain teams, which ingest, clean, transform, manage and govern data to create finished data products that they readily share with other teams as needed. This structure works because domain teams know their business area's data best and are therefore best placed to manage it efficiently. As a result, placing data ownership with domain teams makes the organization more agile with its data.

2. Data as a product: Organizations should treat data as "products" and the people who use them as "customers," which encourages a more user-centric, value-driven approach to data management. Domain teams not only create these products but also maintain them to keep them accurate, up to date and of high quality.

3. Self-service infrastructure: A successful data mesh relies on a common platform and a user-friendly tool set that even people without a technical background in data infrastructure can use. Domain teams must be able to build and maintain their data products independently. Without self-service infrastructure, they depend on scarce central infrastructure resources and lack the tools to truly own their data.

A data mesh scales more effectively than a traditional framework because it does not require a centralized data engineering team to hold complete knowledge of every domain; each domain team contributes its own expertise instead. This decentralization lets the organization scale quickly and gives consumers rapid access to actionable data.

4. Federated governance: Consistent access controls and data protections remain crucial in a decentralized data mesh. In the traditional, centralized approach, data warehouse teams are responsible for data quality. This arrangement creates problems because these teams are often less familiar with the data than the source teams that produce it. A decentralized data mesh improves data quality by giving responsibility for it to the people who know the data best.

Federated governance establishes metadata and documentation standards that each domain applies to its data products, and it ensures that data products from different domains integrate seamlessly. It is essential to strike a balance between upholding global governance standards and giving individual domain teams the freedom to interpret and implement those standards when they create and share their data products.
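One way to picture federated governance is as two layers of checks: a global policy that every data product must pass, plus rules that each domain adds for its own products. The Python sketch below assumes that arrangement. The descriptor fields (including `freshness_sla_hours` as a stand-in for how up to date a product must be), the classification values and the `FederatedGovernance` class are all hypothetical and are not drawn from any particular governance tool.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical global policy agreed by the federated governance group.
GLOBAL_REQUIRED_FIELDS = ("owner", "description", "schema", "freshness_sla_hours", "classification")
ALLOWED_CLASSIFICATIONS = frozenset({"public", "internal", "confidential"})


@dataclass
class ProductDescriptor:
    name: str
    domain: str
    metadata: dict[str, object]


# A domain rule inspects a descriptor and returns a list of problems (empty if compliant).
DomainRule = Callable[[ProductDescriptor], list[str]]


def global_checks(product: ProductDescriptor) -> list[str]:
    """Checks every domain must pass, regardless of how it builds its products."""
    problems = [f"missing '{name}'" for name in GLOBAL_REQUIRED_FIELDS if name not in product.metadata]
    classification = product.metadata.get("classification")
    if classification is not None and classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification '{classification}'")
    return problems


@dataclass
class FederatedGovernance:
    # Extra rules each domain chooses for itself, keyed by domain name.
    domain_rules: dict[str, list[DomainRule]] = field(default_factory=dict)

    def add_domain_rule(self, domain: str, rule: DomainRule) -> None:
        self.domain_rules.setdefault(domain, []).append(rule)

    def validate(self, product: ProductDescriptor) -> list[str]:
        # Global policy first, then whatever the owning domain has layered on top.
        problems = global_checks(product)
        for rule in self.domain_rules.get(product.domain, []):
            problems.extend(rule(product))
        return problems


# Usage: the finance domain interprets the global policy more strictly for its own products.
governance = FederatedGovernance()
governance.add_domain_rule(
    "finance",
    lambda p: [] if p.metadata.get("classification") == "confidential"
    else ["finance products must be classified 'confidential'"],
)
ledger = ProductDescriptor(
    "ledger",
    "finance",
    {"owner": "finance-data", "description": "General ledger", "schema": ["account", "amount"],
     "freshness_sla_hours": 24, "classification": "internal"},
)
print(governance.validate(ledger))  # ["finance products must be classified 'confidential'"]
```

Keeping the global checks small and letting domains add their own rules is one way to express the balance between shared standards and domain freedom; in practice, such checks might live in a shared data catalog or platform rather than in application code.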