We’ve all experienced the frustration of a bottleneck, like waiting in line for a chairlift at a ski resort. During the winter holidays, thrill seekers often face an infuriating wait for their ride up the mountain. Some seek new methods to get to the top, hence the rise of ski touring or “uphill skiing.” The mantra is “earn your turns.”
Now think of the growing demand for data and insights. Traditionally, organizations relied on central data teams, often residing within IT, to build data pipelines and deliver data—or even central analytics teams to deliver the insights themselves. This centralization led to bottlenecks and friction-inducing wait times. If it takes weeks or months to provision data or deliver insights, the opportunity is likely long gone, much like the fresh snow that has already been skied on by the time the lift delivers you to the top.
Data mesh offers an answer in four principles
The data mesh paradigm set out to solve some of these issues. The data mesh is based on four basic principles that interact with each other, and support the objectives of the organization:
- Domain ownership: This allows for distribution or decentralization of data ownership. But it’s not a free-for-all. The data mesh paradigm establishes guardrails.
- Self-service infrastructure: This is a common data platform or infrastructure that enables the governance and use of the data. A common platform also prevents tech proliferation.
- Data-as-a-product: This means that data must be treated as a product to be used by others. The data mesh domains don’t keep it sequestered. They make the data available for others, and take responsibility for requirements such as data formats, data quality, data documentation, and frequency of refresh.
- Federated computational governance: Enterprise-wide data governance must be applied by everyone; it should be coordinated centrally but applied and enforced in a distributed manner, keeping in mind the requirements of the data sources, domain owners, and consumers.
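To make the data-as-a-product and federated governance principles more concrete, here is a minimal Python sketch. It assumes a hypothetical `DataProduct` descriptor that a domain publishes alongside its data, and a hypothetical `governance_violations` function standing in for centrally agreed rules that each domain runs against its own products. The field names and rules are illustrative assumptions, not part of any data mesh standard or Snowflake API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain publishes with its data product."""
    name: str
    owner_domain: str
    data_format: str        # e.g. "table" or "parquet"
    refresh_frequency: str  # e.g. "hourly", "daily", "monthly"
    documentation_url: str = ""
    quality_checks: List[str] = field(default_factory=list)


def governance_violations(product: DataProduct) -> List[str]:
    """Illustrative central rules, evaluated by each domain on its own products."""
    violations = []
    if not product.owner_domain:
        violations.append("missing owner domain")
    if not product.documentation_url:
        violations.append("missing documentation")
    if not product.quality_checks:
        violations.append("no data quality checks declared")
    return violations


# Example: a sales domain describes one of its data products.
orders = DataProduct(
    name="orders_daily",
    owner_domain="sales",
    data_format="table",
    refresh_frequency="daily",
    quality_checks=["order_id is unique"],
)

print(governance_violations(orders))  # ['missing documentation']
```

The rules are defined once but executed wherever the product lives, which mirrors the "coordinated centrally, enforced in a distributed manner" idea.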
Moving from principle to practice: An organizational perspective
While the new paradigm sounds promising in principle, many questions remain about how to make it work in practice. Technology is often said to be the easier part, relatively speaking. None of it is necessarily a walk in the park. Architecture and data modeling decisions can be challenging. But organizational considerations can be particularly thorny. We’ve gathered some of the questions most frequently asked:
What is the best way to introduce a data mesh in an organization?
A data mesh is usually not created in a big bang. Snowflake customer Vio (formerly FindHotel) found its way to a data mesh somewhat unintentionally. As demands began to overwhelm the centralized data team, they explored how to distribute responsibilities. Some potential domains were more mature than others. These domains became the logical place to start. The team embraced a more distributed organization out of pure necessity. They then discovered that what they were trying to achieve was in fact a data mesh. Vio’s journey is described here.
Roche Diagnostics also started small, with a single domain back in May 2021. Less than a year later, it had more than six domains and multiple product teams working on its platform.
For a systematic approach, start with these first steps:
- Assess your organization’s needs and current frustrations: You’ll likely hear about bottlenecks creating delays in the data pipeline, long waits for requested data and insights, and frustrations with data quality. But you might also hear confusion about the goals of the organization and the objectives of the data strategy. Maturity levels differ widely across many organizations.
- Share these findings with key executives and other stakeholders to enlist support: Executive support is often cited as the primary success factor in executing on a data strategy. But don’t forget to build broader support with buy-in from business leaders and analysts across the company.
- Determine skill levels and gaps across teams: Domains will be at different stages. Assess which can be more autonomous first, and how to support the others. The more mature domains will likely be your data mesh champions and form the core of a data council that can drive the strategy and work to coordinate requirements and distribute responsibilities. More on the data council below.
- Start with a domain and use case, then iterate: The first use case serves as a proof of concept to demonstrate how a domain can build a data product that can be consumed by others. Ideally, this use case addresses an immediate business need to garner as much support as possible and build momentum. Adopting the model for additional use cases can expand domain ownership and self-service infrastructure access. However, this “start small” approach does not necessarily mean “bottom up.” Executive support is usually required from the start.
Who are the key stakeholders to get on board?
The pressure to “do something” could come from the consumers who tire of waiting for the data and insights they need, or from the central data teams frustrated by the bottleneck they know they’ve become. All actors in the data chain should be involved, from the data source owners through to the consumers. As mentioned, executive support facilitates the changes required. Other considerations include a representative from legal or a compliance function who could (perhaps periodically) provide input into enterprise-wide governance requirements, or someone from Human Resources to help drive a more formal change management process and literacy program across the organization. The cultural shift to becoming a data-driven organization requires quite extensive evangelism. Not everyone needs to be involved in the core process but a broader constituency should be included in communication and outreach.
How do we choose domains and define their scope and responsibilities?
Roche Diagnostics offers some recommendations for mapping out domains. In their initial experience, business and IT leadership had defined domains based on process boundaries, not necessarily by their functional structure. That made sense, but there still wasn’t full alignment on those definitions. A gray area remained. The advice? “At the end of the day, it’s a matter of making sure there is a concept of ownership close to where the data is being generated in the source system.” The fine-tuning comes through collaboration across stakeholders.
Common considerations for choosing domains include the following:
- Domains should be defined along functional business areas where people have a common functional objective in the organization.
- Domains should not be defined around existing technology stacks to avoid creating technology domains rather than functional data domains.
- Data domains should be chosen with a clearly defined scope of responsibility for data. Companies often strive to avoid overlapping data responsibility between domains.
- Sometimes it can make sense to have source-oriented domains or consumer-oriented domains. The former are responsible for making data from operational data sources available with the quality and standardization that other domains require. The latter create integrated, higher-value data products that are geared towards requirements of the data consumers on the business side; for example, a customer 360 domain aggregating data from multiple sources.
How do you convince a potential domain owner to become a domain owner and be responsible for its data (products)? Some teams are data producers but not data consumers. Will they see the benefit of a data mesh transformation, or only see more work coming to them? How can we get them on board with the data mesh approach and a data product mindset?
Admittedly, I used to think that domains would be clamoring for ownership of their data. But it’s not necessarily the case, particularly with the obligations that the data mesh prescribes. Domain ownership entails responsibility for data quality, data pipelines, and data products. It’s not just a question of securing and storing data. Some domain teams that haven’t traditionally had these more extensive responsibilities have balked at what they see as more work.
The challenge lies in creating an environment conducive to the success of the data owner. First of all, the role must be understood, valued, and rewarded. Behavioral change requires carrots and sticks. Not everyone responds to the same motivations, so multiple methods help drive the desired behaviors. See the next question.
How can we incentivize domains to take more ownership of and responsibility for their data; that is, to ensure data quality not necessarily for their own needs but for the needs of other domains or data consumers?
Organizational changes require behavioral changes—a focus on the people and process within the organization. New goals must be established to drive the desired behavior. Both teams and the individuals in them will have new measures of success. Goals will be more focused on the data customer, the use of the data, and the value delivered to the organization.
You need to invest in:
- Enablement – How do I do it?
All teams need to understand what is expected of them. They need to have clear metrics that they can control. They also need to have the right self-service tools to do their job. And, they need to be trained in how to use those tools effectively. But, perhaps more importantly, they need to be trained in how the organization works, how to gather data consumer requirements, and how to collaborate with their peers in other domains to ensure the effective development of new data products that meet the needs of their “customers.”
- Incentives – Why should I do it?
To change behavior, domains need to be motivated to do things differently; that is, take responsibility for the data products. They may know how but they’ll need to be shown what’s in it for them. New incentives can be social in nature: raising awareness of the role teams are playing in delivering new insights to the business, offering enterprise-wide recognition of the value delivered, and clearly showing appreciation for a job well done. Realistically, however, these social incentives should be combined with material incentives both for teams and individuals. For teams, material incentives include grants for new resources, increased headcount, and larger budgets; for individuals who stand out, they include promotions or bonuses.
- Enforcement – What happens if I don’t?
Finally, these responsibilities must also come with enforcement—the proverbial stick. This assumes that the requirements were clear and actionable, and the metrics were measurable and reasonable. The first step would be to work with the teams to understand the challenges they face. Perhaps it is a lack of training or the need for new tools. But, ultimately, these new mandates must be enforced across the organization. Socially, other teams will know who is not stepping up. Materially, those who just can’t make the change might be better in another part of the organization. Accountability is key to driving organizational change.
How can we best prevent duplication of work and/or duplication of data across domains?
First, this raises the question whether duplication is always a bad thing. Sometimes duplication can accelerate agility without a loss of consistency. Communication is the key to coordination, and ultimately that’s the only way to control unnecessary duplication across domains. Communication must be multi-directional: top down, bottom up, and peer-to-peer. Here are a few key mechanisms to promote communication and coordination:
- Data leadership facilitates coordination and drives change: Many organizations have appointed data leaders. Titles may vary. Some organizations have appointed a CDO, but others have a Head of Data & Analytics or a VP of Insights. Regardless of their titles, effective data leaders often act as diplomats, working across the organization to define data strategy, develop data culture, and deliver data value. Although the role and priorities of data leaders may change when responsibilities are distributed, there is still a need for diplomacy, as well as coordination and communication to demonstrate the benefits of change and cultivate a shared vision. These data leaders also help put in place the incentives and enforcement mechanisms to drive change.
- A data council enables coordination across stakeholders: A data council brings together representatives of the domains and their consumers, as well as other relevant stakeholders. For example, a representative from legal might be included to ensure regulatory compliance requirements are taken into account. This council then serves as a coordinating body to identify common infrastructure needs, define common data terms and formats, develop enterprise-wide governance policies, and allocate resources to reflect priorities. The result is then implemented at the domain level, as stipulated by the federated governance principle of the data mesh. The data council can also serve as a Center of Excellence or even a service bureau, offering advice and assistance to bring less mature domains and even consumers up to speed.
Which success criteria or KPIs can we use to measure the progress of the organizational transformation towards data mesh?
The ultimate measure of success for data teams is their impact on the business, either in cost reduction or value generated. KPIs along a data mesh journey often focus on the maturity of domains, the effectiveness of federated governance, the value of data products, and the ease of use of a self-service data platform. For example, during the transition of data product responsibilities from a central team to distributed domains, an organization can measure the number of data products that are partially or fully owned locally in the domains. They may also want to track the frequency of support that a domain requires from a central infrastructure team.
As the organization matures, success will shift toward data use and value delivered. Some companies measure data product usage over time in terms of the number of requests for a given product or the frequency of access. While such metrics are extremely valuable, beware that low usage may not always imply low value or low quality of the data product. For example, a domain may produce a high-value, high-priority data product that its consumers need only once per month. Ultimately, the value metric must reflect the contribution to a specific use case, such as the lift in conversion rates of marketing campaigns or the improved accuracy of fraud-detection algorithms.
To ensure continued value delivery, product feedback or ratings from the data consumers can also be a very powerful metric to collect. Other useful metrics can include the improvements in data quality, completeness of data product documentation, the degree of automation, the cost and resiliency of data products, or the time for data consumers to derive value from a data product.
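As a rough illustration of how two of these KPIs could be tracked, the Python sketch below computes the share of data products by ownership status and the number of requests per product. The inventory, the access log, and the function names `ownership_share` and `requests_per_product` are hypothetical sample data and helpers invented for this example; a real implementation would read from your platform's catalog and usage logs.

```python
from collections import Counter

# Hypothetical inventory: (data product, owning team, ownership status).
# "domain" = fully owned locally, "shared" = partially owned, "central" = central team.
products = [
    ("orders_daily", "sales", "domain"),
    ("customer_360", "marketing", "shared"),
    ("fraud_scores", "central", "central"),
    ("inventory_levels", "supply_chain", "domain"),
]

# Hypothetical access log: (data product, month of access).
access_log = [
    ("orders_daily", "2024-01"),
    ("orders_daily", "2024-01"),
    ("fraud_scores", "2024-01"),
]


def ownership_share(products):
    """Fraction of data products in each ownership status."""
    counts = Counter(status for _, _, status in products)
    total = len(products)
    return {status: n / total for status, n in counts.items()}


def requests_per_product(access_log):
    """Number of logged requests for each data product."""
    return Counter(product for product, _ in access_log)


print(ownership_share(products))
print(requests_per_product(access_log))
```

A product with few requests, such as `fraud_scores` in this sample, is not necessarily low value, so usage counts like these are best read alongside consumer feedback and use-case outcomes.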
How to scale data mesh as a methodology and mindset across the organization?
Scaling data mesh requires both architectural and organizational considerations. From an organizational perspective, a data mesh—and being data-driven—requires a cultural shift. You’ll be cultivating both sides of the supply and demand for data: data consumers establish the product requirements that the domain teams must satisfy. That’s a tall order.
Developing the new data culture requires communication. It starts with “What is data?” “How can it be used?” “What value does it deliver?” and, most importantly, “What is my role?” Obviously, the consumers and domains need to understand their roles, but so does everyone else in the organization. Everyone has a role in capturing, protecting, or using the data. It’s not just the data teams. But that’s a topic for a different blog post.
From a data mesh perspective, communicating how the distributed, yet coordinated architecture works is critical. Take federated governance, for example. According to Omar Khawaja, Head of Business Intelligence at Roche Diagnostics, “People have heard of governance, and they translate governance into ‘bureaucracy’ 90% of the time. That’s when you say it’s ‘federated governance.’ That’s how you bring people to the table, by telling them they are now part of the decision-making, and this has been a buy-in factor in some cases.” It’s about explaining the new processes and the new roles. When communicated effectively, the value is understood and buy-in is more likely.
There are likely many more questions about how to set up a data mesh effectively. For example, we’ve been asked: How do you convince teams to standardize technology? How do you stop or reverse the proliferation of tools? How can someone estimate costs and benefits? These and other questions will be addressed in a future blog post.
In the meantime, for more on how to use Snowflake for data mesh, check out our website.