Digital transformation is driving a data explosion, and it’s having an especially dramatic effect in the healthcare industry. From wearable devices and telemedicine to prescription refills and patient outcomes, the amount of life sciences data being collected—and being made available to patients, partners, labs, service providers, and other stakeholders—is exploding. Processes like research and development, manufacturing, and distribution are becoming more automated and more intelligent as companies apply AI. The COVID-19 pandemic put diagnostics in the spotlight for the past two years, with special focus on fast, accurate screening, testing, and monitoring. All of this creates a huge amount of data—and huge demand for that data.
So how does an organization avoid the pitfalls of data duplication, processing bottlenecks, and inaccessible resources to ensure data teams can scale and consistently deliver real value to the business? For Roche Diagnostics, a division of pharmaceutical and life sciences giant Roche, the answer involves a new approach: a data mesh.
Data mesh architectures aim to remove barriers associated with scaling data and making it available to users across the organization. This isn’t just a technology solution; it also requires a transformative cultural shift to view data as an asset and open it up to domain teams without burdensome complexity. (Learn more about data mesh in our previous blog.)
Omar Khawaja, Head of BI and Analytics at Roche Diagnostics, and his team have been on a journey to implement data mesh, including self-service data and analytics infrastructure capabilities. We spoke to Khawaja and Paul Rankin, Head of Data Management and Architecture at Roche Diagnostics, about what attracted them to data mesh, why it made sense for Roche, the role of Snowflake in their framework, and the lessons they learned along the way.
How did you get interested in data mesh?
Omar Khawaja: When I joined Roche Diagnostics, my first deliverable was the BI & Analytics strategy. One of the strategic priorities was (and still is) to modernize our technology and architecture landscape. After trying all the traditional approaches, we either hadn’t yet realized the true value of data or had delivered it with only limited success. That made me rethink how to execute on this strategy and how to address the challenges in a way that scales and also fits the decentralized, empowering culture at Roche.
Khawaja reached out to Zhamak Dehghani, who pioneered the data mesh methodology, and her Thoughtworks team. The conversations with Zhamak and Thoughtworks turned into a Roche-wide webinar that garnered massive interest. From there, they created a first implementation and learned how to establish the data mesh framework and onboard the teams in a systematic manner.
Khawaja: In February 2021, building on our early success, we formally established the foundations of the data mesh program, and that’s when we took a more holistic approach to how we would address the challenges. We started with one domain in May 2021, and as we speak we have six or more domains on board, with multiple product teams working on and using the platform.
Why did you choose a data mesh for Roche? What shifts did you make to set up your organization for success?
Khawaja: I have seen firsthand how the number of data people has grown in many companies, and that’s the case at Roche as well. We have not just analysts creating dashboards, but also data scientists and data engineers within business and IT teams. That’s where product thinking came in, and so far it has been a good way forward with most of the team.
Paul Rankin: People [here] are now starting to think about data products. Six months ago, maybe even a year ago, everyone was thinking dashboards and data sets, exactly what a data lake brings. You have to build up a level of data-centric maturity and understanding about data products before you can even think about implementing this methodology.
What are your guiding principles for the design and technology of a data mesh?
Khawaja: There are four principles of data mesh, and I have started to call them the mind, the heart, the soul, and the body of data mesh. The “soul” is the first principle: domain-driven design.
The “heart” of the data mesh is the “data as a product” principle, which requires a big shift from project thinking to product thinking. It’s the modern way of working as one team and bringing that concept of DevOps/DataOps to life. It’s how to co-create things, take ownership of the data, and not think that data is somebody else’s responsibility. That’s the biggest mindset change we need. This is where we focus on value creation for our customers and end users.
The third one, the self-service data analytics infrastructure, is what I call the “body” of data mesh—like a skeleton. This is where we took a very different approach of developing the platform from the angle of capabilities and plugging in the technologies that support those capabilities. Snowflake fulfills a number of those capabilities, together with an ecosystem of tool chains.
Last but not least, when you have this amount of decentralization, you need governance. This is the “mind” of data mesh, where each data domain becomes part of the solution thanks to the federation and the computational guidelines, enabled by as much automation as possible. And if you don’t have that automation enabled by a data infrastructure, I mean, this will be chaos, right?
What are some challenges you encountered with the data mesh methodology?
Khawaja: Let’s tackle it by those four principles, because each of them has a different challenge. Let’s start with data domains. Before I joined Roche, our data team was already thinking about what the data strategy would be. The approach they’d taken was very similar to data mesh, and that’s where the concept of data domains was introduced. Together with the business and IT leadership teams, some of the domains had already been defined, based on the idea that you draw domain boundaries where the process boundaries are, not 100% according to the functional structure.
This was a good start, [but] we still have some areas to address. Let’s say there was 90% alignment on the definitions and a gray area of 10%. Once we have established [the definitions], we learn by implementing them, and we can adjust them as well. At the end of the day, it’s a matter of making sure ownership sits close to where the data is generated, in the source system.
The next principle, data as a product, is both easy and very hard. We are essentially talking about end-to-end ownership of everything within the data. It’s not just about that dashboard; it’s not even about Snowflake. Your dashboards or use cases can’t simply be mapped one-to-one to data products. There’s a step-by-step process to actually do it.
Then there’s the third principle, the body: the platform. That’s where the change happens, in how you build the platform. You build it from the perspective of “the platform is also a product.” Traditionally, the platform was built by central teams [and] the pipelines were built by central teams, and, as a result, those teams became a bottleneck despite their sincere intentions.
Now we have to create a platform that enables the product teams, empowers them, and gets out of their way. Keep in mind that the data product teams, which comprise people from business and IT as well as vendors, may not have the skill level of traditional [product] teams. So you need to choose something that supports that variety of users, so that the learning curve isn’t too steep for the teams.
Lastly, federated computational governance. People have heard of governance, and 90% of the time they translate it as “bureaucracy.” That’s why you say “it’s federated governance.” That’s how you bring people to the table, by telling them they are now part of the decision-making, and in some cases this has been a key factor in getting buy-in. Federated governance brings in policies, procedures, and standardization, some enforced by IT, some enforced by the business. And of course there’s the computational part, the “mind” of the data mesh: you enable these many controls and policies by design through automation. So when we talk about enforcing data masking or PII rules, automating deployments, or making sure a pipeline is deployed, whatever shape and form the data product takes, your metadata flows into the catalog by design and not by chance.
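To make “policies by design through automation” more concrete, here is a minimal Python sketch of a governance check that a deployment pipeline could run before a data product ships. It is a hypothetical illustration, not Roche’s implementation: the `DataProduct` and `Column` classes, the three rules in `policy_violations` (a named owner, catalog registration, and a masking policy on every PII column), and the policy name `pii_mask` are all assumptions. The only Snowflake-specific piece is the generated `ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY` statement, which assumes the masking policy already exists in the account.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    is_pii: bool = False
    # Name of an existing Snowflake masking policy (hypothetical, e.g. "pii_mask").
    masking_policy: Optional[str] = None


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: Optional[str] = None
    columns: List[Column] = field(default_factory=list)
    registered_in_catalog: bool = False


def policy_violations(product: DataProduct) -> List[str]:
    """Return readable violations; an empty list means the product passes."""
    violations: List[str] = []
    if not product.owner:
        violations.append(f"{product.name}: no owner assigned")
    if not product.registered_in_catalog:
        violations.append(f"{product.name}: metadata not published to the catalog")
    for col in product.columns:
        if col.is_pii and col.masking_policy is None:
            violations.append(f"{product.name}.{col.name}: PII column has no masking policy")
    return violations


def masking_statements(product: DataProduct, table: str) -> List[str]:
    """Generate Snowflake DDL that attaches each PII column's masking policy."""
    return [
        f"ALTER TABLE {table} MODIFY COLUMN {col.name} "
        f"SET MASKING POLICY {col.masking_policy};"
        for col in product.columns
        if col.is_pii and col.masking_policy
    ]


if __name__ == "__main__":
    orders = DataProduct(
        name="orders",
        domain="sales",
        owner="sales-data-team",
        columns=[Column("order_id"), Column("customer_email", is_pii=True)],
        registered_in_catalog=True,
    )
    print(policy_violations(orders))
    # ['orders.customer_email: PII column has no masking policy']
    orders.columns[1].masking_policy = "pii_mask"
    print(policy_violations(orders))  # []
    print(masking_statements(orders, "sales_db.public.orders"))
```

Wiring a check like this into CI is one way to make the rules computational rather than a manual review step: a data product with open violations simply fails the build.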
What makes Snowflake the right choice for your data mesh?
Khawaja: It works: it’s easy to use and simple. You can bring your data and start using it. You can access it from anywhere. When you are coming from an on-premises world, you are amazed to see that performance challenges essentially disappear. It becomes a matter of whether you can afford the performance you need, and that’s a different question to consider.
Roche has a very decentralized culture. We believe in empowering the people on the ground, the people in the countries. From the data mesh perspective, when we’re talking about decentralization, we are talking about people on all spectrums of their journey and skill set coming in. We need to have something that can truly scale across the organization. If you can enable the teams easily, without complex security setups and limiting boundaries, and onboard them—that’s what makes it relatively easy for the product teams to work.
We are getting a variety of these benefits with Snowflake. Plus data sharing and internal data exchange—even if you go to a level where each domain has their own Snowflake account, the internal data exchange can still enable that reuse and sharing factor without creating crazy duplicate data sets everywhere.
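Khawaja’s point about reuse without duplication rests on Snowflake’s secure data sharing, where a consumer account queries the provider’s tables in place instead of receiving a copy. The Python sketch below only generates the SQL for a plain direct share; it does not cover the internal Data Exchange listing workflow, and the share, database, schema, table, and account names are hypothetical placeholders. Actually running the statements would require a role that is allowed to create shares in the provider account.

```python
from typing import Iterable, List


def share_statements(share: str, database: str, schema: str,
                     tables: Iterable[str],
                     consumer_accounts: Iterable[str]) -> List[str]:
    """Build the Snowflake DDL for a secure share of selected tables.

    Consumers query the provider's tables through the share; no data is copied.
    """
    stmts = [
        f"CREATE SHARE {share};",
        # A share needs USAGE on the containing database and schema...
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    # ...plus SELECT on each table being shared.
    stmts += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{t} TO SHARE {share};"
        for t in tables
    ]
    accounts = ", ".join(consumer_accounts)
    if accounts:
        # Consumer account identifiers, e.g. "<org>.<account>" (placeholders here).
        stmts.append(f"ALTER SHARE {share} ADD ACCOUNTS = {accounts};")
    return stmts


if __name__ == "__main__":
    for stmt in share_statements("orders_share", "SALES_DB", "PUBLIC",
                                 ["ORDERS"], ["MYORG.MARKETING"]):
        print(stmt)
```

Keeping the grants in code means every domain publishes its shared data products the same way, which fits the federated governance idea of standard, reusable artifacts.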
Rankin: One of the most important aspects of Snowflake that absolutely helps us in this data mesh world is Zero-Copy Cloning for developers, when it comes to CI/CD release cycles and the GitFlow process. Absolutely a game changer. When developers create their feature branch, the pipeline automatically spins up a zero-copy clone of the production database for that feature, so they can test their features without stepping on anyone’s toes, go straight into production afterward, drop their cloned feature database, and start again. Amazing.
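Rankin’s workflow can be sketched as two hooks in a CI/CD pipeline: one that clones production when a feature branch appears, and one that drops the clone once the work is merged. The Python below is a hypothetical illustration, not Roche’s tooling: the branch-to-database naming rule, the `DEV_` prefix, the `PROD_DB` name, and the hook names are assumptions, and the code only builds SQL strings rather than connecting to Snowflake. The `CREATE DATABASE ... CLONE` and `DROP DATABASE IF EXISTS` statements themselves are standard Snowflake SQL.

```python
import re
from typing import Dict, List


def feature_db_name(branch: str, prefix: str = "DEV") -> str:
    """Turn a Git branch name into a valid unquoted Snowflake identifier."""
    # Replace every run of characters that aren't letters or digits with "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", branch).strip("_").upper()
    return f"{prefix}_{slug}"


def feature_db_lifecycle(branch: str, prod_db: str = "PROD_DB") -> Dict[str, List[str]]:
    """SQL a CI job could run when a feature branch is created and after it merges."""
    db = feature_db_name(branch)
    return {
        # Zero-copy clone: shares storage with the source at creation time;
        # new storage is consumed only for data that changes in the clone.
        "on_branch_created": [f"CREATE OR REPLACE DATABASE {db} CLONE {prod_db};"],
        "on_branch_merged": [f"DROP DATABASE IF EXISTS {db};"],
    }


if __name__ == "__main__":
    plan = feature_db_lifecycle("feature/add-orders-mart")
    print(plan["on_branch_created"][0])
    # CREATE OR REPLACE DATABASE DEV_FEATURE_ADD_ORDERS_MART CLONE PROD_DB;
    print(plan["on_branch_merged"][0])
    # DROP DATABASE IF EXISTS DEV_FEATURE_ADD_ORDERS_MART;
```

Because a fresh clone shares storage with production, creating one per branch is cheap, which is what makes a per-feature database practical in the first place.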
Khawaja: One thing to watch out for: Snowflake does not equal data mesh, and data mesh does not equal Snowflake. Data mesh is beyond technology, and Snowflake is a key enabler for it. You don’t want to implement data mesh? With Snowflake you can go ahead and implement a data warehouse. Or if you want to do data lakes, Snowflake lets you do it.
What advice do you have for other organizations interested in a data mesh?
Khawaja: I will start with this: Data mesh may not be the solution for every company. That’s a reality, and people need to understand this. If you’re not willing to decentralize, it’s not for you.
Second, data mesh is truly a paradigm shift. It touches people, processes, and technologies. If you are ready to embrace this change in every aspect of data, go for it. Take a leap of faith and empower the data product teams. I highly recommend creating some standard definitions and artifacts that can be reused, and continue building upon them. Make technology choices that enable freedom and borderless collaboration for data product teams.
Want more details about Roche Diagnostics’ data mesh and Data Cloud journey? Watch Omar’s Thoughtworks session on the company’s data mesh implementation, then check out Snowflake’s Rise of the Data Cloud podcast with Omar for a discussion about data sharing and decentralization.