Snowflake connected with Suhas Joshi, Senior Director of Clinical Data Analytics at IQVIA, to learn how the company moved its transformation workloads to Snowflake with Snowpark. Watch the full IQVIA webinar here.
IQVIA is the leading global provider of advanced analytics, technology solutions, and clinical research services to the life sciences industry. IQVIA Technologies, a solution from IQVIA, helps accelerate innovation by integrating industry-leading data and analytics with AI/ML strategies geared toward the ultimate life sciences goal: having a more substantial impact on human health and treatment.
From orchestrated clinical trials to customer engagement, safety regulatory compliance, and more, IQVIA Technologies provides smart, modular, and interoperable solutions for the life sciences industry.
Meeting the data needs of increasingly sophisticated clinical trials
Modern clinical trials are increasingly sophisticated and produce massive amounts of data. Trials can be global, spanning states and countries, and require data collection across multiple sources, languages, and data types, including unstructured data (for example, audio and images). With technological advancements in medical devices, there’s a massive onslaught of data: some trials rely on data that must be collected 24/7, such as readings from continuous blood glucose monitors.
The cure to the diseases these organizations are researching is in the data. But many organizations in the life sciences industry spend more time managing and aggregating data, and less time getting insights from analysis. “Our goal is to maximize the value of this data and not spend time managing it. We want to bring in AI and machine learning (ML) solutions that can provide more insights and solve problems,” said Joshi.
Leveraging data during clinical drug development
Clinical Data Analytics Suite (CDAS) is IQVIA’s foundational data platform for clinical R&D. It aggregates data across the clinical trial lifecycle shown in Figure 1, helping with interoperability across all the clinical phases and supporting business users who need to better understand clinical drug development from end to end.
According to Joshi, “Our goal for CDAS is to bring all this data together in near real time to enable data-driven decision-making at people’s fingertips. But because we’re dealing with large volumes of heterogeneous data coming from many different sources across the globe, this gets very complex.”
Many individual applications, internal and external, are used to manage different aspects of a clinical trial to account for operational needs, data sensitivity, and regulatory requirements. This creates data silos and redundancy, and it requires excessive separate infrastructure to consolidate and integrate the trial data and provide a unified view of it.
Building CDAS on top of Snowflake
Snowflake made it possible to centralize all data types, elastically scale, and allow effortless data sharing in a governed way. All of CDAS’ intelligent applications are now supported in a serverless architecture within Snowflake, so business users have an end-to-end view of the drug development lifecycle.
“Having the ability to monitor and track costs for chargeback is very important,” said Joshi. “Snowflake provides a great way to do this, both in how the infrastructure can be deployed to support individual teams and also with additional built-in monitoring.”
Prior to Snowflake, IQVIA stitched multiple services together to model and process its complex data sets. The architecture relied heavily on Spark for processing, which required manual operations and limited the number of workloads the platform could handle. “Our approach is to continue to enhance and take advantage of native Snowflake features,” said Joshi. “The compute has moved to Snowflake, and our intelligent applications are moving to Snowpark as well.”
With this powerful, simplified architecture in the Snowflake Data Cloud, it was also important for IQVIA to have high availability and disaster recovery (HADR). The clinical industry has systems that directly impact patient safety, so there needs to be redundancy within each of the systems. Building its platform on Snowflake means that HADR is available out of the box, and Joshi’s team is configuring CDAS to be available across cloud providers. Their next project will lean into Snowflake’s multi-cloud compatibility. “We want to be cloud agnostic, and Snowflake enables us to meet our vendors where they are without having to complicate our internal Azure strategy,” said Joshi.
Snowpark reduces platform complexity
All of the intelligent applications sitting on top of CDAS have their own workflows, generating a lot of data that traditionally requires multiple levels of infrastructure for end-to-end support. With Snowpark, developers can execute Python, Java, or Scala custom functions to build powerful and efficient pipelines, ML workflows, and data applications. Snowpark allows easy development, migration, deployment, and execution within these intelligent applications in a serverless manner.
“Before, we had to move the data for processing with other languages and then bring results back to make those accessible. Now with Snowpark, we are bringing the processing to the data, streamlining our architecture and making our data engineering pipelines and intelligent applications more cost effective with processing happening within Snowflake, our one single platform.” - Suhas Joshi, IQVIA
Benefits that IQVIA has experienced from Snowflake:
- A single place for governed access to data
- Improved performance for near real-time data refreshes
- Improved data readability and accessibility for business users with traditional RDBMS/SQL skills; easier adoption, onboarding, and upskilling of resources
- Powerful built-in SQL functions and capabilities such as Streams, Time Travel, and user-defined functions (UDFs) bring code simplicity and reusability
- Extensibility for custom code in programming languages such as Java and Python with Snowpark has reduced the platform complexity by eliminating the need for separate infrastructure
“UDFs bring a lot of simplicity, because a lot of Java processing that was previously in Spark is now able to be coded to a UDF and can be easily made accessible for execution as part of a SQL statement. We’re able to have a lot of portability and bring simplicity to our workflows.” - Suhas Joshi, IQVIA
Professional Services gives predictability to migration
The Snowflake Professional Services (PS) team was brought in to ensure a smooth migration from Spark to Snowflake for the CDAS platform. The PS team ran exploratory code libraries to determine the portability and scope of the migration project, and with strong alignment from IQVIA’s engineering team, completed code conversion to Snowpark within two weeks. “A lot of code that was done in Spark could be simplified to a UDF,” said Joshi. “Collaborating with PS helped us solve business problems and gave us many learnings. I’d highly recommend others to take advantage of their services, as it gives predictability to the migration.”
“We bring in the expertise of business use cases. And pairing that with Snowflake PS, who has the technology expertise, enables us to figure out how it can be done better and reduces complexity,” said Joshi. For example, IQVIA had an entire algorithm to generate hash keys. Code that was previously called repeatedly from many places is now just a Snowflake function.
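The article does not describe IQVIA's hash-key algorithm or the Snowflake function that replaced it, so the following is only a minimal sketch of the general idea: one small, deterministic function that any pipeline can call instead of repeating the logic. The function name `hash_key`, the normalization rules (trimming and upper-casing), the separator, the null marker, and the sample identifiers are all assumptions made for illustration. A plain Python function like this is the kind of logic that could be wrapped as a UDF and called from SQL.

```python
import hashlib
from typing import Optional, Sequence

# Assumed conventions for this sketch (not taken from the article):
# - values are joined with the ASCII unit separator so ("ab", "c") and
#   ("a", "bc") do not collide
# - None is mapped to a distinct marker so it hashes differently from ""
_SEPARATOR = "\x1f"
_NULL_MARKER = "\x00"


def hash_key(parts: Sequence[Optional[str]]) -> str:
    """Return a deterministic SHA-256 hex digest for an ordered list of values."""
    # Normalize each part so cosmetic differences (case, padding) give the same key.
    normalized = [
        _NULL_MARKER if part is None else part.strip().upper()
        for part in parts
    ]
    joined = _SEPARATOR.join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical study, site, and subject identifiers.
    print(hash_key(["STUDY-001", "site-12", "subj-0042"]))
```

Keeping the normalization and separator rules in a single function means every caller derives the same key for the same logical record, which is the reuse benefit the paragraph above describes.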
A healthy future ahead
As IQVIA continues to advance its CDAS platform with more sophisticated clinical trial data on the Snowflake Data Cloud, Snowpark will play an important role in supporting AI and ML solutions that are geared toward improving human health and treatment.
“We are moving more into Snowflake and will continue to do so.” - Suhas Joshi, IQVIA