Today, organizations need data and analytics insights faster, with better quality, and with more resilience to business dynamics. It is no surprise that data and analytics leaders turn to DataOps for an agile and collaborative framework for managing data.
Gartner® defines DataOps as "a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization."1 Enterprises are now looking to implement DataOps platforms and solutions that are scalable, cost-effective, and easy to manage. One such platform is LTI Mosaic Decisions, which enables businesses to use the data they collect, develop independent lifecycles for data products, and build the foundation for generating actionable insights.
Data processing at speed and scale is the core of any DataOps platform. Because there are many data processing engines to choose from, each option needs to be evaluated thoroughly so that you can select the one most closely aligned with your business objectives. To help businesses choose the most effective data processing engine for their needs, LTI recently conducted a detailed study comparing the capabilities of two popular choices: Snowflake and Apache Spark.2 For this comparative study, LTI used its Mosaic Decisions platform as the DataOps platform. The study revealed some notable insights across several parameters:
- Performance: Snowflake typically offers about twice the data processing capacity of the Apache Spark analytics engine. In terms of both performance and TCO, Snowflake runs faster and outperforms Spark by a significant margin across the ETL cycle. Provided its other features align with your business needs, Snowflake is a natural choice to integrate and use with Mosaic Decisions.
- Agility: Because it is a true SaaS solution, Snowflake is simple to get started with: it requires no hardware or software to install, configure, or manage, and it even handles maintenance operations for its own components. Spark, on the other hand, is a technology built for analytics experts and can prove challenging for less tech-savvy users. In addition, data pipelines executed on a Spark cluster took about five minutes to start running, which delayed overall processing, whereas on Snowflake all data executions began instantaneously.
- Stability: With Spark, the study observed some job failures caused by memory and other issues that are harder to debug and to trace through root cause analysis (RCA). With Snowflake, by contrast, not a single job failure was registered.
- Ease of use: Organizations have realized that strategic investments in data solutions must go to options that are scalable, cost-effective, and easy to manage. Extracting good performance from Spark requires configuring many parameters, whereas Snowflake works out of the box.
- Concurrency: When the number of concurrent users grows, the system must scale up to meet their needs. Both Spark interactive clusters and Snowflake virtual warehouses offer auto-scaling. However, Snowflake performed three times better while using only 25% of the resources, whereas the Spark cluster struggled to handle more than 100 concurrent users.
Observations, Insights, and Recommendation
The combination of LTI Mosaic Decisions and Snowflake is a win-win for enterprises because the two products complement each other's capabilities. LTI Mosaic Decisions comes with out-of-the-box support for Snowflake, and its architecture is designed to make the best use of Snowflake's high scalability and performance, so Snowflake's performance-oriented architecture is fully leveraged.
Mosaic Decisions supports cloud-native pushdown data transformations for Snowflake. This lets it take advantage of stored procedures already defined in Snowflake and positions it to integrate easily with new products and services such as Snowpark. Mosaic Decisions has features built natively for easy Snowflake configuration, so applications built to run on any Snowflake warehouse can be managed easily. It also provides drag-and-drop features for creating Snowflake workloads.
In summary, while LTI's Mosaic Decisions platform supports both Snowflake and Spark, the combination of Snowflake and Mosaic Decisions yields a DataOps platform optimized for both time to solution and ROI. To learn more about the study on which this blog is based, please read the LTI white paper "Benchmarking Snowflake Versus Spark for Optimized DataOps."
1 Gartner IT Glossary, "DataOps," 1 September 2021 [https://www.gartner.com/en/information-technology/glossary/dataops]. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission.