In Snowflake’s 6 Data Science Trends in 2023, one of the more prominent trends we identified was the emerging use of unified tools and infrastructure for SQL and Python. According to a recent article by McKinsey, huge investments have been made in data science, AI, and machine learning (ML), driven by the promise of higher financial returns, more efficient processes, and greater overall business resilience. Adoption of AI has more than doubled since 2017, according to McKinsey’s Global Survey on AI, with companies that make greater investments in AI pulling ahead of their competitors.
With the increase in demand for data science and AI/ML, companies have modernized their architecture by migrating from legacy on-premises data warehouses to the cloud to support growth at scale. Many started by copying their data into cloud object stores and investing in separate processing infrastructure and tooling for programming languages such as Python, SQL, and Java. This piecemeal approach resulted in complex infrastructure management and limited collaboration across teams.
Challenges with siloed data architecture
SQL and Python are among the most popular languages in the modern data stack for transformations, analysis, and ML. While SQL is the long-standing database language for querying and transforming data, Python has emerged as the preferred programming language for data science and ML. When working across both languages, data scientists and data engineers often need to stitch together multiple tools to complete a single analysis. Even for those confident in both languages, the friction of setting up and managing a separate compute environment for each can be both frustrating and time-consuming.
The lack of interoperability has created a profound siloing effect wherein users of one language aren’t able to collaborate on analysis or workflows with users of the other. These challenges are exacerbated by the continued growth and maturity of the data industry. Statista projects that the volume of data generated will reach more than 180 zettabytes by 2025 and the U.S. Bureau of Labor Statistics estimates that the number of data scientist jobs is expected to increase by 36% between 2021 and 2031. With the explosion in supply and demand for data-driven insights, the challenges from data silos are increasingly painful.
Solutions that unify tools and infrastructure for SQL and Python
Snowflake's Snowpark solves these complexities by allowing multiple languages to execute on a single platform. Snowpark is a new developer framework for Snowflake that lets data engineers, data scientists, and data developers write code in their preferred language and run that code in Snowflake. With interfaces for development in SQL, Python, Java, and Scala, Snowpark allows developers to switch easily between languages without moving data or setting up separate clusters.
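To make the switch between languages concrete, here is a minimal sketch of what the same query can look like through Snowpark's Python DataFrame API. The table name `ORDERS`, its columns, and the helper function are illustrative, and an already-configured Snowpark `session` object is assumed:

```python
# A minimal Snowpark-style sketch in Python. The table ORDERS and its
# columns (AMOUNT, REGION) are hypothetical; `session` is assumed to be
# an already-configured Snowpark Session.

def high_value_orders_by_region(session):
    """Equivalent SQL:
        SELECT region, COUNT(*) FROM orders
        WHERE amount > 100 GROUP BY region
    """
    return (
        session.table("ORDERS")
        .filter("AMOUNT > 100")   # filter() also accepts a SQL expression string
        .group_by("REGION")
        .count()
    )
```

Because the DataFrame operations are translated to SQL and pushed down to Snowflake, the data never leaves the platform and no separate Python cluster is required.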
Tools that unify programming languages are crucial for continued growth through collaboration. Data engineers, scientists, and analysts no longer have to work in isolation for lack of a shared language; they can work together on a single platform to move from raw data to insight. This knowledge sharing ultimately creates more agile data engineering and ML projects with better long-term results.
Snowpark also has native integrations with tools like dbt, Hex, and Dataiku that reduce silos:
dbt: dbt is a data transformation workflow tool that helps data teams follow software engineering best practices such as modularity, portability, CI/CD, and documentation. It supports a SQL-first transformation workflow, and in 2022 dbt introduced Python as a second language, using Snowpark under the hood, to meet the growing demand for seamless ways to work across languages on the same project. Learn how to get started with dbt and Snowpark.
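For a sense of what the Python side of a dbt project looks like, here is a minimal sketch of a dbt Python model. dbt invokes the `model()` function with its context object and a Snowpark session, and materializes the returned DataFrame; the file name and the upstream model `stg_orders` are hypothetical:

```python
# Sketch of a dbt Python model file (e.g. models/orders_by_status.py).
# dbt calls model() with its `dbt` context object and a Snowpark session,
# and materializes the returned DataFrame as a table. The upstream model
# name "stg_orders" is illustrative.

def model(dbt, session):
    orders = dbt.ref("stg_orders")        # a Snowpark DataFrame
    return orders.group_by("STATUS").count()
```

Because `dbt.ref()` returns a Snowpark DataFrame, the Python model runs inside Snowflake alongside the project's SQL models, with no data movement between systems.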
Hex: Hex is a modern platform for analytics and data science that makes it easy to connect to data, analyze it in collaborative SQL- and Python-powered notebooks, and share work as interactive data apps and stories. To provide near-unlimited processing scalability, Hex does not load all data into a notebook; instead, it pushes compute down to the data platform. Hex is integrated with Snowpark and gives users a new and powerful interface to their Snowflake data. Learn how to get started with Hex and Snowpark.
Dataiku: Dataiku is the platform for Everyday AI, enabling data experts and domain experts to work together to build AI into their daily operations. The joint solution with Snowflake provides an easy-to-use, visual interface where coders and non-coders alike can access data in Snowflake, and collaborate to build production-ready data pipelines and data science projects in Snowpark. Learn how to get started with Dataiku and Snowpark.
Ready to learn more?
To learn more about Snowpark, check out the documentation and quickstart guide for product details and step-by-step setup instructions. For any further questions, the community in the Snowflake Forums is a great resource to get answers.