Authors: Ripu Jain, Anders Swanson
October 7, 2022

Upgrade to the Modern Analytics Stack: Doing More with Snowpark, dbt, and Python


A large number of organizations are already using Snowflake and dbt, the open source data transformation workflow maintained by dbt Labs, together in production. Python is the latest frontier in our collaboration. This article describes some of what’s made possible by dbt and Snowpark for Python (in public preview).

The problem at hand

Building data applications (inclusive of visualization, machine learning (ML) apps, internal/external business apps, and monetizable data assets) has traditionally required teams to export data out of their analytical store, whether because of a language/tooling preference or the limitations of SQL. While users gain access to bespoke tooling, improved productivity, and well-documented design patterns for individual personas and scenarios, this approach also creates:

  • More data silos introduced by different tools and processes in the mix
  • Increased maintenance and operating costs because of complicated architecture
  • Increased security risks introduced by data movement

But what if this wasn’t the case? What if there was a way to address the challenges without sacrificing the advantages?

Introducing our players

To solve the problem at hand, we first need to identify the key players most affected:

  • An analytics engineer who may occasionally reach for the Python wrench, for example, using a popular fuzzy string matching library rather than rolling their own implementation in SQL (keep reading, demo and sketch below). 
  • A Python-preferring data scientist or ML engineer deploying ML capabilities (featurization, scoring, training) who is expected to have SQL skills in order to access the enriched, transformed, trusted data in Snowflake. 
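
To make that "Python wrench" moment concrete, here is a minimal, hypothetical sketch of fuzzy string matching with an off-the-shelf library. The rapidfuzz package and the example strings are illustrative only and are not taken from the demo discussed later:

```python
# Illustrative only: match messy, free-text names against a canonical list.
# rapidfuzz is one of several fuzzy matching libraries; doing the equivalent
# in SQL would mean hand-rolling an edit-distance implementation.
from rapidfuzz import fuzz, process

canonical = ["Granny Smith", "Honeycrisp", "Golden Delicious"]
messy = ["granny smth", "honey crisp", "golden delicous"]

for name in messy:
    match, score, _ = process.extractOne(name, canonical, scorer=fuzz.WRatio)
    print(f"{name!r} -> {match!r} (score {score:.0f})")
```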

These players haven’t been properly equipped in the past. When our analytics engineer tries to use Python, they are faced with the challenge of managing two separate data processes. On the flip side, the data scientist and ML engineer can’t easily collaborate with the analytics engineer who has already transformed the data in the Data Cloud, which means duplicating existing work. 

But what if I told you that Snowflake and dbt Labs can help with this conundrum, and deliver data products with improved productivity, without the issues we described earlier?

Enter Snowpark!

Snowpark is a data programmability framework for exploring and transforming your organization’s data. It leverages Snowflake for data processing, while retaining all the benefits that come with it, such as enterprise-grade governance and security, near-zero infrastructure maintenance, and monetization opportunities. 

Snowpark for Python recently became available for public preview, and the use cases it enables are almost limitless, especially for data scientists and ML engineers—from feature engineering to training to serving batch inference. 

But what exactly comprises Snowpark for Python? In short, it includes:

  • A client-side API to allow users to write Spark-like Python code
  • Support for custom Python functions and objects that can use Python libraries available through the Anaconda integration
  • Stored Procedure support providing additional capabilities for compute pushdown
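
As a rough illustration of those pieces, here is a minimal sketch of the client-side DataFrame API together with a Python UDF. The connection parameters, table and column names, and the conversion function are placeholders, not part of this article's demo:

```python
# A minimal sketch of Snowpark for Python. Connection parameters, table and
# column names, and the UDF logic are placeholders for illustration.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import FloatType

# Build a session from standard connection parameters.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Spark-like, lazily evaluated DataFrame operations that push compute down to Snowflake.
orders = session.table("RAW_ORDERS")
shipped_counts = orders.filter(col("STATUS") == "SHIPPED").group_by("CUSTOMER_ID").count()

# A Python UDF that executes inside Snowflake; third-party packages come from Anaconda.
@udf(name="usd_to_eur", input_types=[FloatType()], return_type=FloatType(),
     packages=["numpy"], replace=True)
def usd_to_eur(amount: float) -> float:
    import numpy as np
    return float(np.round(amount * 0.97, 2))  # illustrative fixed exchange rate

# Persist the result back to a Snowflake table.
shipped_counts.write.save_as_table("SHIPPED_ORDER_COUNTS", mode="overwrite")
```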

Organizations across industries are putting their data to use, leveraging Snowpark for Python for data science and ML workloads, and solving a number of unique business use cases.

Now, as awesome as Snowpark for Python is on its own, its usefulness gets a boost when partners like dbt Labs leverage Snowpark to let data teams unify their pipelines for both analytics and ML use cases. In fact, dbt Core’s most requested feature was support for Python models in the DAG. 

Enter dbt Python models

As dbt Labs CEO Tristan Handy notes in his recent post, Polyglot Pipelines: Why dbt and Python Were Always Meant to Be: 

[In July 2017] I wrote ‘we’re excited to support languages beyond SQL once they meet the same bar for user experience that SQL provides today.’ And over the past five years, that’s happened.

dbt-labs/new-python-wrench-demo serves to illustrate that Python has arrived with a great user experience. The made-up data, from a fictional “fruit purchasing” app, was created to illustrate a sample use case where fuzzy string matching can be useful for an analytics engineer. Below are video walkthroughs of the background, business problem, and code. If you already have a Snowflake account and a dbt project, you can also run this code today. Be sure to open an issue on the repo if you run into trouble.

  • Python wrench I: Intro & background
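
For a sense of the shape a dbt Python model takes on Snowpark, here is a minimal, hypothetical sketch loosely modeled on the fuzzy matching use case. The model and column names and the fuzzywuzzy package choice are assumptions for illustration, not lifted from dbt-labs/new-python-wrench-demo:

```python
# models/matched_purchases.py -- an illustrative dbt Python model (names and
# the fuzzywuzzy package are assumptions, not taken from the demo repo).
def model(dbt, session):
    # Materialize as a table and pull third-party packages via the Anaconda integration.
    dbt.config(materialized="table", packages=["fuzzywuzzy"])

    from fuzzywuzzy import process

    # dbt.ref() returns a Snowpark DataFrame for an upstream model in the DAG.
    purchases = dbt.ref("stg_purchases").to_pandas()
    fruits = dbt.ref("stg_fruits").to_pandas()

    canonical = fruits["FRUIT_NAME"].tolist()
    purchases["MATCHED_FRUIT"] = purchases["FRUIT_ENTERED"].apply(
        lambda name: process.extractOne(name, canonical)[0]
    )

    # The returned DataFrame is written back to Snowflake as this model's table.
    return purchases
```

The upstream SQL models and the Python model live in the same DAG, which is exactly the unification of analytics and ML pipelines described above.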

Taking the next step

If you’re interested in diving deeper into how to get the most out of dbt and Snowpark, then you won’t want to miss dbt’s Coalesce Conference 2022, starting October 17 in New Orleans (as well as virtually). Expect to see talks such as this one, in which Eda and Venkatesh, Snowflake Partner Sales Engineers, explore how Snowpark further enhances the dbt + Snowflake development experience by supporting new workloads.

If you’d prefer to sink your teeth into something immediately, then the “Getting Started with Snowpark Python” hands-on guide and Eda’s blog post taking a first look at dbt Python models on Snowpark are fantastic resources.
All of this is just scratching the surface of the value created by dbt and Snowpark with Python. Where does it ultimately lead? Toward a future with fewer silos between the people working on analytics workflows and the people working on data science workflows—and we couldn’t be more excited for it.
