Spark to Snowpark for Data Engineering

Migrate Spark pipelines with minimal code changes and reduce operational overhead with an elastic processing engine that natively supports Python, Java, Scala, and SQL.

Code Like PySpark. Execute Faster.

Develop data transformations and custom business logic from your integrated development environment or notebook of choice using the Snowpark Client Library. Push down operations to Snowflake's engine for elastic, performant, and governed processing.

Snowpark Overview

DataFrame API

Build queries with Spark-like DataFrames from your integrated development environment of choice and push down processing to Snowflake’s elastic processing engine.
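
To illustrate what pushing down processing means, here is a minimal, hypothetical sketch. It is not the Snowpark API or its implementation: the `LazyTable` class is invented for illustration, and Python's built-in `sqlite3` stands in for Snowflake's engine. As with a Snowpark DataFrame, calls such as `filter` and `select` only record operations; nothing runs until an action such as `collect()` compiles the pipeline into SQL and hands it to the engine.

```python
import sqlite3


class LazyTable:
    """Hypothetical toy DataFrame: records operations and runs nothing until collect()."""

    def __init__(self, conn, table, columns="*", predicates=()):
        self._conn = conn                    # stands in for the remote engine
        self._table = table
        self._columns = columns              # projection, as a SQL column list
        self._predicates = tuple(predicates)  # raw SQL conditions (toy only)

    def filter(self, predicate):
        # Returns a new frame with one more WHERE condition; no query runs yet.
        return LazyTable(self._conn, self._table, self._columns,
                         self._predicates + (predicate,))

    def select(self, *columns):
        # Returns a new frame with a narrower projection; no query runs yet.
        return LazyTable(self._conn, self._table, ", ".join(columns),
                         self._predicates)

    def to_sql(self):
        # Compile every recorded operation into one SQL statement.
        sql = f"SELECT {self._columns} FROM {self._table}"
        if self._predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self._predicates)
        return sql

    def collect(self):
        # The action: only now is the compiled query sent to the engine.
        return self._conn.execute(self.to_sql()).fetchall()


# Sample data in an in-memory database standing in for a Snowflake table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (2, 250.0), (3, 125.0)])

orders = LazyTable(conn, "orders")
big_orders = orders.filter("amount > 100").select("order_id", "amount")
print(big_orders.to_sql())  # → SELECT order_id, amount FROM orders WHERE (amount > 100)
print(big_orders.collect())
```

Each method returns a new frame instead of mutating the old one, which matches how Spark and Snowpark DataFrames behave. In Snowpark itself, the equivalent chain would be written as `session.table("ORDERS").filter(col("AMOUNT") > 100).select("ORDER_ID", "AMOUNT").collect()`, closely mirroring PySpark syntax.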

PySpark Function Parity »

DataFrames Cheat Sheet »

User-Defined Functions

Run custom logic written in Python or Java directly in Snowflake with user-defined functions (UDFs), which can be migrated from Spark with minimal code changes.
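
A minimal sketch of why UDFs tend to migrate with few changes, assuming a hypothetical SKU-normalization rule: the handler is an ordinary Python function, and mainly the registration call differs between PySpark and Snowpark. The registration lines are comments because each needs a live Spark or Snowflake session; `normalize_sku` and the `SKU` column are illustrative names, not part of either library.

```python
import re
from typing import Optional


def normalize_sku(raw_sku: Optional[str]) -> Optional[str]:
    """Hypothetical business rule: uppercase and keep only ASCII letters and digits."""
    if raw_sku is None:  # a SQL NULL arrives as None
        return None
    return re.sub(r"[^A-Z0-9]", "", raw_sku.upper())


# The handler is plain Python, so it can be unit-tested before registration.
print(normalize_sku("  ab-12 3 "))  # → AB123

# Registration is where the two frameworks differ (requires a live session):
#
# PySpark:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   normalize_sku_udf = udf(normalize_sku, StringType())
#
# Snowpark:
#   from snowflake.snowpark.functions import udf
#   from snowflake.snowpark.types import StringType
#   normalize_sku_udf = udf(normalize_sku, return_type=StringType(),
#                           input_types=[StringType()], name="normalize_sku")
#
# In both, the UDF is applied to a column:
#   df.select(normalize_sku_udf(col("SKU")))
```

Registering the Snowpark UDF with a name also lets SQL statements in the session call it, for example `SELECT normalize_sku(sku) FROM products` (the `products` table is hypothetical).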

PySpark UDF Code Comparison »

Scalar Versus Vectorized UDFs »

Stored Procedures

Operationalize and orchestrate your pipelines and custom logic directly inside Snowflake, and make them accessible to your SQL users.
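
A sketch of a Python stored procedure handler, assuming hypothetical `ORDERS` and `DAILY_TOTALS` tables. In Snowpark, a stored procedure handler receives a `Session` as its first argument and runs SQL or DataFrame operations through it. Here, SQL generation is kept in a separate pure function so it can be checked without a Snowflake connection; the registration call and the `@my_stage` stage name are shown as comments and are assumptions to adapt.

```python
from typing import List


def build_refresh_statements(source_table: str, target_table: str) -> List[str]:
    """Hypothetical pipeline step: rebuild a daily-totals table from raw orders.

    Table names are interpolated directly, so pass only trusted identifiers.
    """
    return [
        f"CREATE OR REPLACE TABLE {target_table} AS "
        "SELECT order_date, SUM(amount) AS total_amount "
        f"FROM {source_table} GROUP BY order_date",
        f"SELECT COUNT(*) FROM {target_table}",
    ]


def refresh_daily_totals(session, source_table: str, target_table: str) -> str:
    """Stored procedure handler; Snowflake passes a snowflake.snowpark.Session first."""
    create_stmt, count_stmt = build_refresh_statements(source_table, target_table)
    session.sql(create_stmt).collect()  # executes inside Snowflake
    row_count = session.sql(count_stmt).collect()[0][0]
    return f"{target_table} refreshed with {row_count} rows"


# Registration sketch (requires a live Snowflake session, so shown as comments):
#   from snowflake.snowpark.types import StringType
#   session.sproc.register(
#       refresh_daily_totals,
#       name="refresh_daily_totals",
#       return_type=StringType(),
#       input_types=[StringType(), StringType()],  # excludes the Session argument
#       packages=["snowflake-snowpark-python"],
#       is_permanent=True,
#       stage_location="@my_stage",  # hypothetical stage name
#   )
#
# SQL users could then run:
#   CALL refresh_daily_totals('ORDERS', 'DAILY_TOTALS');
```

Separating statement building from execution is a design choice for testability, not a Snowpark requirement; the handler could equally use DataFrame calls such as `session.table(...)` instead of raw SQL.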

Guide to Python Stored Procedures »

UDFs Versus Stored Procedures »

Snowflake Platform for Multiple Languages

Snowflake’s unique multi-cluster shared data architecture powers the performance, elasticity, and governance of Snowpark.

Hear from Snowpark Developers

Customers are using familiar programming languages in Snowpark to build scalable, governed data pipelines.

LANGUAGE OF CHOICE

“Snowpark enables us to accelerate development while reducing costs associated with data movement and running separate environments for SQL and Python.”

—Head of Data Engineering and ML, HyperFinity

STREAMLINED ARCHITECTURE 

“UDFs bring simplicity, because a lot of processing that was previously in Spark is now able to be coded to a UDF and can be easily made accessible for execution as part of a SQL statement.”

—Sr. Director Clinical Data Analytics, IQVIA

ELASTIC SCALABILITY

“With our previous Spark-based platforms, there came a point where it would be difficult to scale, and we were missing our load SLAs. With Snowflake, the split between compute and storage makes it much easier. We haven’t missed an SLA since migrating.”

—Senior Manager of Data Platforms, EDF