Documentation
Feature documentation for Snowpark Connect for Apache Spark™.
Feature
Choose Snowpark Connect for Apache Spark™ to get faster performance and lower costs without operational overhead.
Your complex Spark workloads execute, on average, 5.1x faster than on managed Spark providers, powered by a vectorized engine purpose-built for scale.
Skip cluster provisioning and avoid data movement costs with a fully managed environment.
Free your team from the burden of provisioning and tuning Spark clusters. Focus engineering capacity on building high-value data products instead of managing infrastructure.
Benefits
Run Spark natively
Snowpark Connect uses the open-source Spark Connect protocol to run workloads natively within Snowflake, delivering an average of 42% cost savings on complex ETL tasks and 5.1x faster performance while keeping your existing Spark code intact.


Bridge existing workloads
If your workloads require external Spark environments or existing APIs (including RDDs and MLlib), the Snowflake Connector for Spark provides a high-performance bridge. Snowflake security and governance controls still apply to the data transfer.
Use your tools
Connect your Spark client from your favorite environments, such as Jupyter Notebooks, VS Code, and Apache Airflow™, to run Spark jobs.


Process data in place
Avoid costly data movement and egress fees.
Apply unified governance controls once across your entire data lifecycle.
Snowpark Connect for Apache Spark
Find answers to common questions about Snowpark Connect for Apache Spark and how it helps you run your Spark workloads on Snowflake.
Snowpark Connect lets you use Spark clients (such as PySpark) to connect to Snowflake and run modern Apache Spark DataFrame, Spark SQL, and UDF code directly on the Snowflake engine. This reduces the overhead of maintaining separate Spark environments. A minimal sketch is shown below.
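For illustration only, here is a minimal PySpark sketch assuming a Spark Connect client; the remote endpoint and table names are placeholders, and the exact session bootstrap for your Snowflake account may differ from what is shown:

```python
# Minimal sketch: a standard PySpark client using the Spark Connect protocol.
# The remote URI and table names below are placeholders, not real values.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Requires a PySpark version with Spark Connect support (3.4+).
spark = SparkSession.builder.remote("sc://<your-connect-endpoint>:443").getOrCreate()

# Ordinary DataFrame code; the logical plan is executed by the Snowflake engine.
orders = spark.table("sales.public.orders")
daily = (
    orders.filter(F.col("status") == "COMPLETE")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_amount"))
)
daily.show()

# Spark SQL works the same way.
spark.sql("SELECT COUNT(*) AS n FROM sales.public.orders").show()
```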
Snowpark Connect is a managed compute offering that executes all operations within the Snowflake engine via query pushdown, eliminating the need to provision a separate Spark cluster and avoiding data movement and the associated egress/ingress costs. The Spark Connector requires a separate Spark cluster, involves data transfer, and can push down only a subset of Spark SQL operations. The contrast is sketched below.
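By contrast, a typical Snowflake Connector for Spark job runs on an external Spark cluster that you operate, and data is copied between Snowflake and that cluster. The sketch below assumes placeholder credentials and object names, and that the connector package is installed on the cluster:

```python
# Sketch of the Spark Connector pattern: runs on your own Spark cluster and
# transfers data between Snowflake and the cluster's executors.
# Credentials and object names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-connector-example").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "SALES",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

# Only a subset of operations can be pushed down to Snowflake as SQL;
# the rest execute on the Spark cluster after the data is transferred.
orders = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .load()
)
orders.groupBy("STATUS").count().show()
```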
Snowpark Connect can read and write common file formats such as CSV, JSON, and Parquet. It also works with data directly in Snowflake native tables as well as in an open lakehouse via Snowflake-managed and externally managed Apache Iceberg™ tables. A short example follows.
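As a rough sketch, assuming an existing Snowpark Connect session named spark: the stage path, file locations, and table names below are placeholders, and your environment may use different storage locations:

```python
# Sketch: read common file formats and write the result back as a table.
# Paths and table names are placeholders.
events = spark.read.parquet("@my_stage/events/")
lookup = spark.read.option("header", "true").csv("@my_stage/lookup.csv")

joined = events.join(lookup, on="event_type")

# Write to a target table; whether it lands in a native table or an
# Apache Iceberg table depends on how the target table is defined.
joined.write.mode("overwrite").saveAsTable("analytics.public.enriched_events")
```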
Snowpark Connect is built on the open-source Spark Connect protocol, which separates the client from the execution engine. Snowpark Connect uses a lightweight Spark Connect server to parse the Spark logical plan and then pushes the entire workload down to the Snowflake Vectorized Engine for execution. This means you do not run a Spark cluster; all computation happens within Snowflake.
Most code centered on DataFrame operations should work after repointing the session to Snowflake, as in the sketch below. You can use the Snowpark Migration Accelerator (SMA) to assess compatibility for a codebase of any size.
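For example, existing DataFrame code like the following typically runs unchanged once the session points at Snowflake; the table and column names are placeholders:

```python
# Unchanged DataFrame code; only the session creation differs when you
# repoint to Snowflake. Table and column names are placeholders.
from pyspark.sql import functions as F

customers = spark.table("crm.public.customers")
active = (
    customers.filter(F.col("is_active"))
    .withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))
    .select("customer_id", "full_name", "signup_date")
)
active.show(10)
```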
Customers migrating Spark workloads to Snowflake have seen, on average, 5.1x faster performance and 42% cost savings.