Feature

Apache Spark on Snowflake

Choose Snowpark Connect for Apache Spark™ to get faster performance and lower costs without operational overhead.

Run faster workloads at production scale

Your complex Spark workloads execute, on average, 5.1x faster than on managed Spark providers, with a vectorized engine purpose-built for scale.

Cut total cost of ownership

Skip cluster provisioning and avoid data movement costs with a fully managed environment.

Drop the operational overhead

Free your team from the burden of provisioning and tuning Spark clusters. Focus engineering capacity on building high-value data products instead of managing infrastructure.

Benefits

The full power of Snowflake, now for your Apache Spark™ code

Run Spark natively

Accelerate Spark Pipelines with Snowpark Connect

  • Run Spark DataFrames, SQL and UDFs directly on Snowflake’s vectorized engine — no external Spark clusters to provision or manage. 
  • Snowpark Connect uses the open-source Spark Connect protocol to push workloads down so they run natively within Snowflake, achieving on average 42% savings on complex ETL tasks and 5.1x faster performance while keeping your existing Spark code.

Bridge existing workloads

Connect external Spark clusters

If your workloads require external Spark environments or existing APIs (including RDDs and MLlib), the Snowflake Connector for Spark provides a high-performance bridge. Snowflake security and governance controls still apply to the data transfer.

Use your tools

Work where you already work

Process data in place

Run Spark wherever your data lives

  • Execute Spark code against data in Snowflake native tables or in interoperable lakehouse formats such as Apache Iceberg™.
  • Avoid costly data movement and egress fees.
  • Apply unified governance controls once across your entire data lifecycle.

Snowpark Connect for Apache Spark™ Partners

  • Accenture
  • Capgemini
  • Deloitte
  • BlueCloud
  • Infostrux
  • Kipi.ai
  • LTI Mindtree
  • phData
  • Slalom
  • Tredence

Snowpark Connect for Apache Spark

Frequently Asked Questions

Find answers to common questions about Snowpark Connect for Apache Spark and how it helps run your Spark workloads on Snowflake.

Snowpark Connect lets you use Spark clients (such as PySpark) to connect to Snowflake and run modern Apache Spark DataFrame, Spark SQL and UDF code directly on the Snowflake engine. This reduces the overhead of maintaining separate Spark environments.

Snowpark Connect is a managed compute offering that executes all operations within the Snowflake engine via query pushdown. This eliminates the need to provision a separate Spark cluster and avoids data movement and the associated egress/ingress costs. The Snowflake Connector for Spark, by contrast, requires a separate Spark cluster, involves data transfer and can push down only a subset of Spark SQL operations.

Snowpark Connect can read and write common file formats like CSV, JSON and Parquet. It supports data directly within Snowflake native tables as well as in an open lakehouse via Snowflake-managed and externally managed Apache Iceberg™ Tables.

Snowpark Connect is built on the open-source Spark Connect protocol, which separates the client from the execution engine. Snowpark Connect uses a lightweight Spark Connect server to parse the Spark logical plan and then pushes the entire workload down to the Snowflake Vectorized Engine for execution. This means you do not run a Spark cluster; all computation happens within Snowflake.
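The split between client and execution engine can be pictured with a toy sketch in Python. This is not Snowflake's implementation and not the Spark Connect wire format; the `Plan` class, its methods, and `to_sql` are hypothetical names made up for illustration. The client side only records operations as a plan, and a separate translation step turns the whole plan into a single SQL statement that an engine could execute.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy logical plan (hypothetical): records operations, executes nothing."""
    table: str
    columns: Tuple[str, ...] = ()
    predicates: Tuple[str, ...] = ()
    limit_rows: Optional[int] = None

    def select(self, *cols: str) -> "Plan":
        return replace(self, columns=tuple(cols))

    def where(self, predicate: str) -> "Plan":
        return replace(self, predicates=self.predicates + (predicate,))

    def limit(self, n: int) -> "Plan":
        return replace(self, limit_rows=n)


def to_sql(plan: Plan) -> str:
    """Server-side step in this toy: translate the entire plan into one SQL query."""
    cols = ", ".join(plan.columns) if plan.columns else "*"
    sql = f"SELECT {cols} FROM {plan.table}"
    if plan.predicates:
        sql += " WHERE " + " AND ".join(f"({p})" for p in plan.predicates)
    if plan.limit_rows is not None:
        sql += f" LIMIT {plan.limit_rows}"
    return sql


# Client side: build the plan lazily; no data is touched here.
plan = Plan("orders").where("amount > 100").select("id", "amount").limit(10)

# Engine side: the whole plan is pushed down as a single statement.
pushed_down_sql = to_sql(plan)
print(pushed_down_sql)
```

In this toy, nothing runs until the plan is translated, which mirrors the idea that the client describes the work and the engine performs it.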

Most code centered on DataFrame operations should work after you repoint the session to Snowflake. You can use the Snowpark Migration Accelerator (SMA) to assess the compatibility of a codebase of any size.
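As a rough sketch of what repointing a session can look like, the Python example below uses the standard PySpark Spark Connect client (`SparkSession.builder.remote`, available in PySpark 3.4 and later). This page does not describe how a Snowpark Connect endpoint is exposed, so the host, port, table name, and column name here are placeholders and assumptions; consult the Snowflake documentation for the actual session setup. The point is that the DataFrame logic in `top_customers` does not change, only the session it receives.

```python
from typing import Dict, Optional


def spark_connect_url(host: str, port: int = 15002,
                      params: Optional[Dict[str, str]] = None) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value;..."""
    url = f"sc://{host}:{port}"
    if params:
        url += "/;" + ";".join(f"{k}={v}" for k, v in params.items())
    return url


def top_customers(spark, table: str):
    """Ordinary DataFrame code; it does not care which engine backs `spark`."""
    return (
        spark.table(table)
        .groupBy("customer_id")  # hypothetical column name
        .count()
        .orderBy("count", ascending=False)
        .limit(10)
    )


def run(url: str, table: str):
    # Imported lazily so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    # Only this line changes when the session is pointed at a different endpoint.
    spark = SparkSession.builder.remote(url).getOrCreate()
    return top_customers(spark, table).collect()


# Placeholder endpoint (an assumption, not a documented Snowflake address).
endpoint = spark_connect_url("localhost")
print(endpoint)
```

Calling `run(endpoint, "sales.orders")` (with a hypothetical table name) would require PySpark and a reachable Spark Connect endpoint.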

Customers migrating Spark workloads to Snowflake have seen, on average, 5.1x faster performance and 42% cost savings.
