Snowpark-Optimized Warehouses: Production-Ready ML Training and Other Memory-Intensive Operations

With Snowpark, our customers have begun to leverage Snowflake for more complex data engineering and data science workloads using languages such as Java and Python. This new wave of developers using Snowflake often requires more flexibility in the underlying compute infrastructure to unlock memory-intensive operations on large data sets such as ML training.

To support these workloads in production, we’re excited to launch Snowpark-optimized warehouses in general availability in all Snowflake regions across AWS, Azure, and GCP. 

Demo: Running 200 forecasts in 10 minutes using XGBoost and Snowpark-optimized warehouses

Snowpark-optimized warehouses have compute nodes with 16x the memory and 10x the local cache compared with standard warehouses. The larger memory helps unlock memory-intensive use cases on large data sets such as ML training, ML inference, data exports from object storage, and other memory-intensive analytics that could not previously be accommodated in standard warehouses. 

As a result, data teams can now run end-to-end ML pipelines in Snowflake in a fully managed manner without having to use additional systems or move data across governance boundaries.


Snowpark-optimized warehouses also inherit all the benefits of Snowflake virtual warehouses:

  • Fully managed: Snowflake oversees the maintenance, security patching, tuning, and delivery of the latest performance enhancements transparently
  • Elastic: Elastic scaling of compute supports virtually any number of users, jobs, or data with multi-tenant security and resource isolation
  • Reliable: Industry-leading SLA is consistently upheld
  • Secure: Governance controls are applied across all workloads without trade-offs

Since the new warehouse option was announced in public preview in November 2022, we’ve rolled out performance improvements, increased region availability, and made behind-the-scenes stability improvements.

The 10x larger local cache on each Snowpark-optimized warehouse node speeds up subsequent runs when cached artifacts (Python packages, JARs, intermediate results, etc.) are reused. With these performance improvements, Snowpark developers continue to get more out of each compute credit and process large data sets more efficiently. We have also invested in improving the performance of the most popular Python libraries by adding Joblib multiprocessing support in Snowpark for Python stored procedures.
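The Joblib support mentioned above follows a simple pattern: fan independent tasks out across workers and collect the results. Here is a minimal sketch of that pattern using only the Python standard library (Joblib's `Parallel`/`delayed` API provides the same fan-out, typically over worker processes; the function names below are illustrative, not Snowflake APIs):

```python
from concurrent.futures import ThreadPoolExecutor

def score_partition(rows):
    # Stand-in for per-partition work, e.g., feature computation or
    # model scoring inside a stored procedure.
    return sum(r * r for r in rows)

def parallel_score(partitions, max_workers=4):
    # Fan each partition out to its own worker, Joblib-style, and
    # collect results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_partition, partitions))

print(parallel_score([[1, 2], [3, 4], [5]]))  # → [5, 25, 25]
```

In a stored procedure, the same structure lets a single Snowpark-optimized node put its extra memory and cores to work on many independent chunks at once.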

In addition to unlocking single-node ML training use cases, Snowpark-optimized warehouses also include optimizations for multi-node use cases. When UDFs are run on a warehouse with multiple nodes (size L or larger), Snowflake leverages the full power of the warehouse by parallelizing computations, redistributing rows between nodes in the warehouse. Statistics on UDF execution progress are used to balance the distribution of work across compute nodes and maximize parallelism.
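Conceptually, the engine partitions incoming rows across nodes and runs the UDF on each slice in parallel. A toy single-process sketch of that redistribution (illustrative only, not Snowflake internals):

```python
def redistribute(rows, n_nodes):
    # Hash-partition rows across n_nodes "workers", a toy model of how
    # a multi-node warehouse spreads UDF input for parallel execution.
    buckets = [[] for _ in range(n_nodes)]
    for row in rows:
        buckets[hash(row) % n_nodes].append(row)
    return buckets

def run_udf_everywhere(rows, udf, n_nodes):
    # Apply the UDF to every bucket and merge the results. As with real
    # parallel UDF execution, output order may differ from input order.
    results = []
    for bucket in redistribute(rows, n_nodes):
        results.extend(udf(r) for r in bucket)
    return results

out = run_udf_everywhere(range(10), lambda x: x * 2, n_nodes=3)
print(sorted(out))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```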

Since moving to public preview, we have seen the adoption of a variety of memory-intensive use cases by customers such as Spring Oaks Capital and Innovid.

Customer success stories

Spring Oaks Capital is a national financial technology company that focuses on the acquisition of consumer credit portfolios. The data science team evaluates millions of records to provide predictions that give their team the insights needed to optimize their debt pricing and purchasing strategies. One of their machine learning models runs every morning to provide call centers with prioritized call lists based on expected conversion. 

To keep its models scoring on the freshest features, Spring Oaks needs to compute large amounts of feature data reliably every morning. Watch an overview of the architecture that has given Spring Oaks 8x performance over the prior solution.

Innovid, which powers advertising delivery, personalization, and measurement for the world’s largest brands, has also been using Snowpark-optimized warehouses. Innovid collects approximately 6 billion data points from over 1 billion ads each day. Using Snowpark-optimized warehouses, the data science team is able to process these very large data sets and train ML models to provide sophisticated solutions in cross-platform ad serving, data-driven creative, and converged TV measurements for their global client base. Read more about Innovid’s experience using Snowpark for ML.

How to get started

You can get started with Snowpark-optimized warehouses by following the usage instructions in our documentation and quickstart guide, which include step-by-step setup instructions and product details. We're continuously looking for ways to improve, so if you have any questions or feedback about the product, let us know in the Snowflake Forums community.
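As a concrete starting point, a Snowpark-optimized warehouse is created with standard DDL by setting the warehouse type; the name and size below are examples (see the documentation for supported sizes):

```sql
-- Create a Snowpark-optimized warehouse (name and size are illustrative)
CREATE OR REPLACE WAREHOUSE snowpark_opt_wh
  WITH WAREHOUSE_SIZE = 'MEDIUM'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED';
```

Once created, it is used like any other virtual warehouse: point your Snowpark session or SQL workload at it and run.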
