Introducing Container Runtime: Enabling Flexible, Scalable Training and Inference on GPUs from a Snowflake Notebook
Predictive machine learning continues to be a cornerstone of data-driven decision-making. However, as organizations accumulate more data in a wide variety of forms, and as modeling techniques continue to advance, the tasks of a data scientist and ML engineer are becoming increasingly complex. Oftentimes, more effort is spent on managing infrastructure, jumping through package management hurdles, and dealing with scalability issues than on actual model development.
Today, we’re excited to expand the functionality of Snowflake ML with the new Container Runtime for Snowflake Notebooks, available in public preview across all AWS commercial regions. This fully managed, container-based runtime comes preconfigured with the most popular Python libraries and frameworks, with the flexibility to extend from open source hubs such as PyPI and Hugging Face. Container Runtime includes APIs that automatically parallelize data loading and model training, delivering a 3-7x execution speedup (based on internal benchmarks against the same workloads run with OSS libraries outside of the runtime) and making it easy to efficiently scale your ML workflows. By using Snowflake Notebooks on Container Runtime, data scientists and ML engineers spend significantly less time on infrastructure and scalability, and more time developing and optimizing their ML models to deliver rapid business impact.
Build cost-effective, scalable and flexible ML models in Snowflake
Many enterprises are already using Container Runtime to cost-effectively build advanced ML use cases with easy access to GPUs. Customers include CHG Healthcare, Keysight Technologies, and Avios.
CHG Healthcare, a healthcare staffing company with over 45 years of industry expertise, uses AI/ML to power its workforce staffing solutions across 700,000 medical practitioners representing 130 medical specialties. CHG builds and productionizes its end-to-end ML models in Snowflake ML.
“Using GPUs from Snowflake Notebooks on Container Runtime turned out to be the most cost-effective solution for our machine learning needs. We appreciated the ability to take advantage of Snowflake's parallel processing with any open source library in Snowflake ML, offering flexibility and improved efficiency for our workflows.” – Andrew Christensen, Data Scientist, CHG Healthcare
Keysight Technologies is a leading provider of electronic design and test solutions. With over $5.5 billion in global revenues and over 33,000 customers in 13 industries, Keysight holds over 3,800 patents for its innovations. Keysight builds scalable sales and forecasting models in Snowflake ML with Container Runtime.
“Having tried Snowflake Notebooks on the Container Runtime, we can say the experience has been remarkable. The flexible container infrastructure supported by distributed processing on both CPUs and GPUs, optimized data loading, and seamless integration with Model Registry have improved our workflow efficiency.” – Krishna Moleyar, Analytics & Automation for IT Global Applications, Keysight Technologies
Avios, a leader in travel awards with more than 40 million members and 1,500 partners, uses Snowflake Notebooks on Container Runtime to perform deeper analysis and data science tasks with the flexibility needed for their business.
“I have really enjoyed using Snowflake Notebooks on Container Runtime for the flexibility and speed they offer. I am able to run my code without worrying about it timing out or variables being forgotten. Enabling PyPI integration, I also have the added benefit of using a wider range of Python packages, making my analysis and data science tasks more flexible.” – Olivia Brooker, Data Scientist, Avios
Simplifying ML infrastructure management
In just a few clicks, Container Runtime abstracts away the headache of package management and infrastructure provisioning by providing:
A simple notebook configuration for selecting a compute pool for a Snowflake Notebook, enabling data scientists to choose from a predefined set of resource pools (CPU or GPU) based on the scale and complexity of resource-intensive tasks such as model training.
A set of CPU- and GPU-specific runtime images, pre-installed with the latest and most popular ML libraries and frameworks (PyTorch, XGBoost, LightGBM, scikit-learn and many more), so data scientists can simply spin up a Snowflake Notebook and dive right into their work.
Access to open source packages via pip and the ability to bring in any model from hubs such as Hugging Face (see example here).
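For example, additional packages can be installed from PyPI directly inside a notebook cell (the package below is purely illustrative):
# Install an additional package from PyPI in a Container Runtime notebook cell
!pip install transformers
# Models can then be pulled from Hugging Face with the usual APIs
from transformers import pipeline
sentiment = pipeline("sentiment-analysis")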
Scaling ML development
Container Runtime helps data scientists iterate faster and focus on creating value instead of managing infrastructure. It offers easy access to self-serve infrastructure with optimized data loading and distributed model training, letting you train on very large datasets and models, or simply speed up your ML workflows.
Snowflake ML APIs for data loading offer efficient materialization of Snowflake tables as pandas DataFrames or PyTorch datasets. Data is ingested in parallel across multiple CPUs or GPUs and surfaced in the notebook as a dataframe.
When dealing with more complex model training frameworks such as PyTorch’s Distributed Data Parallel approach, Snowflake’s new ShardedDataConnector API simplifies the task of ingesting the source table and sharding it so that it’s available for each parallel process — all with the same efficiency and speed.
from snowflake.ml.data.data_connector import DataConnector
# Retrieve data from a Snowflake table
table_name = 'LARGE_DATASET'
snowpark_df = session.table(table_name)
# Materialize it into a pandas dataframe using DataConnector. Snowflake leverages a distributed compute cluster to load in the data in parallel
pandas_df = DataConnector.from_dataframe(snowpark_df).to_pandas()
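For the distributed PyTorch case, here is a minimal sketch of the sharded pattern. The module path and the in-worker shard retrieval shown in the comments follow the pattern in Snowflake's distributed training examples; treat them as assumptions and consult the documentation for the exact API.
from snowflake.ml.data.sharded_data_connector import ShardedDataConnector
# Shard the source table so each parallel training process receives its own partition
sharded_connector = ShardedDataConnector.from_dataframe(session.table('LARGE_DATASET'))
# Inside each training process, the worker retrieves only its own shard,
# e.g. as a PyTorch dataset (assumed pattern):
# shard = context.get_dataset_map()["dataset"].get_shard().to_torch_dataset()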
Users can bring any open source framework, such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow or any other, for feature engineering and model training on this materialized data without losing any flexibility or choice.
# OSS XGBoost
import xgboost as xgb
X = pandas_df.drop('LABEL', axis=1)
y = pandas_df['LABEL']
# Train on GPUs using the materialized pandas dataframe
model = xgb.XGBRegressor(n_estimators=1000, tree_method="gpu_hist")
model.fit(X, y)
Snowflake ML APIs for model training extend the familiar open source interfaces provided by XGBoost, LightGBM and PyTorch. Users can now simply import their existing training notebooks and get the security, scale and cost benefits of running in the Container Runtime.
What’s more, with a simple extension to the same open source APIs, users can leverage distributed training over multiple CPUs or GPUs without the need for orchestrating any of the underlying infrastructure. Data scientists can now supercharge their training efforts and efficiently train models over datasets of 100s of GB or more!
from snowflake.ml.modeling.distributors.xgboost import XGBEstimator, XGBScalingConfig
input_cols = ["FEATURE1", "FEATURE2", "FEATURE3", "FEATURE4", "FEATURE5", "FEATURE6", "FEATURE7"]
label_col = 'TARGET'
# Snowflake will distribute the training across multiple GPUs
scaling_config = XGBScalingConfig(use_gpu=True)
estimator = XGBEstimator(n_estimators=10, scaling_config=scaling_config)
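# `data` is the training dataset loaded earlier, e.g. via DataConnector
# (an assumption; see the documentation for supported input types)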
model = estimator.fit(data, input_cols=input_cols, label_col=label_col)
Governing, deploying, and serving models with ease
As the complexity of data processing and ML workflows grows, a strong end-to-end lineage is a “must have” in order to ensure the reproducibility and auditability of models deployed into downstream applications. With seamless integration to the Snowflake Model Registry and automatic ML lineage (private preview), data scientists can easily promote their models from within an experiment in a Notebook to a fully-managed entity that is governed and tracked through the deployment and consumption phases.
# Create a handle to the Snowflake Model Registry (requires an active session)
from snowflake.ml.registry import Registry
registry = Registry(session=session)
# Log the trained model to the Snowflake Model Registry;
# train_data is the Snowpark DataFrame used for training
model_ref = registry.log_model(
    model,
    model_name="ChurnClassifier",
    version_name="v1",
    sample_input_data=train_data.limit(1).to_pandas(),
)
Once a model is logged, making it available for inference, whether as a batch analytics workload or as record-by-record invocations from an application through an endpoint, is handled automatically. Models logged in the Model Registry can be used directly for inference, with the flexible options described below:
In addition to Warehouse-based inference from Python or SQL that was previously supported (and is a great option for CPU models that use Snowflake Conda packages), Model Registry now supports serving models in Snowpark Container Services in public preview in AWS commercial regions using a simple but powerful create_service() API. The key benefits of container-based model serving are:
Accelerate jobs with GPUs: Users can run large ML models on a distributed compute pool with GPUs, so any model trained in Container Runtime, along with models brought in from external sources, can be used in Snowflake. Distributed inference with multiple processes across the available instance nodes and GPUs is handled automatically, so users do not have to worry about optimizing resource utilization.
Use any Python library: Models can use pip package dependencies for packages that may not be supported in the Warehouse.
Eliminate complex container image creation and management: Snowflake automates the creation of an optimized model-specific inference server container image and deploys the service. Users do not have to deal with any management or configuration of the container image.
Flexible inference options: The model service can be invoked from the Python SDK, directly from SQL, or through REST API endpoints from applications.
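As a minimal sketch of container-based serving (the compute pool, image repository and test_df names below are placeholders; the full parameter list for create_service() is in the documentation):
# Deploy the logged model version as a service in Snowpark Container Services
model_version = registry.get_model("ChurnClassifier").version("v1")
model_version.create_service(
    service_name="CHURN_CLASSIFIER_SERVICE",
    service_compute_pool="MY_GPU_POOL",
    image_repo="MY_DB.MY_SCHEMA.MY_IMAGE_REPO",
    ingress_enabled=True,  # expose a REST endpoint for applications
)
# Invoke the deployed service from Python; test_df is a placeholder dataframe of inference rows
predictions = model_version.run(test_df, function_name="predict", service_name="CHURN_CLASSIFIER_SERVICE")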
See our documentation for more details and examples.
Production-ready use cases in Snowflake ML
The new Container Runtime for ML unlocks the ability for data scientists to build complex, production-ready pipelines directly in Snowflake ML. There is no need to move data out or set up additional ML infrastructure in order to interoperate with existing workflows or tools. Thanks to the highly optimized data loading and distributed processing built into the Snowflake ML APIs, even complex ML techniques can easily be applied over large-scale data.
Advanced ML use cases made possible by Snowflake with Container Runtime include:
Image anomaly detection: In this example, a manufacturing company is looking to build anomaly detection for their industrial inspection workflows. Their goal is to detect part defects by training a computer vision model capable of identifying anomalous images. This requires building a computer vision model trained on over a million images. Watch this use case in action at BUILD and follow along in this quickstart.
Recommendation engine: In this example, a global food truck company is looking to build a recommendation engine to power hundreds of food trucks to generate highly accurate, hyper-local menu recommendations. We demonstrate how, using a PyTorch-based recommendation algorithm, you can train and deploy a model to do exactly that. See this quickstart to learn more.
Generate embeddings at scale: In order to build and operate an internal knowledge discovery platform powered by a RAG workflow, you need to generate embeddings at scale from text data. We demonstrate how you can do this with the Cortex embed function, a fully managed experience for generating high-quality embeddings (a brief sketch follows below). If you have use cases that require custom embeddings, you can use this quickstart and the Container Runtime option.
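As a quick illustration, the Cortex embed function can be called from Snowpark; DOCS_TABLE and TEXT_COL below are hypothetical names:
from snowflake.snowpark.functions import call_function, col, lit
# Compute a 768-dimensional embedding per row with the Cortex embed function
docs_df = session.table("DOCS_TABLE")
embedded_df = docs_df.with_column(
    "EMBEDDING",
    call_function("SNOWFLAKE.CORTEX.EMBED_TEXT_768", lit("snowflake-arctic-embed-m"), col("TEXT_COL")),
)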
Get started with Container Runtime
Container Runtime is now available in public preview in all AWS commercial regions (excluding free trials). Try this intro to Container Runtime quickstart that walks you through the experience of creating a Notebook and building a simple ML model. To learn how to leverage GPUs for model development in Snowflake, you can follow along in this more advanced quickstart: Training an XGBoost Model with GPUs, which includes a companion video demo.
For further details about Container Runtime or how to use this from Snowflake Notebooks, be sure to check out the documentation: Snowflake Notebooks on Container Runtime for ML.