At Snowflake, we’re helping data scientists, data engineers, and application developers build faster and more efficiently in the Data Cloud. That’s why at our annual user conference, Snowflake Summit 2023, we unveiled new features that further extend data programmability in Snowflake to developers’ languages of choice, without compromising on governance.
A key highlight of this year’s Summit is the innovation enhancing and expanding the Snowpark libraries and runtimes developers use to deploy and process non-SQL code more easily and securely. To make it even easier to process data with Snowpark Python UDFs and Stored Procedures, we have added support for Python 3.9 and 3.10, along with unstructured data support, now in public preview. To enhance the security and governance of code in Snowflake, we also added granular allowlists and blocklists for Python packages, in private preview. And integrating with APIs and endpoints in a secure way is now possible with external network access (private preview), which includes security controls that allow network traffic only to user-specified network locations.
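To make the runtime upgrade concrete, here is a sketch of the kind of handler a Snowpark Python UDF runs. The function itself is plain Python and the registration DDL in the comment reflects standard Snowflake syntax; all object and column names are chosen for illustration:

```python
# Illustrative handler for a Snowpark Python UDF (names are hypothetical).
# Given a staged file's path, classify it by extension -- a simple example
# of the unstructured data processing mentioned above.
def classify_file(path: str) -> str:
    ext = path.rsplit(".", 1)[-1].lower()
    if ext in ("png", "jpg", "jpeg"):
        return "image"
    if ext in ("csv", "parquet"):
        return "tabular"
    return "other"

# In Snowflake, the handler would be registered with DDL along these lines,
# pinning the runtime to one of the newly supported versions:
#
#   CREATE OR REPLACE FUNCTION classify_file(path STRING)
#     RETURNS STRING
#     LANGUAGE PYTHON
#     RUNTIME_VERSION = '3.10'
#     HANDLER = 'classify_file'
#     AS $$ ...handler code above... $$;
```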
These are just a few of the Snowpark innovations, with many more across Snowflake that continue to expand the scope and possibilities of programmability in the Data Cloud, delivering unique innovations that enable customers to:
- Simplify, accelerate and scale end-to-end AI/ML workflows
- Expand streaming capabilities
- Enhance observability and DevOps experience
Simplify, accelerate and scale end-to-end AI/ML workflows
The AI/ML workflow can be broadly split into three steps: model development, operations, and consumption. The end-to-end process requires collaboration across many data, engineering, and business teams in order to capitalize on the value of AI-powered insights. But because most of these teams use different technologies and work with different programming languages, copies of data get moved into siloed environments, making it challenging for most organizations to systematize and scale the entire workflow.
To power a much broader set of development workloads, we launched Snowpark Container Services (private preview), enabling developers to effortlessly deploy, manage, and scale containerized models using secure Snowflake-managed infrastructure with configurable hardware options such as GPUs. This new Snowpark runtime eliminates the need for users to manage and maintain compute and clusters for containers, or to expose governed data to security risks by moving it outside their Snowflake account. The added flexibility in both programming languages (e.g., R) and hardware (e.g., GPUs) increases the speed of development and enables sophisticated apps, such as hosted notebooks and LLMs, to be delivered via Snowflake Native Apps. Other Snowpark innovations to streamline AI/ML development, operations, and consumption include:
Snowpark ML Modeling API to accelerate feature engineering and simplify AI/ML training
Snowpark ML APIs, composed of the ML Modeling API (public preview) and ML Operations API (private preview), will enable easier end-to-end ML development and deployment in Snowflake. On the development side, the Snowpark ML Modeling API scales out feature engineering and simplifies model training in Snowflake.
The Snowpark ML Modeling API lets data scientists run scikit-learn-style preprocessing and train models with the familiar scikit-learn and XGBoost APIs natively on data in Snowflake, taking advantage of parallelization without having to create Stored Procedures, for a simpler user experience.
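As a rough sketch of that pattern, training might look like the following. This is a hedged illustration, not a definitive reference: the `snowflake.ml` module path and the `input_cols`/`label_cols`/`output_cols` parameter names are assumptions based on the preview API, and the table and column names are made up. The import is deferred so the sketch stays self-contained:

```python
def train_xgb_in_snowflake(session, table_name: str):
    """Hedged sketch of Snowpark ML Modeling API usage.

    The module path and parameter names below are assumptions based on
    the preview API; check the current documentation before relying on them.
    """
    # Deferred import: requires the snowflake-ml-python package.
    from snowflake.ml.modeling.xgboost import XGBClassifier

    df = session.table(table_name)  # a Snowpark DataFrame, not pandas

    clf = XGBClassifier(
        input_cols=["FEATURE_1", "FEATURE_2"],  # hypothetical feature columns
        label_cols=["LABEL"],                   # hypothetical label column
        output_cols=["PREDICTION"],
    )
    clf.fit(df)             # training runs inside Snowflake, sklearn-style
    return clf.predict(df)  # a DataFrame with the PREDICTION column added
```

The point of the pattern is that `fit` and `predict` mirror scikit-learn, but the computation is pushed down to Snowflake rather than pulling data into the client.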
Snowpark Model Registry to store and govern all of an organization’s AI/ML models
After a model has been developed, data scientists can also seamlessly deploy their model in Snowflake with the Snowpark ML Operations API, which includes the Snowpark Model Registry (private preview). This provides a unified repository for an organization’s ML models to streamline and scale their machine learning model operations (MLOps).
The registry provides centralized publishing and discovery of models, streamlining the collaboration in which data scientists hand off successful experiments to ML engineers, who deploy them as production models on Snowflake infrastructure.
Streamlit in Snowflake to bring data and models to life as interactive apps
Streamlit in Snowflake (public preview soon) brings data and ML models to life with interactive apps built with Python. It combines the component-rich, user-friendly Streamlit open-source library for app development with the scalability, reliability, security, and governance of the Snowflake platform.
Streamlit gives data scientists and Python developers the ability to quickly turn data and models into interactive, enterprise-ready applications.
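A minimal app of the kind that could be hosted this way is sketched below. The Streamlit calls are the standard open-source API; the in-Snowflake data access is left as a comment because the preview interface may differ, and the app content is hypothetical. The import is deferred so the sketch stays self-contained:

```python
def render_app():
    # Deferred import: requires the streamlit package.
    import streamlit as st

    st.title("Order volume explorer")  # hypothetical app
    days = st.slider("Days of history", min_value=7, max_value=90, value=30)

    # Inside Snowflake, the app would read governed data through a Snowpark
    # session (the exact accessor in the preview may differ); placeholder
    # data stands in here so the sketch is self-contained.
    data = [{"day": d, "orders": 100 + d} for d in range(days)]
    st.line_chart(data, x="day", y="orders")
```

Run locally, a script like this launches with `streamlit run app.py`; in Snowflake, the same code would execute against governed data without leaving the platform.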
Simplified streaming pipelines in Snowflake
We are expanding our streaming capabilities with Dynamic Tables (public preview). Dynamic Tables drastically simplify continuous data pipelines for transforming both batch and streaming data. A streaming data pipeline can now be as easy to define as a CREATE TABLE AS SELECT (CTAS) statement. Together with Snowpipe Streaming (GA soon), Snowflake removes the boundaries between batch and streaming systems and makes streaming pipelines easier than ever before.
A Dynamic Table is a new table type that is defined as a query and continually maintains the result of that query as a table. Dynamic Tables can join and aggregate across multiple source objects and incrementally update results as sources change. Customers provide a query and how frequently to refresh it, and Snowflake automatically materializes the results, so data pre-computation becomes an automated process rather than a manual step a data engineer must complete.
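As a sketch of that CTAS-like shape, the DDL below defines a Dynamic Table from a query, wrapped so it can be submitted from a Snowpark session. The `TARGET_LAG` and `WAREHOUSE` clauses follow the preview syntax and all object names are hypothetical; check the current documentation before use:

```python
# Hedged sketch of a Dynamic Table definition: a query plus a refresh
# target. Clause names follow the preview syntax; object names are made up.
ORDER_TOTALS_DDL = """
CREATE OR REPLACE DYNAMIC TABLE order_totals
  TARGET_LAG = '1 minute'
  WAREHOUSE = transform_wh
AS
  SELECT customer_id, SUM(amount) AS total_spend
  FROM raw_orders
  GROUP BY customer_id
"""

def create_order_totals(session):
    # session is assumed to be a snowflake.snowpark.Session. Once created,
    # Snowflake keeps order_totals incrementally up to date as raw_orders
    # changes -- no explicit pipeline orchestration required.
    session.sql(ORDER_TOTALS_DDL).collect()
```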
Improving observability and end-to-end developer experience
To make it easier, faster, and simpler to build end-to-end applications, pipelines, and ML models, we want to bring capabilities and experiences that are familiar to developers, so they can work more efficiently. To do this, we launched a set of DevOps and observability capabilities at Summit to allow developers to build collaboratively, test easily, troubleshoot faster, operate with stability, and boost overall productivity.
These include features such as Git integration (private preview), which provides easy integration of application code with Git and Git-based workflows. Users can view, run, edit, and collaborate on assets that live in a Git repo, right inside Snowflake.
We also announced the private preview of Snowflake CLI, an open-source command-line interface designed explicitly for app-centric workloads on Snowflake. Developers can use simple commands to create, manage, update, and view apps running on Snowflake across workloads such as Streamlit, Native Apps, Snowpark Container Services, and Snowpark.
Logging and tracing with event tables (public preview) lets developers explore logs from code running inside and outside of Snowflake using Snowflake’s engine, bringing better debuggability to code in Snowflake.
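For instance, a Snowpark Python procedure or UDF can emit records through Python’s standard `logging` module, which Snowflake routes into the account’s active event table for querying with SQL. The logger usage below is plain stdlib Python; the function and logger names are hypothetical:

```python
import logging

# Standard-library logging: when this runs inside a Snowpark Python UDF or
# procedure, Snowflake captures these records into the account's event
# table, where they can be queried with SQL for troubleshooting.
logger = logging.getLogger("my_pipeline")  # hypothetical logger name

def load_batch(rows: list) -> int:
    """Load a batch of row dicts, logging progress and skipped rows."""
    logger.info("starting batch of %d rows", len(rows))
    loaded = 0
    for row in rows:
        if "id" not in row:
            logger.warning("skipping row without id: %r", row)
            continue
        loaded += 1
    logger.info("loaded %d of %d rows", loaded, len(rows))
    return loaded
```

Because the capture happens at the platform level, the code needs no Snowflake-specific logging API; ordinary `logger.info` and `logger.warning` calls are enough.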
All of these features work together to support the full DevOps and software development lifecycle for both apps and data in Snowflake, enabling more productive developer workflows.
Snowflake is bringing generative AI to the data by helping our customers securely run LLMs on enterprise data, delivering AI-driven functionality as built-in functions and UIs, and more. Check out our ML demo at Summit to see how it works.