
Access PyPI Packages in Snowpark via UDFs and Stored Procedures


For developers, data engineers and data scientists using Snowpark, one of the biggest challenges has been accessing the Python packages they need and managing dependencies for packages uploaded to a stage. While Snowpark Python already supports Anaconda packages and custom packages, the lack of direct access to the vast ecosystem of the Python Package Index (PyPI) has meant extra steps for managing dependencies and working around limitations.

That changes today. We're excited to announce direct access to the Python Package Index (PyPI) from Snowpark Python, available now in public preview. You can integrate the Python packages you need directly into your Snowpark Python UDFs and stored procedures, without waiting. While the trusted Snowflake Anaconda channel remains available, you now have the freedom to tap into the entire PyPI ecosystem, which hosts more than 600,000 Python packages.

This new capability significantly simplifies development workflows, making it easier than ever to build and scale Python-powered applications in Snowflake.

How it works

Snowflake has a default Artifact Repository that allows users to connect to PyPI and install its packages within Snowpark UDFs and stored procedures. Simply grant access to the built-in PyPI repository, then name the packages you need when you create a UDF or procedure (get more details about the process in our documentation).

Figure 1. The Artifact Repository allows Snowpark users to directly connect and install PyPI packages within UDFs and stored procedures.

“CARTO, a leading cloud-native location intelligence platform, is collaborating on the EU-funded EMERALDS project to bring advanced mobility analytics for trajectory data to the cloud. Existing methods using libraries like PyMEOS and MovingPandas offer powerful trajectory analysis but are not scalable for large data sets or real-time applications. With Snowflake's Artifact Repository, we can now integrate these methods as Python UDFs, making them easily accessible and scalable within Snowflake without complex data pipelines. This lowers the entry barrier for users to solve mobility challenges such as improving driver-assistance technology by analyzing self-driving car trajectories, optimizing insurance models through driver behavior analysis and enhancing network planning by mapping user movement patterns.”

Giulia Carella
Principal Data Scientist, CARTO

Benefits

Snowflake’s enhanced support for PyPI packages in Snowpark UDFs and procedures provides significant benefits:

  • Instant access, zero hassle: Now, any PyPI wheel (`.whl`) package is at your fingertips.

  • Unleash limitless possibilities: Leverage any `.whl` package from PyPI, and use a Snowpark-optimized warehouse for packages that are specific to the x86 CPU architecture. For custom packages, users can upload those packages to a stage and then import them within a UDF or stored procedure.

  • Simplified development: Dramatically streamline your development process, making building and scaling Python-powered applications in Snowflake faster and easier than ever.

This provides a secure and seamless experience for organizations. Users don’t need to pip install a package every time they run a data analysis or data engineering job. Instead, they just specify the package name as part of ARTIFACT_REPOSITORY_PACKAGES. Under the hood, the package is installed in the underlying sandbox environment on the virtual warehouse. To keep things simple, Snowflake provides a default Artifact Repository called SNOWFLAKE.SNOWPARK.PYPI_SHARED_REPOSITORY that you can use to connect to PyPI and install its packages. An ACCOUNTADMIN must grant a role access to the repository to enable this use. Behind the scenes, we also cache the packages to avoid hitting PyPI every time, which helps improve performance.
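To make the package specification concrete, here is a minimal sketch, not Snowflake's reference example. It uses Python to assemble the kind of CREATE FUNCTION statement described above. ARTIFACT_REPOSITORY_PACKAGES and the repository name come from this post. The helper `build_udf_ddl`, the UDF name `pypi_demo_udf`, the ARTIFACT_REPOSITORY clause, the runtime version and the overall clause layout are assumptions for illustration and should be checked against the feature documentation.

```python
from typing import Sequence

# Repository name as given in this post.
DEFAULT_REPOSITORY = "SNOWFLAKE.SNOWPARK.PYPI_SHARED_REPOSITORY"


def build_udf_ddl(
    name: str,
    packages: Sequence[str],
    handler_source: str,
    handler: str = "main",
    returns: str = "FLOAT",
    runtime_version: str = "3.11",  # assumed value; use a version your account supports
    repository: str = DEFAULT_REPOSITORY,
) -> str:
    """Render a CREATE FUNCTION statement that requests packages from PyPI.

    Hypothetical helper: the clause layout is an assumption, not an official template.
    """
    if not packages:
        raise ValueError("at least one package is required")
    if "$$" in handler_source:
        raise ValueError("handler source must not contain '$$'")
    # Quote each package as a SQL string literal, doubling embedded quotes.
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in packages)
    return (
        f"CREATE OR REPLACE FUNCTION {name}()\n"
        f"  RETURNS {returns}\n"
        "  LANGUAGE PYTHON\n"
        f"  RUNTIME_VERSION = '{runtime_version}'\n"
        f"  ARTIFACT_REPOSITORY = {repository}\n"
        f"  ARTIFACT_REPOSITORY_PACKAGES = ({quoted})\n"
        f"  HANDLER = '{handler}'\n"
        "AS\n"
        f"$$\n{handler_source.strip()}\n$$;"
    )


# Illustrative handler body that imports a PyPI package inside the UDF.
HANDLER_SOURCE = '''
import sklearn

def main():
    # Placeholder logic: return a constant so the sketch stays trivial.
    return 1.0
'''

ddl = build_udf_ddl("pypi_demo_udf", ["scikit-learn"], HANDLER_SOURCE)
print(ddl)
```

The printed statement could then be run in a worksheet or through a Snowpark session by a role that has been granted access to the repository. Generating the statement from a list of package names is only one possible design. Writing the SQL by hand works just as well when the package list is short.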

“At Tacoma Public Utilities, we host our data pipelines in Snowflake Python stored procedures, some of which require Python libraries not included in the Anaconda channel. For example, we use PGPy to encrypt data shared with partners, who analyze it to identify energy efficiency opportunities for our customers. Previously, incorporating non-Anaconda libraries was cumbersome, requiring manual downloads, uploads to a Snowflake stage and ongoing maintenance of static library versions. With the new Artifact Repository feature, we can now seamlessly integrate with PyPI, simplifying setup and eliminating the need for manual version management. This ensures our pipelines always have access to the latest maintained packages, enhancing both efficiency and reliability.”

Nicole Edwards
Data Architecture Manager, Tacoma Public Utilities

Why this matters

  • Developers can build more powerful data applications with a broader array of Python libraries.

  • Data engineers can build their pipelines with packages from PyPI (such as asyncmy, bitstruct, stumpy and sparse) for use cases ranging from data ingestion and enrichment to transformation, validation and many more.

  • Data scientists can seamlessly execute machine learning workflows using popular packages such as TensorFlow or scikit-learn with the latest versions from PyPI.

Getting started

Getting access to PyPI means more power, more flexibility, and less friction. Dive into a world of endless Python possibilities and elevate your Snowpark projects to new heights.

Ready to experience the future of Snowpark Python development? Start using direct PyPI access today and see the difference! Please see the feature documentation here. In addition, you’ll find an end-to-end example of using packages from PyPI for unstructured data processing with Snowpark in this post.
