
How We Built an Internal Library That Powers Half of Snowflake's Streamlit Apps

When Streamlit was acquired by Snowflake in 2022, one of the big goals was to integrate Streamlit into the heart of Snowflake’s product offerings. We knew the potential of Streamlit — how quickly it could be used to turn data scripts into powerful apps — and we were excited to see how that could scale at Snowflake.

On the Streamlit data team, we’re big believers in dogfooding, using our own tools to solve real problems. So, naturally, we wanted to start building with Streamlit inside of Snowflake as soon as possible.

But we weren’t casual Streamlit developers; we were power users. We have strong opinions about what a great development experience should feel like. Rather than each of us hacking away in isolation, we decided to build something more scalable: a shared internal platform that could support Streamlit development across the company.

So we created a monorepo (a single Git repository with common tooling set up to hold lots of different apps). It started small — just five of us building a handful of apps for our own work. But, it quickly took off, and more teams wanted to start using the platform.

Today, that platform supports more than 70 teams across Snowflake building more than 500 different Streamlit apps. These apps do everything from surfacing product metrics and analyzing pipelines to supporting executive dashboards and customer success tools. In fact, the apps built on this platform now account for more than 50% of all Streamlit app views inside of Snowflake.

What started as five people dogfooding a product they loved has become a core part of how Snowflake operates. And we’re just getting started.

In this post, I want to tell you about the platform we built and the main features that have made it so popular with teams building Streamlit apps at Snowflake. We’ll dive into the specific technical choices and hard-won lessons that you can apply to build your own scalable internal Streamlit developer platform.

We built a platform¹ that:

  • Supports local-first development (if desired), so developers can use their editors of choice to build the app (VS Code and PyCharm being by far the most popular choices)

  • Quickly spins up new apps, batteries included

  • Allows shared code to be used across multiple pages and multiple apps

  • Follows good development practices by default

  • Allows developers to publish new apps manually or via CI/CD

  • Easily publishes development versions of apps for others to preview

  • Builds as much as possible on Snowflake-specific features

What it looks like to use this platform

An example workflow

Before I go into more details about the different parts of the platform, here’s an overview of what it looks like to actually use the platform to make a new Streamlit app.

  • Make a new branch

  • Run task new, which prompts me to enter the basic info for the app

Copier prompting for app info
  • Copier then creates the basic structure of the app

Copying from template
 identical  my_team
    create  my_team/my_new_app
    create  my_team/my_new_app/metrics
    create  my_team/my_new_app/metrics/my_new_app_metrics.py
    create  my_team/my_new_app/app_pages
    create  my_team/my_new_app/app_pages/home_page.py
    create  my_team/my_new_app/app_pages/another_page.py
    create  my_team/my_new_app/.streamlit
    create  my_team/my_new_app/.streamlit/config.toml
    create  my_team/my_new_app/tests
    create  my_team/my_new_app/streamlit_app.py
    create  my_team/my_new_app/snowflake.yml
    create  my_team/my_new_app/environment.yml
  • I then run and edit the app locally using my editor

  • I use task deploy-preview to launch a preview version of the app

  • I update .github/CODEOWNERS with the team members who should review my app in the future

  • I create a pull request with a link to the preview app so my colleagues can review

  • Once it’s approved and merged, a GitHub Action will deploy the new production app

Auto-generated GitHub Actions success message

A local-first experience

One of our first goals was to provide the ability to work on an app locally, as well as through the web-based UI that Streamlit in Snowflake (SiS) now provides.

Initially, this was a bit more complex as SiS had quite a few limitations compared to open source Streamlit. Thankfully, the list of limitations has gotten quite short, and you can use nearly all the features of open source Streamlit effectively. So the biggest difference between local development and SiS is making sure that the Snowflake connection works smoothly.

The way we make sure our developers have a solid connection is to use st.connection(), which works both locally and when deployed, and fall back to get_active_session(), which is only available inside of Snowflake.

import streamlit as st
from snowflake.snowpark import Session
from snowflake.snowpark.context import get_active_session
from streamlit.errors import StreamlitAPIException

def get_session() -> Session:
    try:
        # Works locally (using .streamlit/secrets.toml) and when deployed
        return st.connection("snowflake").session()
    except StreamlitAPIException:
        # Fallback that is only available when running inside Snowflake
        return get_active_session()

However, this won’t work by itself — for st.connection("snowflake") to actually work, we need to tell Streamlit what connection values to use. To make this as easy as possible, we have all our developers set up a minimal file in .streamlit/secrets.toml

[connections.snowflake]
account = "<my_snowflake_account>"
user = "<my_snowflake_username>"
authenticator = "externalbrowser"

And, if they install snowflake-connector-python[secure-local-storage], it will minimize the number of times they need to redo the browser authentication.

In addition to the basic get_session, we also have premade functions for running and caching Snowflake queries using either SQL or Snowpark. We have open sourced some of these utilities as a part of streamlit-extras.
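At their core, those helpers memoize query results keyed on the query text. Here is a minimal, Streamlit-free sketch of the idea using functools.lru_cache (run_query, the CALLS counter and the fake result are hypothetical; the real helpers use st.cache_data and run the query through get_session()):

```python
import functools

CALLS = {"count": 0}

@functools.lru_cache(maxsize=None)
def run_query(sql: str) -> str:
    # Stand-in for get_session().sql(sql).to_pandas(); the counter
    # just makes the caching behavior observable in this sketch.
    CALLS["count"] += 1
    return f"rows for: {sql}"

run_query("select current_date()")
run_query("select current_date()")  # cache hit: the "query" runs only once
```

The same pattern, with a TTL instead of an unbounded cache, is what keeps repeated page loads from hammering the warehouse.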

A simple CLI

Under the hood, creating, updating and running a SiS app is really just a series of SQL commands. But requiring users to run those commands themselves is certainly not a great user experience. So we chose to build our workflows on top of the official Snowflake CLI, snowcli.

This allows us to deploy a Streamlit app with a simple command: snow streamlit deploy. To make it even easier, we have a Taskfile.yml set up so users can just do task deploy to deploy their app, or task deploy-preview to deploy a preview version of the app.

Using snowcli requires a config file at $HOME/.snowflake/config.toml that looks an awful lot like Streamlit’s secrets.toml:

[connections.snowflake]
account = "<my_snowflake_account>"
user = "<my_snowflake_username>"
authenticator = "externalbrowser"

It also requires a snowflake.yml, which specifies which files should be deployed with the SiS app and what database, schema, warehouse and role should be used.

definition_version: "2"

env:
  name: my_app
  schema: public
  database: streamlit
  role: ANALYTICS
  title: My cool app

entities:
  streamlit_app:
    type: streamlit
    identifier:
      name: <% ctx.env.name %>
      schema: <% ctx.env.schema %>
      database: <% ctx.env.database %>
    title: <% ctx.env.title %>
    query_warehouse: compute_wh

    main_file: streamlit_app.py
    pages_dir: pages/
    artifacts:
      - "**/*.py"
      - "*.yml"

So that we can override things like the app name, we use placeholders that are filled in with values from env by default. But these can also be overridden by passing --env key=val.

The Taskfile.yml (Task docs) is what provides these simple commands, such as task deploy and task deploy-preview, for deploying an app from the command line. The preview version of the app has a different name and title (and can also have a different database, schema, role and even Snowflake account if desired) from the production version of the app. Users can also run task new to create a fully functional, multipage, dashboard-style app that’s ready to deploy.

Here is a sample of parts of our Taskfile.yml:

version: "3"

vars:
  USERNAME:
    sh: whoami

tasks:
  snow:
    dir: "{{.USER_WORKING_DIR}}"
    cmds:
      - uvx --from=snowflake-cli==3.10 snow {{.CLI_ARGS}}

  deploy:
    dir: "{{.USER_WORKING_DIR}}"
    cmds:
      - task: snow
        vars:
          CLI_ARGS: streamlit deploy

  deploy-preview:
    dir: "{{.USER_WORKING_DIR}}"
    cmds:
      - task: snow
        vars:
          CLI_ARGS: streamlit deploy --env name={{.PREVIEW_APP_NAME}} --env title={{.PREVIEW_APP_TITLE}}
    vars:
      APP_NAME:
        sh: uvx yq -r .env.name snowflake.yml
      APP_TITLE:
        sh: uvx yq -r .env.title snowflake.yml
      PREVIEW_APP_NAME: "{{.APP_NAME}}_preview_{{.USERNAME}}"
      PREVIEW_APP_TITLE: "{{.APP_TITLE}} Preview (from {{.USERNAME}})"

  new:
    dir: "{{.USER_WORKING_DIR}}"
    cmds:
      # Using copier, asks the user for the app name, schema, db, warehouse and
      # role, and populates a new template app with these values
      - uv run --with copier new_app.py

Using uvx allows us to easily include new Python tools like snowcli and yq (a command-line yaml processor) without users having to do a lot of installations up front or having them interfere with app-specific packages.
 

Shared modules

Along with enabling standardized methods for connecting to Snowflake and running queries, we also have developed a large suite of modules that any of the users of the platform can use when building their apps. Many of these are inspired by our work on streamlit-extras. Some of the most popular packages provide:

charts
  • Standardized charting methods for popular chart types

  • Standardized filter form builders

widgets
  • Laying out apps in rows and grids

CI/CD

While developers can deploy simpler apps manually, our most visible and sensitive applications are updated exclusively through automated CI/CD triggers in GitHub Actions.

We have an internal GitHub Action (similar to this) that takes a list of apps in the monorepo and deploys each one whenever it changes.

name: Deploy product apps via snowcli

on:
  push:
    branches:
      - main

jobs:
  deploy:
    uses: ./.github/workflows/deploy_apps_via_snowcli.yml
    with:
      apps: |
        product/northstar
        product/monitor
        product/admin
      user: ${{ secrets.PRODUCT_DEPLOY_USER }}
    secrets:
      password: ${{ secrets.PRODUCT_DEPLOY_PW }}

Best practices

Running what has become a company-wide monorepo has allowed us to adopt and update modern Python best practices that scale across users with a wide range of Python (and general software) experience.

One of the biggest ways these practices are encouraged and enforced is through a standard set of linting and formatting that gets run locally via pre-commit, then in GitHub Actions (by running the pre-commit action).

  • ruff as a fast and consistent linter

  • ruff format as a fast and consistent formatter

  • mypy to check types (hopefully to be swapped out with ty soon)
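A .pre-commit-config.yaml wiring up that toolchain looks roughly like this (the revision pins are illustrative, not the ones we actually use):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0
    hooks:
      - id: ruff          # lint
      - id: ruff-format   # format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.13.0
    hooks:
      - id: mypy          # type check
```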

Other best practices enforced within the repo include:

  • Use of a virtual environment (either created and managed by uv or by conda as currently supported on the Snowflake platform)

  • Separation of concerns — newly generated apps show developers how to separate shared utility code from the actual app code, rather than bundling everything together in freewheeling .py files

  • Use of GitHub usernames and teams in a CODEOWNERS file to simplify who should be reviewing code changes that occur

  • A pull request template, which prompts the user to answer key questions about scope and testing, and provides a link to a preview app showing their changes.
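As a concrete example, a CODEOWNERS entry that routes review of one app’s directory to its owning team looks like this (the team and path names are hypothetical):

```
# Reviews for this app go to the team that owns it
/my_team/my_new_app/ @snowflake/my-team
```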

Lessons learned

We have learned some important lessons as this platform has grown in scope and in number of users:
 

Try to assume as little as possible about users’ setup

One of the first and hardest lessons has been the classic “it works on my machine” problem. This was a problem even when building for our original small team. Particularly as usage grew, we ran into innumerable issues, including:

  • I don’t have this explicit dependency installed

  • I don’t have this implicit dependency installed

  • I don’t have the same version installed

  • I don’t have my credentials set up

  • I have my credentials set up, but I’ve named them differently

  • I have everything installed, and the right versions, but the version of <tool> that I’m using is some old version I installed using a different method long ago

These issues used to pop up a lot, but they have generally dwindled over time, mainly from a combination of:

  • Documenting in exhaustive detail the required setup steps in the README

  • Making sure the commands in the Taskfile install exactly what’s needed, and giving informative messages if something is wrong with the user’s setup

  • Getting better at onboarding others (more on that below)
     

Everything breaks someone’s workflow²

As the number of users and types of apps has grown, we have learned that nearly every repo-impacting change will inevitably break someone’s workflow. Though that still happens periodically (a search for “broken” in recent months in our Slack channel yields 90 results), a few practices have helped limit the breakage:

  • Carefully watching any changes affecting the Taskfile and any shared modules

  • Watching out for imports — if something is an optional dependency, or something that not every team has added into their app, then don’t import it at the top of a popular shared module — import it inside the function where it’s used instead

  • Using from __future__ imports (e.g., annotations) to keep shared code compatible with older Python versions

  • Defaulting to adding new features/arguments/modules rather than replacing existing ones, and when something must be deprecated, giving users ample warning before removal

  • If all else fails, making sure that the team maintaining the platform has sufficient privileges to implement the changes everywhere they are needed
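The import advice above can be made concrete. In this sketch (render_fancy_chart is hypothetical, with plotly standing in for any optional dependency), the import happens inside the function, so merely importing the shared module never fails for teams that don't use that helper:

```python
from __future__ import annotations  # keeps newer type syntax working on older Pythons

def render_fancy_chart(data: list[dict]):
    # Optional dependency imported lazily: apps that never call this
    # helper don't need plotly installed at all.
    import plotly.express as px
    return px.bar(data)

# Importing this module succeeds even when plotly is missing;
# only actually calling render_fancy_chart requires it.
```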
     

Many things are extensible; don’t be afraid to extend them

When we first started building this platform, the officially supported features of SiS were pretty limited, so we built our own using task+snowcli. When multipage apps weren’t yet supported natively in SiS, we made a workaround (building on the work of st-pages). When snowcli didn’t yet support some of the newest SiS features, we expanded our Taskfile so that deployment ran custom SQL commands, which turned on the newer features. When Streamlit didn’t yet have a feature we wanted, we found a workaround, some of which got published in streamlit-extras.

Because we were working with objects defined by code (SQL for native Snowflake objects + Python for Streamlit), it was often possible to use code to do things that weren’t supported out of the box.
 

Patient onboarding pays dividends

Onboarding our first external team onto the platform was a high-effort process of meetings and training sessions. While rewarding, we knew that it wouldn’t scale if usage continued to grow. We invested heavily in documentation, a README and a wiki, created a Slack channel for questions, and set up biweekly office hours. A few months later, we realized that more and more new teams were joining, often without any direct support from us. While the resources we had created helped, we also realized the bigger impact was the network effect: We had created a community of expert users who were helping to onboard new teams. We found out that others were even hosting their own training sessions on new features in the platform. Patiently onboarding those early users ended up paying huge dividends and made continued growth much more sustainable.
 

Supportive managers are crucial

None of the maintainers of this platform got hired with “Build and support a platform for building Streamlit apps” in our job offer. A mix of general software people and data people, we’re passionate about Streamlit and about building better tools to let others be successful. We’re thankful that it’s gotten to a point where most of the features are built and “just work” for most people, and that many others help onboard new users and teams. But, even with all that, creating, supporting and extending this platform has taken away lots of time from our other, more official responsibilities. Without supportive managers at Snowflake who saw our passion for this project, and shared our vision for the impact it could have, this project would have been dead in the water.


1 As our platform has grown, so has the Streamlit in Snowflake platform itself, and many of these features have become easier to use out of the box for all Snowflake customers.

2 https://xkcd.com/1172/
