Training Data Transparency

Last Updated: January 1, 2026

Snowflake Cortex is a suite of AI features made available to our customers that utilizes artificial intelligence models to derive insights from unstructured data, answer freeform questions, and provide intelligent assistance. Designed for enterprise use, Cortex supports both general-purpose and task-specific applications.

Models powering Cortex include our own proprietary models, open-source models, and proprietary models licensed from third parties. Snowflake’s training data for the models it develops generally includes publicly accessible data, licensed private or proprietary data, and synthetic data. The collection of datasets used for training varies across Snowflake’s models.

Data is cleaned, filtered, and preprocessed prior to training. Snowflake considers the details of its data processing pipeline and the precise dataset usage to be confidential and competitively sensitive information.

For models sourced from third-party model providers, users should refer to those providers’ respective websites for specific information regarding their training data.