SNOWFLAKE CERTIFIED SOLUTION

Build an End-to-End ML Workflow in Snowflake

By: Elliott Botwick

Published: Jul. 9, 2025

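# Assumption: the open-source xgboost package, since fit() below takes pandas data
from xgboost import XGBClassifier

# Baseline model with hand-picked hyperparameters
xgb_base = XGBClassifier(
    max_depth=50,
    n_estimators=3,
    learning_rate=0.75,
    booster='gbtree')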

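# Drop the timestamp, the loan identifier, and the target column from the features
X_train_pd = train_pd.drop(["TIMESTAMP", "LOAN_ID", "MORTGAGERESPONSE"], axis=1)
y_train_pd = train_pd.MORTGAGERESPONSE

# Fit the baseline model on the training data
xgb_base.fit(X_train_pd, y_train_pd)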


from sklearn.metrics import f1_score, precision_score, recall_score

# Score the baseline model on the same data it was trained on
train_preds_base = xgb_base.predict(X_train_pd)

f1_base_train = round(f1_score(y_train_pd, train_preds_base), 4)
precision_base_train = round(precision_score(y_train_pd, train_preds_base), 4)
recall_base_train = round(recall_score(y_train_pd, train_preds_base), 4)

print(f'F1: {f1_base_train} \nPrecision: {precision_base_train} \nRecall: {recall_base_train}')


Overview

This solution allows you to build and deploy a complete machine learning workflow entirely within Snowflake ML. You'll work through a mortgage lending prediction use case, implementing each stage of the ML lifecycle from feature engineering to model deployment and monitoring.

This solution showcases how to build an end-to-end ML workflow, including:

Defining and managing features with Snowflake Feature Store
Model training and hyperparameter optimization with Snowflake ML APIs
Versioning and lifecycle management with Snowflake Model Registry
Tracking performance and drift with integrated ML Observability

The video shows a demo of how you can build, deploy, serve, and monitor models in production with a set of integrated MLOps features that seamlessly work together.

Solution Architecture: End-to-end ML workflow in Snowflake


The quickstart walks through how to:

1. Use Snowflake Feature Store to track engineered features

Store feature definitions in the Feature Store for reproducible computation of ML features

2. Train two models using the Snowflake ML APIs

Baseline XGBoost
XGBoost with optimal hyperparameters identified via Snowflake ML distributed HPO methods
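
The tuned model depends on a hyperparameter search. The sketch below is not the Snowflake ML distributed HPO API; it is a plain-Python random search meant only to show the shape of the loop (sample a configuration, score it, keep the best). The search space values and the `toy_score` function are assumptions: `toy_score` stands in for "train XGBoost with these parameters and return a validation F1".

```python
import random

# Hypothetical search space. The parameter names match the baseline model's
# arguments, but the candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "max_depth": [2, 4, 6, 8, 10],
    "n_estimators": [10, 50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}


def toy_score(params):
    """Stand-in for 'train a model with params and return validation F1'.

    Assumption: a made-up function that peaks at max_depth=6,
    n_estimators=100, learning_rate=0.1. Replace with real training.
    """
    penalty = (
        abs(params["max_depth"] - 6) * 0.02
        + abs(params["n_estimators"] - 100) * 0.001
        + abs(params["learning_rate"] - 0.1) * 0.5
    )
    return round(1.0 - penalty, 4)


def random_search(space, score_fn, num_trials, seed=0):
    """Sample num_trials configurations and return (best_params, best_score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(SEARCH_SPACE, toy_score, num_trials=20)
print(best_params, best_score)
```

Because each trial is independent of the others, a loop like this is straightforward to spread across workers, which is the general idea behind a distributed search.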

3. Register both models in Snowflake Model Registry

Explore model registry capabilities such as metadata tracking, inference, and explainability
Compare model metrics on the train and test sets to identify any performance or overfitting issues
Tag the best-performing model version as the 'default' version
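
As a rough illustration of the comparison step, the sketch below flags overfitting by the gap between train and test F1 and then picks a candidate for the default version. It does not call the Model Registry. The version names, the metric values, and the `max_gap` threshold are all made-up assumptions.

```python
# Hypothetical metrics for two model versions; the numbers are made up
# for illustration and are not results from the quickstart.
METRICS = {
    "V1_BASE": {"train_f1": 0.99, "test_f1": 0.78},
    "V2_TUNED": {"train_f1": 0.88, "test_f1": 0.85},
}


def overfit_gap(metrics):
    """Train-minus-test F1: a large positive gap suggests overfitting."""
    return round(metrics["train_f1"] - metrics["test_f1"], 4)


def pick_default(all_metrics, max_gap=0.1):
    """Return the version with the best test F1 among those within max_gap.

    Falls back to the best test F1 overall if every version exceeds max_gap.
    """
    ok = {v: m for v, m in all_metrics.items() if overfit_gap(m) <= max_gap}
    pool = ok or all_metrics
    return max(pool, key=lambda v: pool[v]["test_f1"])


for version, m in METRICS.items():
    print(version, "gap:", overfit_gap(m))
print("default candidate:", pick_default(METRICS))
```

Ranking on the test score rather than the train score keeps a model that has memorized its training data, such as the hypothetical V1_BASE here, from being selected.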

4. Set up Model Monitor to track 1 year of predicted and actual loan repayments

Compute performance metrics such as F1, precision, and recall
Inspect model drift (i.e., how much the average predicted repayment rate changes from day to day)
Compare models side-by-side to understand which model should be used in production
Identify and understand data issues
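
To make the monitored quantities concrete, here is a minimal plain-Python sketch of the two calculations named above: binary precision, recall, and F1 from predicted and actual labels, and the day-over-day change in the mean prediction as a simple drift signal. It is not how the Model Monitor computes these internally, and the rows are made-up sample data.

```python
from collections import defaultdict


def precision_recall_f1(actuals, preds):
    """Binary precision, recall, and F1 with 1 as the positive class."""
    tp = sum(1 for a, p in zip(actuals, preds) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actuals, preds) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actuals, preds) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def daily_mean_prediction(rows):
    """rows: iterable of (day, prediction). Returns {day: mean prediction}."""
    by_day = defaultdict(list)
    for day, pred in rows:
        by_day[day].append(pred)
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}


def day_over_day_change(daily_means):
    """Difference between each day's mean prediction and the previous day's."""
    days = list(daily_means)
    return {
        days[i]: daily_means[days[i]] - daily_means[days[i - 1]]
        for i in range(1, len(days))
    }


# Hypothetical scored rows: (day, actual, prediction). Values are made up.
ROWS = [
    ("2024-01-01", 1, 1), ("2024-01-01", 0, 0), ("2024-01-01", 1, 0), ("2024-01-01", 0, 1),
    ("2024-01-02", 1, 1), ("2024-01-02", 1, 1), ("2024-01-02", 0, 1), ("2024-01-02", 0, 1),
]

precision, recall, f1 = precision_recall_f1([r[1] for r in ROWS], [r[2] for r in ROWS])
means = daily_mean_prediction((day, pred) for day, _, pred in ROWS)
print(precision, recall, f1)
print(means)
print(day_over_day_change(means))
```

In this sample the mean prediction jumps between the two days while the actual labels do not shift in the same way, which is the kind of change a drift chart is meant to surface.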

5. Track data and model lineage throughout

View and understand:
- The origin of the data used for computed features
- The data used for model training
- The available model versions being monitored