SNOWFLAKE CERTIFIED SOLUTION
Build an End-to-End ML Workflow in Snowflake
from xgboost import XGBClassifier

# Baseline XGBoost classifier
xgb_base = XGBClassifier(
    max_depth=50,
    n_estimators=3,
    learning_rate=0.75,
    booster='gbtree')
# Drop the timestamp, the ID column, and the target so only feature columns remain
X_train_pd = train_pd.drop(["TIMESTAMP", "LOAN_ID", "MORTGAGERESPONSE"], axis=1)
y_train_pd = train_pd.MORTGAGERESPONSE
xgb_base.fit(X_train_pd, y_train_pd)
from sklearn.metrics import f1_score, precision_score, recall_score

# Score the baseline model on the training data
train_preds_base = xgb_base.predict(X_train_pd)  # update this line with the correct data
f1_base_train = round(f1_score(y_train_pd, train_preds_base), 4)
precision_base_train = round(precision_score(y_train_pd, train_preds_base), 4)
recall_base_train = round(recall_score(y_train_pd, train_preds_base), 4)
print(f'F1: {f1_base_train} \nPrecision: {precision_base_train} \nRecall: {recall_base_train}')
Overview
This solution allows you to build and deploy a complete machine learning workflow entirely within Snowflake ML. You'll work through a mortgage lending prediction use case, implementing each stage of the ML lifecycle from feature engineering to model deployment and monitoring.
This solution showcases how to build an end-to-end ML workflow, including:
Defining and managing features with Snowflake Feature Store
Model training and hyperparameter optimization with Snowflake ML APIs
Versioning and lifecycle management with Snowflake Model Registry
Tracking performance and drift with integrated ML Observability
The video shows a demo of how you can build, deploy, serve, and monitor models in production with a set of integrated MLOps features that seamlessly work together.
Solution Architecture: End-to-end ML workflow in Snowflake
The quickstart walks through how to:
1. Use Snowflake Feature Store to track engineered features
Store feature definitions in the Feature Store so that ML features can be computed reproducibly
2. Train two models using the Snowflake ML APIs
Baseline XGBoost
XGBoost with optimal hyperparameters identified via Snowflake ML distributed HPO methods
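To make the idea of hyperparameter optimization concrete, here is a minimal local sketch of a random search. It is not the Snowflake ML distributed HPO API: the search space, the ranges, and the function names `sample_params` and `random_search` are all hypothetical, chosen only to mirror the `XGBClassifier` arguments used in the baseline.

```python
import random

# Hypothetical search space. The parameter names mirror the XGBClassifier
# arguments used in the baseline; the ranges are illustrative assumptions.
SEARCH_SPACE = {
    "max_depth": (2, 10),          # integer range, inclusive
    "n_estimators": (50, 300),     # integer range, inclusive
    "learning_rate": (0.01, 0.5),  # float range
}


def sample_params(rng):
    """Draw one candidate configuration from SEARCH_SPACE."""
    return {
        "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
        "n_estimators": rng.randint(*SEARCH_SPACE["n_estimators"]),
        "learning_rate": rng.uniform(*SEARCH_SPACE["learning_rate"]),
    }


def random_search(objective, n_trials=20, seed=0):
    """Return (best_params, best_score) after n_trials, maximizing objective."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice the `objective` callable would fit a model with the sampled parameters and return a validation score such as F1. A distributed HPO service would evaluate the trials in parallel rather than in a loop.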
3. Register both models in Snowflake Model Registry
Explore model registry capabilities such as metadata tracking, inference, and explainability
Compare model metrics on the train and test sets to identify poor model performance or overfitting
Tag the best-performing model version as the 'default' version
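The train/test comparison in this step can be sketched as a simple gap check. This is an illustrative helper, not part of the Snowflake Model Registry: the function names and the `tolerance` default are assumptions, and the metric dictionaries are expected to hold values such as the F1, precision, and recall scores computed for each model.

```python
def overfit_gap(train_metrics, test_metrics):
    """Return the train-minus-test difference for each metric present in both."""
    return {
        name: round(train_metrics[name] - test_metrics[name], 4)
        for name in train_metrics
        if name in test_metrics
    }


def flag_overfit(train_metrics, test_metrics, tolerance=0.05):
    """Return the metrics whose train score beats the test score by more than tolerance.

    The 0.05 default is an arbitrary assumption; pick a threshold that suits the use case.
    """
    gaps = overfit_gap(train_metrics, test_metrics)
    return sorted(name for name, gap in gaps.items() if gap > tolerance)
```

A large positive gap suggests the model has memorized the training data. A very deep tree ensemble, for example, may score near-perfectly on the training set and noticeably worse on held-out data.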
4. Set up Model Monitor to track 1 year of predicted and actual loan repayments
Compute performance metrics such as F1, Precision, and Recall
Inspect model drift (i.e., how much the average predicted repayment rate changes from day to day)
Compare models side-by-side to understand which model should be used in production
Identify and understand data issues
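The monitoring quantities named in this step can be written out in plain Python. This is a rough sketch of the underlying arithmetic, not the Model Monitor implementation: the function names are hypothetical, labels are assumed to be 0/1 with 1 as the positive class, and days are assumed to be ISO date strings so that sorting them gives chronological order.

```python
from collections import defaultdict


def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 for 0/1 labels, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def daily_mean_prediction(rows):
    """rows: iterable of (day, prediction) pairs. Returns {day: mean prediction}."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of predictions, count]
    for day, pred in rows:
        totals[day][0] += pred
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}


def day_over_day_change(daily_means):
    """Change in mean prediction between consecutive days, in sorted day order."""
    days = sorted(daily_means)
    return {cur: daily_means[cur] - daily_means[prev]
            for prev, cur in zip(days, days[1:])}
```

Feeding the output of `daily_mean_prediction` into `day_over_day_change` gives one value per day after the first. A sustained shift in those values is the kind of drift signal this step describes.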
5. Track data and model lineage throughout
View and understand:
The origin of the data used for computed features
The data used for model training
The available model versions being monitored
This solution was created by an in-house Snowflake expert and has been verified to work with current Snowflake instances as of the date of publication.
Solution not working as expected? Contact our team for assistance.