Unlocking Hidden Value: The Modern Data Architecture for AI-Powered Insurance Claims Insights
Overview
This guide walks through how to build a modern data architecture on Snowflake and AWS. The architecture reads open-table-format Iceberg data from AWS, combines it with Snowflake Native tables, transforms and enriches the data using Snowpark Connect for Apache Spark, and delivers AI-powered insights through Cortex Analyst and Snowflake Intelligence. It provides an end-to-end blueprint for combining open table formats, feature engineering, and natural language analytics to accelerate fraud detection and claims processing.
Prerequisites
- Snowflake account with ACCOUNTADMIN role
- AWS account with permissions to create and manage S3 buckets, IAM roles/policies, AWS Glue Data Catalog, and Lake Formation
- Cortex Code CLI (CoCo) installed and configured with a valid connection to the Snowflake account
What You'll Learn
- Snowflake Horizon Catalog — How the unified catalog now enables discovery and governance of both external Apache Iceberg tables and Snowflake Native tables from a single pane of glass, without moving data
- Snowpark Connect for Apache Spark — How to use familiar PySpark APIs to join, enrich, and score data across external Iceberg tables and internal Native tables in a single Spark job running on Snowflake compute
- Snowflake Intelligence — How to publish a Cortex Agent that lets business users ask natural language questions on the aggregated, fraud-scored claims data without writing SQL
- Cortex Code (CoCo) — How to use CoCo to interactively deploy the entire solution end-to-end, from creating Snowflake objects and running Spark jobs to setting up Cortex Analyst and Snowflake Intelligence
Solution Architecture

Architecture Overview
This solution leverages AWS and Snowflake services to create an end-to-end data pipeline for insurance claims analytics.
Components
AWS Account (US-West-2)
- Amazon S3 - Raw data storage in Apache Iceberg format
- AWS Glue Data Catalog - Metadata management for Iceberg tables
Snowflake (US-West-2)
- Catalog Linked Database - Connects to AWS Glue Catalog for accessing external Iceberg data
- Snowflake Native Tables - Persistent storage for transformed data
- Snowflake Horizon Catalog - Central governance layer
- Snowpark Connect for Apache Spark - Data transformation and feature engineering
- Snowflake Intelligence - AI-powered agent interactions
- Cortex Analyst - Natural language to SQL generation
Phase 1: Data Ingestion & External Cataloging (AWS to Snowflake)
| Step | Component | Claims Scenario |
|---|---|---|
| Claims Data Landing | Amazon S3 | Raw claims data (e.g., FNOL reports, adjuster notes, images, etc.) is stored in the data lake using Apache Iceberg table format |
| Metadata Registration | AWS Glue Data Catalog | The Glue Catalog is updated with Iceberg table metadata, acting as the "source of truth" pointer to the Iceberg data files in S3 |
| Catalog Synchronization | Catalog Linked Database | Snowflake's Catalog Linked Database connects to AWS Glue Catalog, ingesting the Iceberg metadata and creating a schema and table that points directly to the external Iceberg data in S3 |
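The Catalog Linked Database in the last row is created for real by the setup script in Step 3; as an illustration only, its definition looks roughly like the following sketch (parameter names may vary by Snowflake release, so treat the exact syntax in `assets/setup.sql` as authoritative):

```sql
-- Sketch: link a Snowflake database to the AWS Glue catalog.
-- The real DDL lives in assets/setup.sql (Step 3b).
CREATE DATABASE glue_database_linked_db
  LINKED_CATALOG = (
    CATALOG = 'glue_db_catalog_integration',  -- catalog integration from Step 2
    ALLOWED_NAMESPACES = ('insurance_claims_iceberg_glue_db')
  );
```

Once linked, the Glue database appears as a schema and its Iceberg tables become queryable without copying data.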
Phase 2: Data Transformation & Feature Engineering
| Step | Component | Claims Scenario |
|---|---|---|
| Data Processing & Feature Creation | Snowpark Connect for Apache Spark | The Spark job reads claims data from the Catalog Linked Database, performs ML-based feature engineering (e.g., NLP for claims description, embeddings, calculating loss ratios, geo-spatial analysis, etc.) and writes the enriched features to a Snowflake Native Table using the Iceberg REST API |
| Data Persistence | Snowpark Connect for Apache Spark (Write-back) | The resulting enriched and curated data is written back to a new Snowflake Native Table. This table is optimized for querying and AI/ML workloads |
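The kinds of features the Spark job derives (loss ratios, reporting lag, risk flags) can be illustrated in plain Python. This is a hedged sketch with hypothetical field names, not the actual notebook logic, which runs at scale in Spark and uses an ML model for scoring:

```python
from datetime import date

def engineer_features(claim):
    """Derive simple fraud-signal features from one raw claim record.

    `claim` uses hypothetical keys; the real Spark job computes analogous
    columns (loss ratio, reporting lag, risk flag) across whole tables.
    """
    loss_ratio = claim["claim_amount"] / claim["coverage_limit"]
    report_lag_days = (claim["report_date"] - claim["loss_date"]).days
    return {
        "claim_id": claim["claim_id"],
        "loss_ratio": round(loss_ratio, 3),
        "report_lag_days": report_lag_days,
        # Naive illustrative heuristic; the real pipeline uses a trained model
        "high_risk": loss_ratio > 0.8 or report_lag_days > 30,
    }

example = {
    "claim_id": "CLM-1001",
    "claim_amount": 45_000,
    "coverage_limit": 50_000,
    "loss_date": date(2024, 1, 2),
    "report_date": date(2024, 2, 15),
}
print(engineer_features(example))
```

A claim near its coverage limit that was reported six weeks after the loss would be flagged high-risk by this heuristic.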
Phase 3: Insight Generation & Agent Interaction
| Step | Component | Claims Scenario |
|---|---|---|
| Agent Query & Orchestration | Snowflake Intelligence, Cortex Agents | A claims analyst, without writing SQL, asks a natural language question (e.g., "Show me the top 5 high-risk claims filed this week"). The Cortex Agent receives the request, understands the context, and routes the question to Cortex Analyst |
| SQL Generation & Execution | Cortex Analyst | The Cortex Analyst component receives a user question, translates it into an optimized Snowflake SQL query, and executes it against the Snowflake Native Tables |
| Insight Delivery | Snowflake Intelligence | The end-to-end flow delivers instant answers to complex analytical questions, allowing users to act quickly by converting answers into summary alerts or remediation actions |
Cross-Cutting Layer: Governance
The Snowflake Horizon Catalog acts as the central governance layer throughout this entire process, ensuring every user and component (from Snowpark job to Snowflake Intelligence) only accesses data according to defined security policies.
Solution Deployment
Step 1: Configure AWS Infrastructure (S3, Glue, IAM, Lake Formation)
S3 Bucket for Iceberg Data Files
- Navigate to AWS Console → S3 → Create bucket
- Enter the bucket name: `<account-id>-us-west-2-insurance-claims-iceberg-data`
- Select the AWS Region: `us-west-2` (must match your Glue catalog region)
- Leave Block all public access enabled (default)
- Enable Bucket Versioning (recommended for Iceberg table consistency)
- Click Create bucket
- Once created, create the following folder structure inside the bucket:
  - `fnol_reports/` — First Notice of Loss reports
  - `adjuster_notes/` — Adjuster notes and assessments
  - `initial_estimates/` — Initial claim estimates
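If you prefer the CLI to the console, a sketch of the equivalent AWS CLI commands follows (substitute your real 12-digit account ID; the variable names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: create the Iceberg data bucket and folder prefixes via the AWS CLI.
# Substitute your account ID; the region must match your Glue catalog region.
ACCOUNT_ID=123456789012
BUCKET="${ACCOUNT_ID}-us-west-2-insurance-claims-iceberg-data"

aws s3 mb "s3://${BUCKET}" --region us-west-2
aws s3api put-bucket-versioning --bucket "${BUCKET}" \
  --versioning-configuration Status=Enabled

# S3 has no real folders; zero-byte keys ending in "/" act as placeholders
for prefix in fnol_reports adjuster_notes initial_estimates; do
  aws s3api put-object --bucket "${BUCKET}" --key "${prefix}/"
done
```

Block-public-access is on by default for new buckets, so no extra command is needed for that setting.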

AWS Glue Data Catalog
- Verify that the Glue Data Catalog is enabled in your region (enabled by default in most regions)
- Navigate to AWS Console → Glue → Data Catalog → Databases
- Create a Glue Database
- Navigate to AWS Console → Glue → Databases → Add database
- Database name: `insurance_claims_iceberg_glue_db`
- Location: `s3://<account-id>-us-west-2-insurance-claims-iceberg-data/`
- Click Create database

IAM Role with Glue/Lake Formation Permissions
Create an IAM Policy
- Navigate to AWS Console → IAM → Policies → Create policy
- Select the JSON tab and paste the following policy. Update the `account-id` placeholder before saving the policy.
- Policy name: `insurance-claims-iceberg-data-policy`

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueAndS3ScopedAccess",
      "Effect": "Allow",
      "Action": [
        "glue:CreateTable",
        "glue:DeleteDatabase",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetDatabase",
        "glue:GetPartition",
        "glue:GetCatalogs",
        "glue:CreateDatabase",
        "glue:GetCatalog",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:glue:*:*:table/*/*",
        "arn:aws:glue:*:*:catalog",
        "arn:aws:glue:*:*:database/insurance_claims_iceberg_glue_db",
        "arn:aws:s3:::<account-id>-us-west-2-insurance-claims-iceberg-data"
      ]
    },
    {
      "Sid": "GlueAndS3GlobalAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetTables",
        "glue:GetCatalog",
        "glue:GetTable",
        "s3:ListAllMyBuckets",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:GetBucketCors",
        "s3:GetBucketVersioning",
        "s3:GetBucketAcl",
        "s3:GetBucketNotification",
        "s3:GetBucketPolicy",
        "lakeformation:GetDataAccess"
      ],
      "Resource": "*"
    },
    {
      "Sid": "S3DataAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets"
      ],
      "Resource": [
        "arn:aws:s3:::<account-id>-us-west-2-insurance-claims-iceberg-data",
        "arn:aws:s3:::<account-id>-us-west-2-insurance-claims-iceberg-data/*"
      ]
    }
  ]
}
```
Create an IAM Role
- Navigate to AWS Console → IAM → Roles → Create role
- Trusted entity type: AWS account
- Select Another AWS account (enter a placeholder account ID for now; this will be updated after the Snowflake catalog integration is created)
- Role name: `insurance-claims-iceberg-data-role`
- Attach the `insurance-claims-iceberg-data-policy` created above
- Click Create role
Lake Formation Setup
Note: This section uses `ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS` in your Snowflake catalog integration. Vended credentials allow Snowflake to obtain temporary, scoped credentials from Lake Formation to access S3 data directly.
Enable Lake Formation
- Navigate to AWS Console → Lake Formation → Get started
- Accept the default settings to initialize Lake Formation in your account

Register the S3 Data Lake Location
- Navigate to Lake Formation → Data lake locations → Register location
- S3 path: `s3://<account-id>-us-west-2-insurance-claims-iceberg-data/`
- IAM role: Select `insurance-claims-iceberg-data-role` (created in the previous step)
- Permission mode: Select Lake Formation
- Click Register location

Grant Data Permissions to the IAM Role
- Navigate to Lake Formation → Data permissions → Grant
- Principal type: IAM users and roles
- IAM role: `insurance-claims-iceberg-data-role`
- Database: `insurance_claims_iceberg_glue_db`
- Table: All tables
- Table permissions: Select, Describe
- Click Grant

Register the S3 Data Location
- Navigate to Lake Formation → Data locations → Grant
- Storage locations: `s3://<account-id>-us-west-2-insurance-claims-iceberg-data/`
- IAM users and roles: Select `insurance-claims-iceberg-data-role` (created in the previous step)
- Registered account location should be the AWS Account ID
- Check Grantable
- Click Grant

Verify Lake Formation Permissions
- Navigate to Lake Formation → Data permissions
- Confirm that `insurance-claims-iceberg-data-role` has all the granted permissions on all tables in `insurance_claims_iceberg_glue_db`
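The console grant above can also be scripted. The following is a sketch of the equivalent database-wide grant via the Lake Formation CLI (substitute your real account ID):

```shell
#!/usr/bin/env bash
# Sketch: grant SELECT/DESCRIBE on all tables in the Glue database to the
# IAM role via the Lake Formation CLI (substitute your AWS account ID).
ACCOUNT_ID=123456789012
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/insurance-claims-iceberg-data-role"

aws lakeformation grant-permissions \
  --principal "DataLakePrincipalIdentifier=${ROLE_ARN}" \
  --permissions SELECT DESCRIBE \
  --resource '{"Table":{"DatabaseName":"insurance_claims_iceberg_glue_db","TableWildcard":{}}}'
```

The `TableWildcard` element corresponds to the "All tables" selection in the console.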
Step 2: Connect Snowflake to AWS Glue via Catalog Integration
Create the Catalog Integration

Run the following SQL in Snowflake (via Cortex Code CLI, Snowsight, or SnowSQL). Replace the placeholder values with your AWS account details:

```sql
CREATE OR REPLACE CATALOG INTEGRATION glue_db_catalog_integration
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.<region>.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '<12_digit_aws_account_id>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<account_id>:role/insurance-claims-iceberg-data-role'
    SIGV4_SIGNING_REGION = '<region>'
  )
  ENABLED = TRUE;
```
Retrieve Snowflake's IAM User ARN and External ID

Run the following to get the values needed for the AWS trust policy:

```sql
DESCRIBE CATALOG INTEGRATION glue_db_catalog_integration;
```

Note the values for:
- IAM_USER_ARN — e.g., `arn:aws:iam::<aws-account-id>:user/gqx31000-s`
- EXTERNAL_ID — e.g., `IWB00000_SFCRole=7_Hz7Ebsyt1q+08P9P7XXXXXXX=`
Update the AWS IAM Trust Policy

Go back to AWS Console → IAM → Roles → `insurance-claims-iceberg-data-role` → Trust relationships → Edit trust policy and replace the `<SNOWFLAKE_IAM_USER_ARN>` and `<SNOWFLAKE_EXTERNAL_ID>` placeholders with the values from the previous step.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SnowflakeAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "<SNOWFLAKE_IAM_USER_ARN>" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<SNOWFLAKE_EXTERNAL_ID>" }
      }
    },
    {
      "Sid": "GlueAndLakeFormationAccess",
      "Effect": "Allow",
      "Principal": {
        "Service": [ "lakeformation.amazonaws.com", "glue.amazonaws.com" ]
      },
      "Action": [ "sts:AssumeRole", "sts:SetContext" ]
    }
  ]
}
```
Verify the Integration

```sql
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_db_catalog_integration');
SELECT SYSTEM$LIST_NAMESPACES_FROM_CATALOG('glue_db_catalog_integration');
```

- The first command should return a success status confirming Snowflake can connect to AWS Glue
- The second command should list the `insurance_claims_iceberg_glue_db` namespace
Step 3: Provision Iceberg Tables & Snowflake Objects (Database, Schema, Stages, Tables)
Clone the git repository to your local machine and navigate to the folder `site/sfguides/src/modern-data-architecture-for-ai-powered-insurance-claims-insights/assets/`.

All SQL commands for this step are in `assets/setup.sql`. Run the following prompt using the Cortex Code CLI (replace the AWS account ID with yours):

```
Run the setup.sql script by replacing the <account-id> with <01234554634> and access the data to upload into the stage from the data folder in this path
```

The setup script performs the following sub-steps in order:
| Sub-step | What it does |
|---|---|
| 3a | Creates the INSURANCE_CLAIMS_WH warehouse, INSURANCE_CLAIMS_INSIGHTS_DB database, CLAIMS_ANALYTICS schema, and two stages (DATA_STAGE, ML_MODELS) |
| 3b | Creates the Catalog Linked Database (glue_database_linked_db) that connects to AWS Glue via the catalog integration from Step 2 |
| 3c | Uploads sample CSV/Parquet data files and the fraud detection model to the Snowflake stages using PUT commands |
| 3d | Creates three Iceberg tables (fnol_reports, adjuster_notes, initial_estimates) in the linked database with S3 base locations and auto-refresh enabled |
| 3e | Loads the staged Parquet files into the Iceberg tables using COPY INTO with column name matching |
| 3f | Creates two Snowflake native tables (FRAUD_FLAGS, POLICY_DETAILS) in the CLAIMS_ANALYTICS schema |
| 3g | Loads the staged CSV files into the native tables |
| 3h | Runs verification queries to confirm row counts across all five tables |
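As an illustration of sub-step 3d, an Iceberg table definition in the linked database looks roughly like the sketch below. The column list is abbreviated and illustrative; the authoritative DDL for all three tables is in `assets/setup.sql`:

```sql
-- Sketch of one of the three Iceberg tables created by setup.sql (sub-step 3d).
-- Columns abbreviated; see assets/setup.sql for the full definition.
CREATE ICEBERG TABLE glue_database_linked_db.insurance_claims_iceberg_glue_db.fnol_reports (
  claim_id STRING,
  loss_date DATE,
  description STRING
)
BASE_LOCATION = 'fnol_reports/'
AUTO_REFRESH = TRUE;
```

With `AUTO_REFRESH = TRUE`, Snowflake picks up new Iceberg snapshots from Glue without manual refresh commands.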
Step 4: Run Feature Engineering with Snowpark Connect for Apache Spark
Create a Notebook in Snowsight
- Navigate to Snowsight → Notebooks → Create Notebook
- Upload the notebook from the cloned repository, located in the assets folder
- Also upload the `fraud_model.py` script to the notebook environment
Install the Snowpark Connect package
- In the notebook, install the `snowpark-connect` package from the Packages panel
Run the Feature Engineering notebook
- Run through the `Insurance-Claims-Feature-Engineering.ipynb` notebook from the Snowsight UI
- This notebook reads from both Iceberg tables (via the Catalog Linked Database) and Native tables, performs feature engineering and fraud scoring, and writes the enriched output to the `CLAIMS_PROCESSED_FEATURES` table
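The notebook's core pattern, reading external Iceberg and native tables in one Spark job and writing the result back, can be sketched as follows. This is a hedged outline, not the notebook itself: it assumes `spark` is the SparkSession that Snowpark Connect provides in the notebook, and the join/column names (`policy_id`, `claim_amount`, `coverage_limit`) are illustrative.

```python
# Sketch: join external Iceberg data with a Snowflake native table using
# standard PySpark APIs, then persist the enriched output. Assumes `spark`
# is the Snowpark Connect session; column names are illustrative.
fnol = spark.read.table(
    "glue_database_linked_db.insurance_claims_iceberg_glue_db.fnol_reports"
)
policies = spark.read.table(
    "INSURANCE_CLAIMS_INSIGHTS_DB.CLAIMS_ANALYTICS.POLICY_DETAILS"
)

enriched = (
    fnol.join(policies, on="policy_id", how="left")
        .withColumn("loss_ratio",
                    fnol["claim_amount"] / policies["coverage_limit"])
)

# Write the enriched features back as a Snowflake native table
enriched.write.mode("overwrite").saveAsTable(
    "INSURANCE_CLAIMS_INSIGHTS_DB.CLAIMS_ANALYTICS.CLAIMS_PROCESSED_FEATURES"
)
```

Because the job runs on Snowflake compute, no data leaves the platform boundary during the join.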
Step 5: Enable Cortex Analyst, Cortex Agent, and Snowflake Intelligence
Run the `cortex_analyst_snowflake_intelligence_setup.sql` script using the CoCo CLI with the following prompt:

```
Run the cortex_analyst_snowflake_intelligence_setup.sql script to create the semantic view by uploading the claims_semantic_view.yaml from assets folder, setup Cortex Analyst, Cortex Agent and enable Snowflake Intelligence
```
This runs through the following steps:
- Upload the semantic model YAML to `SEMANTIC_MODEL_STAGE`
- Create the Semantic View using `SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML`
- Create the Snowflake Intelligence database and schema (`SNOWFLAKE_INTELLIGENCE.AGENTS`)
- Create the Cortex Agent with the `cortex_analyst_text_to_sql` tool pointing to the semantic view
- Grant access permissions to the agent, semantic view, and underlying table
Step 6: Query Claims Data with Snowflake Intelligence
- Verify by navigating to Snowsight → AI & ML → Snowflake Intelligence and selecting the "Insurance Claims Analyst" agent
- Ask Questions
- "Show me all high risk fraud claims"
- "What is the total claim amount by loss type?"
- "How many claims are under SIU investigation?"

This content is provided as-is and is not maintained on an ongoing basis. It may be out of date with current Snowflake instances.