Pricing Transparency File Ingestion Natively on Snowflake

A demonstration of automated ingestion of pricing transparency data from various hospital systems natively into Snowflake, without the need for external tools or processes.

CMS Pricing Transparency

Proposed federal regulations in the U.S. would require group health plans and health insurance issuers in the individual and group markets to disclose cost-sharing information to participants, beneficiaries, or enrollees (or their authorized representatives) upon request. As proposed, the regulations would also require these entities to provide an estimate of each individual’s cost-sharing liability for covered items or services furnished by a particular provider.

Solution Overview

Use Case

Proposed requirements from the Centers for Medicare & Medicaid Services (CMS) would require group health plans and health insurance issuers in the individual and group markets to disclose cost-sharing information in files hosted on an HTTPS website, in one of the approved formats (JSON/XML/YAML). Health plan providers must update these files once a month. The pricing files are often large and distributed as compressed JSON, which makes them challenging to process.

Solution

Using Snowpark, read large pricing files dynamically in chunks, process them, and ingest the data into Snowflake without having to upload the entire file into Snowflake’s internal stage or load it into memory for processing.
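The chunked-read idea can be illustrated with a minimal standard-library sketch. It assumes a simplified file layout (a gzip-compressed top-level JSON array of objects) and a hypothetical helper name, iter_records; real pricing files are typically more deeply nested, and the binary stream here stands in for whatever file handle the platform provides. Only one chunk plus one partially read record is held in memory at a time.

```python
import gzip
import io
import json
from typing import BinaryIO, Iterator


def iter_records(raw: BinaryIO, chunk_chars: int = 65536) -> Iterator[dict]:
    """Yield objects from a gzip-compressed top-level JSON array, chunk by chunk.

    Hypothetical helper: assumes every array element is a JSON object.
    """
    decoder = json.JSONDecoder()
    # Decompress and decode lazily; nothing is read until read() is called.
    reader = io.TextIOWrapper(gzip.GzipFile(fileobj=raw), encoding="utf-8")

    buffer = reader.read(chunk_chars).lstrip()
    if not buffer.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buffer = buffer[1:]

    while True:
        # Drop whitespace and the commas that separate array elements.
        buffer = buffer.lstrip(" \t\r\n,")
        if buffer.startswith("]"):
            return  # end of the array
        try:
            record, end = decoder.raw_decode(buffer)
        except json.JSONDecodeError:
            # The buffer holds an incomplete record: pull in the next chunk.
            chunk = reader.read(chunk_chars)
            if not chunk:
                raise  # input is truncated or malformed
            buffer += chunk
            continue
        if not isinstance(record, dict):
            raise ValueError("this sketch only supports arrays of objects")
        yield record
        buffer = buffer[end:]
```

Each yielded record can be transformed and written out immediately, so memory use depends on the chunk size and the largest single record rather than on the file size. Retrying the decode after every chunk is simple but re-parses partial records; a dedicated streaming JSON parser would be a better fit for very large records.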

Solution Architecture

The pricing transparency JSON file is hosted in a cloud storage bucket and referenced through an external stage on Snowflake. A custom Snowpark Python stored procedure builds a directed acyclic graph (DAG) of interconnected tasks that run in parallel. Each task (Snowpark Python code) reads specific segments of the JSON file using Snowflake’s dynamic file access capability, which streams data from large files without loading the entire file into memory. The segments are then stored as smaller Parquet files on cloud storage, referenced as external tables on Snowflake, and combined into structured data entities that hold pricing transparency data for analytics. Each task records audit information indicating success or failure for traceability.
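The fan-out-and-audit pattern described above can be sketched locally. This is only an analogy: Snowflake tasks are defined and scheduled inside Snowflake, whereas this sketch uses a thread pool from the Python standard library. The names AuditRecord and run_segment_tasks are hypothetical, and each task is assumed to be a callable that processes one file segment and returns the number of rows it wrote.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AuditRecord:
    """One audit row per segment task (hypothetical schema)."""
    task_name: str
    status: str          # "SUCCESS" or "FAILED"
    rows_written: int
    error: str = ""


def run_segment_tasks(
    tasks: Dict[str, Callable[[], int]], max_workers: int = 4
) -> List[AuditRecord]:
    """Run independent segment tasks in parallel and audit each outcome."""

    def run_one(name: str, task: Callable[[], int]) -> AuditRecord:
        try:
            return AuditRecord(name, "SUCCESS", task())
        except Exception as exc:
            # A failing segment is recorded but does not stop the others.
            return AuditRecord(name, "FAILED", 0, repr(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, name, task) for name, task in tasks.items()]
        # Results come back in submission order, one record per task.
        return [future.result() for future in futures]
```

Recording failures instead of raising them means one bad segment leaves the remaining segments loaded and leaves a trace that can drive a targeted retry.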

Technical Detail

Automate ingestion of pricing transparency data from various hospital systems natively into Snowflake, without the need for external tools or processes.

Note: Dynamic File Access is currently in private preview

Benefits

Simplified ingestion of complex pricing transparency data files

Scalability and Performance

Snowflake’s scalable platform, together with Snowpark, simplifies ingesting and parsing the very large JSON files in which payers and providers submit healthcare pricing transparency data.

Transparency of Healthcare Costs

Use healthcare pricing transparency data to gain competitive pricing insights, provide pricing transparency to patients, design better provider networks, and improve the patient experience.