Snowflake Data Pipelines

Businesses work with massive amounts of data today, and analyzing all of it requires a single view into the entire data set. The challenge is that data resides in multiple systems and services, yet it must be combined in ways that make sense for deep analysis. Data flow itself can be especially unreliable: as data moves from one system to another, there are many points where corruption can occur or bottlenecks can introduce latency. As the breadth and scope of the role data plays increase, these problems only grow in scale and impact.

That is why data pipelines are critical. They eliminate many manual steps from the process, enabling a smooth, automated flow of data from one step to another. Data pipelines are important for real-time analytics to help organizations make faster, data-driven decisions. They’re particularly important for organizations that:

  • Rely on real-time data analysis
  • Store data in the cloud
  • House data in multiple sources

To further advance Snowflake’s focus on data pipelines, we released a public preview of Auto-Ingest, Streams and Tasks, and the Snowflake Connector for Kafka, features that provide customers with continuous, automated, and cost-effective services for loading data efficiently and without manual effort.

New Enhancements to Snowflake Data Pipelines:

  • Auto-Ingest
AWS and Azure provide notification mechanisms that fire whenever an object is created. Auto-Ingest layers these mechanisms over the Snowpipe ingest service so that it can automatically detect files created under a stage, retrieve them, and ingest them into the appropriate tables. This reduces query latency because data is ingested and transformed as it arrives.
  • Streams and Tasks
The Streams and Tasks feature is fundamental to building end-to-end data pipelines and orchestration in Snowflake. While customers can use Snowpipe or their ELT provider of choice, that approach is limited to loading data into Snowflake. Streams and Tasks provides a task scheduling mechanism so customers no longer have to resort to external schedulers for their most common Snowflake SQL jobs. The feature also lets customers connect staging tables to downstream target tables with regularly executed logic that picks up new data from the staging table and transforms it into the shape required by the target table.
  • Snowflake Connector for Kafka
    Apache Kafka is a platform for building pipelines to handle continuous streams of records, and this connector makes it fast and easy to reliably publish these records to your Snowflake instance for storage and analysis.

The Snowflake Connector for Kafka is available via the Maven Central Repository. After you install the connector to a Kafka Connect cluster, instances of the connector can be instantiated via a simple JSON configuration or via the Confluent Control Center. After you configure the connector for a set of topics, it creates and manages stages, pipes, and files on the user’s behalf to reliably ingest messages into Snowflake tables.

There is no additional charge for the use of the Snowflake Connector for Kafka, which is freely available under an Apache 2.0 license. The connector makes use of tables, stages, files, and pipes, which are all charged at normal rates.

When we made this set of features available to a select few customers for private preview, those customers saw tremendous benefit. Leaders at Blackboard, Inc. said, “Snowpipe and Streams and Tasks enabled us to build an ingestion platform for most of our data pipelines hydrating our data lake. These pipelines serve over a thousand clients/sites with hundreds of tables per site and growing, resulting in a significant reduction of our infrastructure management and costs, and a streamlined architecture with less complexity and fewer handoff points.”

How can you get started?

If you have files regularly created in a blob store such as Amazon S3 or Microsoft Azure Blob Storage, you can create a Snowpipe with the Auto-Ingest option and specify the appropriate prefix for the files you want Snowpipe to ingest. Once the pipe is created, configure the corresponding blob-creation notifications, via Amazon SQS or Microsoft Azure Event Grid, to go to the pipe. Once the notifications are connected, the pipe automatically ingests newly created files matching the specified prefix, as sketched below.
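A minimal sketch of that flow in Snowflake SQL, assuming an existing external stage named raw_stage that points at the bucket and a target table raw_events (both hypothetical names):

    -- Create a pipe that auto-ingests new files arriving under the stage prefix.
    -- AUTO_INGEST = TRUE tells Snowpipe to listen for blob-creation notifications.
    CREATE OR REPLACE PIPE raw_events_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO raw_events
      FROM @raw_stage/events/
      FILE_FORMAT = (TYPE = 'JSON');

    -- On AWS, SHOW PIPES returns the notification_channel (an SQS queue ARN)
    -- that the bucket's event notifications should be pointed at.
    SHOW PIPES LIKE 'raw_events_pipe';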

Table Streams can be used independently of Snowpipe. You can create a stream at any time on any table and start consuming it using tasks or any other scheduled activity. A stream can be queried and used in DML statements just like a view, for example:
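A short sketch, again with hypothetical table and stream names:

    -- Track changes (inserts, updates, deletes) on a staging table.
    CREATE OR REPLACE STREAM staging_events_stream ON TABLE staging_events;

    -- Query the stream like a view; it returns only the rows that changed since
    -- the stream was last consumed, plus METADATA$ columns describing each change.
    SELECT * FROM staging_events_stream;

    -- Consuming the stream in a DML statement advances its offset.
    INSERT INTO curated_events (id, payload, ingested_at)
      SELECT id, payload, ingested_at
      FROM staging_events_stream
      WHERE METADATA$ACTION = 'INSERT';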

Tasks can also be used independently of Snowpipe and Table Streams. You can specify a schedule of one minute or longer; once you resume the task, it starts running on that schedule. Streams and tasks can also be combined into continuous data pipelines that run periodically based on changes in a table, as in the sketch below.
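A sketch that combines the two, reusing the hypothetical stream above; the warehouse name and MERGE logic are placeholders, and the optional WHEN clause (which skips runs while the stream is empty) can be omitted for a purely schedule-driven task:

    -- Run every minute on the specified warehouse; SYSTEM$STREAM_HAS_DATA skips
    -- the run when the stream has no new rows.
    CREATE OR REPLACE TASK merge_events_task
      WAREHOUSE = transform_wh
      SCHEDULE = '1 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('staging_events_stream')
    AS
      MERGE INTO curated_events t
      USING staging_events_stream s
        ON t.id = s.id
      WHEN MATCHED THEN UPDATE SET t.payload = s.payload
      WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (s.id, s.payload);

    -- Tasks are created suspended; resume to start the schedule.
    ALTER TASK merge_events_task RESUME;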

Read more about Snowflake Data Pipelines.
