Snowflake as Your Modern Data Lake, or even Data Ocean

Author: Rob Smoot

Nearly 10 years have passed since the term “data lake” first emerged. The concept was compelling – a single repository for all your raw unstructured, semi-structured, and structured data. Many organizations rushed to build their data lakes since building was the only practical option. Unfortunately, delivering data-driven insights from all that raw data has proved to be difficult. Platform teams tried a number of hopeful remedies that ultimately failed to deliver. Fast-forward to today. The demand for a data lake remains strong, although many projects are simply not providing the promised value. The approach must change to deliver on that original, albeit elusive, benefit of democratizing data analytics and efficiently deriving maximum value from a ton of data.

Your Modern Data Lake in Snowflake

Snowflake’s unique, cloud-built, multi-cluster shared data architecture makes the dream of the modern data lake a reality. We see a large number of our more than 2,000 customers implement Snowflake as the single source of their analytics data.  

Longtime Snowflake customer and web properties owner IAC Publishing Labs loads all of its semi-structured web data into Snowflake, without modeling, and manipulates that data later. Snowflake also enables organizations to easily collect and combine data from multiple sources. Online retailers such as Rue La La attain more complete views of their customers with Snowflake to create relevant, timely and consistent campaigns and offers to address specific customer needs. Other customers, such as Square and Sony, use Snowflake for their data lake. Both companies continue to find ways to derive value from data, far beyond the limitations of their previous platforms.

The data lake use case for Snowflake is possible thanks to the vision of our founders to create a platform that enables native (no transformation required) data loading and analytics on a mixture of data formats. And since Snowflake is cloud-built, it scales instantly and infinitely to handle any amount of data and compute.  
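As a rough illustration of loading semi-structured data without modeling it first, the Python sketch below only assembles Snowflake SQL text; it does not connect to any account. The table, stage, and attribute names (raw_web_events, web_stage, page.url) are hypothetical, and the statements assume JSON files have already been placed in a stage.

```python
from typing import List


def load_raw_json_statements(table: str, stage: str) -> List[str]:
    """Build SQL that lands raw JSON in one VARIANT column, with no upfront modeling."""
    return [
        # A single VARIANT column holds each JSON document as-is.
        f"CREATE TABLE IF NOT EXISTS {table} (v VARIANT)",
        # Bulk-load staged JSON files into that column.
        f"COPY INTO {table} FROM @{stage} FILE_FORMAT = (TYPE = 'JSON')",
    ]


def query_json_path(table: str, path: str, cast: str = "STRING") -> str:
    """Build a query that extracts one JSON attribute at query time via path notation."""
    return f"SELECT v:{path}::{cast} AS attr FROM {table}"


if __name__ == "__main__":
    # Hypothetical names, chosen only for illustration.
    for stmt in load_raw_json_statements("raw_web_events", "web_stage"):
        print(stmt + ";")
    print(query_json_path("raw_web_events", "page.url") + ";")
```

The structure is applied only when the data is read, which is what allows the data to be loaded first and shaped later.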

What makes Snowflake ideal for your modern, cloud-built data lake?

  • Your data – no silos – Easily and natively ingest petabytes of structured and semi-structured data (JSON, CSV, tables, Parquet, ORC, etc.) within the same platform.
  • Instant elasticity – You can supply any amount of compute resources, within Snowflake sizes, to any user or workload. Dynamically change the size of a compute engine without impacting running queries, or set up Snowflake compute engines to scale out automatically during periods of heavy concurrency.
  • All your users – You can deploy an unlimited number of users and as many workloads as you want on a single copy of your data without impacting performance.
  • Governance – Prevent your data lake from becoming a data swamp with the structure and control you expect from a modern cloud data lake.
  • Data storage at cost – You pay only the baseline price charged by Snowflake’s cloud storage providers: AWS S3, Microsoft Azure, and Google Cloud Platform (GCP). You pay for compute only when you are loading or querying data.
  • Transactional consistency – Move around and combine data with confidence. Snowflake assures consistency for multi-statement transactions and cross-database joins.
  • Managed service – With built-in provisioning, data protection, security, performance tuning and more, you can focus on high-value endeavors.
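The elasticity point in the list above can be sketched in the same way. The Python below only builds the ALTER WAREHOUSE statements that would resize a compute engine or let it scale out across clusters. The warehouse name analytics_wh is hypothetical, the size list is a partial one used for validation in this example, and the availability of multi-cluster settings depends on the Snowflake edition in use.

```python
# Partial list of warehouse sizes, used here only to validate input.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def resize_warehouse(name: str, size: str) -> str:
    """Build the statement that changes the size of a warehouse."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size}")
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"


def autoscale_warehouse(name: str, min_clusters: int, max_clusters: int) -> str:
    """Build the statement that sets the cluster range for scaling out under concurrency."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    return (
        f"ALTER WAREHOUSE {name} SET "
        f"MIN_CLUSTER_COUNT = {min_clusters} MAX_CLUSTER_COUNT = {max_clusters}"
    )


if __name__ == "__main__":
    # Hypothetical warehouse name.
    print(resize_warehouse("analytics_wh", "large") + ";")
    print(autoscale_warehouse("analytics_wh", 1, 4) + ";")
```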

Snowflake provides the convenience, unlimited storage capacity, cloud-scaling and low-cost storage pricing you need for a data lake, along with the control, security, and performance you require for a data warehouse. Snowflake isn’t a cloud data warehouse designed with yesteryear’s on-premises technology.

But since the concept of the data lake is nearly 10 years old, you may be thinking more globally. Data spans multiple business units, multiple ecosystems, regions, countries and multiple cloud providers. How will you break these barriers and manage a single environment?  

Global Snowflake Turns Your Data Lake into a Data Ocean

Do you want a data lake strategy that includes your data no matter where it resides? Snowflake’s most recent innovations, including Snowflake Database Replication (Q3 2019), enable Snowflake customers to replicate databases and keep them synchronized across multiple accounts in different regions and/or cloud providers. This ensures business continuity in case of a regional or cloud-provider outage. It also allows the portability customers require if they want to move to a different region or cloud. Another benefit is the ability to easily and securely share data across regions and clouds.
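A minimal sketch of what setting up database replication between two accounts might look like follows. The Python only assembles the SQL text, grouped by the account that would run it. The database and account identifiers (sales_db, myorg.us_account, myorg.eu_account) are placeholders, and the exact identifier format depends on how the accounts are set up.

```python
from typing import Dict, List


def replication_statements(
    database: str, source_account: str, target_account: str
) -> Dict[str, List[str]]:
    """Build the statements for replicating one database from a source to a target account."""
    return {
        # Run in the source account: allow the database to be replicated.
        "on_source": [
            f"ALTER DATABASE {database} ENABLE REPLICATION TO ACCOUNTS {target_account}",
        ],
        # Run in the target account: create the replica, then refresh it.
        "on_target": [
            f"CREATE DATABASE {database} AS REPLICA OF {source_account}.{database}",
            f"ALTER DATABASE {database} REFRESH",
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, chosen only for illustration.
    plan = replication_statements("sales_db", "myorg.us_account", "myorg.eu_account")
    for where, stmts in plan.items():
        for stmt in stmts:
            print(f"[{where}] {stmt};")
```

Repeating the refresh step on a schedule is what keeps the replica synchronized with the source.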

Data lake strategies of today fail because of immense complexity and expense, the lack of security and governance, and the creation of data silos. With Snowflake’s long-term global strategy and a single environment, you not only gain better control, but you also expand the reach of your data lake around the world, turning your data lake into a data ocean. As we look out into the future, this is the direction and the new reality of many Snowflake customers.

Local or Global, the Dream of a Cloud Data Lake is Now Reality

When the idea of the data lake first emerged, it caused as much hope as it did headache. That hope hasn’t abated. Today’s demand for a modern cloud data lake is more prevalent than ever. But that demand must be addressed. And Snowflake as a data lake is now a reality thanks to our technology and a growing number of our customers who chose us for that very use case. Many of them, along with our prospects, are considering a data ocean built on the inherent and most recent Snowflake product features that enable both the data lake and the data ocean.

Sign up for a 30-day free Snowflake trial today. You can also reach out to us to get connected with other Snowflake customers and discover how Snowflake can help change the way you approach your data lake strategy.
