What is a Data Lake and How to Build One with a Cloud Data Warehouse
What is a Data Lake?
A data lake is a repository for structured, unstructured, and semi-structured data. The main purpose of data lakes is to provide full and direct access to raw (unfiltered) organizational data as an alternative to storing varying and sometimes limited data sets in scattered, disparate data silos.
Rethink your Data Lake with a Cloud Data Warehouse
Over the last 10 years, the prevailing notion has been that you need a Hadoop platform to gain insights quickly and cost-effectively from a variety of data sources. The promise of Hadoop-based data processing is a single repository (a data lake) with the flexibility, capacity, and performance to store and analyze an array of data types.
In reality, analyzing data with a Hadoop-based platform is not simple. Hadoop platforms start you with HDFS, or an equivalent file system. You then need to string together about a half-dozen software packages just to provide basic enterprise-level functionality, such as provisioning, security, system management, data protection, database management, and the interface needed to explore and query data.
Compared with implementing and managing Hadoop or a traditional on-premises data warehouse, data warehousing built for the cloud can deliver a multitude of unique benefits.
The Simpler Alternative
Snowflake, which is built for the cloud and delivered as a service, provides you with a different option for handling JSON and other semi-structured data. Together with a cloud-built data warehouse, a data lake can offer a wealth of insight with very little overhead.
Snowflake allows users to securely and cost-effectively store any volume of data and to process semi-structured and structured data together. A standard SQL interface makes it easier to discover value hidden within the data lake and to quickly deliver data-driven insights to all your business users. It offers the benefits that organizations are seeking in data lake projects, but without sacrificing ease of use and fast analytics.
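To make the idea of querying semi-structured and structured data together through plain SQL more concrete, here is a minimal sketch in Python. It does not use Snowflake at all: the standard-library sqlite3 module stands in for the warehouse, and the event records, customer table, and function names are all hypothetical. The JSON is flattened by hand before loading, a step that a warehouse with native semi-structured support would be expected to make unnecessary.

```python
import json
import sqlite3

# Hypothetical raw events as they might land in a data lake: JSON lines
# whose fields vary from record to record (semi-structured data).
RAW_EVENTS = [
    '{"customer_id": 1, "event": "purchase", "details": {"amount": 40.0}}',
    '{"customer_id": 2, "event": "purchase", "details": {"amount": 15.5, "coupon": "SPRING"}}',
    '{"customer_id": 1, "event": "page_view"}',
    '{"customer_id": 1, "event": "purchase", "details": {"amount": 10.0}}',
]

# Hypothetical structured reference data.
CUSTOMERS = [(1, "Acme Corp"), (2, "Globex")]


def load(conn):
    """Create both tables and load the structured and semi-structured data."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE events (customer_id INTEGER, event TEXT, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
    for line in RAW_EVENTS:
        record = json.loads(line)
        # Optional nested field: missing values become NULL in the table.
        amount = record.get("details", {}).get("amount")
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (record["customer_id"], record["event"], amount),
        )


def revenue_by_customer(conn):
    """Join the flattened events to the customer table with standard SQL."""
    return conn.execute(
        """
        SELECT c.name, SUM(e.amount) AS revenue
        FROM events AS e
        JOIN customers AS c ON c.id = e.customer_id
        WHERE e.event = 'purchase'
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn)
result = revenue_by_customer(conn)
print(result)
```

The point of the sketch is the final query: once the nested JSON fields are addressable as columns, business users can combine them with ordinary relational tables using the SQL they already know.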