What is a Data Lakehouse?

Data lakehouse architecture is designed to combine the benefits of data lakes and data warehouses by adding table metadata to files in object storage. This metadata gives data lakes features that are typical of a data warehouse but generally lacking in a data lake: time travel, ACID transactions, better pruning, and schema enforcement. However, like any architecture, an open data lakehouse comes with trade-offs. Storing data in an open table format can greatly improve interoperability, but it can also mean more overhead for tool version compatibility and upgrades, more challenging FinOps with disparate billing, variable performance, limited concurrency support, and disparate governance controls and auditing across many tools.
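To make the role of table metadata more concrete, here is a minimal Python sketch of the idea. It is a toy model, not how Apache Iceberg or Snowflake implements it: the `DataFile` and `Table` classes, the `commit` and `scan` methods, and the `id`-based statistics are all hypothetical names chosen for illustration. It assumes each data file is immutable and that the metadata layer keeps a list of snapshots plus per-file min/max statistics.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """One immutable file in object storage, plus stats kept in table metadata."""
    path: str
    rows: tuple      # stand-in for the file's contents
    min_id: int      # smallest "id" value in the file
    max_id: int      # largest "id" value in the file


@dataclass
class Table:
    """Toy table: a schema plus a list of snapshots (each a tuple of DataFile)."""
    schema: dict                                      # column name -> Python type
    snapshots: list = field(default_factory=lambda: [()])

    def commit(self, path, rows):
        """Validate rows, then publish a new snapshot; returns its snapshot id."""
        if not rows:
            raise ValueError("cannot commit an empty file")
        # Schema enforcement: reject rows whose columns or types do not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match the schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} must be {typ.__name__}")
        ids = [row["id"] for row in rows]
        new_file = DataFile(path, tuple(rows), min(ids), max(ids))
        # All-or-nothing commit: the new snapshot appears in a single append,
        # and older snapshots are never modified.
        self.snapshots.append(self.snapshots[-1] + (new_file,))
        return len(self.snapshots) - 1

    def scan(self, id_value, snapshot_id=None):
        """Return (matching rows, number of files opened) for one snapshot."""
        files = self.snapshots[-1 if snapshot_id is None else snapshot_id]
        # Pruning: skip files whose min/max range cannot contain id_value.
        candidates = [f for f in files if f.min_id <= id_value <= f.max_id]
        rows = [r for f in candidates for r in f.rows if r["id"] == id_value]
        return rows, len(candidates)


table = Table(schema={"id": int, "name": str})
first = table.commit("part-0", [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
table.commit("part-1", [{"id": 10, "name": "c"}])

latest_rows, files_opened = table.scan(10)            # opens only part-1
old_rows, _ = table.scan(10, snapshot_id=first)       # time travel: no match yet
```

In this sketch, time travel is just reading an older entry in the snapshot list, and pruning is a metadata-only check that avoids opening files that cannot match. Real table formats add much more, such as concurrent-writer conflict detection and manifest files, which this example leaves out.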

DATA LAKE FEATURES

  • Separation of storage and compute

  • Virtually unlimited scale data repository

  • Mixed data types: structured, semi-structured and unstructured

  • Choice of languages for processing (but not always SQL)

  • Process data in-place

  • Direct access to raw source data

DATA WAREHOUSE FEATURES

  • Strong data governance, access to data only through the platform

  • High performance & concurrency support

  • No need to inventory or ingest data

  • ACID transactions

  • Direct access to curated data

  • Version history, time travel

Both data lakes and data warehouses are big data repositories. The difference between a data lake and a data warehouse lies in how they handle compute and storage. Snowflake’s Data Cloud can be used to build and adapt to various architecture patterns that align with the needs of various use cases. Snowflake offers customers the ability to ingest data into a managed repository, in what’s commonly referred to as a data warehouse architecture, but also gives customers the ability to read and write data in cloud object storage, functioning as a data lake query engine. Regardless of the pattern, Snowflake adheres to core tenets of strong security, governance, performance, and simplicity.

DATA LAKEHOUSE FEATURES

In addition to the features above, Snowflake also provides the following features for a data lakehouse pattern:

  • Fully managed table format

  • Apache Iceberg table format

  • Polyglot, multi-cluster compute engine

  • Cost-effective performance for high concurrency

SNOWFLAKE DATA CLOUD

A data platform is not restricted to a single architectural pattern. Instead, it should support many architecture patterns for many functions and workloads.

A flexible platform like Snowflake allows you to use traditional business intelligence tools and newer, more advanced technologies devoted to artificial intelligence, machine learning, data science, and applications. It’s a single platform that can be used to power multiple types of workloads.

See Snowflake’s capabilities for yourself. To give it a test drive, sign up for a free trial.