Don’t Ignore ACID-Compliant Data Processing in the Cloud

Author: Michael Nixon

Engineering

When moving workloads to the cloud, you should expect high performance and robust security. You should also have high expectations for scalability, workload concurrency, elasticity, ease-of-use, usage-based pricing and resilience. These capabilities and consumption model are more likely available and tend to execute much better in the cloud compared to on-premises implementations.

However, there’s one area you want to take special care to at least match your current capabilities once you migrate your enterprise data warehouse and analytic workloads to the cloud. And that’s ACID-compliant data processing.

What is ACID?

The term and acronym ACID describes a set of processing capabilities (Atomic, Consistent, Isolated and Durable) that ensure a database management system (DBMS) will make changes to data in a reliable, high-integrity way. A DBMS that is ACID-compliant guarantees that all executed and committed transactions, changes and/or updates to data are:

  • Atomic – As required by a transaction, all statements within the transaction are executed in their entirety or not at all. In database terms, a transaction can be a single statement or operation on data, or a logical group of multiple statements and operations. If the DBMS cannot complete the transaction, successfully execute all statements and operations, but some changes have completed, then the DBMS will automatically and completely roll-back the database to its prior state.
  • Consistent – Every transaction to the database will adhere to defined and declared system integrity constraints. This includes requiring all committed changes to data (i.e., data writes) will be up-to-date and immediately available by all subsequent reads of the data.
  • Isolated – Results of concurrent (simultaneous) transactions, whether to the same data or table or to multiple data sets or tables, are independent of one another. In other words, the results of concurrent processing would be the same as if each of the transactions were to execute in a sequential manner, not simultaneously.
  • Durable – Changes to data, once committed, will remain in place and will survive failures to the data management system and platform, even if the hardware failure occurs just as the DBMS is implementing committed transactions.

Data warehousing workloads benefit from ACID-compliant data processing

Often, in the context of data warehouse data processing, it’s argued that transactional ACID-compliant processing is “less relevant” in the data warehouse space and is more appropriate for classic online transaction processing (OLTP). Thus, the argument goes, requirements can be relaxed to more “eventual consistency”, rather than ACID, for data warehouse and analytic workloads.

Snowflake believes differently. While it can be further argued that OLTP processing, such as banking or financial transactions, may have more at stake should transactions lack high integrity, it does not mean that online analytical processing (OLAP) or operational data warehousing in the cloud are not deserving of the highest levels of integrity.

For example, an OLAP application that analyzes the performance of a website will typically create dashboards that reflect customers’ engagement and interaction with a company. If, for this company, the primary source of transactions with customers is from the web, then the integrity of the data captured and subsequent analytics (including dashboards) are critically important to this company.

Not all cloud data warehouses are the same

The issue is that for some cloud data warehousing platforms, without assurance of ACID compliance, there’s risk of data inconsistencies (See Figure 1.).

 

Cloud Architecture
Figure 1 – Range of ACID Capabilities Based on Cloud Architecture Type

 

A cloud data warehouse infrastructure that based on a collection of separate data warehouse clusters has this highest risk. This is because each cluster is independent. Data may have been copied from one cluster to another, but there is no coordination between the cluster. Therefore, data versioning can occur and there’s no single authority for the data from a system wide perspective.

Providing the highest level of ACID capabilities will be a cloud architecture that behaves like a single, integrate data storage system although there may be separate compute engines. With system wide integration and metadata management, data changes can be tracked, databases can be rolled back in the event of transaction failure, and consistency between compute engines can be assured nearly instantaneously.

In the middle are loosely coupled clusters that may have some level of data syncing between nodes of the cluster and central storage. ACID risks will be function of how swiftly data can be synchronized, before read attempts occur.

Maintain ACID processing when migrating analytic workloads to the cloud

In summary, ACID-compliant processing guarantees the highest level of integrity for all transactions on a database and prevents the dirty reads that can lead to invalid results. If you have ACID-compliant processing now with your current on-premises data warehouse, you certainly want to maintain this level of capability as you deploy your data warehouse workloads on a cloud data warehouse platform.

Or, if you’re already in the cloud and desiring ACID capabilities, then consider a single integrated data warehouse solution approach that has been built for the cloud.

Start Your 30-Day Free Trial. Receive $400 of credits to try all Snowflake features.