5 Must-Have Features for High Concurrency Environments
Author: Michael Nixon
Market News, Snowflake Technology
Whether you’re new to cloud data warehousing or comparing multiple cloud data warehouse technologies, it’s critical to assess whether your data warehouse environment will need to support concurrent processing of any sort. Unless you’re a lone database shop, in all likelihood, the answer is yes you will. Concurrency, or concurrent data processing, is simultaneous access and/or manipulation of the same data. This is not to be confused with parallel processing, which is multiple operations happening at the same time, but not against the same data.
Concurrency can take the form of multiple users interactively exploring and manipulating a particular data set, concurrent applications querying and visualizing the same data set, transactional updates to the data, or even concurrent loading of new data or change data into the data set. And if you thought to yourself, “What? I can concurrently load new data, while supporting queries on the same data set?,” then keep reading.
If you require concurrency, at any capacity — you will want these five cloud data warehouse features:
- High relational performance across a broad range of data types: Of course, you want high performance–that’s a given. However, the more challenging aspect to plan for is fast queries on a broad range of data types, including semi-structured/JSON data. You don’t want your relational data warehouse to bog down and hold up corporate users because now it must handle non-traditional data from groups like the web team or product engineering. JSON is the new normal.
- Automatic and instant warehouse scaling: Let’s say your warehouse can handle a high number of concurrent accesses, but it bogs down during a period of high demand. Now what? This is a very important question to have answered as you compare technologies. Will you have to kick-off users? Will you have to schedule after-hour jobs? Will you have to add nodes? Will this require you to redistribute data? If so, redistributing data takes time and it’s a double-whammy because the existing data in the warehouse has to be unloaded. This is a huge disruption to your users. Instead, you want the ability to load balance across new virtual warehouses (compute engines) and have your cloud data warehouse continue to execute at fast speeds–including loading new data–all against the same data set. The more automatic this load balancing happens, the better.
- ACID compliance: Right up there with poor performance is inaccurate results or reporting due to inconsistent data or dirty reads. With more people, workgroups, or applications accessing the same data simultaneously, the more pressure there is to maintain data consistency. A data warehouse with ACID compliance ensures consistency and data integrity are validated without having to write scripts or manually managing data integrity and consistency yourself.
- Multi-statement transaction support: Tied to ACID compliance is multi-statement transaction support. If you have users that nest multiple transactions within a single query, you want a warehouse solution that you can trust will completely execute the transactions with integrity or will automatically rollback transactions should another transaction in the line fail.
- Data sharing: Data integrity and freshness should not stop just because you have to share data with an external stakeholder. Traditional approaches require engaging in an ETL process or spending time manually deconstructing, securing and transferring data. Data consumers on the receiving end must also spend time to reverse your steps and reconstruct data. This process is slow and manual, and does not ensure your data consumers are always working with live and up to date data. Data sharing allows you to eliminate all of this effort and ensure access to live data.
The business value for proactively planning for concurrency is that you want to ensure your cloud data warehouse can support your environment, regardless of what it throws at you. Especially during times of sudden, unpredictable, heavy query loads against a common data set.
Try Snowflake for free. Sign up and receive $400 US dollars worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.