We are excited to announce a major update to the Search Optimization Service, which will accelerate both point lookups and analytical queries while at the same time reducing cost for our customers.
All features rolled out in public preview in October 2022 are now generally available, including the ability to select specific columns for search optimization; faster searches in VARIANT, ARRAY, OBJECT, and GEOGRAPHY data types; and faster substring searches in string columns.
Deliveroo cut Search Optimization Service costs by 90%
At a high level, Search Optimization Service works by creating an optimized data structure, called the search access path, and then uses this path to prune, or eliminate, the micro-partitions that do not contain the value you are searching for from the scan set. The fewer micro-partitions Snowflake has to scan, the faster your queries are.
At Deliveroo, an international online food delivery company, users needed to quickly search for customer emails and addresses in large tables of data in order to remove them in response to GDPR data removal requests. This is a perfect use case for the Search Optimization Service. But search-optimizing an entire table can be quite costly. This is why we introduced the ON clause that allows Snowflake users to specify certain columns that should be search-optimized. David Elliot, Staff Analytics Engineer at Deliveroo, shared that “the feature is amazing and super flexible. It cut our Search Optimization Service cost by 90%!”
While customers can apply the Search Optimization Service to the few specific columns frequently used for filtering data, there’s also the option to automatically apply the Search Optimization Service to any column added to the table.
Customers who prefer the convenience of optimizing the whole table, even when new columns are added, will still have access to our default command ALTER TABLE .. ADD SEARCH OPTIMIZATION (without the ON clause). In this case, in order to keep your costs down, we will optimize columns only for the EQUALITY search method and only if they are of the supported data type (fixed-point numeric, varchar, and date and time).
Panther Labs improves point lookup queries by 100x for VARIANT data types
The new features in the Search Optimization Service enable our customers to easily search in logs and semi-structured data stored in VARIANT, ARRAY, and OBJECT columns. Snowflake’s cybersecurity partner, Panther Labs, leverages Search Optimization Service for needle-in-a-haystack-type queries with VARIANT data types. According to Russell Leighton, a Principal Engineer at Panther, “Quickly querying for security indicators like IP addresses and file hashes is very common in a SIEM (security information and event management system). Panther has seen these point lookup queries improve by factors of over 100 times. By allowing per column or nested-attribute indexing, our customers can target common queries, which can reduce compute cost significantly [because Snowflake bills by execution time].”
Speeding up GEOGRAPHY search predicates
The Search Optimization Service also enables customers to accelerate queries that utilize filters on GEOGRAPHY columns such as ST_INTERSECTS, ST_CONTAINS, and ST_DWITHIN. A Snowflake customer in the Location Intelligence field was able to reduce query latency using the ST_DWITHIN predicate by up to 26x. We plotted the latency of their queries on the chart below. The blue dots and their values on the y-axis indicate the query execution time with Search Optimization, while the yellow dots and their x-axis values correspond to the same query’s execution time without Search Optimization Service. The execution time of queries that would otherwise take about 30 or 40 seconds take 1 or 2 seconds with Search Optimization.
Join hundreds of Snowflake customers who are already benefiting from faster queries and lower costs with the Search Optimization Service feature. Specifically, consider using our new, flexible syntax for selecting specific columns for optimization along with semi-structured data types, in the Geography data type, and while doing wildcard/pattern searches in string data types.
For the latest feature announcements and more, subscribe to the Inside the Data Cloud Blog.