At Snowflake, we are relentlessly focused on our customers and on creating innovative technology to better serve their needs. Today marks another milestone in that focus. In this blog, I detail the latest innovations to our cloud data platform. These new features make all six data workloads enabled by our platform even more powerful: data warehouse, data lake, data exchange, data applications, data engineering, and data science.
We’ve grouped these new features into three areas: the core of our platform; data pipelines and extensibility; and making it even easier for you to access the data you need to make data-driven decisions. Here are the latest innovations for 2020 we announced at our Say Hello to the Data Cloud launch event:
CORE PLATFORM FEATURES
Let’s start with the latest enhancements to Snowsight, our analyst experience. Snowsight enables greater productivity through a new schema browser that helps you explore your data, auto-completion for SQL queries, and filters that make it easier to iterate with different values for queries.
Snowsight also makes it possible for your entire organization to collaborate. You can share queries and dashboards with other members of your organization, while maintaining control with granular permissions. We also offer rich visualizations embedded within Snowflake to move you faster from data to insights. Snowsight is now in public preview and is rolling out across cloud regions for our customers.
Transparent usage of materialized views
We continuously invest in improving performance throughout Snowflake. Today, we’re announcing transparent usage of materialized views. With this enhancement, if a user queries an underlying table and there’s a materialized view that could speed up the results, Snowflake will automatically use that materialized view. This means that performance accelerates without having to ask users and analysts to change their data models or queries. This is in private preview right now and will be in public preview in the next couple of months.
New and larger sizes of compute clusters
We’ve also added two new sizes of compute clusters. Snowflake can quickly spin dedicated clusters up and down as you see fit or as your workloads demand, delivering massive concurrency. But occasionally we run into very large jobs that need even more compute horsepower to run faster. Today we’re introducing our two largest compute cluster sizes yet: 5XL and 6XL. As with all of our existing compute cluster sizes, each new size doubles the compute capacity of the previous size. These new compute clusters will be in private preview in the next several weeks.
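To make the doubling rule concrete, here is a minimal Python sketch of the arithmetic. The short size labels and the choice of the smallest size as the unit of capacity are assumptions made for illustration; only the doubling rule and the 5XL and 6XL names come from the announcement.

```python
# Cluster size labels in ascending order (assumed labels for illustration;
# 5XL and 6XL are the newly announced sizes).
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]


def relative_capacity(size: str) -> int:
    """Return compute capacity relative to the smallest size (XS = 1).

    Each step up the list doubles the capacity of the previous size.
    """
    return 2 ** SIZES.index(size)


if __name__ == "__main__":
    for size in SIZES:
        print(f"{size:>3}: {relative_capacity(size)}x")
```

Under these assumptions, a 6XL cluster offers four times the capacity of a 4XL cluster, because it sits two doublings higher.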
Search optimization service
At Snowflake we pride ourselves on giving you an easy-to-use service that doesn’t require tuning to achieve performance, so you can focus on getting value out of your data. Today, when you perform a lookup on a column that is the clustering key for a table, Snowflake is able to reduce the amount of data scanned to deliver great performance. However, there are use cases where you’ll want to search for specific values on columns that are not clustered, and that would require Snowflake to do a full table scan. For these use cases, we’re announcing our new search optimization service, which you can enable on a table-by-table basis. This service pre-computes information about the table to accelerate lookup queries on any column. This service is now available in public preview.
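The general idea of pre-computing information to avoid a full table scan can be sketched in a few lines of Python. This is a conceptual illustration only, not a description of how the search optimization service is implemented: the partition layout, the sample rows, and the helper names `build_search_index` and `lookup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical table split into storage partitions; each row is (order_id, customer).
partitions = {
    0: [(101, "acme"), (102, "globex")],
    1: [(103, "initech"), (104, "acme")],
    2: [(105, "umbrella"), (106, "hooli")],
}


def build_search_index(parts, column):
    """Pre-compute, for each value in `column`, which partitions contain it."""
    index = defaultdict(set)
    for part_id, rows in parts.items():
        for row in rows:
            index[row[column]].add(part_id)
    return index


def lookup(parts, index, column, value):
    """Scan only the partitions that the index says can contain `value`.

    Returns the matching rows and the list of partitions actually scanned.
    """
    candidates = sorted(index.get(value, ()))
    rows = [row for pid in candidates for row in parts[pid] if row[column] == value]
    return rows, candidates
```

With an index built on the customer column, a lookup for "acme" reads two of the three partitions, and a lookup for a value that does not exist reads none.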
SQL stored procedures
Geospatial data support
Many of our customers already store geospatial data in our variant columns. Today, we’re introducing a new data type that uses a round-earth coordinate system to store geospatial data. Unlike other solutions, Snowflake requires no tuning knobs or spatial indexes to deliver performance. You simply create the column with the spatial data type, load the data, and Snowflake takes care of the performance. This feature is now available in public preview.
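A round-earth coordinate system measures distance along the curved surface of the planet rather than across a flat plane. The Python sketch below shows what that means using the standard haversine formula. It assumes a perfectly spherical Earth with a mean radius of 6,371 km, and it illustrates the concept rather than the calculation Snowflake performs internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in kilometers between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

On this model, two points on opposite sides of the equator are half a circumference apart, a result a flat-plane calculation on raw degrees cannot produce.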
Dynamic data masking
With this new capability, you can create a policy for a column to limit the visibility of data. Depending on the role executing the query, the data can be returned unredacted, partially redacted, or fully redacted. We’re also extending this capability by offering integration with external tokenization services, starting with Protegrity integration. This capability is currently in private preview and will go into public preview in the next few weeks.
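The three outcomes described above (unredacted, partially redacted, fully redacted) can be modeled with a small Python function. This is a conceptual sketch of role-based masking, not Snowflake syntax; the role names `PII_ADMIN` and `ANALYST`, the email example, and the redaction format are all hypothetical.

```python
def mask_email(role: str, email: str) -> str:
    """Return an email address as the given role would see it.

    PII_ADMIN sees the value unredacted, ANALYST sees only the domain,
    and every other role sees a fully redacted placeholder.
    """
    local, _, domain = email.partition("@")
    if role == "PII_ADMIN":
        return email  # unredacted
    if role == "ANALYST":
        return "*" * len(local) + "@" + domain  # partially redacted
    return "***MASKED***"  # fully redacted


if __name__ == "__main__":
    for role in ("PII_ADMIN", "ANALYST", "INTERN"):
        print(role, "->", mask_email(role, "jane@example.com"))
```

The point of the design is that the policy lives with the column, so every query against that column is filtered by the same rule regardless of who wrote the query.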
EXTENSIBLE DATA PIPELINES
Snowflake now lets you specify a partitioning expression as part of a COPY INTO <location> operation, which will be used to determine the file structure on cloud storage. This makes the data easier to consume by tools outside of Snowflake, and it also yields performance improvements. This enhancement will enter public preview soon, making Snowflake an even better solution for transformations of data that start and finish in a cloud-based data lake.
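To illustrate how a partitioning expression can determine the file structure of an export, here is a minimal Python sketch. It models the idea only and does not show Snowflake's actual output layout: the sample rows, the `date=.../region=...` path convention, the `data_0.csv` file name, and the helper names are all assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows to export.
rows = [
    {"sale_date": date(2020, 6, 1), "region": "EMEA", "amount": 120},
    {"sale_date": date(2020, 6, 1), "region": "APAC", "amount": 75},
    {"sale_date": date(2020, 6, 2), "region": "EMEA", "amount": 310},
    {"sale_date": date(2020, 6, 1), "region": "EMEA", "amount": 42},
]


def partition_path(row):
    """Hypothetical partitioning expression: one folder per date and region."""
    return f"date={row['sale_date'].isoformat()}/region={row['region']}"


def plan_export(rows, partition_by):
    """Group rows by the value of the partitioning expression.

    Returns a mapping from output file path to the rows written to that file.
    """
    files = defaultdict(list)
    for row in rows:
        files[partition_by(row) + "/data_0.csv"].append(row)
    return dict(files)


if __name__ == "__main__":
    for path, contents in sorted(plan_export(rows, partition_path).items()):
        print(path, len(contents))
```

A tool outside Snowflake that needs only one day or one region can then read the matching folders and skip the rest, which is where the performance benefit comes from.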
Today we are introducing the concept of external functions, where the definition of the function lives inside Snowflake but the implementation of the function sits behind a REST endpoint outside of Snowflake. For example, it could be a cloud service that does scoring or prediction on a machine learning model. From within Snowflake, it looks like a function you can use in your queries or your data transformations. This functionality is available starting today in public preview.
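As a rough illustration of what the remote side of an external function might look like, the Python sketch below handles one batch of rows. The payload shape used here (a JSON object with a "data" key holding rows, each prefixed by its row number) reflects my understanding of the request format and should be checked against the external functions documentation. The `score` function is a hypothetical stand-in for a real model.

```python
import json


def score(text: str) -> int:
    """Hypothetical stand-in for a model call: counts the words in the input."""
    return len(text.split())


def handle_request(body: str) -> str:
    """Process one batch of rows and return one result per input row.

    Assumed request shape:  {"data": [[row_number, argument], ...]}
    Assumed response shape: {"data": [[row_number, result], ...]}
    """
    request = json.loads(body)
    results = [[row_number, score(value)] for row_number, value in request["data"]]
    return json.dumps({"data": results})


if __name__ == "__main__":
    print(handle_request('{"data": [[0, "hello data cloud"], [1, "hi"]]}'))
```

Echoing the row number back with each result lets the caller match results to input rows even though the work happens outside the database.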
Many of our customers leverage Snowflake for building SQL-based pipelines and data transformations. However, some users have told us they’d prefer to express their pipelines and their transformations in other languages, such as Java or Python. Today, we’re introducing the ability to create user-defined functions in Snowflake with different programming languages. We will start with Java functions and we’ll expand to other languages over time. We are also working on enabling some of the most common programming models for data analytics and transformation to operate seamlessly inside Snowflake. For example, we’re looking at enabling the DataFrame programming model, which is popular with Python and Spark users. We’re bringing such programming models to natively run inside Snowflake and leverage the full power of our engine. These extensibility capabilities will be coming to Snowflake in the next few months.
DATA CLOUD CONTENT
We share many joint customers with Salesforce and Tableau, and this partnership will enable us to create a better experience for all of you. We want to make it easier to join your Salesforce data with your data in Snowflake, and use those new insights to power business decisions. Customers of Einstein Analytics can now use a new feature: Einstein Analytics Direct Data for Snowflake. Instead of having to copy data from Snowflake into Einstein Analytics, you can query Snowflake directly to get insights from live data. This feature is currently in open beta and immediately available for any Einstein Analytics customer.
We are also announcing today the Einstein Analytics Output Connector for Snowflake. Customers currently use a variety of technologies to import data from Salesforce into Snowflake. Our common vision with Salesforce is to make that process as seamless and as friction-free as possible. With simple configuration steps, you will be able to synchronize Salesforce data to Snowflake, and leverage Salesforce objects as well as curated Einstein Analytics data sets to augment your data workloads in Snowflake. The output connector will be available for customers later this year.
A year ago, we introduced the notion of a data exchange where we help Snowflake customers connect with each other as data providers and data consumers, powered by our underlying Secure Data Sharing capability. What’s unique about our technology is that data sharing is defined in terms of access control, not data movement. Snowflake customers were excited by the opportunity to create their own data exchanges with a select set of partners or customers, and we are thrilled to now offer this capability more broadly in public preview.
Snowflake Data Marketplace
Momentum continues to build with our Snowflake-operated data exchange, which we now call Snowflake Data Marketplace. It’s an incredible opportunity to expand the reach of your data should you choose to make it available to others, while eliminating the cost, headache and risk of traditional data sharing methods. For consumers, data is easily discoverable and usable, enabling richer analytics and data transformations with minimal friction.
One seamless, integrated platform
That covers the major product enhancements we announced at our Say Hello to the Data Cloud launch event. I hope you are as excited as I am about all the new capabilities now available and coming soon to Snowflake. I can’t emphasize enough that the Snowflake Cloud Data Platform is an integrated product that delivers a consistent experience across regions and clouds. For our platform in general, and for our new capabilities announced today, we want to make your experience of using and managing Snowflake as simple as possible, so you can focus on getting value out of your data and not on managing infrastructure or understanding how different technology components fit together.
Click here to hear more about the rise of the Data Cloud from Snowflake CEO Frank Slootman. Click here to learn more about the global strategy and features of Snowflake from Snowflake’s Co-founder and President of Product, Benoit Dageville.