The Data Cloud was featured at February’s edition of the Digital-Life-Design conference, a series of interdisciplinary gatherings being held virtually in 2021. Snowflake co-founder and President of Products Benoit Dageville provided insights on the new possibilities offered by cloud-based analytics for an audience of business leaders and influencers.
Dageville joined Christian Teichmann of Burda Principal Investments on the virtual stage for a session titled Analytics & Machine Learning—The Magic of the Cloud. Throughout the session, Dageville offered examples and comparisons to illustrate how data and the cloud can help solve big challenges in business and society at large.
The Evolution of Data Analytics
For the uninitiated, analytics terminology can be a bit confusing. Dageville started the session with a quick, clear walkthrough of how large-scale data usage has evolved, starting with data warehouses in the mid-1980s. The data warehouse made it possible to consolidate, summarize, and analyze structured data such as transaction records.
Then in the late 2000s, Dageville said, data lakes emerged thanks to pioneers like Hadoop. Data lakes can contain higher volumes of semi-structured data with more complex relationships. As an example, he cited purchases on Amazon. “You want to analyze not only what you bought, but also, for example, all the clicks you performed—analyzing your experience,” he said. Data warehouses weren’t able to deal with the petabyte scale and high velocity of this data, so “data lakes were created to answer that,” he said.
However, data lakes were complex to use and had to compromise on some things that data warehouses did very well, such as efficiently storing transactional data. As a result, companies needed to maintain both kinds of systems.
The Data Cloud represents the third, “and final, I hope,” evolutionary stage, according to Dageville. The Data Cloud enables organizations to efficiently run both data warehouses and data lakes on a single platform.
“You don’t really want to silo your data and have to deal with many different systems,” Dageville said.
Why the Cloud
Dageville explained, in layman’s terms, three transformational aspects of the cloud for analytics: scale, elasticity, and collaboration.
Cloud scale means essentially “unlimited access to resources in both compute and storage,” Dageville noted. This power means organizations can handle unprecedented data volume and velocity.
Elasticity refers to the ease of applying that massive storage and computing power whenever it is needed—without leaving that same capacity idle when it’s not needed.
“You can provision a large amount of resources and analyze your data, do this analysis really quickly because you have a lot of resources, and when you’re done, you can return these compute resources so you don’t pay for them,” said Dageville. This elasticity provides power, speed, and efficiency.
To illustrate, he pointed to the radical changes brought about by the COVID-19 pandemic and lockdowns.
“Companies have to adapt very quickly. Zoom had to expand dramatically. Others, like travel-based companies, had to contract, and if you need to contract, you’re going to pay for less, immediately. That’s elasticity again of the cloud,” he said.
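The pay-only-while-you-use-it arithmetic behind elasticity can be sketched in a few lines of Python. This is a rough illustration, not Snowflake pricing: the hourly rate, node counts, and durations are invented, and the function names are hypothetical.

```python
def fixed_cost(nodes: int, hours_in_period: float, rate_per_node_hour: float) -> float:
    """Cost of keeping `nodes` running for the whole period, used or not."""
    return nodes * hours_in_period * rate_per_node_hour


def elastic_cost(nodes: int, busy_hours: float, rate_per_node_hour: float) -> float:
    """Cost when nodes are provisioned only while the analysis runs."""
    return nodes * busy_hours * rate_per_node_hour


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
RATE = 2.0            # currency units per node per hour (assumed)
HOURS_IN_MONTH = 720  # 30 days * 24 hours

# A small cluster left on all month vs. a large cluster used one hour a day.
always_on = fixed_cost(nodes=4, hours_in_period=HOURS_IN_MONTH, rate_per_node_hour=RATE)
on_demand = elastic_cost(nodes=64, busy_hours=30, rate_per_node_hour=RATE)

print(f"always-on, 4 nodes:  {always_on:.0f}")
print(f"on-demand, 64 nodes: {on_demand:.0f}")
```

Under these assumed figures, the 64-node cluster rented for an hour a day costs less over the month than the 4-node cluster left running, while each analysis has sixteen times the compute available. Real savings depend on the provider's billing model and on how bursty the workload is.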
By collaboration, Dageville meant not just the ability of multiple users to access the same data, but also the ability to securely share data sets across traditional organizational boundaries. As a simple comparison, he mentioned Google Docs, which lets people from different companies work simultaneously in the same document.
“This collaborative aspect of the cloud is going to really transform data and data analytics,” he said.
AI and Machine Learning
Teichmann asked several questions about the role of the cloud in machine learning and AI applications specifically. Early in the conversation, he raised a previous speaker’s assertion that AI will automate many jobs currently performed by people.
“I don’t believe too much the pessimistic idea that machines are going to replace human beings,” Dageville replied. “The way I see it is that machines are really going to expand what we can understand from this data, by automating many things that can’t be done by a human being. When you have petabyte-scale of data, you cannot analyze it as a human.”
Dageville cited examples from several different industries, including BlackRock identifying new ways to invest in companies; Anthem analyzing the results of preventive care programs; and the broad healthcare community working collaboratively to analyze vaccination needs and effects.
He mentioned the current public health challenges as an illustration. From vaccine creation to distribution, the COVID-19 pandemic has generated a huge amount of data that many different parties need to work with.
“The data needs to be consumed by many actors. And the machine learning aspect of it is going to be very powerful,” Dageville said.
“You’re looking at connections between these data that are really impossible to find for human beings. And the machine can find these connections.”