Cloud Analytics: Sharing Information, Insights and Innovations
Author: Jon Bock
While it’s been a busy summer here at Snowflake, we are now picking up even more momentum as we head into the fall. In addition to showcasing our cloud data warehouse at industry conferences such as the upcoming Strata + Hadoop World conference in New York, our team will be hitting the road for our fall Cloud Analytics City Tour, kicking off next week in New York and Boston. (You can see the full fall schedule and register for a city near you here.)
To provide some background, this past spring Snowflake invited data professionals to participate in cloud analytics symposiums held in Chicago and Los Angeles. Attendees included a broad array of people interested in innovation in analytics. Our goal was to create a forum where a diverse set of cloud and data analytics professionals could discuss how and why the cloud is playing an increasingly prominent role in analytics and share experiences and recommendations for managing and using data in the cloud.
The knowledge we saw exchanged at these events, and the feedback we received from attendees, demonstrated the value of a forum where data professionals could learn and share information and ideas. Speakers shared insights about how the cloud is not simply about efficiencies that save time and budget (one of the first things people think of when it comes to cloud solutions in general). They described how the cloud is causing a sea change in the status quo, creating new opportunities for a broader array of data users to experiment, innovate and make exponential progress in putting data to work.
The Creative Destruction Cycle in Analytics
Dean Abbott, co-founder and Chief Data Scientist of SmarterHQ and a speaker at our symposium in Los Angeles, described this change as making possible a rapid and continuous cycle of creation, destruction and re-creation that enables him and his team of data scientists to test and iterate on the fly. In on-premises data analytics environments, that team would have had to plan carefully well in advance, because of the upfront investment required to ensure that the appropriate resources were purchased and deployed to support projects. In the cloud, environments can be created, populated, used, and destroyed on demand, making it easy to experiment and iterate rapidly. This way of working makes his team’s work more relevant, because they can test more hypotheses within the same budget and time frame. Moreover, because of the simpler, more agile environment that the cloud enables, the team has access to fresher data: new data from the field and other sources can be quickly normalized, integrated and examined against historical data in a way that was not possible before the team used cloud services for data handling and report generation.
This continuous, iterative cycle of data experimentation is vastly different from the traditional lock-step, labor-intensive framework in which data users debated incessantly over the hypotheses they would test before undertaking the report generation phase. That phase itself previously took days, even weeks, to complete. And even before that, IT had to choose the technology and systems to run the reports before the data had even been examined. Before the advent of the cloud data warehouse and cloud analytics tools, ambiguities and inconsistencies in the data were common and difficult to test. The workflow of data analysts involved a substantial amount of guesswork, with gaps and delays developing between what they already knew, what data still needed to be tested, and any new data coming in.
You Have All That Data, Now What Do You Do With It?
Tamara Dull, Director of Emerging Technologies at SAS and #13 on the Big Data 2015: Top 100 Influencers in Big Data list, pointed out that the cloud has made the benefits of a data warehouse accessible to organizations of more types and sizes than in the past. This new accessibility is not only improving data management and enhancing security but also, similar to the experience shared by Dean Abbott, creating new opportunities in data discovery and providing a platform for advanced analytics.
In the past, when data was more homogeneous and there were fewer data sources, new and old data could be integrated via complex data integration pipelines, carefully planned data warehouses and sometimes some very large Excel spreadsheets. But with the advent of new data sources and formats such as web application data, mobile user data, and now IoT data streams, traditional systems can no longer keep up. The result is that the gaps between what analysts know and the data still arriving have been getting bigger, potentially at great cost to accuracy and effectiveness for those still using these old systems.
It was clear from the presentations and discussions that a variety of organizations, from revolutionary start-ups to reinvented Fortune 500 companies, are building and rebuilding their data-driven operations in the cloud to ensure that their data management infrastructures are as flexible as the incoming data. The upshot is that while approaches and methodologies for using data can differ vastly between organizations, storing and using data in the cloud opens up exciting new possibilities for data analytics. These changes aren’t just better for business; they have become a requirement for thriving in an increasingly data-driven world.
Coming Up Next
We’re looking forward to more discussions and insights during this fall’s City Tour. We hope you’ll join us to take part in the conversation and add your own insights.