Author: Kent Graziano
Jan 25, 2023

Why Column-Aware Metadata Is Key to Automating Data Transformations

Data, data, data. It does seem we are not only surrounded by talk about data, but by the actual data itself. We are collecting data from every nook and cranny of the universe (literally!). IoT devices in every industry; geolocation information on our phones, watches, cars, and every other mobile device; every website or app we access—all are collecting data. 

In order to derive value from this avalanche of data, we have to get more agile when it comes to preparing the data for consumption. This process is known as data transformation, and while automation in many areas of the data ecosystem has changed the data industry over the last decade, data transformations have lagged behind. 

That started to change in 2022, and in 2023 I predict we will see an accelerated adoption of platforms that enable data transformation automation. 

Why we need to automate data transformations

If we are going to be truly data-driven, we need to automate every possible task in our data ecosystem. Over the multiple decades I’ve spent in the data industry, one observation has remained nearly constant: the majority of the work in building a data analytics platform revolves around data transformations (what we used to call “the T in ETL or ELT”). This must change. Gone are the days when a few expert data engineers could manage the influx of new data and data types, and quickly apply complex business rules to deliver it to their business consumers. 

We cannot scale our expertise as fast as we can scale the Data Cloud. There are just not enough hours in a day to do all the data profiling, design, and coding required to build, deploy, manage, and troubleshoot an ever-growing set of data pipelines with transformations. On top of that, there is a dearth of expert engineers who can do all that coding while also building the rapport with business users needed to understand the rules that must be applied. Engineers like this don’t grow on trees. They require very specific technical skills and years of experience to become efficient and effective at their craft.

The solution? Code automation. There are plenty of SQL-savvy data analysts and architects out there who can be trained on modern data tools with user-friendly UIs. The more code we can generate and the more pipelines we can automate, the more data we can deliver, in a timely manner, to the folks who need it most. What’s more, generated code based on templates is easier to test and tends to have far fewer (if any) coding errors.
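To make that concrete, here is a minimal sketch of template-driven generation, assuming a hypothetical column list and a render_select helper (both invented for illustration, not any particular tool’s API). Each column’s transformation rule lives in metadata, and the SQL is derived from it rather than written by hand:

```python
# Minimal sketch of template-driven SQL generation. The table name,
# columns, and per-column rules below are hypothetical illustrations.
COLUMNS = [
    {"name": "customer_id", "rule": None},               # pass through unchanged
    {"name": "email",       "rule": "SHA2({col})"},      # hash a sensitive column
    {"name": "order_total", "rule": "ROUND({col}, 2)"},  # apply a business rule
]

def render_select(table: str, columns: list[dict]) -> str:
    """Expand each column's metadata rule into a SELECT-list expression."""
    exprs = []
    for col in columns:
        expr = col["rule"].format(col=col["name"]) if col["rule"] else col["name"]
        exprs.append(f"{expr} AS {col['name']}")
    return "SELECT\n  " + ",\n  ".join(exprs) + f"\nFROM {table};"

print(render_select("raw.orders", COLUMNS))
```

Because every statement comes from the same template, testing the template once effectively tests every pipeline generated from it.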

The fact is, with all this growth, not all that data is in one table or even one database; rather, it is spread across hundreds or even thousands of objects. A single organization may have access to millions of attributes. Translate that to database terms, and that means tens or hundreds of millions of columns that the organization needs to understand and manage.

Legacy solutions, even ones with some automation, are never going to manage and transform the data in all of those columns easily and quickly. How will we know where that data came from, where it went, and how it was changed along the way? With all the privacy laws and regulations, which vary from country to country and from state to state, how will we ever be able to trace the data and audit these transformations—at massive scale—without a better approach?

Using column-level metadata to automate data pipelines

I believe the best answer to these questions is that the automation tools we use need to be column-aware. It is no longer sufficient to keep track of just tables and databases. That is not fine-grained enough for today’s business needs.

For the future, our automation tools must collect and manage metadata at the column level. And the metadata must include more than just the data type and size. We need much more context today if we really want to unlock the power of our data. We need to know the origin of that data, how current the data is, how many hops it made to get to its current state, who has access to which columns, and what rules and transformations were applied along the way (such as masking or encryption). 
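As a rough sketch of what that context might look like, consider the record below. The field names are assumptions invented for illustration, not a standard or any vendor’s catalog model; the point is what travels with each individual column:

```python
# Illustrative column-level metadata record. All field names are
# hypothetical; real catalogs model this in far more detail.
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    source: str                  # origin of the data, e.g. "crm.contacts.email"
    refreshed_at: str            # how current the data is
    hop_count: int               # hops made to reach its current state
    allowed_roles: list[str] = field(default_factory=list)  # who has access
    applied_rules: list[str] = field(default_factory=list)  # e.g. masking, encryption

email_col = ColumnMetadata(
    name="email", data_type="STRING", source="crm.contacts.email",
    refreshed_at="2023-01-24T06:00:00Z", hop_count=3,
    allowed_roles=["PII_READER"], applied_rules=["masked"],
)
```

With a record like this behind every column, the lineage and audit questions above become metadata lookups rather than forensic exercises.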

Column awareness is the next level of innovation needed to allow us to attain the agility, governance, and scalability that today’s data world demands. Legacy ETL and integration tools won’t cut it anymore. Not only do they lack column awareness, they can’t handle the scale and diversity of data we have today in the cloud. 

So, in 2023 I expect to see a much greater adoption of, and demand for, column-aware automation tools to enable us to derive value from all this data faster. It will be a new era for data transformation and delivery platforms. The legacy ETL and ELT tools that got us this far will fall by the wayside as modern automation tools come to the fore with their simplicity and ease of use.

A word about data sharing

Many have said it, but it bears repeating—data sharing and data collaboration are becoming critical to the success of all organizations as they strive for better customer service and better outcomes. Since my involvement in the early days of data warehousing, I have talked about the dream of enriching our internal data with external, third-party data. Thanks to the Snowflake Data Cloud, that dream is now a reality. We just have to take advantage of it.

I believe that 2023 will be the Year of Data Collaboration and Data Sharing. The technology is ready, and so is the industry. Taking advantage of Snowflake’s collaboration and data sharing capabilities will provide the competitive edge that allows many organizations to become, or remain, leaders in their industries. In this new age of advanced analytics, data science, ML, and AI, enriching your own data with third-party data through sharing and collaboration is essential if you want to be truly data-driven and stay ahead of the competition.

Successful organizations must, and will, not only consume data from their partners, constituents, and other data providers, but also make their data available for others to consume. For many this will lead to a related benefit: the ability to monetize data. Again, thanks to Snowflake, it is easier than ever to create shareable data products and make them available on Snowflake Marketplace, at an appropriate price.

With these new capabilities, properly managing and governing the data that is being shared will be paramount, and it must happen at the column level. Just as automating data transformations at scale has been enabled by using column-level metadata, data sharing and governance most certainly need to be at the column level—especially when it comes to sensitive data like PII and PHI. Automating the build of your data transformations using a column-aware transformation tool will be a critical success factor for organizations seeking to accelerate their development of shared data products, now and into the foreseeable future.
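As one hedged illustration of that combination, the sketch below generates Snowflake masking-policy DDL from the same kind of column metadata, so governance is applied per column rather than per table. The table, column, policy, and role names are invented for the example, and a production tool would cover far more cases:

```python
# Sketch: emit column-level masking DDL for any column whose metadata
# flags it as sensitive. All object and role names are hypothetical.
SENSITIVE_COLUMNS = [
    ("analytics.public.customers", "email", "PII_READER"),
]

def masking_ddl(table: str, column: str, reader_role: str) -> list[str]:
    """Generate a masking policy and attach it to the flagged column."""
    policy = f"{column}_mask"
    return [
        f"CREATE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() = '{reader_role}' THEN val\n"
        f"       ELSE '***MASKED***' END;",
        f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};",
    ]

for table, column, role in SENSITIVE_COLUMNS:
    print("\n".join(masking_ddl(table, column, role)))
```

Generating the DDL from metadata means the audit trail of what was masked, where, and for whom falls out of the same column-aware catalog that drives the transformations.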

If you want to get a jump on this, take a look at a modern data automation tool from Snowflake partner Coalesce.io and see how much faster you can get value from your data and bring some of that data to market.
