Is Your Data AI Ready? Are You?

AI-ready data is much more than the old adage “garbage in, garbage out.” Of course, no one wants garbage, but as another saying goes, “One man’s trash is another man’s treasure.” The key is knowing both what you need for a specific initiative and what you’ve already got. Data must be evaluated, managed and governed, including detailed labeling and publishing. Those last two are the key to reuse, the holy grail of effective and efficient AI.
To use a cooking analogy, making data AI ready is more than just putting together a tossed salad. In the kitchen, raw ingredients need to be prepared for specific recipes. Potatoes might need to be sliced, diced or grated depending on what you’re making. But before you even get to that step, you need to find the potatoes. You’ll also likely need to clean them. And you’ll need to find the other ingredients that go with them. Ingredients must also be labeled: You wouldn’t want to mistake the sugar for the salt or the smoky paprika for the spicy cayenne pepper.
AI-ready data is like those prepared ingredients, ready to be baked into the AI model. At the recent Snowflake Summit, we announced features of the AI Data Cloud that will address the key characteristics of AI-ready data. Here are a few:
- Quality: It goes without saying that the best chefs use quality ingredients. Snowflake allows customers to define quality standards (for example, freshness, duplication and custom measures) and monitor them with Data Metric Functions. Because monitoring runs as data changes, teams get immediate feedback on queries and modified data, making quality checks continuous rather than periodic. Snowflake Cortex AI can also be used to automate data cleansing, detect anomalies, standardize data sets and even suggest missing values, reducing manual effort and improving consistency (see the DMF sketch after this list).
- Diversity: The pantry should be stocked with a good variety. Similarly, data diversity helps ensure quality AI outcomes. Snowflake enables customers to store, analyze and apply AI to diverse types of data. Support for open source formats lets customers tap an even broader range of data, including data sitting outside of Snowflake environments. Expanding training data with partner data or data acquired from external providers adds further diversity. And synthetic data generation can make sensitive data more accessible or balance representation where the parameters of the missing data are known (see the synthetic-data sketch after this list).
- Freshness: Of course, you want your ingredients fresh. There’s nothing worse than discovering that a key component of your dish has lost its flavor. Having access to data where it resides has always been a strong value proposition of the Snowflake platform, giving AI models access to the most pertinent and timely information. And Snowpipe’s continuous data ingestion service automates data loading, making data available for analysis as soon as it arrives (see the Snowpipe sketch after this list).
- Governance: Even chefs might want certain ingredients under lock and key, like during truffle season. The new Snowflake governance features available through Snowflake Horizon enable exactly that: access and use rights can be defined granularly and strictly enforced with features such as role-based access control, data masking, object tagging and auditing (see the masking-policy sketch after this list). Snowflake’s strategy is to bring the AI models to the data within its secure environment, rather than moving sensitive enterprise data out, significantly reducing the security and governance risks associated with external AI tools. It’s like making sure the cooking is done in your kitchen rather than taking your ingredients down to the neighbor’s house.
- Discovery: Obviously chefs need to be able to find their ingredients, ideally in labeled containers. And they want to know composition and origin, in as much detail as possible. Think of the nutrition facts as metadata. A data catalog, such as Snowflake Horizon Catalog, provides an inventory of data assets with metadata, context and accessibility details, making data easier to find and understand. Snowflake’s Snowsight interface offers autocomplete, automatic data profiling, visualizations and dashboards for rapid data exploration. And Snowflake Marketplace facilitates easy discovery of and access to diverse data sets and prebuilt applications, covering both internal data and external sources (see the metadata-search sketch after this list). That makes for a pretty impressive kitchen for any chef.
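To make the quality point concrete, here is a minimal sketch, in Python with Snowpark, of attaching system Data Metric Functions to a table and letting Cortex suggest a standardized value. The connection details, table, columns and model name are assumptions for illustration, not the article's own code; check current Snowflake documentation for exact syntax.

```python
# Hedged sketch: scheduling system Data Metric Functions (DMFs) on a table.
# Connection details, table and column names are hypothetical placeholders.
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "role": "DATA_ENGINEER", "warehouse": "WH_XS",
}).create()

# DMFs run on the schedule set at the table level; here, whenever data changes.
session.sql(
    "ALTER TABLE raw.orders SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES'"
).collect()

# Freshness of the newest row, measured on a timestamp column.
session.sql(
    "ALTER TABLE raw.orders "
    "ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.FRESHNESS ON (updated_at)"
).collect()

# Duplicate keys, a natural candidate for alerting thresholds.
session.sql(
    "ALTER TABLE raw.orders "
    "ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.DUPLICATE_COUNT ON (order_id)"
).collect()

# Cortex cleanup suggestion (model availability varies by region/account).
fixed = session.sql(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', "
    "'Return the ISO 3166 country name for: U.S. of A.')"
).collect()[0][0]
```

Note that the schedule set at the table level governs when all of the table's attached metrics run.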
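The synthetic-data point in the diversity bullet can be illustrated generically. The sketch below is plain Python with NumPy, not a Snowflake feature; the counts and distribution parameters are invented to show the idea of topping up an under-represented segment when its parameters are known.

```python
# Toy example: balance representation by drawing synthetic rows from a
# known distribution. Counts and parameters here are invented.
import numpy as np

rng = np.random.default_rng(seed=7)

def synthesize_rows(n: int, mean: float, std: float) -> np.ndarray:
    """Draw n synthetic feature values from a known normal distribution."""
    return rng.normal(loc=mean, scale=std, size=n)

majority_count, minority_count = 10_000, 1_200
# Top up the minority segment so both groups are equally represented.
synthetic = synthesize_rows(majority_count - minority_count, mean=72.0, std=8.5)
```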
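For the freshness bullet, a Snowpipe definition looks roughly like the sketch below. The stage, table and file format are placeholders, and AUTO_INGEST assumes cloud-storage event notifications have been configured.

```python
# Hedged sketch: a Snowpipe that continuously loads files from a stage.
# The stage, table and file format names are hypothetical.
from snowflake.snowpark import Session

def create_orders_pipe(session: Session) -> None:
    session.sql("""
        CREATE PIPE IF NOT EXISTS ingest.orders_pipe
          AUTO_INGEST = TRUE
        AS
          COPY INTO raw.orders
          FROM @ingest.orders_stage
          FILE_FORMAT = (TYPE = 'JSON')
    """).collect()
```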
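For governance, the "lock and key" can be expressed as tags, masking policies and role checks. A hedged sketch follows, with invented schema, table, role and tag names; it is one possible arrangement, not a prescribed pattern.

```python
# Hedged sketch: tag a sensitive column and mask it for most roles.
# Schema, table, role and tag names are hypothetical.
from snowflake.snowpark import Session

def protect_email_column(session: Session) -> None:
    # Tag the column so governance tooling can discover sensitive data.
    session.sql("CREATE TAG IF NOT EXISTS governance.pii_level").collect()
    session.sql(
        "ALTER TABLE raw.customers MODIFY COLUMN email "
        "SET TAG governance.pii_level = 'high'"
    ).collect()

    # Mask the value for everyone except a privileged role (an RBAC check).
    session.sql("""
        CREATE MASKING POLICY IF NOT EXISTS governance.email_mask
          AS (val STRING) RETURNS STRING ->
          CASE WHEN CURRENT_ROLE() = 'DATA_STEWARD' THEN val
               ELSE '***MASKED***' END
    """).collect()
    session.sql(
        "ALTER TABLE raw.customers MODIFY COLUMN email "
        "SET MASKING POLICY governance.email_mask"
    ).collect()
```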
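Finally, discovery ultimately rests on metadata. One hedged way to "read the labels" is to search column names and comments in the account metadata views, as below; the keyword and filters are illustrative, and ACCOUNT_USAGE views can lag real time.

```python
# Hedged sketch: find candidate "ingredients" by searching column metadata.
# ACCOUNT_USAGE data is not instantaneous; for fresher, per-database results,
# INFORMATION_SCHEMA is an alternative.
from snowflake.snowpark import Session

def find_columns(session: Session, keyword: str) -> None:
    # Do not interpolate untrusted input into SQL like this in production.
    rows = session.sql(f"""
        SELECT table_schema, table_name, column_name, comment
        FROM snowflake.account_usage.columns
        WHERE deleted IS NULL
          AND (column_name ILIKE '%{keyword}%' OR comment ILIKE '%{keyword}%')
    """).collect()
    for r in rows:
        print(r["TABLE_SCHEMA"], r["TABLE_NAME"], r["COLUMN_NAME"], r["COMMENT"])
```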
The bottom line: AI-ready data isn't just a nice-to-have. If you want effective and efficient AI, your models have to be well trained, and relevant, clean data makes them perform better. When your data is easy to find and easy to understand, you spend less time prepping it. Think of the well-organized kitchen, with tins of flour, sugar and salt; a spice rack with jars labeled and dated; and a refrigerator stocked with fresh ingredients. With data like that, you can build, launch and scale AI initiatives much faster, and drive data reuse across multiple projects.
AI-ready data rarely comes “off the shelf”
AI-ready data doesn’t just happen. There is rarely a prepackaged, off-the-shelf version. You might get lucky and find some. But if you develop good data practices, you can create your own internal market where teams can find the ingredients they need.
The responsibility for AI-ready data doesn’t lie with a single individual or department; it’s a shared, cross-functional effort with multiple stakeholders across an organization, from leadership to technical teams, data owners and the people who will use it. In kitchen parlance, it takes a brigade, from the chef down.
Rather than focusing on new roles, focus on the responsibilities, that is, what you need to accomplish versus whom you need to hire. Here's an overview of what you might need:
- Supportive executives are critical to the success of AI initiatives as those initiatives scale across an organization. Executives define the overarching business goals to which an AI and data strategy must align. They allocate the budget, personnel and tech infrastructure needed to pursue those goals, and they champion a culture that promotes the effective and responsible use of data and AI. Executives are ultimately accountable for what happens in their organizations, whether it’s the profits reported at the end of the quarter or the data breach that happened over the weekend. An executive steering committee for AI should keep leadership informed and involved.
- Data leadership (the CDO or the most senior data leader), while part of the executive steering committee, is responsible for defining and implementing data strategy, policy and procedures for ensuring data quality, security and accessibility. The CDO or equivalent works with other business units to establish clear roles and responsibilities for data ownership and stewardship and to develop the guidelines for managing the data lifecycle — from collection through storage, processing and use. The CDO role varies across companies but should serve as executive chef (returning to the kitchen analogy), even if tasks are distributed to business units. The CDO will head a data council to coordinate policy, requirements and use.
- Data ownership and stewardship exists within the specific business units most familiar with a particular data set. Ownership implies accountability; stewardship is responsibility for the accuracy, completeness and consistency of the data. People in these roles ensure the data is properly curated (collected, documented and maintained according to established governance policies) and that their domain’s data complies with relevant regulations and internal policies. These are the line cooks. Smaller or more centralized organizations keep data ownership and stewardship within a single data team, but at scale central teams become a bottleneck. The tasks, however, do not need to be distributed equally across all business units; hybrid ownership and stewardship remains common.
- Platform and data engineering tasks (building and maintaining the data infrastructure, pipelines and platforms that collect, store, process and make data accessible to AI models) often remain within IT. However, the people in these roles collaborate to integrate data from disparate sources, ensure consistency and interoperability, and implement controls for data security, access management and privacy. Data engineering work can also be distributed.
- Compliance, legal and ethics reviews are typically performed by specific expert teams. They serve a consultative role to ensure that all data practices, especially concerning sensitive or personal information, comply with relevant data privacy regulations (such as GDPR or CCPA) and emerging AI regulations (such as the EU AI Act). Some companies, such as Salesforce, have an ethics office that oversees AI use across product teams and customers. They develop frameworks for identifying and mitigating bias in data and AI models and monitor use for fairness, transparency and accountability.
- Data scientists and AI/ML engineers are roles, not tasks, and deserve to be called out as such. As the primary consumers of AI-ready data, they are responsible for articulating specific data needs for their AI models (for example, volume, variety, relevance and labeling requirements). They analyze data for quality issues, biases and suitability for AI training and provide feedback to data owners and governance teams on data quality, accessibility and gaps that need to be addressed to improve model performance.
While these roles and responsibilities are important, an effective AI program includes a collaborative, cross-functional working group that coordinates requirements and shares plans and practices. Each participant understands their own part in the data lifecycle but is also responsible for facilitating reuse as a means of leverage, scale and greater efficiency. Distribution of roles and responsibilities doesn’t mean a free-for-all; it requires coordination to ensure effective AI. Likewise, in a professional kitchen, the executive chef oversees the entire operation, but each role collaborates with clear communication, timing and teamwork to ensure the line cooks can serve their dishes with precision.
Remember, though, that one size does not fit all. Some business units might have more autonomy than others. Not all responsibilities will require new roles to be defined or headcount to be allocated. At a recent Snowflake roundtable, one customer argued that each data product required three new roles. Not everyone agreed. It’s more important to inventory the responsibilities. Some might be allocated across existing roles. The challenge then is to provide incentives to encourage those in existing roles to take on new tasks, or to replace existing tasks with new, more efficient ways of working. Start small with low-hanging fruit to demonstrate new ways of working and drive change.