Generative AI Architecture: Layers and Models
The data architecture that underpins generative AI (gen AI), from foundation models to large language models (LLMs), is highly sophisticated. Generative AI architecture refers to the underlying design and components that enable these complex models. In this article, we’ll lay out a high-level overview of gen AI architecture. We’ll also discuss four popular categories of models that may be included in a generative AI architecture, and explain their roles and importance. Then we’ll explore the capabilities of Snowflake Cortex AI, a fully managed service that enables users to easily build and deploy AI-powered applications using industry-leading LLMs while maintaining data security and governance.
Essential Elements of Generative AI Architecture
Behind each generative model is a complex, multilayered AI infrastructure. Although each model is purpose-built for its use case, generative AI architectures tend to follow a similar construct, consisting of five layers.
Data processing layer
The data processing layer is responsible for collecting, preparing and processing the information that the generative AI draws from when creating its outputs. In this layer, data may be gathered from various sources within the organization, as well as from external sources such as data partners and third-party data providers. The aggregated data is cleaned and normalized. Then, in a process called feature extraction, redundant or irrelevant data is removed so the model can focus on the most informative features.
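As a rough illustration of this layer, the sketch below uses pandas and scikit-learn (our own choice of tooling, with made-up column names and values) to aggregate data from two sources, clean and normalize it, and drop redundant features.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Stand-ins for data gathered from internal and external sources
internal = pd.DataFrame({"customer_id": [1, 2, 2], "spend": [120.0, 80.0, 80.0], "visits": [3, 5, 5]})
external = pd.DataFrame({"customer_id": [3, 4], "spend": [200.0, None], "visits": [7, 2]})

# Aggregate, then clean: drop duplicates and rows with missing values
raw = pd.concat([internal, external], ignore_index=True)
clean = raw.drop_duplicates().dropna()

# Normalize numeric features to a common 0-1 range
numeric_cols = ["spend", "visits"]
clean[numeric_cols] = MinMaxScaler().fit_transform(clean[numeric_cols])

# Feature extraction: drop near-constant (redundant) columns so downstream
# training focuses on the most informative signals
variances = clean[numeric_cols].var()
features = clean.drop(columns=list(variances[variances < 1e-4].index))
print(features)
```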
Generative model layer
At the generative model layer, the AI model is trained. Numerous models can be used for gen AI applications. The type or types of models needed will depend on the use case. In this layer, the model is trained, validated and fine-tuned to ensure it can generalize or apply the knowledge gained from the training data to new data.
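To make the train/validate/fine-tune pattern concrete, here is a minimal sketch using a simple scikit-learn classifier on synthetic data. It is not a generative model; it only illustrates holding out a validation split to check generalization and sweeping a hyperparameter as a stand-in for fine-tuning.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real training data
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out a validation split to check that the model generalizes to new data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# "Fine-tuning" here is a simple hyperparameter sweep; large generative models
# would instead continue training pretrained weights on task-specific data
best_score, best_model = 0.0, None
for c in (0.01, 0.1, 1.0, 10.0):
    candidate = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_val, candidate.predict(X_val))
    if score > best_score:
        best_score, best_model = score, candidate

print(f"validation accuracy of selected model: {best_score:.3f}")
```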
Feedback and improvement layer
Feedback is essential for optimizing the efficiency and accuracy of a model’s output. Information from user surveys and interaction analysis helps developers gauge how well the model is meeting user expectations. Feedback loops — specially designed algorithms that identify output errors and provide corrective inputs back into the model — help generative AI models learn from their mistakes.
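A very small sketch of how a feedback loop might begin, assuming hypothetical user ratings collected alongside each model output:

```python
# Hypothetical feedback records: each pairs a model output with a user rating
feedback = [
    {"prompt": "Summarize the Q3 report", "output": "...", "rating": 1},
    {"prompt": "Translate to French", "output": "...", "rating": 5},
]

# Identify low-rated outputs and turn them into corrective examples
corrections = [
    {"prompt": f["prompt"], "rejected": f["output"]}
    for f in feedback
    if f["rating"] <= 2
]

# These corrections would then feed back into fine-tuning or preference
# training so the model learns from its mistakes
print(f"{len(corrections)} outputs flagged for corrective retraining")
```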
Deployment and integration layer
Before a gen AI model is deployed, the infrastructure for supporting it in a production environment must be set up. This includes provisioning specialized computing resources, model serving and data infrastructure, security and access controls, and other components. The generative model must also be properly fitted into the application’s frontend and backend systems to ensure the finished product is performing as intended.
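As one hedged example of the integration piece, the sketch below exposes a generative model behind a small FastAPI endpoint that a frontend could call. The framework choice and the placeholder generate_text function are our own assumptions, not a prescribed stack.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str

def generate_text(prompt: str) -> str:
    # Placeholder for the actual call to the deployed generative model
    return f"echo: {prompt}"

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # Backend endpoint the application's frontend can call
    return {"completion": generate_text(req.prompt)}

# Run locally with: uvicorn module_name:app --reload
```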
Monitoring and maintenance layer
The gen AI architecture supports ongoing monitoring and improvement. After deployment, performance metrics such as accuracy, precision and recall must be tracked to ensure the system is producing accurate, reliable outputs. With the introduction of new data or changes in performance requirements, generative models may require retraining or updating to ensure they remain fit-to-purpose. As usage increases, additional resources may also need to be provisioned.
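A minimal sketch of tracking those metrics, assuming reviewers label each output as acceptable (1) or not (0) and an automated quality check produces predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: reviewer judgments (y_true) vs. automated checks (y_pred)
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))

# A sustained drop in these metrics after new data arrives is a signal
# that the model may need retraining or updating
```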
Generative AI Models
A generative AI architecture can incorporate several types of models. Although not an exhaustive list, the models below are used in many generative AI systems.
Large language models
Trained on vast amounts of text data to learn patterns and relationships in human language, large language models are used to generate text, answer questions and converse with human users. They’re often used for language translation, creating summaries from large documents and content creation. LLMs are built using a type of neural network called a transformer model, and they are an integral part of many AI applications, including generative systems.
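For a quick taste of transformer-based text generation, the sketch below uses the Hugging Face transformers library with a small open model (gpt2, chosen purely for illustration; production systems would typically call a much larger hosted LLM).

```python
from transformers import pipeline

# Small open model used purely for illustration
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Generative AI architectures are organized into layers that",
    max_new_tokens=40,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```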
Variational autoencoders
A variational autoencoder (VAE) is a gen AI algorithm that leverages an encoder-decoder architecture to learn the underlying probability distribution of a data set. This type of generative model is useful for generating images and synthetic data. It’s also used in anomaly detection.
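The sketch below is a minimal PyTorch VAE for flattened 28x28 images (sizes and layer widths are arbitrary choices). It shows the encoder-decoder structure, the reparameterization trick used to sample from the learned distribution, and how new samples are generated by decoding random latent vectors.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal variational autoencoder for flattened 28x28 images."""

    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.to_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

# Generate synthetic samples by decoding random latent vectors
model = VAE()
samples = model.decoder(torch.randn(4, 16))
```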
Generative adversarial networks
A generative adversarial network (GAN) is a type of deep-learning AI architecture composed of two opposing neural networks: the generator and the discriminator. The generator creates fake data samples in an attempt to fool the discriminator. Through the course of many training iterations, the generator learns to create increasingly realistic samples while the discriminator becomes more skilled at distinguishing real data from the fake data. By setting one neural network against the other, GANs excel at producing realistic images and video, and are frequently used in developing video games and other multimedia applications.
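A toy PyTorch training loop makes the adversarial setup concrete. The "real" data here is just a shifted Gaussian in 2D; the point is the alternation between the discriminator and generator updates.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 32, 2  # toy 2-D data for illustration

generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0      # stand-in "real" data
    fake = generator(torch.randn(64, latent_dim))

    # Discriminator: learn to label real samples 1 and fake samples 0
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: learn to fool the discriminator into labeling fakes as real
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```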
Diffusion models
Diffusion models work by gradually corrupting training data through the addition of Gaussian noise, a type of continuous probability distribution popular in machine learning. Once the data has been reduced to nearly pure noise, the model learns to recover the original data by reversing the noising process. Diffusion models are used to create high-quality images, audio and 3D data without the use of adversarial training.
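The forward (noising) half of that process can be written in a few lines. The sketch below uses a standard linear noise schedule; a denoising network would be trained to predict the added noise, and sampling would reverse the process step by step from pure noise.

```python
import torch

# Linear noise schedule: beta_t controls how much Gaussian noise is added at step t
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Forward (noising) process: sample x_t given x_0 in closed form."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return x_t, noise

x0 = torch.randn(4, 3, 32, 32)            # stand-in for training images
x_t, target_noise = add_noise(x0, t=500)  # the model learns to predict target_noise
```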
Realizing the Potential of Gen AI with Snowflake Cortex AI
Organizations across industries are exploring and adopting gen AI to drive innovation, enhance efficiency, and create new products and services while maintaining security and governance. To accelerate their gen AI initiatives, teams can use Snowflake Cortex AI, an intelligent, fully managed service that delivers gen AI solutions to data directly in Snowflake.
Better-performing LLMs
Snowflake Cortex AI supports Snowflake Arctic, an efficient and truly open model that excels at enterprise tasks like SQL generation, coding and instruction following. Available under the Apache 2.0 license, Arctic provides ungated access to weights and code. Cortex AI users can also access other high-performing LLMs such as Llama 3 (both 8B and 70B), Reka-Core LLMs and Gemma. In addition, models from Mistral AI, AI21 Labs and NVIDIA can all be accessed from Cortex. Each model is accessible via the easy-to-use COMPLETE function, enabling enterprises to quickly extend access and governance policies to these models.
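As a minimal sketch of calling the COMPLETE function from Python via Snowpark (connection parameters are placeholders, and 'snowflake-arctic' is used here simply as an example model name):

```python
from snowflake.snowpark import Session

# Fill in your own account details
session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
}).create()

# Call the Cortex COMPLETE function with a chosen model and a prompt
row = session.sql(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic', "
    "'Summarize the benefits of governed data for AI.') AS response"
).collect()[0]

print(row["RESPONSE"])
```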
Efficient RAG and semantic search
To accurately answer business questions using LLMs, companies must augment pretrained models with their data. Retrieval-augmented generation (RAG) is a popular solution to this problem, as it incorporates factual, real-time data from semantic search into LLM response generation.
Snowflake customers can now effortlessly test and evaluate RAG-oriented use cases, such as document chat experiences, with Cortex AI’s fully integrated solution. With all of this natively built into the Snowflake platform, there is no need to set up, maintain and govern a separate vector store. This cohesive experience accelerates the path from idea to implementation and broadens the range of use cases that organizations can support.
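To show what RAG means in practice, here is a framework-agnostic sketch: documents are embedded, the most similar passages are retrieved for a question, and those passages are placed into the prompt so the LLM answers from real data. The embed function is a random placeholder standing in for a real embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

documents = [
    "Q3 revenue grew 12% year over year.",
    "The new warehouse opened in March.",
    "Support tickets dropped 8% after the release.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(question: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the question embedding
    q = embed(question)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How did revenue change in Q3?"
context = "\n".join(retrieve(question))

# The retrieved passages ground the LLM's answer in factual, current data
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```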
Improved AI safety
At Snowflake, we prioritize maintaining high safety standards for gen AI applications. Through our ongoing partnership with Meta, Llama Guard, an LLM-based input-output safeguard model, is natively integrated with Snowflake Arctic to proactively filter potentially harmful content from LLM prompts and responses. Snowflake Arctic, combined with Llama Guard, minimizes objectionable content in your gen AI applications, helping ensure a safer user experience for all. We also plan to offer customers the ability to use Llama Guard with other models in Cortex AI soon.
Reka Core
Reka Core is a state-of-the-art multimodal LLM with a comprehensive understanding of images, video and audio, along with text. Snowflake Cortex AI currently supports the text modality, with multimodal support expected in the near future.
Enhanced data privacy and security
Data security is key to productionizing enterprise-grade gen AI applications. Snowflake is committed to industry-leading standards of data security and privacy, enabling enterprise customers to protect their most valuable asset, their data, throughout its journey in the AI lifecycle, from ingestion to inference. Here are several important safeguards that are in place to ensure customer data remains private and secure:
Snowflake does not use customer data to train any LLM to be used across customers. LLMs run inside Snowflake.
Data never leaves the Snowflake service boundary or gets shared with any third-party provider.
Role-based access control (RBAC) can be used to manage access to Cortex LLM functions (a minimal example follows this list).
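As a hedged sketch of that last point, the snippet below grants the SNOWFLAKE.CORTEX_USER database role to a custom role via Snowpark, so only members of that role can call Cortex LLM functions. The connection details and the analyst_role name are placeholders.

```python
from snowflake.snowpark import Session

# Placeholder connection details for an administrator session
session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<admin_user>",
    "password": "<password>",
    "role": "ACCOUNTADMIN",
}).create()

# Grant the Cortex database role to a custom role so that access to
# Cortex LLM functions is limited to members of that role
session.sql(
    "GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst_role"
).collect()
```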
Building Your Next-Generation Data Architecture on Snowflake
Snowflake Cortex AI is easy, efficient and trusted, bringing generative AI securely to governed data. Because it’s offered as a fully managed service, teams can focus their full attention on building AI applications while Snowflake handles model optimization and GPU infrastructure to deliver cost-effective performance.