
Meta’s Llama 4 Large Language Models Now Available on Snowflake Cortex AI

At Snowflake, we are committed to providing our customers with industry-leading LLMs. We’re pleased to bring Meta’s latest Llama 4 models to Snowflake Cortex AI! 

Llama 4 models deliver performant inference so customers can build enterprise-grade generative AI applications and deliver personalized experiences. The Llama 4 Maverick and Llama 4 Scout models can be accessed within the secure Snowflake perimeter on Cortex AI. According to Meta, Llama 4 Scout is the best multimodal model in the world in its class and supports an industry-leading context window of up to 10M tokens, and the models are trained on large amounts of unlabeled text, image and video data for rich end-user experiences. The models are designed for native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone, a design that accommodates a wide range of use cases and developer needs for building enterprise-grade AI applications.
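
For readers curious what "early fusion" means in practice, here is a small, purely illustrative Python sketch (toy dimensions, not Meta's implementation): text tokens and image patches are embedded separately and then concatenated into a single sequence that one shared backbone processes.

# Purely illustrative sketch of early fusion with toy sizes; not Meta's implementation.
# Text tokens and image patches are embedded separately, then concatenated into one
# sequence that a single shared backbone would process.
import numpy as np

rng = np.random.default_rng(1)
D_MODEL = 64

text_embeddings = rng.standard_normal((1000, D_MODEL)) * 0.02          # toy vocabulary of 1,000 tokens
patch_projection = rng.standard_normal((16 * 16 * 3, D_MODEL)) * 0.02  # projects flattened 16x16 RGB patches

text_ids = np.array([5, 42, 7])                        # e.g., "describe this image"
image_patches = rng.standard_normal((4, 16 * 16 * 3))  # four flattened patches from one image

text_tokens = text_embeddings[text_ids]                # (3, D_MODEL)
vision_tokens = image_patches @ patch_projection       # (4, D_MODEL)

# Early fusion: one interleaved sequence feeds the same model backbone.
fused_sequence = np.concatenate([vision_tokens, text_tokens], axis=0)
print(fused_sequence.shape)  # (7, 64)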

Faster, high-quality inference with a Mixture of Experts (MoE) architecture

The Llama 4 models are the first from Meta to use an MoE architecture, in which a single token activates only a fraction of the total parameters. As a result, MoE architectures are more compute-efficient for both model training and inference, and they deliver higher-quality inference than dense architectures of comparable cost. Within Snowflake, Llama 4 Maverick and Llama 4 Scout can be integrated into gen AI applications.

  • Llama 4 Maverick offers industry-leading performance in image and text understanding, with support for 12 languages to bridge language barriers. As a general-purpose LLM, Llama 4 Maverick contains 17 billion active parameters (400 billion total parameters), offering higher-quality inference than Llama 3.3 70B. The model is well suited for precise image understanding and creative writing, providing state-of-the-art intelligence at high speed and optimized for response quality on tone and refusals.

  • Llama 4 Scout is a smaller general-purpose model with 17 billion active parameters (109 billion total parameters) and supports an industry-leading context window size of 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases. 
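
To make the distinction between active and total parameters concrete, here is a small, purely illustrative Python sketch of MoE routing (toy sizes and a simple top-k router; it is not Meta's implementation). Each token is scored against a pool of experts, and only the few experts it is routed to actually run.

# Purely illustrative sketch of mixture-of-experts (MoE) routing with toy sizes;
# not Meta's implementation. Each token activates only TOP_K of the experts, so
# only a fraction of the layer's total parameters does work per token.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # experts in this toy MoE layer
TOP_K = 2         # experts activated per token
D_MODEL = 64      # hidden size

# Each expert is a small feed-forward block; together they hold most of the parameters.
experts = [
    (rng.standard_normal((D_MODEL, 4 * D_MODEL)) * 0.02,
     rng.standard_normal((4 * D_MODEL, D_MODEL)) * 0.02)
    for _ in range(NUM_EXPERTS)
]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02  # routing weights

def moe_layer(token):
    """Route one token to its TOP_K experts and combine their outputs."""
    scores = token @ router                      # affinity of this token to each expert
    top = np.argsort(scores)[-TOP_K:]            # only these experts run for this token
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    out = np.zeros_like(token)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(token @ w_in, 0.0) @ w_out)  # expert FFN (ReLU)
    return out

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # same output shape (64,), but only 2 of 8 experts ran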

Snowflake’s commitment to open source

Meta’s open-source Llama models have empowered enterprises to create unique AI experiences. At Snowflake, we’re leveraging these models within Cortex AI to build tailored solutions that meet evolving business needs. Customers can use Llama models to power AI agents that handle complex tasks and integrate with tools like Cortex Analyst and Cortex Search, unlocking the full value of their data on a single platform.

"As the world's largest travel guidance platform, Tripadvisor helps hundreds of millions of travelers make the best of their trips each month. Harnessing Llama models in Snowflake, has helped us provide those travelers with highly-relevant, personalized recommendations for their trips, while simultaneously driving more engagement and revenue for our business. Our team is excited to start using Llama 4 models in Cortex AI to push the boundaries of what we can achieve in travel personalization and user experience."

— Rahul Todkar
Head of Data and AI, Tripadvisor

Our AI Research team has been actively developing cutting-edge technologies on top of these Llama models. For example, Arctic Ulysses is a novel technology we developed that is optimized for low-latency, high-throughput inference and is particularly beneficial for long-sequence tasks. SwiftKV, another recent innovation built on Meta’s Llama models and available in Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B, reduces the inference costs of Llama LLMs on Cortex AI by up to 75% compared to the baseline Meta Llama models that are not SwiftKV optimized. This directly translates to tangible cost savings and improved performance for our customers, driving scalable deployment of generative AI initiatives. By optimizing the prefill stage of inference, SwiftKV ensures efficient processing of lengthy input prompts, a critical requirement for many enterprise applications.

Integrated access via SQL and Python

The Llama 4 models, now available in preview on Cortex AI, offer easy access through established SQL functions and standard REST API endpoints. Customers can bring Llama 4’s advanced inference capabilities into existing applications and data pipelines without complex integration procedures. The new Llama 4 models can be called using the simple COMPLETE function within Cortex AI.

SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama4-maverick',
    [{'role': 'user',
      'content': CONCAT('Summarize this customer feedback in bullet points:<feedback>', content, '</feedback>')}],
    {'guardrails': true}
)
FROM my_table;
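
The same call can be made from Python. Below is a minimal sketch using the Complete helper from the snowflake.cortex package; it assumes an active Snowpark session, that your installed package version exposes this helper, and that the model is available in your region.

# Minimal sketch: calling Llama 4 Maverick from Snowpark Python.
# Assumes an active Snowpark session and a snowflake.cortex version that
# exposes the Complete helper; model availability may vary by region.
from snowflake.cortex import Complete

feedback = "The new dashboard is great, but exports are slow."
summary = Complete(
    "llama4-maverick",
    f"Summarize this customer feedback in bullet points:<feedback>{feedback}</feedback>",
)
print(summary)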

Integrated access via REST API

To enable services or applications running outside of Snowflake to make low-latency inference calls to Cortex AI, use the REST API. Here is an example of what that looks like:

curl -X POST \
    -H "Authorization: Bearer <jwt>" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d '{
    "model": "llama4-maverick",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in San Francisco?"
      }
    ],
    "max_tokens": 4096,
    "top_p": 1,
    "stream": true
    }' \
https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/inference:complete
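
For services written in Python, the equivalent request can be issued with the requests library. The sketch below assumes a pre-generated JWT for key-pair authentication, your account identifier in place of the placeholder, and that the streamed response arrives as server-sent events (lines prefixed with "data:"), matching the curl example above.

# Sketch: calling the Cortex inference REST API from Python with streaming output.
# Assumes a valid JWT (key-pair auth) and your account identifier; the response
# is assumed to arrive as server-sent events, as in the curl example above.
import json
import requests

ACCOUNT = "<account_identifier>"
JWT = "<jwt>"

resp = requests.post(
    f"https://{ACCOUNT}.snowflakecomputing.com/api/v2/cortex/inference:complete",
    headers={
        "Authorization": f"Bearer {JWT}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    },
    json={
        "model": "llama4-maverick",
        "messages": [{"role": "user", "content": "What is the weather like in San Francisco?"}],
        "max_tokens": 4096,
        "top_p": 1,
        "stream": True,
    },
    stream=True,
)
resp.raise_for_status()

# Each event line carries a JSON chunk with a partial completion.
for line in resp.iter_lines():
    if line and line.startswith(b"data: "):
        chunk = json.loads(line[len(b"data: "):])
        print(chunk, flush=True)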

The trusted path to advanced inference capabilities

Snowflake is the only cloud data platform with native integration to premier models from both OpenAI and Anthropic, as well as others. By integrating Llama 4 into Snowflake Cortex AI, we are providing our customers with access to leading-edge AI models so they can build intelligent applications and data agents, all within the security, governance and unified environment of Snowflake. This powerful combination will enable enterprises to automate repetitive tasks, gain deeper insights from their data, and deliver more value to their customers.

Stay tuned for more updates on how you can start building the next generation of AI applications with Llama 4 on Snowflake Cortex AI.

Learn more

  • Join us at Summit 2025 to learn more about our latest AI innovations.

  • Get the guide to industry-leading AI and data use cases — download now.

  • Read more about Meta’s latest announcements here.
