
Meta’s Llama 4 Large Language Models Now Available on Snowflake Cortex AI

At Snowflake, we are committed to providing our customers with industry-leading LLMs. We’re pleased to bring Meta’s latest Llama 4 models to Snowflake Cortex AI! 

Llama 4 models deliver performant inference so customers can build enterprise-grade generative AI applications and deliver personalized experiences. The Llama 4 Maverick and Llama 4 Scout models can be accessed within the secure Snowflake perimeter on Cortex AI. According to Meta, Llama 4 Scout is the best multimodal model in its class and supports an industry-leading context window of up to 10M tokens, and the Llama 4 models are trained on large amounts of unlabeled text, image and video data for rich end-user experiences. The models are designed for native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone, a design that accommodates a wide range of use cases and lets developers build enterprise-grade AI applications.

Faster, higher-quality inference with a Mixture of Experts (MoE) architecture

Llama 4 models are the first from Meta to use an MoE architecture, in which each token activates only a fraction of the model's total parameters. As a result, MoE models are more compute efficient for both training and inference and can deliver higher-quality results than comparably sized dense architectures. Within Snowflake, Llama 4 Maverick and Llama 4 Scout can be integrated into gen AI applications; a toy sketch of MoE routing follows the model summaries below.

  • Llama 4 Maverick offers industry-leading performance in image and text understanding, with support for 12 languages to bridge language barriers. A general-purpose LLM with 17 billion active parameters (400 billion total), it delivers higher-quality inference than Llama 3.3 70B. The model is well suited for precise image understanding and creative writing, providing state-of-the-art intelligence at high speed and optimized for response quality on tone and refusals.

  • Llama 4 Scout is a smaller general-purpose model with 17 billion active parameters (109 billion total parameters) and supports an industry-leading context window size of 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases. 
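
To make the MoE routing idea concrete, here is a minimal, illustrative top-k gating sketch in plain Python with NumPy. It is not Meta's implementation; the expert count, top-k value and dimensions are arbitrary values chosen for readability.

# Toy top-k mixture-of-experts (MoE) routing -- illustrative only, not Meta's implementation.
# For each token, a gating network scores every expert, but only the top_k experts run,
# so the number of active parameters is a fraction of the total parameter count.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2                          # arbitrary sizes for the sketch
W_gate = rng.normal(size=(d_model, n_experts))                # gating network weights
W_experts = rng.normal(size=(n_experts, d_model, d_model))    # one weight matrix per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (n_tokens, d_model) -> (n_tokens, d_model); each token uses only top_k experts."""
    logits = x @ W_gate                                       # (n_tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]          # indices of the top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, chosen[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                              # softmax over the selected experts only
        for w, e in zip(weights, chosen[t]):
            out[t] += w * (x[t] @ W_experts[e])               # run only the selected experts
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): same output shape, ~top_k/n_experts of the expert compute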

Snowflake’s commitment to open source

Meta’s open-source Llama models have empowered enterprises to create unique AI experiences. At Snowflake, we’re leveraging these models within Cortex AI to build tailored solutions that meet evolving business needs. Customers can use Llama models to power AI agents that handle complex tasks and integrate with tools like Cortex Analyst and Cortex Search, unlocking the full value of their data on a single platform.

"As the largest travel guidance platform in the world, TripAdvisor helps over 450 million travelers make the best of their trips each month. Through harnessing Llama models in Snowflake, we’ve been able to provide those travelers with highly relevant, personalized recommendations for their trips, while simultaneously driving more engagement and revenue for our business. Our team is excited to start using Llama 4 models in Cortex AI to push the boundaries of what we can achieve in travel personalization and user experience."

— Rahul Todkar
Head of Data and AI, TripAdvisor.

Our AI Research team has been actively developing cutting-edge technologies on top of these Llama models. For example, Arctic Ulysses is a novel technology optimized for low-latency, high-throughput inference on long-sequence tasks. SwiftKV, another recent innovation built on Meta’s Llama models and available in Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B, reduces Llama inference costs on Cortex AI by up to 75% compared to the baseline Meta Llama models that are not SwiftKV optimized. This translates directly into cost savings and improved performance for our customers, enabling scalable deployment of generative AI initiatives. By optimizing the prefill stage of inference, SwiftKV ensures efficient processing of lengthy input prompts, a critical requirement for many enterprise applications.

Integrated access via SQL and Python

The Llama 4 series, now available in preview on Cortex AI, offers easy access through established SQL functions and standard REST API endpoints. Customers can bring Llama 4’s advanced inference capabilities into existing applications and data pipelines without complex integration procedures. The new Llama 4 models can be called with the COMPLETE function in Cortex AI:

SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama4-maverick',
    [{'role': 'user', 'content': CONCAT('Summarize this customer feedback in bullet points: <feedback>', content, '</feedback>')}],
    {'guardrails': true}
)
FROM my_table;
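
The same model can also be called from Python. The sketch below assumes the Complete helper from the snowflake-ml-python package and a Snowpark session; the connection parameters are placeholders you supply yourself.

# Minimal sketch: calling Llama 4 Maverick from Python via the snowflake.cortex module.
# Assumes the snowflake-ml-python and snowflake-snowpark-python packages are installed and
# that the placeholder connection parameters below are replaced with your own values.
from snowflake.snowpark import Session
from snowflake.cortex import Complete

session = Session.builder.configs({
    "account": "<account_identifier>",  # placeholder
    "user": "<user>",                   # placeholder
    "password": "<password>",           # placeholder
}).create()

response = Complete(
    "llama4-maverick",
    "Summarize this customer feedback in bullet points: <feedback>...</feedback>",
    session=session,
)
print(response)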

Integrated access via REST API

To enable services or applications running outside of Snowflake to make low-latency inference calls to Cortex AI, use the REST API. Here is an example:

curl -X POST \
    -H "Authorization: Bearer <jwt>" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d '{
    "model": "llama4-maverick",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in San Francisco?"
      }
    ],
    "max_tokens": 4096,
    "top_p": 1,
    "stream": true
    }' \
https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/inference:complete
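
For services already written in Python, the same request can be issued with the requests library. This is a sketch under the same assumptions as the curl example (a valid JWT and your account identifier), with streaming disabled for brevity.

# Sketch of the same REST call from Python with the requests library.
# <jwt> and <account_identifier> are placeholders, exactly as in the curl example above.
import requests

url = "https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/inference:complete"
headers = {
    "Authorization": "Bearer <jwt>",
    "Content-Type": "application/json",
    "Accept": "application/json",
}
payload = {
    "model": "llama4-maverick",
    "messages": [{"role": "user", "content": "What is the weather like in San Francisco?"}],
    "max_tokens": 4096,
    "top_p": 1,
    "stream": False,  # set True (and Accept: text/event-stream) to consume server-sent events
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())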

The trusted path to advanced inference capabilities

Snowflake is the only cloud data platform with native integration to premier models from both OpenAI and Anthropic, as well as others. By integrating Llama 4 into Snowflake Cortex AI, we are providing our customers with access to leading-edge AI models so they can build intelligent applications and data agents, all within the security, governance and unified environment of Snowflake. This powerful combination will enable enterprises to automate repetitive tasks, gain deeper insights from their data, and deliver more value to their customers.

Stay tuned for more updates on how you can start building the next generation of AI applications with Llama 4 on Snowflake Cortex AI.

Learn more

  • Join us at Summit 2025 to learn more about our latest AI innovations.

  • Get the guide to industry-leading AI and data use cases — download now.

  • Read more about Meta’s latest announcements here.
