Snowflake AI Research

Snowflake AI Research

We are a team with extensive experience building systems and technology that have significantly reduced the cost of LLM training and inference. A lot of our work has been open-sourced to provide the AI community with more accessible and cost-effective LLMs. The team includes many specialists in natural language processing and search. With the help of thousands of engineers worldwide at Snowflake, our cutting-edge technology powers enterprise AI products in Cortex AI and more. Check out what we're working on: https://www.snowflake.com/en/product/ai/ai-research/

MORE POSTSFROM Snowflake AI Research

Gen AI

Inside Snowflake Intelligence: Five Pillars of Enterprise-Grade Agentic AI

Explore the underlying architecture, orchestration, and system-level optimizations behind Snowflake Intelligence, a production-grade agentic AI system built for enterprise reasoning.
|||||||
JUN 03, 2025|13 min read
Gen AI

Smaller Models, Smarter SQL: Arctic-Text2SQL-R1 Tops BIRD and Wins Broadly

A deep dive into how Snowflake AI built Arctic-Text2SQL-R1 using simple rewards, strong reasoning, and a scalable approach to real-world SQL generation.
||
MAY 29, 2025|14 min read
Gen AI

Arctic Inference with Shift Parallelism: The Fastest Open Source Inference System for Enterprise AI

Built by Snowflake AI Research, Arctic Inference uses Shift Parallelism, SwiftKV, and speculative decoding to power the fastest open-source enterprise AI.
||||||||
MAY 29, 2025|15 min read
Gen AI

Scaling vLLM for Embeddings: 16x Throughput and Cost Reduction

Learn how we increased embedding throughput 3x in Snowflake Cortex—and 16x vs. vLLM—through smarter serialization, tokenization, and GPU optimization.
||||||
MAY 29, 2025|8 min read
Gen AI

Fastest Speculative Decoding in vLLM with Arctic Inference and Arctic Training

How we enhanced speculative decoding to get 4x faster end-to-end task completion for LLM agents and up to 2.8x faster decoding for conversational, interactive and coding workloads.
MAY 01, 2025|18 min read
Gen AI

Evaluating Multimodal vs. Text-Based Retrieval for RAG with Snowflake Cortex

Discover how multimodal retrieval on Snowflake Cortex transforms enterprise PDF search, enhancing accuracy and speed across complex document formats.
|||
APR 21, 2025|8 min read
Gen AI

Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Arctic Ulysses)

Ulysses, a novel sequence parallelism technique, boosts long-context LLM inference performance with 3.4x lower latency and better GPU efficiency.
|||||
APR 03, 2025|14 min read
Gen AI

Think. Execute. Excel: Arctic Text2SQL with Execution-Guided CoT

Learn how Snowflake’s ExCoT optimizes Text2SQL with execution-guided CoT and DPO, setting a new benchmark in natural language to SQL accuracy.
||||
APR 02, 2025|10 min read
Gen AI

Snowflake Arctic Embed Joins ArcticTraining: Simple And Scalable Embedding Model Training

Arctic Embed now merges with ArcticTraining, giving developers open access to core training code for building efficient frontier embedding models.
|
MAR 25, 2025|10 min read

Previous

1

2

3

4

Next

Where Data Does More

  • 30-day free trial
  • No credit card required
  • Cancel anytime