Aurick Qiao

Senior Software Engineer
Aurick Qiao is an AI researcher at Snowflake and the former CEO of Petuum Inc. He holds a Ph.D. from Carnegie Mellon University and specializes in machine learning systems, distributed computing, and large language models.

MORE POSTS FROM Aurick Qiao

Gen AI

Accelerating PyTorch Innovation at Scale: Snowflake at PyTorch Conference 2025

How Snowflake tackles four core AI challenges: scaling deep learning, training thousands of models, accelerating inference, and balancing multilingual performance.
NOV 19, 2025|7 min read
Gen AI

Smarter, Faster and Snowflake-Native: Real-Time Text2SQL Behind Snowflake Intelligence

Discover Arctic-Text2SQL-R1.5, Snowflake's new, native Text-to-SQL model built to overcome LLM latency and deliver higher accuracy for real-time conversational analytics.
NOV 04, 2025|7 min read
Gen AI

Fast Reasoning on GPT-OSS with Speculative Decoding and Arctic Inference

Snowflake AI Research boosted GPT-OSS reasoning speed by 1.7–1.8x using Arctic Inference with speculative decoding, enabling faster agentic AI.
AUG 25, 2025|4 min read

Arctic Long Sequence Training (ALST): Scalable And Efficient Training For Multi-Million Token Sequences

Snowflake's ALST enables scalable training of long-context models with up to 15 million tokens using Hugging Face and DeepSpeed, all without custom modeling code.
JUN 24, 2025|10 min read
Gen AI

Arctic Inference with Shift Parallelism: The Fastest Open Source Inference System for Enterprise AI

Built by Snowflake AI Research, Arctic Inference uses Shift Parallelism, SwiftKV, and speculative decoding to power the fastest open-source enterprise AI.
MAY 29, 2025|15 min read
Gen AI

Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Arctic Ulysses)

Ulysses, a novel sequence parallelism technique, boosts long-context LLM inference performance with 3.4x lower latency and better GPU efficiency.
APR 03, 2025|14 min read
Product and Technology

SwiftKV from Snowflake AI Research Reduces Inference Costs of Meta Llama LLMs up to 75% on Cortex AI

SwiftKV optimizes Meta Llama LLMs on Snowflake Cortex AI, reducing inference costs by up to 75% while maintaining accuracy for enterprise AI solutions.
JAN 16, 2025|5 min read