Samyam Rajbhandari

Principal AI Architect, Snowflake
Samyam Rajbhandari is an expert on AI systems. He currently leads the inference optimization efforts at Snowflake, responsible for the development of technologies like SwiftKV for reducing the cost and latency of inference. Prior to his current role, he co-led the design and development of Snowflake Arctic, an innovative foundation model with a unique MoE architecture, capable of achieving state-of-the-art enterprise intelligence with best-in-class cost efficiency at the time of release. Before Snowflake, Samyam was a co-founder and the system architect of DeepSpeed at Microsoft, where he worked on developing high-performance infrastructure for accelerating large-scale deep learning training and inference on parallel and distributed systems. He designed systems such as ZeRO and 3D parallelism that have been adopted by many DL frameworks and have become staple engines for training large language models, including Meta Llama and many other LLMs. On the inference front, he led the effort to optimize the inference systems for OpenAI's DALL·E 2 and DALL·E 3 models to reduce latency and cost and improve capacity. He has also designed fast systems and led optimization efforts that have been released as part of DeepSpeed-Inference/MII and are used in products such as Bing, Ads, and AzureML within Microsoft. Samyam received his PhD in High Performance Computing from The Ohio State University.

MORE POSTS FROM Samyam Rajbhandari

Gen AI

Inside Snowflake Intelligence: Five Pillars of Enterprise-Grade Agentic AI

Explore the underlying architecture, orchestration, and system-level optimizations behind Snowflake Intelligence, a production-grade agentic AI system built for enterprise reasoning.
JUN 03, 2025|13 min read
Gen AI

Arctic Inference with Shift Parallelism: The Fastest Open Source Inference System for Enterprise AI

Built by Snowflake AI Research, Arctic Inference uses Shift Parallelism, SwiftKV, and speculative decoding to power the fastest open-source enterprise AI.
MAY 29, 2025|15 min read
Gen AI

Scaling vLLM for Embeddings: 16x Throughput and Cost Reduction

Learn how we increased embedding throughput 3x in Snowflake Cortex—and 16x vs. vLLM—through smarter serialization, tokenization, and GPU optimization.
MAY 29, 2025|8 min read
Gen AI

Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Arctic Ulysses)

Ulysses, a novel sequence parallelism technique, boosts long-context LLM inference performance with 3.4x lower latency and better GPU efficiency.
APR 03, 2025|14 min read
Product and Technology

SwiftKV from Snowflake AI Research Reduces Inference Costs of Meta Llama LLMs up to 75% on Cortex AI

SwiftKV optimizes Meta Llama LLMs on Snowflake Cortex AI, reducing inference costs by up to 75% while maintaining accuracy for enterprise AI solutions.
JAN 16, 2025|5 min read