Samyam Rajbhandari

Principal AI Architect, Snowflake
Samyam Rajbhandari is an expert on AI systems. He currently leads the inference optimization efforts at Snowflake, responsible for the development of technologies like SwiftKV for reducing the cost and latency of inference. Prior to his current role, he co-led the design and development of Snowflake Arctic, an innovative foundation model with a unique MoE architecture, capable of achieving state-of-the-art enterprise intelligence with best-in-class cost efficiency at the time of release. Before Snowflake, Samyam was a co-founder and the system architect of DeepSpeed at Microsoft, where he worked on developing high-performance infrastructure for accelerating large-scale deep learning training and inference on parallel and distributed systems. He designed systems such as ZeRO and 3D parallelism that have been adopted by many DL frameworks and have become staple engines for training large language models, including Meta Llama and many other LLMs. On the inference front, he led the effort to optimize the inference systems for OpenAI's DALL·E 2 and DALL·E 3 models to reduce latency and cost and improve capacity. He has also designed fast systems and led optimization efforts that have been released as part of DeepSpeed-Inference/MII and are used in products such as Bing, Ads, and AzureML within Microsoft. Samyam received his PhD in High Performance Computing from The Ohio State University.

MORE POSTS FROM Samyam Rajbhandari

Gen AI

Inside Snowflake Intelligence: Five Pillars of Enterprise-Grade Agentic AI

Explore the underlying architecture, orchestration, and system-level optimizations behind Snowflake Intelligence, a production-grade agentic AI system built for enterprise reasoning.
JUN 03, 2025|13 min read
Gen AI

Arctic Inference with Shift Parallelism: The Fastest Open Source Inference System for Enterprise AI

Built by Snowflake AI Research, Arctic Inference uses Shift Parallelism, SwiftKV, and speculative decoding to power the fastest open-source enterprise AI.
MAY 29, 2025|15 min read
Gen AI

Scaling vLLM for Embeddings: 16x Throughput and Cost Reduction

Learn how we increased embedding throughput 3x in Snowflake Cortex—and 16x vs. vLLM—through smarter serialization, tokenization, and GPU optimization.
MAY 29, 2025|8 min read
Gen AI

Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Arctic Ulysses)

Ulysses, a novel sequence parallelism technique, boosts long-context LLM inference performance with 3.4x lower latency and better GPU efficiency.
APR 03, 2025|14 min read
Product and Technology

SwiftKV from Snowflake AI Research Reduces Inference Costs of Meta Llama LLMs up to 75% on Cortex AI

SwiftKV optimizes Meta Llama LLMs on Snowflake Cortex AI, reducing inference costs by up to 75% while maintaining accuracy for enterprise AI solutions.
JAN 16, 2025|5 min read