Snowflake's Arctic-Extract: A Vision-Language Model for High-Fidelity Document Extraction

Introduction: The changing landscape of document intelligence
In today's data-driven landscape, enterprises grapple with a deluge of unstructured information locked within contracts, invoices, reports and beyond. The linchpin of automation and intelligent data platforms lies in extracting structured, actionable insights from these diverse sources. However, processing documents at scale, across many domains and languages, is fraught with engineering challenges. Traditional Document AI models often struggle to maintain high accuracy as they scale, to accommodate lengthy documents and to keep inference costs under control.
In response to these engineering hurdles, we are excited to unveil Arctic-Extract in public preview, our latest advancement in Document AI. This model is meticulously designed to deliver rapid, precise and cost-effective data extraction from documents across a broad spectrum of use cases.
The challenge: Addressing scalability, structural diversity and multilingual complexities
Prior iterations of our document extraction models, such as Arctic-TILT, provided a solid foundation for extracting data from unstructured files. However, we identified several key areas for engineering improvement:
Quality and performance: The rapid evolution of AI demanded a substantial boost in model accuracy and efficiency. High-quality document understanding often came at the expense of excessive computational resources and associated costs, especially when integrating Optical Character Recognition (OCR) into the pipeline.
Multilingual support: As our global footprint expands, robust performance in non-English documents has become an increasingly critical engineering requirement. Achieving consistent accuracy across diverse languages has remained a dynamic challenge.
Benchmarking robustness: Ensuring reliability across a wide variety of document types and domains is essential, but demonstrating it through consistent, reproducible benchmarks turned out to be difficult.
Our solution: Arctic-Extract — a new vision-based approach
Arctic-Extract represents a paradigm shift in Document AI design. It is a sophisticated document-understanding model engineered specifically for Snowflake's Document AI pipeline. Our core engineering priorities included:
Architectural superiority: We leveraged state-of-the-art techniques to significantly enhance model quality and performance. Arctic-Extract embodies the pinnacle of our current engineering capabilities.
Native vision-language fusion: Rather than relying on the traditional OCR-plus-text pipeline, we adopted an end-to-end approach that natively processes images and text. This innovation facilitates deeper contextual understanding and streamlines system complexity.
Comprehensive multilingual engine: Our model now boasts support for 29 languages, matching or surpassing previous models in multilingual extraction while delivering marked improvements in accuracy for all major European languages, Chinese, Japanese, Korean and more.
Rigorous real-world validation: Arctic-Extract has been put through its paces on diverse, multilingual datasets to thoroughly validate its accuracy, scalability and resource efficiency.
Benchmark results: Significant gains in accuracy and efficiency
We conducted a thorough evaluation to compare Arctic-Extract with Arctic-TILT across several crucial engineering benchmarks:
Performance surge: The latest architecture achieved an average seven-point improvement over Arctic-TILT on our Business Document benchmark, including an increase of around four points on DocVQA. This places it among the strongest current models, with scores comparable to leading 72B models despite being more than 10 times smaller.
Superior business table extraction: Arctic-Extract showcases exceptional performance in Business Table Extraction, achieving scores 35% higher than leading LLMs.
Inference speed: Our OCR-free design significantly reduces inference time without sacrificing quality. Because text recognition happens inside the model rather than in a separate OCR stage, Arctic-Extract works from the full page image rather than from previously extracted text alone.
Cost efficiency analysis: Thanks to its integrated approach, Arctic-Extract achieves near cost parity with Arctic-TILT in most cases, all while providing superior understanding capabilities.
Market review: Evaluating document understanding functions across cloud providers
To gain a comprehensive understanding of the current landscape of Document Understanding capabilities, we conducted a targeted review of the Intelligent Document Processing functions offered by leading cloud providers and evaluated them on the DocVQA (Document Visual Question Answering) data set.

The solution provided by Snowflake with AI_EXTRACT not only achieves the best score but is also easy to use.
We also analyzed the cost-effectiveness of various document AI solutions for custom extraction. The data showed that Snowflake's AI Extract is the most economical option, with a significantly lower cost per 1,000 pages than other major vendors. That makes it a strong choice for organizations looking to optimize their expenditure without compromising on the quality of intelligent document processing.
Provider | Product Offering | Cost per 1000 pages |
---|---|---|
Snowflake | AI Extract | $5 - $7 |
Document AI - Custom Extraction | $20 - $30 | |
Microsoft | Azure AI - Custom Generative Extraction | $21 - $30 |
Amazon | Textract - Custom Queries | $15 - $25 |
Custom Orchestration | Parsing + LLM calls and VLM models | $12 - $18 |
Managed point solutions | End-to-end solutions | $30 - $45 |
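To make the per-1,000-page prices in the table above easier to compare at a realistic volume, here is a minimal Python sketch that scales each price range to a given page count. The prices are copied from the table; the 250,000-page workload and the function names are illustrative assumptions, not figures or tooling from our analysis.

```python
# Price ranges in USD per 1,000 pages, copied from the table above.
# The provider cell for the "Document AI - Custom Extraction" row is
# missing in the table, so that row is keyed by product name only.
PRICE_PER_1000_PAGES = {
    "Snowflake AI Extract": (5, 7),
    "Document AI - Custom Extraction": (20, 30),
    "Microsoft Azure AI - Custom Generative Extraction": (21, 30),
    "Amazon Textract - Custom Queries": (15, 25),
    "Custom Orchestration": (12, 18),
    "Managed point solutions": (30, 45),
}


def estimate_cost(pages: int, price_range: tuple[float, float]) -> tuple[float, float]:
    """Scale a (low, high) price per 1,000 pages to a total page count."""
    low, high = price_range
    thousands = pages / 1000
    return (thousands * low, thousands * high)


def cost_table(pages: int) -> dict[str, tuple[float, float]]:
    """Estimated (low, high) total cost in USD for every row of the table."""
    return {
        name: estimate_cost(pages, price_range)
        for name, price_range in PRICE_PER_1000_PAGES.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 250,000 pages.
    for name, (low, high) in cost_table(250_000).items():
        print(f"{name}: ${low:,.0f} - ${high:,.0f}")
```

These are straight linear extrapolations of list-style price ranges; volume discounts, minimum charges and other pricing details are not modeled.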
Recent enhancements and engineering deep dives
The deployment of Snowflake Arctic-Extract led to numerous architectural refinements:
Core system integration: Arctic-Extract spearheads the adoption of new inference pathways within Snowflake's Cortex, supporting future extensibility and reliability.
Output context: Our model now supports outputs of up to 400 tokens per question, enabling more detailed extractions for long documents or scenarios with multiple answers.
Scalability optimization: Internal benchmarking and cost evaluations revealed advantages of our tightly integrated image-text architecture.
Deployment readiness and upcoming availability: Arctic-Extract is deployed and can be accessed using SQL inference as part of AI_EXTRACT Public Preview. In the near future, we will support REST API and the standard Document AI UI/UX.
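As a rough illustration of the SQL inference path, the following Python sketch assembles an AI_EXTRACT statement as a string. The `file` and `responseFormat` argument names and the `TO_FILE` helper reflect our reading of the public AI_EXTRACT documentation and should be checked against the current reference; the stage name, file name, field label and question are hypothetical placeholders, and the builder functions are not part of any Snowflake client library.

```python
def sql_string(value: str) -> str:
    """Render a Python string as a single-quoted SQL string literal.

    Backslashes and single quotes are escaped so the value cannot
    terminate the literal early.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def build_ai_extract_query(stage: str, path: str, questions: dict[str, str]) -> str:
    """Build a SELECT AI_EXTRACT(...) statement (assumed syntax).

    `questions` maps an output field label to the natural-language
    question that should be answered from the document.
    """
    fields = ", ".join(
        f"{sql_string(label)}: {sql_string(question)}"
        for label, question in questions.items()
    )
    return (
        "SELECT AI_EXTRACT(\n"
        f"  file => TO_FILE({sql_string(stage)}, {sql_string(path)}),\n"
        f"  responseFormat => {{{fields}}}\n"
        ");"
    )


if __name__ == "__main__":
    # Hypothetical stage, file and question, for illustration only.
    print(build_ai_extract_query(
        "@docs_stage",
        "invoice_0001.pdf",
        {"vendor": "Who is the vendor?"},
    ))
```

The resulting statement can be run from a worksheet or any SQL client connected to an account where AI_EXTRACT is available in preview. Since the model returns at most 400 tokens per question, narrowly scoped questions are likely to work better than one broad prompt.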
We are eager to collaborate with customers, researchers and the open-source community as Arctic-Extract continues to evolve. This is just the start for those dedicated to AI for documents, large language models and scalable enterprise solutions.