Author: Snowflake
September 2, 2020 · 4 min read

3 Snowflake Features That Make Data Science Easier


Data science is proving to be a major competitive advantage for companies. While business intelligence (BI) helps companies with reporting and historical analysis, data science goes a step further and predicts future outcomes. It can draw on far more data from many more sources and, using machine learning (ML), automatically identify the patterns and trends needed to model, predict, or forecast what is likely to happen next.

Data science is being used for a wide variety of purposes, from recommending personalized movies and TV shows to forecasting where a virus is likely to spread next, which can help save lives. This leap to advanced analytics has largely been enabled by the cloud. Companies can inexpensively collect, store, and analyze more data than ever before, and with graphics processing unit (GPU)-accelerated computing, they can train multiple ML models in parallel in just minutes and then deploy the most accurate ones.

But most data science projects fail. According to a VentureBeat article, 87% never even make it into production, largely because of the complexity involved in building them. Snowflake’s cloud data platform helps companies streamline their data science initiatives. In a newly released Deloitte report that surveyed more than 2,700 companies worldwide about how they are preparing for AI, respondents ranked modernizing their data infrastructure as their top initiative for gaining a competitive advantage, because it is “foundational to every AI-related initiative.” This suggests that a modern cloud data platform such as Snowflake can be the linchpin for delivering successful data science projects.

Figure 1. An illustration of a typical data science workflow 

Snowflake Features That Power Successful Data Science Projects 

Here are three Snowflake features that make it simpler for companies to run successful data science projects so they can leverage AI and ML to enable advanced analytics and gain a competitive edge.

A single, consolidated source for all data

For the highest accuracy, data scientists need to incorporate a wide variety of information when training their ML models. But data resides in many places and comes in various formats. According to InfoWorld, data scientists typically spend up to 80% of their time finding, retrieving, consolidating, cleaning, and preparing data, and only the remaining 20% building, training, and deploying their models. Much of this is because getting the right data isn’t “one-and-done”: data scientists often need to go back and collect additional data several times during a single project. The entire process can take weeks or months, adding latency to the data science workflow. In addition, the data used for analysis must have a high level of integrity, or the results won’t be valid or trusted.

Snowflake brings data from multiple environments into a single high-performance platform, removing the complexity and latency caused by traditional ETL jobs. Data can be profiled and cleansed directly in Snowflake, which helps ensure a high level of data integrity, and Snowflake’s data discovery capabilities let users find and access their data more easily and quickly. Snowflake also provides instant access to diverse third-party data sets through Snowflake Data Marketplace, where unique data from hundreds of providers is available immediately on demand.
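
Profiling and cleansing data directly in the platform comes down to running SQL where the data already lives. The sketch below is illustrative only: it uses Python's built-in sqlite3 module as a stand-in for a Snowflake connection so that it runs anywhere, and the table, columns, and cleansing rules are hypothetical. With the snowflake-connector-python package you would obtain the connection from snowflake.connector.connect(...) instead; that connector follows the same Python DB-API cursor pattern, although its default placeholder style is %s rather than sqlite3's ?.

```python
import sqlite3

# Assumption: an in-memory sqlite3 database stands in for a Snowflake connection.
# With snowflake-connector-python, conn would come from snowflake.connector.connect(...).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical raw table with common quality problems: a missing email,
# a missing country, inconsistent casing/whitespace, and a duplicated row.
cur.execute("CREATE TABLE raw_customers (id INTEGER, email TEXT, country TEXT)")
cur.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "US"),
        (2, "b@example.com", " us "),
        (2, "b@example.com", " us "),  # exact duplicate of the previous row
        (3, None, "DE"),
        (4, "d@example.com", None),
    ],
)

# Profile the table in SQL: row count, distinct ids, and nulls per column.
profile = cur.execute(
    """
    SELECT COUNT(*),
           COUNT(DISTINCT id),
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN country IS NULL THEN 1 ELSE 0 END)
    FROM raw_customers
    """
).fetchone()
print("rows, distinct ids, null emails, null countries:", profile)

# Cleanse in SQL (hypothetical rules): normalize country codes, drop rows
# without an email, and remove duplicate rows.
cur.execute(
    """
    CREATE TABLE customers AS
    SELECT DISTINCT id, email, UPPER(TRIM(country)) AS country
    FROM raw_customers
    WHERE email IS NOT NULL
    """
)
cleaned = cur.execute("SELECT id, email, country FROM customers ORDER BY id").fetchall()
print(cleaned)
```

Keeping both steps in SQL means the raw rows never have to be exported to a separate cleansing tool; only the small profile summary leaves the database.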

Powerful compute resources for data preparation

Data scientists need powerful compute resources to process and prepare data before they can feed it into modern ML models and deep learning tools. As mentioned above, they spend most of their time understanding, processing, and transforming data that arrives in multiple formats. One especially compute-intensive step is feature engineering: transforming raw data into new, clearer signals that are more meaningful and lead to more accurate predictive models. Creating predictive features can be complex and time-consuming, requiring domain expertise, familiarity with each model’s unique requirements, and multiple iterations. Most legacy tools, including Apache Spark, are overly complex and highly inefficient at data preparation, resulting in brittle and expensive data pipelines.

Snowflake’s unique architecture provides dedicated compute clusters for each workload and team, so there is no resource contention among data engineering, BI, and data science workloads. Snowflake’s ML partners push down much of their automated feature engineering into Snowflake’s cloud data platform, which significantly speeds up automated machine learning (AutoML). Manual feature engineering can be done in Snowflake from many languages through Snowflake’s Python, Apache Spark, and ODBC/JDBC connectors. Transforming data with SQL makes feature engineering accessible to a broader audience of data workers and can make it 10 times faster and more efficient than with Apache Spark.
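
To make feature engineering in SQL concrete, here is a hypothetical step that turns raw purchase events into one row of recency, frequency, and monetary (RFM) features per customer. RFM is simply a common example feature set chosen for illustration, not one the article prescribes. As before, sqlite3 stands in for Snowflake so the sketch runs locally, and the table and column names are hypothetical. The point is that the aggregation runs inside the database and only the compact feature table is returned to the client.

```python
import sqlite3

# Assumption: sqlite3 stands in for a Snowflake connection so the sketch runs locally.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical raw events: one row per purchase, with the day as an integer offset.
cur.execute("CREATE TABLE transactions (customer_id INTEGER, day INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 10, 20.0), (1, 25, 30.0), (1, 28, 10.0),
        (2, 5, 100.0),
        (3, 29, 15.0), (3, 30, 45.0),
    ],
)

AS_OF_DAY = 30  # hypothetical scoring date, in the same day units as the data

# The aggregation runs in the database; only one feature row per customer
# is sent back to the client instead of every raw transaction.
features = cur.execute(
    """
    SELECT customer_id,
           ? - MAX(day) AS recency_days,   -- days since the last purchase
           COUNT(*)     AS frequency,      -- number of purchases
           SUM(amount)  AS monetary,       -- total spend
           AVG(amount)  AS avg_order_value
    FROM transactions
    GROUP BY customer_id
    ORDER BY customer_id
    """,
    (AS_OF_DAY,),
).fetchall()

for row in features:
    print(row)
```

In Snowflake, a query like this would run on the compute cluster (virtual warehouse) assigned to the data science team, and with the Python connector the placeholder would be written as %s by default instead of ?.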

An extensive partner ecosystem

Data scientists use many tool sets, and the ML space is rapidly evolving, with new tools being added each year. However, legacy data infrastructure can’t always support the demands of multiple different tool sets, and new technologies such as AutoML require a modern infrastructure to function properly.

Through Snowflake’s extensive partner ecosystem, customers can connect directly to a broad range of existing and emerging data science tools: languages such as Python, R, Java, and Scala; open source libraries such as PyTorch, XGBoost, TensorFlow, and scikit-learn; notebooks such as Jupyter and Zeppelin; and platforms such as DataRobot, Dataiku, H2O.ai, Zepl, Amazon SageMaker, and many others. Snowflake also offers integrations with the latest ML tools and libraries, such as Dask and Saturn Cloud. Because Snowflake provides a single, consistent repository for data, the underlying data doesn’t need to be retooled every time tools, languages, or libraries change. Furthermore, the output from these tools can be written back into Snowflake.

Snowflake Is an Engine for Business Value

Once predictive models are deployed, their scored data can be fed back into traditional BI decision-making processes and embedded into applications such as Salesforce. Putting data science results in the hands of business users can unlock insights that drive unprecedented business growth. In addition, when Snowflake is used with leading ML tools, it can drastically reduce latency in the data science workflow, cutting the time required to develop models from weeks or months to hours.
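
The round trip described above (pull features out, score them in an external ML tool, then write the scores back where BI tools and applications can query them) can be sketched as follows. The scoring function is a hypothetical logistic model with hand-picked weights standing in for one trained in a tool such as scikit-learn or XGBoost, the table names are hypothetical, and sqlite3 again stands in for Snowflake.

```python
import math
import sqlite3

# Assumption: sqlite3 stands in for a Snowflake connection so the sketch runs locally.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical feature table, such as the output of an in-database feature step.
cur.execute(
    "CREATE TABLE customer_features (customer_id INTEGER, frequency INTEGER, recency_days INTEGER)"
)
cur.executemany(
    "INSERT INTO customer_features VALUES (?, ?, ?)",
    [(1, 3, 2), (2, 1, 25), (3, 2, 0)],
)


def churn_probability(frequency: int, recency_days: int) -> float:
    """Hypothetical logistic model with hand-picked weights, standing in for a
    model trained in an external tool such as scikit-learn or XGBoost."""
    z = -1.0 - 0.8 * frequency + 0.15 * recency_days
    return 1.0 / (1.0 + math.exp(-z))


# 1. Pull the features out to the ML tool.
rows = cur.execute(
    "SELECT customer_id, frequency, recency_days FROM customer_features"
).fetchall()

# 2. Score each customer outside the database.
scores = [(cid, churn_probability(freq, rec)) for cid, freq, rec in rows]

# 3. Write the scores back so BI dashboards and applications can query them in SQL.
cur.execute("CREATE TABLE customer_scores (customer_id INTEGER, churn_probability REAL)")
cur.executemany("INSERT INTO customer_scores VALUES (?, ?)", scores)
conn.commit()

# A BI-style query over the scored data: customers most likely to churn.
at_risk = cur.execute(
    """
    SELECT customer_id, churn_probability
    FROM customer_scores
    WHERE churn_probability > 0.5
    ORDER BY churn_probability DESC
    """
).fetchall()
print(at_risk)
```

With snowflake-connector-python, the write-back step could use cursor.executemany with %s placeholders, or the write_pandas helper from snowflake.connector.pandas_tools when the scores are held in a pandas DataFrame.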

The Deloitte and Snowflake Alliance

Are you looking to gain a competitive edge? If so, there’s no better time than now to start powering your data science and advanced analytics workloads with Snowflake’s cloud data platform. For more information on how Deloitte and Snowflake help organizations accelerate their data science and data modernization strategies by providing a unique combination of tools, capabilities, and resources, visit Deloitte’s Snowflake Alliance resource.

© 2023 Snowflake Inc. All Rights Reserved