What is Big Data Engineering?
Big Data Engineering is the design, building, and maintenance of the database and processing-system architecture that organizations use to collect large and growing data sets. Big data engineers take raw data sets and pipelines, which are often riddled with machine- and human-generated errors, and apply methods to improve data consistency and reliability.
Big Data Engineering vs. Data Science
Data Science comes into play later in the data life cycle. After data engineers establish reliable data streams, they provide the cleansed data to data scientists, who use data analytics technology, statistical methodology, and machine learning to create and share accurate, repeatable insights that can be consumed by business analysts and stakeholders. To hand off data to data science teams, data engineers need to establish processes around data modeling, data mining, and ultimately production.
Feature engineering, a subset of data engineering, is the process of taking input data and creating features that can be used by machine learning algorithms. Feature engineering provides an essential human dimension to machine learning, overcoming current machine limitations by injecting human domain knowledge into the ML process. However, machine learning has evolved to the point where manual feature engineering can be replaced over time by feature learning, which enables a machine to "learn" human-derived data features and then act on this knowledge to perform tasks.
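To make the idea concrete, here is a minimal sketch of manual feature engineering in Python using pandas. The column names and derived features are hypothetical examples, not part of any specific system: the point is that each feature encodes a piece of human domain knowledge (normalizing spend by account age, extracting a day-of-week signal, bucketing a skewed value) that a model could not be assumed to discover on its own.

```python
# Illustrative feature-engineering sketch (hypothetical columns and features).
import pandas as pd

def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Derive model-ready features from raw customer records."""
    feats = pd.DataFrame(index=raw.index)

    # Domain knowledge: spend relative to account age is often more
    # informative than raw spend alone. Clip avoids division by zero.
    feats["spend_per_day"] = (
        raw["total_spend"] / raw["account_age_days"].clip(lower=1)
    )

    # Extract a day-of-week signal from a signup timestamp.
    feats["signup_dow"] = pd.to_datetime(raw["signup_ts"]).dt.dayofweek

    # Bucket a skewed numeric column into coarse, interpretable tiers.
    feats["spend_tier"] = pd.cut(
        raw["total_spend"],
        bins=[0, 100, 1000, float("inf")],
        labels=["low", "mid", "high"],
        include_lowest=True,
    )
    return feats
```

Feature learning, by contrast, would let the model derive comparable representations directly from the raw columns, reducing the need for hand-written transformations like these.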
Snowflake allows data engineers to perform feature engineering on large data sets without the need for sampling. For a first-hand look at feature engineering on Snowflake, read this blog post.