From Data Wrangling to Feature Engineering

Every year, insights from business analytics and machine learning (ML) play a larger role in how organizations solve business problems with data. However, the insights in a business dashboard or the predictions from an ML model are only as good as the data behind them.

Building high-quality data sets is a multistep process known as data wrangling, which includes cleaning, mapping, and transforming data into a workable format.

These activities commonly involve the following:

  • Merging multiple data sources into a single data set
  • Identifying gaps in the data (for example, empty cells in a table) and either filling or deleting them
  • Deleting data that’s unnecessary or irrelevant to the project at hand, such as duplicate records
  • Identifying extreme outliers in the data
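
As a rough illustration, the four activities above might look like the following minimal sketch in Python with pandas. The table names, column names, and sample values are hypothetical. Filling gaps with the median and flagging outliers with the 1.5 × IQR rule are assumptions chosen for the example, not methods prescribed by this ebook.

```python
import pandas as pd

# Hypothetical sample data: an orders table and a customers table.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "amount": [20.0, 35.0, 35.0, None, 25.0, 5000.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "region": ["east", "west", "south", None, "west"],
})


def wrangle(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    # 1. Merge multiple data sources into a single data set.
    merged = orders.merge(customers, on="customer_id", how="left")

    # 2. Handle gaps: fill missing amounts with the median (an assumed
    #    choice) and delete rows that have no region.
    merged = merged.assign(
        amount=merged["amount"].fillna(merged["amount"].median())
    )
    merged = merged.dropna(subset=["region"])

    # 3. Delete unnecessary data, here exact duplicate rows.
    merged = merged.drop_duplicates()

    # 4. Identify extreme outliers using the 1.5 * IQR rule (an assumed choice).
    q1, q3 = merged["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    is_outlier = (merged["amount"] < q1 - 1.5 * iqr) | (
        merged["amount"] > q3 + 1.5 * iqr
    )
    return merged.assign(is_outlier=is_outlier).reset_index(drop=True)


clean = wrangle(orders, customers)
print(clean)
```

With this sample data, the result keeps four rows, and only the 5000.0 order is flagged as an outlier. The outlier is flagged rather than dropped, since whether an extreme value is an error or a genuine signal depends on the project.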

This ebook describes how analytics and data science teams can maximize efficiency by leveraging a cloud data platform to unify and govern both data wrangling and feature engineering activities.