Extrapolating from past trends to predict the future is critical to all aspects of business planning. Data warehouses are often full of historical data, but they remain separate from planning systems that factor in future scenarios. Closely coupling data warehouses with time-series forecasting methods can provide a view of historical trends along with accurate predictions about the future.

Forecasting engines continue advancing, and some of the most interesting changes come from our partner AWS’ AI group in the form of the new Amazon Forecast service announced in November 2018 at AWS re:Invent. Amazon Forecast is expected to become GA this quarter, and Snowflake is proud to partner with Amazon AI to bring the power of this new service to our customers to help power the Data Economy.

Why In-Database Forecasting Is Not Enough

Until now, most databases have provided limited capabilities for predicting future data, and those that do generally rely on simple statistical techniques that don’t take into account the realities of seasonality, geographic differences, or special events. They are also unable to make sense of the various underlying, potentially overlapping micro-trends that influence the top-level forecasted numbers.

An interesting aspect of Amazon Forecast, and the related Amazon Personalize service, is that both were trained on massive data sets assembled by Amazon AI and leverage machine learning (ML) techniques that go beyond simple extrapolation. Amazon Forecast is an engine that very few companies could reproduce on their own, because its accuracy comes not just from its algorithms but also from its training data. This allows it to produce very fine-grained forecasts that account for trends at local geographic and temporal levels that no human analyst might be aware of.

It would be impossible to replicate this level of detail and accuracy inside a standalone database engine. Therefore, we created an integration approach and examples that allow Snowflake customers to run data through this unique engine to enrich their own data warehouses with the results of the Amazon Forecast service.

AI Integration in Data Sharing Scenarios

Snowflake Data Sharing enables data providers to share both data and analytical user-defined functions (UDFs), as described in a previous blog post. UDFs enable powerful SQL and JavaScript analytics, but we can go further by sharing AI- and ML-based functions. Many customers that use Snowflake Data Sharing are interested in enriching their customers’ data with their own data and then running the merged data sets through AI and ML engines.

Also, external factors such as weather, demographic shifts, financial market performance, and even social media sentiment affect many time series, yet these external time series are rarely present in a company’s own data warehouse. However, users can obtain this kind of time series data from other companies and data providers using Snowflake Data Sharing. Once shared, users can feed these additional time series into services like Amazon Forecast as “Related Time Series.” This kind of supplemental data can increase the accuracy of AI and ML modeling and forecasting, as the following example with Amazon Forecast shows.
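As a concrete sketch of what a Related Time Series looks like on disk, the snippet below formats a shared weather series into an item_id/timestamp/value CSV of the kind Amazon Forecast can ingest from S3. Note that the exact columns are defined by the dataset schema you configure in Forecast, so the column names here (including the hypothetical `temperature` attribute) are assumptions for illustration:

```python
import csv
import io


def to_related_series_csv(rows):
    """Format (item_id, timestamp, value) tuples as a Related Time
    Series CSV suitable for staging to S3 for Amazon Forecast.

    The header names are illustrative: Forecast maps columns via the
    dataset schema you define, not via the CSV header itself."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["item_id", "timestamp", "temperature"])
    for item_id, ts, value in rows:
        writer.writerow([item_id, ts, value])
    return buf.getvalue()


# Hypothetical weather data obtained from a provider via Data Sharing.
shared_weather = [
    ("store_42", "2019-01-01 00:00:00", 3.5),
    ("store_42", "2019-01-02 00:00:00", 4.1),
]
print(to_related_series_csv(shared_weather))
```

In a real pipeline this CSV would be written to the S3 bucket that the Forecast dataset import job reads from, rather than printed.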

Example Use Case with Amazon Forecast

The example we’ll use in this multi-part blog series integrates the Amazon Forecast engine with data from Snowflake. First, we will demonstrate an example of a Snowflake user both training and applying Amazon Forecast on their own local data. Next, we will incorporate additional data time series from another Snowflake user via Snowflake Data Sharing.

In the Snowflake Data Sharing example, a Snowflake user takes a time series data set from a Snowflake Data Provider and merges it with a time series data set of its own. The first user then runs the merged data set consisting of the enriched time series through Amazon Forecast, which generates future forecasted data points along the same time series.
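In practice the merge described above would most likely be a SQL join executed inside Snowflake, but the core logic is simple enough to sketch in plain Python: align the user’s own series with the provider’s shared series on timestamp so that Forecast receives consistent histories. All data values below are hypothetical:

```python
def merge_on_timestamp(own_series, shared_series):
    """Join two {timestamp: value} series into rows of
    (timestamp, own_value, shared_value), keeping only timestamps
    present in both series -- an inner join, since the forecasting
    engine needs aligned histories for target and related series."""
    common = sorted(own_series.keys() & shared_series.keys())
    return [(ts, own_series[ts], shared_series[ts]) for ts in common]


# Hypothetical data: daily sales from the user's own warehouse, and
# daily foot-traffic counts obtained via Snowflake Data Sharing.
own = {"2019-01-01": 120, "2019-01-02": 95, "2019-01-03": 130}
shared = {"2019-01-02": 4100, "2019-01-03": 3800, "2019-01-04": 4500}
print(merge_on_timestamp(own, shared))
# Only 2019-01-02 and 2019-01-03 appear in both series.
```

Whether to inner-join (as here) or to gap-fill missing timestamps is a modeling choice; Forecast’s accuracy depends on how such gaps are handled.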

How the Integration Works

Here are the steps:

  1. Extract time series: The user isolates a set of time series training data from its Snowflake database and saves it to Amazon S3.
  2. Forecast: The user runs the data through Amazon Forecast using a Python script, receives a baseline forecast, then loads it back into Snowflake.
  3. Connect to a Share: The user connects to another Snowflake user’s data through Snowflake Data Sharing. The first user then isolates one or more time series data sets from the other user and saves them to Amazon S3.
  4. Re-forecast with Enriched Data: To receive an improved forecast, the user runs both the original and the shared time series data sets through Amazon Forecast using a Python script, supplying the shared series as Related Time Series inputs. Amazon Forecast uses these inputs to improve the accuracy of the forecast, and the user then loads the resulting forecast into Snowflake.
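The unload and load steps above map onto plain Snowflake COPY statements. The helper below builds those statements as strings (the table, stage, and path names are hypothetical); the Forecast calls themselves, which happen between the two COPYs via the boto3 `forecast` client, are left as a comment because they require AWS credentials and long-running training jobs:

```python
def build_unload_sql(table, stage, path):
    """Step 1: unload a time series table to an external S3 stage
    that Amazon Forecast can read from."""
    return (
        f"COPY INTO @{stage}/{path} "
        f"FROM (SELECT item_id, ts, demand FROM {table}) "
        f"FILE_FORMAT = (TYPE = CSV) OVERWRITE = TRUE"
    )


def build_load_sql(table, stage, path):
    """Steps 2 and 4: load the forecasted data points exported to S3
    back into a Snowflake table."""
    return (
        f"COPY INTO {table} "
        f"FROM @{stage}/{path} "
        f"FILE_FORMAT = (TYPE = CSV)"
    )


# Hypothetical names: a SALES_HISTORY source table and an external
# stage FORECAST_STAGE pointing at the S3 bucket Forecast uses.
print(build_unload_sql("SALES_HISTORY", "FORECAST_STAGE", "training/sales"))
print(build_load_sql("SALES_FORECAST", "FORECAST_STAGE", "results/sales"))

# Between the two COPYs, a separate Python script would drive Amazon
# Forecast via boto3 (dataset import, predictor training, forecast
# export to S3); those calls are omitted here.
```

The complete working example in the next installment will fill in the Forecast-side orchestration; this sketch only shows the Snowflake-side boundary of the integration.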

Other Useful Services: Amazon Personalize and Amazon SageMaker

AWS’ AI group also offers Amazon Personalize, which generates personalized recommendations. This engine is also very applicable to Snowflake customers and the Snowflake Data Sharing ecosystem. Beyond the pre-built AWS AI services, the Amazon SageMaker platform lets customers train bespoke ML models. Our previous blog post on integration with Amazon SageMaker covers that example in more detail.

Next Steps

The next installment of this blog series will include a complete working example that you can try and then adapt to your own use case. We’re very excited about partnering with the AWS AI group to bring the power of best-in-class ML algorithms to both Snowflake customers and our data sharing ecosystem.