The challenge was clear: to become a publicly traded company, we needed to consistently forecast our core financial metrics within a tight range. To do that, we needed data science and machine learning (ML), and we needed to leverage our own product to accurately forecast the company’s revenue.
The fact that Snowflake is a public company today is evidence of the power of both data science and our product. As the data science team in Snowflake’s Finance department, we found that the keys to developing forecasts our team and the market can trust are partnership, platform, and, of course, data.
Every Snowflake (customer) is unique
Snowflake is a consumption- or usage-based business, meaning that our revenue is intrinsically tied to the value we provide to customers. Consumption-based pricing serves both our customers’ interests and our own; however, with a consumption-based business model, predicting revenue and other top-line metrics can be far from straightforward. These challenges aren’t unique to Snowflake: from ad-based businesses to professional services organizations, all businesses have some level of variability.
When setting out to forecast key business metrics, we faced a few challenges:
First, each customer is unique. From their usage patterns to their ramp times, we see a wide variation in how customers engage with our platform. Two customers may be in the same industry, use similar technology stacks, be close to the same size, share many contract details, have a similar use case, and still consume in very different ways.
If you plot daily usage for many customers versus time, it looks like the daily movements typical of the stock market—lots of noise and variability. Even when aggregating across longer periods or among groups of customers, significant variability remains.
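To make that concrete, here is a minimal sketch of the kind of check you could run, assuming a hypothetical file of daily credits per account (the file, account ID, and column names are illustrative, not our actual tables):

```python
import pandas as pd

# Daily credits per account; the file and column names are illustrative.
usage = pd.read_csv("daily_consumption.csv", parse_dates=["date"])

# Day-over-day variability for a single (hypothetical) account: very noisy.
one_account = (usage[usage["account_id"] == "ACME"]
               .set_index("date")["credits"]
               .sort_index())
print("per-account daily volatility:", one_account.pct_change().std())

# Aggregate to total weekly consumption across all accounts: smoother,
# but week-over-week swings are still material.
weekly_total = usage.groupby("date")["credits"].sum().resample("W").sum()
print("fleet-level weekly volatility:", weekly_total.pct_change().std())
```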
Historical usage doesn’t indicate future usage
Second, many of the factors driving a customer’s future consumption cannot be predicted from their usage alone. And because each customer is unique, we often cannot turn to behaviors of other Snowflake customers with similar characteristics for answers either. Key events such as new strategic initiatives, reorgs, mergers or divestments, delays and accelerations of major customer programs, loss or addition of key talent at the customer, new cost controls, and regulatory changes affecting the customer’s market are all factors that can create discontinuities in consumption that are not predictable from readily available data.
In a business-to-consumer setting with millions of customers, the data volume often enables an end-to-end machine learning solution to learn from and cope with many types of high-variance events, and gross mispredictions for a small percentage of customers cannot materially move the top-line forecast. In contrast, with several thousand customers, each consuming at a pace that may not be predictable from past usage, significant mispredictions for even a small number of customers can cause material deviations in the global forecast, so a different approach is needed. Machine learning, while vital, is only part of the solution.
Increased efficiency = increased variability
Third, as our engineers make continual improvements to our product, and as our sales engineering and services teams help to optimize customer workloads, customer credit consumption can drop.
For example, earlier this year, our service professionals helped optimize a large set of workloads at one of our customers. The customer had previously needed to ramp the workloads quickly, sacrificing efficiency for speed. Following the optimizations, consumption on these workloads dropped significantly, which translated to a large drop in the customer’s total Snowflake usage. Improvements like this are great for our customers and for our business in the long term; however, from a forecasting perspective, they create an additional source of future variability that may not be captured in our internal product usage data. Consequently, we need to make ongoing adjustments to our forecasts.
Our forecasting process
Our solution to these forecasting challenges is a process that combines predictions from a collection of machine learning models with on-the-ground knowledge from other departments, including sales and product, that can’t be inferred from the data. Leveraging the Snowflake platform supercharges this process, enabling us to reforecast our business—from top-level product revenue to each individual customer’s consumption, across many time horizons—on a daily basis.
The following steps are highly effective in generating accurate forecasts for our consumption-based business:
- Generate a preliminary ML forecast. Every day we build a bottom-up forecast of our business, predicting the consumption for each customer account using a series of machine learning models. These models draw on product usage data as well as sales and marketing information. Over time, accounts move from models that rely exclusively on historical statistics to more tailored machine learning models as we obtain more data about each customer and as their usage patterns mature. (A minimal sketch of this bottom-up approach appears after this list.)
- Solicit feedback and gather inputs. To augment the daily raw ML forecasts with on-the-ground information, once per quarter we distribute the most recent daily forecast to the sales team and solicit their insight into expected changes in customer behavior that are not evident in the usage history. We also gather inputs about upcoming changes to the product as well as the accounting treatment of various factors. Together, these inputs determine what adjustments the ML forecast needs. These adjustments are dollar amounts that can be applied at the account level for each day in specified months, quarters, and years. We also make topside adjustments that are distributed across all accounts or groups of accounts. (A sketch of how such adjustments might be applied, and how some of them decay over the quarter, also follows the list.)
- Generate and adjust an ML forecast of record. Armed with the necessary inputs, we then take the most recent ML forecast and pass it through our adjustment pipeline.
- Analyze, review, and update the adjusted forecast. We subject the adjusted forecast to a rigorous review process that involves a number of sanity checks as well as an analysis of historical forecast bias, consumption trends, and seasonality (a simple bias check is sketched below). This process often informs additional adjustments to the forecast.
- Deploy the final forecast alongside daily updates. Once the final reviews have been passed, the forecast is designated as our quarterly revenue plan and is deployed. This includes pushing to our daily reporting dashboards, which show the quarterly plan alongside the most recent forecast. The adjustments determined during this quarterly process are made to the subsequent raw daily forecasts as well. Some of the adjustments to the daily reforecasts will decay in size as the quarter progresses as we obtain new information, while others, such as accounting adjustments, will retain their size through quarter end. Additionally, we make mid-quarter updates to the adjustments when needed.
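To make the first step more concrete, here is a minimal, self-contained sketch of a bottom-up forecast, assuming a hypothetical daily_consumption.csv with account_id, date, and credits columns. The feature set, the gradient-boosting model, and the 180-day cutoff for switching from a trailing average to a trained model are illustrative choices, not a description of our production models.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

HORIZON_DAYS = 90       # how far ahead to forecast each account
MIN_HISTORY_DAYS = 180  # below this, fall back to a simple statistical model

def make_features(history: pd.Series) -> pd.DataFrame:
    """Lag and calendar features from one account's daily credit consumption."""
    df = pd.DataFrame({"credits": history})
    df["lag_7"] = df["credits"].shift(7)
    df["lag_28"] = df["credits"].shift(28)
    df["trailing_28_mean"] = df["credits"].shift(1).rolling(28).mean()
    df["day_of_week"] = df.index.dayofweek
    return df.dropna()

def forecast_account(history: pd.Series) -> pd.Series:
    """Forecast one account's daily credits for the next HORIZON_DAYS."""
    future_idx = pd.date_range(history.index[-1] + pd.Timedelta(days=1),
                               periods=HORIZON_DAYS, freq="D")

    if len(history) < MIN_HISTORY_DAYS:
        # Young account: rely on historical statistics (a trailing average).
        return pd.Series(history.tail(28).mean(), index=future_idx)

    # Mature account: train a small ML model on lag/calendar features.
    feats = make_features(history)
    model = GradientBoostingRegressor().fit(
        feats.drop(columns="credits"), feats["credits"])

    # Roll forward one day at a time, feeding predictions back in as history.
    extended = history.copy()
    for day in future_idx:
        row = pd.DataFrame({
            "lag_7": [extended.iloc[-7]],
            "lag_28": [extended.iloc[-28]],
            "trailing_28_mean": [extended.tail(28).mean()],
            "day_of_week": [day.dayofweek],
        })
        extended.loc[day] = model.predict(row)[0]
    return extended.loc[future_idx]

# Bottom-up forecast: predict each account, then sum to the top line.
usage = pd.read_csv("daily_consumption.csv", parse_dates=["date"])
per_account = {
    acct: forecast_account(grp.set_index("date")["credits"].asfreq("D", fill_value=0.0))
    for acct, grp in usage.groupby("account_id")
}
top_line = pd.concat(per_account, axis=1).sum(axis=1)
```

The important property is structural: every account gets its own forecast, and the top line is simply the sum, which is what makes account-level adjustments straightforward to layer on afterward.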
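The adjustment steps can then be pictured as a pass over that per-account forecast. The sketch below assumes a forecast DataFrame with one row per day and one column per account; the Adjustment structure, the even topside split, and the linear decay rule are illustrative assumptions rather than our actual logic.

```python
from dataclasses import dataclass
from typing import Optional
import pandas as pd

@dataclass
class Adjustment:
    account_id: Optional[str]   # None means a topside adjustment spread across accounts
    start: pd.Timestamp         # first day the adjustment applies
    end: pd.Timestamp           # last day the adjustment applies
    daily_amount: float         # dollars per day
    decays: bool                # True: shrink as the quarter progresses

def apply_adjustments(forecast: pd.DataFrame,
                      adjustments: list[Adjustment],
                      as_of: pd.Timestamp,
                      quarter_end: pd.Timestamp) -> pd.DataFrame:
    """forecast: rows = dates, columns = account IDs, values = forecast dollars."""
    adjusted = forecast.copy()
    for adj in adjustments:
        days = adjusted.loc[adj.start:adj.end].index
        amount = adj.daily_amount
        if adj.decays:
            # Illustrative linear decay: the later in the quarter we are,
            # the less of the original adjustment we keep.
            remaining = max((quarter_end - as_of).days, 0)
            total = max((quarter_end - adj.start).days, 1)
            amount *= remaining / total
        if adj.account_id is not None:
            adjusted.loc[days, adj.account_id] += amount      # account-level adjustment
        else:
            adjusted.loc[days] += amount / adjusted.shape[1]  # topside, split evenly
    return adjusted
```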
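Finally, the review step’s bias analysis boils down to asking whether past forecasts ran systematically high or low. A stripped-down version, with illustrative column names, might look like this:

```python
import pandas as pd

def quarterly_forecast_bias(history: pd.DataFrame) -> pd.Series:
    """history: one row per (quarter, account_id) with 'forecast' and 'actual' dollars.
    Returns the signed relative bias per quarter; positive means we over-forecast."""
    by_quarter = history.groupby("quarter")[["forecast", "actual"]].sum()
    return (by_quarter["forecast"] - by_quarter["actual"]) / by_quarter["actual"]
```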
Keys to successful forecasts: data, partnership, platform
Data, data, data
As data scientists, we usually prefer to focus on aspects of the modeling process—metrics, feature engineering, model selection, hyperparameter tuning, the validation methodology—all important topics. But we can’t lose sight of the importance of having all the right data readily accessible in one place.
Within Snowflake, we centralize data from our ERP system, from business systems such as Salesforce, from third-party Snowflake Data Marketplace partners such as Dun & Bradstreet, and from sources developed in-house, such as our billing engine and the vast trove of metadata generated by our product. A centralized source of truth allows us to develop a holistic view of the business that is updated daily.
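As a simplified illustration, the snippet below joins product usage, CRM, and third-party firmographic data in a single query. The database, warehouse, and table names are hypothetical; only the general pattern of querying Snowflake from Python with the snowflake-connector-python package reflects how such a pipeline is typically wired up.

```python
import snowflake.connector

# Connection parameters are placeholders; in practice they come from a secrets manager.
conn = snowflake.connector.connect(
    account="myorg-myaccount", user="finance_ds", authenticator="externalbrowser",
    warehouse="FINANCE_WH", database="FINANCE", schema="ANALYTICS",
)

query = """
SELECT u.account_id,
       u.usage_date,
       u.credits_consumed,
       s.segment,                -- from Salesforce
       d.employee_count          -- from a Marketplace provider such as Dun & Bradstreet
FROM   daily_usage u
JOIN   salesforce_accounts s ON s.account_id = u.account_id
LEFT JOIN dnb_firmographics d ON d.duns_number = s.duns_number
WHERE  u.usage_date >= DATEADD(year, -1, CURRENT_DATE())
"""

df = conn.cursor().execute(query).fetch_pandas_all()
```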
Snowflake is a centralized source of truth for all relevant data
Partnership with a range of business functions
With such variability in our business, the forecasting process relies on partnership with teams across Snowflake. As I highlighted earlier, we work closely with the Sales team to collect feedback on how our forecasts line up against what they are seeing in the field. This helps us identify and understand unexpected changes in customer consumption that cannot be predicted from the available data.
In addition to Sales, we work with functions including Product, Accounting, and Corporate Financial Planning to understand other factors influencing consumption.
A data platform with scale, flexibility, and control
Having the right data cloud platform allows our team to move faster and collaborate across teams and data sources with ease. Several aspects of Snowflake make a big difference for our team:
- Performant at scale: With many folks using Snowflake, it is important that we do not run into resource contention with other departments or experience slow queries during critical moments. With multi-cluster compute and autoscaling, Snowflake has been able to handle any amount of data or users we’ve thrown at it.
- Secure and governed: Our team works with a lot of sensitive financial data. Governance controls such as row access policies and dynamic data masking give us fine-grained control over sharing the right data with the right people (see the sketch after this list).
- Centralized source of truth: I touched on this above, but being able to instantly access relevant data—whether from our ERP or third-party data providers—in one centralized place allows us to develop a 360-degree view of our business.
- Automation: We reforecast a multitude of indicators on a daily basis. Snowflake’s performance enables all of the highly complex transformations at scale needed throughout each day. In addition, Snowflake’s ease of use allows this large data pipeline to be maintained by a small team that is largely free to focus on the substance of its work instead of coping with the complexity of the underlying platform.
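As an example of the governance controls mentioned above, a masking policy and a row access policy can be defined directly in SQL (here issued through the same Python connector session, `conn`, as in the earlier example). The policy, role, column, and table names are hypothetical; the CREATE MASKING POLICY and CREATE ROW ACCESS POLICY statements themselves are standard Snowflake features.

```python
# Illustrative governance setup; policy, role, and table names are hypothetical.
cur = conn.cursor()

# Hide dollar amounts from everyone outside the finance data science role.
cur.execute("""
CREATE MASKING POLICY IF NOT EXISTS revenue_mask AS (val NUMBER) RETURNS NUMBER ->
  CASE WHEN CURRENT_ROLE() IN ('FINANCE_DS') THEN val ELSE NULL END
""")
cur.execute("ALTER TABLE daily_usage MODIFY COLUMN revenue_usd SET MASKING POLICY revenue_mask")

# Limit each sales team to the rows for accounts it owns.
cur.execute("""
CREATE ROW ACCESS POLICY IF NOT EXISTS account_rows AS (owner_role STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'FINANCE_DS' OR CURRENT_ROLE() = owner_role
""")
cur.execute("ALTER TABLE daily_usage ADD ROW ACCESS POLICY account_rows ON (owner_role)")
```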
As businesses increasingly adopt consumption-based business models, three things will enable them to accurately forecast key financial metrics: 1) embedding data scientists within the finance organization; 2) using the Data Cloud to seamlessly centralize, utilize, and govern data; and 3) establishing and maintaining deep relationships with teams across the organization.