So you’ve got data—lots of data—coming into your organization from various sources. How do you make sense out of all of it?
At Goldman Sachs, the Legend data platform and the Snowflake Native App Framework are helping teams not just understand all that data, but also transform, govern, share, and model it, leading to more timely, data-driven insights and more collaborative decision making.
Reducing data inertia with Legend and Snowflake Native Apps
Goldman Sachs works with Snowflake for data warehousing and data engineering, and Snowflake Secure Data Sharing plays a large role in how Goldman Sachs acquires and onboards third-party data from vendors, sellers, and marketplaces. This partnership improved the ingestion of new data, but then came the challenge of enforcing digital rights and access restrictions on that data and getting insights out of it. Historically, researchers had to work with data engineers to combine internal data with third-party data while maintaining governance, and data engineers had to understand the researchers’ specific needs to apply the right transformations and processes. Each of these handoffs was fraught with potential bottlenecks, delays, and governance concerns.
By combining the Goldman Sachs Legend platform with Snowflake Native Apps capabilities, data engineers were able to simplify a complex process into a self-service experience for researchers and business users.
Legend, Goldman Sachs’ open source data platform, helps connect the dots between different data sets and generate data model–driven insights, transforming raw, disconnected data into organized, structured formats for easy decision making. The Snowflake Native App Framework gives providers the ability to bring their application code to their customers’ data and makes it easy to build, sell, deploy, and distribute applications in the Data Cloud. Snowflake Native Apps are also deployed in the customer’s account in a way that gives the customer control over their data, while still protecting the data provider’s intellectual property—a factor that was especially relevant for Goldman Sachs.
“The whole idea is to reduce friction and bring this data as quickly and as smoothly as possible into the hands of our users,” says Abhishek Narang, Managing Director, Data Engineering at Goldman Sachs. “In the past, business teams received massive tables of data replicated into their databases, from which it was hard to derive insights. In this new model, we have built in governance models to make data access more meaningful to the end user.”
The Goldman Sachs data pipeline is multifaceted. Third-party data can be pulled in through data delivery channels including, but not limited to, Snowflake Marketplace, Amazon Data Exchange, and Crux, a third-party processing agent that can serve as a proxy between Goldman Sachs and other vendors, delivering data via Snowflake Secure Data Sharing.
All of this third-party data passes through Narang’s team, which models the data in Legend, enforces the correct digital rights and entitlements, ensures quality constraints are met, and makes sure the data is clean and versioned appropriately. In a matter of days, Narang’s team makes the modeled and access-controlled data available to business units for use in anything from research initiatives and alpha generation to finding creative opportunities to help clients.
“Modeling data this way in Legend allows everyone, irrespective of function, to speak a common language when it comes to using data in their day-to-day,” says Narang. “Data models are the most succinct and accurate functional specification.”
To demonstrate the power of this capability, Goldman Sachs “supercharged” a Snowflake Native App with capabilities from Legend by transforming APIs to SQL to ensure both native database performance and enforcement of usage rights.
“Legend Snowflake Native Apps in the Data Cloud allow us to encapsulate all the entitlements, all the model spec—which is our IP—and share that with the end user,” says Narang. “It basically superimposes governance with the classic data sharing approach.”
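To make the idea of turning a model-level request into governed SQL more concrete, here is a minimal sketch in Python. It is not Legend's implementation or API: the `ModelSpec` class, the `compile_to_sql` function, and the table and column names are all hypothetical, and the entitlement check is assumed to be a simple row-level filter on a group column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical model spec: a logical entity mapped onto a physical table."""
    table: str
    columns: dict[str, str]    # logical property name -> physical column name
    entitlement_column: str    # assumed column naming the group entitled to each row


def compile_to_sql(
    spec: ModelSpec, properties: list[str], user_groups: list[str]
) -> tuple[str, list[str]]:
    """Translate a model-level request into one parameterized SQL statement.

    The entitlement filter is always appended, so a caller cannot ask for
    rows outside the groups they belong to.
    """
    unknown = [p for p in properties if p not in spec.columns]
    if unknown:
        raise KeyError(f"properties not in model: {unknown}")
    if not user_groups:
        raise PermissionError("caller has no entitlements")

    select_list = ", ".join(f"{spec.columns[p]} AS {p}" for p in properties)
    placeholders = ", ".join("?" for _ in user_groups)
    sql = (
        f"SELECT {select_list} FROM {spec.table} "
        f"WHERE {spec.entitlement_column} IN ({placeholders})"
    )
    # Group names travel as bind parameters rather than being spliced into the SQL.
    return sql, list(user_groups)
```

In this sketch the consumer only names logical properties such as `ticker`; the physical mapping and the entitlement predicate stay inside the packaged spec, and the resulting query runs in the database itself.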
Making governed data collaboration possible
The combination of Legend’s semantic modeling, Snowflake Native Apps, and Secure Data Sharing simplifies governed data sharing and collaboration for Goldman Sachs. The rules dictating how data is connected and entitled are packaged within the app, and remain within it when the app is reshared. This approach reduces several risks associated with siloed data engineering teams in individual business groups, including disparate approaches to enforcing entitlements and governance, duplication of data and effort, cost increases, and different understandings of the data. For example, one team might write queries differently from others, or someone may misunderstand how one data set joins to another and write different semantics in their notebooks.
To further combat these risks and facilitate self-service insights, Goldman Sachs did two things:
- Made the discovery process easy: A single internal catalog of vendor data lets users identify what’s available via Legend and which business units can consume the data.
- Ensured appropriate access: Once users figure out which data they need, all they have to do is request and consume the Snowflake Native App with the click of a button. To the user, it looks like a regular view or table because it’s shared through user-defined table functions (UDTFs), but they’re actually getting the full model specification packaged in the app, with the appropriate governance applied.
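The two steps above can be sketched as a small Python example. This is an illustration only, not Snowflake's or Legend's API: the `CATALOG` contents, the sample rows, and the `discover` and `table_function` helpers are hypothetical stand-ins for the internal catalog and for a table function exposed by an app.

```python
from typing import Iterator

# Hypothetical catalog: data set name -> business units allowed to consume it.
CATALOG: dict[str, set[str]] = {
    "vendor_prices": {"research", "quant"},
    "vendor_esg_scores": {"research"},
}

# Hypothetical rows standing in for the provider's underlying tables.
_ROWS: dict[str, list[tuple]] = {
    "vendor_prices": [("ABC", 10.5), ("XYZ", 20.0)],
    "vendor_esg_scores": [("ABC", 71)],
}


def discover(business_unit: str) -> list[str]:
    """Step 1: list the data sets a business unit is allowed to consume."""
    return sorted(name for name, units in CATALOG.items() if business_unit in units)


def table_function(dataset: str, business_unit: str) -> Iterator[tuple]:
    """Step 2: return rows like a table would, after checking entitlements.

    The check runs before any row is handed back, so the governance rule
    stays with the function rather than with each consumer's query.
    """
    if business_unit not in CATALOG.get(dataset, set()):
        raise PermissionError(f"{business_unit!r} is not entitled to {dataset!r}")
    return iter(_ROWS[dataset])
```

A consumer in the hypothetical `quant` unit would call `discover("quant")` to see what is available, then iterate over `table_function("vendor_prices", "quant")` as if it were an ordinary table; asking for `vendor_esg_scores` raises `PermissionError` instead.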
The impact of this new approach is substantial. Data set onboarding and access processes that took weeks or months have been reduced to days. Researchers, quants, and data scientists can now use Legend Snowflake Native Apps in the Data Cloud to find appropriate data for their needs, analyze it, transform it, and share it without having to track down a data engineer, and without worrying about degraded quality or governance. Data engineering teams can spend their time on more strategic projects instead of managing operational requests for new data sets.
What’s next for Goldman Sachs and Snowflake Native Apps
Legend and the Snowflake Native App Framework not only help Narang’s team and other internal users save time and effort, but also open up the option of governed data sharing with external clients and partners. For example, an institutional client could bring their own Snowflake instance and join their data with data sourced from the Legend Snowflake Native App in the Data Cloud to make better investment decisions.
“I’m excited to see how easy it is for data engineers to produce those apps and distribute them via marketplace, private, or auto-fulfillment functionalities, and how easy they are for our researchers and other business teams to use. A perfect example of a win-win,” says Narang.