CUSTOMER STORIES
Chicago Trading Company Replaces Managed Spark for 54% Cost Savings
By moving to Snowflake, CTC’s research platform now brings development to the data for quick, cost-effective and reliable data processing — so traders have the insights they need, when they need them.
KEY RESULTS:
54%
Cost savings — amounting to millions of dollars annually — by moving from managed Spark to Snowflake
$800K
Saved annually by eliminating data movement out of Snowflake and back
Industry
Financial Services
Location
Chicago, Illinois
Keeping pace with the speed of markets
Markets move incredibly fast — and no one knows that better than market-makers like Chicago Trading Company (CTC). Recognized as a leading derivatives trading firm, CTC provides liquidity to markets around the world, helping drive efficient, stable and healthy markets by participating on both buy and sell sides.
Supporting dynamic markets requires massive amounts of data. Of CTC’s more than 700 employees, most are quants or engineers. “We need data to make informed decisions about how we optimize our trading behavior and maximize our capabilities,” says David Trumbell, Head of Data Engineering and Principal Engineer at CTC.
When the closing bell rings and the trading desks fall quiet, deep inside CTC’s internal research platform, the real heavy lifting begins. A central resource for the firm’s traders, quants and researchers, this research platform collects information from thousands of sources, including feeds from every exchange they trade on, historical trading prices and third-party data. This amounts to tens of thousands of data sets that need transforming every night to produce the insights that traders need when markets open the following morning. If it sounds like an expensive and resource-consuming task, that’s because it is — or at least, it was.
After years of racking up huge data bills, CTC decided to move its processing off of managed Spark and onto Snowflake, where it had already built its data foundation. Working with Snowflake Professional Services to facilitate a smooth transition, CTC almost immediately found welcome relief from a number of its biggest challenges — from cost and reliability to speed and complexity. Thanks to the reduction in costs, CTC now maximizes data to further innovate and increase its market-making capabilities.
“Now with fewer ephemeral failures and higher visibility in Snowflake, we have a platform that’s much easier and cost-effective to operate than managed Spark.”
David Trumbell
Story Highlights
- Reduced costs and stronger security by eliminating data movement: By moving toward a fully integrated solution, CTC is saving millions each year by getting better visibility into spend and avoiding costly data transfers, which also pose security risks.
- Increased reliability and speed for more efficient data processing: In addition to resolving the reliability issues it saw on managed Spark, the CTC team hit its SLA deadline every day for the first time in the company's history, a milestone it hadn't been in a position to track until Snowflake put the goal within reach.
- Simplified ecosystem for improved usability and innovation: The new system is less taxing on CTC’s engineers, who can now report on the ROI of their efforts and focus more on bringing new value to the firm through technological innovation.
Saving millions by eliminating data movement and gaining visibility into spend
One of the most obvious arguments for moving from managed Spark to Snowflake was that clear-cut cost inefficiencies were baked into the previous system. CTC was paying $800,000 a year just to move data from Snowflake to managed Spark for processing and back again. Now that expense is gone, and Python development happens where the data lives.
With Snowflake, CTC also gained greater visibility and control over its costs. The firm now immediately sees the precise cost of each job, how much compute a particular data set uses, and what it spends on each pipeline. “We had so many months of frustration around trying to get to what each job on managed Spark cost,” Trumbell says. Without this level of detail, it was difficult to attribute expenses to different lines of business and nearly impossible to accurately assess ROI.
“After moving to Snowflake, we immediately got ahead of these cost problems with the ability to see exactly what each specific grouping of code costs on a per-job run,” Trumbell says. This visibility helps the team make more informed decisions around which jobs to run to maximize resources and ROI.
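As a rough illustration of the per-job visibility described above, the sketch below rolls up the cost of individual job runs by job and by pipeline. It is not CTC's tooling and does not call any Snowflake API: the `JobRun` record, the pipeline and job names, the credit counts and the credit price are all hypothetical, chosen only to show the shape of the calculation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    pipeline: str   # hypothetical pipeline name
    job: str        # hypothetical job name
    credits: float  # compute credits consumed by this single run


def cost_by_group(runs, credit_price, key):
    """Sum the dollar cost of runs, grouped by whatever key(run) returns."""
    totals = defaultdict(float)
    for run in runs:
        totals[key(run)] += run.credits * credit_price
    return dict(totals)


# Made-up records for illustration only; these are not CTC figures.
runs = [
    JobRun("equity_options", "normalize_quotes", 4.0),
    JobRun("equity_options", "normalize_quotes", 6.0),
    JobRun("equity_options", "build_vol_surface", 10.0),
    JobRun("futures", "load_settlements", 2.5),
]

CREDIT_PRICE = 2.0  # assumed dollars per credit

# Cost per (pipeline, job) pair and cost per pipeline.
per_job = cost_by_group(runs, CREDIT_PRICE, key=lambda r: (r.pipeline, r.job))
per_pipeline = cost_by_group(runs, CREDIT_PRICE, key=lambda r: r.pipeline)
```

Grouping by a caller-supplied key means the same run records can be attributed to a job, a pipeline or a line of business without changing the aggregation itself.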
In all, the switch from managed Spark to Snowflake has saved CTC 54% in costs — which translates into millions of dollars for the firm.
“We had so many months of frustration around trying to get to what each job on managed Spark cost. After moving to Snowflake, we immediately got ahead of these cost problems.”
David Trumbell
From complex failures to a simplified, swift and steadfast system
CTC’s move to Snowflake addressed another cost driver: data-processing jobs that simply failed. Some jobs would fail for known reasons, Trumbell says, while others seemed to fail spontaneously, and rerunning the job would suddenly succeed. “Every time one of the managed Spark jobs would fail, it caused much higher costs in terms of compute while also jeopardizing our ability to deliver that data quickly,” he says.
By bringing the compute to the data in Snowpark, CTC has drastically reduced job failures — an invaluable improvement, given that the company’s data-processing jobs are always running against a clock. The goal, Trumbell says, is to have the last hour of the previous day’s data available at least one hour before markets open.
While moving off of managed Spark, CTC was also able to seize the opportunity to implement some operational changes. In combination with the reliability of Snowpark, that meant the company was able to hit that one-hour minimum pre-market deadline every day for the first time in the company's history — a milestone it hadn't been in a position to track until Snowflake put the goal within reach.
“Our DataOps team has started to produce an SLA scorecard,” Trumbell says. “We’re now confident we can hit our SLAs on a regular basis, where we normally would’ve missed seven or more days per month with managed Spark.”
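To make the idea of an SLA scorecard concrete, here is a minimal sketch of the check implied by the goal stated earlier: data should be ready at least one hour before markets open. The function names, the 08:30 opening time and the sample timestamps are assumptions for illustration, not details of CTC's actual scorecard.

```python
from datetime import datetime, timedelta


def met_sla(ready_at, market_open, buffer=timedelta(hours=1)):
    """True if the data was ready at least `buffer` before the market opened."""
    return ready_at <= market_open - buffer


def scorecard(days):
    """Count hits and misses over (ready_at, market_open) pairs."""
    results = [met_sla(ready_at, market_open) for ready_at, market_open in days]
    hits = sum(results)
    return {"days": len(results), "hits": hits, "misses": len(results) - hits}


# Hypothetical timestamps; an 08:30 open is assumed for the example.
sample_days = [
    (datetime(2024, 3, 4, 7, 10), datetime(2024, 3, 4, 8, 30)),  # 20 minutes early
    (datetime(2024, 3, 5, 7, 30), datetime(2024, 3, 5, 8, 30)),  # exactly on the deadline
    (datetime(2024, 3, 6, 7, 45), datetime(2024, 3, 6, 8, 30)),  # 15 minutes late
]

summary = scorecard(sample_days)
```

In this sketch a run that finishes exactly on the deadline counts as a hit; a stricter scorecard could use `<` instead of `<=`.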
One of CTC’s data engineers echoes these performance sentiments: “Snowpark performed insanely better. The variance, too, in the Spark job durations was huge, whereas Snowpark is very consistent.”
This kind of speed and reliability can be traced back to Snowflake’s ease of use. The entire CTC system is now managed by just a couple of engineers, who no longer have to wrestle with the complex configurations demanded by managed Spark. “The amount of knobs was tremendous, and that led to issues,” Trumbell says. “Jobs needed to rerun because the environment wasn’t configured properly for all of these tens of thousands of ETLs that were running.”
Investing in further innovation — richer insights, advanced AI and beyond
Thanks to significant cost savings and freed-up engineering resources, CTC now has room to innovate further on its platform and maximize return value. Already, Trumbell notes, the company has reopened some of the data pipelines it had shuttered because they were too expensive to run, and has enabled new capabilities for traders, quants and researchers throughout the firm. “We’re now bringing on new sources of data to provide richer data sets and analytics for our traders.”
Trumbell’s team has also started exploring features like Cortex AI and Streamlit to further realize and scale the best of what the Snowflake AI Data Cloud offers. That will undoubtedly include helping CTC incorporate more machine learning, advanced AI and LLMs into its platform, which he says will be a top area of focus moving forward.
“Snowpark provides a strong foundation, enabling us to deliver on our firm's growing data needs.”
David Trumbell