Elsevier is a leader in information and analytics for research and global health systems, and one of the world’s leading academic publishers, responsible for 2,650 digitized journals and over 42,000 ebook titles, including iconic reference works such as Gray’s Anatomy. In 2020, over 1.3 billion articles were downloaded from their ScienceDirect service, the world’s largest database of peer-reviewed scientific and medical research.
Snowflake met with Joao Cazarotti Cruz to discuss how Elsevier have transformed their marketing data platform to better serve their business teams through easier data access, greater agility and near-unlimited scalability, all while cutting costs by 50%.
Centralizing technology stacks onto the Snowflake Data Cloud
Elsevier is all about enabling researchers, health professionals and leaders to stay on top of the latest knowledge in their fields, so that they can achieve the best possible outcomes in their work. To keep doing that, and to improve their service, Elsevier set out to consolidate their numerous data warehouses and tech stacks into one coherent whole, as Joao Cazarotti Cruz, Data and Analytics Lead at Elsevier, explains: “We have 6 large global business units and historically they have been their own organizations, running their own technology stacks, but in recent years there has been a strategic drive to centralize all those different tech stacks into one consolidated data platform.”
Until recently, Elsevier ran their technology stacks on AWS, using very large and very costly Hadoop clusters on Amazon EMR that were far bigger than the business actually needed. “We found that, with our migration to Snowflake, even the smallest warehouse offered by them was comparable to our oversized on-premises solution – but for a fraction of the cost,” said Joao Cazarotti Cruz.
At the same time, Elsevier’s oversized clusters did not scale cost-effectively, Joao Cazarotti Cruz continues. “We found that our clusters would not scale linearly within AWS EMR, and whenever we had performance issues and had to increase the cluster size there would not be a corresponding increase in performance. With Snowflake, not only is there separation of compute and storage, but every time we double the warehousing capacity we see a doubling of performance, which makes predicting cost much easier.”
Transforming Elsevier’s Data Strategy
Switching to Snowflake not only improved Elsevier’s data storage capabilities and reduced upfront costs, it also made managing their data much easier. By moving away from diverse legacy data management systems and consolidating onto the Snowflake Data Cloud, Joao Cazarotti Cruz was able to solve a number of other problems that had built up over years of data management struggles. “Given the nature of the Hadoop file system and how the logging works for AWS EMR, which we were using, it was very difficult for us to do error handling, logging and debugging whenever we had any failures, and often we were forced to restart the entire system from failure.” Full restarts are no longer necessary now that Elsevier stores their data on the Snowflake Data Cloud. With minimal additional code in the database, Joao Cazarotti Cruz is able to handle logging, error capture and reporting without restarting anything.
Another shortcoming of Elsevier’s previous Hadoop setup was the language needed to work with it. “We were having to use PySpark to work with the system and that introduced a skills gap and lack of coverage in my team, but by going over to Snowflake we were able to put much of that logic into SQL, which means that anyone in my team can go in and read the code, understand the business logic and what is going on.” Making their data logic understandable to more people in the organization has greatly sped up their delivery and discovery process. It has even allowed the team to open up the SQL code to people outside the immediate data and analytics team, who can now suggest changes to the business logic.
Savings and Increased Operational Efficiency
Centralizing Elsevier’s tech stack, and eliminating the problems inherent in running a separate data solution for each of their six global business units, has made a huge difference to Joao Cazarotti Cruz and his team.
Overall, they were able to reduce data management costs by 50%, increase performance fourfold, enable near-unlimited scalability and save an estimated 6,100 labor hours, all by switching to the Snowflake Data Cloud as their modern data platform.
For Joao Cazarotti Cruz, the bottom line is simple: “We have saved ourselves so much time and effort thanks to our switch to Snowflake. It just works.”