In this guest blog post, HomeToGo’s director of data, Stephan Claus, explains why the company migrated to Snowflake to meet its data needs. This article is based on Stephan’s presentation during the Snowflake Data World Tour 2022. Join us at this year’s world tour to learn more about the latest innovations to Snowflake’s Data Cloud. 

HomeToGo is the marketplace with the world’s largest selection of vacation rentals. From vacation homes, cabins, beach houses, apartments, condos, houseboats, castles and farm stays to everything in between, HomeToGo combines price, destination, dates and amenities to find the perfect accommodation for any trip worldwide. Founded in 2014, the company’s vision is to make incredible homes easily accessible to everyone. Since launching, the company has gone public and become a leader in the vacation rental industry, operating local apps and websites in 25 countries and offering over 15 million rentals around the world.

Over the course of this journey, HomeToGo’s data needs have evolved considerably. In late summer 2021, HomeToGo revisited its data architecture, as we required more advanced functionality, such as:

  1. Elasticity (the separation of compute from storage) with faster scale-up and scale-down, allowing us to manage our workloads per domain more efficiently
  2. Native support for semi-structured data handling
  3. Support for granular Data Masking at the data warehouse level (a short sketch of both the per-domain warehouse and the masking setup follows this list)
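To make the first and third requirement concrete, the sketch below shows how they map to plain Snowflake DDL: an independently sized, auto-suspending warehouse per domain, and a masking policy attached to a single column. All names (warehouse, role, table, column) are made up for illustration and are not HomeToGo’s actual setup.

```sql
-- Hypothetical per-domain warehouse: compute is provisioned independently of
-- storage, sized per workload, and suspends itself when idle.
CREATE WAREHOUSE IF NOT EXISTS marketing_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 60      -- seconds of inactivity before suspending
  AUTO_RESUME    = TRUE;

-- Hypothetical masking policy: privileged roles see the raw value,
-- everyone else sees a hash.
CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE SHA2(val)
  END;

-- Attach the policy to a column; it is then enforced for every query.
ALTER TABLE users MODIFY COLUMN email SET MASKING POLICY email_mask;
```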

Snowflake provided this functionality straight out of the box. It also came with other advantages, such as independence from cloud infrastructure providers, data recovery features such as Time Travel, and zero-copy cloning, which made setting up several environments (such as dev, stage and production) far more efficient. After a successful trial period that checked all the boxes, we started our migration in autumn 2021, together with moving all of our data transformation management into the OSS version of dbt.

This migration was highly beneficial for one of our core data pipelines, which handles the collection of behavioral data from across our websites and apps. This data is fundamentally important, and highly time-critical, for many of our production-deployed data products, including our search result ranking and our marketing efforts.

We use Snowplow for this: a comprehensive framework for managing event collection from frontend and backend. Some of the key features include:

  • A vast selection of available trackers
  • A very flexible event setup, including structured and unstructured events, as well as custom contexts
  • A schema registry and schema validation
  • Custom event enrichment steps
  • An efficient data loader that lands everything in Snowflake as one big fat table (see the sketch below)
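To give an idea of what the loader’s output looks like, here is a minimal, hypothetical query against such a wide events table. The table name follows the article; the standard column names are typical of a Snowplow load, and the custom context column name is made up for illustration.

```sql
-- Peek at the most recent hour of raw events. Standard Snowplow columns sit
-- next to VARIANT columns that hold the custom contexts as JSON.
SELECT
    event_id,
    collector_tstamp,
    app_id,
    contexts_com_hometogo_search_context_1   -- VARIANT column (hypothetical name)
FROM atomic_events
WHERE collector_tstamp >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
LIMIT 100;
```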

Snowplow serves as an initial quality gate, making sure that only validated, high-quality data enters the warehouse in the first place. Once the data is in the warehouse, we leverage Snowflake’s data warehousing features to handle it.

Something that is especially handy is Snowflake’s support for semi-structured data. Snowplow events and custom contexts do not need to be normalised before they are loaded; they are simply stored as JSON in Snowflake’s VARIANT columns in one big fat table called “atomic_events”.

This is a great advantage for two reasons. First, it makes our pipeline more robust: if new data points are added to an existing schema, it will not break the pipeline, and our analytics engineers can decide when to pick up the newly available data. Second, it reduces the number of joins you would usually encounter with normalised event tables, which has a large impact on your cost footprint.
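As a sketch of the second point: fields can be read straight out of the VARIANT column with Snowflake’s path syntax, and arrays of contexts can be expanded in place with FLATTEN, so no join to a normalised context table is needed. The context and field names below are illustrative, not our actual schema.

```sql
-- Extract fields directly from the JSON payload and cast them on the way out.
SELECT
    event_id,
    derived_tstamp,
    contexts_com_hometogo_search_context_1[0]:destination::STRING AS destination,
    contexts_com_hometogo_search_context_1[0]:guests::INTEGER     AS guests
FROM atomic_events
WHERE derived_tstamp >= DATEADD('day', -1, CURRENT_TIMESTAMP());

-- If a context can occur several times per event, FLATTEN expands the array
-- in place, again without any join to a separate table.
SELECT
    e.event_id,
    c.value:listing_id::STRING AS listing_id
FROM atomic_events AS e,
     LATERAL FLATTEN(input => e.contexts_com_hometogo_listing_context_1) AS c;
```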

In addition, we use Snowflake Time Travel to increase the robustness of the pipeline further. It ensures we are always able to recover from any infrastructure incident or faulty load. On top of that, we can directly leverage the Snowplow dbt packages to build data models for users, page views and sessions out of the box in an efficient, incremental way.
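For the Time Travel part, recovering from a faulty load can be as simple as querying or cloning the table as it was before the incident, provided it happened within the retention window. This is a generic sketch rather than our actual runbook.

```sql
-- Look at the table as it was three hours ago (offset is in seconds).
SELECT COUNT(*)
FROM atomic_events AT (OFFSET => -60 * 60 * 3);

-- Recover by cloning that historical state. Alternatively, the query ID of
-- the faulty load could be passed via BEFORE(STATEMENT => '<query_id>').
CREATE OR REPLACE TABLE atomic_events_recovered
  CLONE atomic_events AT (OFFSET => -60 * 60 * 3);
```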

Last but not least, Snowflake’s zero-copy cloning functionality allows us to expose the data to different environments (we use development, stage and production) without maintaining multiple copies of the data, which would only accrue cost and increase management complexity.
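A minimal sketch of that setup, with hypothetical database names: the clones are metadata-only operations, so no storage is duplicated until data in a clone is actually changed.

```sql
-- Dev and stage start as zero-copy clones of production.
CREATE DATABASE IF NOT EXISTS analytics_dev   CLONE analytics_prod;
CREATE DATABASE IF NOT EXISTS analytics_stage CLONE analytics_prod;
```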

Over the past year, HomeToGo’s tech stack has continued to evolve as we have further leveraged Snowflake, Snowplow and dbt. Combining these services is helping the company to make cost-effective, data-driven business decisions, as well as giving us the confidence that our data architecture is robust and secure.