Data Contracts in Behavioural Web Data Collection
Introduction
In today’s data-driven world, businesses increasingly rely on the behavioural data of their customers to make informed decisions and gain insights.
Teams across an organization, including product, sales, marketing and data teams (such as data analysts and scientists), depend on this data for a range of reporting and activation purposes. This has created a need for ever more data to be tracked, and with it an ongoing struggle: ensuring not only that there is “enough” data, but also that the data is reliable and robust and is collected in a privacy-compliant manner.
When data starts to be collected at scale, a company will often encounter three common problems, as described by Andrew Jones in his “Driving Data Culture Change With Data Contracts” presentation.
What is a Data Contract?
I was first introduced to the concept of the data contract, a movement that is quickly gaining traction in the data engineering space, by Andrew Jones at the London Analytics Meetup (I highly recommend attending if you’re ever in London). There he explained how data contracts solve the above-mentioned issues within change data capture (CDC) microservices.
In its simplest form, a data contract is a set of predefined rules that define the structure, format and requirements for the data being exchanged or collected. The purpose of a data contract is twofold:
It is worth noting that a data contract goes far beyond rules defining a schema and semantics. As noted in the Confluent documentation, data contracts can evolve and become more complex over time to cover the following elements:
Data Contracts in Behavioural Web Data Collection
This led to some research and exploration into whether data contracts could solve the same issues in behavioural web data collection. In theory this is possible: in many instances, web behavioural data collection is the earliest upstream point and produces the rawest form of data, because behavioural data is often collected directly on the client side in the browser.
As it turns out, data contracts aren’t as new to web data collection as I thought. Many web analytics tools already offer some variation of data contract functionality to ensure that collected data arrives in a fixed format:
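To make the definition concrete in a web context, here is a minimal sketch of a contract for a single behavioural event. It covers structure (required and unexpected fields), format (types and regex patterns) and requirements (allowed values), and is checked at the point of collection. The sketch is written in Python for readability; real client-side logic would run as JavaScript in the browser or tag. The event name, fields, patterns and the `FieldRule`, `DataContract` and `validate` names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule for one field: expected type plus optional constraints."""
    expected_type: type
    required: bool = True
    pattern: Optional[str] = None        # regex a string value must fully match
    allowed: Optional[frozenset] = None  # closed set of permitted values


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract describing one behavioural event."""
    event_name: str
    version: int
    fields: Dict[str, FieldRule]


def validate(event: dict, contract: DataContract) -> List[str]:
    """Return a list of violations; an empty list means the event honours the contract."""
    errors = []
    if event.get("event") != contract.event_name:
        errors.append(f"expected event {contract.event_name!r}, got {event.get('event')!r}")
    params = event.get("params", {})

    for name, rule in contract.fields.items():
        if name not in params:
            if rule.required:
                errors.append(f"missing required field {name!r}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly for non-bool fields.
        if not isinstance(value, rule.expected_type) or (
            isinstance(value, bool) and rule.expected_type is not bool
        ):
            errors.append(f"field {name!r} should be {rule.expected_type.__name__}")
            continue
        if rule.pattern is not None and not re.fullmatch(rule.pattern, value):
            errors.append(f"field {name!r} does not match {rule.pattern!r}")
        if rule.allowed is not None and value not in rule.allowed:
            errors.append(f"field {name!r} has disallowed value {value!r}")

    # Fields the contract does not declare are treated as violations too.
    for name in sorted(params.keys() - contract.fields.keys()):
        errors.append(f"unexpected field {name!r}")
    return errors


# Hypothetical contract for an add-to-cart event.
ADD_TO_CART_V1 = DataContract(
    event_name="add_to_cart",
    version=1,
    fields={
        "currency": FieldRule(str, pattern=r"[A-Z]{3}"),
        "value": FieldRule(float),
        "item_id": FieldRule(str, pattern=r"SKU-\d+"),
        "source": FieldRule(str, required=False, allowed=frozenset({"web", "app"})),
    },
)

good = {"event": "add_to_cart", "params": {"currency": "GBP", "value": 19.99, "item_id": "SKU-123"}}
bad = {"event": "add_to_cart", "params": {"currency": "gbp", "value": "19.99", "colour": "red"}}

print(validate(good, ADD_TO_CART_V1))  # → []
print(validate(bad, ADD_TO_CART_V1))   # four violations: currency format, value type, missing item_id, unexpected colour
```

Rejecting undeclared fields is a deliberate choice in this sketch. It keeps the collected data in a fixed shape, at the cost of forcing a contract update before any new field can be tracked.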
Data Contracts in Google Tag Manager (GTM)
Google Tag Manager (GTM) does not currently offer native data contract functionality, most likely for three reasons:
However, with the need for more tailored measurement and the gradual adoption of Server-side GTM, it is now possible to implement customized data contracts in GTM. This can be done by leveraging newer features such as transformations and the built-in ability to integrate directly with Google Cloud components such as Firestore and BigQuery.
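The architecture described above can be sketched as follows, purely as an illustration under stated assumptions. A dictionary stands in for a Firestore collection holding one contract per GA4 event name. A list stands in for a BigQuery table of rejected events. The `apply_contract` function plays the role of the validation and transformation logic a Server-side GTM container might run. The code is Python for readability, not Server-side GTM sandboxed JavaScript, and every name in it (`CONTRACT_STORE`, `QUARANTINE`, `apply_contract`, the `owner` field) is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a Firestore collection: one contract document per GA4 event name.
CONTRACT_STORE: Dict[str, dict] = {
    "purchase": {
        "owner": "checkout-team",  # metadata: who to alert when events break the contract
        "params": {"transaction_id": str, "value": float, "currency": str},
        "required": ["transaction_id", "value", "currency"],
    },
}

# Hypothetical stand-in for a BigQuery table that collects rejected events for review.
QUARANTINE: List[dict] = []


def apply_contract(event_name: str, params: dict) -> Tuple[str, dict]:
    """Return ("forward", cleaned_params) if the event honours its contract;
    otherwise record it in QUARANTINE and return ("quarantine", params)."""
    contract: Optional[dict] = CONTRACT_STORE.get(event_name)
    if contract is None:
        reasons = [f"no contract registered for {event_name!r}"]
    else:
        declared = contract["params"]
        # Transformation step: keep only the parameters the contract declares.
        cleaned = {k: v for k, v in params.items() if k in declared}
        reasons = [f"missing {name!r}" for name in contract["required"] if name not in cleaned]
        reasons += [
            f"{name!r} should be {declared[name].__name__}"
            for name, value in cleaned.items()
            if not isinstance(value, declared[name])
        ]
        if not reasons:
            return "forward", cleaned

    # Keep the original payload plus the reasons so the owning team can investigate.
    QUARANTINE.append({
        "event": event_name,
        "owner": contract["owner"] if contract else None,
        "params": params,
        "reasons": reasons,
    })
    return "quarantine", params


print(apply_contract("purchase", {"transaction_id": "T1", "value": 42.0, "currency": "GBP", "debug": 1}))
# → ('forward', {'transaction_id': 'T1', 'value': 42.0, 'currency': 'GBP'})
print(apply_contract("purchase", {"transaction_id": "T2", "value": "42"})[0])  # → quarantine
print(QUARANTINE[-1]["reasons"])  # → ["missing 'currency'", "'value' should be float"]
```

Dropping undeclared parameters before validation resembles an allow-list style transformation. A stricter contract might instead quarantine events that carry unexpected parameters rather than silently cleaning them.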
The adoption of data contracts in GTM for your GA4 data collection offers several benefits:
In an upcoming post, I will delve into how data contracts can be built for Google Analytics data collection using a combination of Server-side Google Tag Manager and some Firestore magic to help drive better data quality in GA4. Stay tuned!
Check out Andrew Jones’ new book if you’re interested in learning more about data contracts in the data engineering realm!