Thoughts on Data Stewardship
In the end, this is going to be an article about data stewardship, but I want to start in a bit of a roundabout way.
My thought process started with an article about consumer intent from TechCrunch: Conversational Analytics are About to Change Customer Experiences Forever. Sorry, most of this article is behind a paywall, so I didn’t read all of it. But it did get me thinking about conversations I’ve had in the past. I was working on a project specifically related to customer intent: how it could be measured and leveraged to improve conversion rates, front-of-wallet share, and total-wallet share. Most of you will know “total-wallet share”, but might not know “front-of-wallet share”. “Front-of-wallet” means, as you might have guessed, being the first or default choice for a consumer.
Understanding customer intent and how it drives these key metrics requires a detailed mapping of the customer journey. Customer journey mapping is often oversimplified, but it can offer rich contextual analysis. Doing it well means recognizing and expressing that the customer journey is non-linear, that multiple journeys can be concurrent, and that some journeys are time-dependent (cyclical, seasonal, or life-stage purchasing). If you want a primer on customer journeys, here are a couple of good articles (Customer Experience via Customer Journey, Realistic Customer Journey Map).
It’s an interesting exercise to think about the complex puzzle of relating customer journeys to detailed behavioural data. Is there a way to detect the primary journey, along with secondary or latent journeys, based on the available data? And what does this then tell you about intent? Based on this knowledge, how does the company convert consumers at a higher rate, increase the number of products for which it is front-of-wallet, and increase total-wallet share?
At this point, intellectual curiosity and love of solving puzzles collide with concerns about the depth and breadth of data controlled by corporations and the use of machine learning and deep learning to “extract value” from those volumes of data. If knowledge is power and data is a fundamental ingredient of knowledge, then corporations having ever-increasing control over data and extracting ever more refined knowledge from that data poses a pretty big risk. There is a great episode from Linear Digressions (Formulation of AI to Avoid Runaway Risks) and the Podcast Rabbit Hole that addresses some of these concerns with greater skill than I ever could. In particular, I like the way Prof. Russell frames the risk of making humans more predictable and, as a consequence, easier to manipulate.
So what does data stewardship mean? Here is what Wikipedia has to say. Essentially, the role of a data steward is to ensure that data can be used by the organization (quality and accessibility) and that the use of data does not violate compliance regulations like GDPR, CCPA, or PIPEDA. Certainly, these regulations are steps in the right direction, but my personal view is that ultimately ownership of data should rest with the individual who generated it, not with the organization that captured it. While this does not currently exist, there are potential paths to it. For example, Statistics Canada is talking about a Data Trust for Canadian citizens (Data Strategy).
In the short term, I believe the concept of data stewardship needs to be adjusted. As important as I think data quality, access, and compliance are, the primary purpose of a data steward should be to act on behalf of the individual(s) who generated the data. The data steward should advocate for the “voice of the customer” or, more accurately, the “voice of the data generator”. Data stewards should be trained to understand how data is used within an organization and how that use will directly impact the individual, for good or ill.
Data stewards should be empowered to:
- ask the question “how does improved conversion rate help the customer?”
- receive a serious answer to that question
- work with the project team, with real authority, to ensure that “improved conversion rate” has a tangible benefit for the customer.
Even better would be the development of metrics focused on customer benefit, to counterbalance and complement the plethora of metrics focused on organizational benefit. These principles abstract well to any context where data about people is collected and used for analytical purposes. I use customer data as an example simply because it's the most common use case.
This might be the hardest data role out there. Data stewards will need to understand the first-order effects to make sure the intended consequences are in the best interest of the customer. More importantly, they will be the ones who dig into the second- and third-order effects to understand what, if any, unintended consequences flow from the models being developed and deployed. They will need to do this in a way that brings value to the organization and is supportive of their colleagues. If not, they risk being cut out of the process because they make it impossible to get things done. This means going beyond looking at the granularity of the data used and the attributes included. Data stewards will need to understand the models being applied and the business question(s) being addressed so that their assessment of projects has the necessary nuance. If we never get to a place where individuals own and control their consumer data, then I hope we can evolve to a place where all data professionals are true stewards of the data with which they are entrusted.
Do you think creating a professional designation that carries fiduciary responsibility is on the critical path? Professional governance of doctors, lawyers, and accountants has done a lot for society. Any thoughts on the core principles/tenets?
Well argued!