The Missing Layer: Why NHS data debates confuse perspective with lifecycle
The framing that is getting in the way
Much of the current debate about NHS data treats the difficulty as one of incompatible perspectives. Data architects are said to emphasise models, semantics, and governance. System engineers are said to prioritise flows, services, and integration. Practitioners are then invited to choose between them, or to synthesise them, as if the challenge were essentially intellectual. This framing is convenient. It is also wrong.
Both disciplines are necessary. Neither is a rival to the other. The problem is not that one view is being privileged over another. It is that both are being applied to different parts of a process that no one is governing end-to-end. The appearance of disagreement is a symptom of an absent model, not a conceptual clash.
Treating the issue as a contest of perspectives obscures what is actually happening. It suggests that the resolution lies in better conversation between disciplines, or in a more senior role that synthesises them. Neither of those will help while the underlying process remains undefined. Without an articulated lifecycle, architects and engineers are drawing accurate pictures of different stages and assuming they are arguing about the same thing.
Different stages, not different schools
If one looks closely at where tensions arise, they rarely reflect deep conceptual difference. They reflect different stages of the data lifecycle receiving attention from different actors, working to different objectives and timescales.
National bodies tend to focus on aggregation, standardisation, and secondary use. Their work is shaped by reporting obligations, research access, and population-level insight. Their instincts favour structured models, central repositories, and controlled vocabularies applied at the point of exchange. Local organisations focus on capture, workflow, and operational immediacy. Their instincts favour flexible documentation, rapid iteration, and systems that fit the rhythm of clinical work. These are not competing world-views. They are different points in the same lifecycle, each of which produces legitimate preferences that become contradictory only when treated as universal.
Standards bodies and vendors further complicate the picture. Standards work generally operates at the boundary where systems exchange data. Vendors operate within product lifecycles shaped by commercial priorities. Researchers work downstream of all of this. Each of these actors produces internally coherent choices that, in the absence of an overall lifecycle model, appear to conflict with the others. The conflict is real in its consequences. It is not a conflict about ideas.
The model healthcare does not have
Other data-intensive sectors have, over decades, developed explicit models of how data moves from creation through to reuse. The Generic Statistical Business Process Model (GSBPM), developed by the international statistical community, is the clearest example. It defines the phases through which data passes: specifying needs, designing instruments, building systems, collecting, processing, analysing, disseminating, and evaluating. Each phase has defined sub-processes, and metadata is treated as an active participant across them. The GSBPM does not tell any organisation what to do. It tells them which stage they are in, what is owed to the stages on either side, and where accountability sits.
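The shape of such a model can be sketched in a few lines of code. The eight phase names below follow the GSBPM; the "owes downstream" annotations are illustrative assumptions added for this sketch, not part of the standard, but they show the kind of explicit inter-stage obligation the model makes possible.

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    """One phase of a GSBPM-style lifecycle, with illustrative obligations."""
    name: str
    owes_downstream: list[str] = field(default_factory=list)  # assumed, not in GSBPM

# The eight GSBPM top-level phases, in order.
LIFECYCLE = [
    Phase("Specify needs", ["agreed definitions of required outputs"]),
    Phase("Design", ["instrument and metadata specifications"]),
    Phase("Build", ["tested collection and processing systems"]),
    Phase("Collect", ["raw data with capture-context metadata"]),
    Phase("Process", ["cleaned data with transformation provenance"]),
    Phase("Analyse", ["validated outputs with quality measures"]),
    Phase("Disseminate", ["documented, discoverable releases"]),
    Phase("Evaluate", ["lessons fed back into the next cycle"]),
]

def handovers(phases: list[Phase]) -> list[tuple[str, str, list[str]]]:
    """List each phase, its successor, and what the first owes the second."""
    return [(a.name, b.name, a.owes_downstream)
            for a, b in zip(phases, phases[1:])]

for src, dst, owed in handovers(LIFECYCLE):
    print(f"{src} -> {dst}: {', '.join(owed)}")
```

The point is not the code but the structure: every handover is named, and every phase knows what it must deliver to the next.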
Healthcare has no equivalent. There is no shared articulation of how clinical information moves from generation, through operational use, onward to aggregation, analysis, and reuse, and back into the system as evidence. What exists is fragmented. Information governance frameworks describe access and compliance. Interoperability specifications describe exchange. Clinical standards describe content. Data quality frameworks describe dimensions. None of these connects the stages together into a single governed process with defined handovers.
This absence has specific consequences. Responsibilities across stages are undefined. The expectations that one stage places on another are implicit. The metadata that should carry context between stages is rarely designed in, and frequently lost in transit. Quality expectations are expressed at the point of reporting rather than embedded at the point of creation. Standards are treated as overlays applied at exchange rather than as constraints on how data is produced. The lifecycle exists in practice, because data does move. It is governed only at intermittent points, and not as a coherent process.
This is the layer that is missing. It is not a new perspective. It is the explicit, operationalised structure within which the existing perspectives are meant to cohere.
A healthcare-adapted lifecycle model would not be a copy of the GSBPM. It would have to accommodate features the statistical world does not face in the same form. Clinical data is generated in service of immediate care and then carries forward into operational, contractual, regulatory, and research uses, each with different quality requirements and different views of what the same data element means. Collection happens across independent organisations with their own systems, workflows, and governance. The same datum may be a clinical observation, a billing signal, and a measure of service pressure, simultaneously. A usable model would describe these overlapping uses explicitly, assign accountability for the state of data at each stage, and make the obligations between stages formal rather than assumed. That is the work that has not been done.
What lifecycle blindness produces
When the lifecycle is not made explicit, the system still produces outcomes, but they accumulate in predictable ways.
Upstream variability translates directly into downstream complexity. Data is captured under varying local conventions, with varying completeness, and against varying definitions. The interpretive problems this creates later are unbounded. Downstream actors have no authority over how capture happens, and no contract with those who design it. They absorb the variability as a permanent operating cost. Transformation effort becomes indistinguishable from real analytical work, and delivers no new value.
Ethnicity recording in NHS systems illustrates the pattern clearly. Ethnicity is captured in primary care and acute settings through short, often optional, self-identification prompts, frequently outside a clinical interaction that requires it, and with no feedback loop from how the data will later be used. The national Health Inequalities programme, local population health analytics, commissioning allocations, and a substantial portion of clinical research all depend on this same field. The captured data is widely acknowledged to have inconsistent coverage, coarse categories, and variable currency across organisations. Downstream uses have adapted around these limitations for years through imputation, suppression, and caveat, rather than through any renegotiation of how the capture is designed. The pattern is general. The further downstream a use sits, the more it absorbs the accumulated cost of decisions made elsewhere for other reasons.
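The downstream adaptation described above can be made concrete. The sketch below shows a consumer harmonising variant local codes into coarse categories, suppressing what it cannot map; the code values and mapping are invented for illustration and are not the real NHS ethnicity code set.

```python
from collections import Counter

# Illustrative local-to-coarse mapping. These codes are invented for
# the sketch and are NOT the actual NHS ethnic category codes.
LOCAL_TO_COARSE = {
    "A": "White", "B": "White",
    "H": "Asian", "J": "Asian",
    "M": "Black", "N": "Black",
    "Z": None,  # "not stated" at capture
}

def harmonise(records):
    """Map variant local codes to coarse categories, counting what is lost."""
    out, dropped = [], 0
    for code in records:
        coarse = LOCAL_TO_COARSE.get(code)
        if coarse is None:   # missing, not stated, or unmapped
            dropped += 1     # the downstream actor absorbs this as suppression
        else:
            out.append(coarse)
    return Counter(out), dropped

counts, suppressed = harmonise(["A", "Z", "H", "Q", "M", "A"])
print(counts, "suppressed:", suppressed)
```

Every unmapped or unstated record is silently lost at the point of use, and nothing in this arrangement feeds back to change how capture is designed.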
Standards get applied at exchange rather than at creation. Interoperability specifications describe what must be true at a boundary, not what must be true at the point of origin. A FHIR profile at the edge of a system cannot, on its own, ensure that the underlying record was ever captured with the concepts, cardinality, and terminology binding that the profile assumes. Data is reshaped to conform to the exchange, often through mapping, rather than being semantically consistent from the beginning. What this produces is conformance without coherence. The messages validate. The underlying models do not agree. Exchange-level conformance is mistaken for lifecycle integrity, and the apparent success of interoperability becomes a reliable indicator of its actual absence at the level of meaning.
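"Conformance without coherence" can be shown in miniature. In the sketch below, two messages satisfy the same exchange-level shape check, yet the systems that produced them mean different things by the same field. The schema and field names are invented for illustration; this is not a real FHIR profile or validator.

```python
# Exchange-level schema: right fields, right types. Nothing about meaning.
EXCHANGE_SCHEMA = {"patient_id": str, "admission_type": str}

def validates(message: dict) -> bool:
    """A boundary check of the kind exchange standards enforce."""
    return (set(message) == set(EXCHANGE_SCHEMA)
            and all(isinstance(message[k], t) for k, t in EXCHANGE_SCHEMA.items()))

# System A uses "emergency" to mean any unplanned admission;
# System B uses it only for admissions via the emergency department.
msg_a = {"patient_id": "123", "admission_type": "emergency"}
msg_b = {"patient_id": "456", "admission_type": "emergency"}

assert validates(msg_a) and validates(msg_b)  # both conform at the boundary
# ...but any analysis that aggregates them counts two different clinical
# realities as one. The messages validate; the underlying models do not agree.
```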
Data quality is treated as a reporting concern. It appears late in the process, typically when someone trying to use data encounters problems. Dashboards, exception reports, and remediation cycles then impose effort on those at capture to correct issues that the design never prevented. Because quality expectations are not expressed as functions of specific uses, they are framed generically. The effort they generate is not proportional to value. Fitness for purpose becomes a post-hoc judgement rather than a design input.
Transformation layers accumulate. As the mismatch between what is captured and what is needed downstream grows, organisations respond by adding pipelines, adapters, warehouses, and models that reconcile the differences. Each layer is a rational local response to an incoherent system. Collectively, they form a costly and fragile architecture whose primary function is to compensate for the absence of upstream discipline. This is often mistaken for modernisation. It is the opposite. It is the institutional acceptance that the system will not be coherent, and that the cost of incoherence will be paid by intermediaries in perpetuity.
A further consequence, less often discussed, is the slow drift of meaning. Without lifecycle governance that binds definitions at the point of creation to definitions used downstream, the same term comes to denote slightly different things in different places. Coded diagnoses, medication status, admission types, and service attributions acquire variant meanings that are locally consistent and globally divergent. Over years, this drift compounds, until datasets that nominally refer to the same entities no longer do. No exchange standard can reverse it, because the standards were not in place when the divergence began. The lifecycle gap is not only an operational cost. It is a gradual erosion of the system’s ability to speak about itself.
Why the perspective framing survives
The lifecycle framing is more demanding than the perspective framing, and the difference is not only intellectual. It is institutional.
Treating the problem as a clash of perspectives is cognitively simpler. It allows each professional group to recognise itself in the debate and to claim a legitimate role. It preserves the comfortable assumption that the discipline most influential at any given moment holds the answer. It permits discussions to proceed without anyone having to specify what each stage of the lifecycle owes to the others, which is the point at which accountability begins to bite.
The framing also aligns with professional identity. Enterprise architects, systems engineers, clinical informaticians, and data professionals have distinct training, career structures, and communities of practice. Presenting the problem as a negotiation between these groups is more palatable than recognising that none of them alone can govern the process, and that the process itself is not currently governed. A solution built on lifecycle accountability would require shared authority rather than disciplinary pre-eminence. It would disadvantage whichever group has the most to lose from transparency about who controls what.
That is the critique worth stating directly. The perspective framing survives because it protects incumbents.
Each of these actors has reason to prefer a discourse about ideas over a discourse about obligations, because the discourse about ideas can be conducted indefinitely without anyone’s position being altered. A lifecycle framing does not permit that. It forces explicit questions about who is responsible for each stage, who holds the handovers, who sets quality expectations, and who is permitted to reject data that is not fit for onward use. The current arrangement relies on no one holding those questions. That is why the perspective framing persists, and why it will continue to be defended by those who benefit from it.
What lifecycle thinking actually changes
Treating the lifecycle as a first-class concern, rather than an implicit by-product of other activities, changes what is being governed and therefore what is possible.
The relationship between capture and reuse is the first thing that shifts. Capture ceases to be a local operational matter, and reuse ceases to be a downstream analytical problem. They are designed in reference to each other. What is captured, and how, is shaped by the known and anticipated uses it must support. Where different uses create conflicting requirements, they are made visible and negotiated, rather than papered over with transformation.
Data quality expression changes with it. Quality is defined in relation to specific uses. Quality expectations become inputs into the design of capture, coding, and workflow. Generic quality scores give way to fitness statements that are meaningful to both the producer and the consumer of the data. Quality improvement effort is directed where it will affect outcomes rather than where it is easiest to measure.
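What a use-specific fitness statement might look like can be sketched briefly. The uses, field names, and thresholds below are illustrative assumptions, not NHS requirements; the point is that the expectation is declared per use and can be evaluated against observed capture before anyone builds a dashboard around it.

```python
from dataclasses import dataclass

@dataclass
class FitnessStatement:
    """A quality expectation tied to one named use, not a generic score.
    All uses and thresholds here are illustrative assumptions."""
    use: str
    field: str
    min_completeness: float  # fraction of records that must carry the field

STATEMENTS = [
    FitnessStatement("health-inequalities reporting", "ethnicity", 0.95),
    FitnessStatement("operational flow dashboard", "ethnicity", 0.00),
]

def fit_for(use: str, observed_completeness: dict) -> bool:
    """Check observed capture against the expectations of one specific use."""
    return all(observed_completeness.get(s.field, 0.0) >= s.min_completeness
               for s in STATEMENTS if s.use == use)

observed = {"ethnicity": 0.62}
print(fit_for("operational flow dashboard", observed))    # True
print(fit_for("health-inequalities reporting", observed)) # False
```

The same observed data is fit for one purpose and unfit for another, which is exactly the judgement that generic quality scores flatten away.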
Standards move too. Rather than being applied only at the point of exchange, they are applied at creation. Conformance becomes a feature of how a system produces data, not a filter through which data passes on its way elsewhere. The effect is cumulative. Over time, less effort is required to make data useful, because less transformation is required to bridge the gap between how it was produced and how it is used.
Architecture takes on a different role. It ceases to be a static description of components and becomes a structure that expresses how data moves, who governs each stage, and where the handovers are. Decisions about platforms, standards, and services can then be evaluated against their effect on the lifecycle, rather than on their intrinsic appeal. None of this dissolves the role of architects, engineers, clinicians, or analysts. It gives them a shared object to work on.
Current programmes are exposed to the same gap
A range of national initiatives currently shape the NHS data environment. The Federated Data Platform, the ten-year health plan, the Neighbourhood Health Implementation Programme, the evolving electronic patient record estate, and the continuing programme of interoperability standards each touch the lifecycle at different points. None of them, individually or in combination, constitutes an explicit lifecycle model.
Without lifecycle thinking, these programmes are vulnerable to the predictable failure modes already described: standards applied only at the point of exchange, quality addressed only at the point of reporting, and transformation layers accumulating to absorb the mismatch between stages.
A lifecycle-oriented reading of these programmes does not require any of them to be redesigned. It requires that each be assessed against its contribution to an articulated lifecycle, with explicit expectations of how stages connect and where accountability sits. That is a different conversation from the one currently dominating the discourse, and it is a conversation that has not yet begun at the level this problem requires.
What will not fix itself
Healthcare does not suffer from a shortage of perspectives. It has more perspectives than it can coherently apply. What it lacks is the structure that would allow those perspectives to operate on the same object with defined relationships. That structure is the data lifecycle, made explicit, governed, and connected end-to-end. Until it exists, the consequences described in this article are not risks to be mitigated. They are the default state of the system.
Without a lifecycle model, upstream variability will continue to propagate unchecked, the intermediary infrastructure built to absorb it will continue to grow, and the drift of meaning between organisations will continue to compound.
Each of these trajectories is already visible in the current system, and none of them self-correct. They are the consequences of an absent model, and they will deepen at the rate the system continues to generate data.
The debate about whose perspective is correct occupies the space where lifecycle design should be. It will go on occupying that space for as long as it is comfortable to conduct. What it will not do is produce the structure whose absence is the cause of the difficulty. Perspectives will remain what they are, each accurate within its stage and each insufficient beyond it. The work of connecting the stages is not performed by discussing them. It is performed by governing them, and that work is not being done.
Author: Dr Tito Castillo FBCS CITP CDMP CHCIO
Tito is the founder of Agile Health Informatics Ltd, a specialist health and care IT consultancy service.
His recent books Data as Foundation: Building Healthcare's Invisible Infrastructure and The Argument Advantage: Reasoning in the Age of AI are both available on Amazon.
SUS is good, but connecting doctors' data is difficult, as the GPs think the data is theirs and not the patients'.
In the honest hope that the perspectives you share, Tito Castillo (FBCS CITP CHCIO), are understood as widely as possible. How might you frame full lifecycle data governance in a manner that could be applied to local data strategies, DMA, DTAC and NHS national data governance? And how might these support a Trust-level CIO to purchase and configure data capturing systems and then share data onwards so that it works?
To help the Human in the Loop make sound decisions, the ground they stand on must be visible. Our medical coherence analysis engine maps the internal structure of complex record sets, surfacing contradictions, omissions, and narrative drift before they reach the point of clinical judgment. It is not an arbiter. It is a prerequisite: coherence as the foundation on which human responsibility becomes real rather than assumed. NEJM Case Studies + : https://app.milanote.com/1W6LZz12afiibr?p=o4EuPTgytzd
This is a massive problem. From an NHS staff perspective, junior/mid-level technical staff have no options (unless you work in data) for progressing to senior technical roles. The only route is to go contracting, take a position in a private firm, or follow a management path which has low technical content. It's not that senior tech roles don't exist in the NHS, but as the career path doesn't exist, they don't require NHS junior/mid-level experience or skills. So jobs rely on academic qualifications rather than vocational qualifications and skills.