Command, Query, and the Modern Data Organisation


Reframing Data Engineering and Analytics Engineering through CQRS


Executive Summary

Modern data platforms are not built from a single tool or database — they are coordinated systems of layered responsibility. Yet most organisations have no principled model for determining where those responsibilities begin and end. The result is not a technology problem; it is an organisational one.

This paper argues that Command Query Responsibility Segregation (CQRS) — applied not as a software pattern but as an organisational model — provides that principle. It proposes a framework in which the structure of a data platform and the responsibilities of the people who build it are governed by a single, deliberate distinction:

The responsibility for capturing data correctly is structurally and organisationally separate from the responsibility for making data useful.

This separation is not a guideline or a best practice — it is the organising principle from which role definitions, platform layers, and team boundaries are derived. Within this model:

  • Data Engineering aligns with the Command (Write) side, where correctness, ingestion, and system-of-record fidelity are paramount.
  • Analytics Engineering aligns with the Query (Read) side, where data is transformed into structured, business-aligned, and consumable forms.

This separation is not merely theoretical — it has direct implications for:

  • How data integrity is enforced,
  • Where business meaning is constructed,
  • How systems scale, and
  • How teams collaborate.

This paper defines five core roles — Data Engineer, DevOps Engineer, Analytics Engineer, Platform Engineer, and Data Analyst — and derives the specific responsibilities and boundaries of each from their position within the CQRS model.

The result is a cohesive, enterprise-grade operating model in which platform architecture and team structure are governed by the same organising principle.


1. CQRS as a Lens for the Data Ecosystem

1.1 The Core Separation

CQRS introduces a simple but powerful idea:

  • The system that writes data should not be forced to serve the needs of reading data.
  • The system that answers questions should not be constrained by how data is captured.

In a modern data platform, this becomes a direct mapping: the write path captures and persists data as a faithful system of record, while the read path transforms and serves that data for analysis and decision-making.

This distinction is not merely conceptual — it reflects the behaviour of real systems at scale, and it is the foundation on which the role model in this paper is built.


1.2 The Write/Read Tension

A critical principle underpins the write/read separation:

The structure required to capture business activity is fundamentally different from the structure required to analyse business performance.

Attempting to collapse these into a single model leads to:

  • Performance degradation,
  • Complex transformations in the wrong layer, and
  • Unclear ownership of data correctness.

The CQRS distinction resolves this tension by distributing write and read responsibilities across distinct platform layers and, as this paper argues, across distinct professional roles within the data organisation.


1.3 The Data Flow in Practice

A modern data system, viewed through this lens, follows a clear progression:

  • Data is captured and persisted faithfully (write path)
  • Data is transformed into business-aligned structures (read path)
  • Data is presented for decision-making (consumption layer)

Each role in this process has distinct responsibilities and expertise — they are not interchangeable, but interdependent and specialised. The following sections define each role in turn, anchoring its responsibilities to its position within the CQRS model.


2. The Data Engineer (DE)

Custodian of the Write Path


Definition and Purpose

The Data Engineer is responsible for ensuring that data enters the platform correctly, securely, and with full fidelity to its source systems. This role governs the write side of CQRS, where the priority is not interpretation, but accuracy and traceability.


Responsibilities in Context

The Data Engineer operates at the boundary between external systems and the data platform. Their primary concern is to ensure that what is written into the system can be trusted.

This begins with data integration, the process of bringing data from various sources into the platform. Whether sourcing from transactional systems (databases used for day-to-day business operations), APIs (interfaces for connecting to other software), or event streams (continuous data from events like transactions), ingestion must be reliable, repeatable, and observable. Technologies such as Kafka (a platform for managing real-time data feeds) and Flink (a framework for stream data processing) provide the backbone for streaming ingestion, while languages and frameworks such as Python and Spark (a big data processing engine) enable custom ingestion logic for both batch and stream workloads — whether processing large volumes at scheduled intervals or handling data continuously as it arrives.
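The shape of this responsibility can be sketched in plain Python. This is a minimal sketch, not a production pattern: a list stands in for the raw store, and the metadata field names (`_source`, `_ingested_at`) are hypothetical; a real implementation would run on Kafka or Spark rather than in-process.

```python
from datetime import datetime, timezone

def ingest_batch(records, source_name, raw_store):
    """Append source records to the raw layer, wrapping each with
    ingestion metadata so the load is traceable and repeatable."""
    loaded = 0
    for record in records:
        raw_store.append({
            "payload": record,  # source data, untouched
            "_source": source_name,  # lineage: where it came from
            "_ingested_at": datetime.now(timezone.utc).isoformat(),
        })
        loaded += 1
    # Observability hook: return a count that monitoring can compare
    # against the source row count to confirm the load completed.
    return loaded

raw_layer = []
count = ingest_batch([{"order_id": 1, "amount": 120.0}], "orders_db", raw_layer)
```

The key design choice is that the payload is never altered on the way in; only metadata is added around it.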

Once data is acquired, it must be persisted (saved) in a raw layer that preserves source fidelity (the data stays true to its origin). This is not a reporting structure; it is a system-of-record representation (an authoritative source of truth), often modelled using approaches such as log-based storage (recording each change as an event) or Data Vault patterns (a data modelling method). Data modelling and governance tooling — such as erwin DI — supports the design of these structures, while storage and compute platforms such as Databricks or Azure provide scalable persistence.

A critical responsibility at this stage is change data capture (CDC), which involves identifying and recording data changes as they occur. When source systems do not explicitly provide change tracking (ways of monitoring modifications, additions, or deletions in data), the Data Engineer must implement mechanisms — for example, using SQL, Spark, or custom ingestion logic — to ensure that downstream processes can reliably detect and process changes.
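When no native change tracking exists, the simplest fallback is a snapshot comparison. The sketch below, with a hypothetical `id` key and in-memory lists in place of real tables, shows the logic a Data Engineer would implement in SQL or Spark at scale:

```python
def capture_changes(previous, current, key="id"):
    """Derive inserts, updates, and deletes by comparing two snapshots,
    for sources that expose no native change tracking."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    inserts = [row for k, row in curr.items() if k not in prev]
    deletes = [row for k, row in prev.items() if k not in curr]
    updates = [row for k, row in curr.items() if k in prev and row != prev[k]]
    return inserts, updates, deletes
```

Downstream processes then consume the three change sets rather than rescanning the full source.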

From this, incremental loading patterns (loading only new or changed data) are established. These are not optional optimisations—they are foundational for scalability. Whether implemented through pipelines built on tools such as Spark or stream platforms such as Kafka, the objective is to ensure that data movement is efficient and consistent over time.
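A common way to implement this is a watermark: record the high-water mark of the last successful load and select only rows beyond it on the next run. A minimal sketch, assuming a hypothetical `updated_at` field on each source row:

```python
def incremental_extract(source_rows, watermark, ts_field="updated_at"):
    """Select only rows modified since the last successful load,
    and advance the watermark for the next run."""
    new_rows = [r for r in source_rows if r[ts_field] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r[ts_field] for r in new_rows), default=watermark)
    return new_rows, new_watermark
```

The watermark itself must be persisted between runs, which is what makes the pattern consistent over time.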

Equally important is technical data quality. At the write layer, this is about structural correctness (the right format and structure), not business meaning:

  • Are data types valid?
  • Has ingestion completed successfully?
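These checks can be expressed as a simple schema validation. The field names and expected types below are illustrative only; the point is that the write layer checks structure, not meaning:

```python
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "placed_at": str}

def validate_structure(record, schema=EXPECTED_SCHEMA):
    """Write-layer quality check: right fields, right types.
    No business meaning is applied here."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    return errors
```

A record that fails these checks should be quarantined at ingestion, not passed downstream for correction.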

Security is also enforced at this boundary. Sensitive data must be masked, encrypted, or transformed before it propagates downstream. This ensures that data governance begins at ingestion, not after the fact.
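One irreversible masking approach is to replace sensitive values with a truncated hash, so records remain joinable on the masked token without exposing the original value. A sketch, with hypothetical field names:

```python
import hashlib

def mask_sensitive(record, sensitive_fields=("email", "phone")):
    """Irreversibly mask sensitive fields before the record propagates
    downstream, so governance starts at ingestion."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked and masked[field] is not None:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()
            masked[field] = digest[:12]  # stable token, not recoverable
    return masked
```

Because the same input always yields the same token, downstream joins and deduplication still work on the masked value.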


Corrected Responsibility Boundaries

Two critical corrections to common industry assumptions are necessary:

Infrastructure Provisioning

Infrastructure provisioning does not fall within the Data Engineer's responsibilities. While the Data Engineer defines requirements for storage, compute, and processing, the Platform Engineer is responsible for provisioning and managing these environments. This separation ensures that infrastructure remains standardised, secure, and consistently managed across the organisation.

Data Transformation and Business Logic

The Data Engineer is not responsible for transforming raw data into business-aligned structures. Once data has been written to the raw layer with full source fidelity, responsibility for its interpretation and transformation passes to the Analytics Engineer. Applying business logic at the write layer conflates the two sides of the CQRS separation and creates brittle, tightly coupled pipelines that are difficult to maintain and scale.


Summary Insight

The Data Engineer is the custodian of data truth. Everything written into the platform by this role must be correct, complete, secure, and reproducible — because every downstream process, from transformation to reporting, depends entirely on the integrity of what was captured at the write layer. When this boundary is maintained, the rest of the system can be trusted; when it is not, no amount of downstream engineering can fully compensate.

3. The DevOps Engineer (DOE)

The System of Delivery and Control


Definition and Purpose

The DevOps Engineer is responsible for ensuring that all changes to the data platform—code, pipelines, and configurations—are deployed in a controlled, automated, and auditable manner. Within the CQRS-aligned model, this role serves both sides of the write/read separation equally: it is the mechanism by which the work of the Data Engineer and Analytics Engineer is reliably delivered into production.


Responsibilities in Context

Fundamentally, DevOps introduces discipline into change management — and within the CQRS-aligned model, that discipline applies equally to the write path and the read path. Using version control platforms such as Git or GitLab, the DevOps Engineer defines branching strategies that govern the progression of features, fixes, and releases through the system.

CI/CD pipelines automate the progression of code across environments, typically defined using configuration formats such as YAML. These pipelines enforce testing gates, ensuring that no change reaches production without validation. Automation scripts within these pipelines may be written in languages such as PowerShell or Python.

Workflow orchestration introduces an important nuance. While orchestration frameworks such as Dagster may be used to coordinate data pipelines, the responsibility must be clearly delineated:

  • The DevOps Engineer owns the orchestration framework itself—its configuration, deployment, and reliability.
  • The Data Engineer and Analytics Engineer define the workflows that run within that framework.

This distinction prevents orchestration fragmentation and ensures consistent operational behaviour across pipelines.


Summary Insight

The DevOps Engineer is the discipline that makes the CQRS model operable in practice. Without controlled, automated delivery, even well-designed write and read paths become unreliable — changes accumulate risk, environments drift, and the integrity of both paths is undermined. The DevOps Engineer ensures that what is built by the Data Engineer and Analytics Engineer is deployed to production consistently and safely.

4. The Analytics Engineer (AE)

Architect of the Read Path


Definition and Purpose

The Analytics Engineer is responsible for transforming raw data into structured, business-aligned, and query-optimised models. This role governs the read side of CQRS, where data becomes usable for analytics and decision-making.


Responsibilities in Context

The Analytics Engineer's responsibilities begin where the Data Engineer's end. Raw data at this stage is accurate but not yet useful — it must be transformed into structures that reflect business concepts and support analytical queries.

This transformation is typically implemented using SQL-based transformation frameworks — such as dbt — executed on platforms such as Databricks. The goal is to move data from raw representations into curated layers, where it is cleaned, conformed, and aligned to business entities.

From there, the Analytics Engineer designs read-optimised models, such as dimensional or star schemas. These models are not arbitrary—they are explicitly designed to support:

  • Aggregation,
  • Filtering,
  • Joins, and
  • Historical analysis.
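The access pattern a star schema serves can be illustrated in miniature. The tables and field names below are invented for illustration; in practice these would be warehouse tables, not dictionaries:

```python
from collections import defaultdict

# Minimal star schema: one fact table keyed to a conformed dimension.
dim_product = {1: {"name": "Widget", "category": "Hardware"},
               2: {"name": "Gadget", "category": "Hardware"}}
fact_sales = [{"product_id": 1, "amount": 100.0},
              {"product_id": 2, "amount": 50.0},
              {"product_id": 1, "amount": 25.0}]

def revenue_by_category(facts, products):
    """Join facts to the dimension and aggregate: the query shape
    a star schema is explicitly designed to serve."""
    totals = defaultdict(float)
    for row in facts:
        category = products[row["product_id"]]["category"]
        totals[category] += row["amount"]
    return dict(totals)
```

Aggregation, filtering, and historical slicing all reduce to this join-then-group shape, which is why the model is read-optimised.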

Incremental modelling is a core discipline. Instead of reprocessing entire datasets, the Analytics Engineer applies Change Data Capture (CDC) patterns to identify and process only the data that has changed, using the change signals established at the write layer. This approach ensures both efficiency and scalability.
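On the read side, applying those changes typically takes the form of a merge (upsert) into the curated table: update matching keys, insert new ones, leave everything else untouched. A sketch with an assumed `order_id` key, standing in for what a dbt incremental model or SQL MERGE would do:

```python
def merge_into_curated(curated, changed_rows, key="order_id"):
    """Apply only changed rows (from write-layer CDC) to the curated
    table: update matches, insert new keys, leave the rest untouched."""
    index = {row[key]: i for i, row in enumerate(curated)}
    for row in changed_rows:
        if row[key] in index:
            curated[index[row[key]]] = row  # update in place
        else:
            curated.append(row)             # insert
    return curated
```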

Data quality at this layer extends beyond technical correctness into business validation:

  • Are revenue calculations correct?
  • Do status transitions align with business rules?
  • Are dimensions conformed across domains?
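Checks like these are rules about meaning, not structure. A sketch, with an invented order record and invented lifecycle rules, to show the contrast with write-layer validation:

```python
def validate_business_rules(order):
    """Read-layer quality: the record is structurally valid,
    but does it obey the business rules?"""
    errors = []
    # Revenue must reconcile: line totals should sum to the order total.
    if round(sum(order["line_amounts"]), 2) != round(order["total"], 2):
        errors.append("line amounts do not sum to order total")
    # Status transitions must follow the business lifecycle.
    allowed = {"placed": {"paid", "cancelled"}, "paid": {"shipped", "refunded"}}
    prev, curr = order["previous_status"], order["status"]
    if prev in allowed and curr not in allowed[prev]:
        errors.append(f"illegal transition {prev} -> {curr}")
    return errors
```

None of these rules could be evaluated at the write layer, because they encode business knowledge rather than source structure.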

The Analytics Engineer also builds the semantic layer — the point at which business logic is formalised into metrics, KPIs (key performance indicators), and reusable measures. This layer is the authoritative source of business definitions, ensuring that the same metric means the same thing across every report and team. Technologies such as Analysis Services (an analytical data engine for building semantic models) and DAX (a formula language for data analysis) provide the foundation for this layer, enabling consistent metric definitions across all reporting tools.
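The essence of a semantic layer, independent of any specific tool, is a single governed registry of metric definitions. A minimal sketch with invented metric names, illustrating why every report computing "revenue" gets the same answer:

```python
# A central registry of metric definitions: every report resolves
# a metric by its governed name, never by ad-hoc calculation.
METRICS = {
    "revenue":     lambda rows: sum(r["amount"] for r in rows),
    "order_count": lambda rows: len(rows),
    "avg_order":   lambda rows: (sum(r["amount"] for r in rows) / len(rows))
                                if rows else 0.0,
}

def compute_metric(name, rows):
    """Look up the authoritative definition and apply it."""
    return METRICS[name](rows)
```

In Analysis Services this registry takes the form of DAX measures; the principle is identical: one definition, many consumers.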

Governance is enforced here through role-based access control, row-level security, and object-level security, ensuring that data is not only correct but also appropriately accessible.
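Row-level security, reduced to its core, is a filter applied between the curated data and the consumer, keyed on the user's entitlements. A sketch with a hypothetical `region` entitlement model:

```python
def apply_row_level_security(rows, user):
    """Filter curated rows down to what the user's role and region
    entitle them to see, before any report touches the data."""
    if "admin" in user["roles"]:
        return list(rows)  # unrestricted access
    return [r for r in rows if r["region"] in user["regions"]]
```

Enforcing this in the semantic layer, rather than in each report, is what makes the policy consistent across every consumption tool.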


Corrected Responsibility Boundaries

Two corrections to common assumptions are necessary:

Processing from Raw to Consumption

The Analytics Engineer is responsible for transforming data from raw to curated and semantic layers, but does not manage ingestion. Maintaining this boundary is essential for upholding the CQRS separation.


Streaming and Batch Processing

While the Analytics Engineer may work with streaming data, their responsibility is not to ingest streams, but to transform them into analytical structures. Ingestion remains the responsibility of the Data Engineer.


Summary Insight

The Analytics Engineer is the architect of meaning. Raw data captured by the write path is accurate but not yet useful — it is the Analytics Engineer who transforms it into structures that reflect business concepts, support analytical queries, and power the semantic layer on which all reporting depends. Without this role, correctness at the write layer cannot translate into value at the consumption layer.

5. The Platform Engineer (PE)

The Foundation of Reliability and Scale


Definition and Purpose

The Platform Engineer is responsible for the design, provisioning, operation, and reliability of the entire data platform. This role ensures that all data services—ingestion, transformation, storage, and consumption—operate within a stable, observable, and secure environment. In the CQRS-aligned model, the Platform Engineer does not belong to either the write or the read path, but enables both paths: without reliable infrastructure, neither the Data Engineer’s ingestion nor the Analytics Engineer’s transformation can function at scale.


Responsibilities in Context

The Platform Engineer provides the foundational infrastructure that supports all other roles within the system.

This begins with infrastructure provisioning. Using platforms such as Databricks or Azure, for example, the Platform Engineer defines and manages the compute and storage environments that support the data platform. These environments must be scalable, secure, and aligned with organisational standards.

Environment management is equally critical. Development, quality assurance, and production environments must be clearly separated, consistently configured, and reliably maintained. The Platform Engineer ensures that workloads behave predictably across these environments.

Observability is a defining responsibility. Using tools such as Elastic or Grafana, the Platform Engineer implements logging, monitoring, and alerting across the system. This enables:

  • Visibility into pipeline execution,
  • Detection of failures and anomalies, and
  • Rapid incident response.

Reliability engineering extends this responsibility further. The Platform Engineer designs for failure by implementing retry mechanisms, failover strategies, and recovery processes. These measures ensure that the system remains resilient under load and during unexpected events.
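The retry pattern at the heart of this can be sketched in a few lines: attempt the task, back off exponentially on failure, and surface the error only after the final attempt. The `flaky` task below is invented purely to exercise the mechanism:

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=0.01):
    """Platform-level resilience: retry a failing task with exponential
    backoff before declaring the run failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # back off, then retry

# A task that fails twice before succeeding, to exercise the retries:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"
```

Real platforms layer failover and recovery on top of this, but the principle is the same: transient failures are absorbed by the platform, not exposed to every pipeline.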

Security is enforced at the platform level. Through governance tools such as erwin DI and access control mechanisms within storage and compute platforms, the PE ensures that data and systems are protected.

The Platform Engineer is also responsible for cost and performance optimisation, including monitoring resource usage, optimising workloads, and ensuring efficient platform operation at scale.


Summary Insight

The Platform Engineer defines the conditions under which the entire CQRS-aligned model can function. The write path and read path are only as reliable as the infrastructure that runs them. By ensuring that compute, storage, and observability are consistently managed, the Platform Engineer enables every other role to operate within defined boundaries — and ensures that failures are detected and resolved before they propagate across the system.

6. The Data Analyst (DA)

The Consumer and Interpreter of Data


Definition and Purpose

The Data Analyst is responsible for transforming curated and semantic data into insights, interpretations, and business-facing visualisations. Within the CQRS-aligned model, this role represents the consumption layer—the final destination of the read path. The Data Analyst does not write to the platform or own transformation logic; they operate on outputs from the read path.


Responsibilities in Context

The Data Analyst operates at the final stage of the data lifecycle, focusing on interpreting and communicating data rather than engineering it.

Using reporting and visualisation tools such as Power BI or Sigma, the Data Analyst builds dashboards and reports that enable decision-makers to understand business performance. These visualisations are grounded in the semantic models developed by the Analytics Engineer.

A critical constraint must be observed:

The presentation layer does not store data.

All transformations at this layer should be executed at runtime — for example, using DAX-calculated measures, columns, or tables. Persistent modelling and data storage belong to upstream layers.

The Data Analyst may perform exploratory analysis using SQL or Python and may define report-level calculations. However, core modelling responsibilities, particularly semantic model design, remain with the Analytics Engineer in enterprise environments.


Summary Insight

The Data Analyst is the point at which the CQRS model delivers its business value. The write path has captured data correctly; the read path has made it meaningful; the Data Analyst translates that meaning into decisions. This role does not engineer the platform — it validates that the platform has achieved its purpose.

7. Role Interdependencies

These roles function as components of a coordinated system rather than operating in isolation. The CQRS principle of separating write and read concerns provides the organising logic that binds them together, and understanding how each role depends on the others is as important as understanding the roles themselves.

The Data Engineer’s write path is the foundation. Without trusted raw data, the Analytics Engineer has nothing reliable to transform. Without the Analytics Engineer’s curated and semantic models, the Data Analyst has no structured basis for insight. This is not a loose dependency — it is a strict sequential contract: correctness must be established before meaning can be constructed, and meaning must be constructed before value can be delivered.

The DevOps Engineer and Platform Engineer operate across this entire chain. The DevOps Engineer ensures that changes to either path are deployed safely and consistently. The Platform Engineer ensures that the infrastructure running both paths is stable, observable, and resilient. Neither role belongs to the write or read side exclusively — both are preconditions for the system functioning at all.

In summary, each role occupies a defined position within the model:

  • The Data Engineer (write path) produces trusted raw data.
  • The Analytics Engineer (read path) transforms it into meaningful structures.
  • The Data Analyst (consumption layer) delivers insight to the business.
  • The DevOps Engineer ensures controlled delivery across both paths.
  • The Platform Engineer ensures the infrastructure underpinning both paths runs reliably.

Breaking any of these boundaries has predictable consequences.

  • Business logic written into the ingestion layer couples the write and read paths, making both harder to change.
  • Transformation logic embedded in reporting tools bypasses the semantic layer, leading to inconsistent metrics.
  • Infrastructure managed ad hoc by data engineers creates environments that are difficult to standardise or audit.

Each of these failures is a boundary violation — and the CQRS model makes those violations visible precisely because it explicitly names the boundaries.

8. Conclusion

This paper has applied CQRS not as a literal software implementation pattern, but as an organisational model — one that governs both how data platforms are structured and how the teams responsible for building them are organised.

At its core, CQRS forces a discipline that many data organisations lack:

The explicit separation between the responsibility of capturing data correctly and the responsibility of making data useful.

This separation aligns directly with two distinct engineering disciplines:

  • Data Engineering → owns the write path. Responsible for data fidelity, ingestion correctness, security, and reproducibility
  • Analytics Engineering → owns the read path. Responsible for data transformation, business alignment, query performance, and semantic clarity

This distinction is not merely semantic — it is structural, operational, and, as the role definitions in this paper demonstrate, organisational.


Why This Separation Matters in Practice

In enterprise data platforms, performance and value delivery depend not solely on tools, but on a CQRS-aligned separation of write and read responsibilities — and on the organisational clarity that separation provides.

When the write and read concerns are conflated:

  • Ingestion pipelines become overloaded with business logic,
  • Transformation layers inherit inconsistencies from poorly governed raw data,
  • Reporting systems attempt to compensate for upstream weaknesses, and
  • Ownership of data correctness becomes ambiguous.

The result is predictable:

  • Degraded performance,
  • Duplicated logic,
  • Inconsistent metrics, and
  • Loss of trust in data.

By contrast, when CQRS principles are applied:

  • The write layer (Data Engineering) is optimised for: throughput, reliability, and traceability
  • The read layer (Analytics Engineering) is optimised for: query performance, usability, and business meaning

Each layer can evolve independently according to its purpose, without compromising the integrity of the other.


Implications for Enterprise Data Architecture

Adopting this model leads to a number of concrete architectural outcomes:

  1. Layered Data Systems Become Intentional, Not Accidental. Raw, curated, and semantic layers are no longer incidental artefacts of tooling; they become explicitly owned stages within a CQRS-aligned system.
  2. Data Integrity Is Established Upstream, Not Reconstructed Downstream. The Data Engineer role ensures that data entering the platform is trustworthy, reducing the need for corrective logic in analytical layers.
  3. Business Logic Is Centralised and Reusable. The Analytics Engineer role formalises business definitions within the semantic layer, thereby eliminating duplication across reports and teams.
  4. Performance Is Engineered at the Correct Layer. Write performance is addressed in ingestion and storage design; read performance is addressed in modelling, indexing, and semantic optimisation.
  5. Value Delivery Becomes Measurable. With clear ownership boundaries, data availability (Data Engineer), data usability (Analytics Engineer), and business impact (Data Analyst) can each be measured and traced back to their respective roles.


Closing Statement

Data platforms fail not because of a lack of technology, but because architectural boundaries are insufficiently defined. When the responsibility for capturing data correctly is separated — deliberately and structurally — from the responsibility for making data useful, the result is a system that is composable, scalable, and aligned with business value.

The proposed CQRS-aligned model provides the conceptual clarity to enforce that separation. It gives organisations a language to define roles, a structure to design systems, and a discipline to maintain integrity across the data lifecycle. When Data Engineering and Analytics Engineering are grounded in these principles, the result is not merely a better-organised platform — it is an organisation that knows, precisely and structurally, who is responsible for data truth and who is responsible for data meaning.
