Methodological Framework for Knowledge Graph Development

Overcoming Pipeline Approach Limitations through Conceptual-Operational Integration

Executive Summary

Traditional pipeline approaches to knowledge graph development (Controlled Vocabularies → Standard Metadata → Taxonomies → Thesauri → Ontologies → Knowledge Graphs) are effective when guided by deep understanding of the technologies and explicit governance of their underlying principles.

However, when the rules connecting each stage remain implicit rather than formalized, early conceptual choices can become progressively more difficult to examine and adjust as they cascade into downstream artifacts.

This framework proposes to strengthen pipeline approaches by making explicit the conceptual foundations, derivation rules, and discipline-specific principles that, when formalized, enable reliable, scalable, and auditable knowledge graph development: rigorous conceptual modeling integrated with iterative operational materialization, governed by explicit derivation rules and managed through a DCAT-based artifact repository that remains semantically and structurally traceable.

1. Problem Statement: Tacit Risks in Pipeline Approaches

1.1 Nature of the Risk

Pipeline approaches (Controlled Vocabularies → Standard Metadata → Taxonomies → Thesauri → Ontologies → Knowledge Graphs) are effective methodological frameworks when applied with deep understanding of the technologies involved and explicit governance of their underlying principles. However, when deployed without this awareness—or when their implicit rules and semantic assumptions remain tacit rather than formalized—several risks emerge:

1.2 Potential Issues When Implicit Rules Remain Unexamined

Tacit Assumptions About Conceptual Alignment

  • Each layer in a pipeline assumes its input has been conceptually clarified at an appropriate level for that stage
  • Without explicit documentation of these assumptions, practitioners may apply standardization techniques (e.g., RDF Schema) to insufficiently clarified concepts
  • The appearance of structural formality can mask underlying conceptual ambiguity

Semantic Slippage Across Layers

  • When derivation rules between pipeline stages remain implicit, subtle semantic shifts can occur
  • A controlled vocabulary term might be interpreted differently in a taxonomy, which might be represented differently in an ontology
  • These slippages are difficult to detect when not explicitly documented

Discipline-Specific Tacit Knowledge

  • Different domains (library science, biomedicine, digital humanities, engineering) have developed sophisticated practices for each pipeline stage
  • This domain expertise is often not transferred across disciplinary boundaries
  • Without explicit formalization of discipline-specific principles, practitioners may apply inappropriate rules from other domains

Reversibility and Iteration Complexity

  • Pipeline approaches can support iteration and refinement, but only when derivation rules are explicit
  • Without visibility into how downstream artifacts were derived from upstream ones, revision becomes difficult and costly
  • The implicit nature of derivation rules makes it unclear whether changes require re-derivation or simple amendment

Risk Amplification at Scale

  • Implicit assumptions and tacit rules scale poorly: what works for a small, expert team becomes unmanageable when applied across larger organizations or different disciplinary contexts
  • Consistency becomes difficult to verify when the principles governing each layer are not formally documented

1.3 The Framework's Role

Rather than rejecting pipeline approaches, this framework makes explicit what effective pipeline practice requires: the conceptual foundations, derivation rules, and governance principles that—when tacit—create risks but—when formalized—make pipelines powerful and reliable.

2. Proposed Framework: Conceptual-Operational Integration

2.1 Core Principles

Principle 1: Conceptual Foundation First Rigorous philosophical and domain-specific conceptualization precedes artifact creation. This is not a preliminary phase but an ongoing practice that remains active throughout the lifecycle.

Principle 2: Iterative Maturation The framework embraces iteration: conceptual models are refined through cycles of formalization, implementation, confrontation with reality, and conceptual re-elaboration. Maturity is achieved progressively, not presumed.

Principle 3: Governed Derivation Every downstream artifact is derived from upstream conceptual choices through explicit, auditable derivation rules. These rules are not implicit conventions but formalized relationships that can be verified, traced, and—when necessary—reversed.

Principle 4: Bidirectional Traceability The system maintains mappings between artifacts and their conceptual foundations in both directions: from conceptualization to materialization (forward derivation) and from artifacts back to their justifications (reverse tracing).

Principle 5: Semantic and Structural Consistency At every level, artifacts are validated for consistency both with their conceptual foundations and with each other. Inconsistencies trigger re-examination rather than being papered over with additional formalization.

2.2 Operational Architecture

┌─────────────────────────────────────────────────────────────┐
│        CONCEPTUAL FOUNDATION (Iterative Practice)           │
│  - Domain ontology (philosophical and domain-specific)      │
│  - Conceptual decisions and their justifications            │
│  - Semantic clarifications and boundary definitions         │
│  - Explicit acknowledgment of limitations and ambiguities   │
└──────────────────────┬──────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │  DERIVATION GOVERNANCE    │
         │  - Derivation rules       │
         │  - Transformation rules   │
         │  - Consistency validators │
         └─────────────┬─────────────┘
                       │
         ┌─────────────┴─────────────────────────┐
         │   DCAT-BASED ARTIFACT REPOSITORY      │
         │  (Managed by Methodological Rules)    │
         │                                       │
         │  - Controlled Vocabularies            │
         │  - Standard Metadata Schemas          │
         │  - Taxonomies                         │
         │  - Thesauri                           │
         │  - Ontologies (RDF, OWL)              │
         │  - Knowledge Graphs (RDF, Property    │
         │    Graphs, Embeddings)                │
         │                                       │
         │  With explicit versioning, lineage,   │
         │  and derivation provenance            │
          └─────────────┬─────────────────────────┘
                       │
         ┌─────────────┴──────────────────┐
         │  ACTIVATION LAYER              │
         │  - Query governance            │
         │  - Consistency checking        │
         │  - Change impact analysis      │
         │  - Iterative refinement        │
         └────────────────────────────────┘        

3. Detailed Components

3.1 Conceptual Foundation

Definition: Explicit, documented understanding of what is being modeled and why.

Comprises:

  • Domain Ontology: Philosophical and domain-specific clarifications of key entities, categories, relationships, and distinctions
  • Conceptual Decisions Log: Explicit record of choices made (e.g., "We model 'employment' as a temporal relationship, not a property")
  • Boundary Definitions: Clear delineation of what is in and out of scope, what ambiguities are accepted and why
  • Assumption Statement: Explicit articulation of assumptions underlying the model

Practice:

  • Rigorous before initial implementation
  • Revisited in each iteration cycle
  • Updated when inconsistencies surface or new requirements emerge

3.2 Derivation Governance

Definition: Formalized rules that define how upstream conceptual choices generate downstream artifacts.

Illustrative Examples of Derivation Rules:

Rule 1: Vocabulary Derivation

  • IF a conceptual choice defines entity category X with distinguishing properties P1, P2, ...Pn
  • THEN controlled vocabulary includes term for X with documented scope note referencing the conceptual justification
  • AND scope note lists the distinguishing properties
  • THEN any modification to the conceptual definition of P1...Pn MUST trigger review of the vocabulary term
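As a minimal sketch of how Rule 1 could be mechanized (all field names and the data structures are illustrative, not part of any standard), a validator can check that a vocabulary term cites its conceptual justification and lists every distinguishing property, and it will automatically flag the term when the concept's definition changes:

```python
# Illustrative check for Rule 1 (hypothetical structures): a vocabulary term
# is compliant only if its scope note references the conceptual justification
# and lists all distinguishing properties of the concept it standardizes.

def check_vocabulary_derivation(concept, term):
    """Return a list of rule violations (empty list means compliant)."""
    violations = []
    if concept["justification_id"] not in term["scope_note_refs"]:
        violations.append("scope note does not reference conceptual justification")
    missing = [p for p in concept["distinguishing_properties"]
               if p not in term["scope_note_properties"]]
    if missing:
        violations.append(f"scope note omits properties: {missing}")
    return violations

concept = {
    "name": "Automobile",
    "justification_id": "CDL-017",  # hypothetical conceptual decisions log entry
    "distinguishing_properties": ["self-propelled", "road vehicle"],
}
term = {
    "label": "automobile",
    "scope_note_refs": ["CDL-017"],
    "scope_note_properties": ["self-propelled", "road vehicle"],
}

assert check_vocabulary_derivation(concept, term) == []

# Modifying a distinguishing property MUST trigger review of the term:
concept["distinguishing_properties"].append("four or more wheels")
assert check_vocabulary_derivation(concept, term) != []
```

The same pattern generalizes: any derivation rule expressible as "IF … THEN … MUST" can be encoded as a pure function over the two artifacts it connects.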

Rule 2: Ontology-from-Taxonomy Derivation

  • IF a taxonomy establishes hierarchical relationship Parent > Child
  • AND conceptual foundation justifies this hierarchy as subsumption (Child instances are instances of Parent)
  • THEN OWL ontology represents this as rdfs:subClassOf
  • AND if the conceptual foundation later establishes it as part-of rather than subsumption
  • THEN OWL representation MUST change to mereological relationship (using appropriate OWL properties)
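Rule 2 can be sketched the same way. In this hypothetical fragment (the `ex:partOf` property name is illustrative; only `rdfs:subClassOf` is a standard RDFS term), the formal relation emitted for a taxonomy edge is determined by how the conceptual foundation justifies that edge:

```python
# Illustrative sketch of Rule 2: the conceptual justification of a taxonomy
# edge decides its formal representation in the ontology.

def owl_relation(parent, child, justification):
    """Map one taxonomy edge to a triple, based on its conceptual grounding."""
    if justification == "subsumption":   # every Child instance is a Parent instance
        return (child, "rdfs:subClassOf", parent)
    if justification == "parthood":      # a Child is a constitutive part of a Parent
        # mereology is modeled with an object property, not subclassing;
        # 'ex:partOf' is an illustrative property name, not a standard term
        return (child, "ex:partOf", parent)
    raise ValueError(f"unjustified hierarchy: {parent} > {child}")

assert owl_relation("Vehicle", "Automobile", "subsumption") == \
       ("Automobile", "rdfs:subClassOf", "Vehicle")
assert owl_relation("Automobile", "Engine", "parthood") == \
       ("Engine", "ex:partOf", "Automobile")
```

Note that an edge with no recorded justification fails loudly: the rule refuses to formalize a hierarchy whose conceptual grounding is missing.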

Rule 3: Knowledge Graph Population Consistency

  • IF an ontology defines class Person with property birthDate as xsd:date
  • AND conceptual foundation specifies that birthDate represents biological birth (not legal registration)
  • THEN KG instances MUST include provenance indicating whether dates represent biological or legal events
  • AND queries MUST respect this distinction or surface the ambiguity
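Rule 3 amounts to a provenance constraint on KG instances. A minimal sketch (the instance schema is hypothetical) flags every birthDate assertion that does not say whether it records a biological birth or a legal registration:

```python
# Illustrative check for Rule 3: each birthDate assertion must carry
# provenance distinguishing biological birth from legal registration.

VALID_DATE_SENSES = {"biological", "legal"}

def validate_birth_dates(instances):
    """Yield (instance_id, problem) pairs for assertions lacking provenance."""
    for inst in instances:
        if "birthDate" in inst:
            sense = inst.get("birthDate_provenance")
            if sense not in VALID_DATE_SENSES:
                yield inst["id"], "birthDate lacks biological/legal provenance"

instances = [
    {"id": "person/1", "birthDate": "1970-01-01", "birthDate_provenance": "biological"},
    {"id": "person/2", "birthDate": "1985-06-15"},  # provenance missing
]

problems = list(validate_birth_dates(instances))
assert problems == [("person/2", "birthDate lacks biological/legal provenance")]
```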

Implementation:

  • Derivation rules are documented in structured form (SHACL, constraint specifications, or custom notation)
  • Automated validators check compliance at artifact generation time
  • Change management systems use these rules to identify downstream impacts of upstream modifications

3.3 DCAT-Based Artifact Repository

Definition: Centralized, structured repository of all knowledge artifacts, managed by derivation governance rules.

Artifact Types:

  1. Controlled Vocabularies: Terms with scope notes, relationships, and links to conceptual justifications
  2. Standard Metadata Schemas: Metadata standards applied to domain entities
  3. Taxonomies: Hierarchical organizations with explicit relationship types
  4. Thesauri: Rich semantic networks with synonymy, hierarchy, and associative relationships
  5. Ontologies: Formal knowledge representations (RDF Schema, OWL)
  6. Knowledge Graphs: Populated instances of ontologies (RDF triples, property graphs, vector embeddings)

DCAT Extensions:

The standard DCAT vocabulary is extended with profile-specific properties (the names below are this article's proposals, not W3C-standard dcat: terms; a real profile would mint its own namespace or reuse PROV-O terms such as prov:wasDerivedFrom where they fit):

  • dcat:derivedFrom: Links artifact to its upstream dependencies and conceptual foundations
  • dcat:derivationRule: References the specific rules governing this derivation
  • dcat:semanticVersion: Versioning that reflects semantic significance of changes
  • dcat:consistencyStatus: Current validation status with respect to conceptual foundation and derivation rules
  • dcat:justification: References to conceptual documentation justifying this artifact
  • dcat:fallacyRisk: Explicit acknowledgment of identified or potential logical fallacies
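Before serialization to RDF, a catalog record carrying these proposed extension properties can be sketched as a plain mapping (all property names and values here are illustrative):

```python
# Hypothetical catalog record using the article's proposed extension
# properties (these are not standard DCAT terms).

record = {
    "artifact": "thesaurus/aviation/v4",
    "dcat:derivedFrom": ["vocabulary/aviation/v7", "conceptual-model/aviation/v2"],
    "dcat:derivationRule": "rule:vocab-to-thesaurus",
    "dcat:semanticVersion": "4.0.0",          # major bump signals a semantic change
    "dcat:consistencyStatus": "validated",
    "dcat:justification": "CDL-042",          # conceptual decisions log reference
    "dcat:fallacyRisk": "none identified",
}

def needs_revalidation(rec):
    """An artifact whose status is anything but 'validated' awaits review."""
    return rec["dcat:consistencyStatus"] != "validated"

assert not needs_revalidation(record)
```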

Repository Capabilities:

  • Full versioning and audit trail
  • Lineage tracking (artifact X was derived from artifact Y using rule Z)
  • Reverse dependency analysis (if artifact X changes, which downstream artifacts are affected?)
  • Consistency validation against derivation rules
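The lineage and reverse-dependency capabilities listed above reduce to graph traversal. In this minimal sketch (artifact names and rule identifiers are invented for illustration), each lineage record states "artifact X was derived from artifact Y using rule Z", and reverse dependency analysis walks those records downstream:

```python
# Illustrative lineage store and reverse dependency analysis.

from collections import defaultdict, deque

lineage = [
    # (derived_artifact, source_artifact, derivation_rule) — all names hypothetical
    ("taxonomy/v2", "vocabulary/v3", "rule:vocab-to-taxonomy"),
    ("ontology/v1", "taxonomy/v2",   "rule:taxonomy-to-ontology"),
    ("kg/2024-06",  "ontology/v1",   "rule:ontology-population"),
]

# Index lineage by source so downstream artifacts can be found quickly.
dependents = defaultdict(list)
for derived, source, rule in lineage:
    dependents[source].append(derived)

def affected_by(artifact):
    """All downstream artifacts that may need re-derivation if `artifact` changes."""
    seen, queue = set(), deque([artifact])
    while queue:
        for child in dependents[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

assert affected_by("vocabulary/v3") == {"taxonomy/v2", "ontology/v1", "kg/2024-06"}
assert affected_by("kg/2024-06") == set()
```

A production repository would persist these records as RDF provenance statements, but the analysis itself is exactly this traversal.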

3.4 Activation Layer

Definition: Dynamic governance practices that use the repository to maintain semantic and structural consistency through iterations.

Activation Mechanisms:

1. Consistency Checking

ON artifact_modification:
  FOR EACH downstream_artifact IN get_dependents(modified_artifact):
    validation_results = apply_derivation_rules(modified_artifact, downstream_artifact)
    IF inconsistency_detected:
      FLAG for_review(downstream_artifact, validation_results)
      ALERT stakeholders with_justification(what_changed, why_inconsistent)        
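One way to realize the consistency-checking pseudocode above in Python (all helper names and data shapes are hypothetical): on every modification, re-apply the derivation rules to each dependent artifact and collect flags for human review.

```python
# Illustrative realization of the ON artifact_modification handler.

def on_artifact_modification(modified, get_dependents, apply_rules):
    """Return review flags: {dependent_artifact_id: list_of_inconsistencies}."""
    flags = {}
    for dependent in get_dependents(modified):
        inconsistencies = apply_rules(modified, dependent)
        if inconsistencies:
            # FLAG for_review: the payload doubles as the stakeholder ALERT,
            # carrying what changed and why the dependent is now inconsistent.
            flags[dependent["id"]] = inconsistencies
    return flags

# Toy rule set: a dependent is inconsistent if it cites a stale version.
def apply_rules(modified, dependent):
    if dependent["cites_version"] != modified["version"]:
        return ["cites stale version of " + modified["id"]]
    return []

modified = {"id": "ontology/person", "version": 3}
deps = [{"id": "kg/2024", "cites_version": 2},
        {"id": "vocab/person", "cites_version": 3}]

flags = on_artifact_modification(modified, lambda m: deps, apply_rules)
assert flags == {"kg/2024": ["cites stale version of ontology/person"]}
```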

2. Change Impact Analysis

  • Propose which artifacts require re-derivation
  • Estimate conceptual versus syntactic changes
  • Identify which domain areas are affected
  • Recommend priority for re-validation

3. Query Governance

  • SPARQL queries and SHACL shapes can be tagged with conceptual justifications
  • Queries that violate derivation rules surface warnings
  • Queries on knowledge graphs include provenance indicating which artifacts (and versions) underpin results

4. Iterative Refinement Protocol

CYCLE:
  1. Identify inconsistency or requirement
  2. IF conceptual foundation requires revision:
       Update conceptual model with justification
       Apply derivation rules to propagate changes
       Validate all downstream artifacts
       Document decision and rationale
  3. IF only operational artifact requires revision:
       Check against derivation rules
       If compliant, update artifact
       If non-compliant, escalate to conceptual review
  4. Test against real-world usage
  5. Feed learnings back into conceptual foundation        

4. Addressing Pipeline Approach Problems

4.1 Problem: Cumulative Error Propagation

Pipeline Approach: Errors introduced early persist through all layers, becoming increasingly difficult to correct.

This Framework:

  • Early errors surface during consistency validation against derivation rules
  • Derivation rules encode the why behind each artifact, making errors traceable to their conceptual source
  • Bidirectional traceability allows rolling back to the problematic conceptual choice
  • Iterative protocol ensures that foundational errors are revisited, not frozen

Mechanism: When a knowledge graph instance violates an ontology axiom, the framework traces back: Is this a data quality issue, an ontology error, or a conceptual confusion? The lineage metadata answers this question.

4.2 Problem: Semantic Opacity

Pipeline Approach: Formal appearance masks unresolved conceptual confusion.

This Framework:

  • Conceptual foundation is explicit and auditable—not implicit in artifact structure
  • Derivation rules make the relationship between concepts and formalisms transparent
  • DCAT repository includes "justification" metadata explaining why each artifact exists and what conceptual decision it implements
  • Fallacy risks are explicitly documented, not hidden in formal structure

Mechanism: An ontology axiom linked to its conceptual justification reveals whether the formalism represents genuine semantic understanding or merely syntactic standardization.

4.3 Problem: Irreversibility and Path Dependency

Pipeline Approach: Downstream dependencies on upstream errors make correction prohibitively expensive.

This Framework:

  • Derivation rules enable controlled reversal: changing a conceptual choice can automatically trigger re-derivation of downstream artifacts
  • Versioning and lineage allow parallel maintenance of "corrected" and "legacy" versions during transition periods
  • Change impact analysis reveals the true cost of revisions before they're committed

Mechanism: Modifying an ontology class definition automatically flags which knowledge graph assertions depend on the old definition, enabling staged migration.

4.4 Problem: Institutional Embedding of Fallacies

Pipeline Approach: Logical errors become formalized as axioms, appearing legitimate through their formal representation.

This Framework:

  • Explicit "fallacy risk" metadata acknowledges known or potential logical problems
  • Conceptual foundation must justify each axiom—fallacies cannot be justified in a rigorous conceptual analysis
  • Derivation rules enforce that axioms have defensible conceptual grounding
  • Iterative cycles include logical review as standard practice

Mechanism: An axiom implementing a logical fallacy would fail to pass conceptual justification review before being derived into downstream artifacts.


5. Implementation Approach

5.1 Minimum Viable Implementation

Phase 1: Foundation

  1. Document conceptual foundations explicitly (structured format: markdown, ontology snippets, etc.)
  2. Enumerate key derivation rules for your domain (10-15 core rules are often sufficient to start)
  3. Create DCAT catalog with initial artifacts and lineage metadata
  4. Implement one consistency validator (e.g., vocabulary terms must reference conceptual justifications)

Note: This is illustrative; specific implementation will vary based on domain and organizational context.

Phase 2: Activation

  1. Implement change impact analysis (identify downstream artifacts affected by modifications)
  2. Add query governance: tag SPARQL queries with conceptual justifications
  3. Establish review protocol for flagged inconsistencies

Phase 3: Maturation

  1. Expand derivation rules as patterns emerge
  2. Automate more validators using SHACL
  3. Develop organizational practices for iterative refinement cycles

5.2 Technical Enablers

DCAT Profile Extensions

  • Create a DCAT profile that includes derivation governance properties
  • Use RDF/OWL to represent derivation rules as machine-readable constraints
  • Leverage SHACL for consistency validation

Tooling

  • Graph database (e.g., Neo4j, RDF triple store) for artifact repository
  • SPARQL engine for querying with provenance
  • Custom Python/Java scripts for derivation rule application and change impact analysis
  • Documentation system (wiki, git-based) for conceptual foundations

Organizational

  • Establish governance roles: conceptual modelers, ontology engineers, KG maintainers
  • Create review protocols for conceptual decisions and artifact validation
  • Develop communication practices: decisions documented and traced in DCAT metadata

5.3 Integration with Existing Standards

  • OWL: Use for formalized ontologies; link axioms to derivation rules and conceptual justifications
  • SHACL: Implement consistency validators as SHACL shapes
  • RDF/SPARQL: Query layer remains unchanged; enriched with provenance metadata
  • DCAT: Central catalog for artifact management and lineage tracking

6. Key Distinctions from Pipeline Approaches

[Table comparing pipeline approaches with this framework; not reproduced in this text version]

7. Conclusion

This framework complements rather than replaces established pipeline methodologies. Its purpose is to formalize the implicit assumptions, tacit rules, and disciplinary expertise that make pipeline approaches effective when applied with deep understanding.

By making explicit:

  • The conceptual foundations underlying each stage
  • The derivation rules connecting artifacts across stages
  • The discipline-specific principles governing each layer
  • The assumptions about appropriate inputs and outputs

...this framework enables organizations to:

  1. Apply pipeline approaches consistently across teams and contexts
  2. Replicate proven methodologies across disciplinary boundaries
  3. Identify and examine tacit assumptions that may not transfer to new domains
  4. Iterate and refine with visibility into impacts
  5. Scale beyond the limits of individual expertise

The DCAT-based repository activated through derivation governance becomes the mechanism for managing these explicit rules—making pipeline practices transparent, auditable, and resilient to change.


8. Invitation for Feedback and Refinement

This draft framework raises questions rather than providing definitive answers. Critical engagement is welcome with:

Conceptual Issues

  • Are the core principles appropriately framed?
  • Are there important principles or considerations missing?
  • How do these principles map to existing methodological frameworks in your discipline?

Practical Feasibility

  • Which components would be most useful to implement first?
  • What organizational or technical barriers would you anticipate?
  • How might derivation governance be simplified or made more practical?

Disciplinary Adaptation

  • How would this framework need to be adapted for library science, biomedicine, digital humanities, engineering, or other domains?
  • What tacit practices in your discipline should be made explicit?
  • Are there discipline-specific derivation rules that should be standardized?

Technical Realization

  • What technical choices would make this framework more implementable?
  • How should DCAT extensions be formalized?
  • What tooling gaps exist?

Critique and Alternatives

  • Where does this framework overreach or misunderstand pipeline approaches?
  • What alternative approaches should be considered?
  • Where are the logical or practical limitations?

Feedback, critique, and proposals for refinement through discussion, pilot implementations, and cross-disciplinary dialogue are welcome.


Annex: From Controlled Vocabulary to Ontology — Epistemic Foundations

Understanding the Critical Distinction

The framework presented in the main article assumes a clear distinction between different artifact types in the knowledge graph pipeline. However, practitioners often attempt to evolve a controlled vocabulary directly into an ontology, expecting the progression to be continuous. This annex clarifies why this approach fails and explains the fundamental epistemic differences that underpin the framework's derivation governance principles.

Two Distinct Objects, Not Two Stages

A controlled vocabulary (including thesauri and term lists) is fundamentally a prescriptive, flat resource: a collection of standardized terms with simple relationships—synonymy, generic hierarchy (generalization/specialization), thematic associations. Its objective is pragmatic standardization: ensuring consistency in indexing and information retrieval. We say "automobile" rather than "car" or "auto"; we use "economic depression" rather than "crisis" or "recession." A controlled vocabulary is governed by conventional agreement: "we use these terms in this way."

An ontology, by contrast, is a structured representation of reality itself. It does not catalog terms; it models concepts, their properties, their complex relationships, and the logical rules that govern them. An ontology asks fundamentally different questions: What is an "automobile" in relation to a "vehicle"? What are its constitutive parts? What logical relations bind it to other entities? How do we distinguish an automobile from similar entities? An ontology is not flat but multidimensional, formally structured, and—critically—logically coherent.

This is not a difference of degree or complexity. It reflects an epistemic gulf: controlled vocabularies are tools for managing agreement on terminology; ontologies are models of conceptual structure grounded in understanding of the domain itself.

The Epistemic Gap

Three systemic differences explain why vocabulary-to-ontology progression fails:

Polysemy and Granularity: A controlled vocabulary tolerates semantic ambiguity managed through convention. A term can hover between multiple interpretations as long as practitioners understand how to apply it. An ontology, however, demands radical clarification: it must distinguish the separate concepts hiding behind a single term. It must answer: are these genuinely distinct entities, or merely different applications of one concept? This question cannot be answered by extending the vocabulary—it requires reconceptualizing the domain itself.

Formalization of Logical Structure: Relations in a controlled vocabulary are declarative and flat—"X is narrower than Y," "A is related to B." These are annotations, useful but not computationally meaningful in a strong sense. An ontology requires formal logical structure: relations have precise semantics that enable inference, inheritance, constraint propagation. An axiom in an ontology is not merely a labeled edge; it is a logically valid statement that machines can reason over. This transformation cannot be achieved by adding layers of complexity to a vocabulary; it requires reconstituting the representation from the ground up in a logical framework.

Specification of Properties and Constraints: A controlled vocabulary never specifies what can be a property of a concept, or under what constraints properties apply. An ontology must formalize this explicitly: domain and range constraints, cardinality restrictions, property inheritance hierarchies. Moving from vocabulary to ontology is not an extension but a categorical shift from terminological standardization to conceptual formalization.

Why Direct Progression Fails

Attempting to "upgrade" a controlled vocabulary into an ontology by adding detail and structure creates what might be called a pseudo-ontology: formally elaborate but logically fragile, because it lacks the deep conceptual clarity that should ground an ontology.

The problems are systematic:

Accumulated Ambiguity: A vocabulary that was deliberately tolerant of semantic ambiguity becomes an ontology in which that same ambiguity is now formalized and unexamined. What was managed as pragmatic flexibility becomes embedded as logical inconsistency.

Layer Collapse: The vocabulary may conflate distinct concepts (for pragmatic terminological reasons). When formalized as an ontology, these conflations appear as logical axioms—and now it becomes costly and organizationally disruptive to separate them, since downstream applications depend on their conflation.

Missing Conceptual Grounding: An ontology derived from a vocabulary inherits no understanding of why the concepts are structured as they are. It has form without foundation. When inconsistencies emerge (and they will), there is no conceptual basis for resolving them—only the inertia of prior choices.

False Rigor: The formal appearance of ontological structure can mask the absence of genuine ontological clarity. An axiom represented in OWL is no more meaningful than the same statement in plain language if it reflects unexamined conceptual confusion. Formal notation creates an illusion of rigor that can suppress the critical examination needed to detect the confusion.

The Inverse Approach: Conceptually Grounded

The evidence—both from the framework presented in the main article and from practice—suggests that a reverse approach is far more robust: begin with rigorous conceptual modeling that clarifies what exists in the domain and how it is organized, then derive from this ontology a controlled vocabulary that reflects the conceptual structure clearly.

This inverted approach works because it respects the epistemic order:

  1. Conceptual Clarification First: Rigorously model the concepts in your domain—not as terms, but as entities with properties, distinctions, and relationships. Ask hard questions: Is "employment" a state, a relationship, a process? What distinguishes this from "engagement" or "contract"? Document not just the answers but the reasoning.
  2. Formalization of Logical Structure: Once concepts are clarified, represent them in a formal logical framework (OWL, SHACL, or equivalent). Specify properties, constraints, and inference rules. The formalism now has conceptual grounding, not merely syntactic elaboration.
  3. Derivation of Vocabulary: Only after the ontology is clear, assign standardized terms to concepts. Ensure that vocabulary choices align with conceptual distinctions: synonymy now means the terms designate the same concept (not merely similar ones); hierarchy reflects genuine subsumption or parthood, not pragmatic association.

Integration with Derivation Governance

This epistemic inversion aligns directly with the framework's Derivation Governance principle. A derivation rule from ontology to controlled vocabulary might read:

Rule: For each class C in the formal ontology with scope S and distinguishing properties P1...Pn, the controlled vocabulary includes a term T such that:

  • T's scope note explicitly references the ontology class C and its logical definition
  • T's definition articulates the distinguishing properties P1...Pn
  • Any modification to C's logical definition triggers review of T's scope note
  • If the conceptual justification for C is challenged or revised, T becomes subject to re-derivation
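The annex rule can be sketched in the same style as the main article's validators (the class and term structures are hypothetical): a term is derived from an ontology class, and any revision of the class's logical definition makes the term stale and subject to re-derivation.

```python
# Illustrative sketch of the ontology-to-vocabulary derivation rule.

def derive_term(cls):
    """Derive a controlled-vocabulary term T from an ontology class C."""
    return {
        "label": cls["preferred_label"],
        "scope_note": (f"See ontology class {cls['id']}. "
                       f"Distinguishing properties: {', '.join(cls['properties'])}."),
        "derived_from_version": cls["version"],
    }

def term_is_stale(term, cls):
    """T must be re-derived whenever C's logical definition has been revised."""
    return term["derived_from_version"] != cls["version"]

cls = {"id": "ex:Employment",            # 'ex:' is an illustrative namespace
       "preferred_label": "employment",
       "properties": ["employer", "employee", "validity period"],
       "version": 1}

term = derive_term(cls)
assert not term_is_stale(term, cls)

cls["properties"].append("employment basis")  # logical definition revised
cls["version"] += 1
assert term_is_stale(term, cls)               # T becomes subject to re-derivation
```

Changes flow in one direction only: the term is recomputed from the class, never the reverse.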

This rule formalizes what should be intuitive: vocabulary terms are derived from ontological clarity, not the other way around. Changes flow downward from concept to term, not upward from term to concept.

Practical Implications for Pipeline Practice

For organizations using the pipeline approach described in the main article:

When You Have an Existing Controlled Vocabulary: Treat it as a data point about current practice, not as canonical. Extract the conceptual insights it embodies (often vocabulary terms reveal important distinctions), but do not assume its structure is optimal. Use it to inform conceptual modeling, not to constrain it.

When Building an Ontology: Invest heavily in conceptual work before formalizing. Document the domain model philosophically—what entities exist, why they are distinguished, what relationships hold between them. Only then move to formal representation. This is expensive upfront but prevents the accumulation of unfounded axioms.

When Standardizing Terminology: Derive your controlled vocabulary from a clear ontology (even if partial or preliminary). This ensures that vocabulary choices reflect genuine conceptual distinctions, making the vocabulary more robust and more useful for knowledge graph population and querying.

For Iteration and Refinement: When conceptual errors surface (and they will), the framework's bidirectional traceability allows you to trace from vocabulary term back to ontological axiom back to conceptual justification. You can then correct at the appropriate level—whether that is correcting a misconception in the conceptual foundation or simply adjusting terminology to better reflect a sound concept.

Conclusion

The progression from controlled vocabulary to ontology is not a pipeline but a conceptual leap. Attempting to make that leap by elaborating and formalizing the vocabulary fails because it conflates terminological standardization with conceptual modeling. The reverse—beginning with rigorous conceptual clarity and deriving vocabulary from it—respects the epistemic order and produces more robust, auditable, and maintainable knowledge structures.

Within the framework of the main article, this distinction explains why derivation governance must flow from conceptual foundation through formal ontology to downstream artifacts (including controlled vocabularies). Reversing that flow—attempting to derive conceptual clarity from vocabularies—creates the accumulation of tacit assumptions and semantic ambiguities that the framework is designed to prevent.

Responses to Some Questions Raised About the Article

Question 1: Artifact Necessity, Business Value, and Scoping

A critical question emerges when reviewing this framework: How many of these artifacts are actually needed for a given knowledge graph development? Is there real business value in developing separate controlled vocabularies, taxonomies, thesauri, and ontologies? Once the artifacts have been developed through to an ontology, must each one be separately maintained when changes are needed? Most importantly: how should the work be scoped?

There is a legitimate concern that attempting to model an entire business or domain risks "modeling for its own sake" and, proverbially, boiling the ocean. An alternative approach advocates starting with specific business use cases delivering clear value, with competency questions expressed in business language—letting those questions define the vocabulary and scope needed in the KG. Value is delivered first, then the system expands incrementally through further use cases.

My response: Context-Driven Application and Strategic Starting Points

Different tactics and strategies exist, always driven by specific needs and contexts. The framework presented here does not prescribe a universal approach but rather formalizes principles that apply across different strategic choices.

Consider a specific application domain: preparing governance and building architecture for continuous operational interoperability between partners and domains working on complex products (such as aircraft development). In such contexts, the starting point is often not a blank slate but rather legacy open and de facto standards agreed upon by communities of international experts. The challenge becomes deriving useful and relevant subsets to cover specific collaboration cases. Think of the open standard as a dictionary, and collaboration cases as sentences—you pick what you need rather than reinventing generic concepts each time.

This strategy offers significant advantages. It prevents costly alignment work that would be required if partners independently developed their own models and then tried to reconcile them. It makes explicit what is generic (drawn from standards) versus context-specific (particular to your collaboration). It provides a shared conceptual foundation from which to derive artifacts as needed.

Producing any given artifact—vocabulary, taxonomy, thesaurus, ontology—is not mandatory. It is entirely value-driven. However, if multiple artifacts are produced that address the same topic, they must be aligned for global consistency. Without this alignment, you risk mutually inconsistent representations of the same knowledge across different layers of formalization. This is precisely where explicit derivation governance becomes critical: it ensures that when artifacts are created, they remain semantically and structurally consistent with each other and with their conceptual foundations.
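The alignment requirement can be made concrete with a small automated check. The artifact shapes and the single rule below are assumptions for illustration, not a prescribed schema:

```python
# Sketch: verify that two artifacts covering the same topic stay aligned.
# Illustrative rule: every term in the taxonomy must exist in the
# controlled vocabulary, and every broader-term reference must resolve.

controlled_vocabulary = {"Part", "Assembly", "Component", "Requirement"}

# Taxonomy as a child -> broader-term mapping (None marks a root).
taxonomy = {
    "Part": None,
    "Component": "Part",
    "Assembly": "Part",
    "Fastener": "Component",  # drifted: never added to the vocabulary
}

def check_alignment(vocabulary, taxonomy):
    """Return a list of human-readable violations (empty means aligned)."""
    violations = []
    for term, broader in taxonomy.items():
        if term not in vocabulary:
            violations.append(f"'{term}' is not in the controlled vocabulary")
        if broader is not None and broader not in taxonomy:
            violations.append(f"'{term}' references undefined broader term '{broader}'")
    return violations

for violation in check_alignment(controlled_vocabulary, taxonomy):
    print("ALIGNMENT VIOLATION:", violation)
```

Checks of this kind are what derivation governance makes possible: because the rule connecting the artifacts is explicit, drift between layers becomes detectable instead of silently accumulating.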

The "conceptual foundation first" principle should be understood as: be rigorous about what you're modeling within your defined scope—not "model everything comprehensively before building anything." The framework supports starting from established standards or use case-driven competency questions; rigorous conceptualization of the bounded scope you have defined; explicit derivation rules only for artifacts that deliver value in your context; and iterative expansion guided by new use cases or collaboration requirements rather than abstract completeness.

Two complementary strategies emerge. A use case-driven approach starts with specific business use cases and competency questions, builds minimal artifacts to deliver immediate value, and expands incrementally as new use cases emerge. This is appropriate for greenfield projects, exploratory domains, and rapid value delivery. A standards-driven approach starts from established domain standards (the "dictionary"), derives relevant subsets for specific collaboration contexts (the "sentences"), and builds artifacts only where alignment value justifies the cost. This is appropriate for regulated domains, multi-partner interoperability, and leveraging existing consensus.

Both strategies benefit from explicit derivation governance. Use case-driven approaches need it to maintain consistency as the system grows incrementally. Standards-driven approaches need it to ensure derived subsets remain aligned with source standards and with each other.

The framework's core message should be clarified: This framework is not about mandating artifacts or comprehensive modeling. It is about formalizing the principles that ensure semantic consistency when artifacts are created—whatever the strategic approach, whatever the scope. Whether you start from use cases or standards, create minimal artifacts or richer taxonomies, model narrowly or broadly, the framework provides explicit documentation of why each artifact exists (business value justification), clear rules for how artifacts derive from conceptual foundations or source standards, mechanisms for verifying that multiple artifacts remain consistent, and traceability that enables iteration and refinement without breaking existing work.
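These capabilities can be sketched as a minimal artifact registry. The field names and artifact identifiers are illustrative, loosely mirroring what a DCAT-based catalog would record with derivation links:

```python
# Sketch: a minimal artifact registry with business-value justification
# and derivation traceability. All identifiers are illustrative.

registry = {
    "conceptual-model": {"derived_from": None,
                         "justification": "Shared conceptual foundation"},
    "ontology":         {"derived_from": "conceptual-model",
                         "justification": "Formal semantics for reasoning"},
    "vocabulary":       {"derived_from": "ontology",
                         "justification": "Agreed labels for data exchange"},
}

def lineage(registry, artifact_id):
    """Walk derived_from links back to the conceptual foundation."""
    chain = []
    current = artifact_id
    while current is not None:
        chain.append(current)
        current = registry[current]["derived_from"]
    return chain

print(lineage(registry, "vocabulary"))
# prints ['vocabulary', 'ontology', 'conceptual-model']
```

In a real deployment the registry would live in the DCAT repository itself, with each artifact as a catalogued dataset and derivation expressed through provenance links; the point of the sketch is that traceability becomes a query, not an archaeology exercise.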

The framework enables rigorous execution within whatever scope your context demands—it does not dictate what that scope should be.




