Methodological Framework for Knowledge Graph Development
Overcoming Pipeline Approach Limitations through Conceptual-Operational Integration
Executive Summary
Traditional pipeline approaches to knowledge graph development (Controlled Vocabularies → Standard Metadata → Taxonomies → Thesauri → Ontologies → Knowledge Graphs) are effective when guided by deep understanding of the technologies and explicit governance of their underlying principles.
However, when the rules connecting each stage remain implicit rather than formalized, early conceptual choices can become progressively more difficult to examine and adjust as they cascade into downstream artifacts.
This framework strengthens pipeline approaches by making explicit the conceptual foundations, derivation rules, and discipline-specific principles that, when formalized, enable reliable, scalable, and auditable knowledge graph development:
rigorous conceptual modeling integrated with iterative operational materialization, governed by explicit derivation rules, and managed through a DCAT-based artifact repository that remains semantically and structurally traceable.
1. Problem Statement: Tacit Risks in Pipeline Approaches
1.1 Nature of the Risk
Pipeline approaches (Controlled Vocabularies → Standard Metadata → Taxonomies → Thesauri → Ontologies → Knowledge Graphs) are effective methodological frameworks when applied with deep understanding of the technologies involved and explicit governance of their underlying principles. However, when deployed without this awareness—or when their implicit rules and semantic assumptions remain tacit rather than formalized—several risks emerge:
1.2 Potential Issues When Implicit Rules Remain Unexamined
Tacit Assumptions About Conceptual Alignment
Semantic Slippage Across Layers
Discipline-Specific Tacit Knowledge
Reversibility and Iteration Complexity
Risk Amplification at Scale
1.3 The Framework's Role
Rather than rejecting pipeline approaches, this framework makes explicit what effective pipeline practice requires: the conceptual foundations, derivation rules, and governance principles that—when tacit—create risks but—when formalized—make pipelines powerful and reliable.
2. Proposed Framework: Conceptual-Operational Integration
2.1 Core Principles
Principle 1: Conceptual Foundation First
Rigorous philosophical and domain-specific conceptualization precedes artifact creation. This is not a preliminary phase but an ongoing practice that remains active throughout the lifecycle.
Principle 2: Iterative Maturation
The framework embraces iteration: conceptual models are refined through cycles of formalization, implementation, confrontation with reality, and conceptual re-elaboration. Maturity is achieved progressively, not presumed.
Principle 3: Governed Derivation
Every downstream artifact is derived from upstream conceptual choices through explicit, auditable derivation rules. These rules are not implicit conventions but formalized relationships that can be verified, traced, and—when necessary—reversed.
Principle 4: Bidirectional Traceability
The system maintains mappings between artifacts and their conceptual foundations in both directions: from conceptualization to materialization (forward derivation) and from artifacts back to their justifications (reverse tracing).
Principle 5: Semantic and Structural Consistency
At every level, artifacts are validated for consistency both with their conceptual foundations and with each other. Inconsistencies trigger re-examination rather than being papered over with additional formalization.
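To make Principles 3 and 4 concrete, the following minimal Python sketch (all identifiers hypothetical) shows one way a bidirectional traceability index could pair forward derivation with reverse tracing:

from collections import defaultdict

class TraceabilityIndex:
    """Bidirectional map between conceptual decisions and derived artifacts."""

    def __init__(self):
        self._forward = defaultdict(set)  # decision id -> derived artifact ids
        self._reverse = defaultdict(set)  # artifact id -> justifying decision ids

    def record_derivation(self, decision_id: str, artifact_id: str) -> None:
        """Forward derivation: a conceptual choice materialized as an artifact."""
        self._forward[decision_id].add(artifact_id)
        self._reverse[artifact_id].add(decision_id)

    def derived_from(self, decision_id: str) -> set:
        """Forward tracing: which artifacts does this decision justify?"""
        return self._forward[decision_id]

    def justified_by(self, artifact_id: str) -> set:
        """Reverse tracing: which decisions justify this artifact?"""
        return self._reverse[artifact_id]

trace = TraceabilityIndex()
trace.record_derivation("decision:vehicle-scope", "vocab:automobile")
print(trace.justified_by("vocab:automobile"))  # {'decision:vehicle-scope'}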
2.2 Operational Architecture
┌──────────────────────────────────────────────────────────┐
│ CONCEPTUAL FOUNDATION (Iterative Practice)               │
│ - Domain ontology (philosophical and domain-specific)    │
│ - Conceptual decisions and their justifications          │
│ - Semantic clarifications and boundary definitions       │
│ - Explicit acknowledgment of limitations and ambiguities │
└─────────────────────────────┬────────────────────────────┘
                              │
                ┌─────────────┴─────────────┐
                │   DERIVATION GOVERNANCE   │
                │ - Derivation rules        │
                │ - Transformation rules    │
                │ - Consistency validators  │
                └─────────────┬─────────────┘
                              │
                ┌─────────────┴───────────────────────┐
                │ DCAT-BASED ARTIFACT REPOSITORY      │
                │ (Managed by Methodological Rules)   │
                │                                     │
                │ - Controlled Vocabularies           │
                │ - Standard Metadata Schemas         │
                │ - Taxonomies                        │
                │ - Thesauri                          │
                │ - Ontologies (RDF, OWL)             │
                │ - Knowledge Graphs (RDF, Property   │
                │   Graphs, Embeddings)               │
                │                                     │
                │ With explicit versioning, lineage,  │
                │ and derivation provenance           │
                └─────────────┬───────────────────────┘
                              │
                ┌─────────────┴─────────────┐
                │     ACTIVATION LAYER      │
                │ - Query governance        │
                │ - Consistency checking    │
                │ - Change impact analysis  │
                │ - Iterative refinement    │
                └───────────────────────────┘
3. Detailed Components
3.1 Conceptual Foundation
Definition: Explicit, documented understanding of what is being modeled and why.
Comprises:
Practice:
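As one illustration of what such documentation could look like in practice, here is a minimal sketch of a conceptual decision record; the field names are illustrative assumptions, not a prescribed schema:

from dataclasses import dataclass, field

@dataclass
class ConceptualDecision:
    """A documented conceptual choice: what is modeled, why, within what limits."""
    identifier: str
    question: str                 # what is being modeled
    resolution: str               # the choice made
    justification: str            # why this choice rather than alternatives
    boundaries: list = field(default_factory=list)          # explicit scope limits
    known_ambiguities: list = field(default_factory=list)   # acknowledged gaps

decision = ConceptualDecision(
    identifier="decision:automobile-vs-vehicle",
    question="Is 'automobile' a subclass of 'vehicle' or a role played by one?",
    resolution="Subclass: an automobile is a kind of vehicle.",
    justification="Domain experts treat the distinction as rigid, not contextual.",
    boundaries=["Covers road vehicles only"],
    known_ambiguities=["Autonomous shuttles not yet classified"],
)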
3.2 Derivation Governance
Definition: Formalized rules that define how upstream conceptual choices generate downstream artifacts.
Illustrative Examples of Derivation Rules:
Rule 1: Vocabulary Derivation
Rule 2: Ontology-from-Taxonomy Derivation
Rule 3: Knowledge Graph Population Consistency
Implementation:
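Since the rules are only named above, the following sketch gives one plausible reading of Rule 1 as an executable check; the artifact shapes and identifiers are assumptions made for illustration:

def check_vocabulary_derivation(vocabulary: dict, ontology_classes: set) -> list:
    """One plausible reading of Rule 1: every controlled-vocabulary term
    must trace to a known ontology class. Returns violation messages."""
    violations = []
    for term, source_class in vocabulary.items():
        if source_class is None:
            violations.append(f"'{term}' has no ontological source")
        elif source_class not in ontology_classes:
            violations.append(f"'{term}' maps to unknown class '{source_class}'")
    return violations

# Hypothetical artifacts for illustration.
ontology_classes = {"ex:Automobile", "ex:Vehicle"}
vocabulary = {"automobile": "ex:Automobile", "car": None}
print(check_vocabulary_derivation(vocabulary, ontology_classes))
# ["'car' has no ontological source"]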
3.3 DCAT-Based Artifact Repository
Definition: Centralized, structured repository of all knowledge artifacts, managed by derivation governance rules.
Artifact Types:
DCAT Extensions:
Standard DCAT properties enhanced with:
Repository Capabilities:
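A minimal sketch of how one repository entry might be expressed with rdflib, recording a vocabulary as a dcat:Dataset with an explicit version and derivation provenance pointing back to its source ontology (all URIs hypothetical):

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import DCAT, DCTERMS, PROV

EX = Namespace("http://example.org/artifacts/")  # hypothetical namespace

g = Graph()
vocab = EX["controlled-vocabulary/v2"]
ontology = EX["domain-ontology/v5"]

g.add((vocab, RDF.type, DCAT.Dataset))
g.add((vocab, DCTERMS.title, Literal("Vehicle Controlled Vocabulary")))
g.add((vocab, DCTERMS.hasVersion, Literal("2.0")))
g.add((vocab, PROV.wasDerivedFrom, ontology))  # lineage back to the source ontology
g.add((vocab, DCTERMS.conformsTo, EX["rules/ontology-to-vocabulary"]))  # rule applied

print(g.serialize(format="turtle"))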
3.4 Activation Layer
Definition: Dynamic governance practices that use the repository to maintain semantic and structural consistency through iterations.
Activation Mechanisms:
1. Consistency Checking
ON artifact_modification:
    FOR EACH downstream_artifact IN get_dependents(modified_artifact):
        validation_results = apply_derivation_rules(modified_artifact, downstream_artifact)
        IF inconsistency_detected:
            FLAG for_review(downstream_artifact, validation_results)
            ALERT stakeholders with_justification(what_changed, why_inconsistent)
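One way the pseudocode above could be realized in Python; the dependency index and rule functions are hypothetical stand-ins for the repository's derivation governance, and the artifact identifiers are invented for illustration:

from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a dependency index and, per (upstream, downstream)
# pair, a rule function returning violation messages (empty list = consistent).
DEPENDENTS: Dict[str, List[str]] = {"ontology:v5": ["vocab:v2", "kg:main"]}
RULES: Dict[Tuple[str, str], Callable[[], List[str]]] = {
    ("ontology:v5", "vocab:v2"): lambda: [],
    ("ontology:v5", "kg:main"): lambda: ["instance ex:n123 violates new cardinality"],
}

def flag_for_review(artifact: str, violations: List[str]) -> None:
    # A real implementation would open a review item and alert stakeholders
    # with the justification: what changed and why it is now inconsistent.
    print(f"REVIEW {artifact}: {violations}")

def on_artifact_modification(modified: str) -> None:
    for downstream in DEPENDENTS.get(modified, []):
        violations = RULES[(modified, downstream)]()
        if violations:
            flag_for_review(downstream, violations)

on_artifact_modification("ontology:v5")
# REVIEW kg:main: ['instance ex:n123 violates new cardinality']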
2. Change Impact Analysis
3. Query Governance
4. Iterative Refinement Protocol
CYCLE:
    1. Identify inconsistency or requirement
    2. IF conceptual foundation requires revision:
        Update conceptual model with justification
        Apply derivation rules to propagate changes
        Validate all downstream artifacts
        Document decision and rationale
    3. IF only operational artifact requires revision:
        Check against derivation rules
        If compliant, update artifact
        If non-compliant, escalate to conceptual review
    4. Test against real-world usage
    5. Feed learnings back into conceptual foundation
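Step 3 of the cycle lends itself to a small executable sketch as well; the rule objects below are hypothetical stand-ins for the repository's derivation rules:

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DerivationRule:
    """Hypothetical rule object: a named predicate over a proposed change."""
    name: str
    permits: Callable[[Dict], bool]

def revise_operational_artifact(change: Dict, rules: List[DerivationRule]) -> str:
    """Accept an operational-only change when it stays within the envelope
    defined by the derivation rules; otherwise escalate to conceptual review."""
    violated = [r.name for r in rules if not r.permits(change)]
    if not violated:
        return "updated"  # compliant: apply the change in place
    print(f"Escalating to conceptual review: {violated}")
    return "escalated"

rules = [DerivationRule("term-has-source-class", lambda c: "source_class" in c)]
print(revise_operational_artifact({"term": "car"}, rules))  # escalated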
4. Addressing Pipeline Approach Problems
4.1 Problem: Cumulative Error Propagation
Pipeline Approach: Errors introduced early persist through all layers, becoming increasingly difficult to correct.
This Framework:
Mechanism: When a knowledge graph instance violates an ontology axiom, the framework traces back: Is this a data quality issue, an ontology error, or a conceptual confusion? The lineage metadata answers this question.
4.2 Problem: Semantic Opacity
Pipeline Approach: Formal appearance masks unresolved conceptual confusion.
This Framework:
Mechanism: An ontology axiom linked to its conceptual justification reveals whether the formalism represents genuine semantic understanding or merely syntactic standardization.
4.3 Problem: Irreversibility and Path Dependency
Pipeline Approach: Downstream dependencies on upstream errors make correction prohibitively expensive.
This Framework:
Mechanism: Modifying an ontology class definition automatically flags which knowledge graph assertions depend on the old definition, enabling staged migration.
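A minimal rdflib sketch of that flagging step, under the simplifying assumption that a knowledge graph assertion "depends on" a class definition when it types an instance with that class (namespace hypothetical):

from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")  # hypothetical namespace

kg = Graph()
kg.add((EX.n1, RDF.type, EX.Automobile))
kg.add((EX.n2, RDF.type, EX.Bicycle))

def assertions_depending_on(kg: Graph, modified_class) -> list:
    """Flag typing assertions that rely on the modified class definition,
    as candidates for staged migration."""
    return list(kg.subjects(RDF.type, modified_class))

print(assertions_depending_on(kg, EX.Automobile))  # [...n1]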
4.4 Problem: Institutional Embedding of Fallacies
Pipeline Approach: Logical errors become formalized as axioms, appearing legitimate through their formal representation.
This Framework:
Mechanism: An axiom implementing a logical fallacy would fail to pass conceptual justification review before being derived into downstream artifacts.
5. Implementation Approach
5.1 Minimum Viable Implementation
Phase 1: Foundation
Note: This is illustrative; specific implementation will vary based on domain and organizational context.
Phase 2: Activation
Phase 3: Maturation
5.2 Technical Enablers
DCAT Profile Extensions
Tooling
Organizational
5.3 Integration with Existing Standards
6. Key Distinctions from Pipeline Approaches
7. Conclusion
This framework complements rather than replaces established pipeline methodologies. Its purpose is to formalize the implicit assumptions, tacit rules, and disciplinary expertise that make pipeline approaches effective when applied with deep understanding.
By making explicit:
...this framework enables organizations to:
The DCAT-based repository activated through derivation governance becomes the mechanism for managing these explicit rules—making pipeline practices transparent, auditable, and resilient to change.
8. Invitation for Feedback and Refinement
This draft framework raises questions rather than providing definitive answers. Critical engagement is welcome with:
Conceptual Issues
Practical Feasibility
Disciplinary Adaptation
Technical Realization
Critique and Alternatives
Feedback, critique, and proposals for refinement through discussion, pilot implementations, and cross-disciplinary dialogue are welcome.
Annex: From Controlled Vocabulary to Ontology — Epistemic Foundations
Understanding the Critical Distinction
The framework presented in the main article assumes a clear distinction between different artifact types in the knowledge graph pipeline. However, practitioners often attempt to evolve a controlled vocabulary directly into an ontology, expecting the progression to be continuous. This annex clarifies why this approach fails and explains the fundamental epistemic differences that underpin the framework's derivation governance principles.
Two Distinct Objects, Not Two Stages
A controlled vocabulary (including thesauri and term lists) is fundamentally a prescriptive, flat resource: a collection of standardized terms with simple relationships—synonymy, generic hierarchy (generalization/specialization), thematic associations. Its objective is pragmatic standardization: ensuring consistency in indexing and information retrieval. We say "automobile" rather than "car" or "auto"; we use "economic depression" rather than "crisis" or "recession." A controlled vocabulary is governed by conventional agreement: "we use these terms in this way."
An ontology, by contrast, is a structured representation of reality itself. It does not catalog terms; it models concepts, their properties, their complex relationships, and the logical rules that govern them. An ontology asks fundamentally different questions: What is an "automobile" in relation to a "vehicle"? What are its constitutive parts? What logical relations bind it to other entities? How do we distinguish an automobile from similar entities? An ontology is not flat but multidimensional, formally structured, and—critically—logically coherent.
This is not a difference of degree or complexity. It reflects an epistemic gulf: controlled vocabularies are tools for managing agreement on terminology; ontologies are models of conceptual structure grounded in understanding of the domain itself.
The Epistemic Gap
Three systemic differences explain why vocabulary-to-ontology progression fails:
Polysemy and Granularity: A controlled vocabulary tolerates semantic ambiguity managed through convention. A term can hover between multiple interpretations as long as practitioners understand how to apply it. An ontology, however, demands radical clarification: it must distinguish the separate concepts hiding behind a single term. It must answer: are these genuinely distinct entities, or merely different applications of one concept? This question cannot be answered by extending the vocabulary—it requires reconceptualizing the domain itself.
Formalization of Logical Structure: Relations in a controlled vocabulary are declarative and flat—"X is narrower than Y," "A is related to B." These are annotations, useful but not computationally meaningful in a strong sense. An ontology requires formal logical structure: relations have precise semantics that enable inference, inheritance, constraint propagation. An axiom in an ontology is not merely a labeled edge; it is a logically valid statement that machines can reason over. This transformation cannot be achieved by adding layers of complexity to a vocabulary; it requires reconstituting the representation from the ground up in a logical framework.
Specification of Properties and Constraints: A controlled vocabulary never specifies what can be a property of a concept, or under what constraints properties apply. An ontology must formalize this explicitly: domain and range constraints, cardinality restrictions, property inheritance hierarchies. Moving from vocabulary to ontology is not an extension but a categorical shift from terminological standardization to conceptual formalization.
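The contrast can be made concrete with a short rdflib sketch (namespace and terms hypothetical): the vocabulary layer records flat, declarative relations, while the ontology layer carries formal semantics that machines can reason over:

from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL, SKOS

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# Vocabulary layer: a standardized term with flat, declarative relations.
g.add((EX.automobile, RDF.type, SKOS.Concept))
g.add((EX.automobile, SKOS.prefLabel, Literal("automobile", lang="en")))
g.add((EX.automobile, SKOS.broader, EX.vehicle))  # an annotation, not an inference rule

# Ontology layer: a class with logically meaningful structure.
g.add((EX.Automobile, RDF.type, OWL.Class))
g.add((EX.Automobile, RDFS.subClassOf, EX.Vehicle))  # enables inheritance
g.add((EX.hasEngine, RDF.type, OWL.ObjectProperty))
g.add((EX.hasEngine, RDFS.domain, EX.Automobile))  # constrains property usage
g.add((EX.hasEngine, RDFS.range, EX.Engine))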
Why Direct Progression Fails
Attempting to "upgrade" a controlled vocabulary into an ontology by adding detail and structure creates what might be called a pseudo-ontology: formally elaborate but logically fragile, because it lacks the deep conceptual clarity that should ground an ontology.
The problems are systematic:
Accumulated Ambiguity: A vocabulary that was deliberately tolerant of semantic ambiguity becomes an ontology in which that same ambiguity is now formalized and unexamined. What was managed as pragmatic flexibility becomes embedded as logical inconsistency.
Layer Collapse: The vocabulary may conflate distinct concepts (for pragmatic terminological reasons). When formalized as an ontology, these conflations appear as logical axioms—and now it becomes costly and organizationally disruptive to separate them, since downstream applications depend on their conflation.
Missing Conceptual Grounding: An ontology derived from a vocabulary inherits no understanding of why the concepts are structured as they are. It has form without foundation. When inconsistencies emerge (and they will), there is no conceptual basis for resolving them—only the inertia of prior choices.
False Rigor: The formal appearance of ontological structure can mask the absence of genuine ontological clarity. An axiom represented in OWL is no more meaningful than the same statement in plain language if it reflects unexamined conceptual confusion. Formal notation creates an illusion of rigor that can suppress the critical examination needed to detect the confusion.
The Inverse Approach: Conceptually Grounded
The evidence—both from the framework presented in the main article and from practice—suggests that a reverse approach is far more robust: begin with rigorous conceptual modeling that clarifies what exists in the domain and how it is organized, then derive from this ontology a controlled vocabulary that reflects the conceptual structure clearly.
This inverted approach works because it respects the epistemic order:
Integration with Derivation Governance
This epistemic inversion aligns directly with the framework's Derivation Governance principle. A derivation rule from ontology to controlled vocabulary might read:
Rule: For each class C in the formal ontology with scope S and distinguishing properties P1...Pn, the controlled vocabulary includes a term T such that:
This rule formalizes what should be intuitive: vocabulary terms are derived from ontological clarity, not the other way around. Changes flow downward from concept to term, not upward from term to concept.
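Under the assumption that labels and subclass links are the relevant projections (the rule's full conditions are summarized rather than spelled out above), such a derivation might be sketched as follows:

from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL, SKOS

def derive_vocabulary(ontology: Graph, ns: Namespace) -> Graph:
    """For each ontology class, emit one vocabulary term whose preferred
    label and hierarchy come from the class, never the other way around."""
    vocab = Graph()
    for cls in ontology.subjects(RDF.type, OWL.Class):
        term = ns[f"term/{cls.split('/')[-1].lower()}"]
        vocab.add((term, RDF.type, SKOS.Concept))
        label = ontology.value(cls, RDFS.label) or Literal(cls.split("/")[-1])
        vocab.add((term, SKOS.prefLabel, label))
        for parent in ontology.objects(cls, RDFS.subClassOf):
            parent_term = ns[f"term/{parent.split('/')[-1].lower()}"]
            vocab.add((term, SKOS.broader, parent_term))
    return vocab

EX = Namespace("http://example.org/")  # hypothetical namespace
onto = Graph()
onto.add((EX.Automobile, RDF.type, OWL.Class))
onto.add((EX.Automobile, RDFS.subClassOf, EX.Vehicle))
print(derive_vocabulary(onto, EX).serialize(format="turtle"))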
Practical Implications for Pipeline Practice
For organizations using the pipeline approach described in the main article:
When You Have an Existing Controlled Vocabulary: Treat it as a data point about current practice, not as canonical. Extract the conceptual insights it embodies (often vocabulary terms reveal important distinctions), but do not assume its structure is optimal. Use it to inform conceptual modeling, not to constrain it.
When Building an Ontology: Invest heavily in conceptual work before formalizing. Document the domain model philosophically—what entities exist, why they are distinguished, what relationships hold between them. Only then move to formal representation. This is expensive upfront but prevents the accumulation of unfounded axioms.
When Standardizing Terminology: Derive your controlled vocabulary from a clear ontology (even if partial or preliminary). This ensures that vocabulary choices reflect genuine conceptual distinctions, making the vocabulary more robust and more useful for knowledge graph population and querying.
For Iteration and Refinement: When conceptual errors surface (and they will), the framework's bidirectional traceability allows you to trace from vocabulary term back to ontological axiom back to conceptual justification. You can then correct at the appropriate level—whether that is correcting a misconception in the conceptual foundation or simply adjusting terminology to better reflect a sound concept.
Conclusion
The progression from controlled vocabulary to ontology is not a pipeline but a conceptual leap. Attempting to make that leap by elaborating and formalizing the vocabulary fails because it conflates terminological standardization with conceptual modeling. The reverse—beginning with rigorous conceptual clarity and deriving vocabulary from it—respects the epistemic order and produces more robust, auditable, and maintainable knowledge structures.
Within the framework of the main article, this distinction explains why derivation governance must flow from conceptual foundation through formal ontology to downstream artifacts (including controlled vocabularies). Reversing that flow—attempting to derive conceptual clarity from vocabularies—creates the accumulation of tacit assumptions and semantic ambiguities that the framework is designed to prevent.
Responses to Some Questions Raised About the Article
Question 1: Artifact Necessity, Business Value, and Scoping
A critical question emerges when reviewing this framework: How many of these artifacts are actually needed for a given knowledge graph development? Is there real business value in developing separate controlled vocabularies, taxonomies, thesauri, and ontologies? Having developed them through to ontology, should each be separately maintained when changes are needed? Most importantly: How should the work be scoped?
There is a legitimate concern that attempting to model an entire business or domain risks "modeling for its own sake" and, proverbially, boiling the ocean. An alternative approach advocates starting with specific business use cases delivering clear value, with competency questions expressed in business language—letting those questions define the vocabulary and scope needed in the KG. Value is delivered first, then the system expands incrementally through further use cases.
My response: Context-Driven Application and Strategic Starting Points
Different tactics and strategies exist, always driven by specific needs and contexts. The framework presented here does not prescribe a universal approach but rather formalizes principles that apply across different strategic choices.
Consider a specific application domain: preparing governance and building architecture for continuous operational interoperability between partners and domains working on complex products (such as aircraft development). In such contexts, the starting point is often not a blank slate but rather legacy open and de facto standards agreed upon by communities of international experts. The challenge becomes deriving useful and relevant subsets to cover specific collaboration cases. Think of the open standard as a dictionary, and collaboration cases as sentences—you pick what you need rather than reinventing generic concepts each time.
This strategy offers significant advantages. It prevents costly alignment work that would be required if partners independently developed their own models and then tried to reconcile them. It makes explicit what is generic (drawn from standards) versus context-specific (particular to your collaboration). It provides a shared conceptual foundation from which to derive artifacts as needed.
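The "dictionary and sentences" strategy can itself be sketched: given a standard ontology and the classes a collaboration case actually uses, extract only those classes plus their direct superclass links so the subset stays anchored in the source standard (identifiers hypothetical):

from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

def extract_subset(standard: Graph, needed_classes: set) -> Graph:
    """Pick from the standard 'dictionary' only what the collaboration
    case (the 'sentence') needs, keeping direct superclass links."""
    subset = Graph()
    for cls in needed_classes:
        subset.add((cls, RDF.type, OWL.Class))
        for parent in standard.objects(cls, RDFS.subClassOf):
            subset.add((cls, RDFS.subClassOf, parent))
            subset.add((parent, RDF.type, OWL.Class))
    return subset

EX = Namespace("http://example.org/standard/")  # hypothetical namespace
standard = Graph()
standard.add((EX.Aircraft, RDF.type, OWL.Class))
standard.add((EX.Aircraft, RDFS.subClassOf, EX.Vehicle))
print(len(extract_subset(standard, {EX.Aircraft})))  # 3 triples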
Producing any given artifact—vocabulary, taxonomy, thesaurus, ontology—is not mandatory. It is entirely value-driven. However, if multiple artifacts address the same topic, they must be aligned for global consistency. Without this alignment, you risk inconsistent representations of the same knowledge across different layers of formalization. This is precisely where explicit derivation governance becomes critical: it ensures that when artifacts are created, they remain semantically and structurally consistent with each other and with their conceptual foundations.
The "conceptual foundation first" principle should be understood as: be rigorous about what you're modeling within your defined scope—not "model everything comprehensively before building anything." The framework supports starting from established standards or use case-driven competency questions, rigorous conceptualization for the bounded scope you've defined, explicit derivation rules only for artifacts that deliver value in your context, and iterative expansion guided by new use cases or collaboration requirements, not abstract completeness.
Two complementary strategies emerge. A use case-driven approach starts with specific business use cases and competency questions, builds minimal artifacts to deliver immediate value, and expands incrementally as new use cases emerge. This is appropriate for greenfield projects, exploratory domains, and rapid value delivery. A standards-driven approach starts from established domain standards (the "dictionary"), derives relevant subsets for specific collaboration contexts (the "sentences"), and builds artifacts only where alignment value justifies the cost. This is appropriate for regulated domains, multi-partner interoperability, and leveraging existing consensus.
Both strategies benefit from explicit derivation governance. Use case-driven approaches need it to maintain consistency as the system grows incrementally. Standards-driven approaches need it to ensure derived subsets remain aligned with source standards and with each other.
The framework's core message should be clarified: This framework is not about mandating artifacts or comprehensive modeling. It is about formalizing the principles that ensure semantic consistency when artifacts are created—whatever the strategic approach, whatever the scope. Whether you start from use cases or standards, create minimal artifacts or richer taxonomies, model narrowly or broadly, the framework provides explicit documentation of why each artifact exists (business value justification), clear rules for how artifacts derive from conceptual foundations or source standards, mechanisms for verifying that multiple artifacts remain consistent, and traceability that enables iteration and refinement without breaking existing work.
The framework enables rigorous execution within whatever scope your context demands—it does not dictate what that scope should be.