The Enterprise Knowledge Graph Life Cycle
The enterprise knowledge graph life cycle provides an overview of the actors and agents involved in the most important operational steps for the ongoing development of the graph. These steps range from data inventory, extraction, and curation, through modeling (authoring) and various transformations, to linking and enrichment (e.g., inferred data), and finally analysis or feedback of the newly acquired data into existing database systems. In reality, three cycles are intertwined: the expert loop (following HITL, the human-in-the-loop design principle), the automation loop, and the user loop.
A solid foundation for the creation of high quality data graphs can only be established if sufficient time is invested in the creation and maintenance of curated taxonomies and ontologies, but even these steps can be partially automated. Within the loops, agile and iterative working methods are predominant, whereby individual process steps can interact with each other.
The knowledge graph life cycle highlights five aspects:
- The development of knowledge graphs is an endeavour involving several stakeholders.
- Developing knowledge graphs means proceeding iteratively and agilely, not linearly.
- Humans and machines should be equally involved in building an enterprise knowledge graph.
- The knowledge graph is constantly being developed further in three loops that are linked together.
- The aim is always to balance the three most important perspectives on the knowledge graph: representing domain knowledge, linking company data, and enriching it with user contexts.
Expert Loop
The Expert Loop involves predominantly knowledge engineers and subject matter experts working on ontologies and taxonomies to be further used by the other loops. Here are the main tasks:
- Inventory: run scoping sessions with business users and SMEs using card-sorting and taxonomy tools combined with automated analysis of selected content and data sources to determine which areas of interest, in combination with which data sets, are important for getting started.
- Extract: extract relevant types of business objects, entities, and topics from identified data sets and put them into the individual enterprise context and link them to specific application scenarios.
- Author: in several iteration steps, develop a viable ontology and taxonomy architecture, which can, for example, consist of several core ontologies and department-specific taxonomies. At the same time, harmonize the associated governance model with the organizational culture and the overall KG governance model.
- Clean: curate suggestions from ML-based tools like corpus analysis. Clean up and adapt taxonomies and ontologies that are reused in the specific organizational setting.
- Link: with the help of ML algorithms, experts curate and create links between entities and concepts from different graphs, mainly between taxonomies.
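To make the linking step concrete, here is a minimal sketch of how ML-based tooling might propose candidate links between two taxonomies for expert curation. The taxonomy labels and identifiers are made up, and the plain label-similarity heuristic stands in for whatever matching algorithm a real platform would use.

```python
from difflib import SequenceMatcher

# Hypothetical concept labels from two department taxonomies
# (identifiers and labels are illustrative only).
marketing_taxonomy = {"m1": "Customer Segment", "m2": "Campaign", "m3": "Lead"}
sales_taxonomy = {"s1": "Customer Segments", "s2": "Sales Lead", "s3": "Quota"}

def suggest_links(tax_a, tax_b, threshold=0.8):
    """Suggest skos:closeMatch candidates by label similarity.

    The output is a list of suggestions for a subject matter
    expert to accept or reject, not links applied automatically.
    """
    suggestions = []
    for id_a, label_a in tax_a.items():
        for id_b, label_b in tax_b.items():
            score = SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()
            if score >= threshold:
                suggestions.append((id_a, id_b, round(score, 2)))
    return sorted(suggestions, key=lambda s: -s[2])

for id_a, id_b, score in suggest_links(marketing_taxonomy, sales_taxonomy):
    print(f"{id_a} skos:closeMatch {id_b}  (score={score})")
```

Accepted suggestions would typically be stored as SKOS mapping relations (e.g., skos:closeMatch or skos:exactMatch) so the curated links remain part of the graph itself.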
Automation Loop
Data engineers and MLOps engineers are responsible for all matters within the Automation Loop.
- Ingest: retrieve data from defined sources and ingest data generated within the user loop for further processing, track provenance and provide data lineage information including technical metadata involving data transformations.
- Clean: automatically clean data from various sources with the help of ontologies and corresponding consistency checks.
- Transform: with knowledge graphs in place, most of the ingested data and metadata can be transformed into RDF-based data graphs. Transformation steps follow the rules expressed by domain-specific taxonomies and ontologies.
- Enrich: automatic entity extraction and lookup in knowledge graphs for context information help to enrich data points automatically. In addition, powerful inferencing mechanisms based on ontologies and constraint languages like SHACL enrich enterprise data sets further.
- Link: linking on the entity level, not only schema mapping, generates a rich enterprise knowledge graph. Machine learning and algorithms such as spreading activation can automatically generate links between several graphs and data sets with high precision (so-called 'graph reconciliation').
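The spreading-activation idea mentioned above can be sketched in a few lines: activation energy flows from seed entities along graph edges with a decay factor, and nodes that accumulate enough activation become candidate link targets. The graph, decay factor, and threshold below are illustrative, not taken from any specific product.

```python
# Toy adjacency list: node names are hypothetical business objects.
graph = {
    "acme_corp": ["supplier_x", "product_a"],
    "supplier_x": ["region_emea"],
    "product_a": ["product_line_1"],
    "product_line_1": [],
    "region_emea": [],
}

def spread_activation(graph, seeds, decay=0.5, threshold=0.1):
    """Propagate activation from seed nodes along outgoing edges.

    Nodes whose accumulated activation meets the threshold are
    returned as candidates for linking or enrichment.
    """
    activation = {node: 0.0 for node in graph}
    frontier = {seed: 1.0 for seed in seeds}
    while frontier:
        next_frontier = {}
        for node, energy in frontier.items():
            activation[node] += energy
            passed = energy * decay
            if passed < threshold:
                continue  # activation has decayed below the cut-off
            for neighbor in graph.get(node, []):
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        frontier = next_frontier
    return {n: a for n, a in activation.items() if a >= threshold}

print(spread_activation(graph, ["acme_corp"]))
```

Starting from "acme_corp", directly connected nodes receive half the seed activation and second-hop nodes a quarter, which is exactly the kind of relevance signal that can rank link candidates before curation.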
User Loop
As beneficiaries of the knowledge graph, mainly business users and data scientists interact with the data within the User Loop, acting not only as passive consumers but also as data producers:
- Extract: using digital assistants or more basic filtering methods such as faceted browsing, business users can extract even small chunks of information or single data points from large data sets precisely and efficiently. Graphs are the key to unlocking the value of large data sets by helping users to narrow down the search space based on individual information needs.
- Analyze: graphs and query languages such as SPARQL provide additional means for powerful data analytics and help lower the barrier to user self-service, complementing traditional data warehouses and their rather rigid reporting systems.
- Visualize: business users benefit from linked data, especially when visualizing relationships between business objects and topics. This can be used to analyze causalities or risks in complex systems, to identify hubs in social or IT networks, or simply to better understand how things relate in a knowledge domain. However, enterprise data modeled as graphs does not necessarily have to be visualized as graphs; the graph rather serves as a flexible model to present and interpret data in a more individual way than rigid data models allow.
- Interact: users in such systems are also data producers when they interact with the knowledge graph. While they benefit from comprehensive guidance through extensive data landscapes, users also provide feedback on the overall system and their behavior can be used to further enrich the knowledge graph.
- Train models: data scientists can better filter and reuse data through semantically enriched metadata. Relevant data sets can thus be quickly extracted from data catalogs and used specifically for training ML algorithms. Data enriched and linked with knowledge graphs is also more expressive and is suitable, for example, for training classifiers even when only smaller volumes of training data are available.
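The extraction step above, narrowing a large data set with faceted browsing, can be illustrated with a toy example. The records and facet names below are invented for illustration; in practice the facets would be backed by taxonomy concepts and the filtering by a graph query language such as SPARQL.

```python
# Hypothetical catalog records with facet-style metadata.
records = [
    {"type": "report", "department": "finance", "year": 2022},
    {"type": "report", "department": "marketing", "year": 2023},
    {"type": "dataset", "department": "finance", "year": 2023},
]

def facet_filter(records, **facets):
    """Keep only records matching every selected facet value.

    Each keyword argument acts like one facet selection in a
    faceted-browsing interface, successively narrowing results.
    """
    return [r for r in records
            if all(r.get(key) == value for key, value in facets.items())]

# Selecting department=finance and year=2023 narrows three records to one.
finance_2023 = facet_filter(records, department="finance", year=2023)
print(finance_2023)
```

Because each facet is a conjunctive constraint, users can drill down step by step, which is what makes faceted browsing effective for pinpointing single data points in large collections.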
Conclusion and next steps
The majority of technology platforms used in the development and implementation of enterprise knowledge graphs are specialized in only one of the three loops. As a result, such platforms can support only narrowly scoped graph-based applications. Only the right mix and a balanced interaction of the three loops can sustain a company's long-term knowledge graph vision and strategy. With the expert loop in play, interfacing with the automation loop, any AI system based on knowledge graphs becomes explainable AI.
Download The Knowledge Graph Cookbook and learn more about the knowledge graph life cycle and how it can be implemented in your organization.