Linked Data for Analytics?

Isolated data silos represent one of the biggest obstacles to digitalization. New insights and discoveries, automation, and data-based products and services arise only from the networking of all available and relevant data, both internal and external. Only when all the data is working together can dependencies and larger patterns be identified and processes be understood and improved from start to finish.

Breaking open data silos and networking data is a challenge for many companies. It requires a professional understanding of both the organizational framework and the technical possibilities. Knowing how data can be used in this way becomes a decisive competitive advantage. One approach to solving this problem is …

Linked Data

Linked data refers to a special approach for the semantic networking of structured data. The explicit modeling of relationships among data objects makes them comprehensible and analyzable for humans and machines. Linked data thus enables contextual search across data objects and their relationships to each other, which makes it valuable for analytics.
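To make the idea of explicitly modeled relationships more concrete, here is a minimal sketch in plain Python. It assumes that each fact is stored as a (subject, predicate, object) triple, which is a common way to represent linked data. All identifiers and relationship names (customer:42, placed, and so on) and both helper functions are hypothetical and only serve as illustration; they are not taken from any specific product.

```python
# Hypothetical example data: each fact is one (subject, predicate, object) triple.
TRIPLES = [
    ("customer:42", "placed", "order:1001"),
    ("order:1001", "contains", "product:A7"),
    ("product:A7", "supplied_by", "supplier:acme"),
    ("customer:42", "located_in", "region:emea"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def context(triples, node):
    """Return every triple in which the node appears as subject or object."""
    return [t for t in triples if t[0] == node or t[2] == node]


if __name__ == "__main__":
    # Everything that is known about one customer.
    print(match(TRIPLES, subject="customer:42"))
    # The context of an order: who placed it and what it contains.
    print(context(TRIPLES, "order:1001"))
```

Because the relationships are part of the data itself, a contextual search is just a pattern match over the triples, without knowing a table schema in advance.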

Linked Data Use Cases for BI and Advanced Analytics

Linked data is becoming interesting for BI and analytics tasks as well as for supporting data governance. The main driving forces behind these initiatives are data transparency and traceability, data quality, and extended information.

Linked data is of particular interest for artificial intelligence applications because the data is stored in machine-readable form, which enables the processing machine to take data, its relationships, and its context into account. In addition, the format supports the processing of unstructured data as an additional source, e.g., for building and training machine-learning models.

The relationship knowledge available in linked data also provides complementary information to support the data scientist, data engineer, or business analyst in finding and understanding data, accessing data, and using relationship knowledge for explorative analysis. Value is added because:

  • Relationships can be used as additional features in feature engineering.
  • Access to data can be supported by providing data sets as subgraphs.
  • Transparency and traceability can be achieved by analyzing the model, e.g., by data lineage analysis.
  • Other analytical functions can be applied, such as the analysis of navigation in the graph as well as special graph analyses.
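The points above can be sketched with a small graph in plain Python. This is only an illustration under stated assumptions: the data set names (crm.customers, dwh.sales_mart, and so on), the "feeds" relationship, and all function names are hypothetical. The sketch derives simple relationship features (in-degree and out-degree), performs a data lineage analysis by walking the graph upstream, and cuts out a subgraph as a data set.

```python
from collections import defaultdict, deque

# Hypothetical lineage graph: (source, relationship, target).
EDGES = [
    ("crm.customers", "feeds", "staging.customers"),
    ("erp.orders", "feeds", "staging.orders"),
    ("staging.customers", "feeds", "dwh.sales_mart"),
    ("staging.orders", "feeds", "dwh.sales_mart"),
    ("dwh.sales_mart", "feeds", "report.revenue"),
]


def build_index(edges):
    """Index the edges by source and by target for fast navigation."""
    outgoing, incoming = defaultdict(set), defaultdict(set)
    for source, _, target in edges:
        outgoing[source].add(target)
        incoming[target].add(source)
    return outgoing, incoming


def degree_features(edges, node):
    """Relationship counts that could serve as features in feature engineering."""
    outgoing, incoming = build_index(edges)
    return {
        "in_degree": len(incoming.get(node, ())),
        "out_degree": len(outgoing.get(node, ())),
    }


def upstream(edges, node):
    """Data lineage: all nodes that directly or indirectly feed the given node."""
    _, incoming = build_index(edges)
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for parent in incoming.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def subgraph(edges, nodes):
    """Keep only the edges whose two endpoints both lie in the node set."""
    return [e for e in edges if e[0] in nodes and e[2] in nodes]


if __name__ == "__main__":
    print(degree_features(EDGES, "dwh.sales_mart"))
    lineage = upstream(EDGES, "report.revenue")
    print(sorted(lineage))
    print(subgraph(EDGES, lineage | {"report.revenue"}))
```

In practice a graph database or a dedicated graph library would take over the indexing and traversal; the sketch only shows that these analyses reduce to navigation over explicitly stored relationships.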

Benefits and Limits

Linked data has been around for a while, but has been used primarily for special operational tasks. Linked data is known as a special application in the manufacturing and pharmaceutical sectors, where the aim is to network individual parts or ingredients in complex process chains across departments. Among BI and advanced analytics experts, interest in linked data is slowly increasing as its advantages are perceived. However, much training is still needed before BI and analytics experts can correctly assess the potential of linked data.

Benefits of Linked Data for Analytics

  • Analysis of complex relationships and search/navigation in data objects.
  • Use of object dependencies, data context, and additional features in the analytics process.
  • Unique object identification and improved data quality through the model, even when objects are named and described differently.
  • Easy changeability and extensibility of the model bring agility, flexibility, and speed compared to relational models.
  • Human/machine readability of data; self-documentation.

Limits of Linked Data

  • Front-end support of linked data models is limited; applications have to be built to use data.
  • Some query languages, such as GraphQL, are suitable for analytics only to a limited extent. They are oriented toward the analysis of relationships and less toward mathematical mass data operations such as aggregations.
  • BI and analytics practitioners will need to develop new skills, experience, and best practices to use linked data.

My Opinion on Linked Data for Analytics

There is increasing demand to find, understand, and use data in an uncomplicated way. Whether you are a business analyst, data steward, or IT-conscious data engineer, you need consistent and reliable data sources with quality-assured and uniformly understandable data. Linked data is a potential solution that focuses on relationships. It offers a different approach from most other solutions such as data platforms, data governance, data catalogs, metadata repositories, or data lakes.

Linked data is a useful supplement for data management and data analysis. A primary advantage for analytics is the enrichment of data with relationship knowledge, which supports data evaluation and modeling. For this reason, the linked data approach is already integrated into many databases (including NoSQL databases), data integration tools, data catalogs, and solutions for master data management (MDM). This is done both as a basic paradigm and as an optional “engine” for processing data with relationship information.

Linked data has a great deal of potential, especially in machine-learning and artificial intelligence applications, which require machine-readable access to understandable company knowledge. The potential of linked data for analytics is not yet fully understood, but there is plenty of opportunity to explore it. After all, who doesn’t dream of easy access to company data with a tool that allows information to be intuitively found, understood, and used in the right context?

More detailed information on linked data is available in our research note: “Linked Data: A Promising Approach for Data Governance and Advanced Analytics”. The research note is free to download thanks to sponsorship by Conweaver: https://mailchi.mp/conweaver/linked-data_barc

More articles by Timm Grosser