Normalising Property Graphs
Avoiding redundant values in a property graph by creating new nodes

#Database #normalization aims at grouping properties such that common #updates and #queries can be processed efficiently. It is therefore fundamental to #dataintegrity and #datamanagement. State-of-the-art #normalization for #relationaldatabases goes back to the late 1970s. In the relational setting, a lossless, dependency-preserving decomposition into Third Normal Form (3NF) can always be guaranteed. In fact, whenever some lossless, dependency-preserving decomposition into Boyce-Codd Normal Form (BCNF) exists, it can be found. BCNF is preferable because it eliminates any #redundant data values that are caused by functional dependencies. 3NF guarantees the fewest sources of data redundancy among all dependency-preserving decompositions, but it may need to tolerate a level of data redundancy that is unknown in advance.
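To make the relational case concrete, here is a minimal Python sketch of a redundancy caused by a functional dependency and of a lossless decomposition that removes it. The relation, its attribute names (employee, dept, manager) and the helper functions are invented for illustration; this is a toy check on one instance, not a general normalization algorithm.

```python
# Toy relation; attribute names and rows are invented for illustration.
rows = [
    {"employee": "Ann", "dept": "Sales", "manager": "Kim"},
    {"employee": "Bob", "dept": "Sales", "manager": "Kim"},
    {"employee": "Cy", "dept": "IT", "manager": "Lee"},
]


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    unique = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(unique)]


def natural_join(left, right):
    """Join two relations on all attributes they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                out.append({**l, **r})
    return out


def as_set(rs):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rs}


# dept -> manager holds, but dept is not a key, so "Kim" is stored twice.
fd_holds = holds(rows, ["dept"], ["manager"])

# Decompose along the dependency: each manager value is now stored once.
works_in = project(rows, ["employee", "dept"])
manages = project(rows, ["dept", "manager"])

# Lossless on this instance: joining the parts gives back the original rows.
lossless = as_set(natural_join(works_in, manages)) == as_set(rows)
```

In this toy instance, both dependencies (employee determines dept, dept determines manager) can still be checked on the two smaller relations, which is what dependency preservation asks for.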

In recent times, #graphdatabases have resurfaced and evolved into mature #datamodels that support many modern application needs. Many proposals for query languages, #constraints and #schema have emerged, and collaborations between academia and practitioners have resulted in recommendations for standards. However, the topic of #graphdatabase #normalization has not attracted much attention yet, even though any mature data model strongly needs such a framework. Indeed, several characteristics of graph data make #normalization for #graphdatabases challenging. For example, graph data is semi-structured, schemata are optional, and properties may be missing, to name a few.

My PhD student Philipp Skavantzos and I have proposed the first #normalization framework for property graphs. The framework does not require a #schema, but normalizes a given set of graph-tailored #uniqueness #constraints and functional dependencies. We have successfully transferred state-of-the-art normalization from relational to graph databases.
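As a rough illustration of the idea of avoiding redundant property values by creating new nodes, the following Python sketch moves the dependent properties of a functional dependency onto one new node per distinct left-hand-side value. It is not the algorithm from our paper: the `Graph` class, the function `extract_dependency`, the labels and the property names are all hypothetical, and the handling of nodes with missing properties (they are simply left untouched) is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical in-memory property graph, for illustration only."""
    nodes: dict = field(default_factory=dict)  # node id -> {"label", "props"}
    edges: list = field(default_factory=list)  # (source id, edge label, target id)


def extract_dependency(graph, label, lhs, rhs, new_label, edge_label):
    """Store rhs values once per distinct lhs value, on newly created nodes."""
    created = {}  # lhs value tuple -> id of the new node
    for node_id, node in list(graph.nodes.items()):
        props = node["props"]
        if node["label"] != label or not all(p in props for p in lhs + rhs):
            continue  # assumption: nodes with missing properties stay as they are
        key = tuple(props[p] for p in lhs)
        values = {p: props[p] for p in lhs + rhs}
        if key not in created:
            new_id = f"{new_label}:{len(created)}"
            graph.nodes[new_id] = {"label": new_label, "props": values}
            created[key] = new_id
        elif graph.nodes[created[key]]["props"] != values:
            raise ValueError(f"dependency {lhs} -> {rhs} is violated at {node_id}")
        for p in rhs:
            del props[p]  # the value now lives only on the new node
        graph.edges.append((node_id, edge_label, created[key]))
    return created


g = Graph()
g.nodes["e1"] = {"label": "Employee",
                 "props": {"name": "Ann", "dept": "Sales", "manager": "Kim"}}
g.nodes["e2"] = {"label": "Employee",
                 "props": {"name": "Bob", "dept": "Sales", "manager": "Kim"}}
g.nodes["e3"] = {"label": "Employee", "props": {"name": "Cy"}}  # properties missing

created = extract_dependency(
    g, "Employee", ["dept"], ["manager"], "Department", "WORKS_IN"
)
```

After the call, the manager of the Sales department is stored on a single Department node, so changing it requires one update instead of one per employee. The sketch keeps the left-hand-side property on the original nodes as well; whether that is desirable depends on the other constraints in play.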

Our proposal has been accepted by the 49th International Conference on Very Large Data Bases (VLDB 2023), to be held in Vancouver in September. Please see the presentation below to catch a glimpse of how restructuring graphs can eliminate redundant property values, and how updates and some query operations, such as aggregation, speed up in proportion to the redundancy removed.


More articles by Sebastian Link, DataProf
