Normalization Of Knowledge

Normalization is NOT an engineering exercise. The central principle underlying its rules is that data redundancy, whether repeating groups of attributes or repeating attribute values, is evidence that more than one function's descriptive pattern exists within a data set.

The result of normalization is a model in which each attribute, each entity, and each relationship speaks business truth about the things the data is evidence of.

In Logical Entity Relationship (ER) Modeling as Peter Chen defined it, normalization isn't the elimination of data redundancy in a database; there is no database yet. The goal is to semantically identify and describe the exact things that data describes, and those things' interactive context. If that is done, there will be no data solution redundancy.

Normalization, according to the semantics of ER and Knowledge Modeling, begins by recognizing that the abstract realm of model subjects is that of human business action. The subjects are things of doing, whether performers of action, receivers of action, controllers of action, the actions themselves, places at which action occurs, or things exchanged or utilized in action.

Attributes, The Evidence of Action

The things of action expressed in modeling by entities should be recognized as functions, defined by the actions they perform. In a model, these actions are graphically illustrated by an entity's relationships. It's the actions expressed by an entity's relationships that are the basis of its taxonomy. What it does defines what it is.

The attributes or properties of an entity are the evidence of the action performed by its function. Attributes reflect when the action happened, what the action's status is, how many of something were involved in the action, what the controls of the action are, what the description of the actor that acted is, and so on.

When data pattern redundancy exists, whether repeating groups of attributes or repeating attribute values, the redundancy reflects the fact that a function described by those attributes is repeated within the dataset.

Normalization doesn't just impact the evidence of business action; it reveals the function (entity) that the evidence describes, as well as the actions performed (relationships) that the data is evidence of. This is why normalization isn't "data" normalization; it is function normalization.

Normalization & Action Semantics Reveal Function

Figure 1- 1st Normal Form - Eliminate Repeating Groups


Consider the simple First Normal Form example of Figure 1 above. By eliminating the repeating group of skill-related columns in Employee on the left, we create the Employee Skills entity on the right. Normalization identified a new entity, but it's the predicate semantics of the newly created relationship that reveal the function (the action) of the new entity.
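The mechanics of that 1NF step can be sketched in code. This is a minimal illustration, not the author's model: the column names (skill_1 through skill_3) and the sample rows are hypothetical stand-ins for the repeating skill group shown in Figure 1.

```python
# Hypothetical denormalized Employee rows with a repeating group of
# skill columns, standing in for the left side of Figure 1.
flat_employees = [
    {"employee_id": 101, "name": "Ada",
     "skill_1": "Welding", "skill_2": "Rigging", "skill_3": None},
    {"employee_id": 102, "name": "Grace",
     "skill_1": "Welding", "skill_2": None, "skill_3": None},
]

employees = []        # the Employee entity
employee_skills = []  # the new Employee Skill entity revealed by 1NF

for row in flat_employees:
    employees.append({"employee_id": row["employee_id"], "name": row["name"]})
    for col in ("skill_1", "skill_2", "skill_3"):
        if row[col] is not None:
            # Each practiced skill becomes one row in Employee Skill,
            # keyed back to its parent Employee.
            employee_skills.append({"employee_id": row["employee_id"],
                                    "skill": row[col]})
```

The code eliminates the repeating group; it is the relationship's predicate, not the mechanics, that supplies the business meaning of the new entity.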

In our example, the action of Employee Skill is how it supports the parent function of Employee.

The knowledge representation of the semantics in Figure 1 is expressed in Figure 2 below, unadulterated by constructs demanded by RDBMS design.

Figure 2 - Employee Skill Knowledge Model Example


A significant knowledge miss in most knowledge models is the absence of reciprocating fact statements: actions performed in response to the action of the other relationship direction. In our simple example, because an Employee practices an Employee Skill in fulfillment of their duties, that Employee Skill gives value to the labor of the Employee.
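The reciprocating pair can be written out as two subject-predicate-object statements. The wording follows the article's example; the tuple representation is only an illustrative sketch, not a proposed notation.

```python
# Forward fact statement and its reciprocating statement, as
# subject-predicate-object tuples following the article's wording.
forward = ("Employee",
           "practices, in fulfillment of their duties,",
           "Employee Skill")
reciprocal = ("Employee Skill",
              "gives value to the labor of",
              "Employee")

# The reciprocal statement swaps subject and object: it records the
# action performed in response to the forward relationship's action.
assert forward[0] == reciprocal[2]
assert forward[2] == reciprocal[0]
```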

Expressing relationship predicates according to an action performed by the entity's function is key to developing the knowledge necessary to write quality entity definitions. The semantics of a function's actions, represented by an entity's relationships, are the true basis of its definition. For instance: An Employee Skill is a specialized proficiency that an Employee is qualified to practice in fulfillment of their duties, which gives value to their labor. An entity's actions define its function, which means what it does defines what it is.

For those who argue against complex predicate phrasing in favor of single verbs, consider the information about the unmodeled functions Duty and Labor represented in Figure 2’s predicate phrases. Think of it as “conceptualization” that may indicate a need for expanded knowledge scope.

Refined Understanding Through Action Semantics

Often a relationship’s action semantics provides necessary insight to correctly name an entity according to its actual function, rather than initial perceptions of it.

Figure 3 - Transitive Dependent Data Example


Transitive data dependency often obscures understanding of business function. It exists when one or more non-key attributes depend on another non-key attribute, rather than directly on the key, within the same data set.

In Figure 3, attributes with repeating values centering on the Policy Transaction Effective Date attribute are highlighted by the upper callout.

Their values repeat because premium charges for one or more Coverages, highlighted by the lower callout, are transactionally grouped by policy change transactions, such as a new Policy Term, a Policy Endorsement (contract amendment), or a Policy Cancelation.
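That grouping can be sketched with invented data. The policy rows and column names below are hypothetical (the real attributes are in Figure 3); the point is that the transaction-level attributes repeat on every coverage row because they depend on the effective date, not on the coverage.

```python
# Invented denormalized rows in the spirit of Figure 3: each coverage
# premium row repeats the transaction-level attributes.
flat_rows = [
    {"policy": "P1", "txn_effective_date": "2024-01-01", "txn_type": "New Term",
     "coverage": "Liability", "premium": 600.0},
    {"policy": "P1", "txn_effective_date": "2024-01-01", "txn_type": "New Term",
     "coverage": "Collision", "premium": 400.0},
    {"policy": "P1", "txn_effective_date": "2024-07-01", "txn_type": "Endorsement",
     "coverage": "Liability", "premium": 150.0},
]

# 3NF step: pull the transitively dependent attributes out into their own
# entity (named Policy Period in Figure 4), keyed by policy and date.
policy_periods = {}
coverage_premiums = []
for row in flat_rows:
    key = (row["policy"], row["txn_effective_date"])
    policy_periods[key] = {"policy": row["policy"],
                           "effective_date": row["txn_effective_date"],
                           "txn_type": row["txn_type"]}
    coverage_premiums.append({"period": key,
                              "coverage": row["coverage"],
                              "premium": row["premium"]})
```

Three flat rows collapse into two Policy Period rows plus three coverage premium rows, with no repeated transaction attributes.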

Figure 4 - 3rd Normal Form - Eliminate Transitive Relationships


From a business knowledge standpoint, this denormalization doesn't just hide an entity. It eliminates our ability to semantically express important business knowledge about how Premium is charged during a Policy Term.

In Figure 4, the newly created entity is named Policy Period, rather than Policy Term Transaction or Policy Transaction, because the semantics of its true function indicate it to be a thing of time duration, rather than a pure temporal event.

While the Policy Period is created by a premium-bearing transaction, premiums charged are prorated for the days in the period between the transaction date and the end of the Policy Term.

Further, reversals of unearned premium are based on the proration period between the time of a coverage cancelation, or coverage risk change, and the end of the Policy Term. For this reason the new entity has been named Policy Period, since its real definition is: A Policy Period prorates Policy Coverage_Premium according to a specific definition of Coverage risk, based on the ratio of a Policy Period's duration divided by the duration of its Policy Term.
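That definition implies a simple proration formula. The sketch below assumes day-level granularity and an exclusive end date; the function name, signature, and sample figures are illustrative, not taken from any actual policy system.

```python
from datetime import date

def prorated_premium(term_premium, period_start, term_start, term_end):
    """Prorate a coverage premium by the ratio of the Policy Period's
    duration to the duration of its Policy Term (day granularity)."""
    period_days = (term_end - period_start).days
    term_days = (term_end - term_start).days
    return round(term_premium * period_days / term_days, 2)

# An endorsement effective 2024-07-01, within a term running the full
# (leap) year 2024: 184 of 366 days remain in the term.
charge = prorated_premium(1000.00,
                          period_start=date(2024, 7, 1),
                          term_start=date(2024, 1, 1),
                          term_end=date(2025, 1, 1))
```

A period starting at the term start yields the full premium; later periods yield proportionally smaller charges, which is exactly the behavior the Policy Period entity exists to express.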

Figure 5 - Normalized Knowledge Model


Figure 5 illustrates the same semantics in knowledge model form. It should be noted that the direction of knowledge graph relationship notation is quite different from that of an ER model. The direction of a relationship in an ER model is based on functional dependence.

In a Knowledge Model however, the notation direction indicates the direction of the action.

Any knowledge of functional dependence must be conveyed in the relationship predicate; single verb phrases fail to provide this. For instance, in our example: "Policy Period prorates premium according to its risk duration, relative to the duration of a Policy Term". The implication in the predicate is that the function of the Policy Period is dependent on the Policy Term, which is a commitment of protection for a set duration of time.
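The point can be made mechanically: a complete predicate names the function the action depends on, where a bare verb cannot. The helper below is a hypothetical illustration of that contrast, not a proposed modeling tool.

```python
def predicate_conveys_dependence(predicate, depended_on_function):
    # A complete predicate conveys functional dependence by naming the
    # function the action depends on; a bare verb carries no such knowledge.
    return depended_on_function.lower() in predicate.lower()

complete = ("Policy Period prorates premium according to its risk duration, "
            "relative to the duration of a Policy Term")
single_verb = "prorates"
```

Checking both predicates for the dependency shows the complete phrase carries the knowledge ("Policy Term" appears in it) while the single verb does not.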

Conclusion

Because most things that are subjects of ER and Knowledge Modeling are things of business action, it is the action they perform that defines them. And the semantics of their actions is the visualization tool used to eliminate obfuscation, generalization, incongruity, and gaps in our knowledge of them.

Normalization is key to establishing correct taxonomy of business function.

Most consider Normalization to be normalization of data, but only because data isn't recognized as evidence of action. Giving normalized data the context of business action semantics extends and perfects business knowledge. So, Normalization is truly the normalization of business knowledge.

