Normalization Of Knowledge

Normalization is NOT an engineering exercise. The central principle underlying its rules is that data redundancy, whether repeating groups of attributes or repeating attribute values, is evidence that more than one function's descriptive pattern exists within a data set.

The result of normalization is a model in which each attribute, each entity, and each relationship speaks business truth about the things the data is evidence of.

In Logical Entity Relationship (ER) Modeling as Peter Chen defined it, normalization isn't the elimination of data redundancy in a database; there is no database yet. The goal is to semantically identify and describe the exact things that data describes, and those things' interactive context. If that is done, there will be no data solution redundancy.

Normalization, according to the semantics of ER and Knowledge Modeling, begins by recognizing that the abstract realm of model subjects is that of human business action. The subjects are things of doing, whether performers of action, receivers of action, controllers of action, the actions themselves, places at which action occurs, or things exchanged or utilized in action.

Attributes, The Evidence of Action

The things of action expressed in modeling by entities should be recognized as functions, defined by the actions they perform. In a model, these actions are graphically illustrated by an entity's relationships. It's the actions expressed by an entity's relationships that are the basis of its taxonomy. What it does defines what it is.

The attributes or properties of an entity are the evidence of the action performed by its function. Attributes reflect when the action happened, what the action's status is, how many of something were involved in the action, what the controls of the action are, what the description of the actor that acted is, and so on.

When data pattern redundancy exists, whether repeating groups of attributes or repeating attribute values, the redundancy reflects the fact that a function described by those attributes is repeated within the dataset.

Normalization doesn't just impact the evidence of business action; it reveals the function (entity) that the evidence describes, as well as the actions performed (relationships) that the data is evidence of. This is why normalization isn't "data" normalization; it is function normalization.

Normalization & Action Semantics Reveal Function

Figure 1- 1st Normal Form - Eliminate Repeating Groups


Consider the simple First Normal Form example of Figure 1 above. By eliminating the repeating group of skill-related columns in Employee on the left, we create the Employee Skills entity on the right. Normalization identified a new entity, but it's the predicate semantics of the newly created relationship that reveal the function (the action) of the new entity.
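The mechanics of that 1NF step can be sketched in code. This is a minimal illustration, not the author's model: the column names (skill_1 through skill_3) and the sample rows are hypothetical stand-ins for the repeating skill group shown in Figure 1.

```python
# Hypothetical denormalized Employee rows with a repeating group of
# skill columns, standing in for the left side of Figure 1.
flat_employees = [
    {"employee_id": 101, "name": "Ada",
     "skill_1": "Welding", "skill_2": "Rigging", "skill_3": None},
    {"employee_id": 102, "name": "Grace",
     "skill_1": "Welding", "skill_2": None, "skill_3": None},
]

employees = []        # the Employee entity
employee_skills = []  # the new Employee Skill entity revealed by 1NF

for row in flat_employees:
    employees.append({"employee_id": row["employee_id"], "name": row["name"]})
    for col in ("skill_1", "skill_2", "skill_3"):
        if row[col] is not None:
            # Each practiced skill becomes one row in Employee Skill,
            # keyed back to its parent Employee.
            employee_skills.append({"employee_id": row["employee_id"],
                                    "skill": row[col]})
```

The code eliminates the repeating group; it is the relationship's predicate, not the mechanics, that supplies the business meaning of the new entity.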

In our example, the action of Employee Skill is how it supports the parent function of Employee.

The knowledge representation of the semantics in Figure 1 is expressed in Figure 2 below, unadulterated by constructs demanded by RDBMS design.

Figure 2 - Employee Skill Knowledge Model Example


A significant knowledge miss in most knowledge models is the absence of reciprocating fact statements: actions performed in response to the action of the other relationship direction. In our simple example, because an Employee practices an Employee Skill in fulfillment of their duties, that Employee Skill gives value to the labor of the Employee.
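The reciprocating pair can be written out as two subject-predicate-object statements. The wording follows the article's example; the tuple representation is only an illustrative sketch, not a proposed notation.

```python
# Forward fact statement and its reciprocating statement, as
# subject-predicate-object tuples following the article's wording.
forward = ("Employee",
           "practices, in fulfillment of their duties,",
           "Employee Skill")
reciprocal = ("Employee Skill",
              "gives value to the labor of",
              "Employee")

# The reciprocal statement swaps subject and object: it records the
# action performed in response to the forward relationship's action.
assert forward[0] == reciprocal[2]
assert forward[2] == reciprocal[0]
```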

Expressing relationship predicates according to an action performed by the entity's function is key to developing the knowledge necessary to write quality entity definitions. The semantics of a function's actions, represented by an entity's relationships, are the true basis of its definition. For instance: An Employee Skill is a specialized proficiency that an Employee is qualified to practice in fulfillment of their duties, which gives value to their labor. An entity's actions define its function, which means what it does defines what it is.

For those who argue against complex predicate phrasing in favor of single verbs, consider the information about the unmodeled functions Duty and Labor represented in Figure 2’s predicate phrases. Think of it as “conceptualization” that may indicate a need for expanded knowledge scope.

Refined Understanding Through Action Semantics

Often a relationship’s action semantics provides necessary insight to correctly name an entity according to its actual function, rather than initial perceptions of it.

Figure 3 - Transitive Dependent Data Example


Transitive data dependency often obscures understanding of business function. It exists when one or more non-key attributes depend on another non-key attribute, rather than directly on the key, within the same data set.

In Figure 3, attributes with repeating values centering on the Policy Transaction Effective Date attribute are highlighted by the upper callout.

Their values repeat because premium charges for one or more Coverages, highlighted by the lower callout, are transactionally grouped by policy change transactions, such as a new Policy Term, a Policy Endorsement (contract amendment), or a Policy Cancelation.
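That grouping can be sketched with invented data. The policy rows and column names below are hypothetical (the real attributes are in Figure 3); the point is that the transaction-level attributes repeat on every coverage row because they depend on the effective date, not on the coverage.

```python
# Invented denormalized rows in the spirit of Figure 3: each coverage
# premium row repeats the transaction-level attributes.
flat_rows = [
    {"policy": "P1", "txn_effective_date": "2024-01-01", "txn_type": "New Term",
     "coverage": "Liability", "premium": 600.0},
    {"policy": "P1", "txn_effective_date": "2024-01-01", "txn_type": "New Term",
     "coverage": "Collision", "premium": 400.0},
    {"policy": "P1", "txn_effective_date": "2024-07-01", "txn_type": "Endorsement",
     "coverage": "Liability", "premium": 150.0},
]

# 3NF step: pull the transitively dependent attributes out into their own
# entity (named Policy Period in Figure 4), keyed by policy and date.
policy_periods = {}
coverage_premiums = []
for row in flat_rows:
    key = (row["policy"], row["txn_effective_date"])
    policy_periods[key] = {"policy": row["policy"],
                           "effective_date": row["txn_effective_date"],
                           "txn_type": row["txn_type"]}
    coverage_premiums.append({"period": key,
                              "coverage": row["coverage"],
                              "premium": row["premium"]})
```

Three flat rows collapse into two Policy Period rows plus three coverage premium rows, with no repeated transaction attributes.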

Figure 4 - 3rd Normal Form - Eliminate Transitive Relationships


From a business knowledge standpoint, this denormalization doesn't just hide an entity. It eliminates our ability to semantically express important business knowledge about how Premium is charged during a Policy Term.

In Figure 4, the newly created entity is named Policy Period, rather than Policy Term Transaction or Policy Transaction, because the semantics of its true function indicate it to be a thing of time duration, rather than a pure temporal event.

While the Policy Period is created by a premium-bearing transaction, premiums charged are prorated for the days in the period between the transaction date and the end of the Policy Term.

Further, reversals of unearned premium are based on the proration period between the time of a coverage cancelation, or coverage risk change, and the end of the Policy Term. For this reason the new entity has been named Policy Period, since its real definition is: A Policy Period prorates Policy Coverage_Premium according to a specific definition of Coverage risk, based on the ratio of a Policy Period's duration divided by the duration of its Policy Term.
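That definition implies a simple proration formula. The sketch below assumes day-level granularity and an exclusive end date; the function name, signature, and sample figures are illustrative, not taken from any actual policy system.

```python
from datetime import date

def prorated_premium(term_premium, period_start, term_start, term_end):
    """Prorate a coverage premium by the ratio of the Policy Period's
    duration to the duration of its Policy Term (day granularity)."""
    period_days = (term_end - period_start).days
    term_days = (term_end - term_start).days
    return round(term_premium * period_days / term_days, 2)

# An endorsement effective 2024-07-01, within a term running the full
# (leap) year 2024: 184 of 366 days remain in the term.
charge = prorated_premium(1000.00,
                          period_start=date(2024, 7, 1),
                          term_start=date(2024, 1, 1),
                          term_end=date(2025, 1, 1))
```

A period starting at the term start yields the full premium; later periods yield proportionally smaller charges, which is exactly the behavior the Policy Period entity exists to express.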

Figure 5 - Normalized Knowledge Model


Figure 5 illustrates the same semantics in knowledge model form. It should be noted that the direction of knowledge graph relationship notation is quite different from that of an ER model. The direction of a relationship in an ER model is based on functional dependence.

In a Knowledge Model however, the notation direction indicates the direction of the action.

Any knowledge of functional dependence must be conveyed in the relationship predicate; single verb phrases fail to provide this. For instance, in our example: "Policy Period prorates premium according to its risk duration, relative to the duration of a Policy Term". The implication in the predicate is that the function of the Policy Period is dependent on the Policy Term, which is a commitment of protection for a set duration of time.
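The point can be made mechanically: a complete predicate names the function the action depends on, where a bare verb cannot. The helper below is a hypothetical illustration of that contrast, not a proposed modeling tool.

```python
def predicate_conveys_dependence(predicate, depended_on_function):
    # A complete predicate conveys functional dependence by naming the
    # function the action depends on; a bare verb carries no such knowledge.
    return depended_on_function.lower() in predicate.lower()

complete = ("Policy Period prorates premium according to its risk duration, "
            "relative to the duration of a Policy Term")
single_verb = "prorates"
```

Checking both predicates for the dependency shows the complete phrase carries the knowledge ("Policy Term" appears in it) while the single verb does not.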

Conclusion

Because most things that are subjects of ER and Knowledge Modeling are things of business action, it is the action they perform that defines them. And the semantics of their actions is the visualization tool used to eliminate obfuscation, generalization, incongruity, and gaps in our knowledge of them.

Normalization is key to establishing correct taxonomy of business function.

Most consider Normalization to be normalization of data, but only because data isn't recognized as evidence of action. Giving normalized data the context of business action semantics extends and perfects business knowledge. So, Normalization is truly the normalization of business knowledge.

