Normalization - The Lost Database Rules
I don't know what is happening in the database world today, but I often see application schemas that do not follow normalization rules. And I'm speaking about OLTP databases.
Database normalization is the process of designing or modeling database tables to reduce redundant information, minimize storage utilization, and help with transaction processing. The objective of normalization is to free databases of data-modification anomalies.
There are three basic rules for database normalization, called normal forms. These rules have been part of relational database theory since Edgar Codd introduced the relational model back in 1970.
The First Normal Form (1NF) is about data atomicity: every column must hold a single value, and there must be no repeating groups. A non-complying example is a table named ITEM with columns [ITEM_ID(PK)][COLOR][PRICE]. If an item comes in several colors, either COLOR must hold a list of values or the row must be repeated with the same PRICE duplicated each time; either way, 1NF is violated.
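To make the decomposition concrete, here is a minimal sketch using Python's built-in `sqlite3`; the table and column names follow the ITEM example above, but the sample data (item 1 in red, blue, and green at 9.99) is invented for illustration. The multi-valued COLOR column is moved into a separate ITEM_COLOR table with one atomic value per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Non-compliant: COLOR holds a comma-separated list, so it is not atomic.
cur.execute("CREATE TABLE ITEM_BAD (ITEM_ID INTEGER PRIMARY KEY, COLOR TEXT, PRICE REAL)")
cur.execute("INSERT INTO ITEM_BAD VALUES (1, 'red,blue,green', 9.99)")

# 1NF-compliant: PRICE stays keyed by ITEM_ID alone; each color gets its
# own row in ITEM_COLOR, keyed by (ITEM_ID, COLOR).
cur.execute("CREATE TABLE ITEM (ITEM_ID INTEGER PRIMARY KEY, PRICE REAL)")
cur.execute("CREATE TABLE ITEM_COLOR (ITEM_ID INTEGER, COLOR TEXT, PRIMARY KEY (ITEM_ID, COLOR))")
cur.execute("INSERT INTO ITEM VALUES (1, 9.99)")
cur.executemany("INSERT INTO ITEM_COLOR VALUES (?, ?)",
                [(1, "red"), (1, "blue"), (1, "green")])

# Each color is now individually queryable, with no string parsing.
colors = [row[0] for row in
          cur.execute("SELECT COLOR FROM ITEM_COLOR WHERE ITEM_ID = 1 ORDER BY COLOR")]
print(colors)
```

Note that the price is stored exactly once, so changing it can never leave two rows disagreeing about what the item costs.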
The Second Normal Form (2NF) is about data dependency. In addition to complying with 1NF, every non-key column in the table must depend on the whole primary key. A non-complying example is a table named ITEM_MANUFACTURE with columns [COUNTRY_ID(PK)][ITEM_ID(PK)][COUNTRY]. COUNTRY depends on COUNTRY_ID but not on ITEM_ID, so it is only partially dependent on the primary key. This condition violates 2NF.
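A sketch of the 2NF fix, again using `sqlite3` with the ITEM_MANUFACTURE example above; the sample data (country 10 = 'Mexico', items 1 and 2) is hypothetical. COUNTRY moves to its own table keyed by COUNTRY_ID alone, so no non-key column depends on just part of the composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# COUNTRY depends only on COUNTRY_ID, so it gets its own table.
cur.execute("CREATE TABLE COUNTRY (COUNTRY_ID INTEGER PRIMARY KEY, COUNTRY TEXT)")
# The linking table keeps only the composite key itself.
cur.execute("""CREATE TABLE ITEM_MANUFACTURE (
                   COUNTRY_ID INTEGER REFERENCES COUNTRY(COUNTRY_ID),
                   ITEM_ID INTEGER,
                   PRIMARY KEY (COUNTRY_ID, ITEM_ID))""")

cur.execute("INSERT INTO COUNTRY VALUES (10, 'Mexico')")
cur.executemany("INSERT INTO ITEM_MANUFACTURE VALUES (?, ?)", [(10, 1), (10, 2)])

# The country name is stored once, however many items are made there.
row = cur.execute("""SELECT C.COUNTRY, COUNT(*)
                     FROM ITEM_MANUFACTURE M
                     JOIN COUNTRY C ON C.COUNTRY_ID = M.COUNTRY_ID
                     GROUP BY C.COUNTRY""").fetchone()
print(row)
```

In the original design, renaming a country would mean updating one row per item; after the split it is a single-row update in COUNTRY.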
The Third Normal Form (3NF) is about data transitivity. In addition to complying with 2NF, non-key columns must not depend on other non-key columns. A non-complying example is a table named SHIPMENT with columns [SHIP_ID(PK)][WH_ID][WH_DESC]. In this example, I can get WH_DESC from WH_ID, which is not part of the PK; that transitive dependency violates 3NF.
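And a sketch of the 3NF fix for the SHIPMENT example, with invented sample data (warehouse 7, shipments 100 and 101). WH_DESC depends on WH_ID rather than on SHIP_ID, so it moves to a WAREHOUSE table and SHIPMENT keeps only the WH_ID foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# WH_DESC belongs with its determinant, WH_ID.
cur.execute("CREATE TABLE WAREHOUSE (WH_ID INTEGER PRIMARY KEY, WH_DESC TEXT)")
cur.execute("""CREATE TABLE SHIPMENT (
                   SHIP_ID INTEGER PRIMARY KEY,
                   WH_ID INTEGER REFERENCES WAREHOUSE(WH_ID))""")

cur.execute("INSERT INTO WAREHOUSE VALUES (7, 'East Coast DC')")
cur.executemany("INSERT INTO SHIPMENT VALUES (?, ?)", [(100, 7), (101, 7)])

# Renaming the warehouse is now one update, not one per shipment row.
cur.execute("UPDATE WAREHOUSE SET WH_DESC = 'Newark DC' WHERE WH_ID = 7")
descs = {row[0] for row in
         cur.execute("""SELECT W.WH_DESC
                        FROM SHIPMENT S
                        JOIN WAREHOUSE W ON W.WH_ID = S.WH_ID""")}
print(descs)
```

With WH_DESC embedded in SHIPMENT, that same rename would have to touch every shipment row, and missing one would leave the description inconsistent.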
Perhaps these concepts are not being taught well, students are not paying attention in class, consultants without a Computer Science background are designing databases unaware of these concepts, or laziness and project pressure to release code are to blame. Whatever the reason, these rules are no longer being followed.
The thing is, every day we keep getting bad OLTP applications that translate into bad performance and wasted resources, and this could have been prevented if someone had taken the time to verify that the database conforms to these three basic rules. Packaged applications and custom applications alike are suffering from the same disease.
Anyway, there are more normalization rules, but I'd be more than happy if we started with the first three.
Yes; whoever is designing new modules or making modifications to packaged applications is treating the database as a mere data repository; developers are thinking more about application design than proper data modeling. On a related note, neither Functional Specifications nor Technical Specs include data behavior from a business perspective anymore, so maybe that is the origin of the lack of modeling. Nonetheless, it is ugly when they don't follow the rules.
Hi Adrian, I have had a few interviews for DB architect and modeler roles, and only one interviewer out of five asked me anything about normalization. While 3NF has been the most widely used schema for data warehouses, the advent of Star, Snowflake, and Starflake schemas to accommodate business-intelligence software has become a requirement. What some do not understand is that during ETL, building the FACT tables becomes time-consuming and cumbersome due to the amount of data consumed by businesses; thus near-real-time updates are almost non-existent. There are solutions, such as the hybrid solution I presented and implemented for one of my clients, that worked very well.