Normalization in DBMS
What is Normalization?
The process by which a bigger dataset is further broken down into well-structured smaller multiple tables which reduces inconsistency, that does not cause any undesired result during any operation like insert, update, and delete is executed.
This ensures that there is no unnecessary duplication of data and the tables with proper relation between them.
Example
Let's say we have a group of superhero cards in the same box.
It is difficult to find only an IronMan card, we have to search the whole box and every card gets it.
If we separate them by which superhero belongs to MARVEL or DC and put them in the different boxes named MARVEL superheroes and DC superheroes. Now it makes it one step easier to find an IronMan card since we can search only in the MARVEL box instead of searching for whole cards.
Then we can break down each box further by sorting the cards by their nature like God or Human or alien.
Why Normalization?
Normalization improves efficiency and accuracy in data operation because the smaller datasets have a proper relation between them and no unnecessary duplicates. It reduces memory wastage and loss of data can be reduced to a large extent.
Terminologies
Types of Normalization
First Normal Form
For a table to justify 1NF it needs to satisfy:
Example
The below table has several columns that store information about employees, such as their IDs, names, address, and mobile number. And this table is not normalized since there are columns with multiple values.
After the first normalization, we should ensure that "each column has atomic value". The Emp_Moblie column has multiple values of phone numbers. Now we have to break it into its atomic values as the below table.
Now this table ensures data atomicity.
Second Normal Form
For a table to justify 2NF it needs to satisfy:
Example
The below table stores the list of teacher's detail in a school. And this table is in 1NF form as each column does not have multiple values. But it does not follow 2NF because "A table should not have any partial dependencies". There are teachers in the same age group.
Recommended by LinkedIn
The Teachers_Age alone does not decide either the Subject or the Teacher_id and Teachers_Age along with any other column does not decide the other column. So the Teacher_Age is a "nonprime" column.
We break down the table into two different tables as Teacher_Id with Teacher_Age and Teacher_Id with Subject, to eliminate partial dependency.
Third Normal Form (3 NF)
For a table to be in the third normal form: it should satisfy:
Example
The below table stores the list of employee's detail of a company.
Here, Emp_State, Emp_City & Emp_District dependent on Emp_Zip. and Emp_zip is dependent on Emp_Id which makes non-prime attributes (Emp_State, Emp_City & Emp_District) transitively dependent on the superKey(Emp_Id). This violates the rule of 3NF. So will break down the table to
Here, Emp_State, Emp_City & Emp_District just dependent on Emp_Zip.
Now these tables follow 3NF.
Boyce-Codd Normal Form (BCNF)
For a table to be in the Boyce-Codd Normal Form, it should satisfy:
Example
The below table stores the list of employee's detail of a company.
In this table, the column Dept_Type and Dept_No_Of_Emp depends on Emp_Dept and Emp_Nationality depends on Emp_Id. And there are two candidate keys Emp_Id and Emp_Dept. We need to separate the table based on the candidate keys.
This table contains the "candidate key "of both the previous table linked into one table which serves as a relation.
BCNF eliminates all potential for redundancy and data anomalies resulting from functional dependencies.