Normalization in DBMS

Vignesh Muniyappan

Published May 14, 2023

What is Normalization?

The process by which a bigger dataset is further broken down into well-structured smaller multiple tables which reduces inconsistency, that does not cause any undesired result during any operation like insert, update, and delete is executed.

This ensures that there is no unnecessary duplication of data and the tables with proper relation between them.

Example

Let's say we have a group of superhero cards in the same box.

It is difficult to find only an IronMan card, we have to search the whole box and every card gets it.

If we separate them by which superhero belongs to MARVEL or DC and put them in the different boxes named MARVEL superheroes and DC superheroes. Now it makes it one step easier to find an IronMan card since we can search only in the MARVEL box instead of searching for whole cards.

Then we can break down each box further by sorting the cards by their nature like God or Human or alien.

Why Normalization?

Normalization improves efficiency and accuracy in data operation because the smaller datasets have a proper relation between them and no unnecessary duplicates. It reduces memory wastage and loss of data can be reduced to a large extent.

Terminologies

Tuple: A single row of data in a table or relation.
Atomic value: A value that cannot be further divided or broken down into smaller parts.
Prime attribute: A attribute that is used to uniquely identify a tuple in the relation.
Nonprime attribute: A attribute does not contribute to the uniqueness of a tuple in the relation.
Super key: A set of one or more attributes that can uniquely identify a tuple in a relation.
Candidate Key: A minimal super key, meaning that it is a super key with no nonprime attributes.
Partial dependency: A nonprime attribute depends on only a subset of the candidate key attributes, rather than on the entire set.
Transitively dependency: A nonprime attribute depends on another nonprime attribute, which in turn depends on part of a candidate key.

Types of Normalization

First Normal Form(1NF)
Second Normal Form(2NF)
Third Normal Form(3NF)
Boyce Codd's Normal Form(BCNF)

First Normal Form

For a table to justify 1NF it needs to satisfy:

Each column should contain atomic values.
Each Value stored in a column should be of the same type.
All the columns in a table should have unique names.

Example

The below table has several columns that store information about employees, such as their IDs, names, address, and mobile number. And this table is not normalized since there are columns with multiple values.

No alt text provided for this image — Employee details

After the first normalization, we should ensure that "each column has atomic value". The Emp_Moblie column has multiple values of phone numbers. Now we have to break it into its atomic values as the below table.

Now this table ensures data atomicity.

Second Normal Form

For a table to justify 2NF it needs to satisfy:

It should be in its 1NF.
It should not have any partial dependencies.

Example

The below table stores the list of teacher's detail in a school. And this table is in 1NF form as each column does not have multiple values. But it does not follow 2NF because "A table should not have any partial dependencies". There are teachers in the same age group.

Recommended by LinkedIn

Mongo 3.2 features (Mongo World 2015)

Sreenath Gotur 10 years ago

Record Level Indexing in Apache Hudi Delivers 70%…

Soumil S. 2 years ago

Can GraphQL be an answer to API performance issues?

Priya Khatavkar 1 year ago

The Teachers_Age alone does not decide either the Subject or the Teacher_id and Teachers_Age along with any other column does not decide the other column. So the Teacher_Age is a "nonprime" column.

We break down the table into two different tables as Teacher_Id with Teacher_Age and Teacher_Id with Subject, to eliminate partial dependency.

Third Normal Form (3 NF)

For a table to be in the third normal form: it should satisfy:

It should already be in the 2NF.
It should not have Transitive Dependency.

Example

The below table stores the list of employee's detail of a company.

Here, Emp_State, Emp_City & Emp_District dependent on Emp_Zip. and Emp_zip is dependent on Emp_Id which makes non-prime attributes (Emp_State, Emp_City & Emp_District) transitively dependent on the superKey(Emp_Id). This violates the rule of 3NF. So will break down the table to

Here, Emp_State, Emp_City & Emp_District just dependent on Emp_Zip.

Now these tables follow 3NF.

Boyce-Codd Normal Form (BCNF)

For a table to be in the Boyce-Codd Normal Form, it should satisfy:

It should be in the Third Normal Form.
Every determinant should be a candidate key.

Example

The below table stores the list of employee's detail of a company.

In this table, the column Dept_Type and Dept_No_Of_Emp depends on Emp_Dept and Emp_Nationality depends on Emp_Id. And there are two candidate keys Emp_Id and Emp_Dept. We need to separate the table based on the candidate keys.

This table contains the "candidate key "of both the previous table linked into one table which serves as a relation.

BCNF eliminates all potential for redundancy and data anomalies resulting from functional dependencies.

To view or add a comment, sign in

Normalization in DBMS

Vignesh Muniyappan

What is Normalization?

Why Normalization?

Terminologies

Types of Normalization

First Normal Form

Second Normal Form

Recommended by LinkedIn

Third Normal Form (3 NF)

Boyce-Codd Normal Form (BCNF)

More articles by Vignesh Muniyappan

Others also viewed

Reliability with Apache Iceberg

From Crash to Recovery: The Power of the ARIES Algorithm

The Conundrum of Null Values in RDBMS

Difference between partitioning and bucketing in spark?

Understanding Data Locality in Apache Spark

SIGMOD Paper Interpretation | Including Bloom Filters in Bottom-up Optimization

Handling Data Skewness in Apache Spark with the help of AQE

The case for an irrational database

Big data implementations without additional dedicated storage and computation power

Explore content categories

What is Normalization?

Why Normalization?

Terminologies

Types of Normalization

First Normal Form

Second Normal Form

Recommended by LinkedIn

Third Normal Form (3 NF)

Boyce-Codd Normal Form (BCNF)

More articles by Vignesh Muniyappan

Real-Time Web Communication Simplified with WebSockets and STOMP

Spring Boot - P5

Angular - P1

Spring boot P-4

SpringBoot - P3

SpringBoots - P2

SpringBoots - P1

Manual Testing vs Automation Testing

Introduction to Servlet and JSP Development

OBJECT ORIENTED PROGRAMING IN JAVA

Others also viewed

Reliability with Apache Iceberg

From Crash to Recovery: The Power of the ARIES Algorithm

The Conundrum of Null Values in RDBMS

Difference between partitioning and bucketing in spark?

Understanding Data Locality in Apache Spark

SIGMOD Paper Interpretation | Including Bloom Filters in Bottom-up Optimization

Handling Data Skewness in Apache Spark with the help of AQE

The case for an irrational database

Big data implementations without additional dedicated storage and computation power

Explore content categories