Normalization in SQL

Consider the library database that we are going to construct. We have identified the tables of the database. Even though we have created all the tables, let us see here how we got the approach of creating so many tables using normalization.

The first and foremost table of the library data base is BOOKS. When we say books, let us list all the items that are related to it and that needs to present in the table for the normal functionality of library.

Above list has now most of the details about the data from category, publisher to its number of copies in the library. Let us see few data also for this table BOOKS for above list of attributes.

Table of Contents

First Normal Form

According to 1NF, each record in a table should be identified by a primary key column and there should not be any repeating group of attributes.

When we observe the records in the initial BOOKS table above :

There are no repeating groups of attributes in this table

It is little difficult to identify each record using the primary key – ISBN. Still each record can be identified by considering both ISBN and BOOK_COPY_NUM as primary key. Hence above table is in 1NF.

Second Normal Form

According 2NF,

it’s in first normal form

Every non-key attributes are identified by the use of primary key

All subset of data, which applies to have multiple rows in a table must be removed and placed in a new table. And this new table and the parent table should be related by the use of foreign key.

Here BOOKS is in 1NF. Thus first condition is satisfied. As per second condition, primary key does not uniquely identify other non-key attributes. Hence it does not satisfy second condition. It does have multiple rows of same record too. Hence we need to modify the table so that it is in 2NF. What should be done to bring this table to 2NF? First task is to get rid of duplicate records.

Which attribute is making it as repeating records in the table? It is the BOOK_COPY_NUM attribute in the table, which makes records as redundant. Hence let us maintain a separate table – BOOK_COPY to have book’s copy details and ISBN as foreign key in this table. But it should have primary key too. Primary key of new table BOOK_COPY should be both ISBN and BOOK_COPY_NUM, since ISBN alone cannot be uniquely used to identify the copies. Thus above table breaks into two tables as below:

Now the records in the table will also split into tables.

There are no repeating records in the BOOK_COPY tables. It has primary key as well as it is mapped using foreign key too. Hence new table is in 2NF.

In the BOOKS table ISBN alone is enough to identify the books. There is no redundant data too. Hence BOOKS is also in 2NF.

Third Normal Form

According to 3NF,

it should meet all the requirements of both 1NF and 2NF

If there are any columns which are not related to primary key, then remove them and put it in a separate table, relate both the table by means of foreign key i.e.; there should not be any transitive dependency.

Now our both the tables are in 1NF and 2NF. Hence first condition is satisfied. But second condition is not satisfied as there are attributes category name and publisher name, which are no way related to the primary key. They are related to book, but there are their respective IDs to identify them. Hence no need to specify extra name columns in the same table. As the number of books increases from the same category or publisher, we will be entering their names each time. This will create a redundant data in BOOKS table. Hence let us remove these names from BOOKS and put it in separate table, CATEGORY and PUBLISHER respectively. Both these new tables are mapped to BOOKS table by using their primary keys (CATEGORY_ID AND PUBLISHER_ID) respectively. If we want we can add additional columns to the new table or we can leave them as it is.

Now the records in the table will also divide as below :

Now these tables are in 1NF, 2NF and 3NF.

Boyce-Codd Normal Form (3.5NF or BCNF)

This rule states that it should:

Meet all the requirement of 3NF

Any table is said to be in BCNF, if its candidate keys do not have any partial dependency on the other attributes. i.e.; in any table with (x, y, z) columns, if (x, y)->z and z->x then it’s a violation of BCNF. If (x, y) are composite keys and (x, y)->z, then there should not be any reverse dependency, directly or partially.

It satisfies the first condition of BCNF. Second condition talks about partial dependency of attributes. But we do not have partial dependency of attributes in any of the tables too. Hence these tables are in BCNF too.

Imagine we have author in the BOOKS table above. Table structure and data would be as shown below.

Here, consider (BOOK_NAME, CATEGORY_ID, AUTHOR). Then
(BOOK_NAME, CATEGORY_ID) → AUTHOR == > (x, y) → z
AUTHOR → BOOK_NAME = => z → X
This is a violation of BCNF. Hence we need a separate table to maintain AUTHOR details. But Author in turn will have his id, address, phone etc. Hence a table – AUTHOR should be created to maintain author details, and another table – BOOK_AUTHOR should be created to maintain the mapping between books and author.

We can see the same relationship between the Books and category ID. Hence a separate table is created to store the mapping between books and category. Hence tables are now mapped as below :

Thus all these tables are in BCNF now.

Fourth Normal Form (4NF)

According to fourth normal form,

It should meet all the requirement of 3NF

Attribute of one or more rows in the table should not result in more than one rows of the same table leading to multi-valued dependencies

Here all the tables meet first condition that it is in 3NF. Second condition says any attribute of one row should not give result of another row. In our tables above, we have all distinct values. Hence it is in 4NF.

Fifth Normal Form (5NF)

A database is said to be in 5NF, if and only if,

It’s in 4NF

If we can decompose table further to eliminate redundancy and anomaly, and when we re-join the decomposed tables by means of candidate keys, we should not be losing the original data or any new record set should not arise. In simple words, joining two or more decomposed table should not lose records nor create new records.

Our tables here are now in 4NF as well as there is no more redundant data in the tables. Hence it is 5NF too.

All these normalization is done with BOOKS table and its related attributes. But there is another entity in library – BORROWER and their book loans. These tables are also linked with BOOKS and its mapped tables to get the complete normalized tables. Final tables with all these mappings and normalization will look like below.

In a DB, it is not necessary to meet all the normalization. If the DB meets 3NF, then it is more than sufficient. As we normalize the tables further, it leads more complexity of the query and degrades the efficiency of the system.

In this library system, we might have created a separate mapping table for books and publisher, like we did for category. But it is not necessary since the probability of getting redundant data or partial dependency on books and publisher is comparatively NILL.