Table of Contents
Data Integrity
When we are designing a database, there is lot of factors to be concentrated on. We need to make sure that all the required datas are distributed among right tables and there is no duplication/missing data. The space utilised for the appropriately for the database. Time taken for each query is minimal and so on.
Imagine we have a STUDENT table with Student details and the subjects that he has opted for. If we observe the table below, Joseph has opted for two subjects – Mathematics and Physics. That is fine. But what is wrong in below table? His address is repeated each time, which is not necessary and waste of space. This is called redundancy and is not allowed in a database.
Similarly, if we have to insert one more record for Allen, then we have to enter all his details into the above table. But what is the guarantee that all his details are entered correctly? There could be a mistake and hence leading to mismatch in his details. But who will later say which entry is correct? No one! Hence the data in DB is wrong.
Same issue can happen when we update the data. If we update address one of the record, and leave other record for Joseph above, again a data mismatch.
And when we delete a data, say for Chris, who is having only one entry, whole of his information is lost!
Imagine there are two entities – Employee and Department, and they are not properly related by means of foreign key. What would be the result? We can enter as many department as we want to an employee for whom department may not exists at all in Department table! So mapping the tables appropriately is also a very important factor.
Data integrity ensures, all the above mentioned issues are not injected into the database while it is designed. It guarantees that database is perfect and complete.
Types of Integrity Constraints
There are five types of data integrity constraints
Domain Constraint
Here each columns of a table are verified so that correct data is entered into column. For example, numeric data is entered into a NUMBER column and not any character. In a DATE column, correct dates are entered and not any invalid values.
Imagine we have a table where date fields are stored as character and we have to copy this date field into a new table where this column is defined as DATE. What happens here is most of the data from the original table will not be loaded into new table, as there is mismatch in the data stored. i.e.; the original table will have dates in 22 March 2015 format which new table will not accept as date. In the foremost case, if there was domain integrity, original table would not have such dates and would have preserved the integrity of data in the original table itself.
Entity Integrity
This integrity ensures that each record in the table is unique and has primary key which is not NULL. That means, there is no duplicate record or information of data in a table and each records are uniquely identified by non null attribute of the table.
In a STUDENT table, each student should be a different from other and there will not be duplicate records. Also, STUDENT_ID which is a primary key in the table has non-null values for each of the record.
Column Constraint
This constraint ensures that the values entered into a column are correct by means of business rules. Say, there is an age column and its value is negative which is not correct. This constraint refines from entering wrong age. Similar example of such constraint is salary cannot be negative; employee number will be in a given range etc. These are business rules/requirements that specify what kind of values could be entered into each column.
This constraint is different from domain constraint as here it checks for the validity of the data being entered- like correct age is being entered; Correct Employee Id is entered etc. In the domain constraint, it checks, whether correct set of data being is entered – like Date is entered into date column, Number is entered into number column etc.
User-Defined Integrity Constraint
Imagine, while entering a salary of an employee, we need to check if his salary is less than his manager. Though this is similar to column constraint, we cannot direct insist this constraint on the column as the system does not know who his manager is. We need to check for his manager’s salary first, if it is more than his employer, then we will insert the data. For this we manually need to write code. This kind of constraints is called User-Defined Integrity Constraint.
Referential Integrity
As we discussed for Employee and Department tables, if they are not mapped correctly, there would be a data mismatch. It will allow us to enter a department for an employee which does not exist. It will allow us to delete a department for which employees are working. What would be the result? Employees without any department are not correct. Or updating any department number in the employee table will result in a department which does not exist at all. All these cases will lead to mismatch and invalid data in the database.
Hence to ensure above all cases are met, proper relationship has to be defined between the related tables by means of primary and foreign keys. i.e.; every foreign key in the table should be a primary key in the related table.
In our example, DEPARTMENT_ID should be a primary key in Department table and it should be a foreign key in the Employee table. This will stop entering/updating a department which does not exists. It will not allow us to delete any department from the Department table for which employees still exists in Employee table.