Duplicate Elimination in DBMS

SQL does not eliminate duplicates implicitly. It allows to enter duplicate values on columns other than candidate key or if did not specified any keys. If the user wants to eliminate duplicate records, he has to use DISTINCT keyword in the query.

SELECT DISTINCT * FROM DEPT;retrieves unique records from the DEPT table and shows only one record for multiple occurrence of the records.

When duplicate records present in the table, it takes more effort to find such records. If hashing or sorting techniques are used, duplicate records are placed at one block. We can now delete all the duplicates except one. Or fetch all the duplicate records into output buffer, if required. In some cases, user request itself is to eliminate the duplicate values. Then the underlying algorithm applies hashing or sorting to get all duplicate records one after the other and deletes all the duplicate records except one.

Suppose there is duplicate values entered in department table. And a query with DISTINCT is fired to retrieve all unique records from this table. The query uses the sorting method to delete the duplicates and display the result in the window as below. Please note that DISTINCT keyword do not really delete the duplicate records from the table, only the user results are displayed without duplicates. It can even use hashing technique if it gives better cost.

 

However the cost for deleting the duplicate will depend on the technique used to group the duplicate values and the cost of identifying and deleting those records. This is a very costly cost. Hence SQL does not eliminate duplicates implicitly and provided with keywords to use, whenever necessary.

 

Translate »