Introduction
As we have seen already, a database consists of tables, views, indexes, procedures, functions and so on. Tables and views are logical ways of viewing the data, but the actual data is kept in physical storage. A database is a very large store of data, so it resides on physical storage devices, and the data cannot be kept on these devices as it is: it is converted into binary format. Each storage device has many data blocks, each capable of holding a certain amount of data. Records are mapped onto these blocks when the data is stored.
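To make the idea of data blocks a little more concrete, here is a minimal Python sketch of packing records into fixed-size blocks. The block size, record strings and encoding are illustrative assumptions only; real systems use page sizes such as 4 KB or 8 KB and far more elaborate record layouts.

```python
# Minimal sketch: packing records into fixed-size blocks.
# BLOCK_SIZE and the record format are illustrative assumptions.
BLOCK_SIZE = 32  # bytes per block (real pages are e.g. 4 KB or 8 KB)

def pack_into_blocks(records):
    """Convert each record to bytes and group them into blocks."""
    blocks, current = [], b""
    for rec in records:
        data = rec.encode("utf-8")            # records become binary data
        if len(current) + len(data) > BLOCK_SIZE:
            blocks.append(current)            # block is full, start a new one
            current = b""
        current += data
    if current:
        blocks.append(current)
    return blocks

records = ["101,Alice,HR;", "102,Bob,IT;", "103,Carol,Sales;"]
for i, block in enumerate(pack_into_blocks(records)):
    print(f"block {i}: {len(block)} bytes -> {block}")
```

Running this puts the first two records in one block and the third in the next, which is roughly how records end up spread over the blocks of a storage device.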
Any user who wants to view or modify this data simply fires an SQL query and gets the result on the screen, and every such query should return results as fast as possible. But how is the data fetched from physical storage? Does simply dumping the data onto storage devices give us good results when we fire queries? Certainly not. How the data is stored, the access method, the query type and so on greatly affect how quickly results are obtained. Hence organizing the data in the database, and therefore on storage, is an important topic to think about.
Types of File Organization
A database holds a lot of data, grouped into related sets called tables. Each table contains many related records. A user sees these records in the form of tables on the screen, but they are actually stored as files on storage. Usually one file contains all the records of a table.
As we saw above, accessing the contents of these files, the records in physical storage, is not straightforward. The records are not stored there as tables, and our SQL queries cannot operate on them directly; we need access methods. To access these files efficiently, we need to store the records in a certain order so that it is easy to fetch them. This is much like the index of a book or the catalogue of a library, which helps us find the required topic or book.
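As a rough illustration of what such an access method looks like, the Python sketch below builds a simple in-memory index mapping a key to a record's position in a file, much like a book index maps a topic to a page number. The file name, record layout and key field are hypothetical, chosen only for the example.

```python
# Rough sketch: a key -> offset index over a record file (illustrative only).
def build_index(path):
    """Scan the file once and remember where each record starts."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            key = line.split(b",", 1)[0]      # assume the key is the first field
            index[key] = offset
    return index

def fetch(path, index, key):
    """Jump straight to the record instead of scanning the whole file."""
    with open(path, "rb") as f:
        f.seek(index[key])
        return f.readline().decode().strip()

# Usage: write a tiny record file, index it, then look up one record.
with open("employees.txt", "wb") as f:
    f.write(b"101,Alice,HR\n102,Bob,IT\n103,Carol,Sales\n")
idx = build_index("employees.txt")
print(fetch("employees.txt", idx, b"102"))    # -> "102,Bob,IT"
```

The point of the sketch is only that, with some ordering or auxiliary structure, a record can be fetched without reading the whole file.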
Storing the files in a certain order is called file organization. The main objectives of file organization are:
- Optimal selection of records, i.e., records should be accessed as fast as possible.
- Any insert, update or delete transaction on records should be easy and quick, and should not harm other records.
- No duplicate records should be induced as a result of an insert, update or delete.
- Records should be stored efficiently so that the cost of storage is minimal.
There are various methods of file organization. A method that is efficient for certain types of access/selection may turn out to be inefficient for other selections. Hence it is up to the programmer to decide on the best-suited file organization method for the requirement at hand.
Some of the file organizations are:
- Sequential File Organization
- Heap File Organization
- Hash/Direct File Organization
- Indexed Sequential Access Method
- B+ Tree File Organization
- Cluster File Organization
Let us look at each of these one by one.
The differences between Sequential, Heap/Direct, Hash, ISAM, B+ Tree and Cluster file organization in a database management system (DBMS) are summarized below:
| | Sequential | Heap/Direct | Hash | ISAM | B+ Tree | Cluster |
|---|---|---|---|---|---|---|
| Method of storing | Stored in the order they arrive, or sorted as they arrive | Stored at the end of the file; the address in memory is random | Stored at the address generated by the hash function | An address index is maintained along with the records | Stored in a tree-like structure | Frequently joined tables are clubbed into one file based on a cluster key |
| Types | Pile file and sorted file methods | | Static and dynamic hashing | Dense, sparse and multilevel indexing | | Indexed and hash clusters |
| Design | Simple | Simplest | Medium | Complex | Complex | Simple |
| Storage cost | Cheap (magnetic tapes can be used) | Cheap | Medium | Costlier | Costlier | Medium |
| Advantages | Fast and efficient for large volumes of data, report generation, statistical calculations, etc. | Best suited for bulk insertion and for small files/tables | Faster access; no need to sort; handles multiple transactions; suitable for online transactions | Searching records is faster; suitable for large databases; any column can be used as the key column; range and partial-key searches are efficient | Range and partial-key searches are efficient; performance does not degrade on insert/delete/update; grows and shrinks with the data; works well on secondary storage devices, reducing disk I/O; since all data are in the leaf nodes, searching is easy; the leaf nodes form a sorted, sequentially linked list | Best suited for frequently joined tables; suitable for 1:M mappings |
| Disadvantages | Sorting the data on every insert/delete/update takes time and slows the system down | Records are scattered in memory and space is used inefficiently, which increases storage size; proper memory management is needed; not suitable for large tables | Risk of accidental deletion or update of data; memory use is inefficient; searches on a range, a partial key, a non-hash-key column, a single column of a composite hash key, or a frequently updated column used as the hash key are inefficient | Extra cost to maintain the index; file reconstruction is needed on insert/update/delete; does not grow with the data | Not suitable for static tables | Not suitable for large databases; efficient only for the joins on which clustering is done; less frequently used joins and 1:1 mappings are inefficient |
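To contrast the "method of storing" row above, here is a small Python sketch showing how a sequential (sorted) file keeps records in key order, while a hash/direct file places each record in the bucket computed from its key. The bucket count, the modulo hash function and the sample records are assumptions made purely for illustration.

```python
# Illustrative contrast of two storage methods from the table above.
import bisect

# Sequential (sorted) file: records kept in key order.
sequential_file = []
def seq_insert(record):
    bisect.insort(sequential_file, record)       # keep the file sorted on key

# Hash/direct file: the bucket number is computed from the key.
NUM_BUCKETS = 4                                  # assumed bucket count
hash_file = {b: [] for b in range(NUM_BUCKETS)}
def hash_insert(record):
    key = record[0]
    hash_file[key % NUM_BUCKETS].append(record)  # record goes to its hash address

for rec in [(103, "Carol"), (101, "Alice"), (102, "Bob")]:
    seq_insert(rec)
    hash_insert(rec)

print(sequential_file)   # sorted by key: 101, 102, 103
print(hash_file)         # each record sits in bucket key % NUM_BUCKETS
```

The sorted file makes range scans cheap but pays a sorting cost on every insert, while the hash file finds a single key in one step but offers no help for range or partial-key searches, which matches the trade-offs listed in the table.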