Table of Contents
Introduction
Consider our regular example of student table with ID, NAME, ADDRESS, AGE, GENDER and Semester. This table will be loaded with fresh data at the beginning of academic year for new students and updated for the older student with next semester. Most of the time this table will be static and there will be comparatively less insert/ delete/update on it. At the same time there will lot of ad-hoc retrieval with this table i.e.; there would be different types of query fired on this table to get the detail of the students like list of all male/female students, students residing at particular area, or male/female students in particular semester etc. But what would be the size of this student table in a college database? Very huge, right? In this case user query should be very quick enough to get any kind of query result. Will our traditional file organization methods be faster to give us the results? Or can we think of any better method of storing and retrieving the data?
Yes there is better method to select the records from memory for static tables which is accessed multiple times only to see the table content.
Bitmap Indices
Bitmap indices are very powerful in retrieving the records for above cases. In this method, indexes for columns with less unique values are used in the form of bits. Let us try to understand one by one about this method using example.
In the above example, if we notice the column GENDER, it can have only two values – Male or Female. Compared to the whole STUDENT table, they are not unique value. Similarly, say we have only four semesters for the course, and then we can have only four values – sem1, sem2, sem3 and sem4. These types of columns are called less unique value columns or columns with less cardinality. Even though these columns have less frequent values, they are queried most often.
Bits – as everyone knows it is the smallest unit of data representation. It can have either 0 or 1 as value. Now what if we use these bits to represent these less unique value columns? But how? This technique of storing less frequently used columns in the form of bits is called bitmap indices.
This method is used for very large tables with less unique value columns and is accessed number of time with various retrieval queries. In this method we will have
- As many bits as the number of rows in the table for each of less unique value columns. For example, if the STUDENT table has 10K records, then we will have 10K bits – one bit for each row.
- Number of bitmap indices created on the column will be equal to number of distinct values in the column. For example, for GENDER column we will have two bitmap indices created – one for male and one for female, and for semester column we will have four bitmap indices created – 1, 2, 3, and 4.
- If we have any matching value in the column for the row, then that row bit will have ‘1’, else ‘0’. That means, for GENDER column we will have 2 bitmap indices – one for male and one for female. The bit value for the ‘male’ bit map index will be 1, if that row has GENDER as ‘M’, else ‘0’.
Imagine in our example of STUDENT table has only four records and has values as below.
- As per rule 1, we will have four rows in the table and hence we will have bits – one bit for each row.
- The GENDER column has only two values – ‘M’ and ‘F’. Hence we will have two bitmap indices – one for ‘M’ and one for ‘F’.
- Now the bitmap index for GENDER column is as below. Here Bitmap index ‘M’ has value ‘1000’ indicating first row has gender as ‘M’ and rest of the rows do not have gender as ‘M’ Similarly, bitmap index ‘F’ indicates first row is not ‘F’ and rest of all rows are having gender as ‘F’.
Similarly bitmap index for Semester can be as follows:
Suppose that we have to find the female students who are in the second semester. Here this query uses two columns to filter the records, where both of them have less unique value columns.
SELECT * FROM STUDENT WHERE GENDER = ‘F’ AND SEMESTER =2;
The query will search the bitmap index for both of this columns and perform logical ‘AND’ operation on those indexes to get the actual address of the result.
Bitmap Index for F: | 0 1 1 1 | |
AND | ||
Bitmap Index for SEM 2: | 0 0 1 0 | |
Result: | 0 0 1 0 | this implies third row has the result for the query. |
Look at the table to check if it is correct. Yes, it has resulted in correct row. So the DBMS goes to the third row in the file and displays the result for the user.
Here fetching the bitmap index and performing the logical ‘AND’ operation to get the result is comparatively faster. Hence this method of storage is useful for this kind of data.
If we have to delete the record from the table, it will generate a temporary delete index will be generated. Then it will perform logical ‘AND’ operation on filter columns and temporary index to remove the data from the table.
Suppose we have to delete the female student from the 4th semester. Then steps involved in deleting this record is as follows.
SELECT * FROM STUDENT WHERE GENDER = ‘F’ AND SEMESTER =4;
Bitmap Index for F: | 0 1 1 1 | |
AND | ||
Bitmap Index for SEM 2: | 0 0 0 1 | |
AND | ||
1 0 0 0 | temporary delete index | |
Result: | 0 0 0 0 | this implies no rows, hence record is deleted |
Advantages of Bitmap Indices
- As we have seen already, this method helps in faster retrieval of the records when there are less cardinality columns and those columns are most frequently used in the query. This method is efficient even if we have very big table.
- This method is more efficient when the columns have least involved in insert/update/delete operations.
- It allows to combine multiple bitmap indices together to fire the query as we have seen in above examples
Disadvantages of Bitmap Indices
- They are not suitable for small tables. In small tables, DBMS will force to use full table scan instead of using bitmap index.
- When there is multiple insert/update/delete on the table from different users, it may cause deadlock on the tables. It will take time to perform the DML transaction and then to update the bitmap index. Hence when there is multiple DML transaction from different users, it will not be able perform transaction quickly, and causing the deadlock.
- When there is large number of records, there is an overhead to maintain this bitmap indexes. Each time a new record is entered, we have to modify the bitmap index throughout, which is a tedious and time consuming.
Index in SQL
In SQL we can create bitmap index by using below query.
CREATE BITMAP INDEX Index_Name
ON Table_Name (Column_Name);
Our example above for creating bitmap index for Gender and Semester can be written as follows:
CREATE BITMAP INDEX IDX_GENDER
ON STUDENT (GENDER);
CREATE BITMAP INDEX IDX_ SEMESTER
ON STUDENT (SEMESTER);