Logs are one of the mechanisms used for recovering DB from failure. It will have details about each step in the transaction so that in case of any failure, database can be recovered to the previous consistent state or taken to the consistent state after transaction.
In ATM withdrawal, we had four steps:
- Check the Account for balance in his account : T1
- If sufficient amount exists, give the money to user : T2
- Calculate balance = balance – 500 : T3
- Update account to new balance: T4
Now what a log file will have? It will store each detail about the each of the transactions above.
Look at above sample of logs for money withdrawal. It has the records of data value at each stage of transaction along with the status of the transaction. When the transaction fails at T3, it even knows at which stage of T3, it failed. It also knows the values of the balance and the money that user has at every stage. Hence it is easy for the system to decide, whether to complete the transaction or to rollback the transaction.
The log files are stored at different storage device than the DB. It is done so because in case of media failure, the logs are not lost and we can recover the database. Sometimes, even the copy of log files is created and is stored at different locations. This will help if the servers storing the log file itself fails! DB will be recovered from the alternate log files. This type of logging is also known as physical logging.
Let us consider different cases where logging is used to recover the database.
Transaction Abort
Suppose the system fails when it is executing T2 in ATM withdrawal example. What should be done to recover the database? The system checks the log files to see which transactions have been executed and what is the status of DB. It sees that the transaction has failed while giving the money to user, but it has not yet given money to user. Hence it starts reading the log file backward, and reverts each transactions by updating the value to value present in the ‘before’ transaction state. In our example, there were no updates/inserts in T1 and T2. Hence, it simply reads the log files backward.
Suppose the failure was at T4 after updating the balance to 1500, but before committing the transaction. Then, abort operation has to revert the balance to 2000, which is the value of balance before T4 began. Then it should go backward for each transaction and revert the values, if applicable.
Log files cannot be differentiated between the transactions of particular operation. It will have all the transactions of multiple operations. i.e.; ATM withdrawal of money by multiple users at different location at same time would have been entered in the log file. In order to differentiate between individual’s transactions, we need to append the transaction id to beginning of the log file. i.e.; suppose T1 be the transaction ID (TID) of user A and T2 be the TID of user B. Then in the log file, before logging the logs for user A, it should be appended with T1, and with T2 before starting transactions of user B. This will help to differentiate the transactions, and help the system to understand, how many transactions/ which all transactions are affected by failure and needs reverting.
Crash Recovery
Suppose there is a system crash. It has to recover all the transactions till the previous consistent state. How is it can be done? Suppose we have two transactions T1 and T2 as shown below.
The system crashes after updating the value of y in T2. Now how the system does recover? How does it understand which transactions to be reverted? As we saw above it starts checking the log files in reverse order – from end to the beginning. It checks for the status of each transaction to see which were active during the crash. If the transaction is in update mode, then it is an active transaction during the crash. If the status of transaction is commit or aborts in the log file, then it is not an active transaction; but it is complete. Hence avoid considering such transactions. In the above example, T1 has committed the value of x. hence it is not active during the failure. T2 does not have any commit or abort status, and it has update status. Hence only T2 is active during crash, and look for its logs alone to revert the changes. Hence it goes on checking the values of y at each stage in reverse order and updates the value of y to before system crash. This is how it recovers from crash.
Here the commit or abort will let the system know that those transactions are not active. But in order to see which transaction is active, the system has to traverse the whole log file. But the log file can be very huge in very big database systems. In such cases, traversing whole log file is not preferred. We need a mechanism to traverse the log file till some particular point. This can be achieved by introducing checkpoints in the log files. Thus when there is a crash, the system will traverse the log file backwards till the checkpoints, and see which transactions are active. Any transactions before the checkpoints are all either committed or aborted; hence they are complete.
Check above diagram. There are three transactions T1, T2 and T3, one checkpoint, CHKPT. The transaction T1 is committed before CHKPT. Hence it is complete. Transaction T3 is committed after CHKPT, hence it is also complete. Only T2 is in update status and hence only T2 has to be reverted to its previous state when there is a crash.