A database is used to store and manipulate data for an organization or for a particular requirement. It contains the data and its related structures. In a typical scenario, a DB has one server and more than one user. Hence, when a DB is designed, its server is kept at one place (one computer), and the users access that system.
When a DB is first created, it is like a skeleton. As users start accessing it, its size grows or shrinks, and in practice it grows far more often than it shrinks. The number of users may also increase, and these users may not be at one single location; they can be anywhere in the world. Hence the number of transactions against the DB increases as well. This creates a heavy network load, because the users are at different locations while the server is at some other remote location. All these growing factors reduce the performance of the DB. Now imagine systems like trading or bank accounts performing this slowly: it makes issues such as concurrency, redundancy and security harder to handle. Moreover, users have to wait longer for their transactions to be executed. A user cannot sit in front of a monitor for a long time just to see an account balance. Everyone needs results as soon as they click the button.
To overcome these issues, a new way of arranging users and DB servers was introduced, known as a Distributed Database System. In this method, separate database servers are created and placed at different remote locations rather than at a single location. These servers are kept in sync with each other to maintain consistency. Users access any of these DB servers over the network as if they were accessing a DB at a single location, without knowing where the server actually is. When a user issues a query, the system locates the server nearest to that user and serves the request from it. This reduces both the access time and the network load.
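The idea of sending each user to the nearest server can be illustrated with a minimal sketch. It assumes that "nearest" means "lowest measured network latency"; the `Server` class, the site names and the latency figures are hypothetical and not part of any particular DDBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str          # hypothetical site label
    latency_ms: float  # assumed round-trip time measured from the user


def pick_nearest(servers: list[Server]) -> Server:
    """Return the server with the lowest measured latency."""
    if not servers:
        raise ValueError("no servers configured")
    return min(servers, key=lambda s: s.latency_ms)


# Latencies as seen by one hypothetical user
servers = [
    Server("db-us", 180.0),
    Server("db-in", 12.0),
    Server("db-eu", 95.0),
]

nearest = pick_nearest(servers)
print(nearest.name)
```

A real system would usually hide this choice from the user entirely, which is what lets the user work as if there were only one DB.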
A distributed database (DDB) looks like the diagram below. The DB servers are distributed across various locations and communicate with each other over communication networks. Users in different parts of the world access the different servers via the same network, which hides the real location of the servers from them.
These distributed DBs are of two types:
Homogeneous DDB
This type of distributed database system has identical database systems distributed over the network. "Identical" here covers software, hardware, operating systems and so on; in short, all the components that are essential for running a DB. Examples are a system in which only Oracle is distributed over the network, or one in which only DB2 is distributed over the network. This type of DDBMS does not give users the feeling that the databases are at different locations. Users access them as if they were accessing a single system.
Heterogeneous DDB
This is the opposite of the concept above. Here, different DBs are distributed over the network. For example, the DB at one location can be Oracle, while the DB at another location can be Sybase, DB2 or SQL Server. In other words, in this type of DDB at least one of the DBs is different from the others. In addition, the operating systems they run on can also differ: one DB may be on a Windows system while another is on Linux.
Irrespective of the type of DDBMS used, users access these DBs as if they were accessing them locally.
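The distinction between the two types can be expressed as a small check. This is only a sketch: it assumes each site is described by a hypothetical `(DBMS product, operating system)` pair and ignores hardware, which the definition above also counts.

```python
def classify(sites: dict[str, tuple[str, str]]) -> str:
    """Classify a DDB from a mapping of site name -> (dbms, operating system).

    The system is homogeneous only if every site runs the same stack;
    a single differing site makes it heterogeneous.
    """
    if not sites:
        raise ValueError("at least one site is required")
    distinct_stacks = set(sites.values())
    return "homogeneous" if len(distinct_stacks) == 1 else "heterogeneous"


# Hypothetical deployments
all_oracle = {
    "site-a": ("Oracle", "Linux"),
    "site-b": ("Oracle", "Linux"),
}
mixed = {
    "site-a": ("Oracle", "Linux"),
    "site-b": ("DB2", "Windows"),
}

print(classify(all_oracle))
print(classify(mixed))
```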
Advantages of Distributed Database systems
This type of DB architecture helps with performance, security and recovery, and it brings several other advantages as well.
- Transparency Levels : In these systems, the physical location of the different DBs and of their data (files, tables and any other data objects) is not known to the users. They have the illusion of accessing a single database at one location. Thus this method gives distribution transparency. In addition, the records of a table can be distributed over the databases, either wholly or partially, by fragmenting them. Consider the employee database of an organization whose employees work at different locations around the world. The details of the employees at a particular location are stored in the DB at that location, and one central DB holds the records of all employees of the organization. Hence the users at a particular location access the DB nearest to them, and all these DBs are kept in sync with the main DB. In this way, the system gives transparency as well as security to the stored data. It also helps to recover the data in case of a failure or crash.
This type of system provides location transparency by allowing the user to query any database or table from any location; network transparency by allowing access to any DB over the network; naming transparency by letting users refer to objects such as tables and views by name, wherever they are stored; replication transparency by keeping copies of the records at different DBs (as in the example above); and fragmentation transparency by allowing the records of a table to be divided horizontally or vertically.
- Availability and Reliability: Distributing the data among different DBs allows users to access it without noticing the failure of any one system. If a system fails or crashes, the data is provided from another system. For example, if DB-IN (the DB for one location) fails, the user is given the data from DB-ALL (the central DB with all records), or vice versa. The user is not told whether the data came from DB-IN or DB-ALL, because all the DBs are kept in sync with each other to survive such failures. Hence the data stays available to the user, which in turn improves the reliability of the system.
- Performance : Since users access data held in the DBs nearest to them, the network load and network time are reduced. This also reduces the data management time. Hence this type of system gives high performance.
- Modularity : Suppose a new DB has to be added to this system. This does not require complex changes, because the configuration for running multiple DBs already exists in the system. Similarly, if a DB has to be modified or removed, this can also be done without much effort.
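The employee example, the two kinds of fragmentation and the DB-IN / DB-ALL fallback described in the list above can be put together in a minimal in-memory sketch. Everything here is illustrative: the `Cluster` class, the sample rows and the `DB-<location>` naming are assumptions, sites are plain Python lists rather than real database servers, and only the local-to-central direction of the fallback is shown.

```python
# Hypothetical employee records
EMPLOYEES = [
    {"id": 1, "name": "Asha", "location": "IN", "salary": 50000},
    {"id": 2, "name": "Bob", "location": "US", "salary": 60000},
    {"id": 3, "name": "Chitra", "location": "IN", "salary": 55000},
]


def horizontal_fragments(rows, key):
    """Split rows into fragments by the value of `key` (whole rows per site)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def vertical_fragment(rows, columns):
    """Keep only `columns` of each row. The `id` column is always kept
    so that vertical fragments can be joined back together."""
    keep = ["id"] + [c for c in columns if c != "id"]
    return [{c: row[c] for c in keep} for row in rows]


class Cluster:
    """One site per location plus a central copy, as in the employee example."""

    def __init__(self, rows):
        by_location = horizontal_fragments(rows, "location")
        self.sites = {f"DB-{loc}": frag for loc, frag in by_location.items()}
        self.sites["DB-ALL"] = list(rows)  # central DB with every record
        self.down = set()                  # names of sites marked as failed

    def read(self, location):
        """Return the employees of `location`, preferring the local site."""
        local = f"DB-{location}"
        if local in self.sites and local not in self.down:
            return self.sites[local]
        if "DB-ALL" in self.down:
            raise RuntimeError("no site available")
        # Fall back to the central copy; the caller cannot tell the difference.
        return [r for r in self.sites["DB-ALL"] if r["location"] == location]


cluster = Cluster(EMPLOYEES)
before = cluster.read("IN")
cluster.down.add("DB-IN")      # simulate a crash of the local site
after = cluster.read("IN")
print(before == after)
print(vertical_fragment(EMPLOYEES, ["name"]))
```

The caller of `read` never names a site, which is the point of distribution transparency; keeping the copies in sync after writes is the hard part and is deliberately left out of this sketch.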
Disadvantages of Distributed Database systems
Since such a system has multiple DBs and many things to manage as a whole, it is by no means perfect. It also has disadvantages.
- Complexity : This is the main drawback of this system. Since it has many DBs, all of them have to be maintained so that they work together. This needs extra design and work to keep them in sync, coordinate them and make them work efficiently. These additions to the architecture make a DDBMS more complex than a DB with a single server.
- Cost : Since the complexity is higher, the cost of maintaining it is higher too. Running and managing multiple DBs costs more than running a single DB.
- Effort for Integrity : Extra effort is needed to maintain integrity among the DBs in the network. It may also need extra network resources.
- Security : Since the data is distributed over several DBs and the network, extra care is needed to secure it. Managing access levels and preventing unauthorized access over the network both need extra effort.
- Working with a DDBMS requires very good experience in DBMS.
- Fragmenting the data and distributing it poses extra challenges for the developer as well as for the database design. This in turn makes it harder to design a database that meets the requirements for transparency, reliability, integrity and redundancy.