Table of Contents
What is Join and Why it is required?
Database is mainly composed of many tables. If we access any one table at a time, then we will not be using the real use of database. For example, consider a employee database. When we say employee database, it has many tables like EMPLOYEE, DEPARTMENT, and PROJECT etc. If we access EMPLOYEE table alone, then we will get only informations about employees. It will not give us the clear details about his department or projects.
Below is an EMPLOYEE table it has very basic information about employees. If we want to know his department or manager, it gives only their IDs. We will not get any details about his department or manager.
Suppose we want to know department details about an employee then we have to first get the department id of that employee from the EMPLOYEE table and then we have to go to DEPARTMENT table to get the details for that ID. But this kind of fetching is tedious. It will not give us the result in one shot, it will need some intermediary/ temporary memory – variables to store the Department ID of the employee, and moreover it will not be performed in single query.
This is the case of fetching department details for one employee. What if we have to search department details for 10, 100, 1000 etc? What will be the effort involved? How many queries need to be fired? How much intermediary space needs to be allocated to hold the intermediary department id? What is the guarantee that department details fetched by second queries mapped with the correct employee in the first query?
To avoid all these efforts, confusion and memory wastage, SQL uses a technique called joins. In this technique, it combines two or more tables by means of common column to get the requested result. In other words, two or more tables are combined to get all the rows and columns from all the participating tables and only those rows are retained in the final result whose columns in common are matching.
In our example above EMPLOYEE and DEPARTMENT tables are joined together to get the department details of the employee. Here what is done is both the tables are joined to get all the rows and columns from both the tables and then only rows that are matching to columns in common are retained. That means, DEPARTMENT_ID is the common column in both EMPLOYEE and DEPARTMENT table which is used link these two tables. In other words, using the DEPARTMENT_ID in EMPLOYEE table we can connect to DEPARTMENT table to get more details about that department.
Below is the snapshot of data in both the tables.
We can notice here that EMPLOYEE_ID = 100, Steven has DEPARTMENT_ID = 90. If we need to get more details about DEPARTMENT_ID = 90, we will get it in the DEPARTMENT table against DEPARTMENT_ID = 90. Hence the query using join to get the Steven’s department would be like below:
SELECT dept.* FROM DEPARTMENTS dept, EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID AND emp.FIRST_NAME = 'Steven';
We notice here that we got two department details for Steven. This is because, we have two different employees named Steven in our EMPLOYEE table. Both of them are working for different departments – 50 and 90. Hence we got two department details.
To be more specific on why we got two department details, we can add employee details also in our SELECT query above. It will make the result clearer and accurate. However selecting the columns depends on user’s request.
SELECT emp.EMPLOYEE_ID, emp.first_name, emp.LAST_NAME, dept.* FROM DEPARTMENTS dept, EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID – Join Condition AND emp.FIRST_NAME = 'Steven'; – Filtering Condition
Now it is very clear why we got two department details.
In above queries we can note that ‘WHERE’ clause is used to combine two tables using common columns in the table. This is how and why we use joins in the query.
Suppose both the Steven in the EMPLOYEE table work for the same Department. What will be the result of query and why?
SELECT emp.EMPLOYEE_ID, emp.first_name, emp.LAST_NAME, dept.* FROM DEPARTMENTS dept, EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID AND emp.FIRST_NAME = 'Steven';
This is because, we are not filtering the result based on the DEPARTMENT_ID. Here query is written to fetch the DEPARTMENT_ID of Steven and then the details about that Department. Hence the query will first fetch all the Steven in the EMPLOYEE table, and then retrieve all the departments of Steven. Here it will not consider the department of both the Steven as same. Because, here both the Stevens are different and hence their department is also considered as different. To be more clear, above query will first retrieve all the DEPARTMENT_ID of the employees whose name is ‘Steven’. Hence it will give all the department ids without filtering the duplicates. Hence we get two rows from the EMPLOYEE table with DEPARTMENT_ID. Now this DEPARTMET_ID is fetched in the DEPARTMENT table to get its details. Since we have two department ids (even though they are same), DEPARTMENT table is queried twice to get the details. Hence we get two rows.
Suppose we have removed employee details from the SELECT list and retained only DEPARTMENT details. Do you think it will show single row now?
SELECT dept.* FROM DEPARTMENTS dept, EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID AND emp.FIRST_NAME = 'Steven';
The answer is NO. Still it will show duplicate rows. This is because of the same reason above. That means, it is not the SELECT list data that are determining the duplicity of the rows. This is all because of the filter condition ‘emp.FIRST_NAME = ‘Steven’’.
How can we avoid these duplicate rows in the result? There are different ways to do this.
1.   Suppose we are selecting only department details like below query.
SELECT dept.* FROM DEPARTMENTS dept, EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID AND emp.FIRST_NAME = 'Steven';
Since here both the Steven in DEPARTMENT table works for the same department, we are getting same department twice. So, we can use DISTINCT clause to remove any duplicate rows here. This DISTINCT clause will remove duplicate only if their departments are same.
2. Another method to get the accurate result is by writing very accurate and specific query. In above case, we had filter condition using employee name. But employee name need not be unique. There can be many employees with same name. If we need to get the result for specific employee, then we need to be very specific while filtering his details from EMPLOYEE table. Like we said above, we cannot filer correct employee by using his name. His details will be filtered correctly by using his ID which is very unique for each employee. Hence in above query rather than specifying the name of the employee, if we specify the ID of the employee, then we will get the more accurate result.
To specify more clearly about the result, we can include employee details also in the select list. However it will not change our result set.
Modifying the results is always depends on the requirement. In above cases, if the request is to query the tables based on employee name, then we can use only DISTINCT to avoid any duplicate rows. But if they work for different department, then above two solutions will not give any affect.
What is Inner Join?
This is the simplest form of joins. Here two or more tables are joined based on the common column/s in the respective tables to get the requested result. This type of join is also known as equijoin. Here two or more tables are joined using ‘=’ in the WHERE clause. It will return the records only if there is matching records in both the tables.
Consider the same example of EMPLOYEE and DEPARTMENT table. Suppose we want to see the department name for which each employee works. As we have learnt above, this is done by using the join condition on DEPARTMENT_ID on both the tables. Here we need to list all the employees from the EMPLOYEE table and for which we need to show the department name depending on their DEPARTMENT_ID. i.e.;
SELECT emp.EMPLOYEE_ID, emp.first_name, emp.LAST_NAME, dept.DEPARTMENT_ID, dept.DEPARTMENT_NAME FROM DEPARTMENTS dept, EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID;
Here we can see that for each employee in the EMPLOYEE table, we have one department Id and have matching details in the DEPARTMENT table. We have only those departments in the EMPLOYEE table that has details in the DEPARTMENT table. Hence when we perform inner join on the table we get matching rows only.
Suppose we have some departments in DEPARTMENT table for which still no employees are assigned. These departments are not at all obtained by the inner query. That means when we equijoin the tables, it will fetch for below department Id in both the table. If it finds them in both the table, then it will display the results based on the SELECT list items.
What is Outer Join?
This is another powerful joining technique in SQL. Here it works same as inner join with little difference. When outer join is used it lists all the records that are matching the joining condition like in inner joins and it also lists the records from either of the table which do not have matching records. Here displaying non-matching records from which table is depend on the type of outer join.
If we consider the same example as above with EMPLOYEE and DEPARTEMENT tables, to pull all the employees who are working for the departments in DEPARTMENT table, then we can see the result like below.
SELECT dept.*,emp.* FROM DEPARTMENTS dept,EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID (+);
OR
SELECT dept.*,emp.* FROM DEPARTMENTS dept LEFT OUTER JOIN EMPLOYEES emp ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;
This query will pull all the records from DEPARTMENTS table first. Then it fetches all the employees in EMPLOYEES table who are working for the departments in it. It lists all the matching records from the EMPLOYEES table. There might be few departments in DEPARTMENTS table for which there will not be any employees. For such case, outer join will display the department details and its corresponding employees as NULL.
This is how outer join works. In this method of joining we can see that all the matching records from the tables participating in joining condition will be displayed first. Then all the records for which match is not found in the second table is marked as null. This is one of the techniques in outer joins.
What are different types of Outer Join?
Outer join is the technique to get the matching records from the tables that are participating in join and the records that do not have match. But selecting the records that do not have matching record is either from one of the table or from both the tables. According to the method select to pull the records from the table, there are 3 types of outer joins –left outer join, right outer join and full outer join.
- Left Outer Join : This is the outer join method where all the records from the table on the left of the join condition are listed and the records for which match is not found in the right table NULL is displayed. That means, if we write ‘A LEFT OUTER JOIN B’, then all the records from A is displayed and their corresponding records from table B is displayed against them. For the records in A for which match is not found in table B, NULLs are displayed.
Consider the example where we need to display all the department and their employees. Here, we need to show all the departments. If we have employees working in that department, then we need to show those employees. If there are no employees for the department then we need to list those departments but their employee as NULL.
SELECT dept.*,emp.* FROM DEPARTMENTS dept,EMPLOYEES emp WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID (+);
OR
SELECT dept.*,emp.* FROM DEPARTMENTS dept LEFT OUTER JOIN EMPLOYEES emp ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;
Here DEPARTMENTS is the LEFT table and EMPLOYEES is the right table. When we write the clause ‘LEFT OUTER JOIN’ it pulls all the records from left table with their matching records from the right table. If there are no matching records in the right table, it displays NULLs for their columns.
There is another method of writing left outer join using ‘(+)’ sign. When we perform left outer join, we write ‘(+)’ on the right side of the join condition to indicate that pull all the records from the left side table.
- Right Outer Join : This is the outer join method where all the records from the table on the right of the join condition are listed and the records for which match is not found in the right table NULL is displayed. That means, if we write ‘A RIGHT OUTER JOIN B’, then all the records from B is displayed and their corresponding records from table A is displayed against them. For the records in B for which match is not found in table A, NULLs are displayed.
Consider the same example where we need to display all the department and their employees. Here, we need to show all the departments. If we have employees working in that department, then we need to show those employees. If there are no employees for the department then we need to list that department but their employee as NULL. But here we are going to use RIGHT outer join. Hence we will use ‘RIGHT OUTER JOIN’ or ‘(+)’. While using ‘(+)’ sign, we will change the position of the sign to indicate that it is right outer join like below:
SELECT dept.*,emp.* FROM DEPARTMENTS dept,EMPLOYEES emp WHERE dept.DEPARTMENT_ID(+) = emp.DEPARTMENT_ID;
OR
SELECT dept.*,emp.* FROM DEPARTMENTS dept RIGHT OUTER JOIN EMPLOYEES emp ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;
- Full Outer Join : This is another type of outer join where both left and right outer joins are combined. That means, at first all the matching records from both the tables are listed, then all the records from both the tables are listed with NULLs to columns from other table for which match is not found.
Here we cannot use ‘(+)’ symbol to write full outer join query. We need to write ‘FULL OUTER JOIN’ to get the result. In our example of EMPLOYEES and DEPARTMENT, when we perform full outer join, it will list all the matching records from DEPARTMENTS and EMPLOYEES and then it will list rest of the departments for which no employees are present. Then it will list all the employees who are not yet assigned to any department. The columns of the table for which match is not found is shown as NULL. We can see NULLs in DEPARTMENTS and EMPLOYEES columns below for which match is not found.
SELECT dept.*,emp.* FROM DEPARTMENTS dept FULL OUTER JOIN EMPLOYEES emp ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID