What is hash Join in Oracle?
Hash joins are used when the joining large tables or when the joins requires most of the joined tables rows. This is used for equality joins only
Algorithm for Oracle Hash Join
1) The optimizer uses smaller of the 2 tables to build a hash table in memory.Small table is called build table
For each row in small table loop
Calculate hash value on join key
Insert row in appropriate hash bucket.
2)then scans the large tables and compares the hash value (of rows from large table) with this hash table to find the joined rows.Large table is called probe table
For each row in big table loop
Calculate the hash value on join key
Probe the hash table for hash value
If match found
The above explanation is true when the hash table being develops fits entirely in memory.if the hash table is too big to fit in the available memory ,then Oracle do the processing in little different manner.
Basically if the hash table is too big to fit in the available memory Oracle has a mechanism for saving it to disc in batches (called partitions) and saving the probe table to disc in matching batches, then doing the join piece wise between matching batches
Basically when the hash area fills up, Oracle will require to use TEMP tablespace. Oracle will choose largest partition with in the hash and write that to TEMP.Oracle keeps a bitmap-like index of the entire hash table in memory. This index only knows that for a given hash bucket (location in the hash table) there is or is not a value.But the index doesn’t have the actual value.
Hence if a row from the second or probing table hashes to one of these locations the only thing that is known is that there is a potential match.The process of building hash table continues till the entire hash table is completed.Part of the hash table is in memory and part is on Disk
Now the Oracle start reading the probe or other table and start creating the hash of the join key.If the hash key matches with the hash in memory ,then the join is completed and row is returned.If the row is the potential match then oracle will store this row on the TEMP tablespace in the same partitioning scheme as the first row data was kept.
Once the first pass of the entire secondary table is done and all the rows in hash table in memory are given,Oracle will start looking at the matching partition in the disk and it will process each partition one by one and reading the rows in the memory and process the output
Hash join Oracle Example:
select /* +use_hash( a b) */ a.emp_no,b.dept_no,b.dept_name from emp a,dept b where a.dept_no=b.dept_no; Plan ------------------------------------------------- SELECT STATEMENT HASH JOIN TABLE ACCESS FULL EMP TABLE ACCESS FULL DEPT
How Hash join in Oracle is processed
1) First EMP table is scanned and hashed
2) the dept table is scanned for each row and hash is created for the join keeping dept_no
3) dept_no hash is matched in the hash table, if a match is found ,joined rows are returned
Oracle has provided the hint use_hash to force the use of hash join.
Usage select /* +use_hash(table alias) */ ......
This tells the optimizer that the join method to be used when “table_alias” is the next row source in the join order should be a hash join; however it does not tell the optimizer whether that row source should be used as the build table or the probe table.
To specify how the row source is used you need a second hint: no_swap_join_inputs(“table_alias”) if you want Oracle to use the rowsource as the probe table, or swap_join_inputs(“table_alias”) if you want Oracle to use it as the build table
Some important points about hash join in Oracle
1)The output of hash join result is not instantaneous as hash joining is blocked on building up hash table.Once the hash table build is completed, then the rows are returned faster
2)hash joins are commonly seen with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.
3)Hash table is the private memory so accessing data does not incur much latching activity
Cost of the hash Join= Cost of the read of table A +cost of the read table B + some little processing in memory
Difference between Nested Loop and Hash Join
|Hash Join||Nested Join|
|Hash joins are used when the joining large tables or when the joins requires most of the joined tables rows. This is used for equality joins only||The NESTED LOOPS Join is a join operation that selects a row from the selected beginning row source and uses the values of this row source to drive into or select from the joined row source searching for the matching row.
-Best for OLTP type transactions
|Hash Operation is normally efficient for the two dataset when lot of records are returned.||When one dataset has no or small number o records returned and other dataset can be matched using index operation,then Nested loop join is more efficient|
|You may see more hash joins used with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.||You will see more use of nested loop when using FIRST_ROWS optimizer mode as it works on model of showing instantaneous results to user as they are fetched. There is no need for selecting caching any data before it is returned to user. In case of hash join it is needed and is explained below.|