A hash collision occurs when two distinct objects (or keys) produce the same hash code value when passed through a hash function. In other words, different elements end up in the same bucket of a hash table. Since hash codes are finite, and there can be far more possible elements than hash codes, collisions are inevitable.
Causes of Hash Collisions
- Limited hash code range: The hash function has a limited output range, typically 32-bit integers (as returned by Java’s
hashCode()
method), while there can be many more possible input values. - Different objects, same hash code: Different objects may naturally produce the same hash code due to the nature of the hashing algorithm.
Example
Suppose you have two distinct objects A
and B
, and both produce the same hash code:
int hashA = A.hashCode(); // e.g., 123456
int hashB = B.hashCode(); // e.g., 123456
Both objects A
and B
would be placed in the same bucket in a hash table.
Collision Handling in Hash Tables
When a collision occurs, a hash table needs a mechanism to handle it efficiently. There are a few common strategies:
1. Separate Chaining (Used in Java HashMap)
In this approach, each bucket of the hash table stores a list (or chain) of entries. If multiple elements have the same hash code, they are added to the list within that bucket.
- Before Java 8: Java’s
HashMap
used linked lists for separate chaining. Each bucket points to a linked list where all colliding elements are stored. - After Java 8: To improve performance, if the size of the linked list in any bucket exceeds a threshold (8 entries by default), Java’s
HashMap
converts the list into a balanced binary search tree (Red-Black Tree). This optimizes search time for colliding elements from O(n) to O(log n).
Steps in Separate Chaining:
- Compute the hash code for the element.
- Use the hash code to determine the bucket (index) in the hash table.
- If the bucket is empty, store the element.
- If the bucket already contains elements (collision), add the new element to the linked list (or tree) in that bucket.
- Use the
equals()
method to ensure that duplicates are not added to the bucket.
Example of Separate Chaining:
Bucket 0 -> [Element1] -> [Element3] -> [Element5]
Bucket 1 -> [Element2]
Bucket 2 -> [Element4]
Here, Element1
, Element3
, and Element5
all have the same hash code and are stored in the same bucket, forming a linked list.
2. Open Addressing (Not used in Java HashSet/HashMap)
In open addressing, when a collision occurs, the hash table looks for another available slot by following a sequence of probes. This could be done using:
- Linear probing: Check the next slot in the array.
- Quadratic probing: Use a quadratic function to calculate the next slot.
- Double hashing: Use another hash function to calculate the next probe location.
Java’s HashSet
and HashMap
do not use open addressing, but it’s a common collision-handling technique in other hash table implementations.
Example of Hash Collision in Java
Here is a simple example to illustrate a collision:
public class HashCollisionExample {
public static void main(String[] args) {
HashSet set = new HashSet<>();
String str1 = "FB"; // Both "FB" and "Ea" have the same hash code in Java.
String str2 = "Ea";
set.add(str1);
set.add(str2);
System.out.println("HashCode of str1: " + str1.hashCode());
System.out.println("HashCode of str2: " + str2.hashCode());
System.out.println("Set contains 'FB': " + set.contains("FB"));
System.out.println("Set contains 'Ea': " + set.contains("Ea"));
System.out.println("Set size: " + set.size());
}
}
Output:
HashCode of str1: 2236
HashCode of str2: 2236
Set contains 'FB': true
Set contains 'Ea': true
Set size: 2
Both "FB"
and "Ea"
have the same hash code (2236
), but they are stored separately in the set due to collision handling.
Performance Considerations
Collisions can negatively impact the performance of a hash table. In a scenario with frequent collisions:
- Without proper handling, lookups, insertions, and deletions degrade from O(1) (constant time) to O(n) (linear time), where
n
is the number of elements in the bucket. - With tree-based separate chaining, the performance is O(log n) in worst-case scenarios when collisions occur frequently (many elements per bucket).
Summary of Hash Collision Handling:
- Hash collisions occur when two distinct elements produce the same hash code.
- Separate chaining is the collision resolution technique used in Java’s
HashMap
, where each bucket stores a list (or tree) of entries. - Rehashing may occur when the load factor exceeds a threshold, and the hash table is resized to reduce collisions.
- Efficient handling of collisions ensures that hash-based collections like
HashSet
andHashMap
maintain good performance in most scenarios.