There are two basic methods for handling collisions: separate chaining
and open addressing.
Separate chaining hangs an additional data structure off of each
bucket. For example, the bucket array becomes an array of linked lists.
To find an item we first go to its bucket and then compare keys along
the list. This is a popular method, and if linked lists are used the
hash table never fills up.
Illustrate
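Load factor: f = n/N, where n is the number of items stored in the
hash table and N is the number of buckets. We would like the load
factor to be less than 1.
The cost for get(k) is on average O(n/N), assuming the hash function
spreads the keys evenly over the buckets.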
The problem with separate chaining is that the data structure can
grow without bound. Sometimes this is not appropriate because
storage is finite, for example on embedded processors.
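A minimal sketch of separate chaining, to make the bucket lookup and the load factor concrete. The class and method names (ChainedHashTable, insert_item, load_factor) are made up for illustration, Python's built-in hash() stands in for the hash function, and plain Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a list of (key, element) pairs."""

    def __init__(self, num_buckets=11):
        self.N = num_buckets                        # number of buckets
        self.n = 0                                  # number of items stored
        self.buckets = [[] for _ in range(self.N)]  # one chain per bucket

    def _index(self, k):
        # Assumption: Python's built-in hash() plays the role of h(k).
        return hash(k) % self.N

    def insert_item(self, k, e):
        chain = self.buckets[self._index(k)]
        for pos, (key, _) in enumerate(chain):
            if key == k:                            # key already present: overwrite
                chain[pos] = (k, e)
                return
        chain.append((k, e))                        # the chain simply grows
        self.n += 1

    def get(self, k):
        # Go to the bucket first, then compare keys along its chain.
        for key, e in self.buckets[self._index(k)]:
            if key == k:
                return e
        return None

    def load_factor(self):
        return self.n / self.N
```

Storing 22 items in 11 buckets gives a load factor of 2, so the table keeps working past f = 1, but each get(k) then has to walk a chain of about two items.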
Open addressing does not introduce a new structure. If a collision
occurs, we look for an available bucket at the next position generated
by a probing algorithm. Open addressing is generally used where storage
space is at a premium, e.g. on embedded processors. Open addressing is
not necessarily faster than separate chaining.
Methods for Open Addressing:
Linear Probing:
We try to insert Item = (k, e) into bucket A[i] and find it full, so the next bucket we try is:
A[(i + 1) mod N]
then A[(i + 2) mod N], etc.
Illustrate with 11 buckets: Note the probing is linear.
Note the hash table can be filled up.
Also, what should we do when we remove an Item? We should repair the array A, but this is too costly. Instead we mark the bucket as available/deactivated. A later findElement(k) then skips over the available/deactivated bucket and keeps probing, while insertItem(k, e) may insert into an available/deactivated bucket.
Clustering (long runs of occupied buckets) slows down searches.
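A sketch of linear probing with the available/deactivated marker described above. It assumes integer keys with h(k) = k mod N; the class name, the snake_case method names and the EMPTY/AVAILABLE sentinels are illustrative only.

```python
EMPTY = object()      # bucket has never held an item
AVAILABLE = object()  # bucket held an item that was removed (deactivated)


class LinearProbingTable:
    def __init__(self, num_buckets=11):
        self.N = num_buckets
        self.A = [EMPTY] * self.N

    def _probe(self, k):
        i = k % self.N                    # assumed hash function h(k) = k mod N
        for j in range(self.N):
            yield (i + j) % self.N        # A[i], A[(i+1) mod N], A[(i+2) mod N], ...

    def insert_item(self, k, e):
        first_free = None
        for idx in self._probe(k):
            slot = self.A[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                     # k cannot be stored beyond an empty bucket
            if slot is AVAILABLE:
                if first_free is None:
                    first_free = idx      # remember it, but keep looking for k
            elif slot[0] == k:
                self.A[idx] = (k, e)      # key already present: overwrite
                return
        if first_free is None:
            raise OverflowError("hash table is full")
        self.A[first_free] = (k, e)

    def find_element(self, k):
        for idx in self._probe(k):
            slot = self.A[idx]
            if slot is EMPTY:
                return None               # a never-used bucket ends the search
            if slot is not AVAILABLE and slot[0] == k:
                return slot[1]
        return None

    def remove(self, k):
        for idx in self._probe(k):
            slot = self.A[idx]
            if slot is EMPTY:
                return None
            if slot is not AVAILABLE and slot[0] == k:
                self.A[idx] = AVAILABLE   # mark the bucket, do not repair the array
                return slot[1]
        return None
```

With 11 buckets, the keys 13, 24 and 35 all hash to bucket 2 and end up in buckets 2, 3 and 4. Removing 24 only marks bucket 3, so a search for 35 still probes past it, and a later insert of 46 reuses bucket 3.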
Quadratic Probing:
A[(i + f(j)) mod N] where j = 0, 1, 2, ... and f(j) = j^2
This helps avoid the clustering seen with linear probing, but
secondary clustering can still occur. We can imagine a more
complicated function for f.
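To see which buckets quadratic probing visits, the probe sequence can be listed directly. This is a small sketch; the function name is made up, and i is the home bucket h(k).

```python
def quadratic_probe_sequence(i, N, steps):
    """Buckets tried by quadratic probing: A[(i + j^2) mod N] for j = 0, 1, 2, ..."""
    return [(i + j * j) % N for j in range(steps)]


# Home bucket i = 2 in a table of N = 11 buckets.
print(quadratic_probe_sequence(2, 11, 6))              # [2, 3, 6, 0, 7, 5]

# Over all N values of j, only 6 of the 11 buckets are ever reached.
print(len(set(quadratic_probe_sequence(2, 11, 11))))   # 6
```

Every key with the same home bucket follows exactly this sequence, which is the secondary clustering mentioned above. The second line also shows that an insert can fail even though the table still has empty buckets.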
Double Hashing:
Use a second hash function h'.
A[(i + f(j)) mod N] where f(j) = j*h'(k). Here h'(k) should not evaluate to zero. Example: h'(k) = q - (k mod q), where q is a constant, commonly a prime less than N. Note that still i = h(k).
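A sketch of the double hashing probe sequence using the example h'(k) = q - (k mod q). It assumes integer keys with h(k) = k mod N, and the choice N = 11, q = 7 is for illustration only; the function name is made up.

```python
def double_hash_probe_sequence(k, N, q, steps):
    """Buckets tried by double hashing: A[(i + j*h'(k)) mod N] for j = 0, 1, 2, ..."""
    i = k % N              # assumed primary hash h(k) = k mod N
    step = q - (k % q)     # h'(k), always between 1 and q, so never zero
    return [(i + j * step) % N for j in range(steps)]


# Keys 13 and 24 share the home bucket i = 2 but use different step sizes.
print(double_hash_probe_sequence(13, 11, 7, 5))   # [2, 3, 4, 5, 6]
print(double_hash_probe_sequence(24, 11, 7, 5))   # [2, 6, 10, 3, 7]
```

Because the step size depends on the key, two keys that collide at A[i] usually part ways on the next probe, unlike in linear or quadratic probing.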