Hash Tables

Hash tables are a form of map commonly used in compilers for symbol tables (tables storing variables' names and addresses) and in operating systems for registries of environment variables (variables defining the shell options).

Components of a Hash Table

Bucket Arrays // where the items are stored.
Hash Functions (algorithm for storing items in the hash table) // How we locate the items.

Bucket Arrays

An array A with size N, where each cell of A is a bucket.
If the keys are integers in the range [0, N - 1], then insert puts entry e at rank k of the array, A[k] = e, where entry = (key, value). We can then recover the value for a given k with get(k) by accessing A[k].
Any cell not yet assigned an entry should contain a special NO_SUCH_KEY object. Naturally, when the bucket array is first created, with no keys assigned, all cells contain NO_SUCH_KEY.

The map methods get(k) and insert(k,e) should take O(1) time.
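
The bucket array described above can be sketched as follows. This is a minimal illustration, assuming integer keys in the range [0, N - 1]; the class name DirectAddressTable is hypothetical, and a key outside that range simply throws an ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Sketch of a bucket array indexed directly by integer keys in [0, N - 1].
// The class name is hypothetical; NO_SUCH_KEY follows the notes.
class DirectAddressTable {
    // Sentinel object marking a cell that has no entry assigned.
    static final Object NO_SUCH_KEY = new Object();

    private final Object[] buckets;

    DirectAddressTable(int n) {
        buckets = new Object[n];
        Arrays.fill(buckets, NO_SUCH_KEY); // a new table has every cell empty
    }

    // insert(k, e): store e at rank k, i.e. A[k] = e. One array write, O(1).
    void insert(int k, Object e) {
        buckets[k] = e;
    }

    // get(k): return A[k], which is NO_SUCH_KEY if nothing was inserted. O(1).
    Object get(int k) {
        return buckets[k];
    }
}
```

Both methods are a single array access, which is where the O(1) bound comes from.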

There are problems:

What if the keys are not unique?
What if the keys are not integers?

If there are many keys or if N is too small, then a collision can occur: two different keys end up in the same bucket. We have to take care of this. If the keys are not integers, then we need a function from keys to integers: a hash function.

Hash Functions

A hash function maps general keys to a finite range of integers.

Hash function: general keys --> finite set of integers

The hash function has two parts:

Hash code, maps general keys to integers.

Hash code: general keys --> Whole range of integers (Not really: an int has only finitely many values.)

Compression map, maps the whole range of integers to a finite range of integers.

Compression map: Whole range of integers --> finite range of integers

Hash Codes

We would like the hash code to avoid collisions, but this is not always possible. A collision occurs when distinct keys map to the same hash value. Why is this possible? Collisions are unavoidable because:

the set of integers is finite, even if we use the full range of int
the algorithm may get too complicated otherwise

We require that if two keys are equal then the hash code returns the same integer value for both. If the hash code function is h, then:

if k₁ = k₂ then h(k₁) = h(k₂)
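
In Java this requirement is usually met by computing hashCode() from exactly the fields that equals() compares. The sketch below assumes a hypothetical key type Point; it is one way to satisfy the rule, not the only one.

```java
import java.util.Objects;

// Hypothetical key type: two points are equal when both coordinates match.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Uses the same fields as equals(), so equal points always get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

The converse is not required: two unequal points may still share a hash code, which is a collision.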

For the hash table to be efficient, we would like the hash code to spread the keys uniformly across the integers.

Methods for Hashing

Casting to Integer:

This works well for int and char, but not for doubles or Strings. Why? Chopping (truncating) such a key to fit in an int discards information, so many different keys would get the same hash code.
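
A small sketch of the casting approach and why it fails for doubles. The class name CastHash is hypothetical.

```java
// Hash codes obtained by casting the key to int.
class CastHash {
    // A char fits in an int, so the cast loses nothing.
    static int hashCode(char c) {
        return (int) c;
    }

    // Casting a double truncates toward zero, so the fractional part is discarded.
    static int hashCode(double d) {
        return (int) d;
    }
}
```

With this sketch, 1.2 and 1.7 both hash to 1, so every double between 1.0 and 2.0 collides. One alternative, mentioned here only as a pointer, is to take the raw 64 bits with Double.doubleToLongBits and then reduce the resulting long to an int.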

Summing components:

static int hashCode(long i) { return (int) ((i >> 32) + (int) i); } // good for long
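
The same summing idea can be applied to the characters of a String, which shows where it breaks down. This is an illustrative sketch, not part of the original notes; the class name SumHash is hypothetical.

```java
// Summing components, applied to the characters of a String.
class SumHash {
    static int hashCode(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i); // addition ignores the position of each character
        }
        return h;
    }
}
```

Because addition is commutative, any two strings made of the same characters collide: "stop" and "pots" get the same hash code. Polynomial hash codes address this by making the position of each component matter.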

Polynomial hash codes:

Assume the key is an object with multiple components, (x₀, x₁, x₂, ..., x_{k-1}), for example a String, whose components are its individual characters. For a fixed constant a, the polynomial hash code is:

x₀a^{k-1} + x₁a^{k-2} + ... + x_{k-2}a + x_{k-1}

which, by Horner's rule, can be evaluated as:

x_{k-1} + a(x_{k-2} + a(x_{k-3} + ... + a(x₂ + a(x₁ + ax₀)) ... ))
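
The nested form translates directly into a loop. The sketch below assumes the key is a String and takes the constant a as a parameter; the class name PolyHash is hypothetical, and int overflow is simply allowed to wrap around.

```java
// Polynomial hash code of a String, evaluated with Horner's rule.
class PolyHash {
    static int hashCode(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // After step i, h = x_0*a^i + x_1*a^(i-1) + ... + x_i (modulo 2^32).
            h = a * h + s.charAt(i);
        }
        return h;
    }
}
```

With a = 31 this gives the same value as Java's built-in String.hashCode(), which is documented as this polynomial. Unlike plain summing, the result depends on character order, so "stop" and "pots" no longer collide.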

Cyclic shift hash codes:

A variant of the polynomial hash code that replaces multiplication by a with a cyclic shift of the partial sum by a fixed number of bits. Like the polynomial, it mixes the components so that their order matters.
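
A minimal sketch of a cyclic shift hash code for Strings. The class name CyclicShiftHash and the shift parameter are assumptions made for illustration; the notes do not prescribe a particular shift amount.

```java
// Cyclic shift hash code: rotate the running value, then add the next character.
class CyclicShiftHash {
    static int hashCode(String s, int shift) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Bits pushed out on the left re-enter on the right, so nothing is lost.
            h = Integer.rotateLeft(h, shift);
            h += s.charAt(i);
        }
        return h;
    }
}
```

For example, with a shift of 5 the strings "ab" and "ba" produce different values, because the first character is rotated before the second is added.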

Compression Maps

We need to compress the hash value so we can find the proper bucket in a finite array.

Methods for Compressing

The Division Method:

h(k) = |k| mod N
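
The division method can be sketched as below. The names DivisionCompression and compress are hypothetical. The absolute value is taken in long arithmetic because Math.abs(Integer.MIN_VALUE) overflows and stays negative in int.

```java
// Division method: h(k) = |k| mod N, mapping any int hash code into [0, N - 1].
class DivisionCompression {
    static int compress(int k, int n) {
        long magnitude = Math.abs((long) k); // widen first so |k| cannot overflow
        return (int) (magnitude % n);
    }
}
```

A typical use combines a key's hash code with the compression map: compress(key.hashCode(), N) gives the bucket index for an array of size N.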

The MAD Method:

Multiply-Add-Divide
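
The notes do not give a formula for this method. One common textbook form, assumed here, is h(k) = |ak + b| mod N, where a and b are fixed integers chosen in advance and a mod N is not 0. The sketch below follows that assumed form; the names MadCompression, compress, a, and b are illustrative.

```java
// MAD (Multiply-Add-Divide) compression, assuming the form |a*k + b| mod N.
class MadCompression {
    static int compress(int k, int a, int b, int n) {
        // Multiply and add in long arithmetic so a*k + b cannot overflow.
        long v = (long) a * k + b;
        // Divide: keep the remainder of |v| modulo n, which lies in [0, n - 1].
        return (int) (Math.abs(v) % n);
    }
}
```

For example, with a = 3, b = 4 and N = 7, the key 10 maps to |34| mod 7 = 6.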