Some optimization problems can be solved using a greedy algorithm. A greedy algorithm builds a solution iteratively. At each iteration the algorithm uses a greedy rule to make its choice. Once a choice is made, the algorithm never changes its mind or looks back to consider a different, perhaps better, solution; this is why the algorithm is called greedy. Generally the greedy rule can be expressed using a priority queue of candidate components (partial solutions). Each iteration of the greedy algorithm removes the next best candidate from the priority queue and checks whether the candidate can be added to the partial solution to build a larger solution. The algorithm terminates when the priority queue is empty.
Iterative Algorithm Greedy(set of candidate components, each with a value and other attributes)
// returns a Sequence, S, of components representing the complete solution
make a Priority Queue, Q, of candidate components keyed by value
make an empty Sequence, S
while Q is not empty do
    remove the maximum-value candidate component, c, from Q
    if S + c is a good partial solution then add c to S
return S
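As a sketch (the generic method, parameter names, and interfaces below are my additions, not from the notes), the template above might be captured in Java, with the problem-specific pieces being the candidate ordering (the greedy rule) and the feasibility check:

import java.util.*;
import java.util.function.BiPredicate;

class GreedyTemplate {
    // Generic greedy skeleton: repeatedly pull the best remaining candidate and
    // keep it only if it still extends a good partial solution.
    static <C> List<C> greedy(Collection<C> candidates,
                              Comparator<C> byValue,            // greedy rule: what "best" means
                              BiPredicate<List<C>, C> isGood) { // can c extend the partial solution?
        PriorityQueue<C> q = new PriorityQueue<>(byValue.reversed()); // maximum value first
        q.addAll(candidates);
        List<C> s = new ArrayList<>();
        while (!q.isEmpty()) {
            C c = q.poll();                  // next best candidate
            if (isGood.test(s, c)) s.add(c); // once added, never reconsidered
        }
        return s;
    }
}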
The most common example of a greedy algorithm is making change with American coinage. For instance, if you have to give someone 73 cents in change, what do you do? First, give them as many quarters as you can (2), then from the leftover give them as many dimes as you can (2), then as many nickels (0), and finally as many pennies (3). Here the priority queue holds the coin denominations keyed by value, and the greedy rule is to use the largest remaining coin.
So, let's look at the algorithm for making change:
Iterative Algorithm Change(double value)
// returns a Sequence, S, of the # of dollars, quarters, dimes, nickels, then pennies (or whatever coin set you use)
make a Priority Queue, Q, of coins keyed by value
make an empty Sequence, S
while Q is not empty do
    remove the maximum coin from Q
    find the largest x such that coin > (value - x * coin) >= 0
    add x to S
    value = value - x * coin
return S
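As a concrete sketch of this in Java (the class and method names, and the use of integer cents rather than a double dollar amount, are my choices to avoid floating-point rounding), greedy change-making might look like:

import java.util.*;

public class GreedyChange {
    // Greedy change-making: repeatedly use as many of the largest remaining coin as fit.
    static Map<Integer, Integer> makeChange(int amount, int[] denominations) {
        // Largest coin first plays the role of "remove maximum coin from Q".
        Integer[] coins = Arrays.stream(denominations).boxed()
                .sorted(Comparator.reverseOrder()).toArray(Integer[]::new);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int coin : coins) {
            int count = amount / coin;   // how many of this coin fit in what is left
            result.put(coin, count);
            amount -= count * coin;      // reduce the remaining amount
        }
        return result;
    }

    public static void main(String[] args) {
        int[] us = {25, 10, 5, 1};                // quarters, dimes, nickels, pennies
        System.out.println(makeChange(73, us));   // {25=2, 10=2, 5=0, 1=3}
    }
}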
A greedy algorithm is not guaranteed to always generate the optimal solution. But the algorithm is generally easy to design and implement, and when it does generate the optimal solution it does so efficiently. Even when the greedy algorithm does not generate the optimal solution, the solution it produces is often close to optimal.
What happens when you use a different set of coins, say 1, 4, and 6 cent pieces?
Let's make change for 8 cents using the greedy algorithm.
The greedy approach gives you one 6-cent coin and two 1-cent coins, for a total of 3 coins, but the optimal solution is two coins (two 4-cent coins). So, for the change problem, whether the greedy algorithm finds the optimal solution depends on the coin set. Luckily for us, American coinage works great for this approach.
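Using the change-making sketch above, makeChange(8, new int[]{6, 4, 1}) returns {6=1, 4=0, 1=2}: three coins, even though two 4-cent coins suffice. The greedy rule has no way to back up and reconsider the 6-cent choice.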
Typically characters are encoded by fixed-length binary words; ASCII code uses 8 bits and Unicode uses 16 bits. Huffman encoding uses variable-length bit words to encode the characters: characters used more frequently get shorter code words. In this way the encoding of a string can be smaller than with fixed-length words.
The Huffman code uses a binary tree to describe the code. Each letter of the alphabet is located at an external node (leaf). The bit encoding of a letter is the path from the root to that letter, with a move to the left child generating a 0 and a move to the right child generating a 1. If we actually used the tree to encode text we would need an additional locator structure; normally one makes a lookup table and uses the tree only to construct/determine the code. The tree is a satisfactory structure for decoding.
[Figure: example Huffman coding tree]
We determine the frequency of each character and use the frequencies to prioritize the characters (initially each is a single-node tree). We repeatedly remove the two minimum-key trees from the queue and construct a binary tree whose key is the sum of the two removed keys. The constructed tree is put back into the queue.
Algorithm Huffman(X)
input: String X of length n with d distinct characters
output: coding tree for X
Compute the frequency function f.
Initialize an empty priority queue Q of trees.
for each distinct character c in X do
    Create a single-node binary tree T storing c.
    Insert T into Q with key f(c).
while Q.size() > 1 do
    f1 = Q.minKey()
    T1 = Q.removeMin()
    f2 = Q.minKey()
    T2 = Q.removeMin()
    Create a new binary tree T with left subtree T1 and right subtree T2.
    Insert T into Q with key f1 + f2.
return Q.removeMin()
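A minimal sketch of this construction in Java (the HuffmanNode class and its field names are mine, not from the notes), using java.util.PriorityQueue for Q:

import java.util.*;

class HuffmanNode {
    final char ch;              // meaningful only for leaves (external nodes)
    final int freq;             // key: frequency, or sum of the children's keys
    final HuffmanNode left, right;
    HuffmanNode(char ch, int freq) { this(ch, freq, null, null); }
    HuffmanNode(char ch, int freq, HuffmanNode left, HuffmanNode right) {
        this.ch = ch; this.freq = freq; this.left = left; this.right = right;
    }
    boolean isLeaf() { return left == null && right == null; }
}

class Huffman {
    static HuffmanNode buildTree(String x) {
        // Compute the frequency function f.
        Map<Character, Integer> f = new HashMap<>();
        for (char c : x.toCharArray()) f.merge(c, 1, Integer::sum);

        // Priority queue of single-node trees keyed by frequency.
        PriorityQueue<HuffmanNode> q =
                new PriorityQueue<>(Comparator.comparingInt((HuffmanNode n) -> n.freq));
        for (Map.Entry<Character, Integer> e : f.entrySet())
            q.add(new HuffmanNode(e.getKey(), e.getValue()));

        // Greedy step: repeatedly merge the two minimum-frequency trees.
        while (q.size() > 1) {
            HuffmanNode t1 = q.poll();
            HuffmanNode t2 = q.poll();
            q.add(new HuffmanNode('\0', t1.freq + t2.freq, t1, t2));
        }
        return q.poll();   // the complete Huffman coding tree
    }
}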
The algorithm is a good example of a Greedy Algorithm. Can you see how? Notice that the work done after selecting from the priority queue is different for almost every greedy algorithm; this is usually the tough part of writing the algorithm.
This is also a good example of using a compound structure, with varying-sized trees as elements of the priority queue. The complete Huffman tree is an optimal variable-length word encoding.
Running time is O(n + d lg d), where d is the alphabet size: O(d lg d) for constructing the tree and O(n) for computing the frequency function and encoding the message using a lookup table.
Although the Huffman tree is a suitable data structure for decoding a message, it is not a suitable data structure for encoding a message. How would you encode a message using a Huffman tree? For each character in the message we would have to perform a tree traversal, recording and updating the encoding string as we go, until we find the character we are searching for. What would be the cost?
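As a sketch of the usual workaround, the lookup table mentioned earlier (continuing the hypothetical HuffmanNode class from the construction sketch above): one traversal of the tree records the root-to-leaf path of each character, after which encoding a character is a table lookup rather than a tree search.

    // Build a character -> bit-string table with one traversal of the tree.
    static Map<Character, String> buildCodeTable(HuffmanNode root) {
        Map<Character, String> table = new HashMap<>();
        fill(root, "", table);
        return table;
    }

    private static void fill(HuffmanNode node, String path, Map<Character, String> table) {
        if (node == null) return;
        if (node.isLeaf()) {
            table.put(node.ch, path.isEmpty() ? "0" : path);  // edge case: one-character alphabet
            return;
        }
        fill(node.left, path + "0", table);    // moving to the left child generates a 0
        fill(node.right, path + "1", table);   // moving to the right child generates a 1
    }

With the table, encoding the whole message costs O(n) after an O(d) table construction, instead of an up-to-O(d) tree search for every one of the n characters.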