Data Structures and Algorithms - Old Questions
2. Describe the significance of Huffman tree. Describe procedure for construction of a Huffman tree. Illustrate it with example. Describe different types of applications of Binary trees.
Huffman
Coding is a technique of compressing data to reduce its size without losing any
of the details. It is generally useful to compress the data in which there
are frequently occurring characters.
Using
the Huffman tree, we can compress the string to a smaller size.
Procedure
for construction of Huffman tree
1.
Calculate the frequency of each character in the string.
2.
Sort the characters in increasing order of the frequency. These are stored in a
priority queue Q.
3.
Make each unique character as a leaf node.
4.
Create an empty node z. Assign the minimum frequency to the left
child of z and assign the second minimum frequency to the
right child of z. Set the value of the z as the
sum of the above two minimum frequencies.
5.
Remove these two minimum frequencies from Q and add the sum
into the list of frequencies.
6.
Insert node z into the tree.
7. Repeat
steps 3 to 5 for all the characters.
8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.
Example:
Let us take any four characters and their frequencies:
Now sort these characters according to their frequencies in non-decreasing order.
Here before using Huffman algorithm the total number of bits
required is
= 2*(6+3+2+1) = 24 bits.
The tree constructed for the above example is shown below:
Now from variable length code we get following code sequence.
Thus after using Huffman algorithm the total number of bits
required is
=1*3+2*3+3*2+6*1=21 bits
Applications of Binary trees
1. Binary Search Tree - Used in many search applications where data is constantly entering/leaving, such as the map and set objects in many languages' libraries.
2. Binary Space Partition - Used in almost every 3D video game to determine what objects need to be rendered.
3. Binary Tries - Used in almost every high-bandwidth router for storing router-tables.
4. Hash Trees - used in p2p programs and specialized image-signatures in which a hash needs to be verified, but the whole file is not available.
5. Heaps - Used in implementing efficient priority-queues, which in turn are used for scheduling processes in many operating systems, Quality-of-Service in routers, and A* (path-finding algorithm used in AI applications, including robotics and video games). Also used in heap-sort.
6. Huffman Coding Tree - Used in compression algorithms, such as those used by the .jpeg and .mp3 file-formats. GGM Trees - Used in cryptographic applications to generate a tree of pseudo-random numbers.
7. Syntax Tree - Constructed by compilers and (implicitly) calculators to parse expressions.
8. Treap - Randomized data structure used in wireless networking and memory allocation.
9. T-tree - Though most databases use some form of B-tree to store data on the drive, databases which keep all (most) their data in memory often use T-trees to do so.