Huffman Coding Algorithm Tutorial
Dr Arun Kumar
PhD (Computer Science)Table of Contents

Introduction
 Definition

Steps of Huffman Coding Algorithm
 Frequency Calculation
 Priority Queue
 Building the Huffman Tree
 Code Assignment

Time and Space Complexity
 Time Complexity
 Space Complexity

Limitations
 Not Adaptive
 Not Optimal for Small Data Sets
 FixedLength Encoding

Huffman Coding Example using Python
 Problem Statement

Python Implementation
 Importing Libraries and Defining the Huffman Node Class
heapq
Counter
HuffmanNode
Class
 Building the Huffman Tree
 Frequency Calculation
 Priority Queue Initialization
 Building the Tree
 Assigning Codes to Characters
 Base Case
 Leaf Node
 Recursive Case
 Encoding the Text
 Build the Huffman Tree
 Assign Codes
 Encode the Text
 Return
 Importing Libraries and Defining the Huffman Node Class

Example Usage
 Input Text
 Output
 Encoded Text
 Huffman Codes

Detailed Explanation with Example
 Example Text: "abc"
 Building the Huffman Tree
 Initial Heap
 Merging Nodes
 Final Tree
 Assigning Codes
 Encoding the Text
 Building the Huffman Tree
 Example Text: "abc"
Introduction to Huffman Coding  A Greedy Algorithm
Huffman Coding is a widely used algorithm for lossless data compression. It assigns variablelength codes to input characters, with shorter codes assigned to more frequent characters. The goal is to minimize the total number of bits used to represent the input data.
Steps
Frequency Calculation:
 Calculate the frequency of each character in the input data.
Priority Queue:
 Create a priority queue (minheap) where each node is a character and its frequency.
Building the Huffman Tree:
 While there is more than one node in the queue:
 Extract the two nodes with the lowest frequency.
 Create a new internal node with these two nodes as children and a frequency equal to the sum of their frequencies.
 Insert the new node back into the priority queue.
 The remaining node in the queue is the root of the Huffman Tree.
Code Assignment:
 Traverse the Huffman Tree from the root to the leaves, assigning a binary code to each character. Assign '0' for a left edge and '1' for a right edge.
Time and Space Complexity
Time Complexity:
 Building the frequency table: O(n)O(n)O(n)
 Building the priority queue: O(nlogn)O(n \log n)O(nlogn)
 Building the Huffman Tree: O(nlogn)O(n \log n)O(nlogn)
 Total: O(nlogn)O(n \log n)O(nlogn)
Space Complexity:
 Storing the frequency table: O(n)O(n)O(n)
 Storing the Huffman Tree: O(n)O(n)O(n)
 Storing the codes: O(n)O(n)O(n)
 Total: O(n)O(n)O(n)
Limitations
 Not Adaptive: Huffman Coding is not adaptive and requires a complete pass through the data to build the frequency table.
 Not Optimal for Small Data Sets: For very small data sets, the overhead of storing the Huffman Tree may outweigh the compression benefits.
 FixedLength Encoding: In cases where all characters are equally frequent, Huffman Coding does not provide any compression advantage over fixedlength encoding.
Huffman Coding example using Python
Problem Statement: Given a string, compress it using Huffman Coding and output the encoded binary string. Also, provide the Huffman codes for each character.
1. Importing Libraries and Defining the Huffman Node Class
First, we import the necessary libraries and define the HuffmanNode class.
import heapq
from collections import defaultdict, Counter
class HuffmanNode:
def __init__(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def __lt__(self, other):
return self.freq < other.freq
 heapq: This is used to implement a priority queue.
 Counter: This is used to count the frequency of characters in the text.
 HuffmanNode Class: This class represents a node in the Huffman Tree.
 __init__: Initializes a node with a character and its frequency. left and right are set to None initially.
 __lt__: Defines the lessthan operator to compare nodes based on frequency. This is necessary for the priority queue to work correctly.
2. Building the Huffman Tree
Next, we define the function to build the Huffman Tree.
def build_huffman_tree(text):
frequency = Counter(text)
priority_queue = [HuffmanNode(char, freq) for char, freq in frequency.items()]
heapq.heapify(priority_queue)
while len(priority_queue) > 1:
left = heapq.heappop(priority_queue)
right = heapq.heappop(priority_queue)
merged = HuffmanNode(None, left.freq + right.freq)
merged.left = left
merged.right = right
heapq.heappush(priority_queue, merged)
return priority_queue[0]
 Frequency Calculation: Counter(text) creates a dictionary where the keys are characters and the values are their frequencies.
 Priority Queue Initialization: We create a list of HuffmanNode objects for each characterfrequency pair and convert it into a heap using heapq.heapify().
 Building the Tree:
 We repeatedly pop the two nodes with the smallest frequencies from the heap.
 We create a new HuffmanNode with these two nodes as children and a frequency equal to their sum.
 We push the new node back into the heap.
 This process continues until there is only one node left in the heap, which is the root of the Huffman Tree.
3. Assigning Codes to Characters
Now, we define the function to assign Huffman codes to each character by traversing the tree.
def assign_codes(node, current_code, codes):
if node is None:
return
if node.char is not None:
codes[node.char] = current_code
assign_codes(node.left, current_code + '0', codes)
assign_codes(node.right, current_code + '1', codes)
 Base Case: If the current node is None, we return.
 Leaf Node: If the current node has a character (node.char is not None), we assign the current code to this character.
 Recursive Case: We recursively call the function for the left and right children, appending '0' for the left edge and '1' for the right edge.
4. Encoding the Text
Finally, we define the function to encode the input text using the Huffman Tree.
def huffman_encoding(text):
root = build_huffman_tree(text)
codes = {}
assign_codes(root, '', codes)
encoded_text = ''.join(codes[char] for char in text)
return encoded_text, codes
 Build the Huffman Tree: We call build_huffman_tree to get the root of the tree.
 Assign Codes: We call assign_codes to get the Huffman codes for each character.
 Encode the Text: We create the encoded text by replacing each character in the input text with its corresponding Huffman code.
 Return: We return the encoded text and the dictionary of codes.
Example Usage
Here's how you can use the above functions in a complete program:
if __name__ == "__main__":
text = "huffman coding example"
encoded_text, codes = huffman_encoding(text)
print("Encoded Text:", encoded_text)
print("Huffman Codes:", codes)
 Input Text: "huffman coding example"
 Output:
 Encoded Text: The binary representation of the input string.
 Huffman Codes: A dictionary with characters as keys and their corresponding Huffman codes as values.
Detailed Explanation with Example
Let's consider the text "abc" with frequencies:
 a: 1
 b: 1
 c: 1
Building the Huffman Tree:
 Initial heap: [(1, 'a'), (1, 'b'), (1, 'c')]
 Pop a and b, create new node with frequency 2.
 Heap after first merge: [(1, 'c'), (2, None)]
 Pop c and new node, create root node with frequency 3.
 Final tree:
Assigning Codes:
 c: 0
 a: 10
 b: 11
Encoding:
 Text "abc" becomes "10110"
This is a simplified example, but it shows how the algorithm works in practice. The actual frequencies and codes will depend on the input text.
*
/ \
c *
/ \
a b
Related Questions
What is the Huffman coding algorithm?
Huffman coding is a technique used for data compression. It assigns variablelength codes to input characters, with shorter codes assigned to more frequent characters, to reduce the overall size of the data.
What are the two types of Huffman coding?
There are two main types of Huffman coding: Static Huffman Coding: In this type, the code assignments are fixed and do not change over time. Dynamic Huffman Coding (also known as Adaptive Huffman Coding): This type allows the code assignments to adapt based on the input data's frequency distribution, resulting in potentially more efficient encoding for varying data.
Is Huffman coding a Greedy algorithm?
Yes, Huffman coding is a Greedy algorithm. It makes locally optimal choices at each step with the hope of finding a global optimum solution.
What is the disadvantage of Huffman algorithm?
One disadvantage of Huffman coding is that it requires the sender and receiver to have the same static Huffman tree for encoding and decoding. If the trees differ, the decoding will produce incorrect results.
What is better than Huffman coding algorithm?
There isn't a single algorithm that is universally better than Huffman coding in all scenarios. Different data characteristics and requirements may lead to different compression algorithms being more suitable. Some alternatives to Huffman coding include Arithmetic coding, LempelZiv algorithms (like LZ77 and LZ78), and RunLength Encoding (RLE). Each of these has its strengths and weaknesses depending on the specific use case.
Related Post
Research Design and Methodology in depth Tutorial
This guide provides an indepth overview of the essential aspects of research design and methodology.
How to Conduct a Literature Review in Research
This guide serves as a detailed roadmap for conducting a literature review, helping researchers navigate each stage of the process and ensuring a thorough and methodologically sound review.
How to Formulate and Test Hypotheses in Research
Here’s a stepbystep guide, illustrated with an example, to help understand how to formulate and test hypotheses using statistics.
Difference between Qualitative and Quantitative Research with Example
Research methodologies can be broadly categorized into qualitative and quantitative approaches. This article explores these differences using an example, including the use of statistics.