Huffman Coding Algorithm Tutorial

Dr Arun Kumar

PhD (Computer Science)

Share Facebook Linkedin Twitter

Table of Index

Introduction to Huffman Coding - A Greedy Algorithm
Steps Involved
Frequency Calculation:
Priority Queue:
Building the Huffman Tree:
Code Assignment:
Time and Space Complexity
Time Complexity:
Space Complexity:
Limitations of Huffman Coding Algorithm
Not Adaptive:
Not Optimal for Small Data Sets:
Fixed-Length Encoding:
Detailed Explanation with Example

Step by Step Example

1. Importing Libraries and Defining the Huffman Node Class
2. Building the Huffman Tree
3. Assigning Codes to Characters
4. Encoding the Text
Example Usage

Frequently Asked Questions

What is the Huffman coding algorithm?
What are the two types of Huffman coding?
Is Huffman coding a Greedy algorithm?
What is the disadvantage of Huffman algorithm?
What is better than Huffman coding algorithm?

Introduction to Huffman Coding - A Greedy Algorithm

Huffman Coding is a widely used algorithm for lossless data compression. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters. The goal is to minimize the total number of bits used to represent the input data.

Steps Involved

Frequency Calculation:

Calculate the frequency of each character in the input data.

Priority Queue:

Create a priority queue (min-heap) where each node is a character and its frequency.

Building the Huffman Tree:

While there is more than one node in the queue:
- Extract the two nodes with the lowest frequency.
- Create a new internal node with these two nodes as children and a frequency equal to the sum of their frequencies.
- Insert the new node back into the priority queue.
The remaining node in the queue is the root of the Huffman Tree.

Code Assignment:

Traverse the Huffman Tree from the root to the leaves, assigning a binary code to each character. Assign '0' for a left edge and '1' for a right edge.

Time and Space Complexity

Time Complexity:

Building the frequency table: O(n)O(n)O(n)
Building the priority queue: O(nlog⁡n)O(n \log n)O(nlogn)
Building the Huffman Tree: O(nlog⁡n)O(n \log n)O(nlogn)
Total: O(nlog⁡n)O(n \log n)O(nlogn)

Space Complexity:

Storing the frequency table: O(n)O(n)O(n)
Storing the Huffman Tree: O(n)O(n)O(n)
Storing the codes: O(n)O(n)O(n)
Total: O(n)O(n)O(n)

Limitations of Huffman Coding Algorithm

Not Adaptive:
Huffman Coding is not adaptive and requires a complete pass through the data to build the frequency table.
Not Optimal for Small Data Sets:
For very small data sets, the overhead of storing the Huffman Tree may outweigh the compression benefits.
Fixed-Length Encoding:
In cases where all characters are equally frequent, Huffman Coding does not provide any compression advantage over fixed-length encoding.

Detailed Explanation with Example

Let's consider the text "abc" with frequencies:

a: 1
b: 1
c: 1

Building the Huffman Tree:

Initial heap: [(1, 'a'), (1, 'b'), (1, 'c')]
Pop a and b, create new node with frequency 2.
Heap after first merge: [(1, 'c'), (2, None)]
Pop c and new node, create root node with frequency 3.
Final tree:

Assigning Codes:

c: 0
a: 10
b: 11

Encoding:

Text "abc" becomes "10110"

This is a simplified example, but it shows how the algorithm works in practice. The actual frequencies and codes will depend on the input text.

    *
   / \
  c   *
     / \
    a   b

Step By Step Example

1. Importing Libraries and Defining the Huffman Node Class

First, we import the necessary libraries and define the HuffmanNode class.

import heapq
from collections import defaultdict, Counter

class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None
    
    def __lt__(self, other):
        return self.freq < other.freq

heapq: This is used to implement a priority queue.
Counter: This is used to count the frequency of characters in the text.
HuffmanNode Class: This class represents a node in the Huffman Tree.
- __init__: Initializes a node with a character and its frequency. left and right are set to None initially.
- __lt__: Defines the less-than operator to compare nodes based on frequency. This is necessary for the priority queue to work correctly.

2. Building the Huffman Tree

Next, we define the function to build the Huffman Tree.

def build_huffman_tree(text):
    frequency = Counter(text)
    priority_queue = [HuffmanNode(char, freq) for char, freq in frequency.items()]
    heapq.heapify(priority_queue)
    
    while len(priority_queue) > 1:
        left = heapq.heappop(priority_queue)
        right = heapq.heappop(priority_queue)
        merged = HuffmanNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(priority_queue, merged)
    
    return priority_queue[0]

Frequency Calculation: Counter(text) creates a dictionary where the keys are characters and the values are their frequencies.
Priority Queue Initialization: We create a list of HuffmanNode objects for each character-frequency pair and convert it into a heap using heapq.heapify().
Building the Tree:
- We repeatedly pop the two nodes with the smallest frequencies from the heap.
- We create a new HuffmanNode with these two nodes as children and a frequency equal to their sum.
- We push the new node back into the heap.
- This process continues until there is only one node left in the heap, which is the root of the Huffman Tree.

3. Assigning Codes to Characters

Now, we define the function to assign Huffman codes to each character by traversing the tree.

def assign_codes(node, current_code, codes):
    if node is None:
        return
    if node.char is not None:
        codes[node.char] = current_code
    assign_codes(node.left, current_code + '0', codes)
    assign_codes(node.right, current_code + '1', codes)

Base Case: If the current node is None, we return.
Leaf Node: If the current node has a character (node.char is not None), we assign the current code to this character.
Recursive Case: We recursively call the function for the left and right children, appending '0' for the left edge and '1' for the right edge.

4. Encoding the Text

Finally, we define the function to encode the input text using the Huffman Tree.

def huffman_encoding(text):
    root = build_huffman_tree(text)
    codes = {}
    assign_codes(root, '', codes)
    encoded_text = ''.join(codes[char] for char in text)
    return encoded_text, codes

Build the Huffman Tree: We call build_huffman_tree to get the root of the tree.
Assign Codes: We call assign_codes to get the Huffman codes for each character.
Encode the Text: We create the encoded text by replacing each character in the input text with its corresponding Huffman code.
Return: We return the encoded text and the dictionary of codes.

Example Usage

Here's how you can use the above functions in a complete program:

if __name__ == "__main__":
    text = "huffman coding example"
    encoded_text, codes = huffman_encoding(text)
    print("Encoded Text:", encoded_text)
    print("Huffman Codes:", codes)

Input Text: "huffman coding example"
Output:
- Encoded Text: The binary representation of the input string.
- Huffman Codes: A dictionary with characters as keys and their corresponding Huffman codes as values.

What is the Huffman coding algorithm?

Huffman coding is a technique used for data compression. It assigns variable-length codes to input characters, with shorter codes assigned to more frequent characters, to reduce the overall size of the data.

What are the two types of Huffman coding?

There are two main types of Huffman coding: Static Huffman Coding: In this type, the code assignments are fixed and do not change over time. Dynamic Huffman Coding (also known as Adaptive Huffman Coding): This type allows the code assignments to adapt based on the input data's frequency distribution, resulting in potentially more efficient encoding for varying data.

Is Huffman coding a Greedy algorithm?

Yes, Huffman coding is a Greedy algorithm. It makes locally optimal choices at each step with the hope of finding a global optimum solution.

What is the disadvantage of Huffman algorithm?

One disadvantage of Huffman coding is that it requires the sender and receiver to have the same static Huffman tree for encoding and decoding. If the trees differ, the decoding will produce incorrect results.

What is better than Huffman coding algorithm?

There isn't a single algorithm that is universally better than Huffman coding in all scenarios. Different data characteristics and requirements may lead to different compression algorithms being more suitable. Some alternatives to Huffman coding include Arithmetic coding, Lempel-Ziv algorithms (like LZ77 and LZ78), and Run-Length Encoding (RLE). Each of these has its strengths and weaknesses depending on the specific use case.

huffman code , huffman coding algorithm , huffman code example , huffman encoder , huffman tree , adaptive huffman coding , dynamic huffman coding , huffman algorithm in data structure , huffman algorithm java , huffman code in python , huffman code java , huffman coding algorithm example , huffman coding c program , huffman coding c++ , huffman coding calculator , huffman coding compression , huffman coding data structure , huffman coding greedy algorithm , huffman coding in c , huffman coding in digital image processing , huffman coding leetcode , huffman compression , huffman python , shannon fano coding example , adaptive huffman codes , adaptive huffman coding example , adaptive huffman coding online , adaptive huffman coding python , c++ program for huffman coding , decode huffman code python , file compression using huffman algorithm in c++ , huffman algorithm python , huffman coding c++ github , huffman coding github , huffman coding image compression python , huffman coding javascript , huffman coding matlab github , huffman coding online , huffman coding program , huffman coding program in python , huffman coding python code , huffman coding python heapq , huffman coding steps , huffman coding using python , huffman encoding and decoding python , huffman python code , image compression using huffman coding in python , python code for huffman coding , python huffman , shannon fano algorithm python , shannon fano python , use huffman coding to encode these symbols with given frequencies , shannon fano c++ ,

Prim's Algorithm: Understanding Minimum Spanning Trees

Prim's Algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) of a weighted, undirected graph.

Author: Dr Arun Kumar 2024-12-09 16:40:23

7 Minutes