Huffman Coding : A Lossless Algorithm for Data Compression

Huffman Coding : A Lossless Algorithm for Data Compression

Hello connections I hope you all heard about Huffman Coding which is a well known lossless data compression algorithm . When it comes to data compression we all have used image compression tools or video ,audio or any file compression tools in our day to day life . But have you ever wonder how those compression works ? Let's understand a Algorithm which helps us to compress our data :

Introduction :

Huffman coding is a lossless algorithm which is used for data compression. So first of all what is a lossless algorithm ? Whenever we compress our data using some algorithm there may be a chance of loss of some data and that is known as loss-full algorithms. So the lossless algorithms do the opposite as lossfull algorithm and does not loss any data during compression.

How Huffman Coding Works to Compress Data ?

It works in several steps to compress our data . Let's discuss it step by step :

Step-1 : Creation of Frequency of the input

In first step we will create a frequency map to store the frequency of all the characters and symbol of the input .

For Example : If our input is "huffman" then map will be like {h:1 , u:1 ,f:2 ,m:2 ,a:2 , n:1} . Frequency of the characters represents the number of time each character appears in the input


Article content

Step-2 : Creation of Node using Each key of the frequency Map


Article content

Create Node class which will contain characters and their frequency and the left node and right node .Using these node Huffman Tree will be created .

Then we need to create a Priority Queue(Min-Heap) or we can implement our custom min heap to store the nodes.

If we consider the previous example the Priority Queue will contains each character and it's frequency and left and right node will be null initially.

Step-3 : Remove each two nodes from the Priority Queue(Min-Heap) and combine them and insert it into again in the Priority Queue.

In this step we will remove two smallest node from the min heap and then combine them and then insert again to the min heap .After combination we have to create a node which will contain no name ,sum of their sequence and left node as the first node that have removed from the priority queue and right node as the second node that have removed from the queue.

Then we will process all the elements of the heap.


Article content


Article content


Article content


Article content


Article content


Step -4 : Assign Codes to the Huffman Tree

Create Encoder and Decoder Map . Encoder map will be of Character and String type and Decoder will be it's reverse which is String and Character

In this step we will traverse the Priority Queue which we can say Huffman Tree and assign '0' to the left node and '1' to the right node of the tree and store in the encoder and decoder map.


Article content

Step-5 : Create Encoder

Then we will create our encoder which will take input and by retrieving the input we can form the encoded string . for the above example "huffman" will look like "00000101100100111" after encoding.

Step-6 : Create Decoder

Then we will create Decoder which which will take the encoded string as input and retrieve the characters from encoded string and form key by combining character which is unique for each character in the original input and then search in the decoder map for the key and combine them to form the result as original input.


Article content
Note : The above images that describes the steps to encode and decode a string using Huffman coding may not have similar output as yours .It is just a dry run to understand how it works .But the process is same .Output might differ because of the removal from the priority queue because many numbers are having similar value so priority queue might remove different element. So Nothing to worry if your encoded string is different from this.

So here we understand The process how We can encode and decode a input using Huffman coding.

Click Here For Code

Encoder and Decoder Method using BitSet :

The above code was about storing the encoded value as a String .If we are processing the input and encoding it using String it will be performance issue as every bit will be stored as a single character so there may be a chance of memory inefficiency . Also Operations on Strings, such as concatenation and indexing, are generally slower than operations on BitSet, which are optimized for bit manipulation.

BitSet is a data structure in Java that represents a sequence of bits (binary digits). It allows for efficient storage and manipulation of bits, making it ideal for scenarios where you need to manage large amounts of boolean data, such as in algorithms for compression, cryptography, and more.

Now optimize our encoder and decoder process using BitSet :

While encoding We will create a BitSet and initialize it to Store the encoded data and while decoding we can retrieve the encoded BitSet to generate the key and find the character from the encoder map and build our original String .

Conclusion :

In this article, we’ve explored Huffman coding, a powerful lossless data compression algorithm widely used in various applications. By compressing data without losing any information, Huffman coding optimizes storage and transmission, making it invaluable in our digital world.

Summary

Huffman Coding Basics: We discussed what Huffman coding is and how it works, breaking down the process into several clear steps—from calculating character frequencies to building a Huffman tree and generating binary codes.

Encoding and Decoding: We detailed the encoding and decoding processes using both traditional string representations and the more efficient BitSet approach. The use of BitSet enhances performance and reduces memory overhead, making it a preferred choice for handling binary data in compression algorithms.

Applications of Huffman Coding:

Huffman coding is used in various applications, including file compression (ZIP, Gzip), image compression (JPEG), video compression (MPEG), text compression, data transmission in network protocols, and multimedia formats (MP3).

By leveraging Huffman coding, developers can create more efficient systems that save storage space and improve performance across various applications. Its versatility and efficiency make it a fundamental technique in the field of data compression.

I would love to hear your thoughts on this article! If you have any feedback or suggestions for improvement, please feel free to share.

Code Reference

For those interested in the code implementation of Huffman coding, you can find it in my GitHub repository: GitHub Repo Link. Your contributions and suggestions are welcome!


Do you think Huffman could be re-built to ensure it returns an optimal balanced Huffman tree?

To view or add a comment, sign in

More articles by Akash Das

Others also viewed

Explore content categories