The JPEG Compression Algorithm

Michael Ellis

Published Oct 12, 2015

A *.jpg image is based on a variation of the 2-Dimensional Fourier Transform. A 2-Dimensional Fourier Transform is required for image files since images are 2-D. The 2-D transform is equivalent to taking the l-D transform of each row, and then taking the l-D transform of each column, for an image matrix.

The problem with the Fourier Transform is that it leaves us with a real component and an imaginary component for each pixel in the image. By symmetrically extending the image before taking the 2-D Fourier transform, the imaginary, or sine, terms can be forced to zero.

The images used in this post are raw 8-bit grayscale images.

The Discrete Cosine Transform (DCT)

The DCT is simply a separate mathematical method for generating the

Cosine Transform without explicitly using the Fourier Transform. It is

based on defining a matrix [C] such that

If an image is divided into blocks of size 8x8, then the matrix C must also be 8x8 and becomes

If a matrix U represents the pixels in an 8x8 block of an image, then the DCT is define by the equation

[V] = [C] [U] [CT]

using matrix multiplication and [CT] as the transpose of matrix [C].

[V] becomes a sparse matrix with few large coefficients. For example,

which represents the first 8x8 sub block of a grayscale test file. The actual transform file, V, is rounded to integer values and becomes

which contains a large number of zeroes, but retains almost all of the information in the original sub image [U]. Entropy analysis of the original matrix [U] reveals that a compression of 1.77 to 1 is possible using Huffman encoding. However, a compression of 25.6 to 1 is possible using Huffman encoding of the matrix [V].

This method is the basis of the Joint Photographic Expert Group (JPEG) algorithm in which images are divided into 8x8 sub blocks; each sub block is processed independently by a DCT algorithm, and then the resulting file is Huffman encoded.

A compression scheme using the DCT can be summarized in the following steps:

Divide the image into 8x8 blocks.
Perform the DCT on each 8x8 block.
Replace values close to zero with zero.
Huffman encode the resulting file.

Decompression involves the following:

Huffman decode the compressed file.
Perform the inverse DCT on each 8x8 block.

The entire test image was compressed at a rate of 25:1.

The JPEG algorithm is basically an 8x8 DCT with some enhancements. JPEG is generally considered feasible for compression of still images in the 10:1 to 25:1 range.

Another standard, Motion Picture Expert Group (MPEG), is used for motion video and exploits the images differences frame-by-frame. MPEG is considered feasible for the compression of real time video images at the rate of 50:1.

The JPEG Compression Algorithm

Michael Ellis

The Discrete Cosine Transform (DCT)

More articles by Michael Ellis

Others also viewed

🚀 The Future of LLMs: Unlocking Massive Context with Visual Compression

Why is Mamba creating waves? Is it a replacement for transformers?

How does Retrieval-Based Speculative Decoding Improve RAG Performance

Translate webtoon (Korean comic) using machine learning - Part 2

Paper Review: NeoBERT: A Next-Generation BERT

Today's best foundation models vs the NY Times Connections Puzzle. Who will win?

Leveraging VGG16 and Nearest Neighbors for Efficient Image Classification and Similarity Retrieval: A case study on Outdoor Place Recognition

Facial Recognition Using VGG16

BitNet: Scaling 1-bit Transformers

Zero to 3D: Building the Neural Code Graph in < 12 Hours

Explore content categories

The Discrete Cosine Transform (DCT)

More articles by Michael Ellis

Switched Capacitor Simulation

Edison versus Tesla: Who will win?

Fourier Transform leads to Bessel Function

The Transmultiplexer: Polyphase Revisited

Test your Bipolar Transistor Knowledge

Simulation Program with Integrated Circuit Emphasis (Spice Refresher)

Spread Spectrum Tutorial, FHSS and DSSS

Fiber Optic Transmission, Maximum Possible Speed

S-Parameter Conversion Tool

Spectrum Analyzer Tutorial

Others also viewed

🚀 The Future of LLMs: Unlocking Massive Context with Visual Compression

Why is Mamba creating waves? Is it a replacement for transformers?

How does Retrieval-Based Speculative Decoding Improve RAG Performance

Translate webtoon (Korean comic) using machine learning - Part 2

Paper Review: NeoBERT: A Next-Generation BERT

Today's best foundation models vs the NY Times Connections Puzzle. Who will win?

Leveraging VGG16 and Nearest Neighbors for Efficient Image Classification and Similarity Retrieval: A case study on Outdoor Place Recognition

Facial Recognition Using VGG16

BitNet: Scaling 1-bit Transformers

Zero to 3D: Building the Neural Code Graph in < 12 Hours

Explore content categories