Wavelets in Audio & Computer Vision
In computer vision, there are two domains in which we can work: the Spatial Domain and the Frequency Domain.
The Spatial Domain is used by all of us every day. When we increase the contrast, or increase or decrease the brightness, on our phone, TV or monitor, we are doing a spatial transformation at the pixel level. A matrix (Figure 2) is the computer representation of an image. A filter, or kernel (Figure 3), passes over the array and executes an array operation. All those filters used on Instagram or other social media are doing this type of operation: Spatial Filtering at the Pixel Level.
Spatial Filtering, or Spatial Transformation, is done at the pixel level. It is fast and intuitive, and it is very much in use. Convolutional Neural Networks, or CNNs, a very popular type of deep learning network used for image classification, perform a convolution operation between a filter/kernel and the image array. It is used to extract features from the image. This is also done in the Spatial Domain. Here is a graphical explanation.
That is why it is called the convolution operation in a Convolutional Neural Network, or CNN.
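To make the convolution operation concrete, here is a minimal sketch in NumPy of a kernel sliding over an image array (valid padding, stride 1). The image values and the edge-detection kernel are illustrative, not taken from any specific network.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise
    products at each position (valid padding, stride 1).
    Note: like most CNN libraries, this is technically
    cross-correlation, i.e. the kernel is not flipped."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A toy 4x4 "image" (illustrative values).
image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2],
                  [1, 1, 1, 1]], dtype=float)

# A 3x3 Laplacian-like edge-detection kernel, chosen for illustration.
kernel = np.array([[0, -1,  0],
                   [-1, 4, -1],
                   [0, -1,  0]], dtype=float)

print(conv2d(image, kernel))  # a 2x2 feature map
```

Each output pixel summarizes a local neighbourhood of the input, which is exactly how a CNN extracts features in the Spatial Domain.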
But this article is about the Frequency Domain. So what is the Frequency Domain? We reach the Frequency Domain when an image is decomposed into its frequencies. We do this, for example, with the Fourier Transform, which decomposes an image into its sine and cosine components. The output of the transformation represents the image in the Frequency Domain. In the Fourier-transformed image, each point represents a particular frequency contained in the spatial-domain image. The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression.
Here below, an original image is decomposed into the Frequency Domain with the Fourier Transform.
The advantage of the Fourier Transform (FT) is that, once an image is decomposed into its sinusoidal components, it is easy to examine or process certain frequencies, which correspond to geometric structures in the spatial domain.
The image can then be processed back to its original form with the Inverse Fourier Transform.
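A minimal round-trip sketch with NumPy's FFT routines: transform a small synthetic image to the Frequency Domain and recover it with the inverse transform. The sinusoidal test image is an assumption made just for this example.

```python
import numpy as np

# A small synthetic "image": a 2-D sinusoidal pattern.
x = np.arange(64)
image = np.sin(2 * np.pi * 4 * x / 64)[None, :] * np.ones((64, 1))

# Forward transform: Spatial Domain -> Frequency Domain.
spectrum = np.fft.fft2(image)

# Each point of the (shifted) magnitude spectrum is the strength
# of one spatial frequency contained in the image.
magnitude = np.abs(np.fft.fftshift(spectrum))

# The inverse transform recovers the original image
# (up to floating-point error).
reconstructed = np.fft.ifft2(spectrum).real
print(np.allclose(image, reconstructed))  # True
```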
In the title of this article we mention the word Wavelet, which is a "new" form of transforming an image from the Spatial Domain into the Frequency Domain.
Let me give an example to picture the difference between the FT and Wavelets.
Let’s assume that we want to decompose the picture of a traffic light.
The traffic light has 3 colours: red, yellow and green, as we all know.
The Fourier Transform will provide the exact frequencies for those 3 colours. But with one big catch: no temporal or spatial information. Whether the lights are red, green and yellow at the same time or one after another, the FT output will be exactly the same.
Because of a principle analogous to Heisenberg's Uncertainty Principle, it is not possible to know both pieces of information exactly at the same time: in this case, frequency and temporal/spatial location. This poses a very big problem for certain types of problems.
Let's see some code for the Fourier Transform showing this lack of information about location in time.
import numpy as np
import matplotlib.pyplot as plt

def get_fft_values(y_values, T, N, f_s):
    # Helper assumed by the original snippet: one-sided FFT spectrum.
    f_values = np.linspace(0.0, 1.0 / (2.0 * T), N // 2)
    fft_values = 2.0 / N * np.abs(np.fft.fft(y_values)[0:N // 2])
    return f_values, fft_values

t_n = 1
N = 100000
T = t_n / N
f_s = 1 / T

xa = np.linspace(0, t_n, num=N)
xb = np.linspace(0, t_n / 4, num=N // 4)

frequencies = [4, 30, 60, 90]
y1a, y1b = np.sin(2*np.pi*frequencies[0]*xa), np.sin(2*np.pi*frequencies[0]*xb)
y2a, y2b = np.sin(2*np.pi*frequencies[1]*xa), np.sin(2*np.pi*frequencies[1]*xb)
y3a, y3b = np.sin(2*np.pi*frequencies[2]*xa), np.sin(2*np.pi*frequencies[2]*xb)
y4a, y4b = np.sin(2*np.pi*frequencies[3]*xa), np.sin(2*np.pi*frequencies[3]*xb)

# Signal 1: all four frequencies present the whole time.
composite_signal1 = y1a + y2a + y3a + y4a
# Signal 2: the same four frequencies, but one after another.
composite_signal2 = np.concatenate([y1b, y2b, y3b, y4b])

# Both spectra show the same four peaks: the FT cannot tell
# whether the frequencies occurred together or in sequence.
f_values1, fft_values1 = get_fft_values(composite_signal1, T, N, f_s)
f_values2, fft_values2 = get_fft_values(composite_signal2, T, N, f_s)

fig, axarr = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axarr[0, 0].plot(xa, composite_signal1)
axarr[1, 0].plot(xa, composite_signal2)
axarr[0, 1].plot(f_values1, fft_values1)
axarr[1, 1].plot(f_values2, fft_values2)
plt.tight_layout()
plt.show()
This is where Wavelets are different.
The Wavelet Transform sacrifices some information on the frequency side but gains information on the temporal side. And this is a great advantage in those situations where temporal information is important.
But what is a wavelet?
A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation", like one recorded by a heart monitor. Here below, a wavelet, followed by an original image and the same image processed with Wavelets.
Wavelets are the basis functions of a relatively new type of mathematical transform (since the early 1980s) that occupies a space somewhere between the spatial domain of image pixels and the frequency (Fourier) domain of spatial frequency components.
Instead of using pure cosine and sine waves, as Fourier does, wavelet functions are scaled and shifted versions of a common mother wavelet, or family wavelet shape. In that sense, they are simpler than Fourier functions. Usually, wavelets are scaled in generations: in 2-D, each parent has 4 children, and each child is half the size (in length) of its parent. Each child then has 4 grandchildren, and so on, for typically 4 to 8 generations.
The simplest wavelet transform is the Haar Transform.
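A one-level 1-D Haar transform is simple enough to write by hand: pairwise averages give the approximation (low frequencies) and pairwise differences give the detail (high frequencies). This is a minimal sketch; the example signal is illustrative.

```python
import numpy as np

def haar_step(signal):
    """One level of the Haar transform: pairwise averages
    (approximation) and pairwise differences (detail),
    scaled by sqrt(2) so the transform is orthonormal."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar step exactly."""
    s = np.empty(2 * len(approx))
    s[0::2] = (approx + detail) / np.sqrt(2)
    s[1::2] = (approx - detail) / np.sqrt(2)
    return s

signal = [4, 6, 10, 12, 8, 6, 5, 5]
a, d = haar_step(signal)
print(a)  # smoothed, half-length version of the signal
print(d)  # local differences (edges)
print(np.allclose(haar_inverse(a, d), signal))  # True
```

Repeating `haar_step` on the approximation gives the next "generation" of coefficients, which is exactly the parent/child scaling described above.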
In other words, the Discrete Wavelet Transform (DWT) and the Continuous Wavelet Transform (CWT) are mathematical means of performing signal analysis when the signal frequency varies over time. For certain classes of signals and images, wavelet analysis provides more precise information about the signal than other signal analysis techniques. It is also useful for retrieving weak signals from noise in an image. (I will write another article about noise in images.)
The Inverse Discrete Wavelet Transform, like the Inverse Discrete Fourier Transform, rebuilds an image from the sum of a large number of wavelet functions, back in the Spatial Domain.
If you are interested, you can take a look at all the types of wavelets in PyWavelets, the Python library that has all the best Wavelet Transforms ready to use in Machine Learning.
import pywt  # PyWavelets library in Python
print(pywt.families(short=False))
['Haar', 'Daubechies', 'Symlets', 'Coiflets', 'Biorthogonal', 'Reverse biorthogonal',
 'Discrete Meyer (FIR Approximation)', 'Gaussian', 'Mexican hat wavelet', 'Morlet wavelet',
 'Complex Gaussian wavelets', 'Shannon wavelets', 'Frequency B-Spline wavelets', 'Complex Morlet wavelets']
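A minimal sketch of PyWavelets in action: a single-level DWT with the Haar wavelet, followed by an exact reconstruction with the inverse transform. The input signal is just a toy example.

```python
import numpy as np
import pywt  # PyWavelets

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Single-level DWT with the Haar wavelet: approximation (cA)
# and detail (cD) coefficients, each half the signal length.
cA, cD = pywt.dwt(signal, 'haar')

# The Inverse DWT rebuilds the original signal.
reconstructed = pywt.idwt(cA, cD, 'haar')
print(np.allclose(reconstructed, signal))  # True
```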
This is a diagram of the Discrete Wavelet Transform when it receives an input signal.
a) A tree of real filters for the DWT and b) the reconstruction filter block for 2 bands at a time, using the Inverse Transform.
A CNN WITH WAVELETS (For Sound/Signal Classification)
Here below is the image of a scaleogram. Such a scaleogram gives us detailed information about the state-space of the system, i.e. it gives us information about the dynamic behaviour of the system.
A scaleogram can not only be used to better understand the dynamical behaviour of a system, but it can also be used to distinguish different types of signals produced by a system from each other.
If you record a signal while walking up the stairs or down the stairs, the scaleograms will look different. ECG measurements of people with a healthy heart will have different scaleograms than ECG measurements of people with arrhythmia. The same goes for measurements on a bearing, motor, rotor, ventilator, etc. when it is faulty vs. when it is not faulty. The possibilities are limitless!
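A minimal sketch of how a scaleogram is computed with PyWavelets' CWT and a Morlet wavelet. The signal switches frequency halfway through, which is exactly the temporal information a plain FT would lose; the sampling rate and scales are assumptions made for this example.

```python
import numpy as np
import pywt

# A signal whose frequency changes over time:
# 5 Hz in the first half, 25 Hz in the second half.
fs = 200
t = np.arange(0, 2, 1 / fs)
signal = np.where(t < 1,
                  np.sin(2 * np.pi * 5 * t),
                  np.sin(2 * np.pi * 25 * t))

# Continuous Wavelet Transform with a Morlet wavelet.
scales = np.arange(1, 64)
coefficients, frequencies = pywt.cwt(signal, scales, 'morl',
                                     sampling_period=1 / fs)

# |coefficients| is the scaleogram: rows are scales (frequencies),
# columns are time, so the 5 Hz -> 25 Hz switch stays visible.
print(coefficients.shape)  # (63, 400)
```

Plotting `np.abs(coefficients)` as an image (e.g. with `plt.imshow`) gives the scaleogram pictures discussed above.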
A CNN WITH WAVELETS (For Image Classification)
I will base this part of my article on the paper Advanced Image Classification using Wavelets and Convolutional Neural Networks (Travis et al., 2016).
They propose converting the images into the Wavelet Domain, where they can be processed at a lower dimension, with faster processing times. Furthermore, given the varying frequencies represented in each subband, multiple CNNs run on each subband, or on a combination of them, can increase the accuracy of the classification.
Let’s see the process Step by Step:
1) Convert the raw images into the wavelet domain.
2) Perform Z-score normalization on the subbands:
Z = (M - mean(M)) / std(M)
where M is the input and mean and std represent the 2D mean and standard deviation of the input.
3) Normalize all subbands except the LL band.
4) Perform a CNN on the selected subbands.
5) Combine all results using the OR operator to get the final classification.
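The wavelet-domain preparation in the steps above can be sketched with PyWavelets' 2-D DWT. This is a minimal illustration on a random toy image, not the paper's exact pipeline: the image size, wavelet choice and random data are assumptions.

```python
import numpy as np
import pywt

def zscore(M):
    # Z = (M - mean(M)) / std(M), applied to a whole subband.
    return (M - M.mean()) / M.std()

# Step 1: convert a (toy, random) image into the wavelet domain.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
LL, (LH, HL, HH) = pywt.dwt2(image, 'haar')

# Steps 2-3: Z-score normalize the detail subbands, leave LL as-is.
LH, HL, HH = zscore(LH), zscore(HL), zscore(HH)

# Step 4 would feed each 16x16 subband (a quarter of the original
# image size) to a CNN; step 5 combines the per-subband decisions.
print(LL.shape, LH.shape)  # (16, 16) (16, 16)
```

Note how every subband has half the side length of the original image, which is where the lower-dimension, faster-processing advantage comes from.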
The application of the wavelet subbands is presented in two different ways. The first way (hereafter called CNN-WAV2) fuses the detail coefficients (LH, HL, HH) together prior to processing the images, according to this formula:
HF = α * LH + β * HL + γ * HH
where α, β and γ are the weight parameters of each subband, whose values are determined by the Test Accuracy (TA) of each individual subband after CNN processing, as follows:
All this is shown in this Figure CNN-WAV2:
The second way is CNN-WAV4, where each subband has its own CNN; it is represented in the figure below.
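The CNN-WAV2 fusion step described above can be sketched in a few lines. The weights here are hypothetical placeholders; in the paper they come from the per-subband test accuracies.

```python
import numpy as np
import pywt

# A toy random image standing in for a real input.
rng = np.random.default_rng(1)
image = rng.random((32, 32))
LL, (LH, HL, HH) = pywt.dwt2(image, 'haar')

# Hypothetical weights: the paper derives them from the Test
# Accuracy each subband achieves on its own after CNN processing.
alpha, beta, gamma = 0.4, 0.4, 0.2

# CNN-WAV2: fuse the three detail subbands into one
# high-frequency band, HF = alpha*LH + beta*HL + gamma*HH.
HF = alpha * LH + beta * HL + gamma * HH
print(HF.shape)  # (16, 16)
```

The fused HF band and the LL band would then be the two inputs processed by CNN-WAV2, while CNN-WAV4 skips the fusion and gives each of the four subbands its own CNN.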
Conclusion
Wavelets are a very powerful tool that can be used to extract features such as edges, but they can also be used in deep learning for image classification and signal classification.
I hope you enjoyed this article. See more articles at https://medium.com/@gabriel_66675/