DFT and FFT
As a dsp-guy, the Fast Fourier Transform (FFT) and its lineage have been part of both my education and my day-to-day work. However, old knowledge tends to rust, and new insights are (hopefully) added along the way. The text below is a second look, using a somewhat different presentation from what I was taught. The classical dsp way of teaching this material leans on concepts like "butterflies" and "decimation-in-time", proofs and continuous-domain heritage. I prefer a more casual, digital-circuit/programmer-friendly, matrix-oriented exposition.
I primarily use MATLAB syntax, as it is more concrete than symbolic math while more compact than C. Python/NumPy users should feel right at home.
DFT
The DFT (Discrete Fourier Transform) is a linear, invertible and (up to a scale factor) unitary transform, computed as a (complex) matrix-vector multiplication of complexity N^2. For a small transform size of N = 8, we have:
N=8;
Wn=exp(-1j*2*pi/N);   % primitive N-th root of unity
f=0:N-1;
W=Wn.^(f'*f);         % outer product of the index vector with itself builds the N-by-N DFT matrix
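Each row (equivalently each column; W is symmetric) is a complex exponential sampled around the unit circle. As a quick peek, the real part of the second row should trace one period of a cosine (the scaling and rounding below are only to keep the printout readable):
round(real(W(2,:))*1000)/1000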
Let us define an input vector that oscillates between +1 and -1:
x_tmp = cos(2*pi*(0:0.5:(N-1)/2));
And do the transform:
X_TMP = x_tmp*W;
The input vector is multiplied by each (complex) column of W, and the accumulated sum tells us something about how well the input vector matches that particular "template". In this case we see that there is a strong correlation with the real part of template #5, a vector that (not surprisingly) also fluctuates between +1 and -1.
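Since x_tmp oscillates at the Nyquist rate, essentially all of the energy lands in that single bin; a rounded printout should show 8 in position 5 and zeros elsewhere:
disp(round(real(X_TMP)))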
The use of the DFT (FFT) for more practical problems and frequency-domain analysis is beside the point of this article.
The DFT can be inverted by another matrix multiplication, now by the inverse DFT matrix, which is found by flipping the sign of the exponent of the forward DFT and scaling by 1/N. Let us confirm that the two operations in series are basically an identity matrix (the equivalent of a "no-op") to within reasonable numerical bounds:
assert(max(max(abs(W*conj(W)/N - eye(N)))) < 1e-7)
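We can also run the round trip on the test vector from above (X_TMP was computed with x_tmp multiplying from the left, so the inverse is applied from the right as well):
x_tmp_rec = X_TMP*conj(W)/N;
assert(max(abs(x_tmp_rec - x_tmp)) < 1e-7)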
Let us also confirm that our DFT gives basically the same results as MATLAB's built-in fft (a call into FFTW):
x = rand(N,1)+1j*rand(N,1);
X_ref = fft(x);
X = W*x;
assert(max(abs(X-X_ref)) < 1e-7)
FFT
The DFT has some nice properties on its own, but a large reason for its popularity is the discovery of "fast" algorithms in the 1960s. It turns out that Kronecker tensor products, "twiddle factors" and a permutation matrix that reorders indices according to their binary representation read backwards are both compact and give an alternate perspective on how FFTs are computed as a set of sparse matrix operations.
The lines below implement a simple decimation-in-time (DIT) radix-2 forward FFT. While the steps may seem cumbersome, the structure can be reused for forward/inverse FFT, radix-4/split-radix or decimation-in-frequency, while being a fair representation of how in-place FFTs tend to be implemented in the real world.
Define a random input vector:
x = randn(1,N) + 1j*randn(1,N);
The ubiquitous "butterfly":
seed = [1 1; 1 -1];
Copy the input to the buffer we are working on (we could work destructively on the input buffer if needed):
Y = x;
Do the first "stage". Notice how the 2x2 seed is "smeared" diagonally by the Kronecker product, and how the first half or so of the twiddle factors are unity:
Gi = {kron(seed, eye(4))};              % the seed smeared across a span of 4
Gi2 = {Gi{ones(1,1)}};                  % comma-separated-list trick: replicate the cell (one copy here)
Gi2 = blkdiag(Gi2{:});
Wi = [zeros(4, 1); (0:3)'/8];           % twiddle exponents; the zeros give unity factors
Wi2 = exp(-1j*2*pi*repmat(Wi.', 1, 1));
Y = Y*Gi2*diag(Wi2);
Do the second stage. Notice that the seed is still smeared, but now confined to N/2 size blocks along the diagonal:
Gi = {kron(seed, eye(2))};
Gi2 = {Gi{ones(2,1)}};
Gi2 = blkdiag(Gi2{:});
Wi = [zeros(2, 1); (0:1)'/4];
Wi2 = exp(-1j*2*pi*repmat(Wi.', 1, 2));
Y = Y*Gi2*diag(Wi2);
Do the final stage:
Gi = {kron(seed, eye(1))};
Gi2 = {Gi{ones(4,1)}};
Gi2 = blkdiag(Gi2{:});
Wi = [zeros(1, 1); 0/2];
Wi2 = exp(-1j*2*pi*repmat(Wi.', 1, 4));
Y = Y*Gi2*diag(Wi2);
The output is now in "bit-reversed" order. We need to sort the elements according to their binary index flipped backwards, i.e. sorting according to:
1 (000)->(000) 1
2 (001)->(100) 5
3 (010)->(010) 3
4 (011)->(110) 7
...
If we want to use a sparse matrix multiplication for the re-ordering, this can be done by:
br = bin2dec(fliplr(dec2bin(0:N-1, log2(N)))) + 1;
B = zeros(N,N);
B(sub2ind([N N], 1:N, br.')) = 1;
Y = Y*B;
But it is simpler to index the vector directly.
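That is, instead of the multiplication by B above, the whole reordering could have been a one-liner (bit reversal is its own inverse, so the same br works in either direction):
Y = Y(br);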
Check results:
assert(max(abs(Y - fft(x))) < 1e-7)
If we want to do larger power-of-two FFT sizes, the pattern should be evident from the example above. The size of each sparse matrix multiplication increases with N, and the number of stages increases as log2(N), giving the well-known O(N log N) complexity that can be a considerable saving compared to the N^2 of the DFT.
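To make the pattern concrete, here is a minimal sketch that rolls the three hand-written stages into a loop for any power-of-two size (the function name my_fft is my own; the variable names follow the stage code above):
function X = my_fft(x)
% Radix-2 DIT FFT of a row vector whose length is a power of two,
% built from the same sparse factors as the three stages above.
N = numel(x);
seed = [1 1; 1 -1];
Y = x;
for stage = 1:log2(N)
    blocks = 2^(stage-1);                        % diagonal blocks in this stage
    half = N/(2*blocks);                         % smear width of the butterfly
    Gi = {kron(seed, eye(half))};
    Gi2 = {Gi{ones(blocks,1)}};                  % replicate the block 'blocks' times
    Gi2 = blkdiag(Gi2{:});
    Wi = [zeros(half,1); (0:half-1)'/(2*half)];  % twiddle exponents
    Wi2 = exp(-1j*2*pi*repmat(Wi.', 1, blocks));
    Y = Y*Gi2*diag(Wi2);
end
br = bin2dec(fliplr(dec2bin(0:N-1, log2(N)))) + 1;
X = Y(br);                                       % undo the bit-reversed order
end
Saved as my_fft.m, it can be checked against the reference for a few sizes:
for N = [2 4 8 16 64]
    x = randn(1,N) + 1j*randn(1,N);
    assert(max(abs(my_fft(x) - fft(x))) < 1e-7)
end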
C. F. Van Loan has a great book on this subject, "Computational Frameworks for the Fast Fourier Transform".
The power-of-two myth
I have noticed dsp people, practitioners and academics alike, claiming that this or that operation must be power-of-two (POT) in size to take advantage of the speed of FFT solutions. This is false. FFT-type factorizations are possible for any DFT size, even prime sizes (Rader's and Bluestein's algorithms handle those). POT sizes do tend to be somewhat faster than non-POT and simpler to implement, though. If you are using an optimized library to calculate your FFT (and really, you should), this is taken care of in e.g. FFTW:
https://blogs.mathworks.com/steve/2014/04/07/timing-the-fft/
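A quick experiment in the same spirit (timings vary with machine and library version; 4093 is prime, the other sizes are arbitrary picks):
timeit(@() fft(randn(4096,1)))   % power of two
timeit(@() fft(randn(4000,1)))   % composite, not a power of two
timeit(@() fft(randn(4093,1)))   % prime
All three complete quickly; the prime size is typically somewhat slower, but nowhere near the N^2 cost a plain DFT of that size would incur.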
Walsh-Hadamard transform
The Walsh-Hadamard transform is a real, binary (+1/-1) relative of the DFT. It is most easily defined by a recursive Kronecker expansion:
seed = [1 1; 1 -1];
N = 4;
Ww = seed;
for stage = 2:log2(N)
    Ww = kron(Ww, seed);
end
Ww_ref = N*fwht(eye(N), N, 'hadamard');
assert(max(abs(Ww(:)-Ww_ref(:))) < 1e-7)
Ww =
     1     1     1     1
     1    -1     1    -1
     1     1    -1    -1
     1    -1    -1     1
Let us do the same steps as for the FFT above. Notice that the twiddle factors are gone, and (as the final check confirms) no bit-reversal reordering is needed to match the 'hadamard' ordering:
N=8;
x = randn(1,N);
Y = x;
%% stage 1
Gi = {kron(seed, eye(4))};
Gi2 = {Gi{ones(1,1)}};
Gi2 = blkdiag(Gi2{:});
Y = Y*Gi2;
%% stage 2
Gi = {kron(seed, eye(2))};
Gi2 = {Gi{ones(2,1)}};
Gi2 = blkdiag(Gi2{:});
Y = Y*Gi2;
%% stage 3
Gi = {kron(seed, eye(1))};
Gi2 = {Gi{ones(4,1)}};
Gi2 = blkdiag(Gi2{:});
Y = Y*Gi2;
Y_ref = N*fwht(x, N, 'hadamard');
assert(max(abs(Y_ref - Y)) < 1e-7)
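As with the FFT, the stages roll up into a loop for any power-of-two size; a minimal sketch (my_fwht is my own name for it):
function Y = my_fwht(x)
% Hadamard-ordered Walsh-Hadamard transform of a row vector whose
% length is a power of two: the FFT stages with all twiddles removed.
N = numel(x);
seed = [1 1; 1 -1];
Y = x;
for stage = 1:log2(N)
    blocks = 2^(stage-1);           % diagonal blocks in this stage
    half = N/(2*blocks);            % smear width of the butterfly
    Gi = {kron(seed, eye(half))};
    Gi2 = {Gi{ones(blocks,1)}};     % replicate the block 'blocks' times
    Y = Y*blkdiag(Gi2{:});
end
end
No twiddle multiplications and no final bit-reversal are needed; the 'hadamard' ordering falls out directly:
x = randn(1,32);
assert(max(abs(my_fwht(x) - 32*fwht(x, 32, 'hadamard'))) < 1e-7)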
We may also calculate the Walsh-Hadamard transform via a multidimensional FFT: reshaping the N samples into a 2x2x...x2 hypercube, each length-2 DFT along each dimension is exactly the [1 1; 1 -1] butterfly, with no twiddle factors involved:
N=16;
x = randn(1,N);
Y = fftn(reshape(x,2*ones(1, log2(N))));
Y_ref = N*fwht(x, N, 'hadamard');
assert(max(abs(Y(:)'-Y_ref)) < 1e-7)