Efficent Multithreaded Algorithms for the Fast Fourier Transform


Parimala Thulasiraman
Kevin Theobald
Ashfaq A. Khokhar
Guang R. Gao


In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. The FFT problem has been formulated using two distinct approaches based on the data
ow concepts. The first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. By establishing small sized threads, minimal cost overhead in thread switching and embedding the synchronization operations within a parent-child relationship, the receiver-initiated algorithm is well suited for fine-grained architectures. The second approach, referred to as the sender-initiated algorithm, follows a data ow model based on the producer-consumer style of programming and can be adopted to different architectural parameters for achieving high performance. We show in the paper the advantages and disadvantages of these approaches on fine-grained architectures. The implementations of the proposed algorithms have been carried out on the EARTH (Effcient Architecture for Running THreads) platform which is a non-preemptive, fine-grained data ow model.

For both the algorithms, we analyze the number of remote and local threads in each processor and study its impact on the experimental results. Our implementation results show that for large number of processors, both the algorithms perform well, yielding execution times of only 10 ms for an input of 16 K data points on a 64 processor machine, assuming each processor running at 140 MHz clock speed. For certain block sizes on fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach. We also present the algorithms implication on the design of fine-grained architectures.


Special Issue