Personal tools
You are here: Home Publications Adaptive Computation of Self Sorting Inplace FFTs on Hierarchical Memory Architectures
Document Actions

Ayaz Ali, Lennart Johnsson, and Jaspal Subhlok (2007)

Adaptive Computation of Self Sorting Inplace FFTs on Hierarchical Memory Architectures

In: High Performance Computation Conference, Houston, TX.

Since the publication of the famous Cooley Tukey algorithm for the Fast Fourier Transform (FFT), many algorithms have been introduced for computing DFTs efficiently. While some of the algorithms had lower operation count for certain transform sizes others were just the rescheduling (implementations) adapted to target architectures through optimization of memory access pattern. A naive implementation of the recursive Mixed Radix algorithm would require an additional sorting step to bring the output back "in order". Various algorithms have been suggested that perform this sorting (digit-reversal) inside the FFT "butterfly" using the output vector. Performing the computation "in place", i.e. without a separate output vector or temporary array, can be useful when the input vector needs to be overwritten with the result. It also makes the computation of larger size transforms feasible when the memory is limited. But computing "in-place and in-order" FFT poses a very difficult problem on hierarchical memory architectures where data movement can seriously degrade the performance. In this paper we present the recursive formulation of self sorting in-place FFT algorithm that adapts to the target architecture. When an efficient in-place, in-order execution is not possible, we show how efficient schedules can be constructed that use minimum work-space to perform the computation. To express and construct the FFT schedules, we present the Context Free Grammar that generates a language called FFT Schedule Specification Language. We conclude by comparing the performance of our in-place in-order FFT implementation with that of other well known FFT libraries. We also present a performance analysis of the performance difference between the cache oblivious execution of out-of-place and in-place FFTs

by Charles Koelbel last modified 2008-05-09 05:50
« September 2010 »
Su Mo Tu We Th Fr Sa

VGrADS Collaborators include:


Powered by Plone