Ayaz Ali, Lennart Johnsson, and Dragan Mirkovic (2007)
Empirical Auto-tuning Code Generator for FFT and Trigonometric Transforms
In: ODES: 5th Workshop on Optimizations for DSP and Embedded Systems, in conjunction with International Symposium on Code Generation and Optimization (CGO), San Jose, CA.
We present an automatic, empirically tuned code genenrator for Real/Complex FFT and Trigonometric Transforms. The code generator is part of an adaptive and portable FFT computation framework - UHFFT. Performance portability over varying architectures is achieved by generating highly optimized set of straight line C codelets (micro-kernel) that adapt to the microprocessor architecture. The tuning is performed by generating several variants of same size codelet with different combinations of optimization parameters. The variants are iteratively compiled and evaluated to find the best implementation on a given platform. Apart from minimizing the operation count, the code generator optimizes for access pattern, register blocking, instruction schedule and structure of arrays. We present details of the optimizations conducted at several stages and the performance gain at each of those levels. We conclude the paper with discussion of the overall performance improvements due to this aggressive approach to generating optimized FFT kernels.