350 rub
Journal Achievements of Modern Radioelectronics №12 for 2010 г.
Article in number:
Estimation of Performance for Parallel Computing Fast Fourier Transform
Authors:
A.S. Sidorenko, P.V. Goryaev
Abstract:
The performance estimation of parallel computing fast Fourier transform on several processors is performed in this paper. Two conditions are stated below, the compliance to which give us the maximum performance for parallel computing:
1. If there are n processor cores and n-1 cores do computations within the interval less than T1, and at least one core requires the interval T1 then the total required time will be equal to T1.
2. It is desirable, where possible, to reduce data exchange between processors.
Based on the fact that FFT is only a linear combination of the same operations, like «butterfly», different variants of «the logical» distribution of computations on several processor cores were proposed.
In analyzing proposed variants it was determined that in addition to the consumption of the processor time directly for the calculations also quite a lot of time is spent on data exchange between processors. Architecture features of signal processors allow calculations and data exchange through the ports to be performed simultaneously. It is known that the operations of data exchange are much slower as compared with the calculations. Therefore conditions were established under which the time required for data exchange will not exceed the computing time. In this case, you can ensure the ab-sence of idle processors.
This problem is not new. However, in the existing work (the reference is provided in the article) the estimation of the time spent on data exchange just is not given and the scheme is architecturally rigid, so its extension by more than 4 processors has considerable difficulties.
Later the formula for the number of levels of the FFT, which must be calculated on a single processor in order to avoid the time consumption on data exchange, is given. The table shows the calculations for different sets of input data. Finally, it was concluded that the parallelization of the FFT in this manner gives a performance gain only under certain conditions. The most critical of them are the number and capacity of link ports and the core frequency multipliers and periphery ratio. Otherwise, there are noticeable time overheads for data exchange. Additional memory to store downloaded data, is also needed as it is necessary to keep simultaneously both the input and output data sets.
Pages: 54-58
References
- Нагулин Н. Е., Нагулин С. Н. Алгоритм параллельного вычисления быстрого преобразования Фурье для транспьютероподобных сигнальных процессоров // Цифровая обработка сигналов. 2007. № 4.
- Гоулд Б., Рабинер Л. Теория и применение цифровой обработки сигналов / под ред. Ю.И. Александрова. М.: Мир. 1978.
- Процессор OMAP3525. URL: http://focus.ti.com/docs/prod/folders/print/omap3525.html (дата обращения 02.05.2010).
- Процессор «Мультикор». URL: http://multicore.ru/index.php-id=601 (дата обращения 02.05.2010).
- Процессор SHARC. URL: http://www.analog.com/en/embedded-processing-dsp/sharc/adsp-21489/processors/product.html (дата обращения 02.05.2010)