Volume 26, Issue 6, 2009, Pages 90-102

Discrete Fourier transform on multicore: A review of optimizations necessary for good multicore performance

Author keywords

Discrete Fourier transforms; Multicore processing; Optimization; Signal processing algorithms

Indexed keywords

COMPUTER GRAPHICS; GRAPHICS PROCESSING UNIT; MULTICORE PROGRAMMING; OPTIMIZATION; PROGRAM PROCESSORS; SIGNAL PROCESSING

EID: 85032751664; PISSN: 10535888; EISSN: None; Source Type: Journal
DOI: 10.1109/MSP.2009.934155; Document Type: Review
Times cited: 66

References (51)
  • 3
    • 19344368072
    • M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proc. IEEE (Special Issue on Program Generation, Optimization, and Adaptation), vol. 93, no. 2, pp. 232-275, 2005.
  • 6
    • 20744449792
    • M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proc. IEEE (Special Issue on Program Generation, Optimization, and Adaptation), vol. 93, no. 2, pp. 216-231, 2005.
  • 8
    • 85032767349
    • M. Frigo and S. G. Johnson, FFTW 3.2 [Online]. Available: www.fftw.org
  • 14
    • 0025600627
    • A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
    • J. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," IEEE Trans. Circuits Syst., vol. 9, no. 4, pp. 449-500, 1990.
    • (1990) IEEE Trans. Circuits Syst., vol. 9, no. 4, pp. 449-500
    • Johnson, J.; Johnson, R.W.; Rodriguez, D.; Tolimieri, R.
  • 15
    • 0023347849
    • Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures
    • A. Norton and A. J. Silberger, "Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures," IEEE Trans. Comput., vol. 36, no. 5, pp. 581-591, 1987.
    • (1987) IEEE Trans. Comput., vol. 36, no. 5, pp. 581-591
    • Norton, A.; Silberger, A.J.
  • 16
    • 0040546915
    • Block algorithms for FFTs on vector and parallel computer
    • M. Hegland, "Block algorithms for FFTs on vector and parallel computer," in Parallel Computing: Trends and Applications. Amsterdam, The Netherlands: Elsevier, 1994, pp. 129-136.
    • (1994) Parallel Computing: Trends and Applications, pp. 129-136
    • Hegland, M.
  • 17
    • 0025403252
    • FFTs in external or hierarchical memory
    • D. H. Bailey, "FFTs in external or hierarchical memory," J. Supercomput., vol. 4, no. 1, pp. 23-35, Mar. 1990.
    • (1990) J. Supercomput., vol. 4, no. 1, pp. 23-35
    • Bailey, D.H.
  • 19
    • 84968470212
    • An algorithm for the machine calculation of complex Fourier series
    • J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
    • (1965) Math. Comput., vol. 19, pp. 297-301
    • Cooley, J.W.; Tukey, J.W.
  • 20
    • 0001316941
    • An adaptation of the fast Fourier transform for parallel processing
    • M. C. Pease, "An adaptation of the fast Fourier transform for parallel processing," J. ACM, vol. 15, no. 2, pp. 252-264, Apr. 1968.
    • (1968) J. ACM, vol. 15, no. 2, pp. 252-264
    • Pease, M.C.
  • 21
    • 0023380592
    • Multiprocessor FFTs
    • P. N. Swarztrauber, "Multiprocessor FFTs," Parallel Comput., vol. 5, pp. 197-210, July 1987.
    • (1987) Parallel Comput., vol. 5, pp. 197-210
    • Swarztrauber, P.N.
  • 24
    • 0029771732
    • Automatic generation of prime length FFT programs
    • I. W. Selesnick and C. S. Burrus, "Automatic generation of prime length FFT programs," IEEE Trans. Signal Processing, vol. 44, no. 1, pp. 14-24, 1996.
    • (1996) IEEE Trans. Signal Processing, vol. 44, no. 1, pp. 14-24
    • Selesnick, I.W.; Burrus, C.S.
  • 26
    • 84949653778
    • Automatic performance tuning in the UHFFT library
    • D. Mirković and S. L. Johnsson, "Automatic performance tuning in the UHFFT library," in Proc. Int. Conf. Computational Science (ICCS) (Lecture Notes in Computer Science, vol. 2073). New York: Springer-Verlag, 2001, pp. 71-80.
    • (2001) Lecture Notes in Computer Science, vol. 2073, pp. 71-80
    • Mirković, D.; Johnsson, S.L.
  • 28
    • 51049115051
    • Library generation for linear transforms
    • Y. Voronenko, "Library generation for linear transforms," Ph.D. dissertation, Elect. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, 2008.
    • (2008)
    • Voronenko, Y.
  • 29
    • 57049117343
    • How to write fast numerical code: A small introduction
    • S. Chellappa, F. Franchetti, and M. Püschel, "How to write fast numerical code: A small introduction," in Proc. Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE) (Lecture Notes in Computer Science, vol. 5235). Berlin: Springer-Verlag, 2008, pp. 196-259.
    • (2008) Lecture Notes in Computer Science, vol. 5235, pp. 196-259
    • Chellappa, S.; Franchetti, F.; Püschel, M.
  • 30
    • 0025600627
    • J. R. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," IEEE Trans. Circuits, Syst., Signal Processing, vol. 9, no. 4, pp. 449-500, 1990.
  • 32
    • 85032767023
    • OpenMP. (1998). OpenMP C and C++ application program interface, version 1.0 [Online]. Available: www.openmp.org
  • 33
    • 33748881911
    • B. Gallmeister, POSIX.4. Sebastopol, CA: O'Reilly, 1994.
    • (1994) POSIX.4
    • Gallmeister, B.
  • 34
    • 47249121824
    • Generating SIMD vectorized permutations
    • F. Franchetti and M. Püschel, "Generating SIMD vectorized permutations," in Proc. Int. Conf. Compiler Construction (CC) (Lecture Notes in Computer Science, vol. 4959). Berlin: Springer-Verlag, 2008, pp. 116-131.
    • (2008) Lecture Notes in Computer Science, vol. 4959, pp. 116-131
    • Franchetti, F.; Püschel, M.
  • 41
    • 35948931417
    • Cache-efficient numerical algorithms using graphics hardware
    • N. K. Govindaraju and D. Manocha, "Cache-efficient numerical algorithms using graphics hardware," Parallel Comput., vol. 33, no. 10-11, pp. 663-684, 2007.
    • (2007) Parallel Comput., vol. 33, no. 10-11, pp. 663-684
    • Govindaraju, N.K.; Manocha, D.
  • 43
    • 84870629709
    • Nvidia CUDA
    • Nvidia Corp., Nvidia CUDA [Online]. Available: www.nvidia.com/cuda
  • 44
    • 85032772531
    • OpenCL
    • Khronos Group, OpenCL [Online]. Available: www.khronos.org/opencl/
  • 46
    • 85032777085
    • A. C. Chow, G. C. Fossum, and D. A. Brokenshire, "A programming example: Large FFT on the cell broadband engine," IBM, Tech. Rep., May 2005 [Online]. Available: https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/0AA2394A505EF0FB872570AB005BF0F1/$file/GSPx_FFT_paper_legal_0115.pdf
  • 47
    • 49949095381
    • A parallel 64K complex FFT algorithm for the IBM/Sony/Toshiba cell broadband engine processor
    • J. Greene and R. Cooper, "A parallel 64K complex FFT algorithm for the IBM/Sony/Toshiba cell broadband engine processor," in Proc. Global Signal Processing Expo (GSPx), 2005.
    • (2005) Proc. Global Signal Processing Expo (GSPx)
    • Greene, J.; Cooper, R.
  • 48
    • 38349071299
    • Performance and programmability of the IBM/Sony/Toshiba cell broadband engine processor
    • L. Cico, R. Cooper, and J. Greene, "Performance and programmability of the IBM/Sony/Toshiba cell broadband engine processor," in Proc. (EDGE) Workshop, 2006.
    • (2006) Proc. (EDGE) Workshop
    • Cico, L.; Cooper, R.; Greene, J.
  • 50
    • 85032774781
    • 4DSP Inc., 4DSP [Online]. Available: www.4dsp.com/fft.htm
  • 51
    • 85032753583
    • Dillon Engineering, Dillon FFT [Online]. Available: www.dilloneng.com/fft_ip


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.