SCOPUS Information Retrieval Platform

2012 Innovative Parallel Computing, InPar 2012

Volume , Issue , 2012, Pages

Optimized strategies for mapping three-dimensional FFTs onto CUDA GPUs

(2) Wu, Jing (a); JaJa, Joseph (a)

(a) University of Maryland (United States)

Author keywords

Fast Fourier Transform; GPU; Multi threaded Algorithms; Scientific Computing

Indexed keywords

ASSOCIATIVITY; DATA MOVEMENTS; DATA SETS; GPU; GRAPHICS PROCESSING UNITS; MEMORY ACCESS; MEMORY HIERARCHY; MULTI-THREADED ALGORITHMS; MULTI-THREADING; MULTIPLE LEVELS; MULTITHREADED; SHARED MEMORIES;

ALGORITHMS; DATA TRANSFER; FAST FOURIER TRANSFORMS; MEMORY ARCHITECTURE; NATURAL SCIENCES COMPUTING; OPTIMIZATION; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS; SIGNAL RECEIVERS; THREE DIMENSIONAL;

THREE DIMENSIONAL COMPUTER GRAPHICS;

EID: 84870704125 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/InPar.2012.6339608 Document Type: Conference Paper

Times cited : (9)

References (22)

1
- 84968470212
- An algorithm for the machine calculation of complex Fourier series
- J. Cooley and J. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297-301, 1965.
- (1965) Mathematics of Computation , vol.19 , Issue.90 , pp. 297-301
- Cooley, J.¹ Tukey, J.²

2
- 79952782168
- Auto-tuning of fast Fourier transform on graphics processors
- New York, NY, USA, ACM
- Y. Dotsenko, S. Baghsorkhi, B. Lloyd, and N. Govindaraju. Auto-tuning of fast Fourier transform on graphics processors. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 257-266, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11 , pp. 257-266
- Dotsenko, Y.¹ Baghsorkhi, S.² Lloyd, B.³ Govindaraju, N.⁴

3
- 20744449792
- The design and implementation of FFTW3
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. In Proceedings of the IEEE, pages 216-231, 2005.
- (2005) Proceedings of the IEEE , pp. 216-231
- Frigo, M.¹ Johnson, S.G.²

4
- 0027642189
- Rotating a three-dimensional array in an optimal position for vector processing: Case study for a three-dimensional fast Fourier transform
- Aug.
- S. Goedecker. Rotating a three-dimensional array in an optimal position for vector processing: case study for a three-dimensional fast Fourier transform. Computer Physics Communications, 76:294-300, Aug. 1993.
- (1993) Computer Physics Communications , vol.76 , pp. 294-300
- Goedecker, S.¹

5
- 34548292052
- A memory model for scientific algorithms on graphics processors
- ACM
- N. K. Govindaraju, S. Larsen, J. Gray, and D. Manocha. A memory model for scientific algorithms on graphics processors. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC '06, New York, NY, USA, 2006. ACM.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, New York, NY, USA
- Govindaraju, N.K.¹ Larsen, S.² Gray, J.³ Manocha, D.⁴

6
- 70350754502
- High performance discrete Fourier transforms on graphics processors
- Piscataway, NJ, USA, IEEE Press
- N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli. High performance discrete Fourier transforms on graphics processors. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pages 2:1-2:12, Piscataway, NJ, USA, 2008. IEEE Press.
- (2008) Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08
- Govindaraju, N.K.¹ Lloyd, B.² Dotsenko, Y.³ Smith, B.⁴ Manferdelli, J.⁵

7
- 77954713684
- An empirically tuned 2D and 3D FFT library on CUDA GPU
- New York, NY, USA, ACM
- L. Gu, X. Li, and J. Siegel. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, pages 305-314, New York, NY, USA, 2010. ACM.
- (2010) Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10 , pp. 305-314
- Gu, L.¹ Li, X.² Siegel, J.³

8
- 84966205227
- Computing the Fast Fourier Transform on a Vector Computer
- D. G. Korn and J. J. Lambiotte. Computing the Fast Fourier Transform on a Vector Computer. Mathematics of Computation, 33:977-992, 1979.
- (1979) Mathematics of Computation , vol.33 , pp. 977-992
- Korn, D.G.¹ Lambiotte, J.J.²

9
- 35048828869
- The FFT on a GPU
- Aire-la-Ville, Switzerland, Eurographics Association
- K. Moreland and E. Angel. The FFT on a GPU. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, HWWS '03, pages 112-119, Aire-la-Ville, Switzerland, 2003. Eurographics Association.
- (2003) Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, HWWS '03 , pp. 112-119
- Moreland, K.¹ Angel, E.²

10
- 0021470572
- FFT algorithms for vector computers
- P. N. Swarztrauber. FFT algorithms for vector computers. Parallel Computing, 1(1):45-3, 1984.
- (1984) Parallel Computing , vol.1 , Issue.1 , pp. 45-53
- Swarztrauber, P.N.¹

11
- 84870710877
- Nukada FFT Library website
- Nukada. Nukada FFT Library website. http://matsu-www.is.titech.ac.jp/ nukada/nufft/, 2011.
- (2011) Nukada FFT Library

12
- 74049114159
- Auto-tuning 3-D FFT library for CUDA GPUs
- New York, NY, USA, ACM
- A. Nukada and S. Matsuoka. Auto-tuning 3-D FFT library for CUDA GPUs. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pages 30:1-30:10, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
- Nukada, A.¹ Matsuoka, S.²

13
- 70350759823
- Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
- Piscataway, NJ, USA, IEEE Press
- A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pages 5:1-5:11, Piscataway, NJ, USA, 2008. IEEE Press.
- (2008) Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08
- Nukada, A.¹ Ogata, Y.² Endo, T.³ Matsuoka, S.⁴

14
- 84870653313
- NVIDIA Corporation
- NVIDIA Corporation. CUDA and Fermi Update, 2010.
- (2010) CUDA and Fermi Update

15
- 79551704836
- NVIDIA Corporation
- NVIDIA Corporation. NVIDIA CUDA C programming best practices guide, 2011.
- (2011) NVIDIA CUDA C Programming Best Practices Guide

16
- 79551704836
- NVIDIA Corporation
- NVIDIA Corporation. NVIDIA CUDA C programming guide, 2011.
- (2011) NVIDIA CUDA C Programming Guide

17
- 84863501135
- NVIDIA Corporation
- NVIDIA Corporation. NVIDIA CUDA CUFFT library, 2011.
- (2011) NVIDIA CUDA CUFFT Library

18
- 51049119174
- An efficient, model-based CPU-GPU heterogeneous FFT library
- April
- Y. Ogata, T. Endo, N. Maruyama, and S. Matsuoka. An efficient, model-based CPU-GPU heterogeneous FFT library. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1-10, April 2008.
- (2008) In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on , pp. 1-10
- Ogata, Y.¹ Endo, T.² Maruyama, N.³ Matsuoka, S.⁴

19
- 77952265152
- Ruetsch, Greg and Micikevicius, Paulius. Optimizing Matrix Transpose in CUDA, 2011.
- (2011) Optimizing Matrix Transpose in CUDA
- Ruetsch, G.¹ Micikevicius, P.²

20
- 0003417587
- Society for Industrial and Applied Mathematics, Philadelphia, PA, USA
- C. Van Loan. Computational frameworks for the fast Fourier transform. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1992.
- (1992) Computational Frameworks for the Fast Fourier Transform
- Van Loan, C.¹

21
- 84863942362
- V. Volkov. Better Performance at Lower Occupancy, 2010.
- (2010) Better Performance at Lower Occupancy
- Volkov, V.¹

22
- 76749123978
- Complexity effective memory access scheduling for many-core accelerator architectures
- New York, NY, USA, ACM
- G. L. Yuan, A. Bakhoda, and T. M. Aamodt. Complexity effective memory access scheduling for many-core accelerator architectures. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 34-44, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42 , pp. 34-44
- Yuan, G.L.¹ Bakhoda, A.² Aamodt, T.M.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.