Volume , Issue , 2012, Pages

Scalable multi-GPU 3-D FFT for TSUBAME 2.0 supercomputer

Author keywords

[No Author keywords available]

Indexed keywords

ALL-TO-ALL COMMUNICATION; AUTOTUNING; DOUBLE PRECISION; INFINIBAND; INTER-NODE COMMUNICATION; MPI LIBRARIES; MULTI-GPU; MULTIPLE GPUS;

EID: 84877706293     PISSN: 21674329     EISSN: 21674337     Source Type: Conference Proceeding    
DOI: 10.1109/SC.2012.100     Document Type: Conference Paper
Times cited : (31)

References (27)
  • 1
    • 84968470212
    • An Algorithm for the Machine Calculation of Complex Fourier Series
    • J. W. Cooley and J. W. Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series," Math. Comput., vol. 19, pp. 297-301, 1965.
    • (1965) Math. Comput. , vol.19 , pp. 297-301
    • Cooley, J.W.1    Tukey, J.W.2
  • 3
    • 85117201228
    • 16.4-Tflops direct numerical simulation of turbulence by a Fourier spectral method on the Earth Simulator
    • M. Yokokawa, K. Itakura, A. Uno, T. Ishihara, and Y. Kaneda, "16.4-Tflops direct numerical simulation of turbulence by a Fourier spectral method on the Earth Simulator," in Proceedings of the 2002 ACM/IEEE conference on Supercomputing, ser. Supercomputing '02. Los Alamitos, CA, USA: IEEE Computer Society Press, 2002, pp. 1-17. [Online]. Available: http://dl.acm.org/citation.cfm?id=762761.762808
    • (2002) Supercomputing '02 , pp. 1-17
    • Yokokawa, M.1    Itakura, K.2    Uno, A.3    Ishihara, T.4    Kaneda, Y.5
  • 4
    • 0038526303
    • ZDOCK: An initial-stage protein-docking algorithm
    • R. Chen, L. Li, and Z. Weng, "ZDOCK: an initial-stage protein-docking algorithm." Proteins, vol. 52, no. 1, pp. 80-87, 2003. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/12784371
    • (2003) Proteins , vol.52 , Issue.1 , pp. 80-87
    • Chen, R.1    Li, L.2    Weng, Z.3
  • 5
    • 33749020839
    • PIPER: An FFT-based protein docking program with pairwise potentials
    • DOI 10.1002/prot.21117
    • D. Kozakov, R. Brenke, S. R. Comeau, and S. Vajda, "PIPER: An FFT-based protein docking program with pairwise potentials," Proteins: Structure, Function, and Bioinformatics, vol. 65, no. 2, pp. 392-406, 2006. (Pubitemid 44454112)
    • (2006) Proteins: Structure, Function and Genetics , vol.65 , Issue.2 , pp. 392-406
    • Kozakov, D.1    Brenke, R.2    Comeau, S.R.3    Vajda, S.4
  • 9
    • 44849137198
    • NVIDIA Tesla: A unified graphics and computing architecture
    • DOI 10.1109/MM.2008.31
    • E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008. (Pubitemid 351796170)
    • (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
    • Lindholm, E.1    Nickolls, J.2    Oberman, S.3    Montrym, J.4
  • 13
    • 74049114159
    • Auto-tuning 3-D FFT library for CUDA GPUs
    • A. Nukada and S. Matsuoka, "Auto-tuning 3-D FFT library for CUDA GPUs," in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ser. SC '09. New York, NY, USA: ACM, 2009, pp. 30:1-30:10. [Online]. Available: http://doi.acm.org/10.1145/1654059.1654090
    • (2009) SC '09
    • Nukada, A.1    Matsuoka, S.2
  • 14
    • 79952782168
    • Auto-tuning of fast Fourier transform on graphics processors
    • Y. Dotsenko, S. S. Baghsorkhi, B. Lloyd, and N. K. Govindaraju, "Auto-tuning of fast Fourier transform on graphics processors," in Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, ser. PPoPP '11. New York, NY, USA: ACM, 2011, pp. 257-266. [Online]. Available: http://doi.acm.org/10.1145/1941553.1941589
    • (2011) PPoPP '11 , pp. 257-266
    • Dotsenko, Y.1    Baghsorkhi, S.S.2    Lloyd, B.3    Govindaraju, N.K.4
  • 15
    • 84858781600
    • High performance 3-D FFT using multiple CUDA GPUs
    • A. Nukada, Y. Maruyama, and S. Matsuoka, "High performance 3-D FFT using multiple CUDA GPUs," in Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, ser. GPGPU-5. New York, NY, USA: ACM, 2012, pp. 57-63. [Online]. Available: http://doi.acm.org/10.1145/2159430.2159437
    • (2012) GPGPU-5 , pp. 57-63
    • Nukada, A.1    Maruyama, Y.2    Matsuoka, S.3
  • 16
    • 0030285174
    • Implementation of parallel FFT algorithms on distributed memory machines with a minimum overhead of communication
    • C. Calvin, "Implementation of parallel FFT algorithms on distributed memory machines with a minimum overhead of communication," Parallel Comput., vol. 22, no. 9, pp. 1255-1279, Nov. 1996. [Online]. Available: http://dx.doi.org/10.1016/S0167-8191(96)00039-7
    • (1996) Parallel Comput. , vol.22 , Issue.9 , pp. 1255-1279
    • Calvin, C.1
  • 17
    • 19344375178
    • The development and integration of a distributed 3D FFT for a cluster of workstations
    • C. E. Cramer and J. A. Board, "The development and integration of a distributed 3D FFT for a cluster of workstations," in Proceedings of the 4th annual Linux Showcase & Conference - Volume 4, ser. ALS'00. Berkeley, CA, USA: USENIX Association, 2000, pp. 26-26. [Online]. Available: http://dl.acm.org/citation.cfm?id=1268379.1268405
    • (2000) ALS'00 , pp. 26-26
    • Cramer, C.E.1    Board, J.A.2
  • 18
    • 19344378421
    • Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements
    • M. Eleftheriou, B. G. Fitch, A. Rayshubskiy, T. J. C. Ward, and R. S. Germain, "Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements," IBM J. Res. Dev., vol. 49, no. 2, pp. 457-464, Mar. 2005. [Online]. Available: http://dx.doi.org/10.1147/rd.492.0457
    • (2005) IBM J. Res. Dev. , vol.49 , Issue.2 , pp. 457-464
    • Eleftheriou, M.1    Fitch, B.G.2    Rayshubskiy, A.3    Ward, T.J.C.4    Germain, R.S.5
  • 19
    • 78650819877
    • Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers
    • J. Doi and Y. Negishi, "Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-9. [Online]. Available: http://dx.doi.org/10.1109/SC.2010.38
    • (2010) SC '10 , pp. 1-9
    • Doi, J.1    Negishi, Y.2
  • 20
    • 77955106795
    • An implementation of parallel 3-D FFT with 2-D decomposition on a massively parallel cluster of multi-core processors
    • D. Takahashi, "An implementation of parallel 3-D FFT with 2-D decomposition on a massively parallel cluster of multi-core processors," in Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I, ser. PPAM'09. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 606-614. [Online]. Available: http://dl.acm.org/citation.cfm?id=1882792.1882864
    • (2010) PPAM'09 , pp. 606-614
    • Takahashi, D.1
  • 21
    • 33847103649
    • Optimizing bandwidth limited problems using one-sided communication and overlap
    • C. Bell, D. Bonachea, R. Nishtala, and K. Yelick, "Optimizing bandwidth limited problems using one-sided communication and overlap," in Proceedings of the 20th international conference on Parallel and distributed processing, ser. IPDPS'06. Washington, DC, USA: IEEE Computer Society, 2006, pp. 84-84. [Online]. Available: http://dl.acm.org/citation.cfm?id=1898953.1899016
    • (2006) IPDPS'06 , pp. 84-84
    • Bell, C.1    Bonachea, D.2    Nishtala, R.3    Yelick, K.4
  • 23
    • 69349089776
    • A New Approach for Investigating the Molecular Recognition of Protein: Toward Structure-Based Drug Design Based on the 3D-RISM Theory
    • T. Imai, A. Kovalenko, F. Hirata, and A. Kidera, "A New Approach for Investigating the Molecular Recognition of Protein: Toward Structure-Based Drug Design Based on the 3D-RISM Theory," J. Am. Chem. Soc., vol. 131, pp. 12430-12440, 2009.
    • (2009) J. Am. Chem. Soc. , vol.131 , pp. 12430-12440
    • Imai, T.1    Kovalenko, A.2    Hirata, F.3    Kidera, A.4
  • 24
    • 80755185292
    • A New Approach for Investigating the Molecular Recognition of Protein: Toward Structure-Based Drug Design Based on the 3D-RISM Theory
    • Y. Kiyota, N. Yoshida, and F. Hirata, "A New Approach for Investigating the Molecular Recognition of Protein: Toward Structure-Based Drug Design Based on the 3D-RISM Theory," J. Comp. Theo. Chem., vol. 7, pp. 3803-3815, 2011.
    • (2011) J. Comp. Theo. Chem. , vol.7 , pp. 3803-3815
    • Kiyota, Y.1    Yoshida, N.2    Hirata, F.3
  • 26
    • 77954741573
    • Large-scale FFT on GPU clusters
    • Y. Chen, X. Cui, and H. Mei, "Large-scale FFT on GPU clusters," in Proceedings of the 24th ACM International Conference on Supercomputing, ser. ICS '10. New York, NY, USA: ACM, 2010, pp. 315-324. [Online]. Available: http://doi.acm.org/10.1145/1810085.1810128
    • (2010) ICS '10 , pp. 315-324
    • Chen, Y.1    Cui, X.2    Mei, H.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.