Volume 39, Issue 1, 2013, Pages 58-77

Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition

Author keywords

2D decomposition; FFT; Parallel computing

Indexed keywords

CODE PERFORMANCE; COLLECTIVE COMMUNICATIONS; COMMUNICATION TIME; COMPLEXITY ANALYSIS; DATA COPYING; DATA-COMMUNICATION; DIFFERENT DOMAINS; DISTRIBUTED MEMORY; DOMAIN DECOMPOSITIONS; EFFICIENT IMPLEMENTATION; EXECUTION TIME; EXTREME SCALE; GRAND CHALLENGE; LOAD BALANCE; PARALLEL COMPUTER; PARALLEL IMPLEMENTATIONS; PROBLEM SIZE; SCALABILITY ANALYSIS; SUB-DOMAINS; SUBDOMAIN; THEORETICAL COMPLEXITY

EID: 84872903743; PISSN: 01678191; EISSN: None; Source Type: Journal
DOI: 10.1016/j.parco.2012.12.002; Document Type: Article
Times cited: 57

References (29)
  • 1
    • 0036467373
    • Parallel implementation of the projector augmented plane wave method for charged systems
    • DOI 10.1016/S0010-4655(01)00413-1, PII S0010465501004131
    • E. Bylaska, M. Valiev, R. Kawai, and J.H. Weare Parallel implementation of the projector augmented plane wave method for charged systems Comput. Phys. Commun. 143 2002 11 28 (Pubitemid 34097258)
    • (2002) Computer Physics Communications , vol.143 , Issue.1 , pp. 11-28
    • Bylaska, E.J.1    Valiev, M.2    Kawai, R.3    Weare, J.H.4
  • 2
    • 0002935932
    • A parallel and modular deformable cell Car-Parrinello code
    • PII S001046559900418X
    • C. Cavazzoni, and G. Chiarotti A parallel and modular deformable cell Car-Parrinello code Comput. Phys. Commun. 123 1999 56 76 (Pubitemid 129530811)
    • (1999) Computer Physics Communications , vol.123 , Issue.1-3 , pp. 56-76
    • Cavazzoni, C.1    Chiarotti, G.L.2
  • 3
    • 40749122534
    • A massively parallel implementation of the common azimuth pre-stack depth migration
    • H. Calandra, F. Bothorel, and P. Vezolle A massively parallel implementation of the common azimuth pre-stack depth migration IBM J. Res. Dev. 52 2008 83 91
    • (2008) IBM J. Res. Dev. , vol.52 , pp. 83-91
    • Calandra, H.1    Bothorel, F.2    Vezolle, P.3
  • 4
    • 71149104911
    • An efficient spectral method for the simulation of dynamos in Cartesian geometry and its implementation on massively parallel computers
    • S. Stellmach, and U. Hansen An efficient spectral method for the simulation of dynamos in Cartesian geometry and its implementation on massively parallel computers Geochem. Geophys. Geosyst. 9 2008 5
    • (2008) Geochem. Geophys. Geosyst. , vol.9 , pp. 5
    • Stellmach, S.1    Hansen, U.2
  • 5
    • 0001243898
    • Examination of hypotheses in the Kolmogorov refined turbulence theory through high-resolution simulations. Part 2. Passive scalar field
    • L.P. Wang, S.Y. Chen, and J.G. Brasseur Examination of hypotheses in the Kolmogorov refined turbulence theory through high-resolution simulations. Part 2. Passive scalar field J. Fluid Mech. 400 1999 163 197 (Pubitemid 129659448)
    • (1999) Journal of Fluid Mechanics , vol.400 , pp. 163-197
    • Wang, L.-P.1    Chen, S.2    Brasseur, J.G.3
  • 6
    • 23044461426
    • A new parallel strategy for two-dimensional incompressible flow simulations using pseudo-spectral methods
    • DOI 10.1016/j.jcp.2005.04.010, PII S0021999105002226
    • Z. Yin, L. Yuan, and T. Tang A new parallel strategy for two-dimensional incompressible flow simulations using pseudo-spectral methods J. Comput. Phys. 210 2005 325 341 (Pubitemid 41070401)
    • (2005) Journal of Computational Physics , vol.210 , Issue.1 , pp. 325-341
    • Yin, Z.1    Yuan, L.2    Tang, T.3
  • 7
    • 26944457330
    • A low-cost parallel implementation of direct numerical simulation of wall turbulence
    • DOI 10.1016/j.jcp.2005.06.003, PII S0021999105002871
    • P. Luchini, and M. Quadrio A low-cost parallel implementation of direct numerical simulation of wall turbulence J. Comput. Phys. 211 2006 551 571 (Pubitemid 41472441)
    • (2006) Journal of Computational Physics , vol.211 , Issue.2 , pp. 551-571
    • Luchini, P.1    Quadrio, M.2
  • 8
    • 0035576784
    • A new technique for a parallel dealiased pseudospectral Navier-Stokes code
    • DOI 10.1016/S0010-4655(01)00433-7, PII S0010465501004337
    • M. Iovieno, C. Cavazzoni, and D. Tordella A new technique for a parallel dealiased pseudospectral Navier-Stokes code Comput. Phys. Commun. 141 2001 365 374 (Pubitemid 33126299)
    • (2001) Computer Physics Communications , vol.141 , Issue.3 , pp. 365-374
    • Iovieno, M.1    Cavazzoni, C.2    Tordella, D.3
  • 9
    • 84872905377
    • Ab initio electronic structure methods in parallel computers
    • S. Poykko Ab initio electronic structure methods in parallel computers Appl. Parallel Comput. Large Scale Sci. Ind. Prob. 1541 1998 452 459
    • (1998) Appl. Parallel Comput. Large Scale Sci. Ind. Prob. , vol.1541 , pp. 452-459
    • Poykko, S.1
  • 10
    • 6044219893
    • A hybrid decomposition parallel implementation of the Car-Parrinello method
    • J. Wiggs, and H. Jonsson A hybrid decomposition parallel implementation of the Car-Parrinello method Comput. Phys. Commun. 87 1995 319 340
    • (1995) Comput. Phys. Commun. , vol.87 , pp. 319-340
    • Wiggs, J.1    Jonsson, H.2
  • 11
    • 0034228674
    • Parallel fast Fourier transforms for electronic structure calculations
    • P. Haynes, and M. Cote Parallel fast Fourier transforms for electronic structure calculations Comput. Phys. Commun. 130 2000 130 136
    • (2000) Comput. Phys. Commun. , vol.130 , pp. 130-136
    • Haynes, P.1    Cote, M.2
  • 12
    • 19344378421
    • Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements
    • M. Eleftheriou, B. Fitch, A. Rayshubskiy, T.J.C. Ward, and R.S. Germain Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements IBM J. Res. Develop. 49 2005 457
    • (2005) IBM J. Res. Develop. , vol.49 , pp. 457
    • Eleftheriou, M.1    Fitch, B.2    Rayshubskiy, A.3    Ward, T.J.C.4    Germain, R.S.5
  • 13
    • 33947229391
    • Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer
    • DOI 10.1016/j.cpc.2006.12.006, PII S0010465507000276
    • B. Fang, Y. Deng, and G. Martyna Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer Comput. Phys. Commun. 176 2007 531 538 (Pubitemid 46435804)
    • (2007) Computer Physics Communications , vol.176 , Issue.8 , pp. 531-538
    • Fang, B.1    Deng, Y.2    Martyna, G.3
  • 14
    • 0037402659
    • Efficient implementation of parallel three-dimensional FFT on clusters of PCs
    • D. Takahashi Efficient implementation of parallel three-dimensional FFT on clusters of PCs Comput. Phys. Commun. 152 2003 144 150
    • (2003) Comput. Phys. Commun. , vol.152 , pp. 144-150
    • Takahashi, D.1
  • 15
    • 0042579314
    • FFT algorithms and their adaptation to parallel processing
    • E. Chu, and A. George FFT algorithms and their adaptation to parallel processing Linear Algebra Appl. 284 1998 95 124
    • (1998) Linear Algebra Appl. , vol.284 , pp. 95-124
    • Chu, E.1    George, A.2
  • 16
    • 0030285174
    • Implementation of parallel FFT algorithms on distributed memory machines with a minimum overhead of communication
    • DOI 10.1016/S0167-8191(96)00039-7, PII S0167819196000397
    • C. Calvin Implementation of parallel FFT algorithms on distributed memory machines with a minimum overhead of communication Parallel Comput. 22 1996 1255 1279 (Pubitemid 126371168)
    • (1996) Parallel Computing , vol.22 , Issue.9 , pp. 1255-1279
    • Calvin, C.1
  • 17
    • 0035980883
    • A simple and efficient parallel FFT algorithm using the BSP model
    • DOI 10.1016/S0167-8191(01)00118-1, PII S0167819101001181
    • M. Inda, and R. Bisseling A simple and efficient parallel FFT algorithm using the BSP model Parallel Comput. 27 2001 1847 1878 (Pubitemid 32997724)
    • (2001) Parallel Computing , vol.27 , Issue.14 , pp. 1847-1878
    • Inda, M.A.1    Bisseling, R.H.2
  • 18
    • 0033344480
    • Parallel implementation of multidimensional transforms without interprocessor communication
    • DOI 10.1109/12.795223
    • F. Marino, and E.E. Swartzlander Parallel implementation of multidimensional transforms without interprocessor communication IEEE Trans. Comput. 48 1999 951 961 (Pubitemid 30524475)
    • (1999) IEEE Transactions on Computers , vol.48 , Issue.9 , pp. 951-961
    • Marino, F.1    Swartzlander, E.E.2
  • 20
    • 0001229961
    • The parallel Fourier pseudospectral method
    • R.B. Pelz The parallel Fourier pseudospectral method J. Comput. Phys. 92 1991 296 312
    • (1991) J. Comput. Phys. , vol.92 , pp. 296-312
    • Pelz, R.B.1
  • 21
    • 0035980881
    • Scalable parallel FFT for spectral simulations on a Beowulf cluster
    • DOI 10.1016/S0167-8191(01)00120-X, PII S016781910100120X
    • P. Dmitruk, L.-P. Wang, W.H. Matthaeus, R. Zhang, and D. Seckel Scalable parallel FFT for spectral simulations on a Beowulf cluster Parallel Comput. 27 2001 1921 1936 (Pubitemid 32997727)
    • (2001) Parallel Computing , vol.27 , Issue.14 , pp. 1921-1936
    • Dmitruk, P.1    Wang, L.-P.2    Matthaeus, W.H.3    Zhang, R.4    Seckel, D.5
  • 22
    • 0007910094
    • Plane-wave electronic-structure calculations on a parallel supercomputer
    • J.S. Nelson, S.J. Plimpton, and M.P. Sears Plane-wave electronic-structure calculations on a parallel supercomputer Phys. Rev. B 47 1993 1765 1774
    • (1993) Phys. Rev. B , vol.47 , pp. 1765-1774
    • Nelson, J.S.1    Plimpton, S.J.2    Sears, M.P.3
  • 25
    • 84894155320
    • 2DECOMP&FFT - A highly scalable 2D decomposition library and FFT interface
    • N. Li, S. Laizet, 2DECOMP&FFT - A highly scalable 2D decomposition library and FFT interface, Cray User Group 2010 Conference, Edinburgh, 2010.
    • (2010) Cray User Group 2010 Conference, Edinburgh
    • Li, N.1    Laizet, S.2
  • 27
    • 27144554620
    • Performance measurements of the 3D FFT on the Blue Gene/L supercomputer
    • Euro-Par 2005 Parallel Processing: 11th International Euro-Par Conference. Proceedings
    • M. Eleftheriou, B. Fitch, A. Rayshubskiy, T.J.C. Ward, R. Germain, Performance measurements of the 3D FFT on the BlueGene/L supercomputer, in: Euro-Par 2005 Parallel Processing, Proceedings, vol. 3648, 2005, pp. 795-803. (Pubitemid 41490880)
    • (2005) Lecture Notes in Computer Science , vol.3648 , pp. 795-803
    • Eleftheriou, M.1    Fitch, B.2    Rayshubskiy, A.3    Ward, T.J.C.4    Germain, R.5
  • 28
    • 33947416576
    • A modified split-radix FFT with fewer arithmetic operations
    • DOI 10.1109/TSP.2006.882087
    • S. Johnson, M. Frigo, A modified split-radix FFT with fewer arithmetic operations, IEEE Trans. Signal Process. 55 (2007) 111-119. (Pubitemid 46443211)
    • (2007) IEEE Transactions on Signal Processing , vol.55 , Issue.1 , pp. 111-119
    • Johnson, S.G.1    Frigo, M.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.