SCOPUS Information Search Platform

Proceedings of the International Conference on Supercomputing

2012, Pages 205-214

On the communication complexity of 3D FFTs and its implications for exascale

(6) Czechowski, Kenneth (a); Battaglino, Casey (a); McClanahan, Chris (a); Iyer, Kartik (b); Yeung, P K (a, b); Vuduc, Richard (a)

a Georgia Institute of Technology (United States)

b Georgia Institute of Technology (United States)

Author keywords

Exascale; FFT; Performance model

Indexed keywords

ALL-TO-ALL COMMUNICATION; CO-PROCESSORS; COMMUNICATION COMPLEXITY; CURRENT TECHNOLOGY; EXASCALE; INTRA-NODE COMMUNICATION; MEMORY BANDWIDTHS; MEMORY HIERARCHY; NETWORK BANDWIDTH; NETWORK COMMUNICATIONS; PERFORMANCE IMPACT; PERFORMANCE MODEL; POTENTIAL SCALING; SOFTWARE IMPLEMENTATION;

FAST FOURIER TRANSFORMS; INTELLIGENT CONTROL; PROGRAM PROCESSORS; THREE DIMENSIONAL;

COMMUNICATION;

EID: 84864032930
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/2304576.2304604
Document Type: Conference Paper

Times cited: 65

References (43)

1
- 84870461764
- The HPC Challenge benchmark. http://icl.cs.utk.edu/hpcc.
- The HPC Challenge Benchmark

2
- 0028584512
- An efficient parallel algorithm for the 3-D FFT NAS parallel benchmark
- IEEE Comput. Soc. Press
- R. Agarwal, F. Gustavson, and M. Zubair. An efficient parallel algorithm for the 3-D FFT NAS parallel benchmark. In Proceedings of IEEE Scalable High Performance Computing Conference, pages 129-133. IEEE Comput. Soc. Press, 1994.
- (1994) Proceedings of IEEE Scalable High Performance Computing Conference , pp. 129-133
- Agarwal, R.¹ Gustavson, F.² Zubair, M.³

3
- 0036105874
- Cellular supercomputing with system-on-a-chip
- Digest of Technical Papers (Cat. No.02CH37315) IEEE
- G. Almasi et al. Cellular supercomputing with system-on-a-chip. In 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315), pages 196-197. IEEE, 2002.
- (2002) 2002 IEEE International Solid-state Circuits Conference , pp. 196-197
- Almasi, G.¹

4
- 79959926022
- FAWN: A fast array of wimpy nodes
- July
- D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A fast array of wimpy nodes. Communications of the ACM, 54(7):101-109, July 2011.
- (2011) Communications of the ACM , vol.54 , Issue.7 , pp. 101-109
- Andersen, D.G.¹ Franklin, J.² Kaminsky, M.³ Phanishayee, A.⁴ Tan, L.⁵ Vasudevan, V.⁶

5
- 33847103649
- Optimizing bandwidth limited problems using one-sided communication and overlap
- IEEE
- C. Bell, D. Bonachea, R. Nishtala, and K. Yelick. Optimizing Bandwidth Limited Problems Using One-Sided Communication and Overlap. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pages 1-10. IEEE, 2006.
- (2006) Proceedings 20th IEEE International Parallel & Distributed Processing Symposium , pp. 1-10
- Bell, C.¹ Bonachea, D.² Nishtala, R.³ Yelick, K.⁴

6
- 84858657417
- The hidden cost of low bandwidth communication
- U. Vishkin, editor ACM, New York, NY, USA
- G. E. Blelloch, B. M. Maggs, and G. L. Miller. The hidden cost of low bandwidth communication. In U. Vishkin, editor, Developing a Computer Science Agenda for High-Performance Computing, pages 22-25. ACM, New York, NY, USA, 1994.
- (1994) Developing a Computer Science Agenda for High-performance Computing , pp. 22-25
- Blelloch, G.E.¹ Maggs, B.M.² Miller, G.L.³

7
- 33746887354
- SeaStar interconnect: Balanced bandwidth for scalable performance
- DOI 10.1109/MM.2006.65
- R. Brightwell, K. T. Pedretti, K. D. Underwood, and T. Hudson. SeaStar interconnect: Balanced bandwidth for scalable performance. IEEE Micro, 26:41-57, May 2006. (Pubitemid 44194067)
- (2006) IEEE Micro , vol.26 , Issue.3 , pp. 41-57
- Brightwell, R.¹ Pedretti, K.T.² Underwood, K.D.³ Hudson, T.⁴

8
- 0000493064
- Estimating interlock and improving balance for pipelined architectures
- Aug.
- D. Callahan, J. Cocke, and K. Kennedy. Estimating interlock and improving balance for pipelined architectures. Journal of Parallel and Distributed Computing, 5(4):334-358, Aug. 1988.
- (1988) Journal of Parallel and Distributed Computing , vol.5 , Issue.4 , pp. 334-358
- Callahan, D.¹ Cocke, J.² Kennedy, K.³

9
- 58449124711
- Communication analysis of parallel 3D FFT for flat Cartesian meshes on large Blue Gene systems
- Springer-Verlag
- A. Chan, P. Balaji, W. Gropp, and R. Thakur. Communication analysis of parallel 3D FFT for flat Cartesian meshes on large Blue Gene systems. In Proceedings of the 15th International Conference on High Performance Computing, pages 350-364. Springer-Verlag, 2008.
- (2008) Proceedings of the 15th International Conference on High Performance Computing , pp. 350-364
- Chan, A.¹ Balaji, P.² Gropp, W.³ Thakur, R.⁴

10
- 19344375178
- The development and integration of a distributed 3D FFT for a cluster of workstations
- Atlanta, GA, USA
- C. E. Cramer and J. Board. The development and integration of a distributed 3D FFT for a cluster of workstations. In Proceedings of the 4th Annual Linux Showcase & Conference, Atlanta, GA, USA, 2000.
- (2000) Proceedings of the 4th Annual Linux Showcase & Conference
- Cramer, C.E.¹ Board, J.²

11
- 84867435449
- Balance principles for algorithm-architecture co-design
- Berkeley, CA, USA Usenix Association
- K. Czechowski, C. Battaglino, C. McClanahan, A. Chandramowlishwaran, and R. Vuduc. Balance principles for algorithm-architecture co-design. In USENIX Wkshp. Hot Topics in Parallelism (HotPar), pages 1-5, Berkeley, CA, USA, 2011. USENIX Association.
- (2011) USENIX Wkshp. Hot Topics in Parallelism (HotPar) , pp. 1-5
- Czechowski, K.¹ Battaglino, C.² McClanahan, C.³ Chandramowlishwaran, A.⁴ Vuduc, R.⁵

12
- 43949090517
- Titanium performance and potential: An NPB experimental study
- DOI 10.1007/978-3-540-69330-7-14, Languages and Compilers for Parallel Computing - 18th International Workshop, LCPC 2005, Revised Selected Papers
- K. Datta, D. Bonachea, and K. Yelick. Titanium Performance and Potential: An NPB Experimental Study. In Proceedings of the Languages and Compilers for Parallel Computing (LCPC) Workshop, volume LNCS 4339, pages 200-214, 2006. (Pubitemid 351702211)
- (2006) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.LNCS4339 , pp. 200-214
- Datta, K.¹ Bonachea, D.² Yelick, K.³

13
- 0344349185
- A portable 3D FFT package for distributed-memory parallel architectures
- SIAM Press
- H. Q. Ding, R. D. Ferraro, and D. B. Gennery. A Portable 3D FFT Package for Distributed-Memory Parallel Architectures. In Proceedings of 7th SIAM Conference on Parallel Processing, pages 70-71. SIAM Press, 1995.
- (1995) Proceedings of 7th SIAM Conference on Parallel Processing , pp. 70-71
- Ding, H.Q.¹ Ferraro, R.D.² Gennery, D.B.³

14
- 0035980881
- Scalable parallel FFT for spectral simulations on a Beowulf cluster
- DOI 10.1016/S0167-8191(01)00120-X, PII S016781910100120X
- P. Dmitruk, L.-P. Wang, W. H. Matthaeus, R. Zhang, and D. Seckel. Scalable parallel FFT for spectral simulations on a Beowulf cluster. Parallel Computing, 27(14):1921-1936, Dec. 2001. (Pubitemid 32997727)
- (2001) Parallel Computing , vol.27 , Issue.14 , pp. 1921-1936
- Dmitruk, P.¹ Wang, L.-P.² Matthaeus, W.H.³ Zhang, R.⁴ Seckel, D.⁵

15
- 78650819877
- Overlapping methods of all-to-all communication and FFT algorithms for torus-connected massively parallel supercomputers
- IEEE, Nov.
- J. Doi and Y. Negishi. Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers. In 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-9. IEEE, Nov. 2010.
- (2010) 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pp. 1-9
- Doi, J.¹ Negishi, Y.²

16
- 79951595196
- The international exascale software project roadmap
- J. Dongarra et al. The international exascale software project roadmap. IJHPCA, 25(1):3-60, 2011.
- (2011) IJHPCA , vol.25 , Issue.1 , pp. 3-60
- Dongarra, J.¹

17
- 77957154767
- 4) processors. 2008.
- (2008) 4) Processors
- Donzis, D.¹ Yeung, P.² Pekurovsky, D.³

18
- 0032226441
- The future fast Fourier transform?
- A. Edelman, P. McCorquodale, and S. Toledo. The Future Fast Fourier Transform? SIAM Journal on Scientific Computing, 20(3):1094, 1998.
- (1998) SIAM Journal on Scientific Computing , vol.20 , Issue.3 , pp. 1094
- Edelman, A.¹ McCorquodale, P.² Toledo, S.³

19
- 19344378421
- Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements
- M. Eleftheriou, B. Fitch, A. Rayshubskiy, T. Ward, and R. Germain. Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements. IBM Journal of Research and Development, 49(2.3):457-464, 2005. (Pubitemid 40718146)
- (2005) IBM Journal of Research and Development , vol.49 , Issue.2-3 , pp. 457-464
- Eleftheriou, M.¹ Fitch, B.G.² Rayshubskiy, A.³ Ward, T.J.C.⁴ Germain, R.S.⁵

20
- 33947229391
- Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer
- DOI 10.1016/j.cpc.2006.12.006, PII S0010465507000276
- B. Fang, Y. Deng, and G. Martyna. Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer. Computer Physics Communications, 176(8):531-538, Apr. 2007. (Pubitemid 46435804)
- (2007) Computer Physics Communications , vol.176 , Issue.8 , pp. 531-538
- Fang, B.¹ Deng, Y.² Martyna, G.³

21
- 0033350255
- Cache-oblivious algorithms
- FOCS '99 Washington, DC, USA IEEE Computer Society
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS '99, pages 285-, Washington, DC, USA, 1999. IEEE Computer Society.
- (1999) Proceedings of the 40th Annual Symposium on Foundations of Computer Science , pp. 285
- Frigo, M.¹ Leiserson, C.E.² Prokop, H.³ Ramachandran, S.⁴

22
- 77953976700
- An introductory exascale feasibility study for FFTs and multigrid
- Atlanta, GA, USA, Apr. IEEE
- H. Gahvari and W. Gropp. An introductory exascale feasibility study for FFTs and multigrid. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pages 1-9, Atlanta, GA, USA, Apr. 2010. IEEE.
- (2010) 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) , pp. 1-9
- Gahvari, H.¹ Gropp, W.²

23
- 78149258346
- Understanding throughput-oriented architectures
- Nov.
- M. Garland and D. B. Kirk. Understanding throughput-oriented architectures. Communications of the ACM, 53(11):58, Nov. 2010.
- (2010) Communications of the ACM , vol.53 , Issue.11 , pp. 58
- Garland, M.¹ Kirk, D.B.²

24
- 0035280950
- Parallel distributed FFT-based solvers for 3-D Poisson problems in meso-scale atmospheric simulations
- DOI 10.1177/109434200101500104
- L. Giraud, R. Guivarch, and J. Stein. Parallel Distributed FFT-Based Solvers for 3-D Poisson Problems in Meso-Scale Atmospheric Simulations. International Journal of High Performance Computing Applications, 15(1):36-46, Feb. 2001. (Pubitemid 32252488)
- (2001) International Journal of High Performance Computing Applications , vol.15 , Issue.1 , pp. 36-46
- Giraud, L.¹ Guivarch, R.² Stein, J.³

25
- 77954713684
- An empirically tuned 2D and 3D FFT library on CUDA GPU
- Tsukuba, Japan ACM Press
- L. Gu, X. Li, and J. Siegel. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10, page 305, Tsukuba, Japan, 2010. ACM Press.
- (2010) Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10 , pp. 305
- Gu, L.¹ Li, X.² Siegel, J.³

26
- 67650635164
- Many-core vs. many-thread machines: Stay away from the valley
- Z. Guz, E. Bolotin, I. Keidar, A. Kolodny, A. Mendelson, and U. C. Weiser. Many-core vs. many-thread machines: Stay away from the valley. IEEE Computer Architecture Letters, 8:25-28, 2009.
- (2009) IEEE Computer Architecture Letters , vol.8 , pp. 25-28
- Guz, Z.¹ Bolotin, E.² Keidar, I.³ Kolodny, A.⁴ Mendelson, A.⁵ Weiser, U.C.⁶

27
- 60649098706
- Parallel 3D-FFTs for multi-core nodes on a mesh communication network
- Helsinki, Finland
- J. Hein, H. Jagode, U. Sigrist, A. Simpson, and A. Trew. Parallel 3D-FFTs for multi-core nodes on a mesh communication network. In Proceedings of the Cray User's Group (CUG) Meeting, pages 1-15, Helsinki, Finland, 2008.
- (2008) Proceedings of the Cray User's Group (CUG) Meeting , pp. 1-15
- Hein, J.¹ Jagode, H.² Sigrist, U.³ Simpson, A.⁴ Trew, A.⁵

28
- 84864027960
- Task placement of parallel multidimensional FFTs on a mesh communication network
- H. Jagode, J. Hein, and A. Trew. Task placement of parallel multidimensional FFTs on a mesh communication network. University of Tennessee Knoxville, Technical Report No. UT-CS-08-613, 2008.
- (2008) University of Tennessee Knoxville, Technical Report No. UT-CS-08-613
- Jagode, H.¹ Hein, J.² Trew, A.³

29
- 84971853043
- I/O complexity: The red-blue pebble game
- New York, New York, USA, May ACM Press
- H. Jia-Wei and H. T. Kung. I/O complexity: The red-blue pebble game. In Proceedings of the thirteenth annual ACM symposium on Theory of computing-STOC '81, pages 326-333, New York, New York, USA, May 1981. ACM Press.
- (1981) Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing-STOC '81 , pp. 326-333
- Jia-Wei, H.¹ Kung, H.T.²

30
- 83155160951
- Using the TOP500 to trace and project technology and architecture trends
- ACM
- P. Kogge and T. Dysart. Using the TOP500 to trace and project technology and architecture trends. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, page 28. ACM, 2011.
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis , pp. 28
- Kogge, P.¹ Dysart, T.²

31
- 66749092384
- Sept.
- P. Kogge et al. Exascale Computing Study: Technology challenges in achieving exascale systems, Sept. 2008.
- (2008) Exascale Computing Study: Technology Challenges in Achieving Exascale Systems
- Kogge, P.¹

32
- 55849105626
- Optimization of all-to-all communication on the Blue Gene/L supercomputer
- IEEE, Sept.
- S. Kumar, Y. Sabharwal, R. Garg, and P. Heidelberger. Optimization of All-to-All Communication on the Blue Gene/L Supercomputer. In 2008 37th International Conference on Parallel Processing, pages 320-329. IEEE, Sept. 2008.
- (2008) 2008 37th International Conference on Parallel Processing , pp. 320-329
- Kumar, S.¹ Sabharwal, Y.² Garg, R.³ Heidelberger, P.⁴

33
- 0022563298
- Memory requirements for balanced computer architectures
- Tokyo, Japan
- H. T. Kung. Memory requirements for balanced computer architectures. In Proceedings of the ACM Int'l. Symp. Computer Architecture (ISCA), Tokyo, Japan, 1986.
- (1986) Proceedings of the ACM Int'l. Symp. Computer Architecture (ISCA)
- Kung, H.T.¹

34
- 52649125840
- 3D-stacked memory architectures for multi-core processors
- IEEE, June
- G. H. Loh. 3D-Stacked Memory Architectures for Multi-core Processors. In 2008 International Symposium on Computer Architecture, pages 453-464. IEEE, June 2008.
- (2008) 2008 International Symposium on Computer Architecture , pp. 453-464
- Loh, G.H.¹

35
- 0038998034
- Memory bandwidth and machine balance in high performance computers
- Dec.
- J. McCalpin. Memory Bandwidth and Machine Balance in High Performance Computers. IEEE Technical Committee on Computer Architecture (TCCA) Newsletter, Dec. 1995.
- (1995) IEEE Technical Committee on Computer Architecture (TCCA) Newsletter
- McCalpin, J.¹

36
- 79959609169
- November
- D. Pekurovsky and J. H. Goebbert. P3DFFT - highly scalable parallel 3D fast Fourier transforms library. http://www.sdsc.edu/us/resources/p3dfft, November 2010.
- (2010) P3DFFT - Highly Scalable Parallel 3D Fast Fourier Transforms Library
- Pekurovsky, D.¹ Goebbert, J.H.²

37
- 84856841346
- Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
- New Orleans, LA, USA, Nov.
- S. J. Pennycook, S. D. Hammond, S. A. Jarvis, and G. R. Mudalige. Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark. In Proceedings of the International Workshop on Performance Modeling, Benchmarking and Simulation (PMBS), New Orleans, LA, USA, Nov. 2010.
- (2010) Proceedings of the International Workshop on Performance Modeling, Benchmarking and Simulation (PMBS)
- Pennycook, S.J.¹ Hammond, S.D.² Jarvis, S.A.³ Mudalige, G.R.⁴

38
- 79961071291
- Web search using mobile cores: Quantifying and mitigating the price of efficiency
- June
- V. J. Reddi, B. C. Lee, T. Chilimbi, and K. Vaid. Web search using mobile cores: Quantifying and mitigating the price of efficiency. ACM SIGARCH Computer Architecture News, 38(3):215-314, June 2010.
- (2010) ACM SIGARCH Computer Architecture News , vol.38 , Issue.3 , pp. 215-314
- Reddi, V.J.¹ Lee, B.C.² Chilimbi, T.³ Vaid, K.⁴

39
- 84864034954
- PhD thesis, The University of Edinburgh
- U. Sigrist. Optimizing parallel 3D fast Fourier transformations for a cluster of IBM POWER5 SMP nodes. PhD thesis, The University of Edinburgh, 2007.
- (2007) Optimizing Parallel 3D Fast Fourier Transformations for a Cluster of IBM POWER5 SMP Nodes
- Sigrist, U.¹

40
- 0012776293
- A parallel 3-D FFT algorithm on clusters of vector SMPs
- Applied Parallel Computing New Paradigms for HPC in Industry and Academia 5th International Workshop, PARA 2000 Bergen, Norway, June 18-20, 2000 Proceedings
- D. Takahashi. A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs. In Proceedings of Applied Parallel Computing: New Paradigms for HPC in Industry and Academia, volume LNCS 1947, pages 316-323, 2001. (Pubitemid 33239312)
- (2001) LECTURE NOTES IN COMPUTER SCIENCE , Issue.1947 , pp. 316-323
- Takahashi, D.¹

41
- 80052312080
- Keeneland: Bringing heterogeneous GPU computing to the computational science community
- J. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, et al. Keeneland: Bringing heterogeneous GPU computing to the computational science community. IEEE Computing in Science and Engineering, 13(5):90-95, 2011.
- (2011) IEEE Computing in Science and Engineering , vol.13 , Issue.5 , pp. 90-95
- Vetter, J.¹ Glassbrook, R.² Dongarra, J.³ Schwan, K.⁴ Loftis, B.⁵ McNally, S.⁶ Meredith, J.⁷ Rogers, J.⁸ Roth, P.⁹ Spafford, K.¹⁰

42
- 84864034959
- MVAPICH2-GPU: Optimized GPU to GPU communication for InfiniBand clusters
- H. Wang, S. Potluri, M. Luo, A. Singh, S. Sur, and D. Panda. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Computer Science-Research and Development, pages 1-10.
- Computer Science-research and Development , pp. 1-10
- Wang, H.¹ Potluri, S.² Luo, M.³ Singh, A.⁴ Sur, S.⁵ Panda, D.⁶

43
- 74049089074
- A 32×32×32, spatially distributed 3D FFT in four microseconds on Anton
- ACM
- C. Young, J. Bank, R. Dror, J. Grossman, J. Salmon, and D. Shaw. A 32×32×32, spatially distributed 3D FFT in four microseconds on Anton. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, page 23. ACM, 2009.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis , pp. 23
- Young, C.¹ Bank, J.² Dror, R.³ Grossman, J.⁴ Salmon, J.⁵ Shaw, D.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.