SCOPUS Information Retrieval Platform

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Year: 2013 (volume, issue, and pages not listed)

Enabling highly-scalable remote memory access programming with MPI-3 one sided

Authors (3): Gerstenberger, Robert (a); Besta, Maciej (a); Hoefler, Torsten (a)

(a) ETH Zurich, Switzerland

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER PROGRAMMING; COMPUTER SCIENCE;

APPLICATION PERFORMANCE; APPLICATION STUDIES; CRITICAL FUNCTIONS; MEMORY CONSUMPTION; PERFORMANCE MODEL; PROGRAMMING COMPLEXITY; REMOTE DIRECT MEMORY ACCESS; REMOTE MEMORY ACCESS;

COMPLEX NETWORKS;

EID: 84899678292 PISSN: 21674329 EISSN: 21674337 Source Type: Conference Proceeding
DOI: 10.1145/2503210.2503286 Document Type: Conference Paper

Times cited: 70

References (41)

1
- 77958112922
- R. Alverson, D. Roweth, and L. Kaplan. The Gemini system interconnect. In Proceedings of the IEEE Symposium on High Performance Interconnects (HOTI'10), pages 83-87. IEEE Computer Society, 2010.

2
- 77958110333
- B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis, J. Li, N. Ni, and R. Rajamony. The PERCS high-performance interconnect. In Proceedings of the IEEE Symposium on High Performance Interconnects (HOTI'10), pages 75-82. IEEE Computer Society, 2010.

3
- 0011627265
- R. Barriuso and A. Knies. SHMEM user's guide for C, 1994.

4
- 84863638735
- G. Bauer, S. Gottlieb, and T. Hoefler. Performance modeling and comparative analysis of the MILC lattice QCD application su3_rmd. In Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID'12), pages 652-659. IEEE Computer Society, 2012.

5
- 84899666694
- M. Beck and M. Kagan. Performance evaluation of the RDMA over ethernet (RoCE) standard in enterprise data centers infrastructure. In Proceedings of the Workshop on Data Center-Converged and Virtual Ethernet Switching (DC-CaVES'11), pages 9-15. ITCP, 2011.

6
- 84947248378
- C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, and K. Yelick. An evaluation of current high-performance networks. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'03). IEEE Computer Society, 2003.

7
- 33847103649
- C. Bell, D. Bonachea, R. Nishtala, and K. Yelick. Optimizing bandwidth limited problems using one-sided communication and overlap. In Proceedings of the International Conference on Parallel and Distributed Processing (IPDPS'06), pages 1-10. IEEE Computer Society, 2006.

8
- 84973786808
- C. Bernard, M. C. Ogilvie, T. A. DeGrand, C. E. DeTar, S. A. Gottlieb, A. Krasnitz, R. Sugar, and D. Toussaint. Studying quarks and gluons on MIMD parallel computers. International Journal of High Performance Computing Applications, 5(4):61-70, 1991.

9
- 84877718746
- G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, B. Alverson, T. Johnson, J. Kopnick, M. Higgins, and J. Reinhard. Cray Cascade: A scalable HPC system based on a Dragonfly network. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'12), pages 103:1-103:9. IEEE Computer Society, 2012.

10
- 70449640953
- B. B. Fraguela, Y. Voronenko, and M. Pueschel. Automatic tuning of discrete Fourier Transforms driven by analytical modeling. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT'09), pages 271-280. IEEE Computer Society, 2009.

11
- 0027649341
- A. Y. Grama, A. Gupta, and V. Kumar. Isoefficiency: Measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology: Systems and Applications, 1(3):12-21, 1993.

12
- 84867646537
- T. Hoefler, J. Dinan, D. Buntinas, P. Balaji, B. Barrett, R. Brightwell, W. Gropp, V. Kale, and R. Thakur. Leveraging MPI's one-sided communication interface for shared-memory programming. In Recent Advances in the Message Passing Interface (EuroMPI'12), volume LNCS 7490, pages 132-141. Springer, 2012.

13
- 78149256345
- T. Hoefler and S. Gottlieb. Parallel zero-copy algorithms for Fast Fourier Transform and conjugate gradient using MPI datatypes. In Recent Advances in the Message Passing Interface (EuroMPI'10), volume LNCS 6305, pages 132-141. Springer, 2010.

14
- 78650818849
- T. Hoefler, T. Schneider, and A. Lumsdaine. Characterizing the influence of system noise on large-scale applications by simulation. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), pages 1-11. IEEE Computer Society, 2010.

15
- 77957567653
- T. Hoefler, C. Siebert, and A. Lumsdaine. Scalable communication protocols for dynamic sparse data exchange. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'10), pages 159-168. ACM, 2010.

16
- 84899689430
- ISO Fortran Committee. Fortran 2008 Standard (ISO/IEC 1539-1:2010). 2010.

17
- 4544268140
- W. Jiang, J. Liu, H.-W. Jin, D. K. Panda, W. Gropp, and R. Thakur. High performance MPI-2 one-sided communication over InfiniBand. In Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGRID'04), pages 531-538. IEEE Computer Society, 2004.

18
- 84957615862
- S. Karlsson and M. Brorsson. A comparative characterization of communication patterns in applications using MPI and shared memory on an IBM SP2. In Proceedings of the International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications (CANPC'98), pages 189-201. Springer, 1998.

19
- 85031726860
- R. M. Karp, A. Sahay, E. E. Santos, and K. E. Schauser. Optimal broadcast and summation in the LogP model. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures (SPAA'93), pages 142-153. ACM, 1993.

20
- 84866873171
- S. Kumar, A. Mamidala, D. A. Faraj, B. Smith, M. Blocksome, B. Cernohous, D. Miller, J. Parker, J. Ratterman, P. Heidelberger, D. Chen, and B. D. Steinmacher-Burrow. PAMI: A parallel active message interface for the Blue Gene/Q supercomputer. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'12), pages 763-773. IEEE Computer Society, 2012.

21
- 77950627571
- J. Larsson Traff, W. D. Gropp, and R. Thakur. Self-consistent MPI performance guidelines. IEEE Transactions on Parallel and Distributed Systems, 21(5):698-709, 2010.

22
- 77955144409
- J. Mellor-Crummey, L. Adhianto, W. N. Scherer III, and G. Jin. A new vision for Coarray Fortran. In Proceedings of the Conference on Partitioned Global Address Space Programming Models (PGAS'09), pages 5:1-5:9. ACM, 2009.

23
- 84976771728
- J. M. Mellor-Crummey and M. L. Scott. Scalable reader-writer synchronization for shared-memory multiprocessors. SIGPLAN Notices, 26(7):106-113, 1991.

24
- 0026137159
- J. M. Mellor-Crummey and M. L. Scott. Synchronization without contention. SIGPLAN Notices, 26(4):269-278, 1991.

25
- 23844539932
- A. A. Mirin and W. B. Sawyer. A scalable implementation of a finite-volume dynamical core in the community atmosphere model. International Journal of High Performance Computing Applications, 19(3):203-212, 2005.

26
- 84903758398
- MPI Forum. MPI: A Message-Passing Interface Standard. Version 2.2, 2009.

27
- 84903758398
- MPI Forum. MPI: A Message-Passing Interface Standard. Version 3.0, 2012.

28
- 70449905663
- R. Nishtala, P. H. Hargrove, D. O. Bonachea, and K. A. Yelick. Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'09), pages 1-12. IEEE Computer Society, 2009.

29
- 84899680132
- OpenFabrics Alliance (OFA). OpenFabrics Enterprise Distribution (OFED). www.openfabrics.org.

30
- 84877019178
- F. Petrini, D. J. Kerbyson, and S. Pakin. The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'03). ACM, 2003.

31
- 77954729562
- S. Potluri, P. Lai, K. Tomko, S. Sur, Y. Cui, M. Tatineni, K. W. Schulz, W. L. Barth, A. Majumdar, and D. K. Panda. Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. In Proceedings of the ACM International Conference on Supercomputing (ICS'10), pages 17-25. ACM, 2010.

32
- 70350441805
- R. Ross, R. Latham, W. Gropp, E. Lusk, and R. Thakur. Processing MPI datatypes outside MPI. In Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI'09), volume LNCS 5759, pages 42-53. Springer, 2009.

33
- 70349740809
- G. Santhanaraman, P. Balaji, K. Gopalakrishnan, R. Thakur, W. Gropp, and D. K. Panda. Natively supporting true one-sided communication in MPI on multi-core systems with InfiniBand. In Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'09), pages 380-387. IEEE Computer Society, 2009.

34
- 84899672565
- H. Shan, B. Austin, N. Wright, E. Strohmaier, J. Shalf, and K. Yelick. Accelerating applications at scale using one-sided communication. In Proceedings of the Conference on Partitioned Global Address Space Programming Models (PGAS'12), 2012.

35
- 84899690113
- The InfiniBand Trade Association. InfiniBand Architecture Specification Volume 1, Release 1.2. InfiniBand Trade Association, 2004.

36
- 34447571243
- UPC Consortium. UPC language specifications, v1.2, 2005. LBNL-59208.

37
- 79959600862
- J. Willcock, T. Hoefler, N. Edmonds, and A. Lumsdaine. Active Pebbles: Parallel programming for data-driven applications. In Proceedings of the ACM International Conference on Supercomputing (ICS'11), pages 235-245. ACM, 2011.

38
- 11244333684
- M. Woodacre, D. Robb, D. Roe, and K. Feind. The SGI Altix™ 3000 global shared-memory architecture, 2003.

39
- 33750234379
- T. S. Woodall, G. M. Shipman, G. Bosilca, and A. B. Maccabe. High performance RDMA protocols in HPC. In Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI'06), volume LNCS 4192, pages 76-85. Springer, 2006.

40
- 83155193225
- J. Zhang, B. Behzad, and M. Snir. Optimizing the Barnes-Hut algorithm in UPC. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11), pages 75:1-75:11. ACM, 2011.

41
- 84867648742
- X. Zhao, G. Santhanaraman, and W. Gropp. Adaptive strategy for one-sided communication in MPICH2. In Recent Advances in the Message Passing Interface (EuroMPI'12), pages 16-26. Springer, 2012.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.