SCOPUS Information Search Platform

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Volume , Issue , 2012, Pages

Optimization principles for collective neighborhood communications

(2) Hoefler, Torsten (a); Schneider, Timo (b)

(a) ETH ZURICH (Switzerland)

(b) UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COLLECTIVE COMMUNICATIONS; COLLECTIVE OPERATIONS; NEIGHBORHOOD COMMUNICATION; OPTIMIZATION HEURISTICS; OPTIMIZATION PRINCIPLE; OPTIMIZED IMPLEMENTATION; PERFORMANCE IMPROVEMENTS; SCIENTIFIC APPLICATIONS;

COMMUNICATION; ITERATIVE METHODS;

OPTIMIZATION;

EID: 84877693951 PISSN: 21674329 EISSN: 21674337 Source Type: Conference Proceeding
DOI: 10.1109/SC.2012.86 Document Type: Conference Paper

Times cited : (38)

References (39)

1
- 0025467711
- A bridging model for parallel computation
- L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, no. 8, pp. 103-111, 1990.
- (1990) Commun. ACM , vol.33 , Issue.8 , pp. 103-111
- Valiant, L.G.¹

2
- 35248859849
- Improving the performance of collective operations in MPICH
- Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users Group Meeting, Springer Verlag, 2003
- R. Thakur, "Improving the performance of collective operations in MPICH," in Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users Group Meeting, no. 2840 in LNCS, pp. 257-267, Springer Verlag, 2003.
- (2003) LNCS , Issue.2840 , pp. 257-267
- Thakur, R.¹

3
- 1242332596
- Send-receive considered harmful: Myths and realities of message passing
- Jan.
- S. Gorlatch, "Send-receive considered harmful: Myths and realities of message passing," ACM Trans. Program. Lang. Syst., vol. 26, pp. 47-56, Jan. 2004.
- (2004) ACM Trans. Program. Lang. Syst. , vol.26 , pp. 47-56
- Gorlatch, S.¹

4
- 79960749447
- MPI Forum, June 23rd
- MPI Forum, MPI: A Message-Passing Interface Standard. Version 2.2, June 23rd 2009.
- (2009) MPI: A Message-Passing Interface Standard. Version 2.2

5
- 79951761626
- The Scalable Process Topology Interface of MPI 2.2
- Aug.
- T. Hoefler, R. Rabenseifner, H. Ritzdorf, B. R. de Supinski, R. Thakur, and J. L. Traeff, "The Scalable Process Topology Interface of MPI 2.2," Concurrency and Computation: Practice and Experience, vol. 23, pp. 293-310, Aug. 2010.
- (2010) Concurrency and Computation: Practice and Experience , vol.23 , pp. 293-310
- Hoefler, T.¹ Rabenseifner, R.² Ritzdorf, H.³ De Supinski, B.R.⁴ Thakur, R.⁵ Traeff, J.L.⁶

6
- 70450031957
- Sparse collective operations for MPI
- T. Hoefler and J. L. Traeff, "Sparse collective operations for MPI," in Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium (IPDPS), HIPS Workshop, May 2009.
- Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium (IPDPS), HIPS Workshop, May 2009
- Hoefler, T.¹ Traeff, J.L.²

7
- 56449130431
- Sparse Non-Blocking Collectives in Quantum Mechanical Calculations
- Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, Springer, Sep.
- T. Hoefler, F. Lorenzen, and A. Lumsdaine, "Sparse Non-Blocking Collectives in Quantum Mechanical Calculations," in Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, vol. LNCS 5205, pp. 55-63, Springer, Sep. 2008.
- (2008) LNCS , vol.5205 , pp. 55-63
- Hoefler, T.¹ Lorenzen, F.² Lumsdaine, A.³

8
- 34548691020
- MPI collective algorithm selection and quadtree encoding
- DOI 10.1016/j.parco.2007.06.005, PII S0167819107000804
- J. Pješivac-Grbović, G. Bosilca, G. E. Fagg, T. Angskun, and J. J. Dongarra, "MPI collective algorithm selection and quadtree encoding," Parallel Comput., vol. 33, pp. 613-623, Sept. 2007. (Pubitemid 47418299)
- (2007) Parallel Computing , vol.33 , Issue.9 , pp. 613-623
- Pjesivac-Grbovic, J.¹ Bosilca, G.² Fagg, G.E.³ Angskun, T.⁴ Dongarra, J.J.⁵

9
- 0000729827
- Designing broadcasting algorithms in the postal model for message-passing systems
- A. Bar-Noy and S. Kipnis, "Designing broadcasting algorithms in the postal model for message-passing systems," Math. Syst. Theory, vol. 27, no. 5, pp. 431-452, 1994.
- (1994) Math. Syst. Theory , vol.27 , Issue.5 , pp. 431-452
- Bar-Noy, A.¹ Kipnis, S.²

10
- 85031726860
- Optimal broadcast and summation in the LogP model
- R. M. Karp, A. Sahay, E. E. Santos, and K. E. Schauser, "Optimal broadcast and summation in the LogP model," in Proc. of Symposium on Parallel Algorithms and Architectures, pp. 142-153, 1993.
- (1993) Proc. of Symposium on Parallel Algorithms and Architectures , pp. 142-153
- Karp, R.M.¹ Sahay, A.² Santos, E.E.³ Schauser, K.E.⁴

11
- 71549164097
- Two-tree algorithms for full bandwidth broadcast, reduction and scan
- December
- P. Sanders, J. Speck, and J. L. Träff, "Two-tree algorithms for full bandwidth broadcast, reduction and scan," Parallel Comput., vol. 35, pp. 581-594, December 2009.
- (2009) Parallel Comput. , vol.35 , pp. 581-594
- Sanders, P.¹ Speck, J.² Träff, J.L.³

12
- 33750234379
- High performance RDMA protocols in HPC
- Proceedings, 13th European PVM/MPI Users' Group Meeting, (Bonn, Germany), Springer-Verlag, September
- "High performance RDMA protocols in HPC," in Proceedings, 13th European PVM/MPI Users' Group Meeting, Lecture Notes in Computer Science, (Bonn, Germany), Springer-Verlag, September 2006.
- (2006) Lecture Notes in Computer Science

13
- 0002076006
- An upper bound for the chromatic number of a graph and its application to timetabling problems
- D. J. A. Welsh and M. B. Powell, "An upper bound for the chromatic number of a graph and its application to timetabling problems," The Computer Journal, vol. 10, no. 1, pp. 85-86, 1967.
- (1967) The Computer Journal , vol.10 , Issue.1 , pp. 85-86
- Welsh, D.J.A.¹ Powell, M.B.²

14
- 0008669884
- LogGP: Incorporating long messages into the LogP model for parallel computation
- DOI 10.1006/jpdc.1997.1346, PII S0743731597913460
- A. Alexandrov, M. F. Ionescu, K. E. Schauser, and C. Scheiman, "LogGP: Incorporating long messages into the LogP model," J. of Par. and Distr. Comp., vol. 44, no. 1, pp. 71-79, 1997. (Pubitemid 127340829)
- (1997) Journal of Parallel and Distributed Computing , vol.44 , Issue.1 , pp. 71-79
- Alexandrov, A.¹ Ionescu, M.F.² Schauser, K.E.³ Scheiman, C.⁴

15
- 21044437801
- Overview of the Blue Gene/L system architecture
- A. Gara, M. A. Blumrich, D. Chen, G. L.-T. Chiu, M. E. G. P. Coteus, R. A. Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht, B. D. Steinmacher-Burow, T. Takken, and P. Vranas, "Overview of the Blue Gene/L system architecture," IBM Journal of Research and Development, vol. 49, no. 2, pp. 195-213, 2005.
- (2005) IBM Journal of Research and Development , vol.49 , Issue.2 , pp. 195-213
- Gara, A.¹ Blumrich, M.A.² Chen, D.³ Chiu, G.L.-T.⁴ Coteus, M.E.G.P.⁵ Haring, R.A.⁶ Heidelberger, P.⁷ Hoenicke, D.⁸ Kopcsay, G.V.⁹ Liebsch, T.A.¹⁰ Ohmacht, M.¹¹ Steinmacher-Burow, B.D.¹² Takken, T.¹³ Vranas, P.¹⁴

16
- 56749151145
- Implementation and performance analysis of nonblocking collective operations for MPI
- CDROM
- T. Hoefler, A. Lumsdaine, and W. Rehm, "Implementation and performance analysis of nonblocking collective operations for MPI," in Proc. of the 2007 ACM/IEEE conference on Supercomputing (CDROM), 2007.
- (2007) Proc. of the 2007 ACM/IEEE Conference on Supercomputing
- Hoefler, T.¹ Lumsdaine, A.² Rehm, W.³

17
- 84886312809
- DMAPP - An API for one-sided program models on Baker systems
- M. ten Bruggencate and D. Roweth, "DMAPP - an API for one-sided program models on Baker systems," in Cray User Group Conference, CUG, 2010.
- Cray User Group Conference, CUG, 2010
- Ten Bruggencate, M.¹ Roweth, D.²

18
- 77958112922
- The Gemini system interconnect
- IEEE Computer Society
- R. Alverson, D. Roweth, and L. Kaplan, "The Gemini system interconnect," in Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects, HOTI '10, (Washington, DC, USA), pp. 83-87, IEEE Computer Society, 2010.
- (2010) Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects, HOTI '10, (Washington, DC, USA) , pp. 83-87
- Alverson, R.¹ Roweth, D.² Kaplan, L.³

19
- 77958103019
- HyperTransport Consortium
- HyperTransport Consortium, HyperTransport I/O Technology Overview: An Optimized, Low-latency Board-level Architecture, 2004.
- (2004) HyperTransport I/O Technology Overview An Optimized, Low-latency Board-level Architecture

20
- 11244333684
- M. Woodacre, D. Robb, D. Roe, and K. Feind, "The SGI® Altix™ 3000 global shared-memory architecture," 2005.
- (2005) The SGI® Altix™ 3000 Global Shared-memory Architecture
- Woodacre, M.¹ Robb, D.² Roe, D.³ Feind, K.⁴

21
- 77951446376
- Group Operation Assembly Language - A Flexible Way to Express Collective Communication
- T. Hoefler, C. Siebert, and A. Lumsdaine, "Group Operation Assembly Language - A Flexible Way to Express Collective Communication," in Intl. Conf. on Par. Proc., Sep. 2009.
- Intl. Conf. on Par. Proc., Sep. 2009
- Hoefler, T.¹ Siebert, C.² Lumsdaine, A.³

22
- 0001847481
- On the evolution of random graphs
- P. Erdős and A. Rényi, "On the evolution of random graphs," in Publication of the Mathematical Institute of the Hungarian Academy of Sciences, pp. 17-61, 1960.
- (1960) Publication of the Mathematical Institute of the Hungarian Academy of Sciences , pp. 17-61
- Erdős, P.¹ Rényi, A.²

23
- 39749134275
- A time-split nonhydrostatic atmospheric model for weather research and forecasting applications
- Mar.
- W. C. Skamarock and J. B. Klemp, "A time-split nonhydrostatic atmospheric model for weather research and forecasting applications," J. Comput. Phys., vol. 227, pp. 3465-3485, Mar. 2008.
- (2008) J. Comput. Phys. , vol.227 , pp. 3465-3485
- Skamarock, W.C.¹ Klemp, J.B.²

24
- 0000331979
- Lattice Boltzmann method for 3-D flows with curved boundary
- July
- R. Mei, W. Shyy, D. Yu, and L.-S. Luo, "Lattice Boltzmann method for 3-D flows with curved boundary," J. Comput. Phys., vol. 161, pp. 680-699, July 2000.
- (2000) J. Comput. Phys. , vol.161 , pp. 680-699
- Mei, R.¹ Shyy, W.² Yu, D.³ Luo, L.-S.⁴

25
- 84973786808
- Studying Quarks and Gluons on MIMD Parallel Computers
- C. Bernard, M. C. Ogilvie, T. A. DeGrand, C. E. DeTar, S. A. Gottlieb, A. Krasnitz, R. Sugar, and D. Toussaint, "Studying Quarks and Gluons on MIMD Parallel Computers," International Journal of High Performance Computing Applications, vol. 5, no. 4, pp. 61-70, 1991.
- (1991) International Journal of High Performance Computing Applications , vol.5 , Issue.4 , pp. 61-70
- Bernard, C.¹ Ogilvie, M.C.² DeGrand, T.A.³ DeTar, C.E.⁴ Gottlieb, S.A.⁵ Krasnitz, A.⁶ Sugar, R.⁷ Toussaint, D.⁸

26
- 0013269731
- University of Florida Sparse Matrix Collection
- T. A. Davis, "University of Florida Sparse Matrix Collection," NA Digest, vol. 92, 1994.
- (1994) NA Digest , vol.92
- Davis, T.A.¹

27
- 0036505103
- Parallel static and dynamic multi-constraint graph partitioning
- DOI 10.1002/cpe.605
- K. Schloegel, G. Karypis, and V. Kumar, "Parallel static and dynamic multi-constraint graph partitioning," Concurrency and Computation: Practice and Experience, vol. 14, no. 3, pp. 219-240, 2002. (Pubitemid 34460007)
- (2002) Concurrency Computation Practice and Experience , vol.14 , Issue.3 , pp. 219-240
- Schloegel, K.¹ Karypis, G.² Kumar, V.³

28
- 0037249228
- Parallel algebraic multigrid methods on distributed memory computers
- Feb.
- G. Haase, M. Kuhn, and S. Reitzinger, "Parallel algebraic multigrid methods on distributed memory computers," SIAM J. Sci. Comput., vol. 24, pp. 410-427, Feb. 2002.
- (2002) SIAM J. Sci. Comput. , vol.24 , pp. 410-427
- Haase, G.¹ Kuhn, M.² Reitzinger, S.³

29
- 84883516917
- Efficient algorithms for all-to-all communications in multi-port message-passing systems
- J. Bruck, C. T. Ho, S. Kipnis, and D. Weathersby, "Efficient algorithms for all-to-all communications in multi-port message-passing systems," in 6th ACM Symp. on Par. Alg. and Arch., pp. 298-309, 1994.
- (1994) 6th ACM Symp. on Par. Alg. and Arch. , pp. 298-309
- Bruck, J.¹ Ho, C.T.² Kipnis, S.³ Weathersby, D.⁴

30
- 0242308158
- Communication characteristics of large-scale scientific applications for contemporary cluster architectures
- DOI 10.1016/S0743-7315(03)00104-7
- J. S. Vetter and F. Mueller, "Communication characteristics of large-scale scientific applications for contemporary cluster architectures," J. Parallel Distrib. Comput., vol. 63, pp. 853-865, Sept. 2003. (Pubitemid 37364491)
- (2003) Journal of Parallel and Distributed Computing , vol.63 , Issue.9 , pp. 853-865
- Vetter, J.S.¹ Mueller, F.²

31
- 75449107210
- Communication requirements and interconnect optimization for high-end scientific applications
- S. Kamil, L. Oliker, A. Pinar, and J. Shalf, "Communication requirements and interconnect optimization for high-end scientific applications," IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 2, pp. 188-202, 2010.
- (2010) IEEE Trans. Parallel Distrib. Syst. , vol.21 , Issue.2 , pp. 188-202
- Kamil, S.¹ Oliker, L.² Pinar, A.³ Shalf, J.⁴

32
- 0029717350
- Automatic optimization of communication in compiling out-of-core stencil codes
- ACM
- R. Bordawekar, A. Choudhary, and J. Ramanujam, "Automatic optimization of communication in compiling out-of-core stencil codes," in Proceedings of the 10th international conference on Supercomputing, ICS '96, (New York, NY, USA), pp. 366-373, ACM, 1996.
- (1996) Proceedings of the 10th International Conference on Supercomputing, ICS '96, (New York, NY, USA) , pp. 366-373
- Bordawekar, R.¹ Choudhary, A.² Ramanujam, J.³

33
- 34548752231
- Towards optimal multi-level tiling for stencil computations
- March
- L. Renganarayana, M. Harthikote-Matha, R. Dewri, and S. Rajopadhye, "Towards optimal multi-level tiling for stencil computations," in Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pp. 1-10, March 2007.
- (2007) Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International , pp. 1-10
- Renganarayana, L.¹ Harthikote-Matha, M.² Dewri, R.³ Rajopadhye, S.⁴

34
- 35448944792
- Effective automatic parallelization of stencil computations
- ACM
- S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan, "Effective automatic parallelization of stencil computations," in Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, (New York, NY, USA), pp. 235-244, ACM, 2007.
- (2007) Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '07, (New York, NY, USA) , pp. 235-244
- Krishnamoorthy, S.¹ Baskaran, M.² Bondhugula, U.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

35
- 84877705001
- tech. rep., Ohio State University, OSU-CISRC-12/09
- S. Potluri, P. Lai, K. Tomko, Y. Cui, M. Tatineni, K. Schulz, W. Barth, A. Majumdar, and D. Panda, "Optimizing a Stencil-Based Application for Earthquake Modeling on Modern InfiniBand Clusters," tech. rep., Ohio State University, 2009. OSU-CISRC-12/09.
- (2009) Optimizing a Stencil-Based Application for Earthquake Modeling on Modern InfiniBand Clusters
- Potluri, S.¹ Lai, P.² Tomko, K.³ Cui, Y.⁴ Tatineni, M.⁵ Schulz, K.⁶ Barth, W.⁷ Majumdar, A.⁸ Panda, D.⁹

36
- 84871158565
- Towards performance portability through runtime adaptation for high-performance computing applications
- Nov.
- E. Gabriel, S. Feki, K. Benkert, and M. M. Resch, "Towards performance portability through runtime adaptation for high-performance computing applications," Concurr. Comput.: Pract. Exper., vol. 22, pp. 2230-2246, Nov. 2010.
- (2010) Concurr. Comput.: Pract. Exper. , vol.22 , pp. 2230-2246
- Gabriel, E.¹ Feki, S.² Benkert, K.³ Resch, M.M.⁴

37
- 77953986067
- Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer
- April
- S. Kumar, P. Heidelberger, D. Chen, and M. Hines, "Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer," in Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pp. 1-11, April 2010.
- (2010) Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on , pp. 1-11
- Kumar, S.¹ Heidelberger, P.² Chen, D.³ Hines, M.⁴

38
- 0001483604
- Communication optimizations for irregular scientific computations on distributed memory architectures
- Sept.
- R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang, "Communication optimizations for irregular scientific computations on distributed memory architectures," J. Parallel Distrib. Comput., vol. 22, pp. 462-478, Sept. 1994.
- (1994) J. Parallel Distrib. Comput. , vol.22 , pp. 462-478
- Das, R.¹ Uysal, M.² Saltz, J.³ Hwang, Y.-S.⁴

39
- 34248373234
- STAR-MPI: Self tuned adaptive routines for MPI collective operations
- ACM
- A. Faraj, X. Yuan, and D. Lowenthal, "STAR-MPI: self tuned adaptive routines for MPI collective operations," in Proceedings of the 20th annual international conference on Supercomputing, ICS '06, (New York, NY, USA), pp. 199-208, ACM, 2006.
- (2006) Proceedings of the 20th Annual International Conference on Supercomputing, ICS '06, (New York, NY, USA) , pp. 199-208
- Faraj, A.¹ Yuan, X.² Lowenthal, D.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.