SCOPUS Information Search Platform

1st USENIX Workshop on Hot Topics in Parallelism, HotPar 2009

Volume , Issue , 2009, Pages

Optimizing collective communication on multicores

(2) Nishtala, Rajesh (a); Yelick, Katherine A. (a)

(a) University of California (United States)

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION PERFORMANCE; AUTOMATIC TUNING TECHNIQUES; COLLECTIVE COMMUNICATION OPERATIONS; COLLECTIVE COMMUNICATIONS; COMMUNICATION COMPONENTS; COMMUNICATION OPERATION; MULTI-CORE SYSTEMS; PARTITIONED GLOBAL ADDRESS SPACE; SEMANTICS; SYNCHRONIZATION

EID: 85092794892
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 19

References (21)

1
- 85092791812
- Intel Xeon Quad Processor
- Intel Xeon Quad Processor. http://www.intel.com/products/processor/xeon5000.

2
- 85092781517
- Sun Fire X4600 M2 Server Information
- Sun Fire X4600 M2 Server Information. http://www.sun.com/servers/x64/x4600/.

3
- 85092791414
- Sun Ultra Sparc T2 Processor Information
- Sun Ultra Sparc T2 Processor Information. http://www.sun.com/processors/UltraSPARC-T2/.

4
- 32844464238
- Optimization of MPI collective communication on BlueGene/L systems
- ALMÁSI, G., HEIDELBERGER, P., ARCHER, C. J., MARTORELL, X., ERWAY, C. C., MOREIRA, J. E., STEINMACHER-BUROW, B., AND ZHENG, Y. Optimization of MPI collective communication on BlueGene/L systems. In ICS '05: Proceedings of the 19th annual international conference on Supercomputing (New York, NY, USA, 2005), ACM Press, pp. 253-262.
- (2005) ICS '05: Proceedings of the 19th annual international conference on Supercomputing, pp. 253-262
- ALMÁSI, G.¹ HEIDELBERGER, P.² ARCHER, C. J.³ MARTORELL, X.⁴ ERWAY, C. C.⁵ MOREIRA, J. E.⁶ STEINMACHER-BUROW, B.⁷ ZHENG, Y.⁸

5
- 35648995516
- The landscape of parallel computing research: A view from Berkeley
- ASANOVIC, K., BODIK, R., CATANZARO, B. C., GEBIS, J. J., HUSBANDS, P., KEUTZER, K., PATTERSON, D. A., PLISHKER, W. L., SHALF, J., WILLIAMS, S. W., AND YELICK, K. A. The landscape of parallel computing research: A view from Berkeley. Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006.
- (2006) Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley
- ASANOVIC, K.¹ BODIK, R.² CATANZARO, B. C.³ GEBIS, J. J.⁴ HUSBANDS, P.⁵ KEUTZER, K.⁶ PATTERSON, D. A.⁷ PLISHKER, W. L.⁸ SHALF, J.⁹ WILLIAMS, S. W.¹⁰ YELICK, K. A.¹¹

6
- 85092794707
- Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI
- BALAJI, P., BUNTINAS, D., BALAY, S., SMITH, B., THAKUR, R., AND GROPP, W. Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI. In IEEE Parallel and Distributed Processing Symposium (IPDPS) (2006).
- (2006) IEEE Parallel and Distributed Processing Symposium (IPDPS)
- BALAJI, P.¹ BUNTINAS, D.² BALAY, S.³ SMITH, B.⁴ THAKUR, R.⁵ GROPP, W.⁶

7
- 33847103649
- Optimizing bandwidth limited problems using one-sided communication and overlap
- BELL, C., BONACHEA, D., NISHTALA, R., AND YELICK, K. Optimizing bandwidth limited problems using one-sided communication and overlap. In The 20th Int'l Parallel and Distributed Processing Symposium (IPDPS) (2006).
- (2006) The 20th Int'l Parallel and Distributed Processing Symposium (IPDPS)
- BELL, C.¹ BONACHEA, D.² NISHTALA, R.³ YELICK, K.⁴

8
- 0031269329
- Efficient algorithms for all-to-all communications in multiport message-passing systems
- BRUCK, J., HO, C.-T., UPFAL, E., KIPNIS, S., AND WEATHERSBY, D. Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Trans. Parallel Distrib. Syst. 8, 11 (1997), 1143-1156.
- (1997) IEEE Trans. Parallel Distrib. Syst , vol.8 , Issue.11 , pp. 1143-1156
- BRUCK, J.¹ HO, C.-T.² UPFAL, E.³ KIPNIS, S.⁴ WEATHERSBY, D.⁵

9
- 10044225941
- Co-array Fortran performance and potential: An NPB experimental study
- COARFA, C., DOTSENKO, Y., ECKHARDT, J., AND MELLOR-CRUMMEY, J. Co-array Fortran performance and potential: An NPB experimental study. In 16th Int'l Workshop on Languages and Compilers for Parallel Processing (LCPC) (October 2003).
- (2003) 16th Int'l Workshop on Languages and Compilers for Parallel Processing (LCPC)
- COARFA, C.¹ DOTSENKO, Y.² ECKHARDT, J.³ MELLOR-CRUMMEY, J.⁴

10
- 84976790986
- LogP: Towards a realistic model of parallel computation
- CULLER, D. E., KARP, R. M., PATTERSON, D. A., SAHAY, A., SCHAUSER, K. E., SANTOS, E., SUBRAMONIAN, R., AND VON EICKEN, T. LogP: Towards a realistic model of parallel computation. In Proc. 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (1993), pp. 1-12.
- (1993) Proc. 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1-12
- CULLER, D. E.¹ KARP, R. M.² PATTERSON, D. A.³ SAHAY, A.⁴ SCHAUSER, K. E.⁵ SANTOS, E.⁶ SUBRAMONIAN, R.⁷ VON EICKEN, T.⁸

11
- 70350771127
- Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures
- DATTA, K., MURPHY, M., VOLKOV, V., WILLIAMS, S., CARTER, J., OLIKER, L., PATTERSON, D., SHALF, J., AND YELICK, K. Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures. In Supercomputing 2008 (SC08) (November 2008).
- (2008) Supercomputing 2008 (SC08)
- DATTA, K.¹ MURPHY, M.² VOLKOV, V.³ WILLIAMS, S.⁴ CARTER, J.⁵ OLIKER, L.⁶ PATTERSON, D.⁷ SHALF, J.⁸ YELICK, K.⁹

12
- 50649091849
- MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics
- MAMIDALA, A., KUMAR, R., DE, D., AND PANDA, D. K. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics. Int'l Symposium on Cluster Computing and the Grid, Lyon, France (May 2008).
- (2008) Int'l Symposium on Cluster Computing and the Grid
- MAMIDALA, A.¹ KUMAR, R.² DE, D.³ PANDA, D. K.⁴

13
- 84976718540
- Algorithms for scalable synchronization on shared-memory multiprocessors
- MELLOR-CRUMMEY, J. M., AND SCOTT, M. L. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (1991), 21-65.
- (1991) ACM Trans. Comput. Syst , vol.9 , Issue.1 , pp. 21-65
- MELLOR-CRUMMEY, J. M.¹ SCOTT, M. L.²

14
- 0003413672
- MPI: A message-passing interface standard, v1.1
- MPI: A message-passing interface standard, v1.1. Technical report, University of Tennessee, Knoxville, June 12, 1995.
- (1995) Technical report, University of Tennessee, Knoxville

15
- 70350625706
- Performance without pain = productivity: Data layout and collective communication in UPC
- NISHTALA, R., ALMASI, G., AND CASCAVAL, C. Performance without pain = productivity: Data layout and collective communication in UPC. In Principles and Practices of Parallel Programming (PPoPP) (2008).
- (2008) Principles and Practices of Parallel Programming (PPoPP)
- NISHTALA, R.¹ ALMASI, G.² CASCAVAL, C.³

16
- 84893341343
- Towards Automatic and Adaptive Optimizations of MPI Collective Operations
- PJEŠIVAC-GRBOVIĆ, J. Towards Automatic and Adaptive Optimizations of MPI Collective Operations. PhD thesis, University of Tennessee, Knoxville, December 2007.
- (2007) PhD thesis, University of Tennessee, Knoxville
- PJEŠIVAC-GRBOVIĆ, J.¹

17
- 33847138695
- Efficient RDMA-based multi-port collectives on multi-rail QsNetII clusters
- QIAN, Y., AND AFSAHI, A. Efficient RDMA-based multi-port collectives on multi-rail QsNetII clusters. In The 6th Workshop on Communication Architecture for Clusters (CAC 2006), In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006) (2006).
- (2006) The 6th Workshop on Communication Architecture for Clusters (CAC 2006), In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006)
- QIAN, Y.¹ AFSAHI, A.²

18
- 34447571243
- UPC language specifications, v1.2
- UPC language specifications, v1.2. Tech. Rep. LBNL-59208, Berkeley National Lab, 2005.
- (2005) Tech. Rep. LBNL-59208, Berkeley National Lab

19
- 51049106193
- Lattice Boltzmann simulation optimization on leading multicore platforms
- WILLIAMS, S., CARTER, J., OLIKER, L., SHALF, J., AND YELICK, K. Lattice Boltzmann simulation optimization on leading multicore platforms. In International Parallel and Distributed Processing Symposium (IPDPS) (2008).
- (2008) International Parallel and Distributed Processing Symposium (IPDPS)
- WILLIAMS, S.¹ CARTER, J.² OLIKER, L.³ SHALF, J.⁴ YELICK, K.⁵

20
- 51549110265
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- WILLIAMS, S., OLIKER, L., VUDUC, R., SHALF, J., YELICK, K., AND DEMMEL, J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of Supercomputing 2007 (2007).
- (2007) Proceedings of Supercomputing 2007
- WILLIAMS, S.¹ OLIKER, L.² VUDUC, R.³ SHALF, J.⁴ YELICK, K.⁵ DEMMEL, J.⁶

21
- 0001310691
- Titanium: a high performance Java dialect
- YELICK, K., SEMENZATO, L., PIKE, G., MIYAMOTO, C., LIBLIT, B., KRISHNAMURTHY, A., HILFINGER, P., GRAHAM, S., GAY, D., COLELLA, P., AND AIKEN, A. Titanium: a high performance Java dialect. In Proc. of ACM 1998 Workshop on Java for High-Performance Network Computing (February 1998).
- (1998) Proc. of ACM 1998 Workshop on Java for High-Performance Network Computing
- YELICK, K.¹ SEMENZATO, L.² PIKE, G.³ MIYAMOTO, C.⁴ LIBLIT, B.⁵ KRISHNAMURTHY, A.⁶ HILFINGER, P.⁷ GRAHAM, S.⁸ GAY, D.⁹ COLELLA, P.¹⁰ AIKEN, A.¹¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.