SCOPUS Information Retrieval Platform

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Volume , Issue , 2008, Pages 99-109

Performance without pain = Productivity: data layout and collective communication in UPC

(3) Nishtala, Rajesh a Almási, George b Caşcaval, Cǎlin b

a UNIVERSITY OF CALIFORNIA (United States)

b IBM T J WATSON RESEARCH CENTER (United States)

Author keywords

Blue gene; Collective communication; Parallel programming; PGAS; Programming productivity; UPC

Indexed keywords

BLUE GENE; CHOLESKY FACTORIZATIONS; COLLECTIVE COMMUNICATIONS; DENSE MATRICES; MACHINE RESOURCES; MULTIDIMENSIONAL FOURIER TRANSFORM; PARALLEL LANGUAGES; PARTITIONED GLOBAL ADDRESS SPACE; PGAS; PRODUCTIVITY DATA; RUNTIME SYSTEMS; UNIFIED PARALLEL C; UPC; UPC CODE;

C (PROGRAMMING LANGUAGE); COMPUTER PROGRAMMING LANGUAGES; FOURIER TRANSFORMS; GENES; PARALLEL PROCESSING SYSTEMS; PRODUCTIVITY; SCALABILITY; SUPERCOMPUTERS;

PARALLEL PROGRAMMING;

EID: 70350625706 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (21)

References (48)

1
- 7444229864
- The cascade high productivity language
- The cascade high productivity language, hips, 00: 52-60, 2004.
- (2004) Hips , vol.0 , pp. 52-60

2
- 0028757636
- A high performance parallel algorithm for 1-d fft
- Los Alamitos, CA, USA, IEEE Computer Society Press
- R. C. Agarwal, F. G. Gustavson, and M. Zubair. A high performance parallel algorithm for 1-d fft. In Supercomputing '94: Proceedings of the 1994 conference on Supercomputing, pages 34-40, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.
- (1994) Supercomputing '94: Proceedings of the 1994 Conference on Supercomputing , pp. 34-40
- Agarwal, R.C.¹ Gustavson, F.G.² Zubair, M.³

3
- 33646421297
- Sun Microsystems, Inc., 1.0α edition, Sept.
- E. Allen, D. Chase, J. Hallett, V. Luchangco, J.-W. Maessen, S. Ryu, G. L. Steele Jr., and S. Tobin-Hochstadt. The Fortress Language Specification. Sun Microsystems, Inc., 1.0α edition, Sept. 2006.
- (2006) The Fortress Language Specification.
- Allen, E.¹ Chase, D.² Hallett, J.³ Luchangco, V.⁴ Maessen, J.-W.⁵ Ryu, S.⁶ Steele Jr., G.L.⁷ Tobin-Hochstadt, S.⁸

4
- 21044456455
- Design and implementation of message-passing services for the Blue Gene/L supercomputer
- March/May
- G. Almási, C. Archer, J. G. Castaños, J. A. Gunnels, C. C. Erway, P. Heidelberger, X. Martorell, J. E. Moreira, K. Pinnow, J. Ratterman, B. Steinmacher-Burow, W. Gropp, and B. Toonen. Design and implementation of message-passing services for the Blue Gene/L supercomputer. IBM Journal of Research and Development, 49(2/3): 393-406, March/May 2005. Available at http://www.research.ibm.com/journal/rd49-23.html.
- (2005) IBM Journal of Research and Development , vol.49 , Issue.2-3 , pp. 393-406
- Almási, G.¹ Archer, C.² Castaños, J.G.³ Gunnels, J.A.⁴ Erway, C.C.⁵ Heidelberger, P.⁶ Martorell, X.⁷ Moreira, J.E.⁸ Pinnow, K.⁹ Ratterman, J.¹⁰ Steinmacher-Burow, B.¹¹ Gropp, W.¹² Toonen, B.¹³

5
- 32844464238
- Optimization of MPI collective communication on BlueGene/L systems
- New York, NY, USA, ACM Press
- G. Almási, P. Heidelberger, C. J. Archer, X. Martorell, C. C. Erway, J. E. Moreira, B. Steinmacher-Burow, and Y. Zheng. Optimization of MPI collective communication on BlueGene/L systems. In ICS '05: Proceedings of the 19th annual international conference on Supercomputing, pages 253-262, New York, NY, USA, 2005. ACM Press.
- (2005) ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing , pp. 253-262
- Almási, G.¹ Heidelberger, P.² Archer, C.J.³ Martorell, X.⁴ Erway, C.C.⁵ Moreira, J.E.⁶ Steinmacher-Burow, B.⁷ Zheng, Y.⁸

6
- 34548784885
- Nonuniformly communicating noncontiguous data: A case study with petsc and mpi
- P. Balaji, D. Buntinas, S. Balay, B. Smith, R. Thakur, and W. Gropp. Nonuniformly communicating noncontiguous data: A case study with petsc and mpi. In IEEE Parallel and Distributed Processing Symposium (IPDPS), 2006.
- (2006) IEEE Parallel and Distributed Processing Symposium (IPDPS)
- Balaji, P.¹ Buntinas, D.² Balay, S.³ Smith, B.⁴ Thakur, R.⁵ Gropp, W.⁶

7
- 0003660984
- PETSc users manual
- Argonne National Laboratory
- S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 2.1.5, Argonne National Laboratory, 2004.
- (2004) Technical Report ANL-95/11 - Revision 2.1.5
- Balay, S.¹ Buschelman, K.² Eijkhout, V.³ Gropp, W.D.⁴ Kaushik, D.⁵ Knepley, M.G.⁶ McInnes, L.C.⁷ Smith, B.F.⁸ Zhang, H.⁹

8
- 33746070421
- Shared memory programming for large scale machines
- Ottawa, Canada
- C. Barton, C. Caşcaval, G. Almási, Y. Zheng, M. Farreras, S. Chatterjee, and J. N. Amaral. Shared memory programming for large scale machines. In Programming Language Design and Implementation (PLDI), Ottawa, Canada, 2006.
- (2006) Programming Language Design and Implementation (PLDI)
- Barton, C.¹ Caşcaval, C.² Almási, G.³ Zheng, Y.⁴ Farreras, M.⁵ Chatterjee, S.⁶ Amaral, J.⁷

9
- 79959417706
- Multidimensional blocking in UPC
- IBM, July
- C. Barton, C. Cascaval, G. Almasi, R. Garg, and J. N. Amaral. Multidimensional blocking in UPC. Technical Report RC24305, IBM, July 2007.
- (2007) Technical Report RC24305
- Barton, C.¹ Cascaval, C.² Almasi, G.³ Garg, R.⁴ Amaral, J.⁵

10
- 33847103649
- Optimizing bandwidth limited problems using one-sided communication and overlap
- C. Bell, D. Bonachea, R. Nishtala, and K. Yelick. Optimizing bandwidth limited problems using one-sided communication and overlap. In The 20th Int'l Parallel and Distributed Processing Symposium (IPDPS), 2006.
- (2006) The 20th Int'l Parallel and Distributed Processing Symposium (IPDPS)
- Bell, C.¹ Bonachea, D.² Nishtala, R.³ Yelick, K.⁴

11
- 79959386767
- The Berkeley UPC Compiler
- The Berkeley UPC Compiler, 2002. http://upc.lbl.gov.
- (2002)

12
- 79959485086
- BLAS Home Page
- BLAS Home Page, http://www.netlib.org/blas/.

13
- 79959389951
- J. Bruck, C.-T. Ho, S. Kipnis, E. Upfal, and D. Weathersby. Efficient algorithms for all-to-all communications in multiport message-passing systems. 1997.
- (1997) Efficient Algorithms for All-to-all Communications in Multiport Message-passing Systems
- Bruck, J.¹ Ho, C.-T.² Kipnis, S.³ Upfal, E.⁴ Weathersby, D.⁵

14
- 0003712293
- PhD thesis, Montana State University
- L. E. Cannon. A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University, 1969.
- (1969) A Cellular Computer to Implement the Kalman Filter Algorithm.
- Cannon, L.E.¹

15
- 12444259721
- Productivity analysis of the UPC language
- F. Cantonnet, Y. Yao, M. Zahran, and T. El-Ghazawi. Productivity Analysis of the UPC Language. In IPDPS, 2004.
- (2004) IPDPS
- Cantonnet, F.¹ Yao, Y.² Zahran, M.³ El-Ghazawi, T.⁴

16
- 0009930394
- ZPL: A machine independent programming language for parallel computers
- B. L. Chamberlain, S.-E. Choi, E. C. Lewis, C. Lin, L. Snyder, and D. Weathersby. ZPL: A machine independent programming language for parallel computers. Software Engineering, 26(3): 197-211, 2000.
- (2000) Software Engineering , vol.26 , Issue.3 , pp. 197-211
- Chamberlain, B.L.¹ Choi, S.-E.² Lewis, E.C.³ Lin, C.⁴ Snyder, L.⁵ Weathersby, D.⁶

17
- 1142293067
- A performance analysis of the berkeley UPC compiler
- June
- W. Chen, D. Bonachea, J. Duell, P. Husband, C. Iancu, and K. Yelick. A Performance Analysis of the Berkeley UPC Compiler. In Proc. of Int'l Conference on Supercomputing (ICS), June 2003.
- (2003) Proc. of Int'l Conference on Supercomputing (ICS)
- Chen, W.¹ Bonachea, D.² Duell, J.³ Husband, P.⁴ Iancu, C.⁵ Yelick, K.⁶

18
- 84947808952
- A proposal for a set of parallel basic linear algebra subprograms
- London, UK, Springer-Verlag
- J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. W. Walker, and R. C. Whaley. A proposal for a set of parallel basic linear algebra subprograms. In PARA '95: Proceedings of the Second International Workshop on Applied Parallel Computing, Computations in Physics, Chemistry and Engineering Science, pages 107-114, London, UK, 1996. Springer-Verlag.
- (1996) PARA '95: Proceedings of the Second International Workshop on Applied Parallel Computing, Computations in Physics, Chemistry and Engineering Science , pp. 107-114
- Choi, J.¹ Dongarra, J.² Ostrouchov, S.³ Petitet, A.⁴ Walker, D.W.⁵ Whaley, R.C.⁶

19
- 31844441256
- An evaluation of global address space languages: Co-array fortran and unified parallel c
- New York, NY, USA, ACM Press
- C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanti, Y. Yao, and D. Chavarría-Miranda. An evaluation of global address space languages: co-array fortran and unified parallel c. In PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 36-47, New York, NY, USA, 2005. ACM Press.
- (2005) PPoPP '05: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 36-47
- Coarfa, C.¹ Dotsenko, Y.² Mellor-Crummey, J.³ Cantonnet, F.⁴ El-Ghazawi, T.⁵ Mohanti, A.⁶ Yao, Y.⁷ Chavarría-Miranda, D.⁸

20
- 0009346826
- LogP: Towards a realistic model of parallel computation
- D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a realistic model of parallel computation. In Proc. 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1-12, 1993.
- (1993) Proc. 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 1-12
- Culler, D.E.¹ Karp, R.M.² Patterson, D.A.³ Sahay, A.⁴ Schauser, K.E.⁵ Santos, E.⁶ Subramonian, R.⁷ Von Eicken, T.⁸

21
- 33847121535
- Titanium performance and potential: An NPB experimental study
- K. Datta, D. Bonachea, and K. Yelick. Titanium performance and potential: an NPB experimental study. In Proc. of Languages and Compilers for Parallel Computing, 2005.
- (2005) Proc. of Languages and Compilers for Parallel Computing
- Datta, K.¹ Bonachea, D.² Yelick, K.³

22
- 80052802178
- Upc performance and potential: A npb experimental study
- Los Alamitos, CA, USA, IEEE Computer Society Press
- T. El-Ghazawi and F. Cantonnet. Upc performance and potential: a npb experimental study. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1-26, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.
- (2002) Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing , pp. 1-26
- El-Ghazawi, T.¹ Cantonnet, F.²

23
- 54249097779
- ESSL User Guide. http://www-03.ibm.com/systems/p/software/essl.html.
- ESSL User Guide.

24
- 27144559253
- ScaLAPACK: A linear algebra library for message-passing computers
- Minneapolis, MN, (electronic), Philadelphia, PA, USA, 1997. Society for Industrial and Applied Mathematics
- L. S. Blackford et al. ScaLAPACK: a linear algebra library for message-passing computers. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing (Minneapolis, MN, 1997), page 15 (electronic), Philadelphia, PA, USA, 1997. Society for Industrial and Applied Mathematics.
- (1997) Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing , pp. 15
- Blackford, L.S.¹

25
- 20744449792
- The design and implementation of FFTW3
- DOI 10.1109/JPROC.2004.840301, Program Generation, Optimization and Platform Adaptation
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2): 216-231, 2005. special issue on "Program Generation, Optimization, and Platform Adaptation". (Pubitemid 40851223)
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 216-231
- Frigo, M.¹ Johnson, S.G.²

26
- 21044437801
- Overview of the BlueGene/L system architecture
- A. Gara, M. A. Blumrich, D. Chen, G. L.-T. Chiu, P. Coteus, M. Giampapa, R. A. Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht, B. D. Steinmacher-Burow, T. Takken, and P. Vranas. Overview of the BlueGene/L system architecture. IBM Journal of Research and Development, 49(2/3): 195-212, 2005.
- (2005) IBM Journal of Research and Development , vol.49 , Issue.2-3 , pp. 195-212
- Gara, A.¹ Blumrich, M.A.² Chen, D.³ Chiu, G.L.-T.⁴ Coteus, P.⁵ Giampapa, M.⁶ Haring, R.A.⁷ Heidelberger, P.⁸ Hoenicke, D.⁹ Kopcsay, G.V.¹⁰ Liebsch, T.A.¹¹ Ohmacht, M.¹² Steinmacher-Burow, B.D.¹³ Takken, T.¹⁴ Vranas, P.¹⁵

27
- 0003487728
- High Performance Fortran Forum. Technical Report CRPC-TR92225, Houston, Tex.
- High Performance Fortran Forum. High Performance Fortran language specification, version 1.0. Technical Report CRPC-TR92225, Houston, Tex., 1993.
- (1993) High Performance Fortran Language Specification, Version 1.0

28
- 1142307058
- Tech Report UCB/CSD-01-1163, U.C. Berkeley, November
- P. Hilfinger, D. Bonachea, D. Gay, S. Graham, B. Liblit, G. Pike, and K. Yelick. Titanium language reference manual. Tech Report UCB/CSD-01-1163, U.C. Berkeley, November 2001.
- (2001) Titanium Language Reference Manual
- Hilfinger, P.¹ Bonachea, D.² Gay, D.³ Graham, S.⁴ Liblit, B.⁵ Pike, G.⁶ Yelick, K.⁷

29
- 79959434399
- HPL Algorithm description. http://www.netlib.org/benchmark/hpl/algorithm.html.
- HPL Algorithm Description

30
- 0010716169
- Intel Math Kernel Library Reference Manual. http://www.intel.com/software/products/mkl/techtopics/mklman52.pdf.
- Intel Math Kernel Library Reference Manual

31
- 0004235292
- The MathWorks
- The MathWorks. Using MATLAB, 1997.
- (1997) Using Matlab

32
- 79959416586
- Message Passing Interface
- Message Passing Interface. http://www.mpi-forum.org/docs/docs.html.

33
- 22144436121
- The cholesky decomposition
- chapter 7, Bristol, England: Adam Hilger, 2nd edition
- J. C. Nash. "The Cholesky Decomposition." In Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, chapter 7, pages 84-93. Bristol, England: Adam Hilger, 2nd edition, 1990.
- (1990) Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation , pp. 84-93
- Nash, J.C.¹

34
- 0002081678
- Co-array fortran for parallel programming
- R. W. Numrich and J. Reid. Co-array fortran for parallel programming. ACM Fortran Forum, 17(2): 1-31, 1998.
- (1998) ACM Fortran Forum , vol.17 , Issue.2 , pp. 1-31
- Numrich, R.W.¹ Reid, J.²

35
- 0002081678
- Co-array fortran for parallel programming
- R. W. Numrich and J. Reid. Co-array fortran for parallel programming. SIGPLAN Fortran Forum, 17(2): 1-31, 1998.
- (1998) SIGPLAN Fortran Forum , vol.17 , Issue.2 , pp. 1-31
- Numrich, R.W.¹ Reid, J.²

36
- 79959403892
- OpenMP. Simple, portable, scalable SMP programming. http://www.openmp.org/, 2000.
- (2000) Simple, Portable, Scalable SMP Programming
- OpenMP¹

37
- 19344368072
- SPIRAL: Code generation for DSP transforms
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2): 232-275, 2005.
- (2005) Proceedings of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation , vol.93 , Issue.2 , pp. 232-275
- Püschel, M.¹ Moura, J.M.F.² Johnson, J.³ Padua, D.⁴ Veloso, M.⁵ Singer, B.W.⁶ Xiong, J.⁷ Franchetti, F.⁸ Gačić, A.⁹ Voronenko, Y.¹⁰ Chen, K.¹¹ Johnson, R.W.¹² Rizzolo, N.¹³

38
- 33847138695
- Efficient rdma-based multi-port collectives on multi-rail qsnetii clusters
- Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006)
- Y. Qian and A. Afsahi. Efficient rdma-based multi-port collectives on multi-rail qsnetii clusters. In The 6th Workshop on Communication Architecture for Clusters (CAC 2006), in Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006.
- (2006) The 6th Workshop on Communication Architecture for Clusters (CAC 2006)
- Qian, Y.¹ Afsahi, A.²

39
- 79959485085
- A specification of the extensions to the collective operations of unified parallel c
- Michigan Technological University, Department of Computer Science
- Z. Ryne and S. Seidel. A specification of the extensions to the collective operations of unified parallel c. Technical Report 05-08, Michigan Technological University, Department of Computer Science, 2005.
- (2005) Technical Report 05-08
- Ryne, Z.¹ Seidel, S.²

40
- 33746613581
- Co-array collectives: Refined semantics for co-array fortran
- V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, Springer
- M. J. Sottile, C. E. Rasmussen, and R. L. Graham. Co-array collectives: Refined semantics for co-array fortran. In V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, International Conference on Computational Science (2), volume 3992 of Lecture Notes in Computer Science, pages 945-952. Springer, 2006.
- (2006) International Conference on Computational Science (2), Volume 3992 of Lecture Notes in Computer Science , pp. 945-952
- Sottile, M.J.¹ Rasmussen, C.E.² Graham, R.L.³

41
- 34447571243
- May
- UPC Language Specification, V1.2, May 2005.
- (2005) UPC Language Specification, V1.2

42
- 4344655318
- Performance modeling for self adapting collective communications for mpi
- S. S. Vadhiyar, G. E. Fagg, and J. J. Dongarra. Performance modeling for self adapting collective communications for mpi. In LACSI Symposium, 2001.
- (2001) LACSI Symposium
- Vadhiyar, S.S.¹ Fagg, G.E.² Dongarra, J.J.³

43
- 0003588633
- Department of Computer Sciences, University of Texas
- R. van de Geijn and J. Watts. Summa: Scalable universal matrix multiplication algorithm. TR-95-13, Department of Computer Sciences, University of Texas, 1995.
- (1995) Summa: Scalable Universal Matrix Multiplication Algorithm. TR-95-13
- Van De Geijn, R.¹ Watts, J.²

44
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- San Francisco, CA, USA, June 2005. Institute of Physics Publishing
- R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series, San Francisco, CA, USA, June 2005. Institute of Physics Publishing.
- Proceedings of SciDAC 2005, Journal of Physics: Conference Series
- Vuduc, R.¹ Demmel, J.W.² Yelick, K.A.³

45
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- DOI 10.1016/S0167-8191(00)00087-9
- R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2): 3-35, 2001. (Pubitemid 32264775)
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Whaley, R.C.¹ Petitet, A.² Dongarra, J.J.³

46
- 54249155479
- The X10 programming language. http://x10.sourceforge.net, 2004.
- (2004) The X10 Programming Language

47
- 79959465035
- Keynote: Compilation techniques for partitioned global address space languages
- K. Yelick. Keynote: Compilation techniques for partitioned global address space languages. In The 19th International Workshop on Languages and Compilers for Parallel Computing, 2006.
- (2006) The 19th International Workshop on Languages and Compilers for Parallel Computing
- Yelick, K.¹

48
- 0032155556
- Titanium: A high-performance java dialect
- September-November
- K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: A high-performance java dialect. Concurrency: Practice and Experience, 10(11-13), September-November 1998.
- (1998) Concurrency: Practice and Experience , vol.10 , Issue.11-13
- Yelick, K.¹ Semenzato, L.² Pike, G.³ Miyamoto, C.⁴ Liblit, B.⁵ Krishnamurthy, A.⁶ Hilfinger, P.⁷ Graham, S.⁸ Gay, D.⁹ Colella, P.¹⁰ Aiken, A.¹¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.