SCOPUS Information Search Platform

Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

Volume , Issue , 2012, Pages 557-568

Productive programming of GPU clusters with OmpSs

(7) Bueno, Javier (a); Planas, Judit (a); Duran, Alejandro (a); Badia, Rosa M. (b); Martorell, Xavier (a); Ayguadé, Eduard (a); Labarta, Jesús (a)

(a) BARCELONA SUPERCOMPUTING CENTER (Spain)

(b) ARTIFICIAL INTELLIGENCE RESEARCH INSTITUTE IIIA CSIC (Spain)

Author keywords

accelerators; Cluster programming; GPGPU computing; OpenMP

Indexed keywords

ASYNCHRONY; COMPUTATIONAL TASK; GPGPU COMPUTING; GPU CLUSTERS; HYBRID MODEL; OPENMP; PARALLELIZATIONS; REMOTE NODE; RUNTIME SYSTEMS; TASK PARALLELISM; TASK-BASED;

APPLICATION PROGRAMMING INTERFACES (API); CLUSTER COMPUTING; COMMUNICATION; DISTRIBUTED PARAMETER NETWORKS; PARTICLE ACCELERATORS;

PROGRAM PROCESSORS;

EID: 84866856745 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2012.58 Document Type: Conference Paper

Times cited : (135)

References (36)

1
- 73449144074
- OpenMP ARB, May
- OpenMP ARB, "OpenMP Application Program Interface, v. 3.0," May 2008.
- (2008) OpenMP Application Program Interface, V. 3.0

2
- 57949083229
- A dependency-aware task-based programming environment for multi-core architectures
- September
- J. M. Perez, R. M. Badia, and J. Labarta, "A dependency-aware task-based programming environment for multi-core architectures," IEEE Int. Conference on Cluster Computing, pp. 142-151, September 2008.
- (2008) IEEE Int. Conference on Cluster Computing , pp. 142-151
- Perez, J.M.¹ Badia, R.M.² Labarta, J.³

3
- 35649006026
- CellSs: Making it easier to program the Cell Broadband Engine processor
- September
- J. M. Perez, P. Bellens, R. M. Badia, and J. Labarta, "CellSs: Making it easier to program the Cell Broadband Engine processor," IBM Journal of Research and Development, vol. 51, no. 5, pp. 593-604, September 2007.
- (2007) IBM Journal of Research and Development , vol.51 , Issue.5 , pp. 593-604
- Perez, J.M.¹ Bellens, P.² Badia, R.M.³ Labarta, J.⁴

4
- 84866846310
- Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL
- "Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL," in Proceedings of the 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC2010), October 2010.
- Proceedings of the 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC2010), October 2010

5
- 84893623161
- Productive Cluster Programming with OmpSs
- to appear
- J. Bueno, L. Martinell, A. Duran, M. Farreras, X. Martorell, R. M. Badia, E. Ayguade, and J. Labarta, "Productive Cluster Programming with OmpSs," in Europar'11 (to appear), 2011.
- (2011) Europar'11
- Bueno, J.¹ Martinell, L.² Duran, A.³ Farreras, M.⁴ Martorell, X.⁵ Badia, R.M.⁶ Ayguade, E.⁷ Labarta, J.⁸

6
- 0029191296
- Cilk: An efficient multithreaded runtime system
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou, "Cilk: an efficient multithreaded runtime system," SIGPLAN Not., vol. 30, no. 8, pp. 207-216, 1995.
- (1995) SIGPLAN Not. , vol.30 , Issue.8 , pp. 207-216
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

7
- 67650056929
- Extending the OpenMP Tasking Model to Allow Dependent Tasks
- Springer Berlin / Heidelberg
- A. Duran, J. M. Pérez, E. Ayguadé, R. M. Badia, and J. Labarta, "Extending the OpenMP Tasking Model to Allow Dependent Tasks," in OpenMP in a New Era of Parallelism. Springer Berlin / Heidelberg, 2008, pp. 111-122.
- (2008) OpenMP in a New Era of Parallelism , pp. 111-122
- Duran, A.¹ Pérez, J.M.² Ayguadé, E.³ Badia, R.M.⁴ Labarta, J.⁵

8
- 77954751089
- Handling task dependencies under strided and aliased references
- ser. ICS '10. New York, NY, USA: ACM
- J. M. Perez, R. M. Badia, and J. Labarta, "Handling task dependencies under strided and aliased references," in Proceedings of the 24th ACM International Conference on Supercomputing, ser. ICS '10. New York, NY, USA: ACM, 2010, pp. 263-274.
- (2010) Proceedings of the 24th ACM International Conference on Supercomputing , pp. 263-274
- Perez, J.M.¹ Badia, R.M.² Labarta, J.³

9
- 77951980969
- A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
- Dresden, Germany: Springer, June
- E. Ayguade, R. M. Badia, D. Cabrera, A. Duran, M. Gonzalez, F. Igual, D. Jimenez, J. Labarta, X. Martorell, R. Mayo, J. M. Perez, and E. S. Quintana-Orti, "A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures," in IWOMP: Evolving OpenMP in an Age of Extreme Parallelism, vol. 5568. Dresden, Germany: Springer, June 2009, pp. 154-167.
- (2009) IWOMP: Evolving OpenMP in An Age of Extreme Parallelism , vol.5568 , pp. 154-167
- Ayguade, E.¹ Badia, R.M.² Cabrera, D.³ Duran, A.⁴ Gonzalez, M.⁵ Igual, F.⁶ Jimenez, D.⁷ Labarta, J.⁸ Martorell, X.⁹ Mayo, R.¹⁰ Perez, J.M.¹¹ Quintana-Orti, E.S.¹²

10
- 84866871692
- J. J. Dongarra, I. High, and P. C. Systems, "Overview of the HPC Challenge benchmark suite."
- Overview of the HPC Challenge Benchmark Suite
- Dongarra, J.J.¹ High, I.² Systems, P.C.³

11
- 79957528059
- Trace-driven Simulation of Multithreaded Applications
- to appear
- "Trace-driven Simulation of Multithreaded Applications," in Proceedings of the 2011 ISPASS (to appear), 2011.
- (2011) Proceedings of the 2011 ISPASS

12
- 84866874333
- Master's thesis, Computer Architecture Department, Universitat Politècnica de Catalunya
- L. Martinell, "Memory usage improvements for the SMPSs runtime," Master's thesis, Computer Architecture Department, Universitat Politècnica de Catalunya, 2010.
- (2010) Memory Usage Improvements for the SMPSs Runtime
- Martinell, L.¹

13
- 84886469904
- Tech. Rep.
- D. Bonachea, "GASNet Specification, v1.8," http://gasnet.cs.berkeley.edu/, U.C. Berkeley, Tech. Rep., 2006.
- (2006) GASNet Specification, V1.8
- Bonachea, D.¹

14
- 67650694407
- NVIDIA Corporation
- NVIDIA CUDA Compute Unified Device Architecture Version 2.0, NVIDIA Corporation, 2008.
- (2008) NVIDIA CUDA Compute Unified Device Architecture Version 2.0

15
- 0003588633
- Tech. Rep.
- R. A. van de Geijn and J. Watts, "SUMMA: Scalable universal matrix multiplication algorithm," Tech. Rep., 1997.
- (1997) SUMMA: Scalable Universal Matrix Multiplication Algorithm
- Van de Geijn, R.A.¹ Watts, J.²

16
- 0001439335
- MPI: A Message Passing Interface Standard
- MPI Forum
- MPI Forum, "MPI: A Message Passing Interface Standard," Intl. Journal of Supercomputer Applications and High Performance Computing, vol. 8, no. 3/4, pp. 159-416, 1994.
- (1994) Intl. Journal of Supercomputer Applications and High Performance Computing , vol.8 , Issue.3-4 , pp. 159-416

17
- 34447571243
- UPC Consortium, May
- UPC Consortium, "UPC Language Specifications v1.2," May 2005.
- (2005) UPC Language Specifications V1.2

18
- 31744441529
- X10: An object-oriented approach to non-uniform cluster computing
- ser. OOPSLA '05. New York, NY, USA: ACM
- P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar, "X10: an object-oriented approach to non-uniform cluster computing," in Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, ser. OOPSLA '05. New York, NY, USA: ACM, 2005, pp. 519-538.
- (2005) Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications , pp. 519-538
- Charles, P.¹ Grothoff, C.² Saraswat, V.³ Donawa, C.⁴ Kielstra, A.⁵ Ebcioglu, K.⁶ Von Praun, C.⁷ Sarkar, V.⁸

19
- 34249696738
- Parallel programmability and the chapel language
- August
- B. Chamberlain, D. Callahan, and H. Zima, "Parallel programmability and the chapel language," Int. J. High Perform. Comput. Appl., vol. 21, pp. 291-312, August 2007.
- (2007) Int. J. High Perform. Comput. Appl. , vol.21 , pp. 291-312
- Chamberlain, B.¹ Callahan, D.² Zima, H.³

20
- 77749249189
- Effective communication and computation overlap with hybrid mpi/smpss
- ser. PPoPP '10. New York, NY, USA: ACM
- V. Marjanovic, J. Labarta, E. Ayguadé, and M. Valero, "Effective communication and computation overlap with hybrid mpi/smpss," in Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming, ser. PPoPP '10. New York, NY, USA: ACM, 2010, pp. 337-338.
- (2010) Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 337-338
- Marjanovic, V.¹ Labarta, J.² Ayguadé, E.³ Valero, M.⁴

21
- 77952597755
- CUDA-lite: Reducing GPU Programming Complexity
- S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-M. W. Hwu, "CUDA-lite: Reducing GPU Programming Complexity," in Languages and Compilers for Parallel Computing (LCPC) 21st Annual Workshop, August 2008.
- Languages and Compilers for Parallel Computing (LCPC) 21st Annual Workshop, August 2008
- Ueng, S.-Z.¹ Lathara, M.² Baghsorkhi, S.S.³ Hwu, W.-M.W.⁴

22
- 79951728783
- 8 December [Online]. Available
- Khronos OpenCL Working Group, The OpenCL Specification, version 1.0.29, 8 December 2008. [Online]. Available: http://khronos.org/registry/cl/specs/opencl-1.0.29.pdf
- (2008) The OpenCL Specification, Version 1.0.29

23
- 78650835532
- 190 tflops astrophysical n-body simulation on a cluster of gpus
- ser. SC '10. Washington, DC, USA: IEEE Computer Society
- T. Hamada and K. Nitadori, "190 tflops astrophysical n-body simulation on a cluster of gpus," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-9.
- (2010) Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pp. 1-9
- Hamada, T.¹ Nitadori, K.²

24
- 84856841346
- Performance analysis of a hybrid mpi/cuda implementation of the naslu benchmark
- March
- S. J. Pennycook, S. D. Hammond, S. A. Jarvis, and G. R. Mudalige, "Performance analysis of a hybrid mpi/cuda implementation of the naslu benchmark," SIGMETRICS Perform. Eval. Rev., vol. 38, pp. 23-29, March 2011.
- (2011) SIGMETRICS Perform. Eval. Rev. , vol.38 , pp. 23-29
- Pennycook, S.J.¹ Hammond, S.D.² Jarvis, S.A.³ Mudalige, G.R.⁴

25
- 77951610849
- Accelerating high performance applications with CUDA and MPI
- N. Karunadasa and D. D. N. Ranasinghe, "Accelerating high performance applications with CUDA and MPI," in Proceedings of the Fourth International Conference on Industrial and Information Systems (ICIIS 2009), Sri Lanka, 28-31 December 2009.
- Proceedings of the Fourth International Conference on Industrial and Information Systems (ICIIS 2009), Sri Lanka, 28-31 December 2009
- Karunadasa, N.¹ Ranasinghe, D.D.N.²

26
- 72049099859
- Message passing for gpgpu clusters: Cudampi
- O. S. Lawlor, "Message passing for gpgpu clusters: Cudampi," in Proceedings of the IEEE International Conference on Cluster Computing, 2009, pp. 1-8.
- Proceedings of the IEEE International Conference on Cluster Computing, 2009 , pp. 1-8
- Lawlor, O.S.¹

27
- 67650686517
- Accelerating linpack with cuda on heterogenous clusters
- ser. GPGPU-2. New York, NY, USA: ACM
- M. Fatica, "Accelerating linpack with cuda on heterogenous clusters," in Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, ser. GPGPU-2. New York, NY, USA: ACM, 2009, pp. 46-51.
- (2009) Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 46-51
- Fatica, M.¹

28
- 78650802947
- Openmpc: Extended openmp programming and tuning for gpus
- ser. SC '10. Washington, DC, USA: IEEE Computer Society
- S. Lee and R. Eigenmann, "Openmpc: Extended openmp programming and tuning for gpus," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-11.
- (2010) Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , pp. 1-11
- Lee, S.¹ Eigenmann, R.²

29
- 79952596877
- Unified parallel c for gpu clusters: Language extensions and compiler implementation
- Languages and Compilers for Parallel Computing, ser. K. Cooper, J. Mellor-Crummey, and V. Sarkar, Eds. Springer Berlin / Heidelberg
- L. Chen, L. Liu, S. Tang, L. Huang, Z. Jing, S. Xu, D. Zhang, and B. Shou, "Unified parallel c for gpu clusters: Language extensions and compiler implementation," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, K. Cooper, J. Mellor-Crummey, and V. Sarkar, Eds. Springer Berlin / Heidelberg, 2011, vol. 6548, pp. 151-165.
- (2011) Lecture Notes in Computer Science , vol.6548 , pp. 151-165
- Chen, L.¹ Liu, L.² Tang, S.³ Huang, L.⁴ Jing, Z.⁵ Xu, S.⁶ Zhang, D.⁷ Shou, B.⁸

30
- 78649898391
- Hicuda: High-level gpgpu programming
- T. D. Han and T. S. Abdelrahman, "hicuda: High-level gpgpu programming," IEEE Transactions on Parallel and Distributed Systems, vol. 22, pp. 78-90, 2011.
- (2011) IEEE Transactions on Parallel and Distributed Systems , vol.22 , pp. 78-90
- Han, T.D.¹ Abdelrahman, T.S.²

31
- 79959601133
- Mint: Realizing cuda performance in 3d stencil methods with annotated c
- ser. ICS '11. New York, NY, USA: ACM
- D. Unat, X. Cai, and S. B. Baden, "Mint: Realizing cuda performance in 3d stencil methods with annotated c," in Proceedings of the 25th ACM International Conference on Supercomputing, ser. ICS '11. New York, NY, USA: ACM, 2011, pp. 214-224.
- (2011) Proceedings of the 25th ACM International Conference on Supercomputing , pp. 214-224
- Unat, D.¹ Cai, X.² Baden, S.B.³

32
- 68249112512
- HMPP: A Hybrid Multicore Parallel Programming Environment
- R. Dolbeau, S. Bihan, and F. Bodin, "HMPP: A Hybrid Multicore Parallel Programming Environment," in Workshop on General Processing Using GPUs, 2006.
- Workshop on General Processing Using GPUs, 2006
- Dolbeau, R.¹ Bihan, S.² Bodin, F.³

33
- 78649498878
- Offload - Automating code migration to heterogeneous multicore systems
- Lecture Notes in Computer Science
- P. Cooper, U. Dolinsky, A. F. Donaldson, A. Richards, C. Riley, and G. Russell, "Offload - automating code migration to heterogeneous multicore systems," in Lecture Notes in Computer Science, HiPEAC Conference 2010, 2010, pp. 307-321.
- (2010) HiPEAC Conference 2010 , pp. 307-321
- Cooper, P.¹ Dolinsky, U.² Donaldson, A.F.³ Richards, A.⁴ Riley, C.⁵ Russell, G.⁶

34
- 34748865391
- Compilation for explicitly managed memory hierarchies
- T. J. Knight, J. Y. Park, M. Ren, M. Houston, M. Erez, K. Fatahalian, A. Aiken, W. J. Dally, and P. Hanrahan, "Compilation for explicitly managed memory hierarchies," in Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007.
- Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
- Knight, T.J.¹ Park, J.Y.² Ren, M.³ Houston, M.⁴ Erez, M.⁵ Fatahalian, K.⁶ Aiken, A.⁷ Dally, W.J.⁸ Hanrahan, P.⁹

35
- 84865717999
- Portland Group Inc., Sep
- Portland Group Inc., "PGI Accelerator Compilers," Sep 2011.
- (2011) PGI Accelerator Compilers

36
- 84866846309
- December [Online]. Available
- A. Hart, H. Richardson, A. Gray, and K. Sivalingham, "Directive-based programming for GPUs, accelerators and HPC," December 2010. [Online]. Available: www.many-core.group.cam.ac.uk/ukgpucc2/talks/Hart.pdf
- (2010) Directive-based Programming for GPUs, Accelerators and HPC
- Hart, A.¹ Richardson, H.² Gray, A.³ Sivalingham, K.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.