SCOPUS Information Retrieval Platform

Microprocessors and Microsystems

Volume 36, Issue 2, 2012, Pages 78-87

Optimization strategies in different CUDA architectures using llCoMP

(2) Reyes, Ruymán (a); De Sande, Francisco (a)

(a) Universidad de La Laguna (Spain)

Author keywords

Code optimization; Coding effort; CUDA; GPGPU; llc; Performance portability

Indexed keywords

CODE OPTIMIZATION; CODING EFFORT; CUDA; GPGPU; LLC; PERFORMANCE PORTABILITY;

CODES (SYMBOLS); PROGRAM PROCESSORS;

OPTIMIZATION;

EID: 84857311397 PISSN: 01419331 EISSN: None Source Type: Journal
DOI: 10.1016/j.micpro.2011.05.006 Document Type: Conference Paper

Times cited : (12)

References (35)

1
- 77956773183
- Extending OpenMP to survive the heterogeneous multi-core era
- E. Ayguadé Extending OpenMP to survive the heterogeneous multi-core era Int. J. Parallel Progr. 38 5-6 2010 440 459 (Cited by (since 1996) 0). < http://www.scopus.com/inward/record.url?eid=2-s2.0-77956773183&partnerID=40&md5=4fd3a3df6ed4dc78ac2046b6366b5a9c >
- (2010) Int. J. Parallel Progr. , vol.38 , Issue.5-6 , pp. 440-459
- Ayguadé, E.¹

2
- 77951980969
- Springer Dresden, Germany
- E. Ayguadé, R.M. Badia, D. Cabrera, A. Duran, M. Gonzàlez, F.D. Igual, D. Jimenez, J. Labarta, X. Martorell, R. Mayo, J.M. Pérez, and E.S. Quintana-Ortíz A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures 2009 Springer Dresden, Germany 154 167
- (2009) A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures , pp. 154-167
- Ayguadé, E.¹ Badia, R.M.² Cabrera, D.³ Duran, A.⁴ Gonzàlez, M.⁵ Igual, F.D.⁶ Jimenez, D.⁷ Labarta, J.⁸ Martorell, X.⁹ Mayo, R.¹⁰ Pérez, J.M.¹¹ Quintana-Ortíz, E.S.¹²

3
- 84857340112
- E. Bendersky Pycparser 2009 < http://code.google.com/p/pycparser/ >
- (2009) Pycparser
- Bendersky, E.¹

4
- 77951931560
- State-of-the-art in heterogeneous computing
- A.R. Brodtkorb, C. Dyken, T.R. Hagen, J.M. Hjelmervik, and O.O. Storaasli State-of-the-art in heterogeneous computing Scientific Progr 18 2010 1 33 < http://babrodtk.at.ifi.uio.no/files/publications/brodtkorb-etal-star-heterocomp-final.pdf >
- (2010) Scientific Progr , vol.18 , pp. 1-33
- Brodtkorb, A.R.¹ Dyken, C.² Hagen, T.R.³ Hjelmervik, J.M.⁴ Storaasli, O.O.⁵

5
- 73449130892
- Cetus: A source-to-source compiler infrastructure for multicores
- C. Dave, H. Bae, S.-J. Min, S. Lee, R. Eigenmann, and S. Midkiff Cetus: a source-to-source compiler infrastructure for multicores Computer 42 12 2009 36 42
- (2009) Computer , vol.42 , Issue.12 , pp. 36-42
- Dave, C.¹ Bae, H.² Min, S.-J.³ Lee, S.⁴ Eigenmann, R.⁵ Midkiff, S.⁶

6
- 33646152544
- Implementing OpenMP for clusters on top of MPI
- LNCS Springer-Verlag Sorrento, Italy
- A.J. Dorta, J.M. Badía, E.S. Quintana, and F. de Sande Implementing OpenMP for clusters on top of MPI Proceedings of the 12th European PVM/MPI Users' Group Meeting LNCS vol. 3666 2005 Springer-Verlag Sorrento, Italy 148 155
- (2005) Proceedings of the 12th European PVM/MPI Users' Group Meeting , vol.3666 , pp. 148-155
- Dorta, A.J.¹ Badía, J.M.² Quintana, E.S.³ De Sande, F.⁴

7
- 0346267382
- llc: A parallel skeletal language
- A.J. Dorta, J.A. González, C. Rodriguez, and F. de Sande llc: a parallel skeletal language Parallel Proces. Lett. 13 3 2003 437 448
- (2003) Parallel Proces. Lett. , vol.13 , Issue.3 , pp. 437-448
- Dorta, A.J.¹ González, J.A.² Rodriguez, C.³ De Sande, F.⁴

8
- 33750341021
- Basic skeletons in llc
- DOI 10.1016/j.parco.2006.07.001, PII S0167819106000342
- A.J. Dorta, P. López, and F. de Sande Basic skeletons in llc Parallel Comput. 32 7-8 2006 491 506 (Pubitemid 44634782)
- (2006) Parallel Computing , vol.32 , Issue.7-8 , pp. 491-506
- Dorta, A.¹ Lopez, P.² De Sande, F.³

9
- 77954616164
- JCudaMP: OpenMP/Java on CUDA
- ACM New York, NY, USA
- G. Dotzler, R. Veldema, and M. Klemm JCudaMP: OpenMP/Java on CUDA IWMSE '10: Proceedings of the 3rd International Workshop on Multicore Software Engineering 2010 ACM New York, NY, USA 10 17
- (2010) IWMSE '10: Proceedings of the 3rd International Workshop on Multicore Software Engineering , pp. 10-17
- Dotzler, G.¹ Veldema, R.² Klemm, M.³

10
- 53349090243
- A closer look at GPUs
- K. Fatahalian, and M. Houston A closer look at GPUs Commun. ACM 51 10 2008 50 57
- (2008) Commun. ACM , vol.51 , Issue.10 , pp. 50-57
- Fatahalian, K.¹ Houston, M.²

11
- 9744247646
- Measuring high performance computing productivity
- S. Faulk et al. Measuring high performance computing productivity Int. J. High Perform. Comput. Appl. 18 4 2004 459 473
- (2004) Int. J. High Perform. Comput. Appl. , vol.18 , Issue.4 , pp. 459-473
- Faulk, S.¹ et al.

12
- 79955000998
- Unrolling Loops Containing Task Parallelism
- R. Ferrer, A. Duran, X. Martorell, E. Ayguadé, Unrolling Loops Containing Task Parallelism, in: Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing, 2009.
- (2009) Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing
- Ferrer, R.¹

13
- 77954450663
- Application of a development time productivity metric to parallel software development
- ACM New York, NY, USA
- A. Funk, V. Basili, L. Hochstein, and J. Kepner Application of a development time productivity metric to parallel software development SE-HPCS '05: Proceedings of the Second International Workshop on Software Engineering for High Performance Computing System Applications 2005 ACM New York, NY, USA 8 12
- (2005) SE-HPCS '05: Proceedings of the Second International Workshop on Software Engineering for High Performance Computing System Applications , pp. 8-12
- Funk, A.¹ Basili, V.² Hochstein, L.³ Kepner, J.⁴

14
- 56449125059
- Dynamic load balancing on dedicated heterogeneous systems
- A. Lastovetsky, T. Kechadi, J. Dongarra, Lecture Notes in Computer Science Springer Berlin/Heidelberg
- I. Galindo, F. Almeida, and J.M. Badía-Contelles Dynamic load balancing on dedicated heterogeneous systems A. Lastovetsky, T. Kechadi, J. Dongarra, Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface Lecture Notes in Computer Science vol. 5205 2008 Springer Berlin/Heidelberg 64 74
- (2008) Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface , vol.5205 , pp. 64-74
- Galindo, I.¹ Almeida, F.² Badía-Contelles, J.M.³

15
- 0347937585
- An asynchronous approach to efficient execution of programs on adaptive architectures utilizing FPGAs
- S. Ghosh An asynchronous approach to efficient execution of programs on adaptive architectures utilizing FPGAs J. Netw. Comput. Appl. 20 3 1997 223 252 (Pubitemid 127452098)
- (1997) Journal of Network and Computer Applications , vol.20 , Issue.3 , pp. 223-252
- Ghosh, S.¹

16
- 79958013713
- GPGPU
- GPGPU, General-Purpose Computation on Graphics Hardware, 2010. < http://www.gpgpu.org >.
- (2010) General-Purpose Computation on Graphics Hardware

17
- 67650661447
- M. Harris, Optimizing Parallel Reduction in CUDA, 2007. < http://tiny.cc/t2phi>.
- (2007) Optimizing Parallel Reduction in CUDA
- Harris, M.¹

18
- 70450231944
- An analytical model for a gpu architecture with memory-level and thread-level parallelism awareness
- S. Hong, and H. Kim An analytical model for a gpu architecture with memory-level and thread-level parallelism awareness SIGARCH Comput. Archit. News 37 2009 152 163
- (2009) SIGARCH Comput. Archit. News , vol.37 , pp. 152-163
- Hong, S.¹ Kim, H.²

19
- 84857279068
- Intel
- Intel, Sophisticated Library for Vector Parallelism: Intel Array Building Blocks, 2010. < http://software.intel.com/en-us/articles/intel-array-building-blocks/ >.
- (2010) Sophisticated Library for Vector Parallelism: Intel Array Building Blocks

20
- 9744274567
- High performance computing productivity model synthesis
- J. Kepner High performance computing productivity model synthesis Int. J. High Perform. Comput. Appl. 18 2004 505 516 < http://portal.acm.org/citation.cfm?id=1057159.1057169 >
- (2004) Int. J. High Perform. Comput. Appl. , vol.18 , pp. 505-516
- Kepner, J.¹

21
- 74349092397
- Khronos Group
- Khronos Group, OpenCL the Open Standard for Parallel Programming of Heterogeneous Systems, 2008. < http://www.khronos.org/opencl/ >.
- (2008) OpenCL the Open Standard for Parallel Programming of Heterogeneous Systems

22
- 70350583252
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- ACM New York, NY, USA
- S. Lee, S.-J. Min, and R. Eigenmann OpenMP to GPGPU: a compiler framework for automatic translation and optimization PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2009 ACM New York, NY, USA 101 110
- (2009) PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 101-110
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

23
- 77955364968
- A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries
- C. Liao, D.J. Quinlan, T. Panas, B.R. de Supinski, A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries, in: IWOMP, 2010, pp. 15-28.
- (2010) IWOMP , pp. 15-28
- Liao, C.¹ Quinlan, D.J.² Panas, T.³ De Supinski, B.R.⁴

24
- 77954400468
- Effective source-to-source outlining to support whole program empirical optimization
- C.a.Q. Liao, Effective source-to-source outlining to support whole program empirical optimization. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5898, LNCS, pp. 308-322, 2010. (Cited by (since 1996) 0.)
- (2010) Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), LNCS , vol.5898 , pp. 308-322
- Liao, C.A.Q.¹

25
- 84857258418
- llc
- llc, 2011. llc Home Page. < http://llc.pcg.ull.es >.
- (2011) llc Home Page

26
- 84982318971
- GPGPU: General purpose computation on graphics hardware
- ACM New York, NY, USA p. 33
- D. Luebke, M. Harris, J. Krüger, T. Purcell, N. Govindaraju, I. Buck, C. Woolley, and A. Lefohn GPGPU: general purpose computation on graphics hardware SIGGRAPH '04: ACM SIGGRAPH 2004 Course Notes 2004 ACM New York, NY, USA p. 33
- (2004) SIGGRAPH '04: ACM SIGGRAPH 2004 Course Notes
- Luebke, D.¹ Harris, M.² Krüger, J.³ Purcell, T.⁴ Govindaraju, N.⁵ Buck, I.⁶ Woolley, C.⁷ Lefohn, A.⁸

27
- 34147164008
- Languages for high-productivity computing: The DARPA HPCS language project
- DOI 10.1142/S0129626407002892, PII S0129626407002892
- E. Lusk, and K. Yelick Languages for high-productivity computing: the DARPA HPCS language project Parallel Proces. Lett. 17 1 2007 89 102 doi:10.1142/S0129626407002892 (Pubitemid 46573928)
- (2007) Parallel Processing Letters , vol.17 , Issue.1 , pp. 89-102
- Lusk, E.¹ Yelick, K.²

28
- 78651550268
- Scalable parallel programming with CUDA
- J. Nickolls, I. Buck, M. Garland, and K. Skadron Scalable parallel programming with CUDA Queue 6 2 2008 40 53
- (2008) Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

29
- 63549150883
- NVIDIA Corp.
- NVIDIA Corp., CUDA Occupancy Calculator, 2007. < http://developer.download.nvidia.com/compute/cuda/CUDA-Occupancy-calculator.xls >.
- (2007) CUDA Occupancy Calculator

30
- 73449144074
- OpenMP Architecture Review Board
- OpenMP Architecture Review Board, OpenMP Application Program Interface v. 3.0, May 2008. < http://www.openmp.org/drupal/mp-documents/spec30.pdf >.
- (2008) OpenMP Application Program Interface V. 3.0

31
- 33947588048
- A survey of general-purpose computation on graphics hardware
- DOI 10.1111/j.1467-8659.2007.01012.x
- J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, and T. Purcell A survey of general-purpose computation on graphics hardware Comput. Graph. Forum 26 1 2007 80 113 < http://graphics.idav.ucdavis.edu/publications/print-pub?pub-id=907 > (Pubitemid 46481097)
- (2007) Computer Graphics Forum , vol.26 , Issue.1 , pp. 80-113
- Owens, J.D.¹ Luebke, D.² Govindaraju, N.³ Harris, M.⁴ Kruger, J.⁵ Lefohn, A.E.⁶ Purcell, T.J.⁷

32
- 70350454023
- Automatic hybrid MPI + OpenMP code generation with llc
- M. Ropo, J. Westerholm, J. Dongarra, Lecture Notes in Computer Science Springer-Verlag Espoo, Finland
- R. Reyes, A.J. Dorta, F. Almeida, and F. de Sande Automatic hybrid MPI + OpenMP code generation with llc M. Ropo, J. Westerholm, J. Dongarra, Proceedings of the 16th European PVM/MPI Users' Group Meeting Lecture Notes in Computer Science vol. 5759 2009 Springer-Verlag Espoo, Finland 185 195
- (2009) Proceedings of the 16th European PVM/MPI Users' Group Meeting , vol.5759 , pp. 185-195
- Reyes, R.¹ Dorta, A.J.² Almeida, F.³ De Sande, F.⁴

33
- 79955004994
- Automatic code generation for GPUs in llc
- J. Vigo-Aguiar (Ed.) Almeria, Andalucia, Spain
- R. Reyes, A.J. Dorta, F. Almeida, F. de Sande, Automatic code generation for GPUs in llc, in: J. Vigo-Aguiar (Ed.), Proceedings of the 10th International Conference on Computational and Mathematical Methods in Science and Engineering (vol. III)., Almeria, Andalucia, Spain, 2010, pp. 804-815.
- (2010) Proceedings of the 10th International Conference on Computational and Mathematical Methods in Science and Engineering (Vol. III) , pp. 804-815
- Reyes, R.¹ Dorta, A.J.² Almeida, F.³ De Sande, F.⁴

34
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- ACM New York, NY, USA
- S. Ryoo, C.I. Rodrigues, S.S. Baghsorkhi, S.S. Stone, D.B. Kirk, and W.-m.W. Hwu Optimization principles and application performance evaluation of a multithreaded GPU using CUDA PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008 ACM New York, NY, USA 73 82
- (2008) PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.-m.W.⁶

35
- 33947317857
- D. Wheeler, Sloccount, 2009. < http://www.dwheeler.com/sloccount/ >.
- (2009) Sloccount
- Wheeler, D.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.