SCOPUS Information Retrieval Platform

Transactions on Architecture and Code Optimization

Volume 9, Issue 4, 2013, Pages

Polyhedral parallel code generation for CUDA

(6) Verdoolaege, Sven (a); Juega, Juan Carlos (b); Cohen, Albert (a); Gómez, José Ignacio (b); Tenllado, Christian (b); Catthoor, Francky (c)

a INRIA (France)

b UNIVERSIDAD COMPLUTENSE DE MADRID (Spain)

c imec (Belgium)

Author keywords

C to CUDA; Code generation; Compilers; CUDA; GPU; Loop transformations; Par4All; Polyhedral model; PPCG

Indexed keywords

C-TO-CUDA; CODE GENERATION; CUDA; GPU; LOOP TRANSFORMATION; PAR4ALL; POLYHEDRAL MODELS; PPCG; NETWORK COMPONENTS; PROGRAM COMPILERS

EID: 84872943015 PISSN: 15443566 EISSN: 15443973 Source Type: Journal
DOI: 10.1145/2400682.2400713 Document Type: Article

Times cited: 311

References (50)

1
- 0023438847
- Automatic translation of Fortran programs to vector form
- ALLEN, R. AND KENNEDY, K. 1987. Automatic translation of Fortran programs to vector form. ACM Trans. Program. Lang. Syst. 9, 4, 491-542.
- (1987) ACM Trans. Program. Lang. Syst. , vol.9 , Issue.4 , pp. 491-542
- Allen, R.¹ Kennedy, K.²

2
- 0037952146
- Morgan Kaufmann Publishers
- ALLEN, R. AND KENNEDY, K. 2001. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers.
- (2001) Optimizing Compilers for Modern Architectures
- Allen, R.¹ Kennedy, K.²

3
- 84979017283
- Static compilation analysis for host-accelerator communication optimization
- Springer
- AMINI, M., COELHO, F., IRIGOIN, F., AND KERYELL, R. 2011. Static compilation analysis for host-accelerator communication optimization. In Workshop on Languages and Compilers for Parallel Computing (LCPC'11). Lecture Notes in Computer Science. Springer.
- (2011) Workshop on Languages and Compilers for Parallel Computing (LCPC'11). Lecture Notes in Computer Science
- Amini, M.¹ Coelho, F.² Irigoin, F.³ Keryell, R.⁴

4
- 84872936950
- AMP
- AMP 2011. C++ accelerated massive parallelism. http://msdn.microsoft.com/en-us/library/hh265137
- (2011) C++ Accelerated Massive Parallelism

5
- 84885181524
- Putting automatic polyhedral compilation for GPGPU to work
- BAGHDADI, S., GRÖSSLINGER, A., AND COHEN, A. 2010. Putting automatic polyhedral compilation for GPGPU to work. In Proceedings of the International Workshop Compilers for Parallel Computer (CPC).
- (2010) Proceedings of the International Workshop Compilers for Parallel Computer (CPC)
- Baghdadi, S.¹ Grösslinger, A.² Cohen, A.³

6
- 77951572335
- Automatic C-to-CUDA code generation for affine programs
- Springer
- BASKARAN, M., RAMANUJAM, J., AND SADAYAPPAN, P. 2010. Automatic C-to-CUDA code generation for affine programs. In Compiler Construction (CC 10), Held as Part of the Joint European Conferences on Theory and Practice of Software, (ETAPS 10), Lecture Notes in Computer Science, vol. 6011. Springer, 244-263.
- (2010) Compiler Construction (CC 10), Held As Part of the Joint European Conferences on Theory and Practice of Software, (ETAPS 10), Lecture Notes in Computer Science , vol.6011 , pp. 244-263
- Baskaran, M.¹ Ramanujam, J.² Sadayappan, P.³

7
- 10444289646
- Code generation in the polyhedral model is easier than you think
- IEEE Computer Society, Washington, DC
- BASTOUL, C. 2004. Code generation in the polyhedral model is easier than you think. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'04). IEEE Computer Society, Washington, DC, 7-16.
- (2004) Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'04) , pp. 7-16
- Bastoul, C.¹

8
- 77951599594
- The polyhedral model is more widely applicable than you think
- Springer
- BENABDERRAHMANE, M.-W., POUCHET, L.-N., COHEN, A., AND BASTOUL, C. 2010. The polyhedral model is more widely applicable than you think. In Proceedings of the International Conference on Compiler Construction (CC'10). Lecture Notes in Computer Science, vol. 6011. Springer.
- (2010) Proceedings of the International Conference on Compiler Construction (CC'10). Lecture Notes in Computer Science , vol.6011
- Benabderrahmane, M.-W.¹ Pouchet, L.-N.² Cohen, A.³ Bastoul, C.⁴

9
- 24144474794
- Intel Press
- BIK, A. J. C. 2004. The Software Vectorization Handbook. Applying Multimedia Extensions for Maximum Performance. Intel Press.
- (2004) The Software Vectorization Handbook. Applying Multimedia Extensions for Maximum Performance
- Bik, A.J.C.¹

10
- 84872974169
- BONDHUGULA, U. 2012. PLuTo: An automatic parallelizer and locality optimizer for multicores, version 0.7. http://pluto-compiler.sourceforge.net/
- (2012) PLuTo: An Automatic Parallelizer and Locality Optimizer for Multicores, Version 0.7
- Bondhugula, U.¹

11
- 57349139452
- A practical automatic polyhedral parallelizer and locality optimizer
- BONDHUGULA, U., HARTONO, A., RAMANUJAM, J., AND SADAYAPPAN, P. 2008a. A practical automatic polyhedral parallelizer and locality optimizer. SIGPLAN Not. 43, 6, 101-113.
- (2008) SIGPLAN Not. , vol.43 , Issue.6 , pp. 101-113
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

12
- 74049164978
- PLuTo: A practical and fully automatic polyhedral program optimization system
- BONDHUGULA, U., RAMANUJAM, J., 2008b. PLuTo: A practical and fully automatic polyhedral program optimization system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
- (2008) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
- Bondhugula, U.¹ Ramanujam, J.²

13
- 0032066690
- Loop parallelization algorithms: From parallelism extraction to code generation
- BOULET, P., DARTE, A., SILBER, G.-A., AND VIVIEN, F. 1998. Loop parallelization algorithms: From parallelism extraction to code generation. Parallel Comput. 24, 421-444.
- (1998) Parallel Comput. , vol.24 , pp. 421-444
- Boulet, P.¹ Darte, A.² Silber, G.-A.³ Vivien, F.⁴

14
- 70449959487
- Tech. rep., USC Computer Science
- CHEN, C., CHAME, J., AND HALL, M. 2008. A framework for composing high-level loop transformations. Tech. rep., USC Computer Science.
- (2008) A Framework for Composing High-level Loop Transformations
- Chen, C.¹ Chame, J.² Hall, M.³

15
- 77949650907
- Offload-Automating code migration to heterogeneous multicore systems
- COOPER, P., DOLINSKY, U., DONALDSON, A. F., RICHARDS, A., RILEY, C., AND RUSSELL, G. 2010. Offload-Automating code migration to heterogeneous multicore systems. In Proceedings of International Conference on High-Performance Embedded Architectures and Compilers (HIPEAC). 337-352.
- (2010) Proceedings of International Conference on High-Performance Embedded Architectures and Compilers (HIPEAC) , pp. 337-352
- Cooper, P.¹ Dolinsky, U.² Donaldson, A.F.³ Richards, A.⁴ Riley, C.⁵ Russell, G.⁶

16
- 0026109335
- Dataflow analysis of array and scalar references
- FEAUTRIER, P. 1991. Dataflow analysis of array and scalar references. Int. J. Parallel Program. 20, 1, 23-53.
- (1991) Int. J. Parallel Program. , vol.20 , Issue.1 , pp. 23-53
- Feautrier, P.¹

17
- 0026933251
- Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time
- FEAUTRIER, P. 1992a. Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time. Int. J. Parallel Program. 21, 313-347.
- (1992) Int. J. Parallel Program. , vol.21 , pp. 313-347
- Feautrier, P.¹

18
- 0001448065
- Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time
- FEAUTRIER, P. 1992b. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program. 21, 389-420.
- (1992) Int. J. Parallel Program. , vol.21 , pp. 389-420
- Feautrier, P.¹

19
- 77949621946
- Analysis of task offloading for accelerators
- FERRER, R., BELTRAN, V., GONZÁLEZ, M., MARTORELL, X., AND AYGUADÉ, E. 2010. Analysis of task offloading for accelerators. In Proceedings of the International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC). 322-336.
- (2010) Proceedings of the International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC) , pp. 322-336
- Ferrer, R.¹ Beltran, V.² González, M.³ Martorell, X.⁴ Ayguadé, E.⁵

20
- 84871776762
- Polly: Polyhedral optimization in LLVM
- GROSSER, T., ZHENG, H., A, R., SIMBÜRGER, A., GRÖSSLINGER, A., AND POUCHET, L.-N. 2011. Polly: Polyhedral optimization in LLVM. In 1st International Workshop on Polyhedral Compilation Techniques (IMPACT'11).
- (2011) 1st International Workshop on Polyhedral Compilation Techniques (IMPACT'11)
- Grosser, T.¹ Zheng, H.² A, R.³ Simbürger, A.⁴ Grösslinger, A.⁵ Pouchet, L.-N.⁶

21
- 70350627685
- Precise management of scratchpad memories for localising array accesses in scientific codes
- Springer
- GRÖSSLINGER, A. 2009. Precise management of scratchpad memories for localising array accesses in scientific codes. In CC'09. Springer, 236-250.
- (2009) CC'09 , pp. 236-250
- Grösslinger, A.¹

22
- 84872937545
- HMPP
- HMPP 2010. HMPP workbench: directive-based multi-language and multi-target hybrid programming model. http://www.caps-entreprise.com/hmpp.html
- (2010) HMPP Workbench: Directive-based Multi-language and Multi-target Hybrid Programming Model

23
- 84872980321
- HPC PROJECT
- HPC PROJECT. 2012. Par4All automatic parallelization version 1.3. http://www.par4all.org
- (2012) Par4All Automatic Parallelization Version 1.3

24
- 78650802947
- OpenMPC: Extended OpenMP programming and tuning for GPUs
- IEEE Computer Society, Washington, DC
- LEE, S. AND EIGENMANN, R. 2010. OpenMPC: Extended OpenMP programming and tuning for GPUs. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10). IEEE Computer Society, Washington, DC, 1-11.
- (2010) Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10) , pp. 1-11
- Lee, S.¹ Eigenmann, R.²

25
- 67650081010
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- LEE, S., MIN, S.-J., AND EIGENMANN, R. 2009. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In Proceedings of the Symposium on Principles and Practice of Parallel Programming.
- (2009) Proceedings of the Symposium on Principles and Practice of Parallel Programming
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

26
- 77952264175
- A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction
- ACM, New York
- LEUNG, A., VASILACHE, N., MEISTER, B., BASKARAN, M., WOHLFORD, D., BASTOUL, C., AND LETHIN, R. 2010. A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU'10). ACM, New York, 51-61.
- (2010) Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU'10) , pp. 51-61
- Leung, A.¹ Vasilache, N.² Meister, B.³ Baskaran, M.⁴ Wohlford, D.⁵ Bastoul, C.⁶ Lethin, R.⁷

27
- 4243731804
- M.S. thesis, Stanford University
- LIM, A. 2001. Improving parallelism and data locality with affine partitioning. M.S. thesis, Stanford University.
- (2001) Improving Parallelism and Data Locality with Affine Partitioning
- Lim, A.¹

28
- 70450103746
- A cross-input adaptive framework for GPU programs optimization
- LIU, Y., ZHANG, E. Z., AND SHEN, X. 2009. A cross-input adaptive framework for GPU programs optimization. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium.
- (2009) Proceedings of the IEEE International Parallel and Distributed Processing Symposium
- Liu, Y.¹ Zhang, E.Z.² Shen, X.³

29
- 33746034953
- Auto-vectorization of interleaved data for SIMD
- NUZMAN, D., ROSEN, I., AND ZAKS, A. 2006. Auto-vectorization of interleaved data for SIMD. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI'06).
- (2006) Proceedings of the Conference on Programming Language Design and Implementation (PLDI'06)
- Nuzman, D.¹ Rosen, I.² Zaks, A.³

30
- 63549093768
- Outer-loop vectorization-revisited for short SIMD architectures
- NUZMAN, D. AND ZAKS, A. 2008. Outer-loop vectorization-revisited for short SIMD architectures. In International Conference on Parallel Architecture and Compilation Techniques (PACT'08).
- (2008) International Conference on Parallel Architecture and Compilation Techniques (PACT'08)
- Nuzman, D.¹ Zaks, A.²

31
- 35948991669
- NVIDIA Corporation
- NVIDIA Corporation 2011. NVIDIA CUDA Programming guide 4.0. NVIDIA Corporation.
- (2011) NVIDIA CUDA Programming guide 4.0

32
- 84867263494
- OpenACC
- OpenACC 2011. OpenACC: Directives for accelerators. http://www.openacc-standard.org
- (2011) OpenACC: Directives for Accelerators

33
- 84872913097
- PoCC
- PoCC 2012. PoCC: the polyhedral compiler collection version 1.1. http://www.cse.ohio-state.edu/~pouchet/software/pocc/
- (2012) PoCC: The Polyhedral Compiler Collection Version 1.1

34
- 84864028974
- Apricot: An optimizing compiler and productivity tool for x86-compatible many-core coprocessors
- RAVI, N., YANG, Y., BAO, T., AND CHAKRADHAR, S. 2012. Apricot: An optimizing compiler and productivity tool for x86-compatible many-core coprocessors. In International Conference on Supercomputing (ICS'12).
- (2012) International Conference on Supercomputing (ICS'12)
- Ravi, N.¹ Yang, Y.² Bao, T.³ Chakradhar, S.⁴

35
- 35448985754
- Parameterized tiled loops for free
- RENGANARAYANAN, L., KIM, D., RAJOPADHYE, S., AND STROUT, M. 2007. Parameterized tiled loops for free. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
- (2007) ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
- Renganarayanan, L.¹ Kim, D.² Rajopadhye, S.³ Strout, M.⁴

36
- 79952576869
- A programming language interface to describe transformations and code generation
- Springer
- RUDY, G., KHAN, M. M., HALL, M., CHEN, C., AND CHAME, J. 2011. A programming language interface to describe transformations and code generation. In Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing (LCPC'10). Springer, 136-150.
- (2011) Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing (LCPC'10) , pp. 136-150
- Rudy, G.¹ Khan, M.M.² Hall, M.³ Chen, C.⁴ Chame, J.⁵

37
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- RYOO, S., RODRIGUES, C. I., BAGHSORKHI, S. S., STONE, S. S., KIRK, D. B., AND HWU, W.-M. 2008a. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP'08).
- (2008) Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP'08)
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.-M.⁶

38
- 43449094719
- Optimization space pruning for a multithreaded GPU
- RYOO, S., RODRIGUES, C. I., STONE, S. S., BAGHSORKHI, S. S., UENG, S.-Z., STRATTON, J. A., AND HWU, W.-M. 2008b. Optimization space pruning for a multithreaded GPU. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'08).
- (2008) Proceedings of the International Symposium on Code Generation and Optimization (CGO'08)
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.⁷

39
- 84948740064
- Compiler-controlled caching in superword register files for multimedia extension architectures
- SHIN, J., CHAME, J., AND HALL, M. W. 2002. Compiler-controlled caching in superword register files for multimedia extension architectures. In International Conference on Parallel Architecture and Compilation Techniques (PACT'02).
- (2002) International Conference on Parallel Architecture and Compilation Techniques (PACT'02)
- Shin, J.¹ Chame, J.² Hall, M.W.³

40
- 33646554301
- Superword-level parallelism in the presence of control flow
- SHIN, J., HALL, M., AND CHAME, J. 2005. Superword-level parallelism in the presence of control flow. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'05).
- (2005) Proceedings of the International Symposium on Code Generation and Optimization (CGO'05)
- Shin, J.¹ Hall, M.² Chame, J.³

41
- 83655177066
- The Portland Group
- THE PORTLAND GROUP 2010. PGI Accelerator Programming Model for Fortran & C, v1.3 ed. The Portland Group.
- (2010) PGI Accelerator Programming Model for Fortran & C, v1.3 Ed

42
- 70449626135
- Polyhedral-model guided loop-nest auto-vectorization
- TRIFUNOVIĆ, K., NUZMAN, D., COHEN, A., ZAKS, A., AND ROSEN, I. 2009. Polyhedral-model guided loop-nest auto-vectorization. In International Conference on Parallel Architecture and Compilation Techniques (PACT'09).
- (2009) International Conference on Parallel Architecture and Compilation Techniques (PACT'09)
- Trifunović, K.¹ Nuzman, D.² Cohen, A.³ Zaks, A.⁴ Rosen, I.⁵

43
- 77952597755
- CUDA-lite: Reducing GPU programming complexity
- UENG, S., LATHARA, M., BAGHSORKHI, S. S., AND HWU, W.-M. 2008. CUDA-lite: Reducing GPU programming complexity. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing (LCPC'08).
- (2008) Proceedings of the Workshop on Languages and Compilers for Parallel Computing (LCPC'08)
- Ueng, S.¹ Lathara, M.² Baghsorkhi, S.S.³ Hwu, W.-M.⁴

44
- 84859153100
- Automatic restructuring of GPU kernels for exploiting inter-thread data locality
- Springer
- UNKULE, S., SHALTZ, C., AND QASEM, A. 2012. Automatic restructuring of GPU kernels for exploiting inter-thread data locality. In International Conference on Compiler Construction (CC'12). Lecture Notes in Computer Science, vol. 7210. Springer.
- (2012) International Conference on Compiler Construction (CC'12). Lecture Notes in Computer Science , vol.7210
- Unkule, S.¹ Shaltz, C.² Qasem, A.³

45
- 84872972843
- Joint scheduling and layout optimization to enable multi-level vectorization
- VASILACHE, N., MEISTER, B., BASKARAN, M., AND LETHIN, R. 2012. Joint scheduling and layout optimization to enable multi-level vectorization. In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT'12).
- (2012) Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT'12)
- Vasilache, N.¹ Meister, B.² Baskaran, M.³ Lethin, R.⁴

46
- 78149237521
- isl: An integer set library for the polyhedral model
- K. Fukuda, J. Hoeven, M. Joswig, and N. Takayama, Eds. Lecture Notes in Computer Science Series Springer
- VERDOOLAEGE, S. 2010. isl: An integer set library for the polyhedral model. In International Conference on Mathematical Software (ICMS'10), K. Fukuda, J. Hoeven, M. Joswig, and N. Takayama, Eds. Lecture Notes in Computer Science Series, vol. 6327. Springer, 299-302.
- (2010) International Conference on Mathematical Software (ICMS'10) , vol.6327 , pp. 299-302
- Verdoolaege, S.¹

47
- 84893151816
- Polyhedral extraction tool
- VERDOOLAEGE, S. AND GROSSER, T. 2012. Polyhedral extraction tool. In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT'12).
- (2012) Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT'12)
- Verdoolaege, S.¹ Grosser, T.²

48
- 13244279577
- Minimizing development and maintenance costs in supporting persistently optimized BLAS
- WHALEY, R. C. AND PETITET, A. 2005. Minimizing development and maintenance costs in supporting persistently optimized BLAS. Softw. Pract. Exper. 35, 2, 101-121. http://www.cs.utsa.edu/~whaley/papers/spercw04.ps.
- (2005) Softw. Pract. Exper. , vol.35 , Issue.2 , pp. 101-121
- Whaley, R.C.¹ Petitet, A.²

49
- 0003927035
- Addison Wesley
- WOLFE, M. 1996. High Performance Compilers for Parallel Computing. Addison Wesley.
- (1996) High Performance Compilers for Parallel Computing
- Wolfe, M.¹

50
- 32844466554
- An integrated Simdization framework using virtual vectors
- WU, P., EICHENBERGER, A. E., WANG, A., AND ZHAO, P. 2005. An integrated Simdization framework using virtual vectors. In International Conference on Supercomputing (ICS'05).
- (2005) International Conference on Supercomputing (ICS'05)
- Wu, P.¹ Eichenberger, A.E.² Wang, A.³ Zhao, P.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.