SCOPUS Information Retrieval Platform

Proceedings of the International Conference on Supercomputing

Volume , Issue , 2008, Pages 225-234

A compiler framework for optimization of affine loop nests for GPGPUs

(6) Baskaran, Muthu Manikandan (a); Bondhugula, Uday (a); Krishnamoorthy, Sriram (a); Ramanujam, J (b); Rountev, Atanas (a); Sadayappan, P (a)

a The Ohio State University

b Louisiana State University

Author keywords

Empirical tuning; GPU; Memory access optimization; Polyhedral model

Indexed keywords

AUTOMATIC PARALLELIZATION; COMPILER OPTIMIZATIONS; COMPUTATIONAL POWERS; DATA ACCESSES; DATA DEPENDENCES; DEVICE ARCHITECTURES; EMPIRICAL TUNING; GPU; LOOP NESTS; MEMORY ACCESS OPTIMIZATION; OPTIMAL PARAMETERS; PARALLEL ARCHITECTURES; PARALLEL CODES; PERFORMANCE OPTIMIZATIONS; POLYHEDRAL MODEL; PROGRAM TRANSFORMATIONS; PROGRAMMING MODELS; SHARED MEMORIES;

COMPUTERS; INTELLIGENT CONTROL; MATHEMATICAL TRANSFORMATIONS; OPTIMIZATION; PROGRAM COMPILERS; PROGRAM PROCESSORS; STRUCTURED PROGRAMMING; SYSTEMS ANALYSIS;

GLOBAL OPTIMIZATION;

EID: 57349180412 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1375527.1375562 Document Type: Conference Paper

Times cited : (171)

References (28)

1
- 84976766536
- Scanning polyhedra with do loops
- C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In PPoPP'91, pages 39-50, 1991.
- (1991) PPoPP'91 , pp. 39-50
- Ancourt, C.¹ Irigoin, F.²

2
- 79959456077
- Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
- M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In ACM SIGPLAN PPoPP 2008, Feb. 2008.
- (2008) ACM SIGPLAN PPoPP 2008
- Baskaran, M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

3
- 10444289646
- Code generation in the polyhedral model is easier than you think
- C. Bastoul. Code generation in the polyhedral model is easier than you think. In PACT'04, pages 7-16, 2004.
- (2004) PACT'04 , pp. 7-16
- Bastoul, C.¹

4
- 57349145904
- Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model
- U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In International Conference on Compiler Construction (ETAPS CC), Apr. 2008.
- (2008) International Conference on Compiler Construction (ETAPS CC)
- Bondhugula, U.¹ Baskaran, M.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

5
- 57349139452
- A practical automatic polyhedral parallelizer and locality optimizer
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. In ACM SIGPLAN Programming Languages Design and Implementation (PLDI'08), 2008.
- (2008) ACM SIGPLAN Programming Languages Design and Implementation (PLDI'08)
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

6
- 10644248153
- Brook for GPUs: Stream computing on graphics hardware
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. In SIGGRAPH'04, pages 777-786, 2004.
- (2004) SIGGRAPH'04 , pp. 777-786
- Buck, I.¹ Foley, T.² Horn, D.³ Sugerman, J.⁴ Fatahalian, K.⁵ Houston, M.⁶ Hanrahan, P.⁷

7
- 84910075371
- CLooG: The Chunky Loop Generator. http://www.cloog.org.
- The Chunky Loop Generator

8
- 78651269052
- Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
- K. Fatahalian, J. Sugerman, and P. Hanrahan. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication. In ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 133-137, 2004.
- (2004) ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware , pp. 133-137
- Fatahalian, K.¹ Sugerman, J.² Hanrahan, P.³

9
- 0026109335
- Dataflow analysis of array and scalar references
- P. Feautrier. Dataflow analysis of array and scalar references. IJPP, 20(1):23-53, 1991.
- (1991) IJPP , vol.20 , Issue.1 , pp. 23-53
- Feautrier, P.¹

10
- 0026933251
- Some efficient solutions to the affine scheduling problem, part I: One-dimensional time
- P. Feautrier. Some efficient solutions to the affine scheduling problem, part I: one-dimensional time. IJPP, 21(5):313-348, 1992.
- (1992) IJPP , vol.21 , Issue.5 , pp. 313-348
- Feautrier, P.¹

11
- 0001448065
- Some efficient solutions to the affine scheduling problem, part II: Multidimensional time
- P. Feautrier. Some efficient solutions to the affine scheduling problem, part II: multidimensional time. IJPP, 21(6):389-420, 1992.
- (1992) IJPP , vol.21 , Issue.6 , pp. 389-420
- Feautrier, P.¹

12
- 34548292052
- A memory model for scientific algorithms on graphics processors
- N. K. Govindaraju, S. Larsen, J. Gray, and D. Manocha. A memory model for scientific algorithms on graphics processors. In SC'06, 2006.
- (2006) SC'06
- Govindaraju, N.K.¹ Larsen, S.² Gray, J.³ Manocha, D.⁴

13
- 57349162527
- General-Purpose Computation Using Graphics Hardware. http://www.gpgpu.org/.
- General-Purpose Computation Using Graphics Hardware

14
- 57349100116
- Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, Habilitation Thesis
- M. Griebl. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004. Habilitation Thesis.
- (2004)
- Griebl, M.¹

15
- 57349101237
- Data and computation transformations for Brook streaming applications on multiprocessors
- S.-W. Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and computation transformations for Brook streaming applications on multiprocessors. In CGO'06, pages 196-207, 2006.
- (2006) CGO'06 , pp. 196-207
- Liao, S.-W.¹ Du, Z.² Wu, G.³ Lueh, G.-Y.⁴

16
- 4243731804
- Improving Parallelism And Data Locality With Affine Partitioning
- A. Lim. Improving Parallelism And Data Locality With Affine Partitioning. PhD thesis, Stanford University, Aug. 2001.
- (2001) PhD thesis, Stanford University
- Lim, A.¹

17
- 0030645995
- Maximizing parallelism and minimizing synchronization with affine transforms
- A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In POPL, pages 201-214, 1997.
- (1997) POPL , pp. 201-214
- Lim, A.W.¹ Lam, M.S.²

18
- 57349189733
- NVIDIA CUDA
- NVIDIA CUDA. http://developer.nvidia.com/object/cuda.html.

19
- 57349128633
- NVIDIA GeForce 8800. http://www.nvidia.com/page/geforce-8800.html.

20
- 84877715579
- PLuTo: A polyhedral automatic parallelizer and locality optimizer for multicores. http://pluto-compiler.sourceforge.net.
- PLuTo: A polyhedral automatic parallelizer and locality optimizer for multicores

21
- 34547683700
- Iterative optimization in the polyhedral model: Part I, one-dimensional time
- L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In CGO'07, pages 144-156, 2007.
- (2007) CGO'07 , pp. 144-156
- Pouchet, L.-N.¹ Bastoul, C.² Cohen, A.³ Vasilache, N.⁴

22
- 84976676720
- The Omega test: A fast and practical integer programming algorithm for dependence analysis
- W. Pugh. The Omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, 35(8):102-114, Aug. 1992.
- (1992) Communications of the ACM , vol.35 , Issue.8 , pp. 102-114
- Pugh, W.¹

23
- 0034299275
- Generation of efficient nested loops from polyhedra
- F. Quilleré, S. V. Rajopadhye, and D. Wilde. Generation of efficient nested loops from polyhedra. IJPP, 28(5):469-498, 2000.
- (2000) IJPP , vol.28 , Issue.5 , pp. 469-498
- Quilleré, F.¹ Rajopadhye, S.V.² Wilde, D.³

24
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo, C. Rodrigues, S. Baghsorkhi, S. Stone, D. Kirk, and W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In ACM SIGPLAN PPoPP 2008, Feb. 2008.
- (2008) ACM SIGPLAN PPoPP 2008
- Ryoo, S.¹ Rodrigues, C.² Baghsorkhi, S.³ Stone, S.⁴ Kirk, D.⁵ Hwu, W.⁶

25
- 51449106975
- Program optimization study on a 128-core GPU
- S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, and W. Hwu. Program optimization study on a 128-core GPU. In The First Workshop on General Purpose Processing on Graphics Processing Units, October 2007.
- (2007) The First Workshop on General Purpose Processing on Graphics Processing Units
- Ryoo, S.¹ Rodrigues, C.² Stone, S.³ Baghsorkhi, S.⁴ Ueng, S.⁵ Hwu, W.⁶

26
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, J. Stratton, and W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO, 2008.

27
- 33947595619
- Accelerator: Using data parallelism to program GPUs for general-purpose uses
- D. Tarditi, S. Puri, and J. Oglesby. Accelerator: using data parallelism to program GPUs for general-purpose uses. In ASPLOS-XII, pages 325-335, 2006.
- (2006) ASPLOS-XII , pp. 325-335
- Tarditi, D.¹ Puri, S.² Oglesby, J.³

28
- 33745804733
- Polyhedral code generation in the real world
- N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In International Conference on Compiler Construction (ETAPS CC'06), pages 185-201, Mar. 2006.
- (2006) International Conference on Compiler Construction (ETAPS CC'06) , pp. 185-201
- Vasilache, N.¹ Bastoul, C.² Cohen, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.