SCOPUS Information Search Platform

ACM SIGPLAN Notices

Volume 45, Issue 6, 2010, Pages 86-97

A GPGPU compiler for memory optimization and parallelism management

(4) Yang, Yi a; Xiang, Ping b; Kong, Jingfei b; Zhou, Huiyang a

a North Carolina State University (United States)

b University of Central Florida (United States)

Author keywords

Compiler; GPGPU

Indexed keywords

ALGORITHM REFINEMENT; COMPILER; DATA REUSE; GENERAL PURPOSE; GPGPU; GRAPHICS PROCESSING UNIT; KERNEL FUNCTION; MEDIA-PROCESSING ALGORITHMS; MEMORY ACCESS PATTERNS; MEMORY BANDWIDTHS; MEMORY HIERARCHY; MEMORY OPTIMIZATION; OPTIMIZATION PROCESS; OPTIMIZING COMPILERS; PERFORMANCE ANALYSIS; PERFORMANCE OPTIMIZATIONS; REMAPPING; UNDERSTANDABILITY; VECTORIZATION;

FLOCCULATION; PROGRAM COMPILERS;

OPTIMIZATION;

EID: 77957600490 PISSN: 15232867 EISSN: None Source Type: Journal
DOI: 10.1145/1809028.1806606 Document Type: Conference Paper

Times cited: (54)

References (20)

1
- 0004072686
- Pearson Education
- A.V. Aho, Ravi Sethi, and J.D. Ullman. Compilers, Principles, Techniques, & Tools, Pearson Education, 2007.
- (2007) Compilers, Principles Techniques & Tools
- Aho, A.V.¹ Sethi, R.² Ullman, J.D.³

2
- 57349180412
- A compiler framework for optimization of affine loop nests for GPGPUs
- M.M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs. In Proc. International Conference on Supercomputing, 2008.
- (2008) Proc. International Conference on Supercomputing
- Baskaran, M.M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

3
- 79959456077
- Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
- M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008.
- (2008) Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Baskaran, M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

4
- 84968470212
- An algorithm for the machine calculation of complex Fourier series
- J. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. In Math. Comput., 1965.
- (1965) Math. Comput
- Cooley, J.¹ Tukey, J.W.²

5
- 51049101693
- Fast matrix-vector multiplication on GeForce 8800 GTX
- N. Fujimoto. Fast Matrix-Vector Multiplication on GeForce 8800 GTX. In Proc. IEEE International Parallel & Distributed Processing Symposium, 2008
- (2008) Proc. IEEE International Parallel & Distributed Processing Symposium
- Fujimoto, N.¹

6
- 60849099135
- High performance discrete Fourier transforms on graphics processors
- N. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli. High performance discrete Fourier transforms on graphics processors. In Proc. Supercomputing, 2008.
- (2008) Proc. Supercomputing
- Govindaraju, N.¹ Lloyd, B.² Dotsenko, Y.³ Smith, B.⁴ Manferdelli, J.⁵

7
- 70450231944
- An analytical model for GPU architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim. An analytical model for GPU architecture with memory-level and thread-level parallelism awareness. In Proc. International Symposium on Computer Architecture, 2009.
- (2009) Proc. International Symposium on Computer Architecture
- Hong, S.¹ Kim, H.²

8
- 26444437628
- Cetus - An extensible compiler infrastructure for source-to-source transformation
- S.-I. Lee, T. Johnson, and R. Eigenmann. Cetus - An extensible compiler infrastructure for source-to-source transformation. In Proc. Workshops on Languages and Compilers for Parallel Computing, 2003
- (2003) Proc. Workshops on Languages and Compilers for Parallel Computing
- Lee, S.-I.¹ Johnson, T.² Eigenmann, R.³

9
- 67650081010
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- S. Lee, S.-J. Min, and R. Eigenmann. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
- (2009) Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

10
- 70450103746
- A cross-input adaptive framework for GPU programs optimization
- Y. Liu, E.Z. Zhang, and X. Shen. A Cross-Input Adaptive Framework for GPU Programs Optimization. In Proc. IEEE International Parallel & Distributed Processing Symposium, 2009.
- (2009) Proc. IEEE International Parallel & Distributed Processing Symposium
- Liu, Y.¹ Zhang, E.Z.² Shen, X.³

11
- 34547683700
- Iterative optimization in the polyhedral model: Part I, one-dimensional time
- L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In Proc. International Symposium on Code Generation and Optimization, 2007.
- (2007) Proc. International Symposium on Code Generation and Optimization
- Pouchet, L.-N.¹ Bastoul, C.² Cohen, A.³ Vasilache, N.⁴

12
- 77952265152
- Optimize matrix transpose in CUDA
- G. Ruetsch and P. Micikevicius. Optimize matrix transpose in CUDA. NVIDIA, 2009.
- (2009) NVIDIA
- Ruetsch, G.¹ Micikevicius, P.²

13
- 43449094719
- Optimization space pruning for a multithreaded GPU
- S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S. Ueng, J.A. Stratton, and W.W. Hwu. Optimization space pruning for a multithreaded GPU. In Proc. International Symposium on Code Generation and Optimization, 2008.
- (2008) Proc. International Symposium on Code Generation and Optimization
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.⁵ Stratton, J.A.⁶ Hwu, W.W.⁷

14
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo, C.I. Rodrigues, S.S. Baghsorkhi, S.S. Stone, D.B. Kirk, and W.W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008.
- (2008) Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

15
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- S.S. Baghsorkhi, M. Delahaye, S.J. Patel, W.D. Gropp, and W.W. Hwu. An adaptive performance modeling tool for GPU architectures. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010.
- (2010) Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.W.⁵

16
- 53749087821
- MCUDA: An efficient implementation of CUDA kernels on multicores
- UIUC, Feb.
- J.A. Stratton, S.S. Stone, and W.W. Hwu. MCUDA: An efficient implementation of CUDA kernels on multicores. IMPACT Technical Report IMPACT-08-01, UIUC, Feb. 2008.
- (2008) IMPACT Technical Report IMPACT-08-01
- Stratton, J.A.¹ Stone, S.S.² Hwu, W.W.³

17
- 77952597755
- CUDA-lite: Reducing GPU programming complexity
- S. Ueng, M. Lathara, S.S. Baghsorkhi, and W.W. Hwu. CUDA-lite: Reducing GPU programming complexity. In Proc. Workshops on Languages and Compilers for Parallel Computing, 2008.
- (2008) Proc. Workshops on Languages and Compilers for Parallel Computing
- Ueng, S.¹ Lathara, M.² Baghsorkhi, S.S.³ Hwu, W.W.⁴

18
- 67349149521
- Benchmarking GPUs to tune dense linear algebra
- V. Volkov and J.W. Demmel. Benchmarking GPUs to tune dense linear algebra. In Proc. Supercomputing, 2008.
- (2008) Proc. Supercomputing
- Volkov, V.¹ Demmel, J.W.²

19
- 77957557473
- NVIDIA CUDA Programming Guide, Version 2.1
- NVIDIA CUDA Programming Guide, Version 2.1, 2008
- (2008)

20
- 77957551580
- http://code.google.com/p/gpgpucompiler/

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.