SCOPUS Information Search Platform

Journal of Parallel and Distributed Computing

Volume 68, Issue 10, 2008, Pages 1389-1401

Program optimization carving for GPU computing

Authors (7): Ryoo, Shane (a); Rodrigues, Christopher I. (a); Stone, Sam S. (a); Stratton, John A. (a); Ueng, Sain-Zee (a); Baghsorkhi, Sara S. (a); Hwu, Wen-mei W. (a)

(a) University of Illinois at Urbana-Champaign (United States)

Author keywords

GPU computing; Optimization space exploration; Parallel computing

Indexed keywords

APPLICATIONS; COMPUTER NETWORKS;

GPU COMPUTING; OPTIMIZATION SPACE EXPLORATION; PARALLEL COMPUTING; PROGRAM OPTIMIZATION;

GLOBAL OPTIMIZATION;

EID: 51449112813
PISSN: 07437315
EISSN: None
Source Type: Journal
DOI: 10.1016/j.jpdc.2008.05.011
Document Type: Article

Times cited: 104

References (34)

1
- 84886006847
- F. Agakov, et al. Using machine learning to focus iterative optimization, in: Proc. 4th Annual International Symposium on Code Generation and Optimization, 2006, pp. 295-305

2
- 84987205012
- F.E. Allen, M. Burke, P. Charles, R. Cytron, J. Ferrante, An overview of the PTRAN analysis system for multiprocessing, in: Proc. 1st International Conference on Supercomputing, 1987, pp. 194-211

3
- 85033444523
- J. Allen, K. Kennedy, Automatic loop interchange, in: Proc. ACM SIGPLAN '84 Symposium on Compiler Construction, 1984, pp. 233-246

4
- 0001160585
- J.R. Allen, K. Kennedy, PFC: A program to convert Fortran to parallel form, in: K. Hwang (Ed.), Supercomputers: Design and Applications, IEEE Computer Society Press, Los Alamitos, CA, 1984, pp. 186-203

5
- 79959456077
- M.M. Baskaran, et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, in: Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008, pp. 1-10

6
- 51449109831
- I. Buck, Brook Specification v0.2, October 2003

7
- 0025447908
- D. Callahan, S. Carr, K. Kennedy, Improving register allocation for subscripted variables, in: Proc. SIGPLAN 1990 Conference on Program Language Design and Implementation, 1990, pp. 53-65

8
- 4644226058
- Y. Chou, B. Fahs, S. Abraham, Microarchitecture optimizations for exploiting memory-level parallelism, in: Proc. 31st Annual International Symposium on Computer Architecture, 2004, pp. 76-88

9
- 78651269052
- K. Fatahalian, J. Sugerman, P. Hanrahan, Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, in: Proc. 2004 ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2004, pp. 133-137

10
- 0347468637
- S. Ghosh, M. Martonosi, S. Malik, Precise miss analysis for program transformations with caches of arbitrary associativity, in: Proc. 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998, pp. 228-239

11
- 34548292052
- N.K. Govindaraju, S. Larsen, J. Gray, D. Manocha, A memory model for scientific algorithms on graphics processors, in: Proc. 2006 ACM/IEEE Conference on Supercomputing, No. 89, 2006, pp. 89-99

12
- 0013107503
- H. Han, G. Rivera, C.-W. Tseng, Software support for improving locality in scientific codes, in: Workshop on Compilers for Parallel Computers, 2000

13
- 33746686278
- M. Haneda, P.M.W. Knijnenburg, H.A.G. Wijshoff, Automatic selection of compiler options using non-parametric inferential statistics, in: Proc. 14th International Conference on Parallel Architectures and Compilation Techniques, 2005, pp. 123-132

14
- 33746706203
- C. Jiang, M. Snir, Automatic tuning matrix multiplication performance on graphics hardware, in: Proc. 14th International Conference on Parallel Architecture and Compilation Techniques, 2005, pp. 185-196

15
- 36949033619
- D. Jimenez-Gonzalez, X. Martorell, A. Ramirez, Performance analysis of Cell Broadband Engine for high memory bandwidth applications, in: Proc. IEEE International Symposium on Performance Analysis of Systems and Software, 2007, pp. 210-219

16
- 0037952146
- K. Kennedy, R. Allen, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Morgan Kaufmann Publishers, San Francisco, CA, 2002

17
- 0034512401
- T. Kisuki, P.M.W. Knijnenburg, M.F.P. O'Boyle, Combined selection of tile sizes and unroll factors using iterative compilation, in: Proc. 2000 International Conference on Parallel Architectures and Compilation Techniques, 2000, pp. 237-248

18
- 0021570530
- D.J. Kuck, et al. The effects of program restructuring, algorithm change, and architecture choice on program performance, in: Proc. 13th International Conference on Parallel Processing, 1984, pp. 129-138

19
- 34547684388
- P.A. Kulkarni, D.B. Whalley, G.S. Tyson, J.W. Davidson, Evaluating heuristic optimization phase order search algorithms, in: Proc. 2007 International Symposium on Code Generation and Optimization, 2007, pp. 157-169

20
- 51449099053
- J. Nickolls, I. Buck, NVIDIA CUDA software and GPU parallel computing architecture, Microprocessor Forum, May 2007

21
- 27344435504
- D. Pham, et al. The design and implementation of a first-generation CELL processor, in: IEEE International Solid-State Circuits Conference, 2005

22
- 1542396679
- M. Püschel, et al. SPIRAL: A generator for platform-adapted libraries of signal processing algorithms, Journal of High Performance Computing and Applications 18 (1) (2004) 21-45 (special issue on Automatic Performance Tuning)

23
- 51449106975
- S. Ryoo, et al. Program optimization study on a 128-core GPU, in: The First Workshop on General Purpose Processing on Graphics Processing Units, 2007

24
- 51449122095
- S. Ryoo, Program optimization strategies for many-core processors, Ph.D. thesis, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 2008

25
- 79959466764
- S. Ryoo, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, in: Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008, pp. 73-82

26
- 51449105693
- J.W. Sias, A systematic approach to delivering instruction-level parallelism in EPIC systems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 2005

27
- 33846475612
- D. Tarditi, S. Puri, J. Oglesby, Accelerator: Using data parallelism to program GPUs for general-purpose uses, in: Proc. 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006, pp. 325-335

28
- 67650534864
- S. Triantafyllis, M. Vachharajani, N. Vachharajani, D.I. August, Compiler optimization-space exploration, in: Proc. 2003 International Symposium on Code Generation and Optimization, 2003, pp. 204-215

29
- 85006879958
- D.N. Truong, F. Bodin, A. Seznec, Improving cache behavior of dynamically allocated data structures, in: Proc. Seventh International Conference on Parallel Architectures and Compilation Techniques, 1998, pp. 322+

30
- 0030379246
- M.E. Wolf, D.E. Maydan, D.-K. Chen, Combining loop transformations considering caches and scheduling, in: Proc. 29th Annual ACM/IEEE International Symposium on Microarchitecture, 1996, pp. 274-286

31
- 51449085846
- M. Wolfe, Iteration space tiling for memory hierarchies, in: Proc. Third SIAM Conference on Parallel Processing for Scientific Computing, 1987, pp. 357-361

32
- 0006424869
- Y. Yamada, J. Gyllenhaal, G. Haab, W.W. Hwu, Data relocation and prefetching for large data sets, in: Proc. 27th Annual ACM/IEEE International Symposium on Microarchitecture, 1994, pp. 118-127

33
- 1442337777
- M. Zhao, B. Childers, M.L. Soffa, Predicting the impact of optimizations for embedded systems, in: Proc. 2003 Conference on Languages, Compilers, and Tools for Embedded Systems, 2003, pp. 1-11

34
- 0003488086
- H. Zima, B. Chapman, Supercompilers for Parallel and Vector Computers, Addison-Wesley Publishing Company, Reading, MA, 1991

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.