Volume 68, Issue 10, 2008, Pages 1389-1401

Program optimization carving for GPU computing

Author keywords

GPU computing; Optimization space exploration; Parallel computing

Indexed keywords

APPLICATIONS; COMPUTER NETWORKS;

EID: 51449112813     PISSN: 07437315     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.jpdc.2008.05.011     Document Type: Article
Times cited: 104

References (34)
  • 1
    • 84886006847
    • F. Agakov, et al. Using machine learning to focus iterative optimization, in: Proc. 4th Annual International Symposium on Code Generation and Optimization, 2006, pp. 295-305
  • 2
    • 84987205012
    • F.E. Allen, M. Burke, P. Charles, R. Cytron, J. Ferrante, An overview of the PTRAN analysis system for multiprocessing, in: Proc. 1st International Conference on Supercomputing, 1987, pp. 194-211
  • 3
    • 85033444523
    • J. Allen, K. Kennedy, Automatic loop interchange, in: Proc. ACM SIGPLAN '84 Symposium on Compiler Construction, 1984, pp. 233-246
  • 4
    • 0001160585
    • J.R. Allen, K. Kennedy, PFC: A program to convert Fortran to parallel form, in: K. Hwang (Ed.), Supercomputers: Design and Applications, IEEE Computer Society Press, Los Alamitos, CA, 1984, pp. 186-203
  • 5
    • 79959456077
    • M.M. Baskaran, et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, in: Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008, pp. 1-10
  • 6
    • 51449109831
    • I. Buck, Brook Specification v0.2, October 2003
  • 7
    • 0025447908
    • D. Callahan, S. Carr, K. Kennedy, Improving register allocation for subscripted variables, in: Proc. SIGPLAN 1990 Conference on Programming Language Design and Implementation, 1990, pp. 53-65
  • 8
    • 4644226058
    • Y. Chou, B. Fahs, S. Abraham, Microarchitecture optimizations for exploiting memory-level parallelism, in: Proc. 31st Annual International Symposium on Computer Architecture, 2004, pp. 76-88
  • 9
    • 78651269052
    • K. Fatahalian, J. Sugerman, P. Hanrahan, Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, in: Proc. 2004 ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2004, pp. 133-137
  • 10
    • 0347468637
    • S. Ghosh, M. Martonosi, S. Malik, Precise miss analysis for program transformations with caches of arbitrary associativity, in: Proc. 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998, pp. 228-239
  • 11
    • 34548292052
    • N.K. Govindaraju, S. Larsen, J. Gray, D. Manocha, A memory model for scientific algorithms on graphics processors, in: Proc. 2006 ACM/IEEE Conference on Supercomputing, No. 89, 2006, pp. 89-99
  • 13
    • 33746686278
    • M. Haneda, P.M.W. Knijnenburg, H.A.G. Wijshoff, Automatic selection of compiler options using non-parametric inferential statistics, in: Proc. 14th International Conference on Parallel Architectures and Compilation Techniques, 2005, pp. 123-132
  • 14
    • 33746706203
    • C. Jiang, M. Snir, Automatic tuning matrix multiplication performance on graphics hardware, in: Proc. 14th International Conference on Parallel Architecture and Compilation Techniques, 2005, pp. 185-196
  • 15
    • 36949033619
    • D. Jimenez-Gonzalez, X. Martorell, A. Ramirez, Performance analysis of Cell Broadband Engine for high memory bandwidth applications, in: Proc. IEEE International Symposium on Performance Analysis of Systems and Software, 2007, pp. 210-219
  • 17
    • 0034512401
    • T. Kisuki, P.M.W. Knijnenburg, M.F.P. O'Boyle, Combined selection of tile sizes and unroll factors using iterative compilation, in: Proc. 2000 International Conference on Parallel Architectures and Compilation Techniques, 2000, pp. 237-248
  • 18
    • 0021570530
    • D.J. Kuck, et al. The effects of program restructuring, algorithm change, and architecture choice on program performance, in: Proc. 13th International Conference on Parallel Processing, 1984, pp. 129-138
  • 19
    • 34547684388
    • P.A. Kulkarni, D.B. Whalley, G.S. Tyson, J.W. Davidson, Evaluating heuristic optimization phase order search algorithms, in: Proc. 2007 International Symposium on Code Generation and Optimization, 2007, pp. 157-169
  • 20
    • 51449099053
    • J. Nickolls, I. Buck, NVIDIA CUDA software and GPU parallel computing architecture, Microprocessor Forum, May 2007
  • 22
    • 1542396679
    • M. Püschel, et al. SPIRAL: A generator for platform-adapted libraries of signal processing algorithms, Journal of High Performance Computing and Applications 18 (1) (2004) 21-45 (special issue on Automatic Performance Tuning)
  • 24
    • 51449122095
    • S. Ryoo, Program optimization strategies for many-core processors, Ph.D. thesis, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 2008
  • 25
    • 79959466764
    • S. Ryoo, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, in: Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008, pp. 73-82
  • 26
    • 51449105693
    • J.W. Sias, A systematic approach to delivering instruction-level parallelism in EPIC systems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 2005
  • 27
    • 33846475612
    • D. Tarditi, S. Puri, J. Oglesby, Accelerator: Using data parallelism to program GPUs for general-purpose uses, in: Proc. 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006, pp. 325-335
  • 28
    • 67650534864
    • S. Triantafyllis, M. Vachharajani, N. Vachharajani, D.I. August, Compiler optimization-space exploration, in: Proc. 2003 International Symposium on Code Generation and Optimization, 2003, pp. 204-215
  • 29
    • 85006879958
    • D.N. Truong, F. Bodin, A. Seznec, Improving cache behavior of dynamically allocated data structures, in: Proc. Seventh International Conference on Parallel Architectures and Compilation Techniques, 1998, pp. 322+
  • 30
    • 0030379246
    • M.E. Wolf, D.E. Maydan, D.-K. Chen, Combining loop transformations considering caches and scheduling, in: Proc. 29th Annual ACM/IEEE International Symposium on Microarchitecture, 1996, pp. 274-286
  • 31
    • 51449085846
    • M. Wolfe, Iteration space tiling for memory hierarchies, in: Proc. Third SIAM Conference on Parallel Processing for Scientific Computing, 1987, pp. 357-361
  • 32
    • 0006424869
    • Y. Yamada, J. Gyllenhaal, G. Haab, W.W. Hwu, Data relocation and prefetching for large data sets, in: Proc. 27th Annual ACM/IEEE International Symposium on Microarchitecture, 1994, pp. 118-127
  • 33
    • 1442337777
    • M. Zhao, B. Childers, M.L. Soffa, Predicting the impact of optimizations for embedded systems, in: Proc. 2003 Conference on Languages, Compilers, and Tools for Embedded Systems, 2003, pp. 1-11


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.