



2011, Pages 676-687

PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures

Author keywords

autotuning; code generation; high performance computing; stencil computations

Indexed keywords

AUTOTUNING; CODE GENERATION; COMPLEX HARDWARE; GRAPHICS PROCESSING UNIT; HARDWARE ARCHITECTURE; HIGH PERFORMANCE COMPUTING; IMAGE PROCESSING APPLICATIONS; MANY CORE; MICRO ARCHITECTURES; MULTI CORE; MULTIGRID METHODS; OPTIMIZATION STRATEGY; PARALLEL IMPLEMENTATIONS; PARALLELIZATIONS; PDE SOLVERS; SCIENTIFIC COMPUTING APPLICATIONS; STENCIL COMPUTATIONS; TIME-TO-SOLUTION

EID: 80053238973     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2011.70     Document Type: Conference Paper
Times cited: 273

References (31)
  • 4
    • 24344485098
    • OSKI: A library of automatically tuned sparse matrix kernels
    • R. Vuduc, J. W. Demmel, and K. A. Yelick, "OSKI: A library of automatically tuned sparse matrix kernels," Journal of Physics: Conference Series, vol. 16, no. 1, p. 521, 2005.
    • (2005) Journal of Physics: Conference Series , vol.16 , Issue.1 , pp. 521
    • Vuduc, R.1    Demmel, J.W.2    Yelick, K.A.3
  • 5
    • 20744449792
    • The Design and Implementation of FFTW3
    • M. Frigo and S. G. Johnson, "The Design and Implementation of FFTW3," Proceedings of the IEEE, vol. 93, no. 2, pp. 216-231, 2005, special issue on "Program Generation, Optimization, and Platform Adaptation".
    • (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 216-231
    • Frigo, M.1    Johnson, S.G.2
  • 8
    • 77954412565
    • Loop Transformation Recipes for Code Generation and Auto-Tuning
    • M. Hall, J. Chame, C. Chen, J. Shin, G. Rudy, and M. Khan, "Loop Transformation Recipes for Code Generation and Auto-Tuning," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., vol. 5898. Springer Berlin / Heidelberg, 2010, pp. 50-64.
    • (2010) Lecture Notes in Computer Science , vol.5898 , pp. 50-64
    • Hall, M.1    Chame, J.2    Chen, C.3    Shin, J.4    Rudy, G.5    Khan, M.6
  • 9
    • 24644456455
    • Automatic tiling of iterative stencil loops
    • DOI 10.1145/1034774.1034777
    • Z. Li and Y. Song, "Automatic tiling of iterative stencil loops," ACM Trans. Program. Lang. Syst., vol. 26, no. 6, pp. 975-1028, 2004. (Pubitemid 41270296)
    • (2004) ACM Transactions on Programming Languages and Systems , vol.26 , Issue.6 , pp. 975-1028
    • Li, Z.1    Song, Y.2
  • 10
    • 35448985754
    • Parameterized Tiled Loops for Free
    • L. Renganarayanan, D. Kim, S. Rajopadhye, and M. M. Strout, "Parameterized Tiled Loops for Free," SIGPLAN Not., vol. 42, pp. 405-414, June 2007. [Online]. Available: http://doi.acm.org/10.1145/1273442.1250780
    • (2007) SIGPLAN Not. , vol.42 , pp. 405-414
    • Renganarayanan, L.1    Kim, D.2    Rajopadhye, S.3    Strout, M.M.4
  • 15
    • 79551491518
    • A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
    • J. Meng and K. Skadron, "A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations," International Journal of Parallel Programming, vol. 39, pp. 115-142, 2011, 10.1007/s10766-010-0142-5. [Online]. Available: http://dx.doi.org/10.1007/s10766-010-0142-5
    • (2011) International Journal of Parallel Programming , vol.39 , pp. 115-142
    • Meng, J.1    Skadron, K.2
  • 16
    • 70449657442
    • Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
    • G. Wellein, G. Hager, T. Zeiser, M. Wittmann, and H. Fehske, "Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization," in COMPSAC (1), 2009, pp. 579-586.
    • (2009) COMPSAC (1) , pp. 579-586
    • Wellein, G.1    Hager, G.2    Zeiser, T.3    Wittmann, M.4    Fehske, H.5
  • 21
    • 84888360034
    • Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm
    • H. H. Pennes, "Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm," J Appl Physiol, vol. 1, no. 2, pp. 93-122, 1948.
    • (1948) J Appl Physiol , vol.1 , Issue.2 , pp. 93-122
    • Pennes, H.H.1
  • 23
    • 70449997300
    • Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
    • K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors," SIAM Review, 2008, to appear.
    • (2008) SIAM Review
    • Datta, K.1    Kamil, S.2    Williams, S.3    Oliker, L.4    Shalf, J.5    Yelick, K.6
  • 24
    • 70349100958
    • Khronos OpenCL Working Group, The OpenCL Specification, 8 December 2008.
    • (2008) The OpenCL Specification
  • 29
    • 0000238336
    • A simplex method for function minimization
    • J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer Journal, vol. 7, pp. 308-313, 1965.
    • (1965) Computer Journal , vol.7 , pp. 308-313
    • Nelder, J.A.1    Mead, R.2
  • 31
    • 77951200277
    • Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor
    • P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes, "Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor," IEEE Micro, vol. 30, pp. 16-29, 2010.
    • (2010) IEEE Micro , vol.30 , pp. 16-29
    • Conway, P.1    Kalyanasundharam, N.2    Donley, G.3    Lepak, K.4    Hughes, B.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.