



Volume 39, Issue 1, 2011, Pages 115-142

A performance study for iterative stencil loops on GPUs with ghost zone optimizations

Author keywords

Ghost zone; GPU; Halo; Iterative stencil loops; Performance model; Tiling

Indexed keywords

GHOST ZONE; GPU; HALO; ITERATIVE STENCIL LOOPS; PERFORMANCE MODEL; TILING;

EID: 79551491518     PISSN: 08857458     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10766-010-0142-5     Document Type: Article
Times cited: 45

References (37)
  • 1
    • EID: 30344463648
    • Allen, G., Dramlitsch, T., Foster, I., Karonis, N.T., Ripeanu, M., Seidel, E., Toonen, B.: Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus. In: SC '01, p. 52 (2001)
  • 3
    • EID: 84976690230
    • Bromley, M., Heller, S., McNerney, T., Steele Jr., G.L.: Fortran at ten gigaflops: the Connection Machine convolution compiler. In: PLDI '91, vol. 26, no. 6, pp. 145-156 (1991)
    • DOI: 10.1145/113445.113458
  • 4
    • EID: 0027763868
    • Chatterjee, S., Gilbert, J.R., Schreiber, R.: Mobile and replicated alignment of arrays in data-parallel programs. In: SC '93, pp. 420-429 (November 1993)
  • 6
    • EID: 79551499286
    • NVIDIA Corporation: GeForce GTX 280 specifications (2008)
  • 7
    • EID: 79551493358
    • NVIDIA Corporation: NVIDIA CUDA Visual Profiler (June 2008)
  • 10
    • EID: 0034818853
    • Deitz, S.J., Chamberlain, B.L., Snyder, L.: Eliminating redundancies in sum-of-product array computations. In: ICS '01, pp. 65-77 (2001)
  • 11
    • EID: 0003338620
    • Evans, L.C.: Partial Differential Equations. American Mathematical Society (1998)
  • 12
    • EID: 84964699790
    • Chen, L., Zhang, Z.-Q., Feng, X.-B.: Redundant computation partition on distributed-memory systems. In: ICA3PP '02, p. 252 (2002)
  • 13
    • EID: 32844463802
    • Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: ICS '05, pp. 361-366 (2005)
  • 15
    • EID: 34247376580
    • Gschwind, M.: Chip multiprocessing and the Cell Broadband Engine. In: CF '06 (2006)
  • 16
    • EID: 38249000489
    • Huang, C.-H., Sadayappan, P.: Communication-free hyperplane partitioning of nested loops. J. Parallel Distrib. Comput., vol. 19, no. 2, pp. 90-102 (1993)
    • DOI: 10.1006/jpdc.1993.1094
  • 20
    • EID: 84958661690
    • Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: MSP '05, pp. 36-43 (2005)
  • 21
    • EID: 0010828753
    • Kowarschik, M., Weiß, C., Karl, W., Rüde, U.: Cache-aware multigrid methods for solving Poisson's equation in two dimensions. Computing, vol. 64, no. 4, pp. 381-399 (2000)
    • DOI: 10.1007/s006070070032
  • 23
    • EID: 0029490313
    • Lee, P.: Techniques for compiling programs on distributed memory multicomputers. Parallel Comput., vol. 21, pp. 1895-1923 (1995)
    • DOI: 10.1016/0167-8191(95)00052-6
  • 24
    • EID: 24644456455
    • Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM Trans. Program. Lang. Syst., vol. 26, no. 6, pp. 975-1028 (2004)
    • DOI: 10.1145/1034774.1034777
  • 26
    • EID: 70449723385
    • Meng, J., Skadron, K.: Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In: ICS '09, pp. 256-265 (2009)
  • 27
    • EID: 78651550268
    • Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. Queue, vol. 6, no. 2, pp. 40-53 (2008)
    • DOI: 10.1145/1365490.1365500
  • 28
    • EID: 34248593308
    • Premnath, K.N., Abraham, J.: Three-dimensional multi-relaxation time (MRT) lattice-Boltzmann models for multiphase flow. J. Comput. Phys., vol. 224, no. 2, pp. 539-559 (2007)
    • DOI: 10.1016/j.jcp.2006.10.023
  • 31
    • EID: 70350786558
    • Renganarayana, L., Rajopadhye, S.: Positivity, posynomials and tile size selection. In: SC '08, pp. 1-12 (2008)
  • 32
    • EID: 70449722959
    • Ripeanu, M., Iamnitchi, A., Foster, I.: Cactus application: performance predictions in a grid environment. In: EuroPar '01 (2001)
  • 33
    • EID: 33845574641
    • Rivera, G., Tseng, C.-W.: Tiling optimizations for 3D scientific computations. In: SC '00, p. 32 (2000)
  • 35
    • EID: 84863436006
    • Wonnacott, D.: Time skewing for parallel computers. In: WLCPC '99, pp. 477-480 (1999)
  • 36
    • EID: 1542392248
    • Wonnacott, D.: Achieving scalable locality with time skewing. Int. J. Parallel Program., vol. 30, no. 3, pp. 181-221 (2002)
    • DOI: 10.1023/A:1015460304860
  • 37
    • EID: 79951471318
    • Yang, Z., Zhu, Y., Pu, Y.: Parallel image processing based on CUDA. In: Int. Conf. Comput. Sci. Software Eng., vol. 3, pp. 198-201 (2008)
    • DOI: 10.1109/CSSE.2008.1448


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.