



Volume 27, Issue 1, 2012, Pages 57-74

A hybrid circular queue method for iterative stencil computations on GPUs

Author keywords

Circular queue; GPU; Occupancy; Register; Stencil computation

Indexed keywords

CIRCULAR QUEUE; GPU; OCCUPANCY; REGISTER; STENCIL COMPUTATIONS;

EID: 84861635761     PISSN: 10009000     EISSN: None     Source Type: Journal    
DOI: 10.1007/s11390-012-1206-3     Document Type: Article
Times cited : (9)

References (37)
  • 1
    • 1542392248
    • Achieving scalable locality with time skewing
    • Wonnacott D. Achieving scalable locality with time skewing. Int. J. Parallel Program, 2002, 30(3): 181-221.
    • (2002) Int. J. Parallel Program , vol.30 , Issue.3 , pp. 181-221
    • Wonnacott, D.1
  • 2
    • 34547503691
    • Time skewing: A value-based approach to optimizing for memory locality
    • Department of Computer Science, Rutgers University
    • McCalpin J, Wonnacott D. Time skewing: A value-based approach to optimizing for memory locality. Technical Report DCS-TR-379, Department of Computer Science, Rutgers University. 1999.
    • (1999) Technical Report DCS-TR-379
    • McCalpin, J.1    Wonnacott, D.2
  • 3
    • 77954709215
    • Cache oblivious parallelograms in iterative stencil computations
    • Tsukuba, Japan, Jun. 1-4
    • Strzodka R, Shaheen M, Pajak D et al. Cache oblivious parallelograms in iterative stencil computations. In Proc. The 24th ACM Int. Conf. Supercomputing, Tsukuba, Japan, Jun. 1-4, 2010, pp.49-59.
    • (2010) Proc. The 24th ACM Int. Conf. Supercomputing , pp. 49-59
    • Strzodka, R.1    Shaheen, M.2    Pajak, D.3
  • 5
  • 6
    • 70350771127
    • Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
    • Austin, USA, Nov. 15-21, Article 4.
    • Datta K, Murphy M, Volkov V et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proc. ACM/IEEE Conference on Supercomputing, Austin, USA, Nov. 15-21, 2008, Article 4.
    • (2008) Proc. ACM/IEEE Conference on Supercomputing
    • Datta, K.1    Murphy, M.2    Volkov, V.3
  • 8
    • 70449723385
    • Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
    • Yorktown Heights, USA, Jun. 8-12
    • Meng J, Skadron K. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In Proc. The 23rd International Conference on Supercomputing, Yorktown Heights, USA, Jun. 8-12, 2009, pp.256-265.
    • (2009) Proc. The 23rd International Conference on Supercomputing , pp. 256-265
    • Meng, J.1    Skadron, K.2
  • 9
    • 79953817719
    • NVIDIA. NVIDIA CUDA programming guide 3.0, http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide.pdf, 2010.
    • (2010) NVIDIA CUDA Programming Guide 3.0
  • 11
    • 70450231944
    • An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
    • Austin, USA, Jun. 20-24
    • Hong S, Kim H. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In Proc. The 36th Annual Int. Symp. Computer Architecture, Austin, USA, Jun. 20-24, 2009, pp.152-163.
    • (2009) Proc. The 36th Annual Int. Symp. Computer Architecture , pp. 152-163
    • Hong, S.1    Kim, H.2
  • 13
    • 84861583048
    • Sept.
    • van der Laan W J. Decuda. http://wiki.github.com/laanwj/decuda/, Sept., 2010.
    • (2010) Decuda
    • Van Der Laan, W.J.1
  • 15
  • 17
    • 77954713684
    • An empirically tuned 2D and 3D FFT library on CUDA GPU
    • Tsukuba, Japan, Jun. 1-4
    • Gu L, Li X, Siegel J. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proc. The 24th ACM Int. Conf. Supercomputing, Tsukuba, Japan, Jun. 1-4, 2010, pp.305-314.
    • (2010) Proc. The 24th ACM Int. Conf. Supercomputing , pp. 305-314
    • Gu, L.1    Li, X.2    Siegel, J.3
  • 18
    • 77949629485
    • Optimal data distribution for versatile finite impulse response filtering on next-generation graphics hardware using CUDA
    • Shenzhen, China, Dec. 9-11
    • Goorts P, Rogmans S, Bekaert P. Optimal data distribution for versatile finite impulse response filtering on next-generation graphics hardware using CUDA. In Proc. The 15th International Conference on Parallel and Distributed Systems, Shenzhen, China, Dec. 9-11, 2009, pp.300-307.
    • (2009) Proc. The 15th International Conference on Parallel and Distributed Systems , pp. 300-307
    • Goorts, P.1    Rogmans, S.2    Bekaert, P.3
  • 20
  • 23
    • 78649538491
    • Toward harnessing DOACROSS parallelism for multi-GPGPUs
    • San Diego, USA, Sep. 13-16
    • Di P, Wan Q, Zhang X et al. Toward harnessing DOACROSS parallelism for multi-GPGPUs. In Proc. The 39th Int. Conf. Parallel Processing, San Diego, USA, Sep. 13-16, 2010, pp.40-50.
    • (2010) Proc. The 39th Int. Conf. Parallel Processing , pp. 40-50
    • Di, P.1    Wan, Q.2    Zhang, X.3
  • 24
    • 80053238973
    • Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures
    • Anchorage, USA, May 16-20
    • Christen M, Schenk O, Burkhart H. Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In Proc. IEEE International Parallel & Distributed Processing Symposium, Anchorage, USA, May 16-20, 2011, pp.676-687.
    • (2011) Proc. IEEE International Parallel & Distributed Processing Symposium , pp. 676-687
    • Christen, M.1    Schenk, O.2    Burkhart, H.3
  • 26
    • 34547401051
    • Profitable loop fusion and tiling using model-driven empirical search
    • DOI 10.1145/1183401.1183437, Proceedings of the 20th Annual International Conference on Supercomputing, ICS 2006
    • Qasem A, Kennedy K. Profitable loop fusion and tiling using model-driven empirical search. In Proc. The 20th Annual International Conference on Supercomputing, Cairns, Australia, Jun. 28-Jul. 1, 2006, pp.249-258. (Pubitemid 47168511)
    • (2006) Proceedings of the International Conference on Supercomputing , pp. 249-258
    • Qasem, A.1    Kennedy, K.2
  • 27
    • 0442295621
    • The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles
    • Knijnenburg P M W, Kisuki T, Gallivan K et al. The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles. Concurrency and Computation: Practice & Experience, 2004, 16(2-3): 247-270.
    • (2004) Concurrency and Computation: Practice & Experience , vol.16 , Issue.2-3 , pp. 247-270
    • Knijnenburg, P.M.W.1    Kisuki, T.2    Gallivan, K.3
  • 29
    • 33646828918
    • Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
    • 1402081, Proceedings of the 2005 International Symposium on Code Generation and Optimization, CGO 2005
    • Chen C, Chame J, Hall M. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In Proc. Int. Symp. Code Generation and Optimization, San Jose, USA, Mar. 20-23, 2005, pp.111-122. (Pubitemid 43773797)
    • (2005) Proceedings of the 2005 International Symposium on Code Generation and Optimization, CGO 2005 , vol.2005 , pp. 111-122
    • Chen, C.1    Chame, J.2    Hall, M.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.