



Volume 48, Issue 8, 2013, Pages 57-67

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

Author keywords

Computational complexity; Data transformation; GPGPU; Memory coalescing; Runtime optimizations; Thread data remapping

Indexed keywords

DATA TRANSFORMATION; GPGPU; MEMORY COALESCING; RUNTIME OPTIMIZATION; THREAD-DATA REMAPPING;

EID: 84885201786     PISSN: 15232867     EISSN: None     Source Type: Journal    
DOI: 10.1145/2517327.2442523     Document Type: Conference Paper
Times cited : (39)

References (26)
  • 5
    • 83155184570
    • Dymaxion: Optimizing memory access patterns for heterogeneous systems
    • S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: Optimizing memory access patterns for heterogeneous systems. In SC, 2011.
    • (2011) SC
    • Che, S.1    Sheaffer, J.W.2    Skadron, K.3
  • 6
    • 33746070806
    • Cache-conscious coallocation of hot data streams
    • T. M. Chilimbi and R. Shaham. Cache-conscious coallocation of hot data streams. In PLDI, 2006.
    • (2006) PLDI
    • Chilimbi, T.M.1    Shaham, R.2
  • 8
    • 1642502420
    • Improving effective bandwidth through compiler enhancement of global cache reuse
    • C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1): 108-134, 2004.
    • (2004) Journal of Parallel and Distributed Computing , vol.64 , Issue.1 , pp. 108-134
    • Ding, C.1    Kennedy, K.2
  • 9
    • 47349104432
    • Dynamic warp formation and scheduling for efficient GPU control flow
    • Washington, DC, USA, IEEE Computer Society
    • W. Fung, I. Sham, G. Yuan, and T. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO'07, pages 407-420, Washington, DC, USA, 2007. IEEE Computer Society.
    • (2007) MICRO'07 , pp. 407-420
    • Fung, W.1    Sham, I.2    Yuan, G.3    Aamodt, T.4
  • 13
    • 79959575872
    • An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs
    • X. Huo, V. Ravi, W. Ma, and G. Agrawal. An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs. In ICS, 2011.
    • (2011) ICS
    • Huo, X.1    Ravi, V.2    Ma, W.3    Agrawal, G.4
  • 14
    • 81455141868
    • Enhancing locality for recursive traversals of recursive structures
    • Y. Jo and M. Kulkarni. Enhancing locality for recursive traversals of recursive structures. In OOPSLA, 2011.
    • (2011) OOPSLA
    • Jo, Y.1    Kulkarni, M.2
  • 15
    • 0035029828
    • A compiler technique for improving whole-program locality
    • M. Kandemir. A compiler technique for improving whole-program locality. In POPL, 2001.
    • (2001) POPL
    • Kandemir, M.1
  • 16
    • 84863371431
    • OpenCL as a unified programming model for heterogeneous CPU/GPU clusters
    • J. Kim, S. Seo, J. Lee, J. Nah, G. Jo, and J. Lee. OpenCL as a unified programming model for heterogeneous CPU/GPU clusters. In PPoPP, 2012.
    • (2012) PPoPP
    • Kim, J.1    Seo, S.2    Lee, J.3    Nah, J.4    Jo, G.5    Lee, J.6
  • 18
    • 67650081010
    • OpenMP to GPGPU: A compiler framework for automatic translation and optimization
    • S. Lee, S. Min, and R. Eigenmann. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In PPoPP, 2009.
    • (2009) PPoPP
    • Lee, S.1    Min, S.2    Eigenmann, R.3
  • 19
    • 77954976292
    • Dynamic warp subdivision for integrated branch and memory divergence tolerance
    • J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA, 2010.
    • (2010) ISCA
    • Meng, J.1    Tarjan, D.2    Skadron, K.3
  • 20
    • 79959466764
    • Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
    • S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, pages 73-82, 2008.
    • (2008) PPoPP , pp. 73-82
    • Ryoo, S.1    Rodrigues, C.I.2    Baghsorkhi, S.S.3    Stone, S.S.4    Kirk, D.B.5    Hwu, W.W.6
  • 21
    • 0038039924
    • Compile-time composition of run-time data and iteration reorderings
    • San Diego, CA, June
    • M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In PLDI, San Diego, CA, June 2003.
    • (2003) PLDI
    • Strout, M.M.1    Carter, L.2    Ferrante, J.3
  • 22
    • 74049151553
    • Increasing memory miss tolerance for SIMD cores
    • D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
    • (2009) SC
    • Tarjan, D.1    Meng, J.2    Skadron, K.3
  • 23
    • 84856544146
    • Enhancing data locality for dynamic simulations through asynchronous data transformations and adaptive control
    • B. Wu, E. Zhang, and X. Shen. Enhancing data locality for dynamic simulations through asynchronous data transformations and adaptive control. In PACT, 2011.
    • (2011) PACT
    • Wu, B.1    Zhang, E.2    Shen, X.3
  • 25
    • 77954691442
    • A GPGPU compiler for memory optimization and parallelism management
    • Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
    • (2010) PLDI
    • Yang, Y.1    Xiang, P.2    Kong, J.3    Zhou, H.4
  • 26
    • 79953126288
    • On-the-fly elimination of dynamic irregularities for GPU computing
    • E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, 2011.
    • (2011) ASPLOS
    • Zhang, E.1    Jiang, Y.2    Guo, Z.3    Tian, K.4    Shen, X.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.