



Volume , Issue , 2013, Pages 516-523

An efficient compiler framework for cache bypassing on GPUs

Author keywords

Cache Bypassing; Compiler Optimization; GPU

Indexed keywords

CACHE BYPASSING; COMPILER OPTIMIZATIONS; GENERAL PURPOSE GPU; GPU; GRAPHICS PROCESSING UNITS; INSTRUCTION SET ARCHITECTURE; PERFORMANCE METRICS; SCRATCH PAD MEMORY

EID: 84893396474     PISSN: 10923152     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICCAD.2013.6691165     Document Type: Conference Paper
Times cited: 90

References (35)
  • 1
    • 84893426008
    • GE Intelligent Platforms. http://defense.ge-ip.com/products/hpec/c560.
  • 2
    • 84893408658
    • Mosek. http://www.mosek.com/.
  • 3
    • 84893362632
    • NVIDIA. Fermi GPUs www.nvidia.com/object/fermi-architecture.html.
  • 4
    • 84893389126
    • NVIDIA. Kepler GPUs www.nvidia.com/object/nvidia-kepler.html.
  • 5
    • 84893347454
    • NVIDIA. PTX Code http://docs.nvidia.com/cuda/pdf/ptx-isa-3.1.pdf.
  • 7
    • 84893425171
    • NVIDIA. Profiler http://docs.nvidia.com/cuda/profiler-users-guide/index.html.
  • 8
    • 84893367579
    • NVIDIA GPU Computing SDK. http://developer.nvidia.com/gpu-computing-sdk.
  • 9
    • 84893429325
    • NVIDIA Tegra. http://www.nvidia.com/object/tegra.html.
  • 10
    • 84893381500
    • Qualcomm Inc. http://www.qualcomm.com/snapdragon.
  • 11
    • 84893398220
    • Samsung Inc. www.samsung.com/exynos.
  • 13
    • 84866876242
    • An accurate GPU performance model for effective control flow divergence optimization
    • Z. Cui, Y. Liang, K. Rupnow, and D. Chen. An accurate GPU performance model for effective control flow divergence optimization. In IPDPS, 2012.
    • (2012) IPDPS
    • Cui, Z.1    Liang, Y.2    Rupnow, K.3    Chen, D.4
  • 14
    • 84863389330
    • SHiP: Signature-based hit predictor for high performance caching
    • C. J. Wu et al. SHiP: signature-based hit predictor for high performance caching. In Micro, 2011.
    • (2011) Micro
    • Wu, C.J.1
  • 15
    • 84873470137
    • Parboil: A revised benchmark suite for scientific and commercial throughput computing
    • J. A. Stratton et al. Parboil: A revised benchmark suite for scientific and commercial throughput computing. In IMPACT Technical Report, 2012.
    • (2012) IMPACT Technical Report
    • Stratton, J.A.1
  • 17
    • 57349180412
    • A compiler framework for optimization of affine loop nests for GPGPUs
    • M. M. Baskaran et al. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS, 2008.
    • (2008) ICS
    • Baskaran, M.M.1
  • 19
    • 79959466764
    • Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
    • S. Ryoo et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, 2008.
    • (2008) PPoPP
    • Ryoo, S.1
  • 20
    • 84948958301
    • Compiler managed micro-cache bypassing for high performance EPIC processors
    • Y. Wu et al. Compiler managed micro-cache bypassing for high performance EPIC processors. In Micro, 2002.
    • (2002) Micro
    • Wu, Y.1
  • 21
    • 4444328501
    • An integrated hardware/software approach for run-time scratchpad management
    • P. Francesco et al. An integrated hardware/software approach for run-time scratchpad management. In DAC, 2004.
    • (2004) DAC
    • Francesco, P.1
  • 22
    • 70450231944
    • An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
    • S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In ISCA, 2009.
    • (2009) ISCA
    • Hong, S.1    Kim, H.2
  • 23
    • 84864068497
    • Characterizing and improving the use of demand-fetched caches in GPUs
    • W. Jia, K. A. Shaw, and M. Martonosi. Characterizing and improving the use of demand-fetched caches in GPUs. In ICS, 2012.
    • (2012) ICS
    • Jia, W.1    Shaw, K.A.2    Martonosi, M.3
  • 25
    • 80052655793
    • CuMAPz: A tool to analyze memory access patterns in CUDA
    • Y. Kim and A. Shrivastava. CuMAPz: A tool to analyze memory access patterns in CUDA. In DAC, 2011.
    • (2011) DAC
    • Kim, Y.1    Shrivastava, A.2
  • 26
    • 84877739484
    • Cache capacity aware thread scheduling for irregular memory access on many-core GPGPUs
    • H. Kuo, T. Yen, B. C. Lai, and J. Jou. Cache capacity aware thread scheduling for irregular memory access on many-core GPGPUs. In ASPDAC, 2013.
    • (2013) ASPDAC
    • Kuo, H.1    Yen, T.2    Lai, B.C.3    Jou, J.4
  • 27
    • 84877777934
    • Register and thread structure optimization for GPUs
    • Y. Liang, Z. Cui, K. Rupnow, and D. Chen. Register and thread structure optimization for GPUs. In ASPDAC, 2013.
    • (2013) ASPDAC
    • Liang, Y.1    Cui, Z.2    Rupnow, K.3    Chen, D.4
  • 28
    • 84862069040
    • Real-time implementation and performance optimization of 3D sound localization on GPUs
    • Y. Liang et al. Real-time implementation and performance optimization of 3D sound localization on GPUs. In DATE, 2012.
    • (2012) DATE
    • Liang, Y.1
  • 29
    • 63349099764
    • Static analysis for fast and accurate design space exploration of caches
    • Y. Liang and T. Mitra. Static analysis for fast and accurate design space exploration of caches. In CODES+ISSS, 2008.
    • (2008) CODES+ISSS
    • Liang, Y.1    Mitra, T.2
  • 30
    • 66749155879
    • Cache Bursts: A new approach for eliminating dead blocks and increasing cache efficiency
    • H. Liu, M. Ferdman, J. Huh, and D. Burger. Cache Bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In Micro, 2008.
    • (2008) Micro
    • Liu, H.1    Ferdman, M.2    Huh, J.3    Burger, D.4
  • 31
    • 78149251414
    • Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
    • I. J. Sung, J. A. Stratton, and W. W. Hwu. Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. In PACT, 2010.
    • (2010) PACT
    • Sung, I.J.1    Stratton, J.A.2    Hwu, W.W.3
  • 32
    • 47649086892
    • Dynamic allocation for scratch-pad memory using compile-time decisions
    • S. Udayakumaran, A. Dominguez, and R. Barua. Dynamic allocation for scratch-pad memory using compile-time decisions. ACM Trans. Embed. Comput. Syst., 5(2):472-511, May 2006.
    • (2006) ACM Trans. Embed. Comput. Syst. , vol.5 , Issue.2 , pp. 472-511
    • Udayakumaran, S.1    Dominguez, A.2    Barua, R.3
  • 33
    • 14944380022
    • Using the compiler to improve cache replacement decisions
    • Z. Wang, K. S. McKinley, A. L. Rosenberg, and C. C. Weems. Using the compiler to improve cache replacement decisions. In PACT, 2002.
    • (2002) PACT
    • Wang, Z.1    McKinley, K.S.2    Rosenberg, A.L.3    Weems, C.C.4
  • 34
    • 77954691442
    • A GPGPU compiler for memory optimization and parallelism management
    • Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
    • (2010) PLDI
    • Yang, Y.1    Xiang, P.2    Kong, J.3    Zhou, H.4
  • 35
    • 79953126288
    • On-the-fly elimination of dynamic irregularities for GPU computing
    • E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, 2011.
    • (2011) ASPLOS
    • Zhang, E.Z.1    Jiang, Y.2    Guo, Z.3    Tian, K.4    Shen, X.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.