Volume , Issue , 2013, Pages

A large-scale cross-architecture evaluation of thread-coarsening

Author keywords

GPU; OpenCL; Regression trees; Thread coarsening

Indexed keywords

DIGITAL STORAGE; HARDWARE; PROGRAM COMPILERS

EID: 84899692998     PISSN: 21674329     EISSN: 21674337     Source Type: Conference Proceeding    
DOI: 10.1145/2503210.2503268     Document Type: Conference Paper
Times cited: 65

References (28)
  • 1
    • 84899700709
    • AMD Inc., AMD APP Profiler
    • AMD Inc., AMD APP Profiler. http://developer.amd.com/tools/heterogeneous-computing/amd-app-profiler/
  • 2
    • 84899683908
    • The LLVM compiler infrastructure
    • The LLVM compiler infrastructure. http://llvm.org
  • 3
    • 84899683649
    • NVIDIA Corporation, NVIDIA Profiler
    • NVIDIA Corporation, NVIDIA Profiler. http://docs.nvidia.com/cuda/profiler-users-guide/
  • 4
    • 84899696089
    • Nvidia's Next Generation CUDA Compute Architecture: Fermi
    • Nvidia's Next Generation CUDA Compute Architecture: Fermi. http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf, 2009.
    • (2009)
  • 8
    • 84856530584
    • Divergence analysis and optimizations
    • B. Coutinho, D. Sampaio, F. Pereira, and W. Meira. Divergence analysis and optimizations. PACT, pages 320-329, Oct. 2011.
    • (2011) PACT, pp. 320-329
    • Coutinho, B.; Sampaio, D.; Pereira, F.; Meira, W.
  • 9
    • 78149233155
    • Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
    • G. F. Diamos, A. R. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. PACT'10, pages 353-364, New York, NY, USA, 2010. ACM.
    • (2010) PACT'10, pp. 353-364
    • Diamos, G.F.; Kerr, A.R.; Yalamanchili, S.; Clark, N.
  • 10
    • 84863463369
    • Compiling a high-level language for GPUs: (via language support for architectures and compilers)
    • C. Dubach, P. Cheng, R. M. Rabbah, D. F. Bacon, and S. J. Fink. Compiling a high-level language for GPUs: (via language support for architectures and compilers). In PLDI, pages 1-12, 2012.
    • (2012) PLDI, pp. 1-12
    • Dubach, C.; Cheng, P.; Rabbah, R.M.; Bacon, D.F.; Fink, S.J.
  • 11
    • 84876937393
    • Portable mapping of data parallel programs to OpenCL for heterogeneous systems
    • D. Grewe, Z. Wang, and M. F. O'Boyle. Portable mapping of data parallel programs to OpenCL for heterogeneous systems. CGO'13. ACM, 2013.
    • (2013) CGO'13, ACM
    • Grewe, D.; Wang, Z.; O'Boyle, M.F.
  • 12
    • 79953071805
    • Sponge: Portable stream programming on graphics engines
    • A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke. Sponge: Portable stream programming on graphics engines. ASPLOS'11, pages 381-392, New York, NY, USA, 2011. ACM.
    • (2011) ASPLOS'11, pp. 381-392
    • Hormati, A.H.; Samadi, M.; Woh, M.; Mudge, T.; Mahlke, S.
  • 14
    • 79957502935
    • Whole-function vectorization
    • R. Karrenberg and S. Hack. Whole-function vectorization. CGO'11, pages 141-150, April 2011.
    • (2011) CGO'11, pp. 141-150
    • Karrenberg, R.; Hack, S.
  • 15
    • 84859143447
    • Improving performance of OpenCL on CPUs
    • R. Karrenberg and S. Hack. Improving performance of OpenCL on CPUs. CC, pages 1-20, 2012.
    • (2012) CC, pp. 1-20
    • Karrenberg, R.; Hack, S.
  • 16
    • 77952256778
    • Modeling GPU-CPU workloads and systems
    • A. Kerr, G. Diamos, and S. Yalamanchili. Modeling GPU-CPU workloads and systems. GPGPU'10, pages 31-42, New York, NY, USA, 2010. ACM.
    • (2010) GPGPU'10, pp. 31-42
    • Kerr, A.; Diamos, G.; Yalamanchili, S.
  • 17
    • 70450103746
    • A cross-input adaptive framework for GPU program optimizations
    • Y. Liu, E. Zhang, and X. Shen. A cross-input adaptive framework for GPU program optimizations. IPDPS'09, pages 1-10, May 2009.
    • (2009) IPDPS'09, pp. 1-10
    • Liu, Y.; Zhang, E.; Shen, X.
  • 21
    • 79959466764
    • Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
    • S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. PPoPP'08, pages 73-82, New York, NY, USA, 2008. ACM.
    • (2008) PPoPP'08, pp. 73-82
    • Ryoo, S.; Rodrigues, C.I.; Baghsorkhi, S.S.; Stone, S.S.; Kirk, D.B.; Hwu, W.-M.W.
  • 22
    • 84863347222
    • A performance analysis framework for identifying potential benefits in GPGPU applications
    • J. Sim, A. Dasgupta, H. Kim, and R. Vuduc. A performance analysis framework for identifying potential benefits in GPGPU applications. PPoPP'12, pages 11-22, New York, NY, USA, 2012. ACM.
    • (2012) PPoPP'12, pp. 11-22
    • Sim, J.; Dasgupta, A.; Kim, H.; Vuduc, R.
  • 23
    • 84859153100
    • Automatic restructuring of GPU kernels for exploiting inter-thread data locality
    • S. Unkule, C. Shaltz, and A. Qasem. Automatic restructuring of GPU kernels for exploiting inter-thread data locality. CC, pages 21-40, 2012.
    • (2012) CC, pp. 21-40
    • Unkule, S.; Shaltz, C.; Qasem, A.
  • 24
    • 70350771131
    • Benchmarking GPUs to tune dense linear algebra
    • V. Volkov and J. W. Demmel. Benchmarking GPUs to tune dense linear algebra. SC'08, pages 31:1-31:11, Piscataway, NJ, USA, 2008. IEEE Press.
    • (2008) SC'08, pp. 31:1-31:11
    • Volkov, V.; Demmel, J.W.
  • 25
    • 85050273691
    • Program slicing
    • M. Weiser. Program slicing. ICSE'81, pages 439-449, Piscataway, NJ, USA, 1981. IEEE Press.
    • (1981) ICSE'81, pp. 439-449
    • Weiser, M.
  • 27
    • 84863663143
    • A unified optimizing compiler framework for different GPGPU architectures
    • Y. Yang, P. Xiang, J. Kong, M. Mantor, and H. Zhou. A unified optimizing compiler framework for different GPGPU architectures. TACO, 9(2):9, 2012.
    • (2012) TACO, vol. 9, issue 2, p. 9
    • Yang, Y.; Xiang, P.; Kong, J.; Mantor, M.; Zhou, H.
  • 28
    • 79953126288
    • On-the-fly elimination of dynamic irregularities for GPU computing
    • E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. ASPLOS'11, pages 369-380, New York, NY, USA, 2011. ACM.
    • (2011) ASPLOS'11, pp. 369-380
    • Zhang, E.Z.; Jiang, Y.; Guo, Z.; Tian, K.; Shen, X.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.