2016, Pages 534-543

Compiler-Assisted Workload Consolidation for Efficient Dynamic Parallelism on GPU

Author keywords

Consolidation; Dynamic Parallelism; GPU; Irregular Computations

Indexed keywords

CONSOLIDATION; COSINE TRANSFORMS; MASKS; PROGRAM COMPILERS; PROGRAM PROCESSORS;

EID: 84983239150     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2016.98     Document Type: Conference Paper
Times cited: 23

References (34)
  • 2
    • 84946029581
    • Characterization and analysis of dynamic parallelism in unstructured GPU applications
    • J. Wang and S. Yalamanchili, "Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications," in Proc. of IISWC 2014.
    • (2014) Proc. of IISWC
    • Wang, J.1    Yalamanchili, S.2
  • 3
    • 84976510144
    • Nested parallelism on GPU: Exploring parallelization templates for irregular loops and recursive computations
    • D. Li, H. Wu, and M. Becchi, "Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations," in Proc. of ICPP 2015.
    • (2015) Proc. of ICPP
    • Li, D.1    Wu, H.2    Becchi, M.3
  • 4
    • 84896893237
    • CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
    • Y. Yang and H. Zhou, "CUDA-NP: realizing nested thread-level parallelism in GPGPU applications," in Proc. of PPoPP 2014.
    • (2014) Proc. of PPoPP
    • Yang, Y.1    Zhou, H.2
  • 5
    • 60649099910
    • Accelerating large graph algorithms on the GPU using CUDA
    • P. Harish and P. J. Narayanan, "Accelerating large graph algorithms on the GPU using CUDA," in Proc. of HiPC 2007.
    • (2007) Proc. of HiPC
    • Harish, P.1    Narayanan, P.J.2
  • 7
    • 84976484929
    • General transformations for GPU execution of tree traversals
    • M. Goldfarb, Y. Jo, and M. Kulkarni, "General transformations for GPU execution of tree traversals," in Proc. of HPDC 2013.
    • (2013) Proc. of HPDC
    • Goldfarb, M.1    Jo, Y.2    Kulkarni, M.3
  • 8
    • 0025380943
    • Compiling collection-oriented languages onto massively parallel computers
    • G. E. Blelloch and G. W. Sabot, "Compiling collection-oriented languages onto massively parallel computers," J. Parallel Distrib. Comput., vol. 8, no. 2, pp. 119-134, 1990.
    • (1990) J. Parallel Distrib. Comput. , vol.8 , Issue.2 , pp. 119-134
    • Blelloch, G.E.1    Sabot, G.W.2
  • 10
    • 79960506159
    • Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
    • V. T. Ravi, M. Becchi, G. Agrawal, and S. Chakradhar, "Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework," in Proc. of HPDC 2011.
    • (2011) Proc. of HPDC
    • Ravi, V.T.1    Becchi, M.2    Agrawal, G.3    Chakradhar, S.4
  • 14
    • 84976478010
    • Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format
    • J. L. Greathouse and M. Daga, "Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format," in Proc. of SC 2014.
    • (2014) Proc. of SC
    • Greathouse, J.L.1    Daga, M.2
  • 17
    • 84875967341
    • "Profiler User's Guide," http://docs.nvidia.com/cuda/profiler-usersguide/#axzz3nGyZAhq7.
    • Profiler User's Guide
  • 18
    • 84936980200
    • A quantitative study of irregular programs on GPUs
    • M. Burtscher, R. Nasre, and K. Pingali, "A quantitative study of irregular programs on GPUs," in Proc. IISWC 2012.
    • (2012) Proc. IISWC
    • Burtscher, M.1    Nasre, R.2    Pingali, K.3
  • 19
    • 84946053358
    • Microarchitectural performance characterization of irregular GPU kernels
    • M. A. O'Neil and M. Burtscher, "Microarchitectural Performance Characterization of Irregular GPU Kernels," in Proc. of IISWC 2014.
    • (2014) Proc. of IISWC
    • O'Neil, M.A.1    Burtscher, M.2
  • 21
    • 84962303704
    • Performance characterization for high-level programming models for GPU graph analytics
    • Y. Wu, Y. Wang, Y. Pan, C. Yang, and J. D. Owens, "Performance Characterization for High-Level Programming Models for GPU Graph Analytics," in Proc. of IISWC 2015.
    • (2015) Proc. of IISWC
    • Wu, Y.1    Wang, Y.2    Pan, Y.3    Yang, C.4    Owens, J.D.5
  • 22
    • 77956200064
    • An effective GPU implementation of breadth-first search
    • L. Luo, M. Wong, and W.-m. Hwu, "An effective GPU implementation of breadth-first search," in Proc. of DAC 2010.
    • (2010) Proc. of DAC
    • Luo, L.1    Wong, M.2    Hwu, W.-M.3
  • 25
    • 84946577056
    • Deploying graph algorithms on GPUs: An adaptive solution
    • D. Li and M. Becchi, "Deploying Graph Algorithms on GPUs: an Adaptive Solution," in Proc. of IPDPS 2013.
    • (2013) Proc. of IPDPS
    • Li, D.1    Becchi, M.2
  • 26
    • 84884887302
    • On graphs, GPUs, and blind dating: A workload to processor matchmaking quest
    • A. Gharaibeh, L. B. Costa, E. Santos-Neto, and M. Ripeanu, "On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest," in Proc. of IPDPS 2013.
    • (2013) Proc. of IPDPS
    • Gharaibeh, A.1    Costa, L.B.2    Santos-Neto, E.3    Ripeanu, M.4
  • 29
  • 30
    • 84870690379
    • A study of persistent threads style GPU programming for GPGPU workloads
    • K. Gupta, J. A. Stuart, and J. D. Owens, "A Study of Persistent Threads Style GPU Programming for GPGPU Workloads," in Proc. of IPC 2012.
    • (2012) Proc. of IPC
    • Gupta, K.1    Stuart, J.A.2    Owens, J.D.3
  • 31
    • 84976466502
    • Performance impact of dynamic parallelism on different clustering algorithms and the new GPU architecture
    • J. DiMarco and M. Taufer, "Performance Impact of Dynamic Parallelism on Different Clustering Algorithms and the New GPU Architecture," in Proc. of SPIE Defense, Security, and Sensing Symposium 2013.
    • (2013) Proc. of SPIE Defense, Security, and Sensing Symposium
    • DiMarco, J.1    Taufer, M.2
  • 33
    • 84959927541
    • Free launch: Optimizing GPU dynamic kernel launches through thread reuse
    • G. Chen and X. Shen, "Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse," in Proc. of MICRO 2015.
    • (2015) Proc. of MICRO
    • Chen, G.1    Shen, X.2
  • 34
    • 84960076275
    • Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs
    • J. Wang, N. Rubin, A. Sidelnik, and S. Yalamanchili, "Dynamic Thread Block Launch: a Lightweight Execution Mechanism to Support Irregular Applications on GPUs," in Proc. of ISCA 2015.
    • (2015) Proc. of ISCA
    • Wang, J.1    Rubin, N.2    Sidelnik, A.3    Yalamanchili, S.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.