SCOPUS Information Retrieval Platform

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Volume , Issue , 2016, Pages 534-543

Compiler-Assisted Workload Consolidation for Efficient Dynamic Parallelism on GPU

(3) Wu, Hancheng ᵃ; Li, Da ᵃ; Becchi, Michela ᵃ

ᵃ Department of Electrical Engineering and Computer Science (United States)

Author keywords

Consolidation; Dynamic Parallelism; GPU; Irregular Computations

Indexed keywords

CONSOLIDATION; COSINE TRANSFORMS; MASKS; PROGRAM COMPILERS; PROGRAM PROCESSORS;

CODE TRANSFORMATION; COMPILER-ASSISTED; DEGREE OF PARALLELISM; HARDWARE UTILIZATION; IMPROVE PERFORMANCE; IRREGULAR COMPUTATIONS; NESTED PARALLELISM; WORKLOAD CONSOLIDATION;

COMPUTER HARDWARE;

EID: 84983239150 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2016.98 Document Type: Conference Paper

Times cited : (23)

References (34)

1
- 84960802332
- A. Adinetz. "Adaptive Parallel Computation with CUDA Dynamic Parallelism," http://devblogs.nvidia.com/parallelforall/introductioncuda-dynamic-parallelism/.
- Adaptive Parallel Computation with CUDA Dynamic Parallelism
- Adinetz, A.¹

2
- 84946029581
- Characterization and analysis of dynamic parallelism in unstructured GPU applications
- J. Wang, and S. Yalamanchili, "Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications," in Proc. of IISWC 2014.
- (2014) Proc. of IISWC
- Wang, J.¹ Yalamanchili, S.²

3
- 84976510144
- Nested parallelism on GPU: Exploring parallelization templates for irregular loops and recursive computations
- D. Li, H. Wu, and M. Becchi, "Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations," in Proc. of ICPP 2015.
- (2015) Proc. of ICPP
- Li, D.¹ Wu, H.² Becchi, M.³

4
- 84896893237
- CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
- Y. Yang, and H. Zhou, "CUDA-NP: realizing nested thread-level parallelism in GPGPU applications," in Proc. of PPoPP 2014.
- (2014) Proc. of PPoPP
- Yang, Y.¹ Zhou, H.²

5
- 60649099910
- Accelerating large graph algorithms on the GPU using CUDA
- P. Harish, and P. J. Narayanan, "Accelerating large graph algorithms on the GPU using CUDA," in Proc. of HiPC 2007.
- (2007) Proc. of HiPC
- Harish, P.¹ Narayanan, P.J.²

6
- 84898796621
- S. Jones. "Introduction to Dynamic Parallelism," http://ondemand.gputechconf.com/gtc/2012/presentations/S0338-GTC2012-CUDA-Programming-Model.pdf.
- Introduction to Dynamic Parallelism
- Jones, S.¹

7
- 84976484929
- General transformations for GPU execution of tree traversals
- M. Goldfarb, Y. Jo, and M. Kulkarni, "General transformations for GPU execution of tree traversals," in Proc. of HPDC 2013.
- (2013) Proc. of HPDC
- Goldfarb, M.¹ Jo, Y.² Kulkarni, M.³

8
- 0025380943
- Compiling collection-oriented languages onto massively parallel computers
- G. E. Blelloch, and G. W. Sabot, "Compiling collection-oriented languages onto massively parallel computers," J. Parallel Distrib. Comput., vol. 8, no. 2, pp. 119-134, 1990.
- (1990) J. Parallel Distrib. Comput., vol. 8, no. 2, pp. 119-134
- Blelloch, G.E.¹ Sabot, G.W.²

9
- 0003966887
- G. E. Blelloch, NESL: A Nested Data-Parallel Language, 1992.
- (1992) NESL: A Nested Data-Parallel Language
- Blelloch, G.E.¹

10
- 79960506159
- Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
- V. T. Ravi, M. Becchi, G. Agrawal, and S. Chakradhar, "Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework," in Proc. of HPDC 2011.
- (2011) Proc. of HPDC
- Ravi, V.T.¹ Becchi, M.² Agrawal, G.³ Chakradhar, S.⁴

11
- 84872376103
- D. J. Quinlan, C. Liao, J. Too, R. P. Matzke, and M. Schordan. "ROSE Compiler Infrastructure," 2015; http://www.rosecompiler.org.
- (2015) ROSE Compiler Infrastructure
- Quinlan, D.J.¹ Liao, C.² Too, J.³ Matzke, R.P.⁴ Schordan, M.⁵

12
- 84983370979
- A. V. Adinetz, and D. Pleiter. "Halloc: A fast and highly scalable GPU dynamic memory allocator," https://github.com/canonizer/halloc.
- Halloc: A Fast and Highly Scalable GPU Dynamic Memory Allocator
- Adinetz, A.V.¹ Pleiter, D.²

13
- 84946577059
- Parallel pagerank computation using GPUs
- N. T. Duong, Q. A. P. Nguyen, A. T. Nguyen, and H.-D. Nguyen, "Parallel PageRank computation using GPUs," in Proc. of the Third Symposium on Information and Communication Technology, 2012.
- (2012) Proc. of the Third Symposium on Information and Communication Technology
- Duong, N.T.¹ Nguyen, Q.A.P.² Nguyen, A.T.³ Nguyen, H.-D.⁴

14
- 84976478010
- Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format
- J. L. Greathouse, and M. Daga, "Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format," in Proc. of SC 2014.
- (2014) Proc. of SC
- Greathouse, J.L.¹ Daga, M.²

15
- 77955887863
- "DIMACS Implementation Challenges," http://dimacs.rutgers.edu/Challenges/.
- DIMACS Implementation Challenges

16
- 84906718925
- Nitro: A framework for adaptive code variant tuning
- S. Muralidharan, M. Shantharam, M. Hall, M. Garland, and B. Catanzaro, "Nitro: A Framework for Adaptive Code Variant Tuning," in Proc. of IPDPS 2014.
- (2014) Proc. of IPDPS
- Muralidharan, S.¹ Shantharam, M.² Hall, M.³ Garland, M.⁴ Catanzaro, B.⁵

17
- 84875967341
- "Profiler User's Guide," http://docs.nvidia.com/cuda/profiler-usersguide/#axzz3nGyZAhq7.
- Profiler User's Guide

18
- 84936980200
- A quantitative study of irregular programs on GPUs
- M. Burtscher, R. Nasre, and K. Pingali, "A quantitative study of irregular programs on GPUs," in Proc. IISWC 2012.
- (2012) Proc. IISWC
- Burtscher, M.¹ Nasre, R.² Pingali, K.³

19
- 84946053358
- Microarchitectural performance characterization of irregular GPU kernels
- M. A. O'Neil, and M. Burtscher, "Microarchitectural Performance Characterization of Irregular GPU Kernels," in Proc. of IISWC 2014.
- (2014) Proc. of IISWC
- O'Neil, M.A.¹ Burtscher, M.²

20
- 84893628986
- Pannotia: Understanding irregular GPGPU graph applications
- C. Shuai, B. M. Beckmann, S. K. Reinhardt, and K. Skadron, "Pannotia: Understanding irregular GPGPU graph applications," in Proc. of IISWC 2013.
- (2013) Proc. of IISWC
- Shuai, C.¹ Beckmann, B.M.² Reinhardt, S.K.³ Skadron, K.⁴

21
- 84962303704
- Performance characterization for high-level programming models for GPU graph analytics
- Y. Wu, Y. Wang, Y. Pan, C. Yang, and J. D. Owens, "Performance Characterization for High-Level Programming Models for GPU Graph Analytics," in Proc. of IISWC 2015.
- (2015) Proc. of IISWC
- Wu, Y.¹ Wang, Y.² Pan, Y.³ Yang, C.⁴ Owens, J.D.⁵

22
- 77956200064
- An effective GPU implementation of breadth-first search
- L. Luo, M. Wong, and W.-m. Hwu, "An effective GPU implementation of breadth-first search," in Proc. of DAC 2010.
- (2010) Proc. of DAC
- Luo, L.¹ Wong, M.² Hwu, W.-M.³

23
- 84858391043
- Scalable GPU graph traversal
- D. Merrill, M. Garland, and A. Grimshaw, "Scalable GPU graph traversal," in Proc. of PPoPP 2012.
- (2012) Proc. of PPoPP
- Merrill, D.¹ Garland, M.² Grimshaw, A.³

24
- 79952811127
- Accelerating CUDA graph algorithms at maximum warp
- S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun, "Accelerating CUDA graph algorithms at maximum warp," in Proc. of PPoPP 2011.
- (2011) Proc. of PPoPP
- Hong, S.¹ Kim, S.K.² Oguntebi, T.³ Olukotun, K.⁴

25
- 84946577056
- Deploying graph algorithms on GPUs: An adaptive solution
- D. Li, and M. Becchi, "Deploying Graph Algorithms on GPUs: an Adaptive Solution," in Proc. of IPDPS 2013.
- (2013) Proc. of IPDPS
- Li, D.¹ Becchi, M.²

26
- 84884887302
- On graphs, GPUs, and blind dating: A workload to processor matchmaking quest
- A. Gharaibeh, L. B. Costa, E. Santos-Neto, and M. Ripeanu, "On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest," in Proc. of IPDPS 2013.
- (2013) Proc. of IPDPS
- Gharaibeh, A.¹ Costa, L.B.² Santos-Neto, E.³ Ripeanu, M.⁴

27
- 84976483630
- Atomic-free irregular computations on GPUs
- R. Nasre, M. Burtscher, and K. Pingali, "Atomic-free irregular computations on GPUs," in Proc. of GPGPU 2013.
- (2013) Proc. of GPGPU
- Nasre, R.¹ Burtscher, M.² Pingali, K.³

28
- 84976497139
- Betweenness centrality on GPUs and heterogeneous architectures
- A. E. Sariyuce, K. Kaya, E. Saule, and U. V. Catalyurek, "Betweenness Centrality on GPUs and Heterogeneous Architectures," in Proc. of GPGPU 2013.
- (2013) Proc. of GPGPU
- Sariyuce, A.E.¹ Kaya, K.² Saule, E.³ Catalyurek, U.V.⁴

29
- 84867546922
- Nested data-parallelism on the GPU
- L. Bergstrom, and J. Reppy, "Nested data-parallelism on the GPU," in Proc. of ICFPC 2012.
- (2012) Proc. of ICFPC
- Bergstrom, L.¹ Reppy, J.²

30
- 84870690379
- A study of persistent threads style GPU programming for gpgpu workloads
- K. Gupta, J. A. Stuart, and J. D. Owens, "A Study of Persistent Threads Style GPU Programming for GPGPU Workloads," in Proc. of IPC 2012.
- (2012) Proc. of IPC
- Gupta, K.¹ Stuart, J.A.² Owens, J.D.³

31
- 84976466502
- Performance impact of dynamic parallelism on different clustering algorithms and the new GPU architecture
- J. DiMarco, and M. Taufer, "Performance Impact of Dynamic Parallelism on Different Clustering Algorithms and the New GPU Architecture," in Proc. of SPIE Defense, Security, and Sensing Symposium 2013.
- (2013) Proc. of SPIE Defense, Security, and Sensing Symposium
- DiMarco, J.¹ Taufer, M.²

32
- 84976469523
- A. Adinetz. "A CUDA Dynamic Parallelism Case Study: PANDA," http://devblogs.nvidia.com/parallelforall/a-cuda-dynamic-parallelismcase-study-panda/.
- A CUDA Dynamic Parallelism Case Study: PANDA
- Adinetz, A.¹

33
- 84959927541
- Free launch: Optimizing GPU dynamic kernel launches through thread reuse
- G. Chen, and X. Shen, "Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse," in Proc. of MICRO 2015.
- (2015) Proc. of MICRO
- Chen, G.¹ Shen, X.²

34
- 84960076275
- Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs
- J. Wang, N. Rubin, A. Sidelnik, and S. Yalamanchili, "Dynamic Thread Block Launch: a Lightweight Execution Mechanism to Support Irregular Applications on GPUs," in Proc. of ISCA 2015.
- (2015) Proc. of ISCA
- Wang, J.¹ Rubin, N.² Sidelnik, A.³ Yalamanchili, S.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.