-
2
-
-
84946029581
-
Characterization and analysis of dynamic parallelism in unstructured GPU applications
-
J. Wang and S. Yalamanchili, "Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications," in Proc. of IISWC 2014.
-
(2014)
Proc. of IISWC
-
-
Wang, J.1
Yalamanchili, S.2
-
3
-
-
84976510144
-
Nested parallelism on GPU: Exploring parallelization templates for irregular loops and recursive computations
-
D. Li, H. Wu, and M. Becchi, "Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations," in Proc. of ICPP 2015.
-
(2015)
Proc. of ICPP
-
-
Li, D.1
Wu, H.2
Becchi, M.3
-
4
-
-
84896893237
-
CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
-
Y. Yang and H. Zhou, "CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications," in Proc. of PPoPP 2014.
-
(2014)
Proc. of PPoPP
-
-
Yang, Y.1
Zhou, H.2
-
5
-
-
60649099910
-
Accelerating large graph algorithms on the GPU using CUDA
-
P. Harish and P. J. Narayanan, "Accelerating large graph algorithms on the GPU using CUDA," in Proc. of HiPC 2007.
-
(2007)
Proc. of HiPC
-
-
Harish, P.1
Narayanan, P.J.2
-
7
-
-
84976484929
-
General transformations for GPU execution of tree traversals
-
M. Goldfarb, Y. Jo, and M. Kulkarni, "General transformations for GPU execution of tree traversals," in Proc. of HPDC 2013.
-
(2013)
Proc. of HPDC
-
-
Goldfarb, M.1
Jo, Y.2
Kulkarni, M.3
-
8
-
-
0025380943
-
Compiling collection-oriented languages onto massively parallel computers
-
G. E. Blelloch and G. W. Sabot, "Compiling collection-oriented languages onto massively parallel computers," J. Parallel Distrib. Comput., vol. 8, no. 2, pp. 119-134, 1990.
-
(1990)
J. Parallel Distrib. Comput., vol. 8, no. 2, pp. 119-134
-
-
Blelloch, G.E.1
Sabot, G.W.2
-
10
-
-
79960506159
-
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
-
V. T. Ravi, M. Becchi, G. Agrawal, and S. Chakradhar, "Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework," in Proc. of HPDC 2011.
-
(2011)
Proc. of HPDC
-
-
Ravi, V.T.1
Becchi, M.2
Agrawal, G.3
Chakradhar, S.4
-
11
-
-
84872376103
-
-
D. J. Quinlan, C. Liao, J. Too, R. P. Matzke, and M. Schordan, "ROSE Compiler Infrastructure," 2015; http://www.rosecompiler.org.
-
(2015)
ROSE Compiler Infrastructure
-
-
Quinlan, D.J.1
Liao, C.2
Too, J.3
Matzke, R.P.4
Schordan, M.5
-
13
-
-
84946577059
-
Parallel PageRank computation using GPUs
-
N. T. Duong, Q. A. P. Nguyen, A. T. Nguyen, and H.-D. Nguyen, "Parallel PageRank computation using GPUs," in Proc. of the Third Symposium on Information and Communication Technology, 2012.
-
(2012)
Proc. of the Third Symposium on Information and Communication Technology
-
-
Duong, N.T.1
Nguyen, Q.A.P.2
Nguyen, A.T.3
Nguyen, H.-D.4
-
14
-
-
84976478010
-
Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format
-
J. L. Greathouse and M. Daga, "Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format," in Proc. of SC 2014.
-
(2014)
Proc. of SC
-
-
Greathouse, J.L.1
Daga, M.2
-
16
-
-
84906718925
-
Nitro: A framework for adaptive code variant tuning
-
S. Muralidharan, M. Shantharam, M. Hall, M. Garland, and B. Catanzaro, "Nitro: A Framework for Adaptive Code Variant Tuning," in Proc. of IPDPS 2014.
-
(2014)
Proc. of IPDPS
-
-
Muralidharan, S.1
Shantharam, M.2
Hall, M.3
Garland, M.4
Catanzaro, B.5
-
17
-
-
84875967341
-
-
"Profiler User's Guide," http://docs.nvidia.com/cuda/profiler-usersguide/#axzz3nGyZAhq7.
-
Profiler User's Guide
-
-
-
18
-
-
84936980200
-
A quantitative study of irregular programs on GPUs
-
M. Burtscher, R. Nasre, and K. Pingali, "A quantitative study of irregular programs on GPUs," in Proc. of IISWC 2012.
-
(2012)
Proc. of IISWC
-
-
Burtscher, M.1
Nasre, R.2
Pingali, K.3
-
19
-
-
84946053358
-
Microarchitectural performance characterization of irregular GPU kernels
-
M. A. O'Neil and M. Burtscher, "Microarchitectural Performance Characterization of Irregular GPU Kernels," in Proc. of IISWC 2014.
-
(2014)
Proc. of IISWC
-
-
O'Neil, M.A.1
Burtscher, M.2
-
21
-
-
84962303704
-
Performance characterization for high-level programming models for GPU graph analytics
-
Y. Wu, Y. Wang, Y. Pan, C. Yang, and J. D. Owens, "Performance Characterization for High-Level Programming Models for GPU Graph Analytics," in Proc. of IISWC 2015.
-
(2015)
Proc. of IISWC
-
-
Wu, Y.1
Wang, Y.2
Pan, Y.3
Yang, C.4
Owens, J.D.5
-
22
-
-
77956200064
-
An effective GPU implementation of breadth-first search
-
L. Luo, M. Wong, and W.-m. Hwu, "An effective GPU implementation of breadth-first search," in Proc. of DAC 2010.
-
(2010)
Proc. of DAC
-
-
Luo, L.1
Wong, M.2
Hwu, W.-M.3
-
25
-
-
84946577056
-
Deploying graph algorithms on GPUs: An adaptive solution
-
D. Li and M. Becchi, "Deploying Graph Algorithms on GPUs: an Adaptive Solution," in Proc. of IPDPS 2013.
-
(2013)
Proc. of IPDPS
-
-
Li, D.1
Becchi, M.2
-
26
-
-
84884887302
-
On graphs, GPUs, and blind dating: A workload to processor matchmaking quest
-
A. Gharaibeh, L. B. Costa, E. Santos-Neto, and M. Ripeanu, "On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest," in Proc. of IPDPS 2013.
-
(2013)
Proc. of IPDPS
-
-
Gharaibeh, A.1
Costa, L.B.2
Santos-Neto, E.3
Ripeanu, M.4
-
30
-
-
84870690379
-
A study of persistent threads style GPU programming for GPGPU workloads
-
K. Gupta, J. A. Stuart, and J. D. Owens, "A Study of Persistent Threads Style GPU Programming for GPGPU Workloads," in Proc. of IPC 2012.
-
(2012)
Proc. of IPC
-
-
Gupta, K.1
Stuart, J.A.2
Owens, J.D.3
-
31
-
-
84976466502
-
Performance impact of dynamic parallelism on different clustering algorithms and the new GPU architecture
-
J. DiMarco and M. Taufer, "Performance Impact of Dynamic Parallelism on Different Clustering Algorithms and the New GPU Architecture," in Proc. of SPIE Defense, Security, and Sensing Symposium 2013.
-
(2013)
Proc. of SPIE Defense, Security, and Sensing Symposium
-
-
DiMarco, J.1
Taufer, M.2
-
33
-
-
84959927541
-
Free launch: Optimizing GPU dynamic kernel launches through thread reuse
-
G. Chen and X. Shen, "Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse," in Proc. of MICRO 2015.
-
(2015)
Proc. of MICRO
-
-
Chen, G.1
Shen, X.2
-
34
-
-
84960076275
-
Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs
-
J. Wang, N. Rubin, A. Sidelnik, and S. Yalamanchili, "Dynamic Thread Block Launch: a Lightweight Execution Mechanism to Support Irregular Applications on GPUs," in Proc. of ISCA 2015.
-
(2015)
Proc. of ISCA
-
-
Wang, J.1
Rubin, N.2
Sidelnik, A.3
Yalamanchili, S.4