[1] J. Adriaens et al. The case for GPGPU spatial multitasking. In HPCA, 2012.
[2] A. Bakhoda et al. Analyzing CUDA Workloads Using a Detailed GPU Simulator. In ISPASS, 2009.
[3] S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, 2009.
[5] J. Evans. A Scalable Concurrent malloc(3) Implementation for FreeBSD. In BSDCan, 2006.
[6] S. Eyerman and L. Eeckhout. System-level Performance Metrics for Multiprogram Workloads. IEEE Micro, 28(3), 2008.
[7] C. Gregg et al. Fine-grained resource sharing for concurrent GPGPU kernels. In HotPar, 2012.
[9] E. Lindholm et al. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2):39-55, 2008.
[14] V. T. Ravi et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In HPDC, 2011.
[16] J. Shirako et al. Chunking parallel loops in the presence of synchronization. In ICS, 2009.
[17] J. A. Stratton et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO, 2010.
[18] J. A. Stratton, S. S. Stone, and W.-m. W. Hwu. MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs. In LCPC, 2008.
[19] TOP500.org. The Top 500.
[20] N. Tuck and D. M. Tullsen. Initial Observations of the Simultaneous Multithreading Pentium 4 Processor. In PACT, 2003.
[21] G. Wang, Y. Lin, and W. Yi. Kernel fusion: An effective method for better power efficiency on multithreaded GPU. In Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing (GREENCOM-CPSCOM '10), 2010.
[22] L. Wang, M. Huang, and T. El-Ghazawi. Exploiting concurrent kernel execution on graphic processing units. In HPCS, 2011.