[1] A. Bakhoda, et al., "Analyzing CUDA Workloads Using a Detailed GPU Simulator," ISPASS-2009, 2009.
[2] N. Brunie, et al., "Simultaneous Branch and Warp Interweaving for Sustained GPU Performance," ISCA-39, 2012.
[3] CUDA Programming Guide.
[4] J. D. Collins, et al., "Control Flow Optimization via Dynamic Reconvergence Prediction," MICRO-37, 2004.
[5] S. Che, et al., "Rodinia: A Benchmark Suite for Heterogeneous Computing," IISWC-2009, 2009.
[6] G. Diamos, et al., "SIMD Re-Convergence at Thread Frontiers," MICRO-44, 2011.
[7] W. W. Fung, et al., "Thread Block Compaction for Efficient SIMT Control Flow," HPCA-17, 2011.
[8] W. Fung, et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO-40, 2007.
[9] M. Gebhart, et al., "Energy-Efficient Mechanisms for Managing Thread Context in Throughput Processors," ISCA-38, 2011.
[10] T. D. Han, et al., "Reducing Branch Divergence in GPU Programs," GPGPU-4, 2011.
[11] IMPACT Research Group. The Parboil Benchmark Suite.
[12] J. Leng, et al., "GPUWattch: Enabling Energy Optimizations in GPGPUs," ISCA-40, 2013.
[14] J. Lee, et al., "TLP-Aware Cache Management Schemes for a CPU-GPU Heterogeneous Architecture," HPCA-18, 2012.
[15] A. Mohammad, et al., "Warped Register File: A Power Efficient Register File for GPGPUs," HPCA-19, 2013.
[16] J. Meng, et al., "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance," ISCA-37, 2010.
[17] V. Narasiman, et al., "Improving GPU Performance via Large Warps and Two-Level Warp Scheduling," MICRO-44, 2011.
[19] NVIDIA. CUDA C/C++ SDK Code Samples. http://developer.nvidia.com/gpu-computing-sdk, 2011.
[20] M. Rhu, et al., "The Dual-Path Execution Model for Efficient GPU Control Flow," HPCA-19, 2013.
[21] M. Rhu, et al., "CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures," ISCA-39, 2012.
[22] T. Rogers, et al., "Cache-Conscious Wavefront Scheduling," MICRO-45, 2012.
[24] J. Tan, et al., "RISE: Improving Streaming Processors Reliability against Soft Errors in GPGPUs," PACT-21, 2012.
[25] J. Tan, et al., "Analyzing Soft-Error Vulnerability on GPGPU Microarchitecture," IISWC-2011, 2011.
[26] Y. Yang, et al., "Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput," PACT-21, 2012.