-
1
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in Proc. ISPASS, 2009, pp. 163-174.
-
(2009)
Proc. ISPASS
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.2
Fung, W.3
Wong, H.4
Aamodt, T.5
-
2
-
-
77952265942
-
Best-effort semantic document search on GPUs
-
S. Byna, J. Meng, A. Raghunathan, S. Chakradhar, and S. Cadambi, "Best-effort semantic document search on GPUs," in Proc. GPGPU, 2010, pp. 86-93.
-
(2010)
Proc. GPGPU
, pp. 86-93
-
-
Byna, S.1
Meng, J.2
Raghunathan, A.3
Chakradhar, S.4
Cadambi, S.5
-
3
-
-
83155184570
-
Dymaxion: Optimizing memory access patterns for heterogeneous systems
-
S. Che, J. Sheaffer, and K. Skadron, "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proc. SC, 2011, pp. 13:1-13:11.
-
(2011)
Proc. SC
, pp. 13:1-13:11
-
-
Che, S.1
Sheaffer, J.2
Skadron, K.3
-
4
-
-
78049504879
-
-
United States Patent #7,353,369, NVIDIA
-
B. Coon, "System and Method for Managing Divergent Threads in a SIMD Architecture," United States Patent #7,353,369, 2008, NVIDIA.
-
(2008)
System and Method for Managing Divergent Threads in a SIMD Architecture
-
-
Coon, B.1
-
5
-
-
79955923056
-
Thread block compaction for efficient SIMT control flow
-
W. Fung and T. Aamodt, "Thread block compaction for efficient SIMT control flow," in Proc. HPCA, 2011, pp. 25-36.
-
(2011)
Proc. HPCA
, pp. 25-36
-
-
Fung, W.1
Aamodt, T.2
-
6
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
W. Fung, I. Sham, G. Yuan, and T. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," in Proc. MICRO, 2007, pp. 407-420.
-
(2007)
Proc. MICRO
, pp. 407-420
-
-
Fung, W.1
Sham, I.2
Yuan, G.3
Aamodt, T.4
-
7
-
-
84872742646
-
-
Khronos Group
-
Khronos Group, OpenCL, 2010.
-
(2010)
OpenCL
-
-
-
8
-
-
70449885048
-
Best-effort parallel execution framework for recognition and mining applications
-
J. Meng, S. Chakradhar, and A. Raghunathan, "Best-effort parallel execution framework for recognition and mining applications," in Proc. IPDPS, 2009, pp. 1-12.
-
(2009)
Proc. IPDPS
-
-
Meng, J.1
Chakradhar, S.2
Raghunathan, A.3
-
9
-
-
77954976292
-
Dynamic warp subdivision for integrated branch and memory divergence tolerance
-
J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," in Proc. ISCA, 2010, pp. 235-246.
-
(2010)
Proc. ISCA
, pp. 235-246
-
-
Meng, J.1
Tarjan, D.2
Skadron, K.3
-
11
-
-
84863342255
-
Improving GPU performance via large warps and two-level warp scheduling
-
V. Narasiman, M. Shebanow, C. Lee, R. Miftakhutdinov, O. Mutlu, and Y. Patt, "Improving GPU performance via large warps and two-level warp scheduling," in Proc. MICRO, 2011, pp. 308-317.
-
(2011)
Proc. MICRO
, pp. 308-317
-
-
Narasiman, V.1
Shebanow, M.2
Lee, C.3
Miftakhutdinov, R.4
Mutlu, O.5
Patt, Y.6
-
15
-
-
84864858573
-
Increasing memory miss tolerance for SIMD cores
-
D. Tarjan, J. Meng, and K. Skadron, "Increasing memory miss tolerance for SIMD cores," in Proc. SC, 2009, pp. 22:1-22:11.
-
(2009)
Proc. SC
, pp. 22:1-22:11
-
-
Tarjan, D.1
Meng, J.2
Skadron, K.3
-
19
-
-
84968854658
-
Y-branches: When you come to a fork in the road, take it
-
N. Wang, M. Fertig, and S. Patel, "Y-branches: When you come to a fork in the road, take it," in Proc. PACT, 2003, pp. 56-118.
-
(2003)
Proc. PACT
-
-
Wang, N.1
Fertig, M.2
Patel, S.3
-
20
-
-
84872696857
-
-
[Online]
-
Wikipedia, Mandelbrot Set, 2011. [Online]. Available: http://en.wikipedia.org/wiki/Mandelbrot_set.
-
(2011)
Wikipedia Mandelbrot Set
-
-
-
21
-
-
77952579552
-
Demystifying GPU microarchitecture through microbenchmarking
-
H. Wong, M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying GPU microarchitecture through microbenchmarking," in Proc. ISPASS, 2010, pp. 235-246.
-
(2010)
Proc. ISPASS
, pp. 235-246
-
-
Wong, H.1
Papadopoulou, M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4
-
22
-
-
47349114260
-
The art of deception: Adaptive precision reduction for area efficient physics acceleration
-
T. Yeh, P. Faloutsos, M. Ercegovac, S. Patel, and G. Reinman, "The art of deception: Adaptive precision reduction for area efficient physics acceleration," in Proc. MICRO, 2007, pp. 394-406.
-
(2007)
Proc. MICRO
, pp. 394-406
-
-
Yeh, T.1
Faloutsos, P.2
Ercegovac, M.3
Patel, S.4
Reinman, G.5
-
23
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, "On-the-fly elimination of dynamic irregularities for GPU computing," in Proc. ASPLOS, 2011, pp. 369-380.
-
(2011)
Proc. ASPLOS
, pp. 369-380
-
-
Zhang, E.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5