[1] A. Bakhoda, G. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator. IEEE International Symposium on Performance Analysis of Systems and Software, April 2009.
[4] C. K. Luk, Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. International Symposium on Computer Architecture, 2001.
[11] J. D. Collins, H. Wang, D. Tullsen, C. Hughes, Y.-F. Lee, D. Lavery, and J. P. Shen, Speculative precomputation: long range prefetching of delinquent loads. International Symposium on Computer Architecture, 2001.
[12] J. Lee, N. B. Lakshminarayana, H. Kim, and R. Vuduc, Many-thread aware prefetching mechanisms for GPGPU applications. IEEE/ACM International Symposium on Microarchitecture, 2010.
[13] MARSSx86, http://marss86.org/~marss86/index.php/Home
[14] N. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli, High performance discrete Fourier transforms on graphics processors. Proceedings of Supercomputing, 2008.
[19] P. Boudier, Memory System on Fusion APUs - The Benefits of Zero Copy. AMD Fusion Developer Summit, 2011.
[20] P. H. Wang, H. Wang, J. D. Collins, E. Grochowski, R. M. Kling, and J. P. Shen, Memory latency-tolerance approaches for Itanium processors: out-of-order execution vs. speculative precomputation. IEEE International Symposium on High Performance Computer Architecture, 2002.
[21] Sandy Bridge, http://en.wikipedia.org/wiki/Sandy-Bridge.
[23] S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S. Ueng, J. A. Stratton, and W. W. Hwu, Optimization space pruning for a multi-threaded GPU. International Symposium on Code Generation and Optimization, 2008.
[24] Y. Solihin, J. Lee, and J. Torrellas, Using a user-level memory thread for correlation prefetching. International Symposium on Computer Architecture, 2002.