-
1
-
-
84876947481
-
-
Asfermi. http://code.google.com/p/asfermi/.
-
Asfermi
-
-
-
2
-
-
84876915752
-
-
Netlib. http://www.netlib.org/blas/.
-
-
-
-
3
-
-
33645207150
-
-
Nvidia. Visual Profiler, https://developer.nvidia.com/nvidia-visual-profiler.
-
Visual Profiler
-
-
-
4
-
-
0025028257
-
The Tera computer system
-
New York, NY, USA. ACM
-
R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and B. Smith. The Tera computer system. In Proceedings of the 4th International Conference on Supercomputing, ICS '90, New York, NY, USA, 1990. ACM.
-
(1990)
Proceedings of the 4th International Conference on Supercomputing, ICS '90
-
-
Alverson, R.1
Callahan, D.2
Cummings, D.3
Koblenz, B.4
Porterfield, A.5
Smith, B.6
-
5
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
April
-
A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on, April 2009.
-
(2009)
Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on
-
-
Bakhoda, A.1
Yuan, G.2
Fung, W.3
Wong, H.4
Aamodt, T.5
-
6
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
New York, NY, USA. ACM
-
S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09, New York, NY, USA, 2009. ACM.
-
(2009)
Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09
-
-
Hong, S.1
Kim, H.2
-
7
-
-
84867310932
-
Autotuning GEMM kernels for the Fermi GPU
-
PP
-
J. Kurzak, S. Tomov, and J. Dongarra. Autotuning GEMM kernels for the Fermi GPU. Parallel and Distributed Systems, IEEE Transactions on, PP(99):1, 2012.
-
(2012)
Parallel and Distributed Systems, IEEE Transactions on
, Issue.99
, pp. 1
-
-
Kurzak, J.1
Tomov, S.2
Dongarra, J.3
-
8
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
Apr
-
M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. SIGPLAN Not., 26(4):63-74, Apr. 1991.
-
(1991)
SIGPLAN Not.
, vol.26
, Issue.4
, pp. 63-74
-
-
Lam, M.D.1
Rothberg, E.E.2
Wolf, M.E.3
-
9
-
-
84945709131
-
Organizing matrices and matrix operations for paged memory systems
-
Mar
-
A. C. McKellar and E. G. Coffman, Jr. Organizing matrices and matrix operations for paged memory systems. Commun. ACM, 12(3):153-165, Mar. 1969.
-
(1969)
Commun. ACM
, vol.12
, Issue.3
, pp. 153-165
-
-
McKellar, A.C.1
Coffman Jr., E.G.2
-
10
-
-
83155184571
-
Grophecy: GPU performance projection from CPU code skeletons
-
New York, NY, USA. ACM
-
J. Meng, V. A. Morozov, K. Kumaran, V. Vishwanath, and T. D. Uram. Grophecy: GPU performance projection from CPU code skeletons. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11
-
-
Meng, J.1
Morozov, V.A.2
Kumaran, K.3
Vishwanath, V.4
Uram, T.D.5
-
13
-
-
84876900901
-
-
NVIDIA. Fermi Whitepaper. http://www.nvidia.com/content/PDF/Fermi-white-papers/NVIDIA-Fermi-Compute-Architecture-Whitepaper.pdf, 2009.
-
(2009)
Fermi Whitepaper
-
-
-
14
-
-
84876891538
-
-
NVIDIA
-
NVIDIA. GTX680 Whitepaper. http://www.geforce.com/Active/en-US/en-US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf, 2012.
-
(2012)
GTX680 Whitepaper
-
-
-
16
-
-
43449094719
-
Program optimization space pruning for a multithreaded GPU
-
New York, NY, USA. ACM
-
S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-M. W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO '08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, New York, NY, USA, 2008. ACM.
-
(2008)
CGO '08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization
-
-
Ryoo, S.1
Rodrigues, C.I.2
Stone, S.S.3
Baghsorkhi, S.S.4
Ueng, S.-Z.5
Stratton, J.A.6
Hwu, W.-M.W.7
-
17
-
-
84863347222
-
A performance analysis framework for identifying potential benefits in GPGPU applications
-
New York, NY, USA. ACM
-
J. Sim, A. Dasgupta, H. Kim, and R. Vuduc. A performance analysis framework for identifying potential benefits in GPGPU applications. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, New York, NY, USA, 2012. ACM.
-
(2012)
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12
-
-
Sim, J.1
Dasgupta, A.2
Kim, H.3
Vuduc, R.4
-
18
-
-
83155160943
-
Fast implementation of DGEMM on Fermi GPU
-
New York, NY, USA. ACM
-
G. Tan, L. Li, S. Triechle, E. Phillips, Y. Bao, and N. Sun. Fast implementation of DGEMM on Fermi GPU. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 35:1-35:11, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11
, pp. 35:1-35:11
-
-
Tan, G.1
Li, L.2
Triechle, S.3
Phillips, E.4
Bao, Y.5
Sun, N.6
-
19
-
-
65949107549
-
Roofline: An insightful visual performance model for multicore architectures
-
Apr
-
S. Williams, A. Waterman, and D. Patterson. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM, 52(4), Apr. 2009.
-
(2009)
Commun. ACM
, vol.52
, Issue.4
-
-
Williams, S.1
Waterman, A.2
Patterson, D.3