-
2
-
-
0003666392
-
LAPACK: A portable linear algebra library for high-performance computers
-
May
-
E. Anderson, Z. Bai, C. Bischof, J. W. Demmel, J. J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. C. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. Technical Report 20, LAPACK Working Note, May 1990.
-
(1990)
Technical Report 20, LAPACK Working Note
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Demmel, J.W.4
Dongarra, J.J.5
Du Croz, J.6
Greenbaum, A.7
Hammarling, S.8
McKenney, A.9
Sorensen, D.C.10
-
3
-
-
20744452904
-
Self-adapting linear algebra algorithms and software
-
J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. Whaley, and K. Yelick. Self-adapting linear algebra algorithms and software. In Proceedings of the IEEE, volume 93, pages 293-312, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, pp. 293-312
-
-
Demmel, J.1
Dongarra, J.2
Eijkhout, V.3
Fuentes, E.4
Petitet, A.5
Vuduc, R.6
Whaley, R.7
Yelick, K.8
-
5
-
-
44249094647
-
Anatomy of high-performance matrix multiplication
-
34:12:1-12:25, May
-
K. Goto and R. A. van de Geijn. Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw., 34:12:1-12:25, May 2008.
-
(2008)
ACM Trans. Math. Softw.
-
-
Goto, K.1
van de Geijn, R.A.2
-
7
-
-
68849128792
-
A note on auto-tuning GEMM for GPUs
-
Berlin, Heidelberg, Springer-Verlag
-
Y. Li, J. Dongarra, and S. Tomov. A note on auto-tuning GEMM for GPUs. In Proceedings of the 9th International Conference on Computational Science: Part I, ICCS'09, pages 884-892, Berlin, Heidelberg, 2009. Springer-Verlag.
-
(2009)
Proceedings of the 9th International Conference on Computational Science: Part I, ICCS'09
, pp. 884-892
-
-
Li, Y.1
Dongarra, J.2
Tomov, S.3
-
8
-
-
81555213505
-
A fast GEMM implementation on the Cypress GPU
-
March
-
N. Nakasato. A fast GEMM implementation on the Cypress GPU. SIGMETRICS Perform. Eval. Rev., 38:50-55, March 2011.
-
(2011)
SIGMETRICS Perform. Eval. Rev.
, vol.38
, pp. 50-55
-
-
Nakasato, N.1
-
10
-
-
84886934561
-
-
NVIDIA. CUDA Community Showcase. http://www.nvidia.com/object/cudaappsflashnew.html.
-
CUDA Community Showcase
-
-
-
13
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
New York, NY, USA, ACM
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP'08, pages 73-82, New York, NY, USA, 2008. ACM.
-
(2008)
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'08
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.-M.W.6
-
14
-
-
43449094719
-
Program optimization space pruning for a multithreaded GPU
-
DOI 10.1145/1356058.1356084, Proceedings of the 2008 CGO - Sixth International Symposium on Code Generation and Optimization
-
S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-m. W. Hwu. Program optimization space pruning for a multithreaded GPU. In Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, CGO'08, pages 195-204, New York, NY, USA, 2008. ACM.
-
(2008)
Proceedings of the 2008 CGO - Sixth International Symposium on Code Generation and Optimization
, pp. 195-204
-
-
Ryoo, S.1
Rodrigues, C.I.2
Stone, S.S.3
Baghsorkhi, S.S.4
Ueng, S.-Z.5
Stratton, J.A.6
Hwu, W.-M.W.7
-
15
-
-
70350771131
-
Benchmarking GPUs to tune dense linear algebra
-
pages 31:1-31:11, Piscataway, NJ, USA, IEEE Press
-
V. Volkov and J. W. Demmel. Benchmarking GPUs to tune dense linear algebra. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC'08, pages 31:1-31:11, Piscataway, NJ, USA, 2008. IEEE Press.
-
(2008)
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC'08
-
-
Volkov, V.1
Demmel, J.W.2
-
16
-
-
77952579552
-
Demystifying GPU microarchitecture through microbenchmarking
-
H. Wong, M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos. Demystifying GPU microarchitecture through microbenchmarking. In 2010 IEEE International Symposium on Performance Analysis of Systems & Software, ISPASS'10, pages 235-246, 2010.
-
(2010)
2010 IEEE International Symposium on Performance Analysis of Systems & Software, ISPASS'10
, pp. 235-246
-
-
Wong, H.1
Papadopoulou, M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4
-
17
-
-
20744459570
-
Is search really necessary to generate high-performance BLAS?
-
K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS? In Proceedings of the IEEE, volume 93, pages 358-386, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, pp. 358-386
-
-
Yotov, K.1
Li, X.2
Ren, G.3
Garzaran, M.4
Padua, D.5
Pingali, K.6
Stodghill, P.7