-
2
-
-
0018515759
-
Basic linear algebra subprograms for Fortran usage
-
Lawson C, Hanson R, Kincaid D, Krogh F. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software 1979; 5(3):308-323.
-
(1979)
ACM Transactions on Mathematical Software
, vol.5
, Issue.3
, pp. 308-323
-
-
Lawson, C.1
Hanson, R.2
Kincaid, D.3
Krogh, F.4
-
3
-
-
0023982822
-
Algorithm 656: An extended set of basic linear algebra subprograms: Model implementation and test programs
-
Dongarra J, Du Croz J, Hammarling S, Hanson R. Algorithm 656: An extended set of basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software 1988; 14(1): 18-32.
-
(1988)
ACM Transactions on Mathematical Software
, vol.14
, Issue.1
, pp. 18-32
-
-
Dongarra, J.1
Du Croz, J.2
Hammarling, S.3
Hanson, R.4
-
4
-
-
0023983122
-
An extended set of FORTRAN basic linear algebra subprograms
-
Dongarra J, Du Croz J, Hammarling S, Hanson R. An extended set of FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software 1988; 14(1):1-17.
-
(1988)
ACM Transactions on Mathematical Software
, vol.14
, Issue.1
, pp. 1-17
-
-
Dongarra, J.1
Du Croz, J.2
Hammarling, S.3
Hanson, R.4
-
5
-
-
0025402476
-
A set of level 3 basic linear algebra subprograms
-
Dongarra J, Du Croz J, Duff I, Hammarling S. A set of Level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software 1990; 16(1):1-17.
-
(1990)
ACM Transactions on Mathematical Software
, vol.16
, Issue.1
, pp. 1-17
-
-
Dongarra, J.1
Du Croz, J.2
Duff, I.3
Hammarling, S.4
-
6
-
-
0003418094
-
Automatically tuned linear algebra software
-
University of Tennessee, December
-
Whaley RC, Dongarra J. Automatically tuned linear algebra software. Technical Report UT-CS-97-366, University of Tennessee, December 1997. Available at: http://www.netlib.org/lapack/lawns/lawn131.ps.
-
(1997)
Technical Report
, vol.UT-CS-97-366
-
-
Whaley, R.C.1
Dongarra, J.2
-
9
-
-
0343462141
-
Automated empirical optimization of software and the ATLAS project
-
Whaley RC, Petitet A, Dongarra JJ. Automated empirical optimization of software and the ATLAS project. Parallel Computing 2001; 27(1-2):3-35. Also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000 (http://www.netlib.org/lapack/lawns/lawn147.ps).
-
(2001)
Parallel Computing
, vol.27
, Issue.1-2
, pp. 3-35
-
-
Whaley, R.C.1
Petitet, A.2
Dongarra, J.J.3
-
10
-
-
0343462141
-
-
UT-CS-00-448
-
Whaley RC, Petitet A, Dongarra JJ. Automated empirical optimization of software and the ATLAS project. Parallel Computing 2001; 27(1-2):3-35. Also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000 (http://www.netlib.org/lapack/lawns/lawn147.ps).
-
(2000)
University of Tennessee LAPACK Working Note #147
, vol.147
-
-
-
12
-
-
0030661485
-
Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
-
Vienna, Austria, July
-
Bilmes J, Asanovic K, Chin C, Demmel J. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. Proceedings of the ACM SIGARCH International Conference on Supercomputing, Vienna, Austria, July 1997.
-
(1997)
Proceedings of the ACM SIGARCH International Conference on Supercomputing
-
-
Bilmes, J.1
Asanovic, K.2
Chin, C.3
Demmel, J.4
-
13
-
-
0003533835
-
The fastest Fourier transform in the West
-
Massachusetts Institute of Technology
-
Frigo M, Johnson SG. The fastest Fourier transform in the West. Technical Report MIT-LCS-TR-728, Massachusetts Institute of Technology, 1997.
-
(1997)
Technical Report
, vol.MIT-LCS-TR-728
-
-
Frigo, M.1
Johnson, S.G.2
-
14
-
-
0031636309
-
FFTW: An adaptive software architecture for the FFT
-
May
-
Frigo M, Johnson S. FFTW: An adaptive software architecture for the FFT. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, May 1998; 1381-1384.
-
(1998)
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, vol.3
, pp. 1381-1384
-
-
Frigo, M.1
Johnson, S.2
-
16
-
-
13244250227
-
Spiral: Automatic implementation of signal processing algorithms
-
MIT Lincoln Laboratories: Boston, MA
-
Moura J, Johnson J, Johnson R, Padua D, Püschel M, Veloso M. Spiral: Automatic implementation of signal processing algorithms. Proceedings of the Conference on High-Performance Embedded Computing. MIT Lincoln Laboratories: Boston, MA, 2000.
-
(2000)
Proceedings of the Conference on High-Performance Embedded Computing
-
-
Moura, J.1
Johnson, J.2
Johnson, R.3
Padua, D.4
Püschel, M.5
Veloso, M.6
-
17
-
-
84901913528
-
New generalized data structures for matrices lead to a variety of high performance algorithms
-
Boisvert R and Tang P (eds.), August
-
Gustavson F. New generalized data structures for matrices lead to a variety of high performance algorithms. The Architectures for Scientific Software (IFIP Conference Proceedings, vol. 188), Boisvert R and Tang P (eds.), August 2001; 211-234.
-
(2001)
The Architectures for Scientific Software (IFIP Conference Proceedings)
, vol.188
, pp. 211-234
-
-
Gustavson, F.1
-
18
-
-
0028743437
-
Compiler transformations for high-performance computing
-
Bacon DF, Graham SL, Sharp OJ. Compiler transformations for high-performance computing. ACM Computing Surveys 1994; 26(4):345-420.
-
(1994)
ACM Computing Surveys
, vol.26
, Issue.4
, pp. 345-420
-
-
Bacon, D.F.1
Graham, S.L.2
Sharp, O.J.3
-
19
-
-
1842832833
-
Recursive blocked algorithms and hybrid data structures for dense matrix library software
-
Elmroth E, Gustavson F, Jonsson I, Kågström B. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review 2004; 46(1):3-45.
-
(2004)
SIAM Review
, vol.46
, Issue.1
, pp. 3-45
-
-
Elmroth, E.1
Gustavson, F.2
Jonsson, I.3
Kågström, B.4
-
21
-
-
0040831411
-
GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark
-
Department of Computing Science, Umeå University
-
Kågström B, Ling P, van Loan C. GEMM-based Level 3 BLAS: High-performance model implementations and performance evaluation benchmark. Technical Report UMINF 95-18, Department of Computing Science, Umeå University, 1995.
-
(1995)
Technical Report
, vol.UMINF 95-18
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
22
-
-
0032155271
-
GEMM-based level 3 BLAS: High performance model implementations and performance evaluation benchmark
-
Kågström B, Ling P, van Loan C. GEMM-based Level 3 BLAS: High performance model implementations and performance evaluation benchmark. ACM Transactions on Mathematical Software 1998; 24(3):268-302.
-
(1998)
ACM Transactions on Mathematical Software
, vol.24
, Issue.3
, pp. 268-302
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
24
-
-
0028443077
-
A parallel block implementation of Level 3 BLAS for MIMD vector processors
-
Dayde M, Duff I, Petitet A. A parallel block implementation of Level 3 BLAS for MIMD vector processors. ACM Transactions on Mathematical Software 1994; 20(2): 178-193.
-
(1994)
ACM Transactions on Mathematical Software
, vol.20
, Issue.2
, pp. 178-193
-
-
Dayde, M.1
Duff, I.2
Petitet, A.3
-
25
-
-
84947926251
-
Recursive blocked data formats and BLAS for dense linear algebra algorithms
-
Kågström B, Dongarra J, Elmroth E and Waśniewski J (eds.), June
-
Gustavson F, Henriksson A, Jonsson I, Kågström B, Ling P. Recursive blocked data formats and BLAS for dense linear algebra algorithms. Applied Parallel Computing, PARA'98 (Lecture Notes in Computer Science, vol. 1541), Kågström B, Dongarra J, Elmroth E and Waśniewski J (eds.), June 1998; 195-206.
-
(1998)
Applied Parallel Computing, PARA'98 (Lecture Notes in Computer Science)
, vol.1541
, pp. 195-206
-
-
Gustavson, F.1
Henriksson, A.2
Jonsson, I.3
Kågström, B.4
Ling, P.5
-
26
-
-
84947907655
-
Superscalar GEMM-based level 3 BLAS - The on-going evolution of a portable and high-performance library
-
Kågström B, Dongarra J, Elmroth E and Waśniewski J (eds.), June
-
Gustavson F, Henriksson A, Jonsson I, Kågström B, Ling P. Superscalar GEMM-based Level 3 BLAS - the on-going evolution of a portable and high-performance library. Applied Parallel Computing, PARA'98 (Lecture Notes in Computer Science, vol. 1541), Kågström B, Dongarra J, Elmroth E and Waśniewski J (eds.), June 1998; 207-215.
-
(1998)
Applied Parallel Computing, PARA'98 (Lecture Notes in Computer Science)
, vol.1541
, pp. 207-215
-
-
Gustavson, F.1
Henriksson, A.2
Jonsson, I.3
Kågström, B.4
Ling, P.5
-
27
-
-
0031496750
-
Locality of reference in LU decomposition with partial pivoting
-
Toledo S. Locality of reference in LU decomposition with partial pivoting. SIAM Journal on Matrix Analysis and Applications 1997; 18(4):1065-1081.
-
(1997)
SIAM Journal on Matrix Analysis and Applications
, vol.18
, Issue.4
, pp. 1065-1081
-
-
Toledo, S.1
-
28
-
-
0031273280
-
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
-
Gustavson F. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development 1997; 41(6):737-755.
-
(1997)
IBM Journal of Research and Development
, vol.41
, Issue.6
, pp. 737-755
-
-
Gustavson, F.1
-
29
-
-
0039637901
-
A recursive formulation of Cholesky factorization of a matrix in packed storage
-
LAPACK Working Note No. 146, University of Tennessee
-
Andersen BS, Gustavson FG, Waśniewski J. A recursive formulation of Cholesky factorization of a matrix in packed storage. Technical Report UT CS-00-448, LAPACK Working Note No. 146, University of Tennessee, 2000.
-
(2000)
Technical Report
, vol.UT CS-00-448
-
-
Andersen, B.S.1
Gustavson, F.G.2
Waśniewski, J.3
-
30
-
-
0034224207
-
Applying recursion to serial and parallel QR factorization leads to better performance
-
Elmroth E, Gustavson F. Applying recursion to serial and parallel QR factorization leads to better performance. IBM Journal of Research and Development 2000; 44(4):605-624.
-
(2000)
IBM Journal of Research and Development
, vol.44
, Issue.4
, pp. 605-624
-
-
Elmroth, E.1
Gustavson, F.2
-
31
-
-
13244297349
-
-
[September]
-
Inversion problem with TRSM. http://www.cs.utk.edu/~rwhaley/ATLAS/trsm_prob.html [September 2003].
-
(2003)
Inversion Problem with TRSM
-
-