Software: Practice and Experience, Volume 35, Issue 2, 2005, Pages 101-121

Minimizing development and maintenance costs in supporting persistently optimized BLAS

Author keywords

ATLAS; BLAS; Kernel optimization; Linear algebra; Recursive optimization

Indexed keywords

ALGORITHMS; COMPUTER ARCHITECTURE; COMPUTER OPERATING SYSTEMS; COMPUTER SOFTWARE; CONSTRAINT THEORY; COSTS; MATRIX ALGEBRA; OPTIMIZATION; PHOTOGRAPHY; PROBLEM SOLVING

EID: 13244279577; PISSN: 0038-0644; EISSN: None; Source Type: Journal
DOI: 10.1002/spe.626; Document Type: Article
Times cited: 188

References (32)
  • 1
    • Hanson R, Krogh F, Lawson C. A proposal for standard linear algebra subprograms. ACM SIGNUM Newsletter 1973; 8(16).
  • 3
    • Dongarra J, Du Croz J, Hammarling S, Hanson R. Algorithm 656: An extended set of basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software 1988; 14(1):18-32.
  • 6
    • Whaley RC, Dongarra J. Automatically tuned linear algebra software. Technical Report UT-CS-97-366, University of Tennessee, December 1997. Available at: http://www.netlib.org/lapack/lawns/lawn131.ps.
  • 9
    • Whaley RC, Petitet A, Dongarra JJ. Automated empirical optimization of software and the ATLAS project. Parallel Computing 2001; 27(1-2):3-35.
  • 10
    • Whaley RC, Petitet A, Dongarra JJ. Automated empirical optimization of software and the ATLAS project. University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000. Available at: http://www.netlib.org/lapack/lawns/lawn147.ps.
  • 13
    • Frigo M, Johnson SG. The fastest Fourier transform in the West. Technical Report MIT-LCS-TR-728, Massachusetts Institute of Technology, 1997.
  • 17
    • Gustavson F. New generalized data structures for matrices lead to a variety of high performance algorithms. The Architectures for Scientific Software (IFIP Conference Proceedings, vol. 188), Boisvert R, Tang P (eds.), August 2001; 211-234.
  • 18
    • Bacon DF, Graham SL, Sharp OJ. Compiler transformations for high-performance computing. ACM Computing Surveys 1994; 26(4):345-420.
  • 19
    • Elmroth E, Gustavson F, Jonsson I, Kågström B. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review 2004; 46(1):3-45.
  • 21
    • Kågström B, Ling P, van Loan C. GEMM-based Level 3 BLAS: High-performance model implementations and performance evaluation benchmark. Technical Report UMINF 95-18, Department of Computing Science, Umeå University, 1995.
  • 22
    • Kågström B, Ling P, van Loan C. GEMM-based Level 3 BLAS: High performance model implementations and performance evaluation benchmark. ACM Transactions on Mathematical Software 1998; 24(3):268-302.
  • 23
    • Kågström B, Ling P, van Loan C. GEMM-based Level 3 BLAS: High performance model implementations and performance evaluation benchmark. ACM Transactions on Mathematical Software 1998; 24(3):268-302.
  • 24
    • Dayde M, Duff I, Petitet A. A parallel block implementation of Level 3 BLAS for MIMD vector processors. ACM Transactions on Mathematical Software 1994; 20(2):178-193.
  • 26
    • Gustavson F, Henriksson A, Jonsson I, Kågström B, Ling P. Superscalar GEMM-based Level 3 BLAS - the on-going evolution of a portable and high-performance library. Applied Parallel Computing, PARA'98 (Lecture Notes in Computer Science, vol. 1541), Kågström B, Dongarra J, Elmroth E, Waśniewski J (eds.), June 1998; 207-215.
  • 27
    • Toledo S. Locality of reference in LU decomposition with partial pivoting. SIAM Journal on Matrix Analysis and Applications 1997; 18(4):1065-1081.
  • 28
    • Gustavson F. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development 1997; 41(6):737-755.
  • 29
    • Andersen BS, Gustavson FG, Wasniewski J. A recursive formulation of Cholesky factorization of a matrix in packed storage. Technical Report UT CS-00-448, LAPACK Working Note No. 146, University of Tennessee, 2000.
  • 30
    • Elmroth E, Gustavson F. Applying recursion to serial and parallel QR factorization leads to better performance. IBM Journal of Research and Development 2000; 44(4):605-624.
  • 31
    • Inversion problem with TRSM. http://www.cs.utk.edu/~rwhaley/ATLAS/trsm_prob.html [September 2003].


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.