



Volume 24, Issue 3, 1998, Pages 268-302

GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark

Author keywords

Language classifications: Fortran 77; Mathematical Software: benchmarking; Numerical Algorithms and Problems: computation on matrices; Numerical Linear Algebra: linear systems (direct and iterative methods)

Indexed keywords

BENCHMARKING; COMPUTATIONAL METHODS; COMPUTER SOFTWARE PORTABILITY; EFFICIENCY; FORTRAN (PROGRAMMING LANGUAGE); MATRIX ALGEBRA;

EID: 0032155271     PISSN: 00983500     EISSN: None     Source Type: Journal    
DOI: 10.1145/292395.292412     Document Type: Article
Times cited: 143

References (27)
  • 1
    • 0028513316
    • Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
    • Sept.
    • AGARWAL, R., GUSTAVSON, F., AND ZUBAIR, Z. 1994a. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM J. Res. Develop. 38, 5 (Sept.), 563-576.
    • (1994) IBM J. Res. Develop. , vol.38 , Issue.5 , pp. 563-576
    • Agarwal, R.1    Gustavson, F.2    Zubair, Z.3
  • 2
    • 0028427170
    • Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetching
    • May
    • AGARWAL, R., GUSTAVSON, F., AND ZUBAIR, Z. 1994b. Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetching. IBM J. Res. Develop. 38, 3 (May), 265-275.
    • (1994) IBM J. Res. Develop. , vol.38 , Issue.3 , pp. 265-275
    • Agarwal, R.1    Gustavson, F.2    Zubair, Z.3
  • 4
    • 0031223129
    • Compiler blockability of dense matrix factorizations
    • Sept.
    • CARR, S. AND LEHOUCQ, R. 1997. Compiler blockability of dense matrix factorizations. ACM Trans. Math. Softw. 23, 3 (Sept.), 336-361.
    • (1997) ACM Trans. Math. Softw. , vol.23 , Issue.3 , pp. 336-361
    • Carr, S.1    Lehoucq, R.2
  • 5
    • 0346864565
    • Design issues and the performance of level 1 and level 2 kernels on Intel i860-based platforms
    • Department of Computing Science, Umeå University, Umeå, Sweden
    • DACKLAND, K. 1995. Design issues and the performance of level 1 and level 2 kernels on Intel i860-based platforms. Report UMINF-95.xx, Department of Computing Science, Umeå University, Umeå, Sweden.
    • (1995) Report UMINF-95.xx
    • Dackland, K.1
  • 6
    • 0028443077
    • A parallel block implementation of level-3 BLAS for MIMD vector processors
    • June
    • DAYDÉ, M. J., DUFF, I. S., AND PETITET, A. 1994. A parallel block implementation of level-3 BLAS for MIMD vector processors. ACM Trans. Math. Softw. 20, 2 (June), 178-193.
    • (1994) ACM Trans. Math. Softw. , vol.20 , Issue.2 , pp. 178-193
    • Daydé, M.J.1    Duff, I.S.2    Petitet, A.3
  • 9
    • 0025401417
    • Algorithm 679: A set of level 3 Basic Linear Algebra Subprograms: Model implementation and test programs
    • Mar.
    • DONGARRA, J., DUCROZ, J., DUFF, I., AND HAMMARLING, S. 1990b. Algorithm 679: A set of level 3 Basic Linear Algebra Subprograms: Model implementation and test programs. ACM Trans. Math. Software 16, 1 (Mar.), 18-28.
    • (1990) ACM Trans. Math. Software , vol.16 , Issue.1 , pp. 18-28
    • Dongarra, J.1    DuCroz, J.2    Duff, I.3    Hammarling, S.4
  • 10
    • 0040354150
    • The IBM RISC System 6000 and linear algebra operations
    • DONGARRA, J., MAYES, P., AND RADICATI DI BROZOLO, G. 1991. The IBM RISC System 6000 and linear algebra operations. Supercomput. 8, 4, 15-30.
    • (1991) Supercomput. , vol.8 , Issue.4 , pp. 15-30
    • Dongarra, J.1    Mayes, P.2    Radicati Di Brozolo, G.3
  • 11
    • 0002663082
    • GEMMV: A portable level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm
    • DOUGLAS, C., HEROUX, M., SLISHMAN, G., AND SMITH, R. 1994. GEMMV: A portable level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply algorithm. J. Comput. Phys. 110, 1-10.
    • (1994) J. Comput. Phys. , vol.110 , pp. 1-10
    • Douglas, C.1    Heroux, M.2    Slishman, G.3    Smith, R.4
  • 12
    • 84972622535
    • Impact of hierarchical memory systems on linear algebra algorithm design
    • GALLIVAN, K., JALBY, W., MEIER, U., AND SAMEH, A. 1988. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2, 12-48.
    • (1988) Int. J. Supercomput. Appl. , vol.2 , pp. 12-48
    • Gallivan, K.1    Jalby, W.2    Meier, U.3    Sameh, A.4
  • 14
    • 0348125138
    • Working Note (April), Department of Mathematics, University of Manchester, Manchester, UK
    • GREEN, M. 1994. High performance level 3 BLAS. A KSR implementation. Working Note (April), Department of Mathematics, University of Manchester, Manchester, UK.
    • (1994) High Performance Level 3 BLAS. A KSR Implementation
    • Green, M.1
  • 15
    • 0025637437
    • Exploiting fast matrix multiplication within the level 3 BLAS
    • HIGHAM, N. 1990. Exploiting fast matrix multiplication within the level 3 BLAS. ACM Trans. Math. Softw. 16, 4, 352-368.
    • (1990) ACM Trans. Math. Softw. , vol.16 , Issue.4 , pp. 352-368
    • Higham, N.1
  • 17
    • 0040000454
    • Technical Report. 312936-001 (Oct.), Intel Supercomputer Division. Beaverton, Ore.
    • INTEL. 1993. Paragon Basic Math Library performance report. Technical Report. 312936-001 (Oct.), Intel Supercomputer Division. Beaverton, Ore.
    • (1993) Paragon Basic Math Library Performance Report
  • 18
    • 10844292223
    • Technical Report CTC91TR47 (Dec.), Department of Computer Science, Cornell University
    • KÅGSTRÖM, B. AND VAN LOAN, C. 1989. GEMM-based level 3 BLAS. Technical Report CTC91TR47 (Dec.), Department of Computer Science, Cornell University.
    • (1989) GEMM-based Level 3 BLAS
    • Kågström, B.1    Van Loan, C.2
  • 19
    • 0346234145
    • High performance GEMM-based level 3 BLAS: Sample routines for double precision real data
    • (Amsterdam, 1991). North-Holland
    • KÅGSTRÖM, B., LING, P., AND VAN LOAN, C. 1991. High performance GEMM-based level 3 BLAS: Sample routines for double precision real data. In High Performance Computing II (Amsterdam, 1991). North-Holland, 269-281.
    • (1991) High Performance Computing , vol.2 , pp. 269-281
    • Kågström, B.1    Ling, P.2    Van Loan, C.3
  • 20
    • 10844275231
    • Portable high performance GEMM-based level 3 BLAS
    • (Philadelphia, 1993). SIAM Publications
    • KÅGSTRÖM, B., LING, P., AND VAN LOAN, C. 1993. Portable high performance GEMM-based level 3 BLAS. In Parallel Processing for Scientific Computing (Philadelphia, 1993). SIAM Publications, 339-346.
    • (1993) Parallel Processing for Scientific Computing , pp. 339-346
    • Kågström, B.1    Ling, P.2    Van Loan, C.3
  • 22
    • 0032155342
    • Algorithm 784: GEMM-based level 3 BLAS: Portability and optimization issues
    • This issue
    • KÅGSTRÖM, B., LING, P., AND VAN LOAN, C. 1998. Algorithm 784: GEMM-based level 3 BLAS: Portability and optimization issues. ACM Trans. Math. Software. This issue.
    • (1998) ACM Trans. Math. Software
    • Kågström, B.1    Ling, P.2    Van Loan, C.3
  • 24
    • 0027656965
    • A set of high performance level-3 BLAS structured and tuned for the IBM 3090 VF and implemented in Fortran 77
    • Sept.
    • LING, P. 1993. A set of high performance level-3 BLAS structured and tuned for the IBM 3090 VF and implemented in Fortran 77. J. Supercomput. 7, 3 (Sept.), 323-355.
    • (1993) J. Supercomput. , vol.7 , Issue.3 , pp. 323-355
    • Ling, P.1
  • 25
    • 0347495130
    • Implementation of the level 2 and 3 BLAS on the CRAY Y-MP and the CRAY-2
    • Feb.
    • SHEIK, Q., PHUONG, V., CHAO, Y., AND MERCHANT, M. 1992. Implementation of the level 2 and 3 BLAS on the CRAY Y-MP and the CRAY-2. J. Supercomput. 5, 4 (Feb.), 291-305.
    • (1992) J. Supercomput. , vol.5 , Issue.4 , pp. 291-305
    • Sheik, Q.1    Phuong, V.2    Chao, Y.3    Merchant, M.4
  • 26
    • 34250487811
    • Gaussian elimination is not optimal
    • STRASSEN, V. 1969. Gaussian elimination is not optimal. Numer. Math. 13, 354-356.
    • (1969) Numer. Math. , vol.13 , pp. 354-356
    • Strassen, V.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.