Volume 9, Issue 9, 1997, Pages 837-857

Parallel implementation of BLAS: General techniques for level 3 BLAS

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SOFTWARE; CONCURRENCY CONTROL; INTERFACES (COMPUTER); MATRIX ALGEBRA; PARALLEL ALGORITHMS; PERFORMANCE; SUBROUTINES; TECHNOLOGY;

EID: 0031221523     PISSN: 10403108     EISSN: None     Source Type: Journal    
DOI: 10.1002/(SICI)1096-9128(199709)9:9<837::AID-CPE267>3.0.CO;2-2     Document Type: Article
Times cited : (13)

References (46)
  • 5
    • 0000778168
    • Scalability issues affecting the design of a dense linear algebra library
    • Scalability of Parallel Algorithms
    • J. Dongarra, R. van de Geijn and D. Walker, 'Scalability issues affecting the design of a dense linear algebra library', Special Issue on Scalability of Parallel Algorithms, J. Parallel Distrib. Comput., 22, (3), (1994).
    • (1994) J. Parallel Distrib. Comput. , vol.22 , Issue.3 SPEC. ISSUE
    • Dongarra, J.1    Van De Geijn, R.2    Walker, D.3
  • 7
    • 12444284722
    • Harvard University, Center for Research in Computing Technology, TR-04-92, Jan.
    • W. Lichtenstein and S. L. Johnsson, 'Block-cyclic dense linear algebra', Harvard University, Center for Research in Computing Technology, TR-04-92, Jan. 1992.
    • (1992) Block-cyclic Dense Linear Algebra
    • Lichtenstein, W.1    Johnsson, S.L.2
  • 11
    • 0018515759
    • Basic linear algebra subprograms for Fortran usage
    • C. L. Lawson, R. J. Hanson, D. R. Kincaid and F. T. Krogh, 'Basic linear algebra subprograms for Fortran usage', TOMS, 5, (3), 308-323 (1979).
    • (1979) TOMS , vol.5 , Issue.3 , pp. 308-323
    • Lawson, C.L.1    Hanson, R.J.2    Kincaid, D.R.3    Krogh, F.T.4
  • 12
    • 0023983122
    • An extended set of FORTRAN basic linear algebra subprograms
    • J. J. Dongarra, J. Du Croz, S. Hammarling and R. J. Hanson, 'An extended set of FORTRAN basic linear algebra subprograms', TOMS, 14, (1), 1-17 (1988).
    • (1988) TOMS , vol.14 , Issue.1 , pp. 1-17
    • Dongarra, J.J.1    Du Croz, J.2    Hammarling, S.3    Hanson, R.J.4
  • 13
    • 0025402476
    • A set of Level 3 basic linear algebra subprograms
    • J. J. Dongarra, J. Du Croz, S. Hammarling and I. Duff, 'A set of Level 3 basic linear algebra subprograms', TOMS, 16, (1), 1-16 (1990).
    • (1990) TOMS , vol.16 , Issue.1 , pp. 1-16
    • Dongarra, J.J.1    Du Croz, J.2    Hammarling, S.3    Duff, I.4
  • 15
    • 0031123769
    • TR-95-13, Department of Computer Sciences, University of Texas, April
    • R. van de Geijn and J. Watts, 'SUMMA: Scalable universal matrix multiplication algorithm', TR-95-13, Department of Computer Sciences, University of Texas, April 1995. Also: LAPACK Working Note 96, May 1, Concurrency: Pract. Exp., 9, (4), 255-274 (1997).
    • (1995) SUMMA: Scalable Universal Matrix Multiplication Algorithm
    • Van De Geijn, R.1    Watts, J.2
  • 16
    • 0031123769
    • Also: LAPACK Working Note 96, May 1
    • R. van de Geijn and J. Watts, 'SUMMA: Scalable universal matrix multiplication algorithm', TR-95-13, Department of Computer Sciences, University of Texas, April 1995. Also: LAPACK Working Note 96, May 1, Concurrency: Pract. Exp., 9, (4), 255-274 (1997).
    • (1997) Concurrency: Pract. Exp. , vol.9 , Issue.4 , pp. 255-274
  • 17
    • 0028530654
    • PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
    • J. Choi, J. J. Dongarra and D. W. Walker, 'PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers', Concurrency: Pract. Exp., 6, (7), 543-570 (1994).
    • (1994) Concurrency: Pract. Exp. , vol.6 , Issue.7 , pp. 543-570
    • Choi, J.1    Dongarra, J.J.2    Walker, D.W.3
  • 26
    • 0029312007
    • A pipelined broadcast for multidimensional meshes
    • J. Watts and R. van de Geijn, 'A pipelined broadcast for multidimensional meshes', Parallel Process. Lett., 5, (2), 281-292 (1995).
    • (1995) Parallel Process. Lett. , vol.5 , Issue.2 , pp. 281-292
    • Watts, J.1    Van De Geijn, R.2
  • 29
    • 0000262001
    • On parallelizable eigensolvers
    • L. Auslander and A. Tsao, 'On parallelizable eigensolvers', Adv. Appl. Math., 13, 253-261, (1992).
    • (1992) Adv. Appl. Math. , vol.13 , pp. 253-261
    • Auslander, L.1    Tsao, A.2
  • 30
    • 0001175581
    • Design of a parallel nonsymmetric eigenroutine toolbox, Part I
    • R. Sincovec, D. Keyes, M. Leuze, L. Petzold and D. Reed (Eds.), SIAM Publications, Philadelphia, PA
    • Z. Bai and J. Demmel, 'Design of a parallel nonsymmetric eigenroutine toolbox, Part I', Parallel Processing for Scientific Computing, R. Sincovec, D. Keyes, M. Leuze, L. Petzold and D. Reed (Eds.), SIAM Publications, Philadelphia, PA, 1993, pp. 391-398.
    • (1993) Parallel Processing for Scientific Computing , pp. 391-398
    • Bai, Z.1    Demmel, J.2
  • 32
    • 0039122444
    • A parallelizable eigensolver for real diagonalizable matrices with real eigenvalues
    • Supercomputing Research Center
    • S. Lederman, A. Tsao and T. Turnbull, 'A parallelizable eigensolver for real diagonalizable matrices with real eigenvalues', Technical Report TR-91-042, Supercomputing Research Center, 1991.
    • (1991) Technical Report TR-91-042
    • Lederman, S.1    Tsao, A.2    Turnbull, T.3
  • 34
    • 85033305549
    • A high performance parallel Strassen implementation
    • to be published
    • B. Grayson and R. van de Geijn, 'A high performance parallel Strassen implementation', Parallel Process. Lett., to be published.
    • Parallel Process. Lett.
    • Grayson, B.1    Van De Geijn, R.2
  • 35
    • 0028545949
    • A high performance matrix multiplication algorithm on a distributed-memory parallel computer, using overlapped communication
    • R. C. Agarwal, F. G. Gustavson and M. Zubair, 'A high performance matrix multiplication algorithm on a distributed-memory parallel computer, using overlapped communication', IBM J. Res. Dev., 673-681 (1994).
    • (1994) IBM J. Res. Dev. , pp. 673-681
    • Agarwal, R.C.1    Gustavson, F.G.2    Zubair, M.3
  • 40
    • 0023288009
    • Matrix algorithms on a hypercube I: Matrix multiplication
    • G. Fox, S. Otto and A. Hey, 'Matrix algorithms on a hypercube I: Matrix multiplication', Parallel Comput., 3, 17-31 (1987).
    • (1987) Parallel Comput. , vol.3 , pp. 17-31
    • Fox, G.1    Otto, S.2    Hey, A.3
  • 43
    • 12444336579
    • Level 2 and 3 BLAS routines for the IBM 3090 VF/400: Implementation and experiences
    • Information Processing, University of Umeå, S-901 87 Umeå, Sweden
    • B. Kågström and P. Ling, 'Level 2 and 3 BLAS routines for the IBM 3090 VF/400: Implementation and experiences', Technical Report UMINF-154.88, Information Processing, University of Umeå, S-901 87 Umeå, Sweden, 1988.
    • (1988) Technical Report UMINF-154.88
    • Kågström, B.1    Ling, P.2
  • 44
    • 85033287874
    • Implementing matrix-vector multiplication and conjugate gradient algorithms on distributed memory multicomputers
    • J. G. Lewis and R. A. van de Geijn, 'Implementing matrix-vector multiplication and conjugate gradient algorithms on distributed memory multicomputers', Supercomputing '93.
    • Supercomputing '93
    • Lewis, J.G.1    Van De Geijn, R.A.2
  • 46
    • 0026973156
    • A matrix product algorithm and its comparative performance on hypercubes
    • Q. Stout and M. Wolfe (eds.), IEEE Press, Los Alamitos, CA
    • C. Lin and L. Snyder, 'A matrix product algorithm and its comparative performance on hypercubes', in Proceedings of Scalable High Performance Computing Conference, Q. Stout and M. Wolfe (eds.), IEEE Press, Los Alamitos, CA, 1992, pp. 190-193.
    • (1992) Proceedings of Scalable High Performance Computing Conference , pp. 190-193
    • Lin, C.1    Snyder, L.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.