SCOPUS Information Search Platform

Concurrency Practice and Experience

Volume 9, Issue 9, 1997, Pages 837-857

Parallel implementation of BLAS: General techniques for level 3 BLAS

(5) Chtchelkanova, Almadena ᵃ; Gunnels, John ᵃ; Morrow, Greg ᵃ; Overfelt, James ᵃ; Van De Geijn, Robert A. ᵃ

ᵃ Department of Computer Sciences ^* (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SOFTWARE; CONCURRENCY CONTROL; INTERFACES (COMPUTER); MATRIX ALGEBRA; PARALLEL ALGORITHMS; PERFORMANCE; SUBROUTINES; TECHNOLOGY;

BASIC LINEAR ALGEBRA SUBPROGRAM; INTEL PARAGON SYSTEM; MATRIX MATRIX OPERATION; PARALLEL MATRIX MATRIX MULTIPLICATION;

PARALLEL PROCESSING SYSTEMS;

EID: 0031221523
PISSN: 10403108
EISSN: None
Source Type: Journal
DOI: 10.1002/(SICI)1096-9128(199709)9:9<837::AID-CPE267>3.0.CO;2-2
Document Type: Article

Times cited : (13)

References (46)

1
- 0242343480
- LAPACK for distributed memory architectures: Progress report
- SIAM, Philadelphia
- E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau and R. van de Geijn, 'LAPACK for distributed memory architectures: progress report', in Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, SIAM, Philadelphia, 1992, pp. 625-630.
- (1992) Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing , pp. 625-630
- Anderson, E.¹ Benzoni, A.² Dongarra, J.³ Moulton, S.⁴ Ostrouchov, S.⁵ Tourancheau, B.⁶ Van De Geijn, R.⁷

2
- 0002924772
- ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers
- IEEE Comput. Soc. Press
- J. Choi, J. J. Dongarra, R. Pozo and D. W. Walker, 'ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers', Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation, IEEE Comput. Soc. Press, 1992, pp. 120-127.
- (1992) Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation , pp. 120-127
- Choi, J.¹ Dongarra, J.J.² Pozo, R.³ Walker, D.W.⁴

3
- 10444267815
- LAPACK for distributed memory architectures: The next generation
- Norfolk, March
- J. Demmel, J. Dongarra, R. van de Geijn and D. Walker, 'LAPACK for distributed memory architectures: The next generation', in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, Norfolk, March 1993.
- (1993) Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing
- Demmel, J.¹ Dongarra, J.² Van De Geijn, R.³ Walker, D.⁴

4
- 12444274390
- A parallel dense linear solve library routine
- Dallas, Oct.
- J. Dongarra and R. van de Geijn, 'A parallel dense linear solve library routine', in Proceedings of the 1992 Intel Supercomputer Users' Group Meeting, Dallas, Oct. 1992.
- (1992) Proceedings of the 1992 Intel Supercomputer Users' Group Meeting
- Dongarra, J.¹ Van De Geijn, R.²

5
- 0000778168
- Scalability issues affecting the design of a dense linear algebra library
- Scalability of Parallel Algorithms
- J. Dongarra, R. van de Geijn and D. Walker, 'Scalability issues affecting the design of a dense linear algebra library', Special Issue on Scalability of Parallel Algorithms, J. Parallel Distrib. Comput., 22, (3), (1994).
- (1994) J. Parallel Distrib. Comput. , vol.22 , Issue.3 SPEC. ISSUE
- Dongarra, J.¹ Van De Geijn, R.² Walker, D.³

6
- 0003506603
- Prentice Hall, Englewood Cliffs, N.J.
- G. C. Fox, M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon and D. W. Walker, Solving Problems on Concurrent Processors, Vol. 1, Prentice Hall, Englewood Cliffs, N.J., 1988.
- (1988) Solving Problems on Concurrent Processors , vol.1
- Fox, G.C.¹ Johnson, M.A.² Lyzenga, G.A.³ Otto, S.W.⁴ Salmon, J.K.⁵ Walker, D.W.⁶

7
- 12444284722
- Harvard University, Center for Research in Computing Technology, TR-04-92, Jan.
- W. Lichtenstein and S. L. Johnsson, 'Block-cyclic dense linear algebra', Harvard University, Center for Research in Computing Technology, TR-04-92, Jan. 1992.
- (1992) Block-cyclic Dense Linear Algebra
- Lichtenstein, W.¹ Johnsson, S.L.²

8
- 0042839461
- TR-91-28, Department of Computer Sciences, University of Texas, Aug.
- R. van de Geijn, 'Massively parallel LINPACK benchmark on the Intel Touchstone DELTA and iPSC/860 systems: Preliminary report', TR-91-28, Department of Computer Sciences, University of Texas, Aug. 1991.
- (1991) Massively Parallel LINPACK Benchmark on the Intel Touchstone DELTA and iPSC/860 Systems: Preliminary Report
- Van De Geijn, R.¹

9
- 0025536635
- LAPACK: A portable linear algebra library for high performance computers
- IEEE Press
- E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum, S. Hammarling, A. McKenney and D. Sorensen, 'LAPACK: A portable linear algebra library for high performance computers', Proceedings of Supercomputing '90, IEEE Press, 1990, pp. 1-10.
- (1990) Proceedings of Supercomputing '90 , pp. 1-10
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Demmel, J.⁴ Dongarra, J.⁵ DuCroz, J.⁶ Greenbaum, A.⁷ Hammarling, S.⁸ McKenney, A.⁹ Sorensen, D.¹⁰

10
- 0003706460
- SIAM, Philadelphia
- E. Anderson, Z. Bai, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov and D. Sorensen, LAPACK Users' Guide, SIAM, Philadelphia, 1992.
- (1992) LAPACK Users' Guide
- Anderson, E.¹ Bai, Z.² Demmel, J.³ Dongarra, J.⁴ DuCroz, J.⁵ Greenbaum, A.⁶ Hammarling, S.⁷ McKenney, A.⁸ Ostrouchov, S.⁹ Sorensen, D.¹⁰

11
- 0018515759
- Basic linear algebra subprograms for Fortran usage
- C. L. Lawson, R. J. Hanson, D. R. Kincaid and F. T. Krogh, 'Basic linear algebra subprograms for Fortran usage', TOMS, 5, (3), 308-323 (1979).
- (1979) TOMS , vol.5 , Issue.3 , pp. 308-323
- Lawson, C.L.¹ Hanson, R.J.² Kincaid, D.R.³ Krogh, F.T.⁴

12
- 0023983122
- An extended set of FORTRAN basic linear algebra subprograms
- J. J. Dongarra, J. Du Croz, S. Hammarling and R. J. Hanson, 'An extended set of FORTRAN basic linear algebra subprograms', TOMS, 14, (1), 1-17 (1988).
- (1988) TOMS , vol.14 , Issue.1 , pp. 1-17
- Dongarra, J.J.¹ Du Croz, J.² Hammarling, S.³ Hanson, R.J.⁴

13
- 0025402476
- A set of Level 3 basic linear algebra subprograms
- J. J. Dongarra, J. Du Croz, S. Hammarling and I. Duff, 'A set of Level 3 basic linear algebra subprograms', TOMS, 16, (1), 1-16 (1990).
- (1990) TOMS , vol.16 , Issue.1 , pp. 1-16
- Dongarra, J.J.¹ Du Croz, J.² Hammarling, S.³ Duff, I.⁴

14
- 0003978709
- LAPACK Working Note 100, University of Tennessee, CS-95-292, May
- J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker and R. C. Whaley, 'A proposal for a set of parallel basic linear algebra subprograms', LAPACK Working Note 100, University of Tennessee, CS-95-292, May 1995.
- (1995) A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
- Choi, J.¹ Dongarra, J.² Ostrouchov, S.³ Petitet, A.⁴ Walker, D.⁵ Whaley, R.C.⁶

15
- 0031123769
- TR-95-13, Department of Computer Sciences, University of Texas, April
- R. van de Geijn and J. Watts, 'SUMMA: Scalable universal matrix multiplication algorithm', TR-95-13, Department of Computer Sciences, University of Texas, April 1995. Also: LAPACK Working Note 96, May 1, Concurrency: Pract. Exp., 9, (4), 255-274 (1997).
- (1995) SUMMA: Scalable Universal Matrix Multiplication Algorithm
- Van De Geijn, R.¹ Watts, J.²

16
- 0031123769
- Also: LAPACK Working Note 96, May 1
- R. van de Geijn and J. Watts, 'SUMMA: Scalable universal matrix multiplication algorithm', TR-95-13, Department of Computer Sciences, University of Texas, April 1995. Also: LAPACK Working Note 96, May 1, Concurrency: Pract. Exp., 9, (4), 255-274 (1997).
- (1997) Concurrency: Pract. Exp. , vol.9 , Issue.4 , pp. 255-274

17
- 0028530654
- PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
- J. Choi, J. J. Dongarra and D. W. Walker, 'PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers', Concurrency: Pract. Exp., 6, (7), 543-570 (1994).
- (1994) Concurrency: Pract. Exp. , vol.6 , Issue.7 , pp. 543-570
- Choi, J.¹ Dongarra, J.J.² Walker, D.W.³

18
- 0037970044
- Comparison of scalable parallel matrix multiplication libraries
- Starkville, MS, Oct.
- S. Huss-Lederman, E. Jacobson and A. Tsao, 'Comparison of scalable parallel matrix multiplication libraries', in Proceedings of the Scalable Parallel Libraries Conference, Starkville, MS, Oct. 1993.
- (1993) Proceedings of the Scalable Parallel Libraries Conference
- Huss-Lederman, S.¹ Jacobson, E.² Tsao, A.³

19
- 0028529387
- Matrix multiplication on the Intel Touchstone DELTA
- S. Huss-Lederman, E. Jacobson, A. Tsao and G. Zhang, 'Matrix multiplication on the Intel Touchstone DELTA', Concurrency: Pract. Exp., 6, (7), 571-594 (1994).
- (1994) Concurrency: Pract. Exp. , vol.6 , Issue.7 , pp. 571-594
- Huss-Lederman, S.¹ Jacobson, E.² Tsao, A.³ Zhang, G.⁴

20
- 12444266489
- GEMM-based level-3 BLAS, 1991
- January
- B. Kågström and C. F. Van Loan, GEMM-based level-3 BLAS, 1991. Theory Center Technical Report, January 1991.
- (1991) Theory Center Technical Report
- Kågström, B.¹ Van Loan, C.F.²

21
- 14744301600
- Fast collective communication libraries, please
- P. Mitra, D. Payne, L. Shuler, R. van de Geijn and J. Watts, 'Fast collective communication libraries, please', in the Proceedings of the Intel Supercomputing Users' Group Meeting 1995.
- Proceedings of the Intel Supercomputing Users' Group Meeting 1995
- Mitra, P.¹ Payne, D.² Shuler, L.³ Van De Geijn, R.⁴ Watts, J.⁵

22
- 12444294511
- IBM T.J. Watson Research Center
- R. C. Agarwal, F. G. Gustavson, S. M. Balle, M. Joshi and P. Palkar, 'A high performance matrix multiplication algorithm for MPPs', IBM T.J. Watson Research Center, 1995.
- (1995) A High Performance Matrix Multiplication Algorithm for MPPs
- Agarwal, R.C.¹ Gustavson, F.G.² Balle, S.M.³ Joshi, M.⁴ Palkar, P.⁵

23
- 85033307657
- TR-96-09, Department of Computer Sciences, University of Texas, May
- A. Chtchelkanova, C. Edwards, J. Gunnels, G. Morrow, J. Overfelt and R. A. van de Geijn, 'Towards usable and lean parallel linear algebra libraries', TR-96-09, Department of Computer Sciences, University of Texas, May 1996.
- (1996) Towards Usable and Lean Parallel Linear Algebra Libraries
- Chtchelkanova, A.¹ Edwards, C.² Gunnels, J.³ Morrow, G.⁴ Overfelt, J.⁵ Van De Geijn, R.A.⁶

24
- 0022909361
- Distributed routing algorithms for broadcasting and personalized communication in hypercubes
- IEEE
- C.-T. Ho and S. L. Johnsson, 'Distributed routing algorithms for broadcasting and personalized communication in hypercubes', in Proceedings of the 1986 International Conference on Parallel Processing, IEEE, 1986, pp. 640-648.
- (1986) Proceedings of the 1986 International Conference on Parallel Processing , pp. 640-648
- Ho, C.-T.¹ Johnsson, S.L.²

25
- 0004435844
- On global combine operations
- R. van de Geijn, 'On global combine operations', J. Parallel Distrib. Comput., 22, 324-328 (1994).
- (1994) J. Parallel Distrib. Comput. , vol.22 , pp. 324-328
- Van De Geijn, R.¹

26
- 0029312007
- A pipelined broadcast for multidimensional meshes
- J. Watts and R. van de Geijn, 'A pipelined broadcast for multidimensional meshes', Parallel Process. Lett., 5, (2), 281-292 (1995).
- (1995) Parallel Process. Lett. , vol.5 , Issue.2 , pp. 281-292
- Watts, J.¹ Van De Geijn, R.²

27
- 4744342117
- TR-95-40, Department of Computer Sciences, University of Texas, Oct.
- A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt and R. A. van de Geijn, 'Parallel implementation of BLAS: General techniques for Level 3 BLAS', TR-95-40, Department of Computer Sciences, University of Texas, Oct. 1995.
- (1995) Parallel Implementation of BLAS: General Techniques for Level 3 BLAS
- Chtchelkanova, A.¹ Gunnels, J.² Morrow, G.³ Overfelt, J.⁴ Van De Geijn, R.A.⁵

28
- 12444256113
- Department of Computer Sciences, UT-Austin, Report TR95-39, Oct.
- C. Edwards, P. Geng, A. Patra, and R. van de Geijn, 'Parallel matrix distributions: have we been doing it all wrong?', Department of Computer Sciences, UT-Austin, Report TR95-39, Oct. 1995.
- (1995) Parallel Matrix Distributions: Have We Been Doing It All Wrong?
- Edwards, C.¹ Geng, P.² Patra, A.³ Van De Geijn, R.⁴

29
- 0000262001
- On parallelizable eigensolvers
- L. Auslander and A. Tsao, 'On parallelizable eigensolvers', Adv. Appl. Math., 13, 253-261 (1992).
- (1992) Adv. Appl. Math. , vol.13 , pp. 253-261
- Auslander, L.¹ Tsao, A.²

30
- 0001175581
- Design of a parallel nonsymmetric eigenroutine toolbox, Part I
- R. Sincovec, D. Keyes, M. Leuze, L. Petzold and D. Reed (Eds.), SIAM Publications, Philadelphia, PA
- Z. Bai and J. Demmel, 'Design of a parallel nonsymmetric eigenroutine toolbox, Part I', Parallel Processing for Scientific Computing, R. Sincovec, D. Keyes, M. Leuze, L. Petzold and D. Reed (Eds.), SIAM Publications, Philadelphia, PA, 1993, pp. 391-398.
- (1993) Parallel Processing for Scientific Computing , pp. 391-398
- Bai, Z.¹ Demmel, J.²

31
- 0040176467
- LAPACK working note 91, University of Tennessee, Jan.
- Z. Bai, J. Demmel, J. Dongarra, A. Petitet, H. Robinson and K. Stanley, The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers, LAPACK working note 91, University of Tennessee, Jan. 1995.
- (1995) The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers
- Bai, Z.¹ Demmel, J.² Dongarra, J.³ Petitet, A.⁴ Robinson, H.⁵ Stanley, K.⁶

32
- 0039122444
- A parallelizable eigensolver for real diagonalizable matrices with real eigenvalues
- Supercomputing Research Center
- S. Lederman, A. Tsao and T. Turnbull, 'A parallelizable eigensolver for real diagonalizable matrices with real eigenvalues', Technical Report TR-91-042, Supercomputing Research Center, 1991.
- (1991) Technical Report TR-91-042
- Lederman, S.¹ Tsao, A.² Turnbull, T.³

33
- 0039740892
- Anatomy of an out-of-core dense linear solver
- K. Klimkowski and R. van de Geijn, 'Anatomy of an out-of-core dense linear solver', Vol III, Algorithms and Applications, Proceedings of the 1995 International Conference on Parallel Processing, pp. 29-33.
- Algorithms and Applications, Proceedings of the 1995 International Conference on Parallel Processing , vol.3 , pp. 29-33
- Klimkowski, K.¹ Van De Geijn, R.²

34
- 85033305549
- A high performance parallel Strassen implementation
- to be published
- B. Grayson and R. van de Geijn, 'A high performance parallel Strassen implementation', Parallel Process. Lett., to be published.
- Parallel Process. Lett.
- Grayson, B.¹ Van De Geijn, R.²

35
- 0028545949
- A high performance matrix multiplication algorithm on a distributed-memory parallel computer, using overlapped communication
- R. C. Agarwal, F. G. Gustavson and M. Zubair, 'A high performance matrix multiplication algorithm on a distributed-memory parallel computer, using overlapped communication', IBM J. Res. Dev., 673-681 (1994).
- (1994) IBM J. Res. Dev. , pp. 673-681
- Agarwal, R.C.¹ Gustavson, F.G.² Zubair, M.³

36
- 0003712293
- Ph.D. thesis, Montana State University
- L. E. Cannon, A Cellular Computer to Implement the Kalman Filter Algorithm, Ph.D. thesis, 1969, Montana State University.
- (1969) A Cellular Computer to Implement the Kalman Filter Algorithm
- Cannon, L.E.¹

37
- 0005269376
- Level 3 BLAS for distributed memory concurrent computers
- Saint Hilaire du Touvet, France, 7-8 Sept. 1992, Elsevier Science Publishers
- J. Choi, J. J. Dongarra and D. W. Walker, 'Level 3 BLAS for distributed memory concurrent computers', CNRS-NSF Workshop on Environments and Tools for Parallel Scientific Computing, Saint Hilaire du Touvet, France, 7-8 Sept. 1992, Elsevier Science Publishers, 1992.
- (1992) CNRS-NSF Workshop on Environments and Tools for Parallel Scientific Computing
- Choi, J.¹ Dongarra, J.J.² Walker, D.W.³

38
- 0003793981
- SIAM, Philadelphia
- J. J. Dongarra, I. S. Duff, D. C. Sorensen and H. A. van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, Philadelphia, 1991.
- (1991) Solving Linear Systems on Vector and Shared Memory Computers
- Dongarra, J.J.¹ Duff, I.S.² Sorensen, D.C.³ Van Der Vorst, H.A.⁴

39
- 4243168540
- Two dimensional basic linear algebra communication subprograms
- Norfolk, March
- J. J. Dongarra, R. A. van de Geijn and R. Clint Whaley, 'Two dimensional basic linear algebra communication subprograms', in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, Norfolk, March 1993.
- (1993) Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing
- Dongarra, J.J.¹ Van De Geijn, R.A.² Clint Whaley, R.³

40
- 0023288009
- Matrix algorithms on a hypercube I: Matrix multiplication
- G. Fox, S. Otto and A. Hey, 'Matrix algorithms on a hypercube I: Matrix multiplication', Parallel Comput., 3, 17-31 (1987).
- (1987) Parallel Comput. , vol.3 , pp. 17-31
- Fox, G.¹ Otto, S.² Hey, A.³

41
- 0004236492
- Johns Hopkins University Press, 2nd edn.
- G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 2nd edn., 1989.
- (1989) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

42
- 0003417929
- The MIT Press
- W. Gropp, E. Lusk and A. Skjellum, Using MPI: Portable Programming with the Message-Passing Interface, The MIT Press, 1994.
- (1994) Using MPI: Portable Programming with the Message-Passing Interface
- Gropp, W.¹ Lusk, E.² Skjellum, A.³

43
- 12444336579
- Level 2 and 3 BLAS routines for the IBM 3090 VF/400: Implementation and experiences
- Information Processing, University of Umeå, S-901 87 Umeå, Sweden
- B. Kågström and P. Ling, 'Level 2 and 3 BLAS routines for the IBM 3090 VF/400: Implementation and experiences', Technical Report UMINF-154.88, Information Processing, University of Umeå, S-901 87 Umeå, Sweden, 1988.
- (1988) Technical Report UMINF-154.88
- Kågström, B.¹ Ling, P.²

44
- 85033287874
- Implementing matrix-vector multiplication and conjugate gradient algorithms on distributed memory multicomputers
- J. G. Lewis and R. A. van de Geijn, 'Implementing matrix-vector multiplication and conjugate gradient algorithms on distributed memory multicomputers', Supercomputing '93.
- Supercomputing '93
- Lewis, J.G.¹ Van De Geijn, R.A.²

45
- 0028553205
- Matrix-vector multiplication and conjugate gradient algorithms on distributed memory computers
- J. G. Lewis, D. G. Payne and R. A. van de Geijn, 'Matrix-vector multiplication and conjugate gradient algorithms on distributed memory computers', Scalable High Performance Computing Conference, 1994.
- (1994) Scalable High Performance Computing Conference
- Lewis, J.G.¹ Payne, D.G.² Van De Geijn, R.A.³

46
- 0026973156
- A matrix product algorithm and its comparative performance on hypercubes
- Q. Stout and M. Wolfe (eds.), IEEE Press, Los Alamitos, CA
- C. Lin and L. Snyder, 'A matrix product algorithm and its comparative performance on hypercubes', in Proceedings of Scalable High Performance Computing Conference, Q. Stout and M. Wolfe (eds.), IEEE Press, Los Alamitos, CA, 1992, pp. 190-193.
- (1992) Proceedings of Scalable High Performance Computing Conference , pp. 190-193
- Lin, C.¹ Snyder, L.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.