SCOPUS Information Retrieval Platform

Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)

Volume 18, 2004, Pages 987-996

SRUMMA: A matrix multiplication algorithm suitable for clusters and scalable shared memory systems

(2) Krishnan, Manojkumar ᵃ; Nieplocha, Jarek ᵃ

ᵃ Pacific Northwest National Laboratory (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BASIC LINEAR ALGEBRA SUBROUTINES (BLAS); MASSIVELY PARALLEL PROCESSOR (MPP); REMOTE MEMORY ACCESS (RMA) COMMUNICATION; SRUMMA;

ALGEBRA; COMPUTATIONAL METHODS; COMPUTER ARCHITECTURE; COMPUTER PROGRAMMING; COMPUTER SOFTWARE; DATA STORAGE EQUIPMENT; INSTALLATION; MATRIX ALGEBRA; OPTIMIZATION; PARALLEL ALGORITHMS;

EID: 12444253004
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: (44)

References (39)

1
- 0003278639
- Automatically tuned linear algebra software (ATLAS)
- R. Whaley and J. Dongarra, "Automatically Tuned Linear Algebra Software (ATLAS)", Supercomputing'98.
- Supercomputing'98
- Whaley, R.¹ Dongarra, J.²

2
- 0003712293
- Ph.D. dissertation, Montana State University
- L. E. Cannon, "A cellular computer to implement the Kalman Filter Algorithm", Ph.D. dissertation, Montana State University, 1969.
- (1969) A Cellular Computer to Implement the Kalman Filter Algorithm
- Cannon, L.E.¹

3
- 0023288009
- Matrix algorithms on a hypercube I: Matrix multiplication
- G. C. Fox, S. W. Otto, and A. J. G. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication", Parallel Computing, vol. 4, pp. 17-31, 1987.
- (1987) Parallel Computing , vol.4 , pp. 17-31
- Fox, G.C.¹ Otto, S.W.² Hey, A.J.G.³

4
- 0003506603
- Prentice Hall
- G. C. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, Solving Problems on Concurrent Processors, vol. 1, Prentice Hall, 1988.
- (1988) Solving Problems on Concurrent Processors , vol.1
- Fox, G.C.¹ Johnson, M.² Lyzenga, G.³ Otto, S.⁴ Salmon, J.⁵ Walker, D.⁶

5
- 0004236492
- Johns Hopkins University Press
- G.H. Golub and C.H. Van Loan, Matrix Computations, Johns Hopkins University Press, 1989.
- (1989) Matrix Computations
- Golub, G.H.¹ Van Loan, C.H.²

6
- 0024883116
- Communication efficient matrix multiplication on hypercubes
- J. Berntsen, "Communication efficient matrix multiplication on hypercubes", Parallel Computing, vol. 12, pp. 335-342, 1989.
- (1989) Parallel Computing , vol.12 , pp. 335-342
- Berntsen, J.¹

7
- 84904335157
- Scalability of parallel algorithms for matrix multiplication
- A. Gupta and V. Kumar, "Scalability of Parallel Algorithms for Matrix Multiplication", Proc. ICPP, 1993.
- (1993) Proc. ICPP
- Gupta, A.¹ Kumar, V.²

8
- 0026973156
- A matrix product algorithm and its comparative performance on hypercubes
- C. Lin and L. Snyder, "A matrix product algorithm and its comparative performance on hypercubes", in Scalable High Performance Computing Conference, 1992.
- (1992) Scalable High Performance Computing Conference
- Lin, C.¹ Snyder, L.²

9
- 4544341989
- Q. Luo and J. B. Drake, "A Scalable Parallel Strassen's Matrix Multiply Algorithm for Distributed Memory Computers", http://citeseer.nj.nec.com/517382.html
- A Scalable Parallel Strassen's Matrix Multiply Algorithm for Distributed Memory Computers
- Luo, Q.¹ Drake, J.B.²

10
- 0037970044
- Comparison of scalable parallel matrix multiplication libraries
- IEEE Computer Society Press
- S. Huss-Lederman, E. M. Jacobson, and A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Scalable Parallel Libraries Conference, IEEE Computer Society Press, 1994, pp. 142-149.
- (1994) Scalable Parallel Libraries Conference , pp. 142-149
- Huss-Lederman, S.¹ Jacobson, E.M.² Tsao, A.³

11
- 84973758198
- Matrix multiplication on hypercubes using full bandwidth and constant storage
- C. T. Ho, S. L. Johnsson, and A. Edelman, "Matrix multiplication on hypercubes using full bandwidth and constant storage", in Proceeding of the Sixth Distributed Memory Computing Conference, 1991, pp. 447-451.
- (1991) Proceeding of the Sixth Distributed Memory Computing Conference , pp. 447-451
- Ho, C.T.¹ Johnsson, S.L.² Edelman, A.³

12
- 0039066274
- Communication efficient matrix multiplication on hypercubes
- H. Gupta and P. Sadayappan, "Communication Efficient Matrix Multiplication on Hypercubes", in Proc Sixth ACM SPAA, 1994.
- (1994) Proc Sixth ACM SPAA
- Gupta, H.¹ Sadayappan, P.²

13
- 0031146653
- A poly-algorithm for parallel dense matrix multiplication on two-dimensional process grid topologies
- J. Li, A. Skjellum, and R. D. Falgout, "A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies," Concurrency, Practice and Experience, vol. 9(5), 1997.
- (1997) Concurrency, Practice and Experience , vol.9 , Issue.5
- Li, J.¹ Skjellum, A.² Falgout, R.D.³

14
- 0000456144
- Parallel matrix and graph algorithms
- E. Dekel, D. Nassimi, and S. Sahni, "Parallel matrix and graph algorithms", SIAM Journal on Computing, vol. 10, pp. 657-673, 1981.
- (1981) SIAM Journal on Computing , vol.10 , pp. 657-673
- Dekel, E.¹ Nassimi, D.² Sahni, S.³

15
- 0003451323
- Springer-Verlag, New York, NY
- S. Ranka and S. Sahni. Hypercube Algorithms for Image Processing and Pattern Recognition. Springer-Verlag, New York, NY, 1990.
- (1990) Hypercube Algorithms for Image Processing and Pattern Recognition
- Ranka, S.¹ Sahni, S.²

16
- 0028530654
- PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
- J. Choi, J. Dongarra, and D. W. Walker, "PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers," Concurrency: Practice and Experience, vol. 6(7), pp. 543-570, 1994.
- (1994) Concurrency: Practice and Experience , vol.6 , Issue.7 , pp. 543-570
- Choi, J.¹ Dongarra, J.² Walker, D.W.³

17
- 0028529387
- Matrix multiplication on the Intel Touchstone DELTA
- Oct
- S. Huss-Lederman, E. Jacobson, A. Tsao, and G. Zhang, "Matrix Multiplication on the Intel Touchstone DELTA", Concurrency: Practice and Experience, vol. 6(7), pp. 571-594, Oct. 1994.
- (1994) Concurrency: Practice and Experience , vol.6 , Issue.7 , pp. 571-594
- Huss-Lederman, S.¹ Jacobson, E.² Tsao, A.³ Zhang, G.⁴

18
- 0028545949
- A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication
- R. C. Agarwal, F. Gustavson, and M. Zubair, "A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication," IBM J. of Research and Development, vol. 38 (6), 1994.
- (1994) IBM J. of Research and Development , vol.38 , Issue.6
- Agarwal, R.C.¹ Gustavson, F.² Zubair, M.³

19
- 0031123769
- SUMMA: Scalable universal matrix multiplication algorithm
- R. van de Geijn and J. Watts, "SUMMA: Scalable Universal Matrix Multiplication Algorithm," Concurrency: Practice and Experience, vol. 9(4), pp. 255-274, 1997.
- (1997) Concurrency: Practice and Experience , vol.9 , Issue.4 , pp. 255-274
- Van De Geijn, R.¹ Watts, J.²

20
- 0003978709
- A proposal for a set of parallel basic linear algebra subprograms
- May
- J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker, and R. C. Whaley, "A Proposal for a Set of Parallel Basic Linear Algebra Subprograms", University of Tennessee, Knoxville, Tech. Rep. CS-95-292, May 1995.
- (1995) University of Tennessee, Knoxville, Tech. Rep. , vol.CS-95-292
- Choi, J.¹ Dongarra, J.² Ostrouchov, S.³ Petitet, A.⁴ Walker, D.⁵ Whaley, R.C.⁶

21
- 0003615167
- SIAM, Philadelphia, PA
- L. S. Blackford et al., ScaLAPACK Users' Guide, SIAM, Philadelphia, PA, 1997.
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹

22
- 0030676131
- A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers
- J. Choi, "A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers", in Proc. IPPS '97, 1997.
- (1997) Proc. IPPS '97
- Choi, J.¹

23
- 12444271603
- OpenMP issues arising in the development of parallel BLAS and LAPACK libraries
- C. Addison and Y. Ren, "OpenMP Issues Arising in the Development of Parallel BLAS and LAPACK libraries", in Proceedings EWOMP'01, 2001.
- (2001) Proceedings EWOMP'01
- Addison, C.¹ Ren, Y.²

24
- 0035949178
- Scalability and performance of OpenMP and MPI on a 128-processor SGI Origin 2000
- G. R. Luecke and W. Lin, "Scalability and Performance of OpenMP and MPI on a 128-Processor SGI Origin 2000", Concurrency and Computation: Practice and Experience, vol. 13, pp. 905-928, 2001.
- (2001) Concurrency and Computation: Practice and Experience , vol.13 , pp. 905-928
- Luecke, G.R.¹ Lin, W.²

25
- 51549099930
- Mixed mode matrix multiplication
- M. Wu, S. Aluru, and R. A. Kendall, "Mixed Mode Matrix Multiplication", IEEE CLUSTER'02, 2002.
- (2002) IEEE CLUSTER'02
- Wu, M.¹ Aluru, S.² Kendall, R.A.³

26
- 84860083667
- Performance analysis of various parallelization methods for BLAS3 routines on cluster architectures
- Nov
- T. Betcke, "Performance analysis of various parallelization methods for BLAS3 routines on cluster architectures", John von Neumann-Instituts für Computing, Tech. Rep. FZJ-ZAM-IB-2000-15, Nov. 2000.
- (2000) John von Neumann-Instituts für Computing, Tech. Rep. , vol.FZJ-ZAM-IB-2000-15
- Betcke, T.¹

27
- 4544274105
- The implementation of MPI-2 one-sided communication for the NEC SX-5
- J. L. Träff, H. Ritzdorf, and R. Hempel, "The Implementation of MPI-2 One-Sided Communication for the NEC SX-5", in Proceedings of Supercomputing, 2000.
- (2000) Proceedings of Supercomputing
- Träff, J.L.¹ Ritzdorf, H.² Hempel, R.³

28
- 1142305191
- High performance RDMA-Based MPI implementation over InfiniBand
- J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda, "High Performance RDMA-Based MPI Implementation over InfiniBand", in Proc. of 17th ACM International Conference on Supercomputing, 2003.
- (2003) Proc of 17th ACM International Conference on Supercomputing
- Liu, J.¹ Wu, J.² Kini, S.P.³ Wyckoff, P.⁴ Panda, D.K.⁵

29
- 78649896726
- Optimizing mechanisms for latency tolerance in remote memory access communication on clusters
- J. Nieplocha, V. Tipparaju, M. Krishnan, G. Santhanaraman, and D.K. Panda, "Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters", IEEE CLUSTER, 2003.
- (2003) IEEE CLUSTER
- Nieplocha, J.¹ Tipparaju, V.² Krishnan, M.³ Santhanaraman, G.⁴ Panda, D.K.⁵

30
- 2342641297
- Addison Wesley
- A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, Addison Wesley, 2003.
- (2003) Introduction to Parallel Computing
- Grama, A.¹ Gupta, A.² Karypis, G.³ Kumar, V.⁴

31
- 84862429834
- Optimizing Applications on the Cray X1™ System. http://www.cray.com/craydoc/20/manuals/S-2315-50/html-S-2315-50/S-2315-50-toc.html
- Optimizing Applications on the Cray X1™ System

32
- 0006168939
- ARMCI: A portable remote memory copy library for distributed array libraries and compiler run-time systems
- J. Nieplocha and B. Carpenter, "ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems", in Proceedings of RTSPP IPPS/SDP, 1999.
- (1999) Proceedings of RTSPP IPPS/SDP
- Nieplocha, J.¹ Carpenter, B.²

33
- 34247349414
- ARMCI Web page. http://www.emsl.pnl.gov/docs/parsoft/armci/
- ARMCI Web Page

34
- 12244256651
- One-sided communication on Myrinet
- J. Nieplocha, V. Tipparaju, J. Ju, and E. Apra, "One-sided communication on Myrinet", Cluster Computing, vol. 6, pp. 115-124, 2003.
- (2003) Cluster Computing , vol.6 , pp. 115-124
- Nieplocha, J.¹ Tipparaju, V.² Ju, J.³ Apra, E.⁴

35
- 77954488753
- Protocols and strategies for optimizing remote memory operations on clusters
- J. Nieplocha, V. Tipparaju, A. Saify, and D. Panda, "Protocols and Strategies for Optimizing Remote Memory Operations on Clusters", Proc. CAC/IPDPS'02, 2002.
- (2002) Proc CAC/IPDPS'02
- Nieplocha, J.¹ Tipparaju, V.² Saify, A.³ Panda, D.⁴

36
- 84860086729
- ORNL Tom Dunigan's Evaluation of Early Systems Webpage. http://www.csm.ornl.gov/~dunigan/
- Tom Dunigan's Evaluation of Early Systems Webpage

37
- 4544356070
- Exploiting non-blocking remote memory access communication in scientific benchmarks
- V. Tipparaju, M. Krishnan, J. Nieplocha, G. Santhanaraman, and D.K. Panda, "Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks", Proc. HiPC'2003, 2003.
- (2003) Proc. HiPC'2003
- Tipparaju, V.¹ Krishnan, M.² Nieplocha, J.³ Santhanaraman, G.⁴ Panda, D.K.⁵

38
- 84948981514
- COMB: A portable benchmark suite for assessing MPI overlap
- B. Lawry, R. Wilson, A. B. Maccabe, and R. Brightwell, "COMB: A Portable Benchmark Suite for Assessing MPI Overlap", IEEE Cluster, 2002.
- (2002) IEEE Cluster
- Lawry, B.¹ Wilson, R.² Maccabe, A.B.³ Brightwell, R.⁴

39
- 12444270456
- Where's the overlap? Overlapping communication and computation in several popular MPI implementations
- J. B. White and S. W. Bova, "Where's the overlap? Overlapping communication and computation in several popular MPI implementations", in Proceedings of the Third MPI Developers' and Users' Conference, 1999.
- (1999) Proceedings of the Third MPI Developers' and Users' Conference
- White, J.B.¹ Bova, S.W.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.