SCOPUS Information Retrieval Platform

Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

Volume 10, 2004, Pages 257-266

Optimizing parallel multiplication operation for rectangular and transposed matrices

(2) Krishnan, Manojkumar (a); Nieplocha, Jarek (a)

(a) PACIFIC NORTHWEST NATIONAL LABORATORY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BASIC LINEAR ALGEBRA SUBROUTINES (BLAS); PARALLEL MULTIPLICATION OPERATION; REMOTE MEMORY ACCESS (RMA); TRANSPOSED MATRICES;

ALGORITHMS; COMMUNICATION SYSTEMS; COMPUTER ARCHITECTURE; LINEAR SYSTEMS; MATHEMATICAL TECHNIQUES; MATRIX ALGEBRA; NETWORK PROTOCOLS;

PARALLEL PROCESSING SYSTEMS;

EID: 4544240248 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICPADS.2004.1316103 Document Type: Conference Paper

Times cited : (3)

References (36)

1
- 12444253004
- SRUMMA: A matrix multiplication algorithm suitable for clusters and scalable shared memory systems
- M. Krishnan, J. Nieplocha, "SRUMMA: A Matrix Multiplication Algorithm Suitable for Clusters and Scalable Shared Memory Systems", in IPDPS'2004.
- IPDPS'2004
- Krishnan, M.¹ Nieplocha, J.²

2
- 0003712293
- Montana State University
- L. E. Cannon, "A cellular computer to implement the Kalman Filter Algorithm", Montana State University, 1969.
- (1969) A Cellular Computer to Implement the Kalman Filter Algorithm
- Cannon, L.E.¹

3
- 0023288009
- Matrix algorithms on a hypercube I: Matrix multiplication
- G. C. Fox, S. W. Otto, and A. J. G. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication", Parallel Computing, vol. 4, pp. 17-31. 1987.
- (1987) Parallel Computing , vol.4 , pp. 17-31
- Fox, G.C.¹ Otto, S.W.² Hey, A.J.G.³

4
- 4544319390
- G. C. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, D. Walker, Solving Problems on Concurrent Processors, v. 1, 1988.
- (1988) Solving Problems on Concurrent Processors , vol.1
- Fox, G.C.¹ Johnson, M.² Lyzenga, G.³ Otto, S.⁴ Salmon, J.⁵ Walker, D.⁶

5
- 0004236492
- G.H. Golub, C.H. Van Loan. Matrix Computations. 1989.
- (1989) Matrix Computations
- Golub, G.H.¹ Van Loan, C.H.²

6
- 0024883116
- Communication efficient matrix multiplication on hypercubes
- J. Berntsen, "Communication efficient matrix multiplication on hypercubes", Parallel Computing, vol. 12, 1989.
- (1989) Parallel Computing , vol.12
- Berntsen, J.¹

7
- 4544306280
- Scalability of Parallel Algorithms for Matrix Multiplication
- A. Gupta and V. Kumar, "Scalability of Parallel Algorithms for Matrix Multiplication", Proc. Parallel Processing,'93.
- Proc. Parallel Processing,'93
- Gupta, A.¹ Kumar, V.²

8
- 0026973156
- A matrix product algorithm and its comparative performance on hypercubes
- C. Lin and L. Snyder, "A matrix product algorithm and its comparative performance on hypercubes", in SHPCC, 1992.
- (1992) SHPCC
- Lin, C.¹ Snyder, L.²

9
- 4544341989
- Q. Luo and J. Drake, "A Scalable Parallel Strassen's Matrix Multiply Algorithm for Distributed Memory Computers", http://citeseer.nj.nec.com/517382.html
- A Scalable Parallel Strassen's Matrix Multiply Algorithm for Distributed Memory Computers
- Luo, Q.¹ Drake, J.²

10
- 0346403604
- Comparison of scalable parallel matrix multiplication libraries
- S. Huss-Lederman, E. M. Jacobson, and A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries", Proc. Scalable Parallel Libraries Conference'94.
- Proc. Scalable Parallel Libraries Conference'94
- Huss-Lederman, S.¹ Jacobson, E.M.² Tsao, A.³

11
- 84973758198
- Matrix multiplication on hypercubes using full bandwidth and constant storage
- C. T. Ho, S. L. Johnsson, A. Edelman, Matrix multiplication on hypercubes using full bandwidth and constant storage, Proc. of Distributed Memory Computing Conference. 1991.
- (1991) Proc. of Distributed Memory Computing Conference
- Ho, C.T.¹ Johnsson, S.L.² Edelman, A.³

12
- 0039066274
- Communication efficient matrix multiplication on hypercubes
- H. Gupta and P. Sadayappan, "Communication Efficient Matrix Multiplication on Hypercubes", in Proc. of ACM Symposium on Parallel Algorithms and Architectures, 1994.
- (1994) Proc. of ACM Symposium on Parallel Algorithms and Architectures
- Gupta, H.¹ Sadayappan, P.²

13
- 0031146653
- A poly-algorithm for parallel dense matrix multiplication on two-dimensional process grid topologies
- 97
- J. Li, A. Skjellum, and R. D. Falgout, "A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies," Concurrency, Practice and Experience, vol. 9(5),'97.
- Concurrency, Practice and Experience , vol.9 , Issue.5
- Li, J.¹ Skjellum, A.² Falgout, R.D.³

14
- 0000456144
- Parallel matrix and graph algorithms
- E. Dekel, D. Nassimi, and S. Sahni, "Parallel matrix and graph algorithms", SIAM Journal on Computing'81, vol.10.
- SIAM Journal on Computing'81 , vol.10
- Dekel, E.¹ Nassimi, D.² Sahni, S.³

15
- 0003451323
- S. Ranka and S. Sahni. Hypercube Algorithms for Image Processing and Pattern Recognition, 1990.
- (1990) Hypercube Algorithms for Image Processing and Pattern Recognition
- Ranka, S.¹ Sahni, S.²

16
- 0028530654
- PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
- 94
- J. Choi, J. Dongarra, and D. W. Walker, "PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers," Concurrency: Practice and Experience, vol. 6(7),'94.
- Concurrency: Practice and Experience , vol.6 , Issue.7
- Choi, J.¹ Dongarra, J.² Walker, D.W.³

17
- 0028529387
- Matrix multiplication on the intel touchstone DELTA
- S. Huss-Lederman, E. Jacobson, A. Tsao, and G. Zhang, "Matrix Multiplication on the Intel Touchstone DELTA", Concurrency: Practice and Experience, vol. 6 (7). 1994.
- (1994) Concurrency: Practice and Experience , vol.6 , Issue.7
- Huss-Lederman, S.¹ Jacobson, E.² Tsao, A.³ Zhang, G.⁴

18
- 0028545949
- A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication
- R. C. Agarwal, F. Gustavson, and M. Zubair, "A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication," IBM J. of Research and Development '94.
- IBM J. of Research and Development '94
- Agarwal, R.C.¹ Gustavson, F.² Zubair, M.³

19
- 0031123769
- SUMMA: Scalable universal matrix multiplication algorithm
- 97
- R. van de Geijn, R. J. Watts, "SUMMA: Scalable Universal Matrix Multiplication Algorithm" Concurrency: Practice and Experience, vol.9(4),'97.
- Concurrency: Practice and Experience , vol.9 , Issue.4
- Van De Geijn, R.¹ Watts, R.J.²

20
- 0003978709
- A proposal for a set of parallel basic linear algebra subprograms
- University of Tennessee, Knoxville
- J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker, and, R. C. Whaley, "A Proposal for a Set of Parallel Basic Linear Algebra Subprograms", University of Tennessee, Knoxville, Tech. Rep. CS-95-292, 1995.
- (1995) Tech. Rep. , vol.CS-95-292
- Choi, J.¹ Dongarra, J.² Ostrouchov, S.³ Petitet, A.⁴ Walker, D.⁵ Whaley, R.C.⁶

21
- 0003615167
- L. S. Blackford et al., ScaLAPACK Users' Guide, 1997.
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹

22
- 0030676131
- A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers
- J. Choi, "A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers", in Proc. of IPPS, 1997.
- (1997) Proc. of IPPS
- Choi, J.¹

23
- 0035949178
- Scalability and performance of OpenMP and MPI on 128-Processor SGI Origin-2000
- G.R. Luecke, W. Lin, Scalability and performance of OpenMP and MPI on 128-Processor SGI Origin-2000, Concurrency and Computation: Practice and Experience,13, 2001.
- (2001) Concurrency and Computation: Practice and Experience , vol.13
- Luecke, G.R.¹ Lin, W.²

24
- 51549099930
- Mixed mode matrix multiplication
- M. Wu, S. Aluru, and R. A. Kendall, "Mixed Mode Matrix Multiplication", in Proc. IEEE CLUSTER'02.
- Proc. IEEE CLUSTER'02
- Wu, M.¹ Aluru, S.² Kendall, R.A.³

25
- 84862431241
- The implementation of MPI-2 one-sided communication for the NEC SX-5
- J. L. Träff, H. Ritzdorf, R. Hempel "The Implementation of MPI-2 One-Sided Communication for the NEC SX-5", SC'2000.
- SC'2000
- Träff, J.L.¹ Ritzdorf, H.² Hempel, R.³

26
- 4544241477
- High Performance RDMA-Based MPI Implementation over InfiniBand
- J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda, "High Performance RDMA-Based MPI Implementation over InfiniBand" in ACM ICS, 2003.
- (2003) ACM ICS
- Liu, J.¹ Wu, J.² Kini, S.P.³ Wyckoff, P.⁴ Panda, D.K.⁵

27
- 78649896726
- Optimizing mechanisms for latency tolerance in remote memory access communication on clusters
- J. Nieplocha, V. Tipparaju, M. Krishnan, G. Santhanaraman, and D.K. Panda," Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters", IEEE Cluster Computing'03.
- IEEE Cluster Computing'03
- Nieplocha, J.¹ Tipparaju, V.² Krishnan, M.³ Santhanaraman, G.⁴ Panda, D.K.⁵

28
- 2342641297
- A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2003.
- (2003) Introduction to Parallel Computing
- Grama, A.¹ Gupta, A.² Karypis, G.³ Kumar, V.⁴

29
- 84862429834
- Optimizing Applications on the Cray X1™ System. http://www.cray.com/craydoc/20/manuals/S-2315-50/html-S-2315-50/S-2315-50-toc.html
- Optimizing Applications on the Cray X1™ System

30
- 0006168939
- ARMCI: A portable remote memory copy library for distributed array libraries and compiler run-time systems
- J. Nieplocha, B. Carpenter, ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems, Proc. RTSPP IPPS/SDP 1999.
- (1999) Proc. RTSPP IPPS/SDP
- Nieplocha, J.¹ Carpenter, B.²

31
- 12244256651
- One-sided communication on Myrinet
- J. Nieplocha, V. Tipparaju, J. Ju, and E. Apra, "One-sided communication on Myrinet", Cluster Computing'03, vol. 6.
- Cluster Computing'03 , vol.6
- Nieplocha, J.¹ Tipparaju, V.² Ju, J.³ Apra, E.⁴

32
- 77954488753
- Protocols and strategies for optimizing remote memory operations on clusters
- J. Nieplocha, V. Tipparaju, A. Saify, and D. Panda, "Protocols and Strategies for Optimizing Remote Memory Operations on Clusters", Proc. CAC Workshop IPDPS'02.
- Proc. CAC Workshop IPDPS'02
- Nieplocha, J.¹ Tipparaju, V.² Saify, A.³ Panda, D.⁴

33
- 84862424792
- http://www.csm.ornl.gov/evaluation

34
- 84862428917
- http://www.csm.ornl.gov/~dunigan/

35
- 4544356070
- Exploiting non-blocking remote memory access communication in scientific benchmarks
- V. Tipparaju, M. Krishnan, J. Nieplocha, G. Santhanaraman, and D.K. Panda, "Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks", Proceedings of HiPC, 2003.
- (2003) Proceedings of HiPC
- Tipparaju, V.¹ Krishnan, M.² Nieplocha, J.³ Santhanaraman, G.⁴ Panda, D.K.⁵

36
- 84948981514
- COMB: A portable benchmark suite for assessing MPI overlap
- B. Lawry, R. Wilson, A. B. Maccabe, and R. Brightwell, "COMB: A Portable Benchmark Suite for Assessing MPI Overlap", IEEE Cluster, 2002.
- (2002) IEEE Cluster
- Lawry, B.¹ Wilson, R.² Maccabe, A.B.³ Brightwell, R.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.