-
1
-
-
12444253004
-
SRUMMA: A matrix multiplication algorithm suitable for clusters and scalable shared memory systems
-
M. Krishnan, J. Nieplocha, "SRUMMA: A Matrix Multiplication Algorithm Suitable for Clusters and Scalable Shared Memory Systems", in IPDPS'2004.
-
IPDPS'2004
-
-
Krishnan, M.1
Nieplocha, J.2
-
3
-
-
0023288009
-
Matrix algorithms on a hypercube I: Matrix multiplication
-
G. C. Fox, S. W. Otto, and A. J. G. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication", Parallel Computing, vol. 4, pp. 17-31. 1987.
-
(1987)
Parallel Computing
, vol.4
, pp. 17-31
-
-
Fox, G.C.1
Otto, S.W.2
Hey, A.J.G.3
-
4
-
-
4544319390
-
-
G. C. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, D. Walker, Solving Problems on Concurrent Processors, v. 1, 1988.
-
(1988)
Solving Problems on Concurrent Processors
, vol.1
-
-
Fox, G.C.1
Johnson, M.2
Lyzenga, G.3
Otto, S.4
Salmon, J.5
Walker, D.6
-
6
-
-
0024883116
-
Communication efficient matrix multiplication on hypercubes
-
J. Berntsen, "Communication efficient matrix multiplication on hypercubes", Parallel Computing, vol. 12, 1989.
-
(1989)
Parallel Computing
, vol.12
-
-
Berntsen, J.1
-
8
-
-
0026973156
-
A matrix product algorithm and its comparative performance on hypercubes
-
C. Lin and L. Snyder, "A matrix product algorithm and its comparative performance on hypercubes", in SHPCC, 1992.
-
(1992)
SHPCC
-
-
Lin, C.1
Snyder, L.2
-
13
-
-
0031146653
-
A poly-algorithm for parallel dense matrix multiplication on two-dimensional process grid topologies
-
97
-
J. Li, A. Skjellum, and R. D. Falgout, "A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies," Concurrency: Practice and Experience, vol. 9(5), '97.
-
Concurrency: Practice and Experience
, vol.9
, Issue.5
-
-
Li, J.1
Skjellum, A.2
Falgout, R.D.3
-
16
-
-
0028530654
-
PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
-
94
-
J. Choi, J. Dongarra, and D. W. Walker, "PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers," Concurrency: Practice and Experience, vol. 6(7), '94.
-
Concurrency: Practice and Experience
, vol.6
, Issue.7
-
-
Choi, J.1
Dongarra, J.2
Walker, D.W.3
-
17
-
-
0028529387
-
Matrix multiplication on the Intel Touchstone DELTA
-
S. Huss-Lederman, E. Jacobson, A. Tsao, and G. Zhang, "Matrix Multiplication on the Intel Touchstone DELTA", Concurrency: Practice and Experience, vol. 6 (7). 1994.
-
(1994)
Concurrency: Practice and Experience
, vol.6
, Issue.7
-
-
Huss-Lederman, S.1
Jacobson, E.2
Tsao, A.3
Zhang, G.4
-
18
-
-
0028545949
-
A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication
-
R. C. Agarwal, F. Gustavson, and M. Zubair, "A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication," IBM J. of Research and Development '94.
-
IBM J. of Research and Development '94
-
-
Agarwal, R.C.1
Gustavson, F.2
Zubair, M.3
-
20
-
-
0003978709
-
A proposal for a set of parallel basic linear algebra subprograms
-
University of Tennessee, Knoxville
-
J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker, and R. C. Whaley, "A Proposal for a Set of Parallel Basic Linear Algebra Subprograms", University of Tennessee, Knoxville, Tech. Rep. CS-95-292, 1995.
-
(1995)
Tech. Rep.
, vol.CS-95-292
-
-
Choi, J.1
Dongarra, J.2
Ostrouchov, S.3
Petitet, A.4
Walker, D.5
Whaley, R.C.6
-
22
-
-
0030676131
-
A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers
-
J. Choi, "A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers", in Proc. of IPPS, 1997.
-
(1997)
Proc. of IPPS
-
-
Choi, J.1
-
25
-
-
84862431241
-
The implementation of MPI-2 one-sided communication for the NEC SX-5
-
J. L. Träff, H. Ritzdorf, and R. Hempel, "The Implementation of MPI-2 One-Sided Communication for the NEC SX-5", SC'2000.
-
SC'2000
-
-
Träff, J.L.1
Ritzdorf, H.2
Hempel, R.3
-
26
-
-
4544241477
-
High Performance RDMA-Based MPI Implementation over InfiniBand
-
J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda, "High Performance RDMA-Based MPI Implementation over InfiniBand", in ACM ICS, 2003.
-
(2003)
ACM ICS
-
-
Liu, J.1
Wu, J.2
Kini, S.P.3
Wyckoff, P.4
Panda, D.K.5
-
27
-
-
78649896726
-
Optimizing mechanisms for latency tolerance in remote memory access communication on clusters
-
J. Nieplocha, V. Tipparaju, M. Krishnan, G. Santhanaraman, and D.K. Panda, "Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters", IEEE Cluster Computing'03.
-
IEEE Cluster Computing'03
-
-
Nieplocha, J.1
Tipparaju, V.2
Krishnan, M.3
Santhanaraman, G.4
Panda, D.K.5
-
30
-
-
0006168939
-
ARMCI: A portable remote memory copy library for distributed array libraries and compiler run-time systems
-
J. Nieplocha, B. Carpenter, ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems, Proc. RTSPP IPPS/SDP 1999.
-
(1999)
Proc. RTSPP IPPS/SDP
-
-
Nieplocha, J.1
Carpenter, B.2
-
31
-
-
12244256651
-
One-sided communication on Myrinet
-
J. Nieplocha, V. Tipparaju, J. Ju, and E. Apra, "One-sided communication on Myrinet", Cluster Computing'03, vol. 6.
-
Cluster Computing'03
, vol.6
-
-
Nieplocha, J.1
Tipparaju, V.2
Ju, J.3
Apra, E.4
-
33
-
-
84862424792
-
-
http://www.csm.ornl.gov/evaluation
-
-
-
-
34
-
-
84862428917
-
-
http://www.csm.ornl.gov/~dunigan/
-
-
-
-
35
-
-
4544356070
-
Exploiting non-blocking remote memory access communication in scientific benchmarks
-
V. Tipparaju, M. Krishnan, J. Nieplocha, G. Santhanaraman, and D.K. Panda, "Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks", Proceedings of HiPC, 2003.
-
(2003)
Proceedings of HiPC
-
-
Tipparaju, V.1
Krishnan, M.2
Nieplocha, J.3
Santhanaraman, G.4
Panda, D.K.5
-
36
-
-
84948981514
-
COMB: A portable benchmark suite for assessing MPI overlap
-
B. Lawry, R. Wilson, A. B. Maccabe, and R. Brightwell, "COMB: A Portable Benchmark Suite for Assessing MPI Overlap", IEEE Cluster, 2002.
-
(2002)
IEEE Cluster
-
-
Lawry, B.1
Wilson, R.2
Maccabe, A.B.3
Brightwell, R.4