[1] R. Whaley and J. Dongarra, "Automatically Tuned Linear Algebra Software (ATLAS)," in Proc. Supercomputing '98, 1998.
[3] G. C. Fox, S. W. Otto, and A. J. G. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication," Parallel Computing, vol. 4, pp. 17-31, 1987.
[4] G. C. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, Solving Problems on Concurrent Processors, vol. 1. Prentice Hall, 1988.
[6] J. Berntsen, "Communication efficient matrix multiplication on hypercubes," Parallel Computing, vol. 12, pp. 335-342, 1989.
[7] A. Gupta and V. Kumar, "Scalability of Parallel Algorithms for Matrix Multiplication," in Proc. ICPP, 1993.
[10] S. Huss-Lederman, E. M. Jacobson, and A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Scalable Parallel Libraries Conference, IEEE Computer Society Press, 1994, pp. 142-149.
[12] H. Gupta and P. Sadayappan, "Communication Efficient Matrix Multiplication on Hypercubes," in Proc. Sixth ACM SPAA, 1994.
[13] J. Li, A. Skjellum, and R. D. Falgout, "A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies," Concurrency: Practice and Experience, vol. 9, no. 5, 1997.
[14] E. Dekel, D. Nassimi, and S. Sahni, "Parallel matrix and graph algorithms," SIAM Journal on Computing, vol. 10, pp. 657-673, 1981.
[16] J. Choi, J. Dongarra, and D. W. Walker, "PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers," Concurrency: Practice and Experience, vol. 6, no. 7, pp. 543-570, 1994.
[17] S. Huss-Lederman, E. Jacobson, A. Tsao, and G. Zhang, "Matrix Multiplication on the Intel Touchstone DELTA," Concurrency: Practice and Experience, vol. 6, no. 7, pp. 571-594, Oct. 1994.
[18] R. C. Agarwal, F. Gustavson, and M. Zubair, "A high performance matrix multiplication algorithm on a distributed memory parallel computer using overlapped communication," IBM J. of Research and Development, vol. 38, no. 6, 1994.
[19] R. van de Geijn and J. Watts, "SUMMA: Scalable Universal Matrix Multiplication Algorithm," Concurrency: Practice and Experience, vol. 9, no. 4, pp. 255-274, 1997.
[20] J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker, and R. C. Whaley, "A Proposal for a Set of Parallel Basic Linear Algebra Subprograms," University of Tennessee, Knoxville, Tech. Rep. CS-95-292, May 1995.
[22] J. Choi, "A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers," in Proc. IPPS '97, 1997.
[23] C. Addison and Y. Ren, "OpenMP Issues Arising in the Development of Parallel BLAS and LAPACK Libraries," in Proceedings EWOMP'01, 2001.
[24] G. R. Luecke and W. Lin, "Scalability and Performance of OpenMP and MPI on a 128-Processor SGI Origin 2000," Concurrency and Computation: Practice and Experience, vol. 13, pp. 905-928, 2001.
[26] T. Betcke, "Performance analysis of various parallelization methods for BLAS3 routines on cluster architectures," John von Neumann-Institut für Computing, Tech. Rep. FZJ-ZAM-IB-2000-15, Nov. 2000.
[28] J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda, "High Performance RDMA-Based MPI Implementation over InfiniBand," in Proc. of the 17th ACM International Conference on Supercomputing, 2003.
[29] J. Nieplocha, V. Tipparaju, M. Krishnan, G. Santhanaraman, and D. K. Panda, "Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters," IEEE CLUSTER, 2003.
[30] A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing. Addison-Wesley, 2003.
[32] J. Nieplocha and B. Carpenter, "ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems," in Proceedings of RTSPP IPPS/SDP, 1999.
[33] ARMCI Web page. http://www.emsl.pnl.gov/docs/parsoft/armci/
[34] J. Nieplocha, V. Tipparaju, J. Ju, and E. Apra, "One-sided communication on Myrinet," Cluster Computing, vol. 6, pp. 115-124, 2003.
[37] V. Tipparaju, M. Krishnan, J. Nieplocha, G. Santhanaraman, and D. K. Panda, "Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks," in Proc. HiPC'2003, 2003.
[38] B. Lawry, R. Wilson, A. B. Maccabe, and R. Brightwell, "COMB: A Portable Benchmark Suite for Assessing MPI Overlap," IEEE Cluster, 2002.