-
1
-
-
0028545949
-
A high performance matrix multiplication algorithm on a distributed-memory parallel computer, using overlapped communication
-
R. Agarwal, F. Gustavson, and M. Zubair. A high performance matrix multiplication algorithm on a distributed-memory parallel computer, using overlapped communication. IBM J. of Res. and Develop., 38(6):673-681, 1994.
-
(1994)
IBM J. of Res. and Develop
, vol.38
, Issue.6
, pp. 673-681
-
-
Agarwal, R.1
Gustavson, F.2
Zubair, M.3
-
3
-
-
0028530654
-
PUMMA: Parallel universal matrix multiplication algorithms
-
Oct
-
J. Choi, J. J. Dongarra, and D. W. Walker. PUMMA: parallel universal matrix multiplication algorithms. Concurrency: Practice and Experience, 6(7):543-570, Oct. 1994.
-
(1994)
Concurrency: Practice and Experience
, vol.6
, Issue.7
, pp. 543-570
-
-
Choi, J.1
Dongarra, J.J.2
Walker, D.W.3
-
5
-
-
0028485006
-
Constructive methods for scheduling uniform loop nests
-
Aug
-
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814-822, Aug. 1994.
-
(1994)
IEEE Trans. Parallel Distributed Systems
, vol.5
, Issue.8
, pp. 814-822
-
-
Darte, A.1
Robert, Y.2
-
6
-
-
0025402476
-
A set of level 3 basic linear algebra subprograms
-
J. J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Software, 16:1-17, 1990.
-
(1990)
ACM Trans. Math. Software
, vol.16
, pp. 1-17
-
-
Dongarra, J.J.1
Du Croz, J.2
Duff, I.3
Hammarling, S.4
-
7
-
-
0023983122
-
An extended set of FORTRAN basic linear algebra subprograms
-
J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Software, 14:1-17, 1988.
-
(1988)
ACM Trans. Math. Software
, vol.14
, pp. 1-17
-
-
Dongarra, J.J.1
Du Croz, J.2
Hammarling, S.3
Hanson, R.J.4
-
8
-
-
0023288009
-
Matrix algorithms on a hypercube I: Matrix multiplication
-
G. Fox, S. Otto, and A. Hey. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel Computing, 4:17-31, 1987.
-
(1987)
Parallel Computing
, vol.4
, pp. 17-31
-
-
Fox, G.1
Otto, S.2
Hey, A.3
-
10
-
-
0028529387
-
Matrix multiplication on the Intel Touchstone Delta
-
Oct
-
S. Huss-Lederman, E. M. Jacobson, A. Tsao, and G. Zhan. Matrix multiplication on the Intel Touchstone Delta. Concurrency: Practice and Experience, 6(7):571-594, Oct. 1994.
-
(1994)
Concurrency: Practice and Experience
, vol.6
, Issue.7
, pp. 571-594
-
-
Huss-Lederman, S.1
Jacobson, E.M.2
Tsao, A.3
Zhan, G.4
-
11
-
-
0346234145
-
High performance GEMM-based level 3 BLAS: Sample routines for double precision real data
-
North-Holland
-
B. Kågström, P. Ling, and C. Van Loan. High performance GEMM-based level 3 BLAS: Sample routines for double precision real data. In High Performance Computing II, pages 269-281. North-Holland, 1991.
-
(1991)
High Performance Computing II
, pp. 269-281
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
12
-
-
0032155271
-
GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark
-
B. Kågström, P. Ling, and C. Van Loan. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark. ACM Trans. Math. Software, 24(3):268-302, 1998.
-
(1998)
ACM Trans. Math. Software
, vol.24
, Issue.3
, pp. 268-302
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
14
-
-
33847158318
-
Advanced systolic design
-
Digital Signal Processing for Multimedia systems, chapter 23, Marcel Dekker
-
D. Lavenier, P. Quinton, and S. Rajopadhye. Advanced systolic design. In Digital Signal Processing for Multimedia Systems, Signal Processing Series, chapter 23, pages 657-692. Marcel Dekker, 1999.
-
(1999)
Signal Processing Series
, pp. 657-692
-
-
Lavenier, D.1
Quinton, P.2
Rajopadhye, S.3
-
15
-
-
0018515759
-
Basic linear algebra subprograms for FORTRAN usage
-
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for FORTRAN usage. ACM Trans. Math. Software, 5:308-323, 1979.
-
(1979)
ACM Trans. Math. Software
, vol.5
, pp. 308-323
-
-
Lawson, C.L.1
Hanson, R.J.2
Kincaid, D.R.3
Krogh, F.T.4
-
16
-
-
0038553717
-
Modular mappings and data distribution independent computations
-
H. J. Lee and J. A. Fortes. Modular mappings and data distribution independent computations. Parallel Processing Letters, 7(2):169-180, 1997.
-
(1997)
Parallel Processing Letters
, vol.7
, Issue.2
, pp. 169-180
-
-
Lee, H.J.1
Fortes, J.A.2
-
18
-
-
0003588633
-
SUMMA: Scalable universal matrix multiplication algorithm. Technical Report TR-95-13, The University of Texas
-
Apr
-
R. van de Geijn and J. Watts. SUMMA: Scalable universal matrix multiplication algorithm. Technical Report TR-95-13, The University of Texas, Apr. 1995.
-
(1995)
-
-
van de Geijn, R.1
Watts, J.2
-
20
-
-
33847160319
-
Matrix transpose on 2D torus. Technical Report 2005-1-001, The University of Aizu, Aizu Wakamatsu, Japan
-
Dec
-
A. S. Zekri and S. G. Sedukhin. Matrix transpose on 2D torus. Technical Report 2005-1-001, The University of Aizu, Aizu Wakamatsu, Japan, Dec. 2005.
-
(2005)
-
-
Zekri, A.S.1
Sedukhin, S.G.2