-
1
-
-
0029370767
-
A three-dimensional approach to parallel matrix multiplication
-
available online
-
R.C. Agarwal, S.M. Balle, F.G. Gustavson, M. Joshi, P. Palkar, A three-dimensional approach to parallel matrix multiplication, IBM J. Res. Devel. 39 (5) (1995) 575-582, available online at http://www.research.ibm.com/journal/rd39-5.html.
-
(1995)
IBM J. Res. Devel.
, vol.39
, Issue.5
, pp. 575-582
-
-
Agarwal, R.C.1
Balle, S.M.2
Gustavson, F.G.3
Joshi, M.4
Palkar, P.5
-
2
-
-
0028513316
-
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
-
R.C. Agarwal, F.G. Gustavson, M. Zubair, Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms, IBM J. Res. Dev. 38 (5) (1994) 563-576.
-
(1994)
IBM J. Res. Dev.
, vol.38
, Issue.5
, pp. 563-576
-
-
Agarwal, R.C.1
Gustavson, F.G.2
Zubair, M.3
-
3
-
-
0028545949
-
A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer using overlapped communication
-
available online
-
R.C. Agarwal, F.G. Gustavson, M. Zubair, A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer using overlapped communication, IBM J. Res. Devel. 38 (6) (1994) 673-681, available online at http://www.research.ibm.com/journal/rd38-6.html.
-
(1994)
IBM J. Res. Devel.
, vol.38
, Issue.6
, pp. 673-681
-
-
Agarwal, R.C.1
Gustavson, F.G.2
Zubair, M.3
-
4
-
-
0028427170
-
Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch
-
R.C. Agarwal, F.G. Gustavson, M. Zubair, Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch, IBM J. Res. Devel. 38 (3) (1994) 265-275.
-
(1994)
IBM J. Res. Devel.
, vol.38
, Issue.3
, pp. 265-275
-
-
Agarwal, R.C.1
Gustavson, F.G.2
Zubair, M.3
-
6
-
-
0024883116
-
Communication efficient matrix multiplication on hypercubes
-
J. Berntsen, Communication efficient matrix multiplication on hypercubes, Parallel Comput. 12 (1989) 335-342.
-
(1989)
Parallel Comput.
, vol.12
, pp. 335-342
-
-
Berntsen, J.1
-
7
-
-
0030661485
-
Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology
-
Vienna, Austria
-
J. Bilmes, K. Asanovic, C.W. Chin, J. Demmel, Optimizing matrix multiply using PHIPAC: a portable, high-performance, ANSI C coding methodology, in: Proceedings of the International Conference on Supercomputing, Vienna, Austria, 1997.
-
(1997)
Proceedings of the International Conference on Supercomputing
-
-
Bilmes, J.1
Asanovic, K.2
Chin, C.W.3
Demmel, J.4
-
8
-
-
0003350839
-
Geometric Inequalities
-
Springer, Berlin
-
Yu.D. Burago, V.A. Zalgaller, Geometric Inequalities, Grundlehren der mathematischen Wissenschaften, vol. 285, Springer, Berlin, 1988.
-
(1988)
Grundlehren der Mathematischen Wissenschaften
, vol.285
-
-
Burago, Yu.D.1
Zalgaller, V.A.2
-
10
-
-
0004116989
-
-
MIT Press, McGraw-Hill, Cambridge, MA, New York
-
T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, second ed., MIT Press, McGraw-Hill, Cambridge, MA, New York, 2001.
-
(2001)
Introduction to Algorithms, Second Ed.
-
-
Cormen, T.H.1
Leiserson, C.E.2
Rivest, R.L.3
Stein, C.4
-
11
-
-
0040919367
-
A blocked implementation of level 3 BLAS for RISC processors
-
ENSEEIHT-IRIT, France
-
M.J. Dayde, I.S. Duff, A blocked implementation of level 3 BLAS for RISC processors, Technical Report RT/APO/96/1, ENSEEIHT-IRIT, France, 1996.
-
(1996)
Technical Report
, vol.RT-APO-96-1
-
-
Dayde, M.J.1
Duff, I.S.2
-
13
-
-
0023288009
-
Matrix algorithms on a hypercube I: Matrix multiplication
-
G.C. Fox, S.W. Otto, A.J.G. Hey, Matrix algorithms on a hypercube I: matrix multiplication, Parallel Comput. 4 (1987) 17-31.
-
(1987)
Parallel Comput.
, vol.4
, pp. 17-31
-
-
Fox, G.C.1
Otto, S.W.2
Hey, A.J.G.3
-
14
-
-
85054913228
-
The scalability of matrix multiplication algorithms on parallel computers
-
Department of Computer Science, University of Minnesota, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps
-
A. Gupta, V. Kumar, The scalability of matrix multiplication algorithms on parallel computers, Technical Report TR 91-54, Department of Computer Science, University of Minnesota, 1991, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps. A short version appeared in Proceedings of 1993 International Conference on Parallel Processing, 1993, pp. III-115-III-119.
-
(1991)
Technical Report
, vol.TR 91-54
-
-
Gupta, A.1
Kumar, V.2
-
15
-
-
85030827240
-
-
A. Gupta, V. Kumar, The scalability of matrix multiplication algorithms on parallel computers, Technical Report TR 91-54, Department of Computer Science, University of Minnesota, 1991, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps. A short version appeared in Proceedings of 1993 International Conference on Parallel Processing, 1993, pp. III-115-III-119.
-
(1993)
Proceedings of 1993 International Conference on Parallel Processing
-
-
-
16
-
-
10844253520
-
Vorlesungen über Inhalt, Oberfläche und Isoperimetrie
-
Springer, Berlin
-
H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Grundlehren der mathematischen Wissenschaften, vol. 93, Springer, Berlin, 1957.
-
(1957)
Grundlehren der Mathematischen Wissenschaften
, vol.93
-
-
Hadwiger, H.1
-
17
-
-
3242777317
-
The performance of the Intel TFLOPS supercomputer
-
available online
-
G. Henry, P. Fay, B. Cole, T.G. Mattson, The performance of the Intel TFLOPS supercomputer, Intel Tech. J. 98 (1) (1998), available online at http://developer.intel.com/technology/itj/.
-
(1998)
Intel Tech. J.
, vol.98
, Issue.1
-
-
Henry, G.1
Fay, P.2
Cole, B.3
Mattson, T.G.4
-
19
-
-
0036493233
-
Trading replication for communication in parallel distributed-memory dense solvers
-
D. Irony, S. Toledo, Trading replication for communication in parallel distributed-memory dense solvers, Parallel Process. Lett. 12 (2002) 79-94.
-
(2002)
Parallel Process. Lett.
, vol.12
, pp. 79-94
-
-
Irony, D.1
Toledo, S.2
-
20
-
-
0027702512
-
Minimizing the communication time for matrix multiplication on multiprocessors
-
S.L. Johnsson, Minimizing the communication time for matrix multiplication on multiprocessors, Parallel Comput. 19 (1993) 1235-1257.
-
(1993)
Parallel Comput.
, vol.19
, pp. 1235-1257
-
-
Johnsson, S.L.1
-
21
-
-
10844253519
-
Local basic linear algebra subroutines LBLAS for the Connection Machine system CM-200
-
S.L. Johnsson, L.F. Ortiz, Local basic linear algebra subroutines LBLAS for the Connection Machine system CM-200, Internat. J. Supercomput. Appl. 7 (1993) 322-350.
-
(1993)
Internat. J. Supercomput. Appl.
, vol.7
, pp. 322-350
-
-
Johnsson, S.L.1
Ortiz, L.F.2
-
22
-
-
0346234145
-
High performance GEMM-based level-3 BLAS: Sample routines for double precision real data
-
M. Durand, F. El Dabaghi (Eds.), North-Holland, Amsterdam
-
B. Kågström, P. Ling, C. Van Loan, High performance GEMM-based level-3 BLAS: sample routines for double precision real data, in: M. Durand, F. El Dabaghi (Eds.), High Performance Computing II, North-Holland, Amsterdam, 1991, pp. 269-281.
-
(1991)
High Performance Computing II
, pp. 269-281
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
23
-
-
10844275231
-
Portable high performance GEMM-based level 3 BLAS
-
Philadelphia
-
B. Kågström, P. Ling, C. Van Loan, Portable high performance GEMM-based level 3 BLAS, in: Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, Philadelphia, 1993, pp. 339-346.
-
(1993)
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing
, pp. 339-346
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
24
-
-
10844292223
-
GEMM-based level-3 BLAS
-
Department of Computer Science, Cornell University
-
B. Kågström, C. Van Loan, GEMM-based level-3 BLAS, Technical Report CTC-91-TR47, Department of Computer Science, Cornell University, 1989.
-
(1989)
Technical Report
, vol.CTC-91-TR47
-
-
Kågström, B.1
Van Loan, C.2
-
25
-
-
0028459839
-
DXML: A high-performance scientific subroutine library
-
C. Kamath, R. Ho, D.P. Manley, DXML: a high-performance scientific subroutine library, Digital Tech. J. 6 (3) (1994) 44-56.
-
(1994)
Digital Tech. J.
, vol.6
, Issue.3
, pp. 44-56
-
-
Kamath, C.1
Ho, R.2
Manley, D.P.3
-
27
-
-
0001289565
-
An inequality related to the isoperimetric inequality
-
L.H. Loomis, H. Whitney, An inequality related to the isoperimetric inequality, Bull. AMS 55 (1949) 961-962.
-
(1949)
Bull. AMS
, vol.55
, pp. 961-962
-
-
Loomis, L.H.1
Whitney, H.2
-
28
-
-
0000743020
-
Memory-efficient matrix multiplication in the BSP model
-
W.F. McColl, A. Tiskin, Memory-efficient matrix multiplication in the BSP model, Algorithmica 24 (3/4) (1999) 287-297.
-
(1999)
Algorithmica
, vol.24
, Issue.3-4
, pp. 287-297
-
-
McColl, W.F.1
Tiskin, A.2
-
29
-
-
84945709131
-
Organizing matrices and matrix operations for paged memory systems
-
A.C. McKellar, E.G. Coffman Jr., Organizing matrices and matrix operations for paged memory systems, Commun. ACM 12 (3) (1969) 153-165.
-
(1969)
Commun. ACM
, vol.12
, Issue.3
, pp. 153-165
-
-
McKellar, A.C.1
Coffman Jr., E.G.2
-
30
-
-
85030824962
-
-
Private communication
-
M.S. Paterson, Private communication, 1993
-
(1993)
-
-
Paterson, M.S.1
-
33
-
-
34250487811
-
Gaussian elimination is not optimal
-
V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969) 354-356.
-
(1969)
Numer. Math.
, vol.13
, pp. 354-356
-
-
Strassen, V.1
-
34
-
-
22044455697
-
Bulk-synchronous parallel multiplication of Boolean matrices
-
K.G. Larsen, S. Skyum, W. Winskel (Eds.), Springer, Berlin
-
A. Tiskin, Bulk-synchronous parallel multiplication of Boolean matrices, in: K.G. Larsen, S. Skyum, W. Winskel (Eds.), Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1443, Springer, Berlin, 1998, pp. 494-506.
-
(1998)
Proceedings of ICALP, Lecture Notes in Computer Science
, vol.1443
, pp. 494-506
-
-
Tiskin, A.1
-
35
-
-
0346098076
-
The bulk-synchronous parallel random access machine
-
April
-
A. Tiskin, The bulk-synchronous parallel random access machine, Theoret. Comput. Sci. 196 (1-2) (April 1998) 109-130.
-
(1998)
Theoret. Comput. Sci.
, vol.196
, Issue.1-2
, pp. 109-130
-
-
Tiskin, A.1
-
36
-
-
84887440108
-
Erratum: Bulk-synchronous parallel multiplication of Boolean matrices
-
J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), Springer, Berlin
-
A. Tiskin, Erratum: Bulk-synchronous parallel multiplication of Boolean matrices, in: J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1644, Springer, Berlin, 1999, p. 717.
-
(1999)
Proceedings of ICALP, Lecture Notes in Computer Science
, vol.1644
, pp. 717
-
-
Tiskin, A.1
-
37
-
-
0002831423
-
A survey of out-of-core algorithms in numerical linear algebra
-
James M. Abello, Jeffrey Scott Vitter (Eds.), American Mathematical Society, Providence, RI
-
S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, in: James M. Abello, Jeffrey Scott Vitter (Eds.), External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, Providence, RI, 1999, pp. 161-179.
-
(1999)
External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science
, pp. 161-179
-
-
Toledo, S.1
-
38
-
-
0031123769
-
SUMMA: Scalable universal matrix multiplication algorithm
-
R. van de Geijn, J. Watts, SUMMA: scalable universal matrix multiplication algorithm, Concurrency: Pract. Exp. 9 (1997) 255-274.
-
(1997)
Concurrency: Pract. Exp.
, vol.9
, pp. 255-274
-
-
Van De Geijn, R.1
Watts, J.2
-
39
-
-
0003418094
-
Automatically tuned linear algebra software
-
Computer Science Department, University of Tennessee, available online at www.netlib.org/atlas
-
R.C. Whaley, J.J. Dongarra, Automatically tuned linear algebra software, Technical Report, Computer Science Department, University of Tennessee, 1998, available online at www.netlib.org/atlas.
-
(1998)
Technical Report
-
-
Whaley, R.C.1
Dongarra, J.J.2