SCOPUS Information Retrieval Platform

Journal of Parallel and Distributed Computing

Volume 64, Issue 9, 2004, Pages 1017-1026

Communication lower bounds for distributed-memory matrix multiplication

(3) Irony, Dror a; Toledo, Sivan a; Tiskin, Alexander b

a TEL AVIV UNIVERSITY (Israel)

b UNIVERSITY OF WARWICK (United Kingdom)

Author keywords

Communication; Distributed memory; Lower bounds; Matrix multiplication

Indexed keywords

ALGORITHMS; CACHE MEMORY; COMPUTER WORKSTATIONS; CONSTRAINT THEORY; DISTRIBUTED COMPUTER SYSTEMS; THEOREM PROVING;

DISTRIBUTED MEMORY; LOWER BOUNDS; MATRIX MULTIPLICATION;

PARALLEL PROCESSING SYSTEMS;

EID: 10844258198; PISSN: 07437315; EISSN: None; Source Type: Journal
DOI: 10.1016/j.jpdc.2004.03.021; Document Type: Article

Times cited : (198)

References (39)

1
- 0029370767
- A three-dimensional approach to parallel matrix multiplication
- available online
- R.C. Agarwal, S.M. Balle, F.G. Gustavson, M. Joshi, P. Palkar, A three-dimensional approach to parallel matrix multiplication, IBM J. Res. Devel. 39 (5) (1995) 575-582, available online at http://www.research.ibm.com/journal/rd39-5.html.
- (1995) IBM J. Res. Devel. , vol.39 , Issue.5 , pp. 575-582
- Agarwal, R.C.¹ Balle, S.M.² Gustavson, F.G.³ Joshi, M.⁴ Palkar, P.⁵

2
- 0028513316
- Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
- R.C. Agarwal, F.G. Gustavson, M. Zubair, Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms, IBM J. Res. Dev. 38 (5) (1994) 563-576.
- (1994) IBM J. Res. Dev. , vol.38 , Issue.5 , pp. 563-576
- Agarwal, R.C.¹ Gustavson, F.G.² Zubair, M.³

3
- 0028545949
- A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer using overlapped communication
- available online
- R.C. Agarwal, F.G. Gustavson, M. Zubair, A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer using overlapped communication, IBM J. Res. Devel. 38 (6) (1994) 673-681, available online at http://www.research.ibm.com/journal/rd38-6.html.
- (1994) IBM J. Res. Devel. , vol.38 , Issue.6 , pp. 673-681
- Agarwal, R.C.¹ Gustavson, F.G.² Zubair, M.³

4
- 0028427170
- Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch
- R.C. Agarwal, F.G. Gustavson, M. Zubair, Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch, IBM J. Res. Devel. 38 (3) (1994) 265-275.
- (1994) IBM J. Res. Devel. , vol.38 , Issue.3 , pp. 265-275
- Agarwal, R.C.¹ Gustavson, F.G.² Zubair, M.³

5
- 0025231126
- Communication complexity of PRAMs
- A. Aggarwal, A. Chandra, M. Snir, Communication complexity of PRAMs, Theoret. Comput. Sci. 71 (1990) 3-28.
- (1990) Theoret. Comput. Sci. , vol.71 , pp. 3-28
- Aggarwal, A.¹ Chandra, A.² Snir, M.³

6
- 0024883116
- Communication efficient matrix multiplication on hypercubes
- J. Berntsen, Communication efficient matrix multiplication on hypercubes, Parallel Comput. 12 (1989) 335-342.
- (1989) Parallel Comput. , vol.12 , pp. 335-342
- Berntsen, J.¹

7
- 0030661485
- Optimizing matrix multiply using PHIPAC: A portable, high-performance, ANSI C coding methodology
- Vienna, Austria
- J. Bilmes, K. Asanovic, C.W. Chin, J. Demmel, Optimizing matrix multiply using PHIPAC: a portable, high-performance, ANSI C coding methodology, in: Proceedings of the International Conference on Supercomputing, Vienna, Austria, 1997.
- (1997) Proceedings of the International Conference on Supercomputing
- Bilmes, J.¹ Asanovic, K.² Chin, C.W.³ Demmel, J.⁴

8
- 0003350839
- Geometric Inequalities
- Springer, Berlin
- Yu.D. Burago, V.A. Zalgaller, Geometric Inequalities, Grundlehren der mathematischen Wissenschaften, vol. 285, Springer, Berlin, 1988.
- (1988) Grundlehren der Mathematischen Wissenschaften , vol.285
- Burago, Yu.D.¹ Zalgaller, V.A.²

9
- 0003712293
- Ph.D. Thesis, Montana State University
- L.E. Cannon, A cellular computer to implement the Kalman filter algorithm, Ph.D. Thesis, Montana State University, 1969.
- (1969) A Cellular Computer to Implement the Kalman Filter Algorithm
- Cannon, L.E.¹

10
- 0004116989
- MIT Press, McGraw-Hill, Cambridge, MA, New York
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, second ed., MIT Press, McGraw-Hill, Cambridge, MA, New York, 2001.
- (2001) Introduction to Algorithms, Second Ed.
- Cormen, T.H.¹ Leiserson, C.E.² Rivest, R.L.³ Stein, C.⁴

11
- 0040919367
- A blocked implementation of level 3 BLAS for RISC processors
- ENSEEIHT-IRIT, France
- M.J. Dayde, I.S. Duff, A blocked implementation of level 3 BLAS for RISC processors, Technical Report RT/APO/96/1, ENSEEIHT-IRIT, France, 1996.
- (1996) Technical Report , vol.RT-APO-96-1
- Dayde, M.J.¹ Duff, I.S.²

12
- 0000456144
- Parallel matrix and graph algorithms
- E. Dekel, D. Nassimi, S. Sahni, Parallel matrix and graph algorithms, SIAM J. Comput. 10 (1981) 657-675.
- (1981) SIAM J. Comput. , vol.10 , pp. 657-675
- Dekel, E.¹ Nassimi, D.² Sahni, S.³

13
- 0023288009
- Matrix algorithms on a hypercube I: Matrix multiplication
- G.C. Fox, S.W. Otto, A.J.G. Hey, Matrix algorithms on a hypercube I: Matrix multiplication, Parallel Comput. 4 (1987) 17-31.
- (1987) Parallel Comput. , vol.4 , pp. 17-31
- Fox, G.C.¹ Otto, S.W.² Hey, A.J.G.³

14
- 85054913228
- The scalability of matrix multiplication algorithms on parallel computers
- Department of Computer Science, University of Minnesota, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps
- A. Gupta, V. Kumar, The scalability of matrix multiplication algorithms on parallel computers, Technical Report TR 91-54, Department of Computer Science, University of Minnesota, 1991, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps, A short version appeared in Proceedings of 1993 International Conference on Parallel Processing, 1993, pp. III-115-III-119.
- (1991) Technical Report , vol.TR 91-54
- Gupta, A.¹ Kumar, V.²

15
- 85030827240
- A. Gupta, V. Kumar, The scalability of matrix multiplication algorithms on parallel computers, Technical Report TR 91-54, Department of Computer Science, University of Minnesota, 1991, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps, A short version appeared in Proceedings of 1993 International Conference on Parallel Processing, 1993, pp. III-115-III-119.
- (1993) Proceedings of 1993 International Conference on Parallel Processing

16
- 10844253520
- Vorlesungen über Inhalt, Oberfläche und Isoperimetrie
- Springer, Berlin
- H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Grundlehren der mathematischen Wissenschaften, vol. 93, Springer, Berlin, 1957.
- (1957) Grundlehren der Mathematischen Wissenschaften , vol.93
- Hadwiger, H.¹

17
- 3242777317
- The performance of the Intel TFLOPS supercomputer
- available online
- G. Henry, P. Fay, B. Cole, T.G. Mattson, The performance of the Intel TFLOPS supercomputer, Intel Tech. J. 98 (1) (1998), available online at http://developer.intel.com/technology/itj/.
- (1998) Intel Tech. J. , vol.98 , Issue.1
- Henry, G.¹ Fay, P.² Cole, B.³ Mattson, T.G.⁴

18
- 84971853043
- I/O complexity: The red-blue pebble game
- J.-W. Hong, H.T. Kung, I/O complexity: the red-blue pebble game, in: Proceedings of the 13th Annual ACM Symposium on Theory of Computing, 1981, pp. 326-333.
- (1981) Proceedings of the 13th Annual ACM Symposium on Theory of Computing , pp. 326-333
- Hong, J.-W.¹ Kung, H.T.²

19
- 0036493233
- Trading replication for communication in parallel distributed-memory dense solvers
- D. Irony, S. Toledo, Trading replication for communication in parallel distributed-memory dense solvers, Parallel Process. Lett. 12 (2002) 79-94.
- (2002) Parallel Process. Lett. , vol.12 , pp. 79-94
- Irony, D.¹ Toledo, S.²

20
- 0027702512
- Minimizing the communication time for matrix multiplication on multiprocessors
- S.L. Johnsson, Minimizing the communication time for matrix multiplication on multiprocessors, Parallel Comput. 19 (1993) 1235-1257.
- (1993) Parallel Comput. , vol.19 , pp. 1235-1257
- Johnsson, S.L.¹

21
- 10844253519
- Local basic linear algebra subroutines LBLAS for the Connection Machine system CM-200
- S.L. Johnsson, L.F. Ortiz, Local basic linear algebra subroutines LBLAS for the Connection Machine system CM-200, Internat. J. Supercomput. Appl. 7 (1993) 322-350.
- (1993) Internat. J. Supercomput. Appl. , vol.7 , pp. 322-350
- Johnsson, S.L.¹ Ortiz, L.F.²

22
- 0346234145
- High performance GEMM-based level-3 BLAS: Sample routines for double precision real data
- M. Durand, F. El Dabaghi (Eds.), North-Holland, Amsterdam
- B. Kågström, P. Ling, C. Van Loan, High performance GEMM-based level-3 BLAS: sample routines for double precision real data, in: M. Durand, F. El Dabaghi (Eds.), High Performance Computing II, North-Holland, Amsterdam, 1991, pp. 269-281.
- (1991) High Performance Computing II , pp. 269-281
- Kågström, B.¹ Ling, P.² Van Loan, C.³

23
- 10844275231
- Portable high performance GEMM-based level 3 BLAS
- Philadelphia
- B. Kågström, P. Ling, C. Van Loan, Portable high performance GEMM-based level 3 BLAS, in: Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, Philadelphia, 1993, pp. 339-346.
- (1993) Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing , pp. 339-346
- Kågström, B.¹ Ling, P.² Van Loan, C.³

24
- 10844292223
- GEMM-based level-3 BLAS
- Department of Computer Science, Cornell University
- B. Kågström, C. Van Loan, GEMM-based level-3 BLAS, Technical Report CTC-91-TR47, Department of Computer Science, Cornell University, 1989.
- (1989) Technical Report , vol.CTC-91-TR47
- Kågström, B.¹ Van Loan, C.²

25
- 0028459839
- DXML: A high-performance scientific subroutine library
- C. Kamath, R. Ho, D.P. Manley, DXML: a high-performance scientific subroutine library, Digital Tech. J. 6 (3) (1994) 44-56.
- (1994) Digital Tech. J. , vol.6 , Issue.3 , pp. 44-56
- Kamath, C.¹ Ho, R.² Manley, D.P.³

26
- 8344273620
- Local basic linear algebra subroutines (LBLAS) for the CM-5/5E
- D. Kramer, S.L. Johnsson, Yu Hu, Local basic linear algebra subroutines (LBLAS) for the CM-5/5E, Internat. J. Supercomput. Appl. 10 (1996) 300-335.
- (1996) Internat. J. Supercomput. Appl. , vol.10 , pp. 300-335
- Kramer, D.¹ Johnsson, S.L.² Hu, Yu.³

27
- 0001289565
- An inequality related to the isoperimetric inequality
- L.H. Loomis, H. Whitney, An inequality related to the isoperimetric inequality, Bull. AMS 55 (1949) 961-962.
- (1949) Bull. AMS , vol.55 , pp. 961-962
- Loomis, L.H.¹ Whitney, H.²

28
- 0000743020
- Memory-efficient matrix multiplication in the BSP model
- W.F. McColl, A. Tiskin, Memory-efficient matrix multiplication in the BSP model, Algorithmica 24 (3/4) (1999) 287-297.
- (1999) Algorithmica , vol.24 , Issue.3-4 , pp. 287-297
- McColl, W.F.¹ Tiskin, A.²

29
- 84945709131
- Organizing matrices and matrix operations for paged memory systems
- A.C. McKellar, E.G. Coffman Jr., Organizing matrices and matrix operations for paged memory systems, Commun. ACM 12 (3) (1969) 153-165.
- (1969) Commun. ACM , vol.12 , Issue.3 , pp. 153-165
- McKellar, A.C.¹ Coffman Jr., E.G.²

30
- 85030824962
- Private communication
- M.S. Paterson, Private communication, 1993
- (1993)
- Paterson, M.S.¹

31
- 85030821457
- Matrix algebra programs for the UNIVAC
- Copies available from Sivan Toledo, March
- J. Rutledge, H. Rubinstein, Matrix algebra programs for the UNIVAC, Presented at the Wayne Conference on Automatic Computing Machinery and Applications, Copies available from Sivan Toledo, March 1951.
- (1951) Wayne Conference on Automatic Computing Machinery and Applications
- Rutledge, J.¹ Rubinstein, H.²

32
- 85030819093
- High order matrix computation on the UNIVAC
- Copies available from Sivan Toledo, May
- J. Rutledge, H. Rubinstein, High order matrix computation on the UNIVAC, Presented at the meeting of the Association for Computing Machinery, Copies available from Sivan Toledo, May 1952.
- (1952) Meeting of the Association for Computing Machinery
- Rutledge, J.¹ Rubinstein, H.²

33
- 34250487811
- Gaussian elimination is not optimal
- V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969) 354-355.
- (1969) Numer. Math. , vol.13 , pp. 354-355
- Strassen, V.¹

34
- 22044455697
- Bulk-synchronous parallel multiplication of Boolean matrices
- K.G. Larsen, S. Skyum, G. Winskel (Eds.), Springer, Berlin
- A. Tiskin, Bulk-synchronous parallel multiplication of Boolean matrices, in: K.G. Larsen, S. Skyum, G. Winskel (Eds.), Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1443, Springer, Berlin, 1998, pp. 494-506.
- (1998) Proceedings of ICALP, Lecture Notes in Computer Science , vol.1443 , pp. 494-506
- Tiskin, A.¹

35
- 0346098076
- The bulk-synchronous parallel random access machine
- April
- A. Tiskin, The bulk-synchronous parallel random access machine, Theoret. Comput. Sci. 196 (1-2) (April 1998) 109-130.
- (1998) Theoret. Comput. Sci. , vol.196 , Issue.1-2 , pp. 109-130
- Tiskin, A.¹

36
- 84887440108
- Erratum: Bulk-synchronous parallel multiplication of Boolean matrices
- J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), Springer, Berlin
- A. Tiskin, Erratum: Bulk-synchronous parallel multiplication of Boolean matrices, in: J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1644, Springer, Berlin, 1999, p. 717.
- (1999) Proceedings of ICALP, Lecture Notes in Computer Science , vol.1644 , pp. 717
- Tiskin, A.¹

37
- 0002831423
- A survey of out-of-core algorithms in numerical linear algebra
- James M. Abello, Jeffrey Scott Vitter (Eds.), American Mathematical Society, Providence, RI
- S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, in: James M. Abello, Jeffrey Scott Vitter (Eds.), External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, Providence, RI, 1999, pp. 161-179.
- (1999) External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science , pp. 161-179
- Toledo, S.¹

38
- 0031123769
- SUMMA: Scalable universal matrix multiplication algorithm
- R. van de Geijn, J. Watts, SUMMA: scalable universal matrix multiplication algorithm, Concurrency: Pract. Exp. 9 (1997) 255-274.
- (1997) Concurrency: Pract. Exp. , vol.9 , pp. 255-274
- Van De Geijn, R.¹ Watts, J.²

39
- 0003418094
- Automatically tuned linear algebra software
- Computer Science Department, University of Tennessee, available online at www.netlib.org/atlas
- R.C. Whaley, J.J. Dongarra, Automatically tuned linear algebra software, Technical Report, Computer Science Department, University of Tennessee, 1998, available online at www.netlib.org/atlas.
- (1998) Technical Report
- Whaley, R.C.¹ Dongarra, J.J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.