Volume 64, Issue 9, 2004, Pages 1017-1026

Communication lower bounds for distributed-memory matrix multiplication

Author keywords

Communication; Distributed memory; Lower bounds; Matrix multiplication

Indexed keywords

ALGORITHMS; CACHE MEMORY; COMPUTER WORKSTATIONS; CONSTRAINT THEORY; DISTRIBUTED COMPUTER SYSTEMS; THEOREM PROVING

EID: 10844258198     PISSN: 07437315     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.jpdc.2004.03.021     Document Type: Article
Times cited: 198

References (39)
  • 1
    • 0029370767
    • A three-dimensional approach to parallel matrix multiplication
    • R.C. Agarwal, S.M. Balle, F.G. Gustavson, M. Joshi, P. Palkar, A three-dimensional approach to parallel matrix multiplication, IBM J. Res. Devel. 39 (5) (1995) 575-582, available online at http://www.research.ibm.com/journal/rd39-5.html.
    • (1995) IBM J. Res. Devel., vol. 39, Issue 5, pp. 575-582
    • Agarwal, R.C.; Balle, S.M.; Gustavson, F.G.; Joshi, M.; Palkar, P.
  • 2
    • 0028513316
    • Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
    • R.C. Agarwal, F.G. Gustavson, M. Zubair, Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms, IBM J. Res. Dev. 38 (5) (1994) 563-576.
    • (1994) IBM J. Res. Dev., vol. 38, Issue 5, pp. 563-576
    • Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
  • 3
    • 0028545949
    • A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer using overlapped communication
    • R.C. Agarwal, F.G. Gustavson, M. Zubair, A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer using overlapped communication, IBM J. Res. Devel. 38 (6) (1994) 673-681, available online at http://www.research.ibm.com/journal/rd38-6.html.
    • (1994) IBM J. Res. Devel., vol. 38, Issue 6, pp. 673-681
    • Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
  • 4
    • 0028427170
    • Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch
    • R.C. Agarwal, F.G. Gustavson, M. Zubair, Improving performance of linear algebra algorithms for dense matrices using algorithmic prefetch, IBM J. Res. Devel. 38 (3) (1994) 265-275.
    • (1994) IBM J. Res. Devel., vol. 38, Issue 3, pp. 265-275
    • Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
  • 6
    • 0024883116
    • Communication efficient matrix multiplication on hypercubes
    • J. Berntsen, Communication efficient matrix multiplication on hypercubes, Parallel Comput. 12 (1989) 335-342.
    • (1989) Parallel Comput., vol. 12, pp. 335-342
    • Berntsen, J.
  • 11
    • 0040919367
    • A blocked implementation of level 3 BLAS for RISC processors
    • ENSEEIHT-IRIT, France
    • M.J. Dayde, I.S. Duff, A blocked implementation of level 3 BLAS for RISC processors, Technical Report RT/APO/96/1, ENSEEIHT-IRIT, France, 1996.
    • (1996) Technical Report RT/APO/96/1
    • Dayde, M.J.; Duff, I.S.
  • 12
    • 0000456144
    • Parallel matrix and graph algorithms
    • E. Dekel, D. Nassimi, S. Sahni, Parallel matrix and graph algorithms, SIAM J. Comput. 10 (1981) 657-675.
    • (1981) SIAM J. Comput., vol. 10, pp. 657-675
    • Dekel, E.; Nassimi, D.; Sahni, S.
  • 13
    • 0023288009
    • Matrix algorithms on a hypercube I: Matrix multiplication
    • G.C. Fox, S.W. Otto, A.J.G. Hey, Matrix algorithms on a hypercube I: Matrix multiplication, Parallel Comput. 4 (1987) 17-31.
    • (1987) Parallel Comput., vol. 4, pp. 17-31
    • Fox, G.C.; Otto, S.W.; Hey, A.J.G.
  • 14
    • 85054913228
    • The scalability of matrix multiplication algorithms on parallel computers
    • Department of Computer Science, University of Minnesota, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps
    • A. Gupta, V. Kumar, The scalability of matrix multiplication algorithms on parallel computers, Technical Report TR 91-54, Department of Computer Science, University of Minnesota, 1991, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps. A short version appeared in Proceedings of 1993 International Conference on Parallel Processing, 1993, pp. III-115-III-119.
    • (1991) Technical Report TR 91-54
    • Gupta, A.; Kumar, V.
  • 15
    • 85030827240
    • A. Gupta, V. Kumar, The scalability of matrix multiplication algorithms on parallel computers, Technical Report TR 91-54, Department of Computer Science, University of Minnesota, 1991, available online from ftp://ftp.cs.umn.edu/users/kumar/matrix.ps, A short version appeared in Proceedings of 1993 International Conference on Parallel Processing, 1993, pp. III-115-III-119.
    • (1993) Proceedings of 1993 International Conference on Parallel Processing
  • 16
    • 10844253520
    • Vorlesungen über Inhalt, Oberfläche und Isoperimetrie
    • Springer, Berlin
    • H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Grundlehren der mathematischen Wissenschaften, vol. 93, Springer, Berlin, 1957.
    • (1957) Grundlehren der Mathematischen Wissenschaften, vol. 93
    • Hadwiger, H.
  • 17
    • 3242777317
    • The performance of the Intel TFLOPS supercomputer
    • G. Henry, P. Fay, B. Cole, T.G. Mattson, The performance of the Intel TFLOPS supercomputer, Intel Tech. J. 98 (1) (1998), available online at http://developer.intel.com/technology/itj/.
    • (1998) Intel Tech. J., vol. 98, Issue 1
    • Henry, G.; Fay, P.; Cole, B.; Mattson, T.G.
  • 19
    • 0036493233
    • Trading replication for communication in parallel distributed-memory dense solvers
    • D. Irony, S. Toledo, Trading replication for communication in parallel distributed-memory dense solvers, Parallel Process. Lett. 12 (2002) 79-94.
    • (2002) Parallel Process. Lett., vol. 12, pp. 79-94
    • Irony, D.; Toledo, S.
  • 20
    • 0027702512
    • Minimizing the communication time for matrix multiplication on multiprocessors
    • S.L. Johnsson, Minimizing the communication time for matrix multiplication on multiprocessors, Parallel Comput. 19 (1993) 1235-1257.
    • (1993) Parallel Comput., vol. 19, pp. 1235-1257
    • Johnsson, S.L.
  • 21
    • 10844253519
    • Local basic linear algebra subroutines LBLAS for the Connection Machine system CM-200
    • S.L. Johnsson, L.F. Ortiz, Local basic linear algebra subroutines LBLAS for the Connection Machine system CM-200, Internat. J. Supercomput. Appl. 7 (1993) 322-350.
    • (1993) Internat. J. Supercomput. Appl., vol. 7, pp. 322-350
    • Johnsson, S.L.; Ortiz, L.F.
  • 22
    • 0346234145
    • High performance GEMM-based level-3 BLAS: Sample routines for double precision real data
    • M. Durand, F. El Dabaghi (Eds.), North-Holland, Amsterdam
    • B. Kågström, P. Ling, C. Van Loan, High performance GEMM-based level-3 BLAS: sample routines for double precision real data, in: M. Durand, F. El Dabaghi (Eds.), High Performance Computing II, North-Holland, Amsterdam, 1991, pp. 269-281.
    • (1991) High Performance Computing II, pp. 269-281
    • Kågström, B.; Ling, P.; Van Loan, C.
  • 24
    • 10844292223
    • GEMM-based level-3 BLAS
    • Department of Computer Science, Cornell University
    • B. Kågström, C. Van Loan, GEMM-based level-3 BLAS, Technical Report CTC-91-TR47, Department of Computer Science, Cornell University, 1989.
    • (1989) Technical Report CTC-91-TR47
    • Kågström, B.; Van Loan, C.
  • 25
    • 0028459839
    • DXML: A high-performance scientific subroutine library
    • C. Kamath, R. Ho, D.P. Manley, DXML: a high-performance scientific subroutine library, Digital Tech. J. 6 (3) (1994) 44-56.
    • (1994) Digital Tech. J., vol. 6, Issue 3, pp. 44-56
    • Kamath, C.; Ho, R.; Manley, D.P.
  • 26
    • 8344273620
    • Local basic linear algebra subroutines (LBLAS) for the CM-5/5E
    • D. Kramer, S.L. Johnsson, Yu Hu, Local basic linear algebra subroutines (LBLAS) for the CM-5/5E, Internat. J. Supercomput. Appl. 10 (1996) 300-335.
    • (1996) Internat. J. Supercomput. Appl., vol. 10, pp. 300-335
    • Kramer, D.; Johnsson, S.L.; Hu, Yu
  • 27
    • 0001289565
    • An inequality related to the isoperimetric inequality
    • L.H. Loomis, H. Whitney, An inequality related to the isoperimetric inequality, Bull. AMS 55 (1949) 961-962.
    • (1949) Bull. AMS, vol. 55, pp. 961-962
    • Loomis, L.H.; Whitney, H.
  • 28
    • 0000743020
    • Memory-efficient matrix multiplication in the BSP model
    • W.F. McColl, A. Tiskin, Memory-efficient matrix multiplication in the BSP model, Algorithmica 24 (3/4) (1999) 287-297.
    • (1999) Algorithmica, vol. 24, Issue 3-4, pp. 287-297
    • McColl, W.F.; Tiskin, A.
  • 29
    • 84945709131
    • Organizing matrices and matrix operations for paged memory systems
    • A.C. McKellar, E.G. Coffman Jr., Organizing matrices and matrix operations for paged memory systems, Commun. ACM 12 (3) (1969) 153-165.
    • (1969) Commun. ACM, vol. 12, Issue 3, pp. 153-165
    • McKellar, A.C.; Coffman Jr., E.G.
  • 30
    • 85030824962
    • Private communication
    • M.S. Paterson, Private communication, 1993.
    • (1993)
    • Paterson, M.S.
  • 33
    • 34250487811
    • Gaussian elimination is not optimal
    • V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969) 354-355.
    • (1969) Numer. Math., vol. 13, pp. 354-355
    • Strassen, V.
  • 34
    • 22044455697
    • Bulk-synchronous parallel multiplication of Boolean matrices
    • K.G. Larsen, S. Skyum, W. Winskel (Eds.), Springer, Berlin
    • A. Tiskin, Bulk-synchronous parallel multiplication of Boolean matrices, in: K.G. Larsen, S. Skyum, W. Winskel (Eds.), Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1443, Springer, Berlin, 1998, pp. 494-506.
    • (1998) Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1443, pp. 494-506
    • Tiskin, A.
  • 35
    • 0346098076
    • The bulk-synchronous parallel random access machine
    • April
    • A. Tiskin, The bulk-synchronous parallel random access machine, Theoret. Comput. Sci. 196 (1-2) (April 1998) 109-130.
    • (1998) Theoret. Comput. Sci., vol. 196, Issue 1-2, pp. 109-130
    • Tiskin, A.
  • 36
    • 84887440108
    • Erratum: Bulk-synchronous parallel multiplication of Boolean matrices
    • J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), Springer, Berlin
    • A. Tiskin, Erratum: Bulk-synchronous parallel multiplication of Boolean matrices, in: J. Wiedermann, P. van Emde Boas, M. Nielsen (Eds.), Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1644, Springer, Berlin, 1999, p. 717.
    • (1999) Proceedings of ICALP, Lecture Notes in Computer Science, vol. 1644, p. 717
    • Tiskin, A.
  • 37
    • 0002831423
    • A survey of out-of-core algorithms in numerical linear algebra
    • James M. Abello, Jeffrey Scott Vitter (Eds.), American Mathematical Society, Providence, RI
    • S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, in: James M. Abello, Jeffrey Scott Vitter (Eds.), External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, Providence, RI, 1999, pp. 161-179.
    • (1999) External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 161-179
    • Toledo, S.
  • 38
    • 0031123769
    • SUMMA: Scalable universal matrix multiplication algorithm
    • R. van de Geijn, J. Watts, SUMMA: scalable universal matrix multiplication algorithm, Concurrency: Pract. Exp. 9 (1997) 255-274.
    • (1997) Concurrency: Pract. Exp., vol. 9, pp. 255-274
    • Van De Geijn, R.; Watts, J.
  • 39
    • 0003418094
    • Automatically tuned linear algebra software
    • Computer Science Department, University of Tennessee, available online at www.netlib.org/atlas
    • R.C. Whaley, J.J. Dongarra, Automatically tuned linear algebra software, Technical Report, Computer Science Department, University of Tennessee, 1998, available online at www.netlib.org/atlas.
    • (1998) Technical Report
    • Whaley, R.C.; Dongarra, J.J.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.