SCOPUS information retrieval platform

Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium, IPDPS 2013

2013, Pages 261-272

Communication-optimal parallel recursive rectangular matrix multiplication

(7) Demmel, James (a); Eliahu, David (a); Fox, Armando (a); Kamil, Shoaib (b); Lipshitz, Benjamin (a); Schwartz, Oded (a); Spillinger, Omer (a)

(a) UNIVERSITY OF CALIFORNIA (United States)

(b) MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

linear algebra; matrix multiplication; communication avoiding algorithms

Indexed keywords

DISTRIBUTED MEMORY; LINES OF CODE; MATRIX MULTIPLICATION; PARALLEL LINEAR ALGEBRAS; RECTANGULAR MATRIX; SHARED MEMORY; SHARED MEMORY MACHINES; SQUARE MATRICES;

ALGORITHMS; COMMUNICATION; DISTRIBUTED PARAMETER NETWORKS; LINEAR ALGEBRA; MATRIX ALGEBRA; OPTIMIZATION; SUPERCOMPUTERS;

CACHE MEMORY;

EID: 84884883916 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2013.80 Document Type: Conference Paper

Times cited: 119

References (35)

1
- 0029370767
- A three-dimensional approach to parallel matrix multiplication
- R. C. Agarwal, S. M. Balle, F. G. Gustavson, M. Joshi, and P. Palkar. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development, 39:39-5, 1995.
- (1995) IBM Journal of Research and Development , vol.39 , pp. 39-45
- Agarwal, R.C.¹ Balle, S.M.² Gustavson, F.G.³ Joshi, M.⁴ Palkar, P.⁵

2
- 84864146488
- Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds
- New York, NY, USA, ACM
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 77-79, New York, NY, USA, 2012. ACM.
- (2012) Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12 , pp. 77-79
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Lipshitz, B.⁴ Schwartz, O.⁵

3
- 84864147291
- Communication-optimal parallel algorithm for Strassen's matrix multiplication
- New York, NY, USA, ACM
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 193-204, New York, NY, USA, 2012. ACM.
- (2012) Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12 , pp. 193-204
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Lipshitz, B.⁴ Schwartz, O.⁵

4
- 84872475139
- Graph expansion analysis for communication costs of fast rectangular matrix multiplication
- Springer
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. In Proceedings of The 1st Mediterranean Conference on Algorithms, MedAlg '12. Springer, 2012.
- (2012) Proceedings of the 1st Mediterranean Conference on Algorithms, MedAlg '12
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Lipshitz, B.⁴ Schwartz, O.⁵

5
- 80054034521
- Minimizing communication in numerical linear algebra
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications, 32(3):866-901, 2011.
- (2011) SIAM J. Matrix Analysis Applications , vol.32 , Issue.3 , pp. 866-901
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Schwartz, O.⁴

6
- 84877716093
- Communication-avoiding parallel Strassen: Implementation and performance
- ACM
- G. Ballard, J. Demmel, B. Lipshitz, and O. Schwartz. Communication-avoiding parallel Strassen: Implementation and performance. In Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '12, New York, NY, USA, 2012. ACM.
- (2012) Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '12
- Ballard, G.¹ Demmel, J.² Lipshitz, B.³ Schwartz, O.⁴

7
- 0024883116
- Communication efficient matrix multiplication on hypercubes
- DOI 10.1016/0167-8191(89)90091-4
- J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12(3):335-342, 1989.
- (1989) Parallel Computing , vol.12 , Issue.3 , pp. 335-342
- Berntsen, J.¹

8
- 34548779645
- Network-oblivious algorithms
- G. Bilardi, A. Pietracaprina, G. Pucci, and F. Silvestri. Network-oblivious algorithms. In Proceedings of 21st International Parallel and Distributed Processing Symposium, 2007.
- (2007) Proceedings of 21st International Parallel and Distributed Processing Symposium
- Bilardi, G.¹ Pietracaprina, A.² Pucci, G.³ Silvestri, F.⁴

9
- 0038954994
- O(n^2.7799) complexity for n × n approximate matrix multiplication
- D. Bini, M. Capovani, F. Romani, and G. Lotti. O(n^2.7799) complexity for n × n approximate matrix multiplication. Information Processing Letters, 8(5):234-235, 1979.
- (1979) Information Processing Letters , vol.8 , Issue.5 , pp. 234-235
- Bini, D.¹ Capovani, M.² Romani, F.³ Lotti, G.⁴

10
- 0000817992
- Algebraic Complexity Theory
- Springer Verlag
- P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic Complexity Theory. Number 315 in Grundlehren der mathematischen Wissenschaften. Springer Verlag, 1997.
- (1997) Grundlehren der Mathematischen Wissenschaften , Issue.315
- Bürgisser, P.¹ Clausen, M.² Shokrollahi, M.A.³

11
- 79952809941
- SEJITS: Getting productivity and performance with selective embedded JIT specialization
- Technical Report UCB/EECS-2010-23, EECS Department, University of California, Berkeley, Mar 2010
- B. Catanzaro, S. A. Kamil, Y. Lee, K. Asanović, J. Demmel, K. Keutzer, J. Shalf, K. A. Yelick, and A. Fox. SEJITS: Getting productivity and performance with selective embedded JIT specialization. Technical Report UCB/EECS-2010-23, EECS Department, University of California, Berkeley, Mar 2010.
- (2010) SEJITS: Getting Productivity and Performance with Selective Embedded JIT Specialization
- Catanzaro, B.¹ Kamil, S.A.² Lee, Y.³ Asanović, K.⁴ Demmel, J.⁵ Keutzer, K.⁶ Shalf, J.⁷ Yelick, K.A.⁸ Fox, A.⁹

12
- 77954024841
- Oblivious algorithms for multicores and network of processors
- R. A. Chowdhury, F. Silvestri, B. Blakeley, and V. Ramachandran. Oblivious algorithms for multicores and network of processors. In IPDPS, pages 1-12, 2010.
- (2010) IPDPS , pp. 1-12
- Chowdhury, R.A.¹ Silvestri, F.² Blakeley, B.³ Ramachandran, V.⁴

13
- 77955314108
- Resource oblivious sorting on multicores
- Berlin, Heidelberg, Springer-Verlag
- R. Cole and V. Ramachandran. Resource oblivious sorting on multicores. In Proceedings of the 37th international colloquium conference on Automata, languages and programming, ICALP'10, pages 226-237, Berlin, Heidelberg, 2010. Springer-Verlag.
- (2010) Proceedings of the 37th International Colloquium Conference on Automata, Languages and Programming, ICALP'10 , pp. 226-237
- Cole, R.¹ Ramachandran, V.²

14
- 0001555328
- Rapid multiplication of rectangular matrices
- D. Coppersmith. Rapid multiplication of rectangular matrices. SIAM Journal on Computing, 11(3):467-471, 1982.
- (1982) SIAM Journal on Computing , vol.11 , Issue.3 , pp. 467-471
- Coppersmith, D.¹

15
- 0031097276
- Rectangular matrix multiplication revisited
- DOI 10.1006/jcom.1997.0438, PII S0885064X97904386
- D. Coppersmith. Rectangular matrix multiplication revisited. J. Complex., 13(1):42-49, March 1997.
- (1997) Journal of Complexity , vol.13 , Issue.1 , pp. 42-49
- Coppersmith, D.¹

16
- 1842832833
- Recursive blocked algorithms and hybrid data structures for dense matrix library software
- E. Elmroth, F. Gustavson, I. Jonsson, and B. Kågström. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review, 46(1):3-45, 2004.
- (2004) SIAM Review , vol.46 , Issue.1 , pp. 3-45
- Elmroth, E.¹ Gustavson, F.² Jonsson, I.³ Kågström, B.⁴

17
- 0033350255
- Cache-oblivious algorithms
- Washington, DC, USA, IEEE Computer Society
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, Washington, DC, USA, 1999. IEEE Computer Society.
- (1999) FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science , pp. 285
- Frigo, M.¹ Leiserson, C.E.² Prokop, H.³ Ramachandran, S.⁴

18
- 0034268943
- A portable programming interface for performance evaluation on modern processors
- DOI 10.1177/109434200001400303
- S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A portable programming interface for performance evaluation on modern processors. The International Journal of High Performance Computing Applications, 14(3):189-204, 2000.
- (2000) International Journal of High Performance Computing Applications , vol.14 , Issue.3 , pp. 189-204
- Browne, S.¹ Dongarra, J.² Garner, N.³ Ho, G.⁴ Mucci, P.⁵

19
- 0004168818
- Matrix Computations
- The Johns Hopkins University Press, 3rd edition, Oct. 1996
- G. H. Golub and C. F. Van Loan. Matrix Computations (Johns Hopkins Studies in Mathematical Sciences)(3rd Edition). The Johns Hopkins University Press, 3rd edition, Oct. 1996.
- (1996) Matrix Computations (Johns Hopkins Studies in Mathematical Sciences)(3rd Edition)
- Golub, G.H.¹ Van Loan, C.F.²

20
- 84971853043
- I/O complexity: The red-blue pebble game
- New York, NY, USA, ACM
- J. W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC '81: Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages 326-333, New York, NY, USA, 1981. ACM.
- (1981) STOC '81: Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing , pp. 326-333
- Hong, J.W.¹ Kung, H.T.²

21
- 0014972944
- On minimizing the number of multiplications necessary for matrix multiplication
- J. E. Hopcroft and L. R. Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30-36, 1971.
- (1971) SIAM Journal on Applied Mathematics , vol.20 , Issue.1 , pp. 30-36
- Hopcroft, J.E.¹ Kerr, L.R.²

22
- 0030673980
- Fast rectangular matrix multiplications and improving parallel matrix computations
- New York, NY, USA, ACM
- X. Huang and V. Y. Pan. Fast rectangular matrix multiplications and improving parallel matrix computations. In Proceedings of the second international symposium on Parallel symbolic computation, PASCO '97, pages 11-23, New York, NY, USA, 1997. ACM.
- (1997) Proceedings of the Second International Symposium on Parallel Symbolic Computation, PASCO '97 , pp. 11-23
- Huang, X.¹ Pan, V.Y.²

23
- 0000559550
- Fast Rectangular Matrix Multiplication and Applications
- DOI 10.1006/jcom.1998.0476, PII S0885064X98904769
- X. Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complex., 14(2):257-299, June 1998.
- (1998) Journal of Complexity , vol.14 , Issue.2 , pp. 257-299
- Huang, X.¹ Pan, V.Y.²

24
- 10844258198
- Communication lower bounds for distributed-memory matrix multiplication
- DOI 10.1016/j.jpdc.2004.03.021
- D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017-1026, 2004.
- (2004) Journal of Parallel and Distributed Computing , vol.64 , Issue.9 , pp. 1017-1026
- Irony, D.¹ Toledo, S.² Tiskin, A.³

25
- 0001289565
- An inequality related to the isoperimetric inequality
- L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the AMS, 55:961-962, 1949.
- (1949) Bulletin of the AMS , vol.55 , pp. 961-962
- Loomis, L.H.¹ Whitney, H.²

26
- 0040170057
- On the asymptotic complexity of rectangular matrix multiplication
- G. Lotti and F. Romani. On the asymptotic complexity of rectangular matrix multiplication. Theoretical Computer Science, 23(2):171-185, 1983.
- (1983) Theoretical Computer Science , vol.23 , Issue.2 , pp. 171-185
- Lotti, G.¹ Romani, F.²

27
- 0000743020
- Memory-efficient matrix multiplication in the BSP model
- W. F. McColl and A. Tiskin. Memory-efficient matrix multiplication in the BSP model. Algorithmica, 24(3-4):287-297, 1999. DOI 10.1007/PL00008264.
- (1999) Algorithmica (New York) , vol.24 , Issue.3-4 , pp. 287-297
- McColl, W.F.¹ Tiskin, A.²

28
- 84886442861
- Scalable universal matrix multiplication algorithms: 2d and 3d variations on a theme
- M. D. Schatz, J. Poulson, and R. A. van de Geijn. Scalable universal matrix multiplication algorithms: 2d and 3d variations on a theme. submitted to ACM Transactions on Mathematical Software.
- ACM Transactions on Mathematical Software
- Schatz, M.D.¹ Poulson, J.² Van De Geijn, R.A.³

29
- 83155193222
- Improving communication performance in dense linear algebra via topology aware collectives
- New York, NY, USA, ACM
- E. Solomonik, A. Bhatele, and J. Demmel. Improving communication performance in dense linear algebra via topology aware collectives. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 77:1-77:11, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11
- Solomonik, E.¹ Bhatele, A.² Demmel, J.³

30
- 80052305141
- Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
- Springer
- E. Solomonik and J. Demmel. Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In Euro-Par'11: Proceedings of the 17th International European Conference on Parallel and Distributed Computing. Springer, 2011.
- (2011) Euro-Par'11: Proceedings of the 17th International European Conference on Parallel and Distributed Computing
- Solomonik, E.¹ Demmel, J.²

31
- 84884822823
- Matrix multiplication on multidimensional torus networks
- Technical Report UCB/EECS-2012-28, EECS Department, University of California, Berkeley, Feb 2012
- E. Solomonik and J. Demmel. Matrix multiplication on multidimensional torus networks. Technical Report UCB/EECS-2012-28, EECS Department, University of California, Berkeley, Feb 2012.
- (2012) Matrix Multiplication on Multidimensional Torus Networks
- Solomonik, E.¹ Demmel, J.²

32
- 34250487811
- Gaussian elimination is not optimal
- V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354-356, 1969.
- (1969) Numer. Math. , vol.13 , pp. 354-356
- Strassen, V.¹

33
- 0031123769
- SUMMA: Scalable universal matrix multiplication algorithm
- R. A. van de Geijn and J. Watts. SUMMA: scalable universal matrix multiplication algorithm. Concurrency - Practice and Experience, 9(4):255-274, 1997.
- (1997) Concurrency Practice and Experience , vol.9 , Issue.4 , pp. 255-274
- Van De Geijn, R.A.¹ Watts, J.²

34
- 20744459570
- Is search really necessary to generate high-performance BLAS?
- K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS? Proceedings of the IEEE, 93(2):358-386, Feb 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 358-386
- Yotov, K.¹ Li, X.² Ren, G.³ Garzaran, M.⁴ Padua, D.⁵ Pingali, K.⁶ Stodghill, P.⁷

35
- 35248846531
- An experimental comparison of cache-oblivious and cache-conscious programs
- New York, NY, USA, ACM
- K. Yotov, T. Roeder, K. Pingali, J. Gunnels, and F. Gustavson. An experimental comparison of cache-oblivious and cache-conscious programs. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '07, pages 93-104, New York, NY, USA, 2007. ACM.
- (2007) Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '07 , pp. 93-104
- Yotov, K.¹ Roeder, T.² Pingali, K.³ Gunnels, J.⁴ Gustavson, F.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.