SCOPUS Information Retrieval Platform

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Volume , Issue , 2012, Pages

Communication-avoiding parallel Strassen: Implementation and performance

(4) Lipshitz, Benjamin (a); Ballard, Grey (a); Demmel, James (a); Schwartz, Oded (a)

(a) UNIVERSITY OF CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARITHMETIC COMPLEXITY; EXASCALE; FAST MATRIX MULTIPLICATION; LOWER BOUNDS; MATRIX MULTIPLICATION; PARALLEL IMPLEMENTATIONS; SCIENTIFIC COMPUTING APPLICATIONS;

ALGORITHMS; COMMUNICATION;

BENCHMARKING;

EID: 84877716093 PISSN: 21674329 EISSN: 21674337 Source Type: Conference Proceeding
DOI: 10.1109/SC.2012.33 Document Type: Conference Paper

Times cited : (51)

References (20)

1
- 34547509407
- Seven at one stroke: Results from a cache-oblivious paradigm for scalable matrix algorithms
- New York, NY, USA, ACM
- M. D. Adams and D. S. Wise. Seven at one stroke: Results from a cache-oblivious paradigm for scalable matrix algorithms. In MSPC '06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness, pages 41-50, New York, NY, USA, 2006. ACM.
- (2006) MSPC '06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness , pp. 41-50
- Adams, M.D.¹ Wise, D.S.²

2
- 0029370767
- A three-dimensional approach to parallel matrix multiplication
- R. C. Agarwal, S. M. Balle, F. G. Gustavson, M. Joshi, and P. Palkar. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development, 39:39-5, 1995.
- (1995) IBM Journal of Research and Development , vol.39 , pp. 39-45
- Agarwal, R.C.¹ Balle, S.M.² Gustavson, F.G.³ Joshi, M.⁴ Palkar, P.⁵

3
- 84864146488
- Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds
- New York, NY, USA, ACM
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 77-79, New York, NY, USA, 2012. ACM.
- (2012) Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12 , pp. 77-79
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Lipshitz, B.⁴ Schwartz, O.⁵

4
- 84864147291
- Communication-optimal parallel algorithm for Strassen's matrix multiplication
- New York, NY, USA, ACM
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 193-204, New York, NY, USA, 2012. ACM.
- (2012) Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12 , pp. 193-204
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Lipshitz, B.⁴ Schwartz, O.⁵

5
- 79959674766
- Graph expansion and communication costs of fast matrix multiplication
- New York, NY, USA, ACM
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Graph expansion and communication costs of fast matrix multiplication. In SPAA '11: Proceedings of the 23rd Annual Symposium on Parallelism in Algorithms and Architectures, pages 1-12, New York, NY, USA, 2011. ACM.
- (2011) SPAA '11: Proceedings of the 23rd Annual Symposium on Parallelism in Algorithms and Architectures , pp. 1-12
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Schwartz, O.⁴

6
- 80054034521
- Minimizing communication in numerical linear algebra
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications, 32(3):866-901, 2011.
- (2011) SIAM J. Matrix Analysis Applications , vol.32 , Issue.3 , pp. 866-901
- Ballard, G.¹ Demmel, J.² Holtz, O.³ Schwartz, O.⁴

7
- 0024883116
- Communication efficient matrix multiplication on hypercubes
- DOI 10.1016/0167-8191(89)90091-4
- J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12(3):335-342, 1989. (Pubitemid 20644636)
- (1989) Parallel Computing , vol.12 , Issue.3 , pp. 335-342
- Berntsen, J.¹

8
- 0003712293
- PhD thesis, Montana State University, Bozeman, MT
- L. Cannon. A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University, Bozeman, MT, 1969.
- (1969) A Cellular Computer to Implement the Kalman Filter Algorithm
- Cannon, L.¹

9
- 35548978022
- Fast linear algebra is stable
- J. Demmel, I. Dumitriu, and O. Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59-91, 2007.
- (2007) Numerische Mathematik , vol.108 , Issue.1 , pp. 59-91
- Demmel, J.¹ Dumitriu, I.² Holtz, O.³

10
- 0033350255
- Cache-oblivious algorithms
- Washington, DC, USA, IEEE Computer Society
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, Washington, DC, USA, 1999. IEEE Computer Society.
- (1999) FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science , pp. 285
- Frigo, M.¹ Leiserson, C.E.² Prokop, H.³ Ramachandran, S.⁴

11
- 0030092443
- A high performance parallel Strassen implementation
- B. Grayson, A. Shah, and R. van de Geijn. A high performance parallel Strassen implementation. In Parallel Processing Letters, volume 6, pages 3-12, 1995.
- (1995) Parallel Processing Letters , vol.6 , pp. 3-12
- Grayson, B.¹ Shah, A.² Van De Geijn, R.³

12
- 0036457301
- SIAM, Philadelphia, PA, 2nd edition
- N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA, 2nd edition, 2002.
- (2002) Accuracy and Stability of Numerical Algorithms
- Higham, N.J.¹

13
- 10844258198
- Communication lower bounds for distributed-memory matrix multiplication
- DOI 10.1016/j.jpdc.2004.03.021
- D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017-1026, 2004. (Pubitemid 40000755)
- (2004) Journal of Parallel and Distributed Computing , vol.64 , Issue.9 , pp. 1017-1026
- Irony, D.¹ Toledo, S.² Tiskin, A.³

14
- 0029192458
- A scalable parallel Strassen's matrix multiplication algorithm for distributed-memory computers
- New York, NY, USA, ACM
- Q. Luo and J. Drake. A scalable parallel Strassen's matrix multiplication algorithm for distributed-memory computers. In Proceedings of the 1995 ACM Symposium on Applied Computing, SAC '95, pages 221-226, New York, NY, USA, 1995. ACM.
- (1995) Proceedings of the 1995 ACM Symposium on Applied Computing, SAC '95 , pp. 221-226
- Luo, Q.¹ Drake, J.²

15
- 0000743020
- Memory-efficient matrix multiplication in the BSP model
- W. F. McColl and A. Tiskin. Memory-efficient matrix multiplication in the BSP model. Algorithmica, 24:287-297, 1999. 10.1007/PL00008264. (Pubitemid 129715337)
- (1999) Algorithmica (New York) , vol.24 , Issue.3-4 , pp. 287-297
- McColl, W.F.¹ Tiskin, A.²

16
- 32844469834
- H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Top500 supercomputer sites, 2011. www.top500.org.
- (2011) Top500 Supercomputer Sites
- Meuer, H.¹ Strohmaier, E.² Dongarra, J.³ Simon, H.⁴

17
- 79952579787
- Exascale computing technology challenges
- J. M. L. M. Palma, M. J. Daydé, O. Marques, and J. C. Lopes, editors, High Performance Computing for Computational Science - VECPAR 2010 - 9th International conference, Berkeley, CA, USA, June 22-25, 2010, Revised Selected Papers, Springer
- J. Shalf, S. S. Dosanjh, and J. Morrison. Exascale computing technology challenges. In J. M. L. M. Palma, M. J. Daydé, O. Marques, and J. C. Lopes, editors, High Performance Computing for Computational Science - VECPAR 2010 - 9th International conference, Berkeley, CA, USA, June 22-25, 2010, Revised Selected Papers, volume 6449 of Lecture Notes in Computer Science, pages 1-25. Springer, 2010.
- (2010) Lecture Notes in Computer Science , vol.6449 , pp. 1-25
- Shalf, J.¹ Dosanjh, S.S.² Morrison, J.³

18
- 83155193222
- Improving communication performance in dense linear algebra via topology aware collectives
- New York, NY, USA, ACM
- E. Solomonik, A. Bhatele, and J. Demmel. Improving communication performance in dense linear algebra via topology aware collectives. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 77:1-77:11, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11
- Solomonik, E.¹ Bhatele, A.² Demmel, J.³

19
- 80052305141
- Technical Report UCB/EECS-2011-10, EECS Department, University of California, Berkeley, Feb
- E. Solomonik and J. Demmel. Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. Technical Report UCB/EECS-2011-10, EECS Department, University of California, Berkeley, Feb 2011.
- (2011) Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms
- Solomonik, E.¹ Demmel, J.²

20
- 0031123769
- SUMMA: Scalable universal matrix multiplication algorithm
- R. A. van de Geijn and J. Watts. SUMMA: scalable universal matrix multiplication algorithm. Concurrency - Practice and Experience, 9(4):255-274, 1997. (Pubitemid 127679707)
- (1997) Concurrency Practice and Experience , vol.9 , Issue.4 , pp. 255-274
- Van De Geijn, R.A.¹ Watts, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.