-
1
-
-
0029370767
-
A three-dimensional approach to parallel matrix multiplication
-
R. C. Agarwal, S. M. Balle, F. G. Gustavson, M. Joshi, and P. Palkar. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development, 39:39-5, 1995.
-
(1995)
IBM Journal of Research and Development
, vol.39
, pp. 39-45
-
-
Agarwal, R.C.1
Balle, S.M.2
Gustavson, F.G.3
Joshi, M.4
Palkar, P.5
-
2
-
-
84864146488
-
Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds
-
New York, NY, USA, ACM
-
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 77-79, New York, NY, USA, 2012. ACM.
-
(2012)
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12
, pp. 77-79
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Lipshitz, B.4
Schwartz, O.5
-
3
-
-
84864147291
-
Communication-optimal parallel algorithm for Strassen's matrix multiplication
-
New York, NY, USA, ACM
-
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 193-204, New York, NY, USA, 2012. ACM.
-
(2012)
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12
, pp. 193-204
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Lipshitz, B.4
Schwartz, O.5
-
4
-
-
84872475139
-
Graph expansion analysis for communication costs of fast rectangular matrix multiplication
-
Springer
-
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. In Proceedings of the 1st Mediterranean Conference on Algorithms, MedAlg '12. Springer, 2012.
-
(2012)
Proceedings of the 1st Mediterranean Conference on Algorithms, MedAlg '12
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Lipshitz, B.4
Schwartz, O.5
-
5
-
-
80054034521
-
Minimizing communication in numerical linear algebra
-
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications, 32(3):866-901, 2011.
-
(2011)
SIAM J. Matrix Analysis Applications
, vol.32
, Issue.3
, pp. 866-901
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Schwartz, O.4
-
6
-
-
84877716093
-
Communication-avoiding parallel Strassen: Implementation and performance
-
ACM
-
G. Ballard, J. Demmel, B. Lipshitz, and O. Schwartz. Communication-avoiding parallel Strassen: Implementation and performance. In Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '12, New York, NY, USA, 2012. ACM.
-
Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '12, New York, NY, USA, 2012
-
-
Ballard, G.1
Demmel, J.2
Lipshitz, B.3
Schwartz, O.4
-
7
-
-
0024883116
-
Communication efficient matrix multiplication on hypercubes
-
DOI 10.1016/0167-8191(89)90091-4
-
J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12(3):335-342, 1989. (Pubitemid 20644636)
-
(1989)
Parallel Computing
, vol.12
, Issue.3
, pp. 335-342
-
-
Berntsen, J.1
-
11
-
-
79952809941
-
-
Technical Report UCB/EECS-2010-23, EECS Department, University of California, Berkeley, Mar
-
B. Catanzaro, S. A. Kamil, Y. Lee, K. Asanović, J. Demmel, K. Keutzer, J. Shalf, K. A. Yelick, and A. Fox. SEJITS: Getting productivity and performance with selective embedded JIT specialization. Technical Report UCB/EECS-2010-23, EECS Department, University of California, Berkeley, Mar 2010.
-
(2010)
SEJITS: Getting Productivity and Performance with Selective Embedded JIT Specialization
-
-
Catanzaro, B.1
Kamil, S.A.2
Lee, Y.3
Asanović, K.4
Demmel, J.5
Keutzer, K.6
Shalf, J.7
Yelick, K.A.8
Fox, A.9
-
12
-
-
77954024841
-
Oblivious algorithms for multicores and network of processors
-
R. A. Chowdhury, F. Silvestri, B. Blakeley, and V. Ramachandran. Oblivious algorithms for multicores and network of processors. In IPDPS, pages 1-12, 2010.
-
(2010)
IPDPS
, pp. 1-12
-
-
Chowdhury, R.A.1
Silvestri, F.2
Blakeley, B.3
Ramachandran, V.4
-
13
-
-
77955314108
-
Resource oblivious sorting on multicores
-
Berlin, Heidelberg, Springer-Verlag
-
R. Cole and V. Ramachandran. Resource oblivious sorting on multicores. In Proceedings of the 37th international colloquium conference on Automata, languages and programming, ICALP'10, pages 226-237, Berlin, Heidelberg, 2010. Springer-Verlag.
-
(2010)
Proceedings of the 37th International Colloquium Conference on Automata, Languages and Programming, ICALP'10
, pp. 226-237
-
-
Cole, R.1
Ramachandran, V.2
-
14
-
-
0001555328
-
Rapid multiplication of rectangular matrices
-
D. Coppersmith. Rapid multiplication of rectangular matrices. SIAM Journal on Computing, 11(3):467-471, 1982.
-
(1982)
SIAM Journal on Computing
, vol.11
, Issue.3
, pp. 467-471
-
-
Coppersmith, D.1
-
15
-
-
0031097276
-
Rectangular matrix multiplication revisited
-
DOI 10.1006/jcom.1997.0438, PII S0885064X97904386
-
D. Coppersmith. Rectangular matrix multiplication revisited. J. Complex., 13:42-49, March 1997. (Pubitemid 127172722)
-
(1997)
Journal of Complexity
, vol.13
, Issue.1
, pp. 42-49
-
-
Coppersmith, D.1
-
16
-
-
1842832833
-
Recursive blocked algorithms and hybrid data structures for dense matrix library software
-
E. Elmroth, F. Gustavson, I. Jonsson, and B. Kågström. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review, 46(1):3-45, 2004.
-
(2004)
SIAM Review
, vol.46
, Issue.1
, pp. 3-45
-
-
Elmroth, E.1
Gustavson, F.2
Jonsson, I.3
Kågström, B.4
-
17
-
-
0033350255
-
Cache-oblivious algorithms
-
Washington, DC, USA, IEEE Computer Society
-
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, Washington, DC, USA, 1999. IEEE Computer Society.
-
(1999)
FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science
, pp. 285
-
-
Frigo, M.1
Leiserson, C.E.2
Prokop, H.3
Ramachandran, S.4
-
18
-
-
0034268943
-
A portable programming interface for performance evaluation on modern processors
-
DOI 10.1177/109434200001400303
-
S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A portable programming interface for performance evaluation on modern processors. The International Journal of High Performance Computing Applications, 14:189-204, 2000. (Pubitemid 32025040)
-
(2000)
International Journal of High Performance Computing Applications
, vol.14
, Issue.3
, pp. 189-204
-
-
Browne, S.1
Dongarra, J.2
Garner, N.3
Ho, G.4
Mucci, P.5
-
20
-
-
84971853043
-
I/O complexity: The red-blue pebble game
-
New York, NY, USA, ACM
-
J. W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC '81: Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages 326-333, New York, NY, USA, 1981. ACM.
-
(1981)
STOC '81: Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing
, pp. 326-333
-
-
Hong, J.W.1
Kung, H.T.2
-
21
-
-
0014972944
-
On minimizing the number of multiplications necessary for matrix multiplication
-
J. E. Hopcroft and L. R. Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30-36, 1971.
-
(1971)
SIAM Journal on Applied Mathematics
, vol.20
, Issue.1
, pp. 30-36
-
-
Hopcroft, J.E.1
Kerr, L.R.2
-
22
-
-
0030673980
-
Fast rectangular matrix multiplications and improving parallel matrix computations
-
New York, NY, USA, ACM
-
X. Huang and V. Y. Pan. Fast rectangular matrix multiplications and improving parallel matrix computations. In Proceedings of the second international symposium on Parallel symbolic computation, PASCO '97, pages 11-23, New York, NY, USA, 1997. ACM.
-
(1997)
Proceedings of the Second International Symposium on Parallel Symbolic Computation, PASCO '97
, pp. 11-23
-
-
Huang, X.1
Pan, V.Y.2
-
23
-
-
0000559550
-
Fast Rectangular Matrix Multiplication and Applications
-
DOI 10.1006/jcom.1998.0476, PII S0885064X98904769
-
X. Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complex., 14:257-299, June 1998. (Pubitemid 128347318)
-
(1998)
Journal of Complexity
, vol.14
, Issue.2
, pp. 257-299
-
-
Huang, X.1
Pan, V.Y.2
-
24
-
-
10844258198
-
Communication lower bounds for distributed-memory matrix multiplication
-
DOI 10.1016/j.jpdc.2004.03.021
-
D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017-1026, 2004. (Pubitemid 40000755)
-
(2004)
Journal of Parallel and Distributed Computing
, vol.64
, Issue.9
, pp. 1017-1026
-
-
Irony, D.1
Toledo, S.2
Tiskin, A.3
-
25
-
-
0001289565
-
An inequality related to the isoperimetric inequality
-
L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the AMS, 55:961-962, 1949.
-
(1949)
Bulletin of the AMS
, vol.55
, pp. 961-962
-
-
Loomis, L.H.1
Whitney, H.2
-
26
-
-
0040170057
-
On the asymptotic complexity of rectangular matrix multiplication
-
G. Lotti and F. Romani. On the asymptotic complexity of rectangular matrix multiplication. Theoretical Computer Science, 23(2):171-185, 1983.
-
(1983)
Theoretical Computer Science
, vol.23
, Issue.2
, pp. 171-185
-
-
Lotti, G.1
Romani, F.2
-
27
-
-
0000743020
-
Memory-efficient matrix multiplication in the BSP model
-
W. F. McColl and A. Tiskin. Memory-efficient matrix multiplication in the BSP model. Algorithmica, 24:287-297, 1999. DOI 10.1007/PL00008264. (Pubitemid 129715337)
-
(1999)
Algorithmica (New York)
, vol.24
, Issue.3-4
, pp. 287-297
-
-
McColl, W.F.1
Tiskin, A.2
-
29
-
-
83155193222
-
Improving communication performance in dense linear algebra via topology aware collectives
-
New York, NY, USA, ACM
-
E. Solomonik, A. Bhatele, and J. Demmel. Improving communication performance in dense linear algebra via topology aware collectives. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 77:1-77:11, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11
-
-
Solomonik, E.1
Bhatele, A.2
Demmel, J.3
-
31
-
-
84884822823
-
-
Technical Report UCB/EECS-2012-28, EECS Department, University of California, Berkeley, Feb
-
E. Solomonik and J. Demmel. Matrix multiplication on multidimensional torus networks. Technical Report UCB/EECS-2012-28, EECS Department, University of California, Berkeley, Feb 2012.
-
(2012)
Matrix Multiplication on Multidimensional Torus Networks
-
-
Solomonik, E.1
Demmel, J.2
-
32
-
-
34250487811
-
Gaussian elimination is not optimal
-
V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354-356, 1969.
-
(1969)
Numer. Math.
, vol.13
, pp. 354-356
-
-
Strassen, V.1
-
33
-
-
0031123769
-
SUMMA: Scalable universal matrix multiplication algorithm
-
R. A. van de Geijn and J. Watts. SUMMA: scalable universal matrix multiplication algorithm. Concurrency - Practice and Experience, 9(4):255-274, 1997. (Pubitemid 127679707)
-
(1997)
Concurrency Practice and Experience
, vol.9
, Issue.4
, pp. 255-274
-
-
van de Geijn, R.A.1
Watts, J.2
-
34
-
-
20744459570
-
Is search really necessary to generate high-performance BLAS?
-
Feb
-
K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS? Proceedings of the IEEE, 93(2):358-386, Feb 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 358-386
-
-
Yotov, K.1
Li, X.2
Ren, G.3
Garzaran, M.4
Padua, D.5
Pingali, K.6
Stodghill, P.7
-
35
-
-
35248846531
-
An experimental comparison of cache-oblivious and cache-conscious programs
-
New York, NY, USA, ACM
-
K. Yotov, T. Roeder, K. Pingali, J. Gunnels, and F. Gustavson. An experimental comparison of cache-oblivious and cache-conscious programs. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '07, pages 93-104, New York, NY, USA, 2007. ACM.
-
(2007)
Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '07
, pp. 93-104
-
-
Yotov, K.1
Roeder, T.2
Pingali, K.3
Gunnels, J.4
Gustavson, F.5