



Volume 32, Issue 3, 2011, Pages 866-901

Minimizing communication in numerical linear algebra

Author keywords

Bandwidth; Communication avoiding; Latency; Linear algebra algorithms; Lower bound

Indexed keywords

ARITHMETIC OPERATIONS; CHOLESKY FACTORIZATIONS; COMMUNICATION-AVOIDING; COMPUTING POWER; DIRECT METHOD; EIGENVALUES; FAST MEMORY; GRAM-SCHMIDT ALGORITHMS; GRAPH-THEORETIC PROBLEM; INPUT MATRICES; LATENCY; LATENCY COSTS; LINEAR ALGEBRA ALGORITHMS; LINEAR ALGEBRA OPERATIONS; LOCAL MEMORIES; LOWER BOUND TECHNIQUES; LOWER BOUNDS; LU FACTORIZATION; MATRIX; MATRIX MULTIPLICATION; NUMERICAL LINEAR ALGEBRA; OPTIMAL ALGORITHM; QR FACTORIZATIONS; SINGULAR VALUES; SPARSE MATRICES;

EID: 80054034521     PISSN: 08954798     EISSN: 10957162     Source Type: Journal    
DOI: 10.1137/090769156     Document Type: Article
Times cited : (199)

References (64)
  • 3
    • 18044400448
    • A recursive formulation of Cholesky factorization of a matrix in packed storage
    • DOI 10.1145/383738.383741
    • B. S. ANDERSEN, F. GUSTAVSON, AND J. WASNIEWSKI, A recursive formulation of Cholesky factorization of a matrix in packed storage format, ACM Trans. Math. Software, 27 (2001), pp. 214-244. (Pubitemid 33602326)
    • (2001) ACM Transactions on Mathematical Software , vol.27 , Issue.2 , pp. 214-244
    • Andersen, B.S.1    Wasniewski, J.2    Gustavson, F.G.3
  • 6
    • 0001314661
    • The fan-both family of column-based distributed Cholesky factorization algorithms
    • J. R. Gilbert, A. George, and J. W. H. Liu, eds., Springer-Verlag, Berlin
    • C. ASHCRAFT, The fan-both family of column-based distributed Cholesky factorization algorithms, in Graph Theory and Sparse Matrix Computation, IMA Volumes in Mathematics and Its Applications 56, J. R. Gilbert, A. George, and J. W. H. Liu, eds., Springer-Verlag, Berlin, 1993, pp. 159-190
    • (1993) Graph Theory and Sparse Matrix Computation, IMA Volumes in Mathematics and Its Applications , vol.56 , pp. 159-190
    • Ashcraft, C.1
  • 7
    • 0024082546
    • The input/output complexity of sorting and related problems
    • DOI 10.1145/48529.48535
    • A. AGGARWAL AND J. S. VITTER, The input/output complexity of sorting and related problems, Comm. ACM, 31 (1988), pp. 1116-1127. (Pubitemid 18662481)
    • (1988) Communications of the ACM , vol.31 , Issue.9 , pp. 1116-1127
    • Aggarwal, A.1    Vitter, J.S.2
  • 8
    • 35248813384
    • Optimal sparse matrix dense vector multiplication in the I/O-model
    • DOI 10.1145/1248377.1248391, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
    • M. A. BENDER, G. S. BRODAL, R. FAGERBERG, R. JACOB, AND E. VICARI, Optimal sparse matrix dense vector multiplication in the I/O-model, in SPAA '07: Proceedings of the 19th Annual ACM Symposium on Parallel Algorithms and Architectures, ACM, New York, 2007, pp. 61-70. (Pubitemid 47568555)
    • (2007) Annual ACM Symposium on Parallelism in Algorithms and Architectures , pp. 61-70
    • Bender, M.A.1    Brodal, G.S.2    Fagerberg, R.3    Jacob, R.4    Vicari, E.5
  • 9
    • 0036401631
    • The multishift QR algorithm. Part I: Maintaining well-focused shifts and level 3 performance
    • K. BRAMAN, R. BYERS, AND R. MATHIAS, The multishift QR algorithm. Part I: Maintaining well-focused shifts and level 3 performance, SIAM J. Matrix Anal. Appl., 23 (2002), pp. 929-947.
    • (2002) SIAM J. Matrix Anal. Appl. , vol.23 , pp. 929-947
    • Braman, K.1    Byers, R.2    Mathias, R.3
  • 10
    • 0036400807
    • The multishift QR algorithm. Part II: Aggressive early deflation
    • K. BRAMAN, R. BYERS, AND R. MATHIAS, The multishift QR algorithm. Part II: Aggressive early deflation, SIAM J. Matrix Anal. Appl., 23 (2002), pp. 948-973.
    • (2002) SIAM J. Matrix Anal. Appl. , vol.23 , pp. 948-973
    • Braman, K.1    Byers, R.2    Mathias, R.3
  • 14
    • 80054039665
    • Communication-optimal parallel and sequential eigenvalue and singular value algorithms
    • University of California-Berkeley
    • G. BALLARD, J. DEMMEL, AND I. DUMITRIU, Communication-Optimal Parallel and Sequential Eigenvalue and Singular Value Algorithms, EECS Technical Report EECS-2011-14, University of California-Berkeley, 2011.
    • (2011) EECS Technical Report EECS-2011-14
    • Ballard, G.1    Demmel, J.2    Dumitriu, I.3
  • 15
    • 79251563454
    • Communication-optimal parallel and sequential Cholesky decomposition
    • G. BALLARD, J. DEMMEL, O. HOLTZ, AND O. SCHWARTZ, Communication-optimal parallel and sequential Cholesky decomposition, SIAM J. Sci. Comput., 32 (2010), pp. 3495-3523.
    • (2010) SIAM J. Sci. Comput. , vol.32 , pp. 3495-3523
    • Ballard, G.1    Demmel, J.2    Holtz, O.3    Schwartz, O.4
  • 18
    • 84966228742
    • Some stable methods for calculating inertia and solving symmetric linear systems
    • J. BUNCH AND L. KAUFMAN, Some stable methods for calculating inertia and solving symmetric linear systems, Math. Comp., 31 (1977), pp. 163-179.
    • (1977) Math. Comp. , vol.31 , pp. 163-179
    • Bunch, J.1    Kaufman, L.2
  • 20
    • 0012881041
    • Algorithm 807: The SBR toolbox-software for successive band reduction
    • C. H. BISCHOF, B. LANG, AND X. SUN, Algorithm 807: The SBR toolbox-software for successive band reduction, ACM Trans. Math. Software, 26 (2000), pp. 602-616.
    • (2000) ACM Trans. Math. Software , vol.26 , pp. 602-616
    • Bischof, C.H.1    Lang, B.2    Sun, X.3
  • 21
  • 22
    • 0001951009
    • The WY representation for products of Householder matrices
    • C. BISCHOF AND C. VAN LOAN, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 2-13.
    • (1987) SIAM J. Sci. Statist. Comput. , vol.8 , pp. 2-13
    • Bischof, C.1    Van Loan, C.2
  • 26
    • 33244497406
    • Cache-oblivious dynamic programming
    • DOI 10.1145/1109557.1109622, Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
    • R. A. CHOWDHURY AND V. RAMACHANDRAN, Cache-oblivious dynamic programming, in Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, ACM, New York, 2006, pp. 591-600. (Pubitemid 43275280)
    • (2006) Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms , pp. 591-600
    • Chowdhury, R.A.1    Ramachandran, V.2
  • 28
    • 35548978022
    • Fast linear algebra is stable
    • J. DEMMEL, I. DUMITRIU, AND O. HOLTZ, Fast linear algebra is stable, Numer. Math., 108 (2007), pp. 59-91.
    • (2007) Numer. Math. , vol.108 , pp. 59-91
    • Demmel, J.1    Dumitriu, I.2    Holtz, O.3
  • 29
    • 80054022827
    • CS 267 course notes: Applications of parallel processing
    • University of California
    • J. DEMMEL, CS 267 Course Notes: Applications of Parallel Processing, Computer Science Division, University of California, 1996. http://www.cs.berkeley.edu/~demmel/cs267.
    • (1996) Computer Science Division
    • Demmel, J.1
  • 30
    • 0003252789
    • Applied numerical linear algebra
    • J. DEMMEL, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
    • (1997) SIAM Philadelphia
    • Demmel, J.1
  • 31
    • 77953980008
    • Communication-optimal parallel and sequential QR and LU factorizations
    • University of California-Berkeley, to appear in SIAM J. Sci. Comput.
    • J. DEMMEL, L. GRIGORI, M. HOEMMEN, AND J. LANGOU, Communication-Optimal Parallel and Sequential QR and LU Factorizations, EECS Technical Report EECS-2008-89, University of California-Berkeley, 2008, to appear in SIAM J. Sci. Comput.
    • (2008) EECS Technical Report EECS-2008-89
    • Demmel, J.1    Grigori, L.2    Hoemmen, M.3    Langou, J.4
  • 34
    • 80051667036
    • CALU: A communication optimal LU factorization algorithm
    • University of California-Berkeley submitted to SIAM J. Matrix Anal. Appl
    • J. DEMMEL, L. GRIGORI, AND H. XIANG, CALU: A Communication Optimal LU Factorization Algorithm, EECS Technical Report EECS-2010-29, University of California-Berkeley, 2010, submitted to SIAM J. Matrix Anal. Appl.
    • (2010) EECS Technical Report EECS-2010-29
    • Demmel, J.1    Grigori, L.2    Xiang, H.3
  • 35
    • 0000456144
    • Parallel matrix and graph algorithms
    • E. DEKEL, D. NASSIMI, AND S. SAHNI, Parallel matrix and graph algorithms, SIAM J. Comput., 10 (1981), pp. 657-675.
    • (1981) SIAM J. Comput. , vol.10 , pp. 657-675
    • Dekel, E.1    Nassimi, D.2    Sahni, S.3
  • 37
    • 0034224207
    • Applying recursion to serial and parallel QR factorization leads to better performance
    • E. ELMROTH AND F. GUSTAVSON, Applying recursion to serial and parallel QR factorization leads to better performance, IBM J. Res. Dev., 44 (2000), pp. 605-624.
    • (2000) IBM J. Res. Dev. , vol.44 , pp. 605-624
    • Elmroth, E.1    Gustavson, F.2
  • 38
    • 1842832833
    • Recursive blocked algorithms and hybrid data structures for dense matrix library software
    • E. ELMROTH, F. GUSTAVSON, I. JONSSON, AND B. KÅGSTRÖM, Recursive blocked algorithms and hybrid data structures for dense matrix library software, SIAM Rev., 46 (2004), pp. 3-45.
    • (2004) SIAM Rev. , vol.46 , pp. 3-45
    • Elmroth, E.1    Gustavson, F.2    Jonsson, I.3    Kågström, B.4
  • 40
    • 1442337668
    • QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
    • J. D. FRENS AND D. S. WISE, QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism, SIGPLAN Not., 38 (2003), pp. 144-154.
    • (2003) SIGPLAN Not. , vol.38 , pp. 144-154
    • Frens, J.D.1    Wise, D.S.2
  • 41
    • 0000264382
    • Nested dissection of a regular finite element mesh
    • A. GEORGE, Nested dissection of a regular finite element mesh, SIAM J. Numer. Anal., 10 (1973), pp. 345-363.
    • (1973) SIAM J. Numer. Anal. , vol.10 , pp. 345-363
    • George, A.1
  • 42
    • 17644368925
    • Parallel out-of-core computation and updating of the QR factorization
    • DOI 10.1145/1055531.1055534
    • B. C. GUNTER AND R. A. VAN DE GEIJN, Parallel out-of-core computation and updating of the QR factorization, ACM Trans. Math. Software, 31 (2005), pp. 60-78. (Pubitemid 40557862)
    • (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 60-78
    • Gunter, B.C.1    Van De Geijn, R.A.2
  • 45
    • 0010865720
    • The analysis of a nested dissection algorithm
    • J. R. GILBERT AND R. E. TARJAN, The analysis of a nested dissection algorithm, Numer. Math., 50 (1987), pp. 377-404.
    • (1987) Numer. Math. , vol.50 , pp. 377-404
    • Gilbert, J.R.1    Tarjan, R.E.2
  • 46
    • 0031273280
    • Recursion leads to automatic variable blocking for dense linear-algebra algorithms
    • F. G. GUSTAVSON, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM J. Res. Dev., 41 (1997), pp. 737-756.
    • (1997) IBM J. Res. Dev. , vol.41 , pp. 737-756
    • Gustavson, F.G.1
  • 49
    • 0039236804
    • Complexity bounds for regular finite difference and finite element grids
    • A. J. HOFFMAN, M. S. MARTIN, AND D. J. ROSE, Complexity bounds for regular finite difference and finite element grids, SIAM J. Numer. Anal., 10 (1973), pp. 364-369.
    • (1973) SIAM J. Numer. Anal. , vol.10 , pp. 364-369
    • Hoffman, A.J.1    Martin, M.S.2    Rose, D.J.3
  • 50
    • 0036493233
    • Trading replication for communication in parallel distributed-memory dense solvers
    • D. IRONY AND S. TOLEDO, Trading replication for communication in parallel distributed-memory dense solvers, Parallel Process. Lett., 12 (2002), pp. 79-94. (Pubitemid 34668795)
    • (2002) Parallel Processing Letters , vol.12 , Issue.1 , pp. 79-94
    • Irony, D.1    Toledo, S.2
  • 51
    • 10844258198
    • Communication lower bounds for distributed-memory matrix multiplication
    • DOI 10.1016/j.jpdc.2004.03.021
    • D. IRONY, S. TOLEDO, AND A. TISKIN, Communication lower bounds for distributed-memory matrix multiplication, J. Parallel Distrib. Comput., 64 (2004), pp. 1017-1026. (Pubitemid 40000755)
    • (2004) Journal of Parallel and Distributed Computing , vol.64 , Issue.9 , pp. 1017-1026
    • Irony, D.1    Toledo, S.2    Tiskin, A.3
  • 52
    • 0001289565
    • An inequality related to the isoperimetric inequality
    • L. H. LOOMIS AND H. WHITNEY, An inequality related to the isoperimetric inequality, Bull. Am. Math. Soc., 55 (1949), pp. 961-962.
    • (1949) Bull. Am. Math. Soc. , vol.55 , pp. 961-962
    • Loomis, L.H.1    Whitney, H.2
  • 54
    • 0000743020
    • Memory-efficient matrix multiplication in the BSP model
    • W. F. MCCOLL AND A. TISKIN, Memory-efficient matrix multiplication in the BSP model, Algorithmica, 24 (1999), pp. 287-297. (Pubitemid 129715337)
    • (1999) Algorithmica (New York) , vol.24 , Issue.3-4 , pp. 287-297
    • McColl, W.F.1    Tiskin, A.2
  • 55
    • 0012032244
    • Modification of the Householder method based on compact WY representation
    • C. PUGLISI, Modification of the Householder method based on compact WY representation, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 723-726.
    • (1992) SIAM J. Sci. Statist. Comput. , vol.13 , pp. 723-726
    • Puglisi, C.1
  • 56
    • 0042385409
    • Communication complexity of the Gaussian elimination algorithm on multiprocessors
    • Y. SAAD, Communication complexity of the Gaussian elimination algorithm on multiprocessors, Linear Algebra Appl., 77 (1986), pp. 315-340.
    • (1986) Linear Algebra Appl. , vol.77 , pp. 315-340
    • Saad, Y.1
  • 58
  • 59
    • 80052305141
    • Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
    • University of California-Berkeley to appear in EURO-PAR 2011
    • E. SOLOMONIK AND J. DEMMEL, Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, EECS Technical Report EECS-2011-10, University of California-Berkeley, 2011, to appear in EURO-PAR 2011.
    • (2011) EECS Technical Report EECS-2011-10
    • Solomonik, E.1    Demmel, J.2
  • 60
    • 0003078924
    • A storage-efficient WY representation for products of Householder transformations
    • R. SCHREIBER AND C. VAN LOAN, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53-57.
    • (1989) SIAM J. Sci. Statist. Comput. , vol.10 , pp. 53-57
    • Schreiber, R.1    Van Loan, C.2
  • 61
    • 0031496750
    • Locality of reference in LU decomposition with partial pivoting
    • S. TOLEDO, Locality of reference in LU decomposition with partial pivoting, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 1065-1081.
    • (1997) SIAM J. Matrix Anal. Appl. , vol.18 , pp. 1065-1081
    • Toledo, S.1
  • 63
    • 24344485098
    • OSKI: A library of automatically tuned sparse matrix kernels
    • J. of Physics: Conference Series Institute of Physics Publishing, London
    • R. VUDUC, J. DEMMEL, AND K. YELICK, OSKI: A library of automatically tuned sparse matrix kernels, in Proceedings of SciDAC 2005, J. of Physics: Conference Series, Institute of Physics Publishing, London, 2005.
    • (2005) Proceedings of SciDAC 2005
    • Vuduc, R.1    Demmel, J.2    Yelick, K.3
  • 64
    • 34250883179
    • Fast sparse matrix multiplication
    • R. YUSTER AND U. ZWICK, Fast sparse matrix multiplication, ACM Trans. Algorithms, 1 (2005), pp. 2-13.
    • (2005) ACM Trans. Algorithms , vol.1 , pp. 2-13
    • Yuster, R.1    Zwick, U.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.