SCOPUS Information Retrieval Platform

SIAM Journal on Scientific Computing

Volume 32, Issue 6, 2010, Pages 3495-3523

Communication-optimal parallel and sequential Cholesky decomposition

Authors (4): Ballard, Grey (a); Demmel, James (a); Holtz, Olga (b); Schwartz, Oded (c)

a UNIVERSITY OF CALIFORNIA (United States)

b TECHNISCHE UNIVERSITÄT BERLIN (Germany)

c WEIZMANN INSTITUTE OF SCIENCE (Israel)

Author keywords

Algorithm; Bandwidth; Cholesky decomposition; Communication avoiding; Latency; Lower bound

Indexed keywords

COSTS; FACTORIZATION; LINEAR SYSTEMS; LOWER-UPPER DECOMPOSITION; MEMORY ARCHITECTURE;

CHOLESKY DECOMPOSITION; COMMUNICATION AVOIDING; COMMUNICATION COST; LATENCY; LOW BOUND; LU FACTORIZATION; MATRIX MULTIPLICATION; MEMORY HIERARCHY; NUMERICAL ALGORITHMS; QR FACTORIZATIONS;

BANDWIDTH;
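As an illustration only, and not the authors' algorithm or code, the following Python sketch shows the 2×2 block recursive structure that recursive Cholesky formulations (such as references 4 and 20 below) are built on: factor the leading diagonal block, solve for the off-diagonal block, update the trailing Schur complement, and recurse. The function names `recursive_cholesky` and `_chol` are hypothetical. The triangular solve and the Schur-complement update are written as naive loops over Python lists, so this sketch does not by itself attain any communication bound; a communication-efficient implementation would likely also need recursive or blocked versions of those steps and a suitable data layout.

```python
import math
from typing import List

Matrix = List[List[float]]


def recursive_cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L * L^T == a, for symmetric positive definite a."""
    n = len(a)
    l = [list(map(float, row)) for row in a]  # copy; the lower triangle becomes L
    _chol(l, 0, n)
    # The strict upper triangle still holds input entries; zero it out.
    for i in range(n):
        for j in range(i + 1, n):
            l[i][j] = 0.0
    return l


def _chol(l: Matrix, lo: int, hi: int) -> None:
    """Factor the diagonal block l[lo:hi][lo:hi] in place, using only its lower triangle."""
    n = hi - lo
    if n <= 0:
        return
    if n == 1:
        if l[lo][lo] <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[lo][lo] = math.sqrt(l[lo][lo])
        return
    mid = lo + n // 2
    # 1. Recursively factor the leading block: A11 = L11 L11^T.
    _chol(l, lo, mid)
    # 2. Triangular solve for L21 = A21 L11^{-T} (forward substitution per row).
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = l[i][j]
            for k in range(lo, j):
                s -= l[i][k] * l[j][k]
            l[i][j] = s / l[j][j]
    # 3. Schur complement update of the trailing block: A22 -= L21 L21^T.
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            s = 0.0
            for k in range(lo, mid):
                s += l[i][k] * l[j][k]
            l[i][j] -= s
    # 4. Recursively factor the updated trailing block: L22 L22^T.
    _chol(l, mid, hi)
```

For example, `recursive_cholesky([[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]])` returns `[[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]`. Splitting at `n // 2` keeps both subproblems nonempty for any `n >= 2`, so the recursion always terminates at 1×1 blocks.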

EID: 79251563454; PISSN: 1064-8275; EISSN: None; Source Type: Journal
DOI: 10.1137/090760969; Document Type: Article

Times cited: 40

References (30)

1
- EID: 69149088136
- IEEE standard for floating-point arithmetic, IEEE Std. 754-2008, 2008, pp. 1-58.

2
- EID: 0024082546
- A. Aggarwal and J. S. Vitter, The input/output complexity of sorting and related problems, Commun. ACM, 31 (1988), pp. 1116-1127.

3
- EID: 84937408012
- N. Ahmed and K. Pingali, Automatic generation of block-recursive codes, in Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, London, UK, 2000, Springer-Verlag, pp. 368-378.

4
- EID: 18044400448
- B. S. Andersen, F. G. Gustavson, and J. Wasniewski, A recursive formulation of Cholesky factorization of a matrix in packed storage format, ACM Trans. Math. Software, 27 (2001), pp. 214-244.

5
- EID: 0003706460
- E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999; also available from . org/lapack/.

6
- EID: 45449120592
- M. Bader, R. Franz, S. Guenther, and A. Heinecke, Hardware-oriented implementation of cache oblivious matrix operations based on space-filling curves, in Parallel Processing and Applied Mathematics, 7th International Conference, PPAM 2007, Lecture Notes in Comput. Sci. 4967, Springer-Verlag, New York, 2008, pp. 628-638.

7
- EID: 79251547215
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing communication in linear algebra, SIAM J. Matrix Anal. Appl., submitted; also available at .

8
- EID: 70449623419
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Communication-optimal parallel and sequential Cholesky decomposition, in SPAA '09: Proceedings of the 21st ACM Symposium on Parallelism in Algorithms and Architectures, 2009, pp. 245-252.

9
- EID: 77956611313
- M. A. Bender, G. S. Brodal, R. Fagerberg, R. Jacob, and E. Vicari, Optimal sparse matrix dense vector multiplication in the I/O-model, Theoret. Comput. Sys., 47 (2010), pp. 934-962.

10
- EID: 70449440599
- N. Béreux, Out-of-core implementations of Cholesky factorization: Loop-based versus recursive algorithms, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1302-1319.

11
- EID: 0003615167
- L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK Users' Guide, SIAM, Philadelphia, 1997; also available from .

12
- EID: 33244497406
- R. A. Chowdhury and V. Ramachandran, Cache-oblivious dynamic programming, in SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, New York, 2006, ACM, pp. 591-600.

13
- EID: 77953980008
- J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, Technical report EECS-2008-89, University of California Berkeley, Berkeley, CA, 2008; SIAM J. Sci. Comput., submitted.

14
- EID: 85140867620
- J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Implementing communication-optimal parallel and sequential QR and LU factorizations, SIAM J. Sci. Comput., submitted.

15
- EID: 70350784030
- J. Demmel, L. Grigori, and H. Xiang, Communication-avoiding Gaussian elimination, in Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, 2008.

16
- EID: 1842832833
- E. Elmroth, F. Gustavson, I. Jonsson, and B. Kågström, Recursive blocked algorithms and hybrid data structures for dense matrix library software, SIAM Rev., 46 (2004), pp. 3-45.

17
- EID: 0033350255
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, Cache-oblivious algorithms, in FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, Washington, DC, 1999, IEEE Computer Society, pp. 285-297.

18
- EID: 79251582362
- S. L. Graham, M. Snir, and C. A. Patterson, eds., Getting up to Speed: The Future of Supercomputing, Report of the National Research Council of the National Academies of Sciences, The National Academies Press, Washington, D.C., 2004; also available online from .

19
- EID: 79251581739
- L. Grigori, Personal communication, 2009.

20
- EID: 0031273280
- F. G. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM J. Res. Dev., 41 (1997), pp. 737-756.

21
- EID: 84956987224
- F. G. Gustavson and I. Jonsson, High performance Cholesky factorization via blocking and recursion that uses minimal storage, in PARA '00: Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, London, UK, 2001, Springer-Verlag, pp. 82-91.

22
- EID: 0036457301
- N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, 2002.

23
- EID: 84971853043
- J. W. Hong and H. T. Kung, I/O complexity: The red-blue pebble game, in STOC '81: Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, New York, 1981, ACM, pp. 326-333.

24
- EID: 0242674322
- D. Irony and S. Toledo, Communication-efficient parallel dense LU using a 3-dimensional approach, in Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing, 2001.

25
- EID: 10844258198
- D. Irony, S. Toledo, and A. Tiskin, Communication lower bounds for distributed-memory matrix multiplication, J. Parallel Distrib. Comput., 64 (2004), pp. 1017-1026.

26
- EID: 84957579840
- J. E. Savage, Extending the Hong-Kung model to memory hierarchies, in COCOON, 1995, pp. 270-281.

27
- EID: 10044286066
- I. Simecek and P. Tvrdik, Analytical model for analysis of cache behavior during Cholesky factorization and its variants, in ICPPW '04: Proceedings of the 2004 International Conference on Parallel Processing Workshops, Washington, DC, 2004, IEEE Computer Society, pp. 190-197.

28
- EID: 0031496750
- S. Toledo, Locality of reference in LU decomposition with partial pivoting, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 1065-1081.

29
- EID: 0010020992
- D. Wise, Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free, in Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, London, UK, 2000, Springer-Verlag, pp. 774-783.

30
- EID: 0024142145
- C.-Q. Yang and B. P. Miller, Critical path analysis for the execution of parallel and distributed programs, in Proceedings from the 8th International Conference on Distributed Computing Systems, IEEE Computer Society, 1988, pp. 366-373.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.