SCOPUS Information Search Platform

ACM Transactions on Mathematical Software

Volume 39, Issue 2, 2013, Pages

Elemental: A new framework for distributed memory dense matrix computations

(5 authors) Poulson, Jack (a); Marker, Bryan (a); Van De Geijn, Robert A. (a); Hammond, Jeff R. (b); Romero, Nichols A. (b)

a UNIVERSITY OF TEXAS AT AUSTIN (United States)

b ARGONNE NATIONAL LABORATORY (United States)

Author keywords

High performance; Libraries; Linear algebra; Parallel computing

Indexed keywords

DENSE MATRICES; DISTRIBUTED MEMORY; DISTRIBUTED MEMORY ARCHITECTURE; HIGH-PERFORMANCE; LESSONS LEARNED; MANY-CORE ARCHITECTURE; PRELIMINARY PERFORMANCE RESULTS; SINGLE PROCESSORS

LIBRARIES; LINEAR ALGEBRA; MEMORY ARCHITECTURE; PARALLEL ARCHITECTURES; PARALLEL PROCESSING SYSTEMS

MATRIX ALGEBRA

EID: 84875133170 PISSN: 00983500 EISSN: 15577295 Source Type: Journal
DOI: 10.1145/2427023.2427030 Document Type: Article

Times cited: 172

References (46)

1. Alpatov, P., Baker, G., Edwards, C., Gunnels, J., Morrow, G., Overfelt, J., van de Geijn, R., and Wu, Y.-J. J. 1997. PLAPACK: Parallel Linear Algebra Package: Design overview. In Proceedings of the Conference on Supercomputing.

2. Anderson, E., Benzoni, A., Dongarra, J., Moulton, S., Ostrouchov, S., Tourancheau, B., and van de Geijn, R. 1992. LAPACK for distributed memory architectures: Progress report. In Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA, 625-630.

3. Anderson, E., Bai, Z., et al. 1999. LAPACK Users' Guide 3rd Ed. SIAM, Philadelphia, PA.

4. Bennighof, J. K. and Lehoucq, R. 2003. An automated multilevel substructuring method for eigenspace computation in linear elastodynamics. SIAM J. Sci. Comput. 25, 2084-2106.

5. Bientinesi, P., Dhillon, I. S., and van de Geijn, R. A. 2005a. A parallel eigensolver for dense symmetric matrices based on multiple relatively robust representations. SIAM J. Sci. Comput. 27, 1, 43-66.

6. Bientinesi, P., Quintana-Ortí, E. S., and van de Geijn, R. A. 2005b. Representing linear algebra algorithms in code: The FLAME application programming interfaces. ACM Trans. Math. Softw. 31, 1, 27-59.

7. Blackford, L. S., Choi, J., et al. 1997. ScaLAPACK Users' Guide. SIAM.

8. Chan, E., Heimlich, M., Purkayastha, A., and van de Geijn, R. 2007a. Collective communication: theory, practice, and experience. Concurrency Comput. Pract. Exper. 19, 13, 1749-1783.

9. Chan, E., Quintana-Ortí, E., Quintana-Ortí, G., and van de Geijn, R. 2007b. SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In Proceedings of the 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'07). 116-126.

10. Choi, J., Dongarra, J. J., Ostrouchov, L. S., Petitet, A. P., Walker, D. W., and Whaley, R. C. 1994. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. LAPACK Working Note 80 UT-CS-94-246, University of Tennessee.

11. Chtchelkanova, A., Gunnels, J., Morrow, G., Overfelt, J., and van de Geijn, R. A. 1997. Parallel implementation of BLAS: General techniques for level 3 BLAS. Concurrency: Pract. Exper. 9, 9, 837-857.

12. Cuppen, J. J. M. 1981. A divide and conquer method for the symmetric tridiagonal eigenvalue problem. Numer. Math. 36, 177-195.

13. Dhillon, I. S. 1997. A new O(n²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. Ph.D. thesis, EECS Department, University of California, Berkeley.

14. Dongarra, J. and Ostrouchov, S. 1990. LAPACK block factorization algorithms on the Intel iPSC/860. LAPACK Working Note 24, Tech. rep. CS-90-115, University of Tennessee.

15. Dongarra, J. and van de Geijn, R. 1992. Reduction to condensed form on distributed memory architectures. Parallel Comput. 18, 973-982.

16. Dongarra, J., van de Geijn, R., and Walker, D. 1994. Scalability issues affecting the design of a dense linear algebra library. J. Parallel Distrib. Comput. 22, 3.

17. Dongarra, J. J., Du Croz, J., Hammarling, S., and Duff, I. 1990. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1, 1-17.

18. Edwards, C., Geng, P., Patra, A., and van de Geijn, R. 1995. Parallel matrix distributions: Have we been doing it all wrong? Tech. rep. TR-95-40, Department of Computer Sciences, University of Texas at Austin.

19. Ford, B. and Hall, G. 1974. The generalized eigenvalue problem in quantum chemistry. Comput. Phys. Commun. 8, 5, 337-348.

20. Golub, G. H. and Van Loan, C. F. 1989. Matrix Computations 2nd Ed. Johns Hopkins University Press, Baltimore, MD.

21. Goto, K. and van de Geijn, R. A. 2008. Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34, 3, Article 12.

22. Gunnels, J. A., Gustavson, F. G., Henry, G. M., and van de Geijn, R. A. 2001. FLAME: Formal Linear Algebra Methods Environment. ACM Trans. Math. Softw. 27, 4, 422-455.

23. Hendrickson, B., Jessup, E., and Smith, C. 1999. Toward an efficient parallel eigensolver for dense symmetric matrices. SIAM J. Sci. Comput. 20, 3, 1132-1154.

24. Hendrickson, B. A. and Womble, D. E. 1994. The torus-wrap mapping for dense matrix calculations on massively parallel computers. SIAM J. Sci. Stat. Comput. 15, 5, 1201-1226.

25. Howard, J., Dighe, S., et al. 2010. A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS. In Proceedings of the International Solid-State Circuits Conference.

26. Joffrain, T., Low, T. M., Quintana-Ortí, E. S., van de Geijn, R., and Van Zee, F. G. 2006. Accumulating Householder transformations, revisited. ACM Trans. Math. Softw. 32, 2, 169-179.

27. Johnsson, S. L. 1987. Communication efficient basic linear algebra computations on hypercube architectures. J. Parallel Distrib. Comput. 4, 133-172.

28. Marker, B., Terrel, A., Poulson, J., Batory, D., and van de Geijn, R. 2011. Mechanizing the expert dense linear algebra developer. FLAME working note #58 TR-11-18, Department of Computer Sciences, University of Texas at Austin.

29. Marker, B., Chan, E., Poulson, J., van de Geijn, R., Van der Wijngaart, R. F., Mattson, T. G., and Kubaska, T. E. 2012. Programming many-core architectures - a case study: Dense matrix computations on the Intel SCC processor. Concurrency Comput. Pract. Exper. 24, 12, 1317-1333.

30. Mattson, T. G., Van der Wijngaart, R., and Frumkin, M. 2008. Programming the Intel 80-core network-on-a-chip terascale processor. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'08). IEEE Press, 1-11.

31. Petitet, A., Whaley, R. C., Dongarra, J., and Cleary, A. HPL Algorithm. http://netlib.org/benchmark/hpl/algorithm.html.

32. Poulson, J., van de Geijn, R., and Bennighof, J. 2011. Parallel algorithms for reducing the generalized Hermitian-definite eigenvalue problem. FLAME working note #56. Tech. rep. TR-11-05, Department of Computer Sciences, University of Texas at Austin.

33. Quintana-Ortí, G., Quintana-Ortí, E. S., van de Geijn, R. A., Van Zee, F. G., and Chan, E. 2009. Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. 36, 3, 14:1-14:26.

34. ScaLAPACK 2010. Home Page. http://www.netlib.org/scalapack/scalapack- home.html.

35. Schreiber, R. 1992. Scalability of sparse direct solvers. Graph Theory and Sparse Matrix Computations 56.

36. Sears, M. P., Stanley, K., and Henry, G. 1998. Application of a high performance parallel eigensolver to electronic structure calculations. In Proceedings of the ACM/IEEE Conference on Supercomputing. IEEE Computer Society, 1-1.

37. Stewart, G. 1990. Communication and matrix computations on large message passing systems. Parallel Comput. 16, 27-40.

38. Stewart, G. W. 1970. Incorporating origin shifts into the QR algorithm for symmetric tridiagonal matrices. Comm. ACM 13, 365-367.

39. Strazdins, P. E. 1998. Optimal load balancing techniques for block-cyclic decompositions for matrix factorization. In Proceedings of the 2nd International Conference on Parallel and Distributed Computing and Networks (PDCN'98).

40. van de Geijn, R. 1992. Dense linear solve on the Intel Touchstone Delta system. In Proceedings of the 37th IEEE Computer Society International Conference (Digest of Papers).

41. van de Geijn, R. A. 1997. Using PLAPACK: Parallel Linear Algebra Package. MIT Press.

42. van de Geijn, R. A. and Quintana-Ortí, E. S. 2008. The science of programming matrix computations. http://www.lulu.com/content/1911788.

43. Van Zee, F. G. 2009. libflame: The Complete Reference. www.lulu.com.

44. Whaley, R. C. and Dongarra, J. J. 1998. Automatically tuned linear algebra software. In Proceedings of the Conference on Supercomputing (SC'98).

45. Wilkinson, J. H. 1965. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford, UK.

46. Wu, Y.-J. J., Alpatov, P. A., Bischof, C., and van de Geijn, R. A. 1996. A parallel implementation of symmetric band reduction using PLAPACK. In Proceedings of the Scalable Parallel Library Conference, Mississippi State University.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.