-
1
-
-
84900317032
-
PLAPACK: Parallel linear algebra package: Design overview
-
Alpatov, P., Baker, G., Edwards, C., Gunnels, J., Morrow, G., Overfelt, J., van de Geijn, R., and Wu, Y.-J. J. 1997. PLAPACK: Parallel Linear Algebra Package: Design overview. In Proceedings of the Conference on Supercomputing.
-
(1997)
Proceedings of the Conference on Supercomputing
-
-
Alpatov, P.1
Baker, G.2
Edwards, C.3
Gunnels, J.4
Morrow, G.5
Overfelt, J.6
Van De Geijn, R.7
Wu, Y.-J.J.8
-
2
-
-
0242343480
-
LAPACK for distributed memory architectures: Progress report
-
SIAM, Philadelphia, PA
-
Anderson, E., Benzoni, A., Dongarra, J., Moulton, S., Ostrouchov, S., Tourancheau, B., and van de Geijn, R. 1992. LAPACK for distributed memory architectures: Progress report. In Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA, 625-630.
-
(1992)
Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing
, pp. 625-630
-
-
Anderson, E.1
Benzoni, A.2
Dongarra, J.3
Moulton, S.4
Ostrouchov, S.5
Tourancheau, B.6
Van De Geijn, R.7
-
3
-
-
0003706460
-
-
SIAM, Philadelphia, PA
-
Anderson, E., Bai, Z., et al. 1999. LAPACK Users' Guide 3rd Ed. SIAM, Philadelphia, PA.
-
(1999)
LAPACK Users' Guide 3rd Ed.
-
-
Anderson, E.1
Bai, Z.2
-
4
-
-
10244221212
-
An automated multilevel substructuring method for eigenspace computation in linear elastodynamics
-
Bennighof, J. K. and Lehoucq, R. 2003. An automated multilevel substructuring method for eigenspace computation in linear elastodynamics. SIAM J. Sci. Comput. 25, 2084-2106.
-
(2003)
SIAM J. Sci. Comput.
, vol.25
, pp. 2084-2106
-
-
Bennighof, J.K.1
Lehoucq, R.2
-
5
-
-
33144474872
-
A parallel eigensolver for dense symmetric matrices based on multiple relatively robust representations
-
Bientinesi, P., Dhillon, I. S., and van de Geijn, R. A. 2005a. A parallel eigensolver for dense symmetric matrices based on multiple relatively robust representations. SIAM J. Sci. Comput. 27, 1, 43-66.
-
(2005)
SIAM J. Sci. Comput.
, vol.27
, Issue.1
, pp. 43-66
-
-
Bientinesi, P.1
Dhillon, I.S.2
Van De Geijn, R.A.3
-
6
-
-
17644370328
-
Representing linear algebra algorithms in code: The FLAME application programming interfaces
-
Bientinesi, P., Quintana-Ortí, E. S., and van de Geijn, R. A. 2005b. Representing linear algebra algorithms in code: The FLAME application programming interfaces. ACM Trans. Math. Softw. 31, 1, 27-59.
-
(2005)
ACM Trans. Math. Softw.
, vol.31
, Issue.1
, pp. 27-59
-
-
Bientinesi, P.1
Quintana-Ortí, E.S.2
Van De Geijn, R.A.3
-
8
-
-
34548217713
-
Collective communication: Theory, practice, and experience
-
Chan, E., Heimlich, M., Purkayastha, A., and van de Geijn, R. 2007a. Collective communication: theory, practice, and experience. Concurrency Comput. Pract. Exper. 19, 13, 1749-1783.
-
(2007)
Concurrency Comput. Pract. Exper.
, vol.19
, Issue.13
, pp. 1749-1783
-
-
Chan, E.1
Heimlich, M.2
Purkayastha, A.3
Van De Geijn, R.4
-
9
-
-
35248843628
-
SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
-
Chan, E., Quintana-Ortí, E., Quintana-Ortí, G., and van de Geijn, R. 2007b. SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In Proceedings of the 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'07). 116-126.
-
(2007)
Proceedings of the 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'07)
, pp. 116-126
-
-
Chan, E.1
Quintana-Ortí, E.2
Quintana-Ortí, G.3
Van De Geijn, R.4
-
10
-
-
84875124205
-
The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines
-
University of Tennessee
-
Choi, J., Dongarra, J. J., Ostrouchov, L. S., Petitet, A. P., Walker, D. W., and Whaley, R. C. 1994. The design and implementation of the ScaLAPACK LU, QR and Cholesky factorization routines. LAPACK Working Note 80 UT-CS-94-246, University of Tennessee.
-
(1994)
LAPACK Working Note 80 UT-CS-94-246
-
-
Choi, J.1
Dongarra, J.J.2
Ostrouchov, L.S.3
Petitet, A.P.4
Walker, D.W.5
Whaley, R.C.6
-
11
-
-
0031221523
-
Parallel implementation of BLAS: General techniques for level 3 BLAS
-
Chtchelkanova, A., Gunnels, J., Morrow, G., Overfelt, J., and van de Geijn, R. A. 1997. Parallel implementation of BLAS: General techniques for level 3 BLAS. Concurrency: Pract. Exper. 9, 9, 837-857.
-
(1997)
Concurrency: Pract. Exper.
, vol.9
, Issue.9
, pp. 837-857
-
-
Chtchelkanova, A.1
Gunnels, J.2
Morrow, G.3
Overfelt, J.4
Van De Geijn, R.A.5
-
12
-
-
0000659575
-
A divide and conquer method for the symmetric tridiagonal eigenvalue problem
-
Cuppen, J. J. M. 1981. A divide and conquer method for the symmetric tridiagonal eigenvalue problem. Numer. Math. 36, 177-195.
-
(1981)
Numer. Math.
, vol.36
, pp. 177-195
-
-
Cuppen, J.J.M.1
-
15
-
-
0026912004
-
Reduction to condensed form on distributed memory architectures
-
Dongarra, J. and van de Geijn, R. 1992. Reduction to condensed form on distributed memory architectures. Parallel Comput. 18, 973-982.
-
(1992)
Parallel Comput.
, vol.18
, pp. 973-982
-
-
Dongarra, J.1
Van De Geijn, R.2
-
16
-
-
0000778168
-
Scalability issues affecting the design of a dense linear algebra library
-
Dongarra, J., van de Geijn, R., and Walker, D. 1994. Scalability issues affecting the design of a dense linear algebra library. J. Parallel Distrib. Comput. 22, 3.
-
(1994)
J. Parallel Distrib. Comput.
, vol.22
, pp. 3
-
-
Dongarra, J.1
Van De Geijn, R.2
Walker, D.3
-
17
-
-
0025402476
-
A set of level 3 basic linear algebra subprograms
-
Dongarra, J. J., Du Croz, J., Hammarling, S., and Duff, I. 1990. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1, 1-17.
-
(1990)
ACM Trans. Math. Softw.
, vol.16
, Issue.1
, pp. 1-17
-
-
Dongarra, J.J.1
Du Croz, J.2
Hammarling, S.3
Duff, I.4
-
18
-
-
12444256113
-
Parallel matrix distributions: Have we been doing it all wrong?
-
University of Texas at Austin
-
Edwards, C., Geng, P., Patra, A., and van de Geijn, R. 1995. Parallel matrix distributions: Have we been doing it all wrong? Tech. rep. TR-95-40, Department of Computer Sciences, University of Texas at Austin.
-
(1995)
Tech. Rep. TR-95-40, Department of Computer Sciences
-
-
Edwards, C.1
Geng, P.2
Patra, A.3
Van De Geijn, R.4
-
19
-
-
0242351712
-
The generalized eigenvalue problem in quantum chemistry
-
Ford, B. and Hall, G. 1974. The generalized eigenvalue problem in quantum chemistry. Comput. Phys. Commun. 8, 5, 337-348.
-
(1974)
Comput. Phys. Commun.
, vol.8
, Issue.5
, pp. 337-348
-
-
Ford, B.1
Hall, G.2
-
21
-
-
44249094647
-
Anatomy of high-performance matrix multiplication
-
Article 12
-
Goto, K. and van de Geijn, R. A. 2008. Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34, 3: Article 12.
-
(2008)
ACM Trans. Math. Softw.
, vol.34
, Issue.3
-
-
Goto, K.1
Van De Geijn, R.A.2
-
22
-
-
0039435412
-
FLAME: Formal linear algebra methods environment
-
Gunnels, J. A., Gustavson, F. G., Henry, G.M., and van de Geijn, R. A. 2001. FLAME: Formal Linear Algebra Methods Environment. ACM Trans. Math. Softw. 27, 4, 422-455.
-
(2001)
ACM Trans. Math. Softw.
, vol.27
, Issue.4
, pp. 422-455
-
-
Gunnels, J.A.1
Gustavson, F.G.2
Henry, G.M.3
Van De Geijn, R.A.4
-
23
-
-
0032226427
-
Toward an efficient parallel eigensolver for dense symmetric matrices
-
Hendrickson, B., Jessup, E., and Smith, C. 1999. Toward an efficient parallel eigensolver for dense symmetric matrices. SIAM J. Sci. Comput. 20, 3, 1132-1154.
-
(1999)
SIAM J. Sci. Comput.
, vol.20
, Issue.3
, pp. 1132-1154
-
-
Hendrickson, B.1
Jessup, E.2
Smith, C.3
-
24
-
-
0000667923
-
The torus-wrap mapping for dense matrix calculations on massively parallel computers
-
Hendrickson, B. A. and Womble, D. E. 1994. The torus-wrap mapping for dense matrix calculations on massively parallel computers. SIAM J. Sci. Stat. Comput. 15, 5, 1201-1226.
-
(1994)
SIAM J. Sci. Stat. Comput.
, vol.15
, Issue.5
, pp. 1201-1226
-
-
Hendrickson, B.A.1
Womble, D.E.2
-
26
-
-
33746075581
-
Accumulating Householder transformations, revisited
-
Joffrain, T., Low, T. M., Quintana-Ortí, E. S., van de Geijn, R., and Van Zee, F. G. 2006. Accumulating Householder transformations, revisited. ACM Trans. Math. Softw. 32, 2, 169-179.
-
(2006)
ACM Trans. Math. Softw.
, vol.32
, Issue.2
, pp. 169-179
-
-
Joffrain, T.1
Low, T.M.2
Quintana-Ortí, E.S.3
Van De Geijn, R.4
Van Zee, F.G.5
-
27
-
-
0023328834
-
Communication efficient basic linear algebra computations on hypercube architectures
-
Johnsson, S. L. 1987. Communication efficient basic linear algebra computations on hypercube architectures. J. Parallel Distrib. Comput. 4, 133-172.
-
(1987)
J. Parallel Distrib. Comput.
, vol.4
, pp. 133-172
-
-
Johnsson, S.L.1
-
28
-
-
84875167467
-
Mechanizing the expert dense linear algebra developer
-
University of Texas at Austin
-
Marker, B., Terrel, A., Poulson, J., Batory, D., and van de Geijn, R. 2011. Mechanizing the expert dense linear algebra developer. FLAME working note #58 TR-11-18, Department of Computer Sciences, University of Texas at Austin.
-
(2011)
FLAME Working Note #58 TR-11-18, Department of Computer Sciences
-
-
Marker, B.1
Terrel, A.2
Poulson, J.3
Batory, D.4
Van De Geijn, R.5
-
29
-
-
84864646753
-
Programming many-core architectures - A case study: Dense matrix computations on the Intel SCC processor
-
Marker, B., Chan, E., Poulson, J., van de Geijn, R., Van der Wijngaart, R. F., Mattson, T. G., and Kubaska, T. E. 2012. Programming many-core architectures - a case study: Dense matrix computations on the Intel SCC processor. Concurrency Comput. Pract. Exper. 24, 12, 1317-1333.
-
(2012)
Concurrency Comput. Pract. Exper.
, vol.24
, Issue.12
, pp. 1317-1333
-
-
Marker, B.1
Chan, E.2
Poulson, J.3
Van De Geijn, R.4
Van Der Wijngaart, R.F.5
Mattson, T.G.6
Kubaska, T.E.7
-
30
-
-
70350754500
-
Programming the Intel 80-core network-on-a-chip terascale processor
-
IEEE Press
-
Mattson, T. G., Van der Wijngaart, R., and Frumkin, M. 2008. Programming the Intel 80-core network-on-a-chip terascale processor. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'08). IEEE Press, 1-11.
-
(2008)
Proceedings of the ACM/IEEE Conference on Supercomputing (SC'08)
, pp. 1-11
-
-
Mattson, T.G.1
Van Der Wijngaart, R.2
Frumkin, M.3
-
31
-
-
84875161681
-
-
Petitet, A., Whaley, R. C., Dongarra, J., and Cleary, A. HPL Algorithm. http://netlib.org/benchmark/hpl/algorithm.html.
-
HPL Algorithm
-
-
Petitet, A.1
Whaley, R.C.2
Dongarra, J.3
Cleary, A.4
-
32
-
-
80052786022
-
Parallel algorithms for reducing the generalized Hermitian-definite eigenvalue problem
-
University of Texas at Austin
-
Poulson, J., van de Geijn, R., and Bennighof, J. 2011. Parallel algorithms for reducing the generalized Hermitian-definite eigenvalue problem. FLAME working note #56. Tech. rep. TR-11-05, Department of Computer Sciences, University of Texas at Austin.
-
(2011)
FLAME Working Note #56. Tech. Rep. TR-11-05, Department of Computer Sciences
-
-
Poulson, J.1
Van De Geijn, R.2
Bennighof, J.3
-
33
-
-
70349755577
-
Programming matrix algorithms-by-blocks for thread-level parallelism
-
Quintana-Ortí, G., Quintana-Ortí, E. S., van de Geijn, R. A., Van Zee, F. G., and Chan, E. 2009. Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. 36, 3, 14:1-14:26.
-
(2009)
ACM Trans. Math. Softw.
, vol.36
, Issue.3
, pp. 14:1-14:26
-
-
Quintana-Ortí, G.1
Quintana-Ortí, E.S.2
Van De Geijn, R.A.3
Van Zee, F.G.4
Chan, E.5
-
34
-
-
84875165249
-
-
Home Page
-
ScaLAPACK 2010. Home Page. http://www.netlib.org/scalapack/scalapack-home.html.
-
(2010)
-
-
-
36
-
-
84875200537
-
Application of a high performance parallel eigensolver to electronic structure calculations
-
IEEE Computer Society
-
Sears, M. P., Stanley, K., and Henry, G. 1998. Application of a high performance parallel eigensolver to electronic structure calculations. In Proceedings of the ACM/IEEE Conference on Supercomputing. IEEE Computer Society, 1-1.
-
(1998)
Proceedings of the ACM/IEEE Conference on Supercomputing
, pp. 1-1
-
-
Sears, M.P.1
Stanley, K.2
Henry, G.3
-
37
-
-
0025521855
-
Communication and matrix computations on large message passing systems
-
Stewart, G. 1990. Communication and matrix computations on large message passing systems. Parallel Comput. 16, 27-40.
-
(1990)
Parallel Comput.
, vol.16
, pp. 27-40
-
-
Stewart, G.1
-
38
-
-
0014797920
-
Incorporating origin shifts into the QR algorithm for symmetric tridiagonal matrices
-
Stewart, G. W. 1970. Incorporating origin shifts into the QR algorithm for symmetric tridiagonal matrices. Comm. ACM 13, 365-367.
-
(1970)
Comm. ACM
, vol.13
, pp. 365-367
-
-
Stewart, G.W.1
-
46
-
-
33847130468
-
A parallel implementation of symmetric band reduction using PLAPACK
-
Mississippi State University
-
Wu, Y.-J. J., Alpatov, P. A., Bischof, C., and van de Geijn, R. A. 1996. A parallel implementation of symmetric band reduction using PLAPACK. In Proceedings of the Scalable Parallel Library Conference, Mississippi State University.
-
(1996)
Proceedings of the Scalable Parallel Library Conference
-
-
Wu, Y.-J.J.1
Alpatov, P.A.2
Bischof, C.3
Van De Geijn, R.A.4