-
1
-
-
84877895929
-
Autotuned dense QR factorization for multicore architectures
-
arXiv:1102.5328
-
AGULLO, E., DONGARRA, J., NATH, R., AND TOMOV, S. 2010. Autotuned dense QR factorization for multicore architectures. Tech. rep. RR-7526, Institut National de Recherche en Informatique et en Automatique (INRIA). arXiv:1102.5328.
-
(2010)
Tech. Rep. RR-7526, Institut National de Recherche en Informatique et en Automatique (INRIA)
-
-
Agullo, E.1
Dongarra, J.2
Nath, R.3
Tomov, S.4
-
2
-
-
74049090446
-
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware
-
ACM, New York
-
AGULLO, E., HADRI, B., LTAIEF, H., AND DONGARRA, J. 2009. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09). ACM, New York, 1-12.
-
(2009)
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09)
, pp. 1-12
-
-
Agullo, E.1
Hadri, B.2
Ltaief, H.3
Dongarra, J.4
-
3
-
-
0003706460
-
-
3rd Ed, SIAM, Philadelphia, PA
-
ANDERSON, E., BAI, Z., BISCHOF, C., BLACKFORD, S. L., DEMMEL, J. W., DONGARRA, J. J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., AND SORENSEN, D. C. 1999. LAPACK Users' Guide, 3rd Ed. SIAM, Philadelphia, PA.
-
(1999)
LAPACK Users' Guide
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Blackford, S.L.4
Demmel, J.W.5
Dongarra, J.J.6
Du Croz, J.7
Greenbaum, A.8
Hammarling, S.9
McKenney, A.10
Sorensen, D.C.11
-
4
-
-
0039762126
-
Evaluating block algorithm variants in LAPACK
-
J. Dongarra et al. Eds., SIAM, Philadelphia, PA
-
ANDERSON, E. AND DONGARRA, J. J. 1990. Evaluating block algorithm variants in LAPACK. In Parallel Processing for Scientific Computing, J. Dongarra et al. Eds., SIAM, Philadelphia, PA, 3-8.
-
(1990)
Parallel Processing for Scientific Computing
, pp. 3-8
-
-
Anderson, E.1
Dongarra, J.J.2
-
5
-
-
12444316073
-
A new stable bidiagonal reduction algorithm
-
BARLOW, J. L., BOSNER, N., AND DRMAČ, Z. 2005. A new stable bidiagonal reduction algorithm. Linear Algebra Appl. 397, 1, 35-84.
-
(2005)
Linear Algebra Appl.
, vol.397
, Issue.1
, pp. 35-84
-
-
Barlow, J.L.1
Bosner, N.2
Drmač, Z.3
-
6
-
-
77955109739
-
Reduction to condensed forms for symmetric eigenvalue problems on multi-core architectures
-
BIENTINESI, P., IGUAL, F., KRESSNER, D., AND QUINTANA-ORTÍ, E. 2010. Reduction to condensed forms for symmetric eigenvalue problems on multi-core architectures. Parallel Process. Appl. Math. 6067, 387-395.
-
(2010)
Parallel Process. Appl. Math.
, vol.6067
, pp. 387-395
-
-
Bientinesi, P.1
Igual, F.2
Kressner, D.3
Quintana-Ortí, E.4
-
7
-
-
0012881041
-
Algorithm 807: The SBR toolbox - Software for successive band reduction
-
BISCHOF, C. H., LANG, B., AND SUN, X. 2000. Algorithm 807: The SBR Toolbox - Software for successive band reduction. ACM Trans. Math. Softw. 26, 4, 602-616.
-
(2000)
ACM Trans. Math. Softw.
, vol.26
, Issue.4
, pp. 602-616
-
-
Bischof, C.H.1
Lang, B.2
Sun, X.3
-
8
-
-
0003615167
-
-
SIAM, Philadelphia, PA
-
BLACKFORD, L. S., CHOI, J., CLEARY, A., D'AZEVEDO, E. F., DEMMEL, J. W., DHILLON, I. S., DONGARRA, J. J., HAMMARLING, S., HENRY, G., PETITET, A., STANLEY, K., WALKER, D. W., AND WHALEY, R. C. 1997. ScaLAPACK Users' Guide. SIAM, Philadelphia, PA.
-
(1997)
ScaLAPACK Users' Guide
-
-
Blackford, L.S.1
Choi, J.2
Cleary, A.3
D'Azevedo, E.F.4
Demmel, J.W.5
Dhillon, I.S.6
Dongarra, J.J.7
Hammarling, S.8
Henry, G.9
Petitet, A.10
Stanley, K.11
Walker, D.W.12
Whaley, R.C.13
-
9
-
-
83455220868
-
Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA
-
ACM, New York
-
BOSILCA, G., BOUTEILLER, A., DANALIS, A., FAVERGE, M., HAIDAR, A., HERAULT, T., KURZAK, J., LANGOU, J., LEMARINIER, P., LTAIEF, H., LUSZCZEK, P., YARKHAN, A., AND DONGARRA, J. 2011. Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. In Proceedings of the 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11). ACM, New York.
-
(2011)
Proceedings of the 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11)
-
-
Bosilca, G.1
Bouteiller, A.2
Danalis, A.3
Faverge, M.4
Haidar, A.5
Herault, T.6
Kurzak, J.7
Langou, J.8
Lemarinier, P.9
Ltaief, H.10
Luszczek, P.11
Yarkhan, A.12
Dongarra, J.13
-
10
-
-
48249107440
-
Block and parallel versions of one-sided bidiagonalization
-
BOSNER, N. AND BARLOW, J. L. 2007. Block and parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl. 29, 3, 927-953.
-
(2007)
SIAM J. Matrix Anal. Appl.
, vol.29
, Issue.3
, pp. 927-953
-
-
Bosner, N.1
Barlow, J.L.2
-
11
-
-
38049058008
-
The impact of multicore on math software
-
B. Kågström, et al. Eds., Lecture Notes in Computer Science Springer, Berlin
-
BUTTARI, A., DONGARRA, J., KURZAK, J., LANGOU, J., LUSZCZEK, P., AND TOMOV, S. 2006. The impact of multicore on math software. In Proceedings of the 8th International Workshop on Applied Parallel Computing. State of the Art in Scientific Computing (PARA). B. Kågström, et al. Eds., Lecture Notes in Computer Science, vol. 4699 Springer, Berlin, 1-10.
-
(2006)
Proceedings of the 8th International Workshop on Applied Parallel Computing. State of the Art in Scientific Computing (PARA)
, vol.4699
, pp. 1-10
-
-
Buttari, A.1
Dongarra, J.2
Kurzak, J.3
Langou, J.4
Luszczek, P.5
Tomov, S.6
-
12
-
-
50249105132
-
Parallel tiled QR factorization for multicore architectures
-
http://dx.doi.org/10.1002/cpe.1301.
-
BUTTARI, A., LANGOU, J., KURZAK, J., AND DONGARRA, J. J. 2008. Parallel tiled QR factorization for multicore architectures. Concurrency Comput. Pract. Exper. 20, 13, 1573-1590. http://dx.doi.org/10.1002/cpe.1301.
-
(2008)
Concurrency Comput. Pract. Exper.
, vol.20
, Issue.13
, pp. 1573-1590
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.J.4
-
13
-
-
58149269099
-
A class of parallel tiled linear algebra algorithms for multicore architectures
-
http://dx.doi.org/10.1016/j.parco.2008.10.002.
-
BUTTARI, A., LANGOU, J., KURZAK, J., AND DONGARRA, J. J. 2009. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. Syst. Appl. 35, 38-53. http://dx.doi.org/10.1016/j.parco.2008.10.002.
-
(2009)
Parallel Comput. Syst. Appl.
, vol.35
, pp. 38-53
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.J.4
-
14
-
-
0030244536
-
The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines
-
CHOI, J., DONGARRA, J. J., OSTROUCHOV, S., PETITET, A., WALKER, D. W., AND WHALEY, R. C. 1996. The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Sci. Program. 5, 173-184.
-
(1996)
Sci. Program.
, vol.5
, pp. 173-184
-
-
Choi, J.1
Dongarra, J.J.2
Ostrouchov, S.3
Petitet, A.4
Walker, D.W.5
Whaley, R.C.6
-
15
-
-
0004504649
-
Design and evaluation of parallel block algorithms: LU factorization on an IBM 3090 VF/600J
-
SIAM, Philadelphia, PA
-
DACKLAND, K., ELMROTH, E., KÅGSTRÖM, B., AND VAN LOAN, C. 1992. Design and evaluation of parallel block algorithms: LU factorization on an IBM 3090 VF/600J. In Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA, 3-10.
-
(1992)
Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing
, pp. 3-10
-
-
Dackland, K.1
Elmroth, E.2
Kågström, B.3
Van Loan, C.4
-
17
-
-
0026238244
-
The bidiagonal singular value decomposition and Hamiltonian mechanics
-
(LAPACK Working Note #11)
-
DEIFT, P., DEMMEL, J. W., LI, L.-C., AND TOMEI, C. 1991. The bidiagonal singular value decomposition and Hamiltonian mechanics. SIAM J. Numer. Anal. 28, 5, 1463-1516. (LAPACK Working Note #11).
-
(1991)
SIAM J. Numer. Anal.
, vol.28
, Issue.5
, pp. 1463-1516
-
-
Deift, P.1
Demmel, J.W.2
Li, L.-C.3
Tomei, C.4
-
18
-
-
0001192187
-
Accurate singular values of bidiagonal matrices
-
(Also LAPACK Working Note #3)
-
DEMMEL, J. W. AND KAHAN, W. 1990. Accurate singular values of bidiagonal matrices. SIAM J. Sci. Stat. Comput. 11, 5, 873-912. (Also LAPACK Working Note #3).
-
(1990)
SIAM J. Sci. Stat. Comput.
, vol.11
, Issue.5
, pp. 873-912
-
-
Demmel, J.W.1
Kahan, W.2
-
20
-
-
21344496407
-
Accurate singular values and differential qd algorithms
-
FERNANDO, V. AND PARLETT, B. 1994. Accurate singular values and differential qd algorithms. Numer. Math. 67, 191-229.
-
(1994)
Numer. Math.
, vol.67
, pp. 191-229
-
-
Fernando, V.1
Parlett, B.2
-
21
-
-
33747738463
-
Singular value decomposition and least squares solutions
-
GOLUB, G. H. AND REINSCH, C. 1970. Singular value decomposition and least squares solutions. Numer. Math. 14, 403-420.
-
(1970)
Numer. Math.
, vol.14
, pp. 403-420
-
-
Golub, G.H.1
Reinsch, C.2
-
22
-
-
0004236492
-
-
3rd Ed. Johns Hopkins University Press, Baltimore, MD
-
GOLUB, G. H. AND VAN LOAN, C. F. 1996. Matrix Computations, 3rd Ed. Johns Hopkins University Press, Baltimore, MD.
-
(1996)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
23
-
-
0343090855
-
Efficient parallel reduction to bidiagonal form
-
GROSSER, B. AND LANG, B. 1999. Efficient parallel reduction to bidiagonal form. Parallel Comput. 25, 8, 969-986.
-
(1999)
Parallel Comput.
, vol.25
, Issue.8
, pp. 969-986
-
-
Grosser, B.1
Lang, B.2
-
24
-
-
1542533583
-
A divide-and-conquer algorithm for the bidiagonal SVD
-
GU, M. AND EISENSTAT, S. 1995. A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl. 16, 79-92.
-
(1995)
SIAM J. Matrix Anal. Appl.
, vol.16
, pp. 79-92
-
-
Gu, M.1
Eisenstat, S.2
-
26
-
-
84868568003
-
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurrency and computations: Practice and experience
-
University of Tennessee
-
HAIDAR, A., LTAIEF, H., YARKHAN, A., AND DONGARRA, J. 2011. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurrency and Computation: Practice and Experience. Tech. rep. UT-CS-11-666, University of Tennessee.
-
(2011)
Tech. Rep. UT-CS-11-666
-
-
Haidar, A.1
Ltaief, H.2
Yarkhan, A.3
Dongarra, J.4
-
28
-
-
58149421595
-
Analysis of a complex of statistical variables into principal components
-
498-520
-
HOTELLING, H. 1933. Analysis of a complex of statistical variables into principal components. J. Edu. Psych. 24, 417-441, 498-520.
-
(1933)
J. Edu. Psych.
, vol.24
, pp. 417-441
-
-
Hotelling, H.1
-
29
-
-
0002467254
-
Simplified calculation of principal components
-
HOTELLING, H. 1935. Simplified calculation of principal components. Psychometrika 1, 27-35.
-
(1935)
Psychometrika
, vol.1
, pp. 27-35
-
-
Hotelling, H.1
-
30
-
-
0000652188
-
Unitary triangularization of a nonsymmetric matrix
-
DOI 10.1145/320941.320947
-
HOUSEHOLDER, A. S. 1958. Unitary triangularization of a nonsymmetric matrix. J. ACM 5, 4. DOI 10.1145/320941.320947.
-
(1958)
J. ACM
, vol.5
, Issue.4
-
-
Householder, A.S.1
-
31
-
-
21344498628
-
A parallel algorithm for computing the singular value decomposition of a matrix
-
JESSUP, E. R. AND SORENSEN, D. 1994. A parallel algorithm for computing the singular value decomposition of a matrix. SIAM J. Matrix Anal. Appl. 15, 530-548.
-
(1994)
SIAM J. Matrix Anal. Appl.
, vol.15
, pp. 530-548
-
-
Jessup, E.R.1
Sorensen, D.2
-
32
-
-
80054983967
-
Blocked algorithms for the reduction to Hessenberg-triangular form revisited
-
KÅGSTRÖM, B., KRESSNER, D., QUINTANA-ORTÍ, E., AND QUINTANA-ORTÍ, G. 2008. Blocked algorithms for the reduction to Hessenberg-triangular form revisited. BIT Numer. Math. 48, 563-584.
-
(2008)
BIT Numer. Math.
, vol.48
, pp. 563-584
-
-
Kågström, B.1
Kressner, D.2
Quintana-Ortí, E.3
Quintana-Ortí, G.4
-
33
-
-
77649275879
-
Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures
-
LTAIEF, H., KURZAK, J., AND DONGARRA, J. 2010. Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures. IEEE Trans. Parallel Distrib. Syst. 417-423.
-
(2010)
IEEE Trans. Parallel Distrib. Syst.
, pp. 417-423
-
-
Ltaief, H.1
Kurzak, J.2
Dongarra, J.3
-
34
-
-
80053252490
-
Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures
-
ACM, New York
-
LUSZCZEK, P., LTAIEF, H., AND DONGARRA, J. 2011. Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures. In Proceedings of IPDPS 2011. ACM, New York.
-
(2011)
Proceedings of IPDPS 2011
-
-
Luszczek, P.1
Ltaief, H.2
Dongarra, J.3
-
35
-
-
84870611591
-
-
MKL Version 10.2
-
MKL. 2011. Intel, Math Kernel Library (MKL). http://www.intel.com/software/products/mkl/. Version 10.2.
-
(2011)
Intel, Math Kernel Library (MKL)
-
-
-
36
-
-
0019533482
-
Principal component analysis in linear systems: Controllability, observability, and model reduction
-
MOORE, B. C. 1981. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Trans. Autom. Control AC-26, 1.
-
(1981)
IEEE Trans. Autom. Control
, vol.AC-26
, Issue.1
-
-
Moore, B.C.1
-
37
-
-
57949083229
-
A dependency-aware task-based programming environment for multi-core architectures
-
IEEE, Los Alamitos, CA
-
PEREZ, J., BADIA, R., AND LABARTA, J. 2008. A dependency-aware task-based programming environment for multi-core architectures. In Proceedings of the IEEE International Conference on Cluster Computing. IEEE, Los Alamitos, CA, 142-151.
-
(2008)
Proceedings of the IEEE International Conference on Cluster Computing
, pp. 142-151
-
-
Perez, J.1
Badia, R.2
Labarta, J.3
-
38
-
-
84867961757
-
One-sided reduction to bidiagonal form
-
RUI RALHA
-
RALHA, R. 2003. One-sided reduction to bidiagonal form. Linear Algebra Appl. 358, 219-238.
-
(2003)
Linear Algebra Appl.
, vol.358
, pp. 219-238
-
-
-
40
-
-
0347737736
-
The decompositional approach to matrix computation
-
STEWART, G. W. 2000. The decompositional approach to matrix computation. Comput. Sci. Eng. 2, 1, 50-59.
-
(2000)
Comput. Sci. Eng.
, vol.2
, Issue.1
, pp. 50-59
-
-
Stewart, G.W.1
-
42
-
-
33646107115
-
Automatic blocking of QR and LU factorizations for locality
-
ACM, New York
-
YI, Q., KENNEDY, K., YOU, H., SEYMOUR, K., AND DONGARRA, J. 2004. Automatic blocking of QR and LU factorizations for locality. In Proceedings of the 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP'04). ACM, New York.
-
(2004)
Proceedings of the 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP'04)
-
-
Yi, Q.1
Kennedy, K.2
You, H.3
Seymour, K.4
Dongarra, J.5