SCOPUS Information Retrieval Platform

ACM Transactions on Mathematical Software

Volume 39, Issue 3, 2013, Pages

High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures

(3) Ltaief, Hatem a; Luszczek, Piotr b; Dongarra, Jack b,c,d

a King Abdullah University of Science and Technology (Saudi Arabia)

b University of Tennessee (United States)

c Oak Ridge National Laboratory (United States)

d University of Manchester (United Kingdom)

Author keywords

Bidiagonal reduction; Bulge chasing; Data translation layer; Dynamic scheduling; High performance kernels; Tile algorithms; Two stage approach

Indexed keywords

BANDWIDTH; CACHE MEMORY; COMPUTATIONAL EFFICIENCY; MEMORY ARCHITECTURE; OPEN SOURCE SOFTWARE; OPEN SYSTEMS; SINGULAR VALUE DECOMPOSITION; SOFTWARE ARCHITECTURE;

BULGE CHASING; DATA TRANSLATIONS; DYNAMIC SCHEDULING; HIGH PERFORMANCE KERNELS; TWO STAGE APPROACH;

DATA REDUCTION;

EID: 84877905452 PISSN: 00983500 EISSN: 15577295 Source Type: Journal
DOI: 10.1145/2450153.2450154 Document Type: Article

Times cited : (20)

References (42)

1
- 84877895929
- Autotuned dense QR factorization for multicore architectures
- arXiv:1102.5328
- AGULLO, E., DONGARRA, J., NATH, R., AND TOMOV, S. 2010. Autotuned dense QR factorization for multicore architectures. Tech. rep. RR-7526, Institut National de Recherche en Informatique et en Automatique (INRIA). arXiv:1102.5328.
- (2010) Tech. Rep. RR-7526, Institut National de Recherche en Informatique et en Automatique (INRIA)
- Agullo, E.¹ Dongarra, J.² Nath, R.³ Tomov, S.⁴

2
- 74049090446
- Comparative study of one-sided factorizations with multiple software packages on multi-core hardware
- ACM, New York
- AGULLO, E., HADRI, B., LTAIEF, H., AND DONGARRA, J. 2009. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09). ACM, New York, 1-12.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09) , pp. 1-12
- Agullo, E.¹ Hadri, B.² Ltaief, H.³ Dongarra, J.⁴

3
- 0003706460
- 3rd Ed, SIAM, Philadelphia, PA
- ANDERSON, E., BAI, Z., BISCHOF, C., BLACKFORD, S. L., DEMMEL, J. W., DONGARRA, J. J., CROZ, J. D., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., AND SORENSEN, D. C. 1999. LAPACK User's Guide 3rd Ed, SIAM, Philadelphia, PA.
- (1999) LAPACK User's Guide
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, S.L.⁴ Demmel, J.W.⁵ Dongarra, J.J.⁶ Croz, J.D.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ Mckenney, A.¹⁰ Sorensen, D.C.¹¹

4
- 0039762126
- Evaluating block algorithm variants in LAPACK
- J. Dongarra et al. Eds., SIAM, Philadelphia, PA
- ANDERSON, E. AND DONGARRA, J. J. 1990. Evaluating block algorithm variants in LAPACK. In Parallel Processing for Scientific Computing, J. Dongarra et al. Eds., SIAM, Philadelphia, PA., 3-8.
- (1990) Parallel Processing for Scientific Computing , pp. 3-8
- Anderson, E.¹ Dongarra, J.J.²

5
- 12444316073
- A new stable bidiagonal reduction algorithm
- BARLOW, J. L., BOSNER, N., AND DRMAČ, Z. 2005. A new stable bidiagonal reduction algorithm. Linear Algebra Appl. 397, 1, 35-84.
- (2005) Linear Algebra Appl. , vol.397 , Issue.1 , pp. 35-84
- Barlow, J.L.¹ Bosner, N.² Drmač, Z.³

6
- 77955109739
- Reduction to condensed forms for symmetric eigenvalue problems on multi-core architectures
- BIENTINESI, P., IGUAL, F., KRESSNER, D., AND QUINTANA-ORTÍ, E. 2010. Reduction to condensed forms for symmetric eigenvalue problems on multi-core architectures. Parallel Process. Appl. Math. 6067, 387-395.
- (2010) Parallel Process. Appl. Math. , vol.6067 , pp. 387-395
- Bientinesi, P.¹ Igual, F.² Kressner, D.³ Quintana-Ortí, E.⁴

7
- 0012881041
- Algorithm 807: The SBR toolbox - Software for successive band reduction
- BISCHOF, C. H., LANG, B., AND SUN, X. 2000. Algorithm 807: The SBR Toolbox - Software for successive band reduction. ACM Trans. Math. Softw. 26, 4, 602-616.
- (2000) ACM Trans. Math. Softw. , vol.26 , Issue.4 , pp. 602-616
- Bischof, C.H.¹ Lang, B.² Sun, X.³

8
- 0003615167
- SIAM, Philadelphia, PA
- BLACKFORD, L. S., CHOI, J., CLEARY, A., D'AZEVEDO, E. F., DEMMEL, J. W., DHILLON, I. S., DONGARRA, J. J., HAMMARLING, S., HENRY, G., PETITET, A., STANLEY, K., WALKER, D. W., AND WHALEY, R. C. 1997. ScaLAPACK Users' Guide. SIAM, Philadelphia, PA.
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹ Choi, J.² Cleary, A.³ D'Azevedo, E.F.⁴ Demmel, J.W.⁵ Dhillon, I.S.⁶ Dongarra, J.J.⁷ Hammarling, S.⁸ Henry, G.⁹ Petitet, A.¹⁰ Stanley, K.¹¹ Walker, D.W.¹² Whaley, R.C.¹³

9
- 83455220868
- Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA
- ACM, New York
- BOSILCA, G., BOUTEILLER, A., DANALIS, A., FAVERGE, M., HAIDAR, A., HERAULT, T., KURZAK, J., LANGOU, J., LEMARINIER, P., LTAIEF, H., LUSZCZEK, P., YARKHAN, A., AND DONGARRA, J. 2011. Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA. In Proceedings of the 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11). ACM, New York.
- (2011) Proceedings of the 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11)
- Bosilca, G.¹ Bouteiller, A.² Danalis, A.³ Faverge, M.⁴ Haidar, A.⁵ Herault, T.⁶ Kurzak, J.⁷ Langou, J.⁸ Lemarinier, P.⁹ Ltaief, H.¹⁰ Luszczek, P.¹¹ Yarkhan, A.¹² Dongarra, J.¹³

10
- 48249107440
- Block and parallel versions of one-sided bidiagonalization
- BOSNER, N. AND BARLOW, J. L. 2007. Block and parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl. 29, 3, 927-953.
- (2007) SIAM J. Matrix Anal. Appl. , vol.29 , Issue.3 , pp. 927-953
- Bosner, N.¹ Barlow, J.L.²

11
- 38049058008
- The impact of multicore on math software
- B. Kågström, et al. Eds., Lecture Notes in Computer Science Springer, Berlin
- BUTTARI, A., DONGARRA, J., KURZAK, J., LANGOU, J., LUSZCZEK, P., AND TOMOV, S. 2006. The impact of multicore on math software. In Proceedings of the 8th International Workshop on Applied Parallel Computing. State of the Art in Scientific Computing (PARA). B. Kågström, et al. Eds., Lecture Notes in Computer Science, vol. 4699 Springer, Berlin, 1-10.
- (2006) Proceedings of the 8th International Workshop on Applied Parallel Computing. State of the Art in Scientific Computing (PARA) , vol.4699 , pp. 1-10
- Buttari, A.¹ Dongarra, J.² Kurzak, J.³ Langou, J.⁴ Luszczek, P.⁵ Tomov, S.⁶

12
- 50249105132
- Parallel tiled QR factorization for multicore architectures
- http://dx.doi.org/10.1002/cpe.1301.
- BUTTARI, A., LANGOU, J., KURZAK, J., AND DONGARRA, J. J. 2008. Parallel tiled QR factorization for multicore architectures. Concurrency Comput. Pract. Exper. 20, 13, 1573-1590. http://dx.doi.org/10.1002/cpe.1301.
- (2008) Concurrency Comput. Pract. Exper. , vol.20 , Issue.13 , pp. 1573-1590
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

13
- 58149269099
- A class of parallel tiled linear algebra algorithms for multicore architectures
- http://dx.doi.org/10.1016/j.parco.2008.10.002.
- BUTTARI, A., LANGOU, J., KURZAK, J., AND DONGARRA, J. J. 2009. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. Syst. Appl. 35, 38-53. http://dx.doi.org/10.1016/j.parco.2008.10.002.
- (2009) Parallel Comput. Syst. Appl. , vol.35 , pp. 38-53
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

14
- 0030244536
- The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines
- CHOI, J., DONGARRA, J. J., OSTROUCHOV, S., PETITET, A., WALKER, D. W., AND WHALEY, R. C. 1996. The design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Sci. Program. 5, 173-184.
- (1996) Sci. Program. , vol.5 , pp. 173-184
- Choi, J.¹ Dongarra, J.J.² Ostrouchov, S.³ Petitet, A.⁴ Walker, D.W.⁵ Whaley, R.C.⁶

15
- 0004504649
- Design and evaluation of parallel block algorithms: LU factorization on an IBM 3090 VF/600J
- SIAM, Philadelphia, PA
- DACKLAND, K., ELMROTH, E., KÅGSTRÖM, B., AND VAN LOAN, C. 1992. Design and evaluation of parallel block algorithms: LU factorization on an IBM 3090 VF/600J. In Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA, 3-10.
- (1992) Proceedings of the 5th SIAM Conference on Parallel Processing for Scientific Computing , pp. 3-10
- Dackland, K.¹ Elmroth, E.² Kågström, B.³ Van Loan, C.⁴

16
- 84877883193
- A framework for check-pointed fault-tolerant out-of-core linear algebra
- SIAM, Philadelphia, PA
- D'AZEVEDO, E. AND LUSZCZEK, P. 2003. A framework for check-pointed fault-tolerant out-of-core linear algebra. In Proceedings of the SIAM Conference on Computational Science and Engineering (CSE03). SIAM, Philadelphia, PA.
- (2003) Proceedings of the SIAM Conference on Computational Science and Engineering (CSE03)
- D'Azevedo, E.¹ Luszczek, P.²

17
- 0026238244
- The bidiagonal singular value decomposition and Hamiltonian mechanics
- (LAPACK Working Note #11)
- DEIFT, P., DEMMEL, J. W., LI, L.-C., AND TOMEI, C. 1991. The bidiagonal singular value decomposition and Hamiltonian mechanics. SIAM J. Numer. Anal. 28, 5, 1463-1516. (LAPACK Working Note #11).
- (1991) SIAM J. Numer. Anal. , vol.28 , Issue.5 , pp. 1463-1516
- Deift, P.¹ Demmel, J.W.² Li, L.-C.³ Tomei, C.⁴

18
- 0001192187
- Accurate singular values of bidiagonal matrices
- (Also LAPACK Working Note #3)
- DEMMEL, J. W. AND KAHAN, W. 1990. Accurate singular values of bidiagonal matrices. SIAM J. Sci. Stat. Comput. 11, 5, 873-912. (Also LAPACK Working Note #3).
- (1990) SIAM J. Sci. Stat. Comput. , vol.11 , Issue.5 , pp. 873-912
- Demmel, J.W.¹ Kahan, W.²

19
- 78650843047
- Version 2.3. University of Tennessee
- DONGARRA, J. 2010. PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures, Version 2.3. University of Tennessee.
- (2010) PLASMA Users' Guide, Parallel Linear Algebra Software for Multicore Architectures
- Dongarra, J.¹

20
- 21344496407
- Accurate singular values and differential qd algorithms
- FERNANDO, V. AND PARLETT, B. 1994. Accurate singular values and differential qd algorithms. Numer. Math. 67, 191-229.
- (1994) Numer. Math. , vol.67 , pp. 191-229
- Fernando, V.¹ Parlett, B.²

21
- 33747738463
- Singular value decomposition and least squares solutions
- GOLUB, G. H. AND REINSCH, C. 1970. Singular value decomposition and least squares solutions. Numer. Math. 14, 403-420.
- (1970) Numer. Math. , vol.14 , pp. 403-420
- Golub, G.H.¹ Reinsch, C.²

22
- 0004236492
- 3rd Ed. Johns Hopkins University Press, Baltimore, MD
- GOLUB, G. H. AND VAN LOAN, C. F. 1996. Matrix Computations 3rd Ed. Johns Hopkins University Press, Baltimore, MD.
- (1996) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

23
- 0343090855
- Efficient parallel reduction to bidiagonal form
- GROSSER, B. AND LANG, B. 1999. Efficient parallel reduction to bidiagonal form. Parallel Comput. 25, 8, 969-986.
- (1999) Parallel Comput. , vol.25 , Issue.8 , pp. 969-986
- Grosser, B.¹ Lang, B.²

24
- 1542533583
- A divide-and-conquer algorithm for the bidiagonal SVD
- GU, M. AND EISENSTAT, S. 1995. A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl. 16, 79-92.
- (1995) SIAM J. Matrix Anal. Appl. , vol.16 , pp. 79-92
- Gu, M.¹ Eisenstat, S.²

25
- 84901913528
- New generalized matrix data structures lead to a variety of high-performance algorithms
- Kluwer Academic, Amsterdam
- GUSTAVSON, F. G. 2000. New generalized matrix data structures lead to a variety of high-performance algorithms. In Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications. Kluwer Academic, Amsterdam, 211-234.
- (2000) Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications , pp. 211-234
- Gustavson, F.G.¹

26
- 84868568003
- Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurrency and computations: Practice and experience
- University of Tennessee
- HAIDAR, A., LTAIEF, H., YARKHAN, A., AND DONGARRA, J. 2011. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. concurrency and computations: Practice and experience. Tech. rep. UT-CS-11-666, University of Tennessee.
- (2011) Tech. Rep. UT-CS-11-666
- Haidar, A.¹ Ltaief, H.² Yarkhan, A.³ Dongarra, J.⁴

27
- 0004302191
- 5th Ed. Morgan Kaufmann
- HENNESSY, J. L. AND PATTERSON, D. A. 2012. Computer Architecture: A Quantitative Approach 5th Ed. Morgan Kaufmann.
- (2012) Computer Architecture: A Quantitative Approach
- Hennessy, J.L.¹ Patterson, D.A.²

28
- 58149421595
- Analysis of a complex of statistical variables into principal components
- 498-520
- HOTELLING, H. 1933. Analysis of a complex of statistical variables into principal components. J. Edu. Psych. 24, 417-441, 498-520.
- (1933) J. Edu. Psych. , vol.24 , pp. 417-441
- Hotelling, H.¹

29
- 0002467254
- Simplified calculation of principal components
- HOTELLING, H. 1935. Simplified calculation of principal components. Psychometrika 1, 27-35.
- (1935) Psychometrika , vol.1 , pp. 27-35
- Hotelling, H.¹

30
- 0000652188
- Unitary triangularization of a nonsymmetric matrix
- DOI 10.1145/320941.320947
- HOUSEHOLDER, A. S. 1958. Unitary triangularization of a nonsymmetric matrix. J. ACM 5, 4. DOI 10.1145/320941.320947.
- (1958) J. ACM , vol.5 , pp. 4
- Householder, A.S.¹

31
- 21344498628
- A parallel algorithm for computing the singular value decomposition of a matrix
- JESSUP, E. R. AND SORENSEN, D. 1994. A parallel algorithm for computing the singular value decomposition of a matrix. SIAM J. Matrix Anal. Appl. 15, 530-548.
- (1994) SIAM J. Matrix Anal. Appl. , vol.15 , pp. 530-548
- Jessup, E.R.¹ Sorensen, D.²

32
- 80054983967
- Blocked algorithms for the reduction to Hessenberg-triangular form revisited
- KÅGSTRÖM, B., KRESSNER, D., QUINTANA-ORTÍ, E., AND QUINTANA-ORTÍ, G. 2008. Blocked algorithms for the reduction to Hessenberg-triangular form revisited. BIT Numer. Math. 48, 563-584.
- (2008) BIT Numer. Math. , vol.48 , pp. 563-584
- Kågström, B.¹ Kressner, D.² Quintana-Ortí, E.³ Quintana-Ortí, G.⁴

33
- 77649275879
- Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures
- LTAIEF, H., KURZAK, J., AND DONGARRA, J. 2010. Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures. IEEE Trans. Parallel Distrib. Syst. 417-423.
- (2010) IEEE Trans. Parallel Distrib. Syst , pp. 417-423
- Ltaief, H.¹ Kurzak, J.² Dongarra, J.³

34
- 80053252490
- Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures
- ACM, New York
- LUSZCZEK, P., LTAIEF, H., AND DONGARRA, J. 2011. Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures. In Proceedings of IPDPS 2011. ACM, New York.
- (2011) Proceedings of IPDPS 2011
- Luszczek, P.¹ Ltaief, H.² Dongarra, J.³

35
- 84870611591
- MKL Version 10.2
- MKL. 2011. Intel, Math Kernel Library (MKL). http://www.intel.com/software/products/mkl/. Version 10.2.
- (2011) Intel, Math Kernel Library (MKL)

36
- 0019533482
- Principal component analysis in linear systems: Controllability, observability, and model reduction
- MOORE, B. C. 1981. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Trans. Autom. Control AC-26, 1.
- (1981) IEEE Trans. Autom. Control AC-26 , pp. 1
- Moore, B.C.¹

37
- 57949083229
- A dependency-aware task-based programming environment for multi-core architectures
- IEEE, Los Alamitos, CA.
- PEREZ, J., BADIA, R., AND LABARTA, J. 2008. A dependency-aware task-based programming environment for multi-core architectures. In Proceedings of the IEEE International Conference on Cluster Computing. IEEE, Los Alamitos, CA. 142-151.
- (2008) Proceedings of the IEEE International Conference on Cluster Computing , pp. 142-151
- Perez, J.¹ Badia, R.² Labarta, J.³

38
- 84867961757
- One-sided reduction to bidiagonal form
- RUI RALHA
- RUI RALHA. 2003. One-sided reduction to bidiagonal form. Linear Algebra Appl. 358, 219-238.
- (2003) Linear Algebra Appl. , vol.358 , pp. 219-238

39
- 74049123996
- SMPSs Team Version 2.3
- SMPSs Team. 2008. SMP Superscalar (SMPSs) User's Manual. Version 2.3.
- (2008) SMP Superscalar (SMPSs) User's Manual

40
- 0347737736
- The decompositional approach to matrix computation
- STEWART, G. W. 2000. The decompositional approach to matrix computation. Comput. Sci. Eng. 2, 1, 50-59.
- (2000) Comput. Sci. Eng. , vol.2 , Issue.1 , pp. 50-59
- Stewart, G.W.¹

41
- 0003424374
- SIAM, Philadelphia, PA
- TREFETHEN, L. N. AND BAU, D. 1997. Numerical Linear Algebra. SIAM, Philadelphia, PA.
- (1997) Numerical Linear Algebra
- Trefethen, L.N.¹ Bau, D.²

42
- 33646107115
- Automatic blocking of QR and LU factorizations for locality
- ACM, New York
- YI, Q., KENNEDY, K., YOU, H., SEYMOUR, K., AND DONGARRA, J. 2004. Automatic blocking of QR and LU factorizations for locality. In Proceedings of the 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP'04). ACM, New York.
- (2004) Proceedings of the 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP'04)
- Yi, Q.¹ Kennedy, K.² You, H.³ Seymour, K.⁴ Dongarra, J.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.