SCOPUS Information Retrieval Platform

Scientific Programming

Volume 18, Issue 1, 2010, Pages 35-50

Scheduling two-sided transformations using tile algorithms on multicore architectures

(4) Ltaief, Hatem a; Kurzak, Jakub a; Dongarra, Jack a,b,c; Badia, Rosa M d

a UNIVERSITY OF TENNESSEE (United States)

b OAK RIDGE NATIONAL LABORATORY (United States)

c UNIVERSITY OF MANCHESTER (United Kingdom)

d BARCELONA SUPERCOMPUTING CENTER (Spain)

Author keywords

Linear algebra; Matrix factorization; Multicore; Scheduling; Two sided transformations

Indexed keywords

ALGEBRA; COMPUTER ARCHITECTURE; DATA FLOW ANALYSIS; EIGENVALUES AND EIGENFUNCTIONS; FACTORIZATION; LINEAR ALGEBRA; LINEAR TRANSFORMATIONS; MATHEMATICAL TRANSFORMATIONS; MATRIX ALGEBRA; PARALLEL PROCESSING SYSTEMS; PROGRAM PROCESSORS; SCHEDULING; SINGULAR VALUE DECOMPOSITION; SOFTWARE ARCHITECTURE;

BASIC LINEAR ALGEBRA SUBPROGRAMS; HIGH PERFORMANCE COMPUTING; MATRIX FACTORIZATIONS; MULTI CORE; MULTI-CORE PROCESSOR; MULTICORE ARCHITECTURES; SCHEDULER IMPLEMENTATION; THREAD LEVEL PARALLELISM;

MULTICORE PROGRAMMING;

EID: 77951935506 PISSN: 10589244 EISSN: None Source Type: Journal
DOI: 10.3233/SPR-2010-0297 Document Type: Article

Times cited : (6)

References (35)

1
- 33745318358
- A parallel implementation of matrix multiplication and lu factorization on the ibm 3090
- Palo Alto, CA, August
- R. C. Agarwal and F. G. Gustavson, A parallel implementation of matrix multiplication and LU factorization on the IBM 3090, in: Proceedings of the IFIP WG 2.5 Working Conference on Aspects of Computation on Asynchronous Parallel Processors, Palo Alto, CA, August 1988, pp. 217-221.
- (1988) Proceedings of the IFIP WG 2.5 Working Conference on Aspects of Computation on Asynchronous Parallel Processors , pp. 217-221
- Agarwal, R.C.¹ Gustavson, F.G.²

2
- 0024891893
- Vector and parallel algorithms for cholesky factorization on ibm 3090
- Reno, NV, November
- R. C. Agarwal and F. G. Gustavson, Vector and parallel algorithms for Cholesky factorization on IBM 3090, in: Proceedings of the 1989 ACM/IEEE Conference on Supercomputing, Reno, NV, November 1989, pp. 225-233.
- (1989) Proceedings of the 1989 ACM/IEEE Conference on Supercomputing , pp. 225-233
- Agarwal, R.C.¹ Gustavson, F.G.²

3
- 0003706460
- 3rd edn, SIAM, Philadelphia, PA, USA
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney and D. Sorensen, LAPACK Users' Guide, 3rd edn, SIAM, Philadelphia, PA, USA, 1999.
- (1999) LAPACK Users' Guide
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, S.⁴ Demmel, J.⁵ Dongarra, J.⁶ Croz, J.D.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ McKenney, A.¹⁰ Sorensen, D.¹¹

4
- 12444316073
- A new stable bidiagonal reduction algorithm
- DOI 10.1016/j.laa.2004.09.019, PII S0024379504004276
- J. L. Barlow, N. Bosner and Z. Drmač, A new stable bidiagonal reduction algorithm, Linear Algebra Appl. 397 (1) (2005), 35-84. (Pubitemid 40146312)
- (2005) Linear Algebra and Its Applications , vol.397 , Issue.1-3 , pp. 35-84
- Barlow, J.L.¹ Bosner, N.² Drmac, Z.³

5
- 34548265764
- Cellss: A programming model for the cell be architecture
- Tampa, FL, November 11-17
- P. Bellens, J. M. Perez, R. M. Badia and J. Labarta, CellSs: A programming model for the cell BE architecture, in: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, Tampa, FL, November 11-17, 2006, p. 86.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 86
- Bellens, P.¹ Perez, J.M.² Badia, R.M.³ Labarta, J.⁴

6
- 48249107440
- Block and parallel versions of one-sided bidiagonalization
- N. Bosner and J. L. Barlow, Block and parallel versions of one-sided bidiagonalization, SIAM J. Matrix Anal. Appl. 29 (3) (2007), 927-953.
- (2007) SIAM J. Matrix Anal. Appl , vol.29 , Issue.3 , pp. 927-953
- Bosner, N.¹ Barlow, J.L.²

7
- 77951890128
- Multithreading for synchronization tolerance in matrix factorization
- Boston, MA, IOP Publishing, June 24-28, 2007. (J. Phys.: Conference Series 78 012-028.)
- A. Buttari, J. J. Dongarra, P. Husbands, J. Kurzak and K. Yelick, Multithreading for synchronization tolerance in matrix factorization, in: Scientific Discovery Through Advanced Computing, SciDAC 2007, Boston, MA, IOP Publishing, June 24-28, 2007. (J. Phys.: Conference Series 78 012-028.)
- (2007) Scientific Discovery Through Advanced Computing, SciDAC
- Buttari, A.¹ Dongarra, J.J.² Husbands, P.³ Kurzak, J.⁴ Yelick, K.⁵

8
- 51049083291
- Parallel tiled qr factorization for multicore architectures
- July 2007
- A. Buttari, J. Langou, J. Kurzak and J. Dongarra, Parallel tiled QR factorization for multicore architectures, LAPACK Working Note 191, July 2007.
- LAPACK Working Note , vol.191
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.⁴

9
- 50249105132
- Parallel tiled qr factorization for multicore architectures
- A. Buttari, J. Langou, J. Kurzak and J. J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency Comput. Pract. Exp. 20 (13) (2008), 1573-1590.
- (2008) Concurrency Comput. Pract. Exp. , vol.20 , Issue.13 , pp. 1573-1590
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

10
- 58149269099
- A class of parallel tiled linear algebra algorithms for multicore architectures
- A. Buttari, J. Langou, J. Kurzak and J. J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput. Syst. Appl. 35 (2009), 38-53.
- (2009) Parallel Comput. Syst. Appl , vol.35 , pp. 38-53
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

11
- 0030564728
- ScaLAPACK: A portable linear algebra library for distributed memory computers - Design issues and performance
- PII S0010465596000173
- J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker and R. C. Whaley, ScaLAPACK, a portable linear algebra library for distributed memory computers-design issues and performance, Comput. Phys. Comm. 97 (1-2) (1996), 1-15. (Pubitemid 126387751)
- (1996) Computer Physics Communications , vol.97 , Issue.1-2 , pp. 1-15
- Choi, J.¹ Demmel, J.² Dhillon, I.³ Dongarra, J.⁴ Ostrouchov, S.⁵ Petitet, A.⁶ Stanley, K.⁷ Walker, D.⁸ Whaley, R.C.⁹

12
- 33847379878
- Estimating and correcting global weather model error
- DOI 10.1175/MWR3289.1
- C. M. Danforth, E. Kalnay and T. Miyoshi, Estimating and correcting global weather model error, Mon. Weather Rev. 135 (2) (2007), 281-299. (Pubitemid 46344360)
- (2007) Monthly Weather Review , vol.135 , Issue.2 , pp. 281-299
- Danforth, C.M.¹ Kalnay, E.² Miyoshi, T.³

13
- 84947936389
- New serial and parallel recursive qr factorization algorithms for smp systems
- Springer-Verlag, Berlin
- E. Elmroth and F. G. Gustavson, New serial and parallel recursive QR factorization algorithms for SMP systems, in: Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 1541, Springer-Verlag, Berlin, 1998, pp. 120-128.
- (1998) Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA, Lecture Notes in Computer Science , vol.1541 , pp. 120-128
- Elmroth, E.¹ Gustavson, F.G.²

14
- 0034224207
- Applying recursion to serial and parallel qr factorization leads to better performance
- E. Elmroth and F. G. Gustavson, Applying recursion to serial and parallel QR factorization leads to better performance, IBM J. Res. Dev. 44 (4) (2000), 605-624.
- (2000) IBM J. Res. Dev. , vol.44 , Issue.4 , pp. 605-624
- Elmroth, E.¹ Gustavson, F.G.²

15
- 84957033906
- High-performance library software for qr factorization
- Springer-Verlag, Berlin/Heidelberg
- E. Elmroth and F. G. Gustavson, High-performance library software for QR factorization, in: Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 1947, Springer-Verlag, Berlin/Heidelberg, 2000, pp. 53-63.
- (2000) Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA, Lecture Notes in Computer Science , vol.1947 , pp. 53-63
- Elmroth, E.¹ Gustavson, F.G.²

16
- 1842832833
- Recursive blocked algorithms and hybrid data structures for dense matrix library software
- E. Elmroth, F. G. Gustavson, I. Jonsson and B. Kågström, Recursive blocked algorithms and hybrid data structures for dense matrix library software, SIAM Rev. 46 (1) (2004), 3-45.
- (2004) SIAM Rev , vol.46 , Issue.1 , pp. 3-45
- Elmroth, E.¹ Gustavson, F.G.² Jonsson, I.³ Kågström, B.⁴

17
- 0004236492
- 3rd edn, Johns Hopkins University Press, Baltimore, MD
- G. H. Golub and C. F. van Loan, Matrix Computations, 3rd edn, Johns Hopkins University Press, Baltimore, MD, 1996.
- (1996) Matrix computations
- Golub, G.H.¹ Van Loan, C.F.²

18
- 17644368925
- Parallel out-of-core computation and updating of the QR factorization
- B. C. Gunter and R. A. van de Geijn, Parallel out-of-core computation and updating of the QR factorization, ACM Trans. Math. Software 31 (1) (2005), 60-78.
- (2005) ACM Trans. Math. Software , vol.31 , Issue.1 , pp. 60-78
- Gunter, B.C.¹ Van De Geijn, R.A.²

19
- 84901913528
- New generalized matrix data structures lead to a variety of high-performance algorithms
- Kluwer Academic, Deventer, The Netherlands
- F. G. Gustavson, New generalized matrix data structures lead to a variety of high-performance algorithms, in: Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications, Kluwer Academic, Deventer, The Netherlands, 2000, pp. 211-234.
- (2000) Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications , pp. 211-234
- Gustavson, F.G.¹

20
- 38049054439
- Minimal data copy for dense linear algebra factorization
- Springer-Verlag, Berlin/Heidelberg
- F. G. Gustavson, J. A. Gunnels and J. C. Sexton, Minimal data copy for dense linear algebra factorization, in: Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 4699, Springer-Verlag, Berlin/Heidelberg, 2006, pp. 540-549.
- (2006) Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science , vol.4699 , pp. 540-549
- Gustavson, F.G.¹ Gunnels, J.A.² Sexton, J.C.³

21
- 0033297112
- A parallel algorithm for the reduction to tridiagonal form for eigendecomposition
- M. Hegland, M. Kahn and M. Osborne, A parallel algorithm for the reduction to tridiagonal form for eigendecomposition, SIAM J. Sci. Comput. 21 (3) (1999), 987-1005.
- (1999) SIAM J. Sci. Comput , vol.21 , Issue.3 , pp. 987-1005
- Hegland, M.¹ Kahn, M.² Osborne, M.³

22
- 49349111725
- Solving systems of linear equations on the cell processor using cholesky factorization
- J. Kurzak, A. Buttari and J. J. Dongarra, Solving systems of linear equations on the CELL processor using Cholesky factorization, Trans. Parallel Distrib. Syst. 19 (9) (2008), 1175-1186.
- (2008) Trans. Parallel Distrib. Syst , vol.19 , Issue.9 , pp. 1175-1186
- Kurzak, J.¹ Buttari, A.² Dongarra, J.J.³

23
- 38049005629
- Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
- Springer-Verlag, Berlin, June
- J. Kurzak and J. J. Dongarra, Implementing linear algebra routines on multi-core processors with pipelining and a look ahead, in: Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 4699, Springer-Verlag, Berlin, June 2006, pp. 147-156.
- (2006) Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science , vol.4699 , pp. 147-156
- Kurzak, J.¹ Dongarra, J.J.²

24
- 74549205359
- Qr factorization for the cell processor
- May
- J. Kurzak and J. Dongarra, QR Factorization for the CELL processor, LAPACK Working Note 201, May 2008.
- (2008) LAPACK Working Note , vol.201
- Kurzak, J.¹ Dongarra, J.²

25
- 84906552600
- Qr factorization for the cell processor
- accepted
- J. Kurzak and J. J. Dongarra, QR factorization for the CELL processor, Scientific Programming, accepted.
- Scientific Programming
- Kurzak, J.¹ Dongarra, J.J.²

26
- 0020593101
- Solving linear algebraic equations on an MIMD computer
- DOI 10.1145/322358.322366
- R. E. Lord, J. S. Kowalik and S. P. Kumar, Solving linear algebraic equations on an MIMD computer, J. ACM 30 (1) (1983), 103-117. (Pubitemid 13504813)
- (1983) Journal of the ACM , vol.30 , Issue.1 , pp. 103-117
- Lord, R.E.¹ Kowalik, J.S.² Kumar, S.P.³

27
- 24644482622
- Analysis of memory hierarchy performance of block data layout
- IEEE Computer Society, Washington, DC
- N. Park, B. Hong and V. K. Prasanna, Analysis of memory hierarchy performance of block data layout, in: Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02, IEEE Computer Society, Washington, DC, 2002, pp. 35-44.
- (2002) Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02 , pp. 35-44
- Park, N.¹ Hong, B.² Prasanna, V.K.³

28
- 0042235298
- Tiling, block data layout, and memory hierarchy performance
- N. Park, B. Hong and V. K. Prasanna, Tiling, block data layout, and memory hierarchy performance, IEEE Trans. Parallel Distrib. Syst. 14 (7) (2003), 640-654.
- (2003) IEEE Trans. Parallel Distrib. Syst , vol.14 , Issue.7 , pp. 640-654
- Park, N.¹ Hong, B.² Prasanna, V.K.³

29
- 57949083229
- A dependency-aware task-based programming environment for multi-core architectures
- Piscataway, NJ
- J. M. Pérez, R. M. Badia and J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, in: CLUSTER, IEEE, Piscataway, NJ, 2008, pp. 142-151.
- (2008) CLUSTER, IEEE , pp. 142-151
- Pérez, J.M.¹ Badia, R.M.² Labarta, J.³

30
- 35649006026
- CellSs: Making it easier to program the cell broadband engine processor
- DOI 10.1147/rd.515.0593
- J. M. Perez, P. Bellens, R. M. Badia and J. Labarta, CellSs: making it easier to program the Cell Broadband Engine processor, IBM J. Res. Dev. 51 (5) (2007), 593-604. (Pubitemid 350031358)
- (2007) IBM Journal of Research and Development , vol.51 , Issue.5 , pp. 593-604
- Perez, J.M.¹ Bellens, P.² Badia, R.M.³ Labarta, J.⁴

31
- 85021253844
- PIRO-BAND: PIpelined ROtations for BAnd Reduction, available at
- PIRO-BAND: PIpelined ROtations for BAnd Reduction, available at: http://www.cise.ufl.edu/~srajaman/.

32
- 47349122478
- Scheduling of qr factorization algorithms on smp and multi-core architectures
- Los Alamitos, CA
- G. Quintana-Ortí, E. S. Quintana-Ortí, E. Chan, R. A. van de Geijn and F. G. van Zee, Scheduling of QR factorization algorithms on SMP and multi-core architectures, in: PDP, IEEE Computer Society, Los Alamitos, CA, 2008, pp. 301-310.
- (2008) PDP, IEEE Computer Society , pp. 301-310
- Quintana-Ortí, G.¹ Quintana-Ortí, E.S.² Chan, E.³ Van De Geijn, R.A.⁴ Van Zee, F.G.⁵

33
- 0003078924
- A storage efficient wy representation for products of householder transformations
- R. Schreiber and C. van Loan, A storage efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput. 10 (1989), 53-57.
- (1989) SIAM J. Sci. Statist. Comput , vol.10 , pp. 53-57
- Schreiber, R.¹ Van Loan, C.²

34
- 85021229732
- SMP Superscalar (SMPSs) User's Manual, Version 2.0, Barcelona Supercomputing Center
- SMP Superscalar (SMPSs) User's Manual, Version 2.0, Barcelona Supercomputing Center, 2008.
- (2008)

35
- 0004554167
- Numerical linear algebra
- Philadelphia, PA
- L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997.
- (1997) SIAM
- Trefethen, L.N.¹ Bau, D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.