SCOPUS Information Retrieval Platform

Concurrency and Computation: Practice and Experience

Volume 21, Issue 18, 2009, Pages 2438-2456

Parallelizing dense and banded linear algebra libraries using SMPSs

Authors (6): Badia, Rosa M. (a,b); Herrero, José R. (c); Labarta, Jesús (a); Pérez, Josep M. (a); Quintana Ortí, Enrique S. (d); Quintana Ortí, Gregorio (d)

a BARCELONA SUPERCOMPUTING CENTER (Spain)

b CSIC (Spain)

c UNIVERSITAT POLITÈCNICA DE CATALUNYA (Spain)

d UNIVERSITAT JAUME I (Spain)

Author keywords

Dynamic scheduling; High performance; Linear algebra libraries; Multi core processors; Programmability

Indexed keywords

APPLICATION PROGRAMMING INTERFACES (API); DIGITAL STORAGE; LINEAR ALGEBRA; DYNAMIC SCHEDULING; HIGH PERFORMANCE; LINEAR ALGEBRA LIBRARIES; MULTI-CORE PROCESSOR; PROGRAMMABILITY; LIBRARIES

EID: 73349095700 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.1463 Document Type: Article

Times cited: 64

References (28)

1
- 0003706460
- SIAM: Philadelphia
- Anderson E, Bai Z, Demmel J, Dongarra JE, DuCroz J, Greenbaum A, Hammarling S, McKenney AE, Ostrouchov S, Sorensen D. LAPACK Users' Guide. SIAM: Philadelphia, 1992.
- (1992) LAPACK Users' Guide
- Anderson, E.¹ Bai, Z.² Demmel, J.³ Dongarra, J.E.⁴ DuCroz, J.⁵ Greenbaum, A.⁶ Hammarling, S.⁷ McKenney, A.E.⁸ Ostrouchov, S.⁹ Sorensen, D.¹⁰

2
- 50249105132
- Parallel tiled QR factorization for multicore architectures
- Buttari A, Langou J, Kurzak J, Dongarra J. Parallel tiled QR factorization for multicore architectures. Concurrency and Computation: Practice and Experience 2008; 20(13): 1573-1590.
- (2008) Concurrency and Computation: Practice and Experience , vol.20 , Issue.13 , pp. 1573-1590
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.⁴

3
- 58149269099
- A class of parallel tiled linear algebra algorithms for multicore architectures
- Buttari A, Langou J, Kurzak J, Dongarra J. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 2009; 35(1): 38-53.
- (2009) Parallel Computing , vol.35 , Issue.1 , pp. 38-53
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.⁴

4
- 35248843628
- Super matrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
- San Diego, CA, U.S.A, 9-11 June
- Chan E, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R. Super matrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. Proceedings of the Nineteenth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), San Diego, CA, U.S.A., 9-11 June 2007; 116-125.
- (2007) Proceedings of the Nineteenth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007) , pp. 116-125
- Chan, E.¹ Quintana-Ortí, E.S.² Quintana-Ortí, G.³ van de Geijn, R.⁴

5
- 51049099053
- Satisfying your dependencies with super matrix
- September
- Chan E, Van Zee FG, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R. Satisfying your dependencies with super matrix. Proceedings of IEEE Cluster Computing 2007, September 2007; 91-99.
- (2007) Proceedings of IEEE Cluster Computing 2007 , pp. 91-99
- Chan, E.¹ Van Zee, F.G.² Quintana-Ortí, E.S.³ Quintana-Ortí, G.⁴ van de Geijn, R.⁵

6
- 67650063143
- Design of scalable dense linear algebra libraries for multithreaded architectures: The LU factorization
- CD-ROM
- Quintana-Ortí G, Quintana-Ortí ES, Chan E, van de Geijn R, Van Zee FG. Design of scalable dense linear algebra libraries for multithreaded architectures: the LU factorization. Workshop on Multithreaded Architectures and Applications - MTAAP 2008, 2008. CD-ROM.
- (2008) Workshop on Multithreaded Architectures and Applications - MTAAP 2008
- Quintana-Ortí, G.¹ Quintana-Ortí, E.S.² Chan, E.³ van de Geijn, R.⁴ Van Zee, F.G.⁵

7
- 47349122478
- Scheduling of QR factorization algorithms on SMP and multi-core architectures
- El Baz FSD, Bourgeois J eds
- Quintana-Ortí G, Quintana-Ortí ES, Chan E, Van Zee FG, van de Geijn RA. Scheduling of QR factorization algorithms on SMP and multi-core architectures. 16th Euromicro International Conference on Parallel, Distributed and Network-based Processing - PDP 2008, El Baz FSD, Bourgeois J (eds.). 2008; 301-310.
- (2008) 16th Euromicro International Conference on Parallel, Distributed and Network-based Processing - PDP 2008 , pp. 301-310
- Quintana-Ortí, G.¹ Quintana-Ortí, E.S.² Chan, E.³ Van Zee, F.G.⁴ van de Geijn, R.A.⁵

8
- 67650056933
- Super matrix: A multithreaded runtime scheduling system for algorithms-by-blocks
- Chan E, Van Zee FG, Bientinesi P, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R. Super matrix: A multithreaded runtime scheduling system for algorithms-by-blocks. ACM SIGPLAN 2008 Symposium on Principles and Practices of Parallel Programming (PPoPP'08), 2008; 123-132.
- (2008) ACM SIGPLAN 2008 Symposium on Principles and Practices of Parallel Programming (PPoPP'08) , pp. 123-132
- Chan, E.¹ Van Zee, F.G.² Bientinesi, P.³ Quintana-Ortí, E.S.⁴ Quintana-Ortí, G.⁵ van de Geijn, R.⁶

9
- 73349130534
- Quintana-Ortí G, Quintana-Ortí ES, Remón A, van de Geijn R. Supermatrix for the factorization of band matrices. FLAME Working Note #27 TR-07-51, The University of Texas at Austin, Department of Computer Sciences, September 2007.

10
- 0030601279
- Cilk: An efficient multithreaded runtime system
- Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou Y. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing 1996; 37(1): 55-69.
- (1996) Journal of Parallel and Distributed Computing , vol.37 , Issue.1 , pp. 55-69
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

11
- 34548265764
- CellSs: A programming model for the Cell BE architecture
- ACM Press: New York, NY, U.S.A
- Bellens P, Pérez JM, Badia RM, Labarta J. CellSs: A programming model for the Cell BE architecture. SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. ACM Press: New York, NY, U.S.A., 2006; 86.
- (2006) SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 86
- Bellens, P.¹ Pérez, J.M.² Badia, R.M.³ Labarta, J.⁴

12
- 35649006026
- CellSs programming the Cell/B.E. made easier
- Pérez JM, Bellens P, Badia RM, Labarta J. CellSs programming the Cell/B.E. made easier. IBM Journal of Research and Development 2007; 51(5).
- (2007) IBM Journal of Research and Development , vol.51 , Issue.5
- Pérez, J.M.¹ Bellens, P.² Badia, R.M.³ Labarta, J.⁴

13
- 70350666900
- A flexible and portable programming model for SMP and multi-cores
- Technical Report 03/2007, Barcelona Supercomputing Center - Centro Nacional de Supercomputacion, Barcelona, Spain
- Pérez JM, Badia RM, Labarta J. A flexible and portable programming model for SMP and multi-cores. Technical Report 03/2007, Barcelona Supercomputing Center - Centro Nacional de Supercomputacion, Barcelona, Spain, 2007.
- (2007)
- Pérez, J.M.¹ Badia, R.M.² Labarta, J.³

14
- 57949083229
- Pérez JM, Badia RM, Labarta J. A dependency-aware task-based programming environment for multi-core architectures. Proceedings of the 2008 IEEE International Conference on Cluster Computing, Causal Productions (ed.). September 2008; 142-151. IEEE Catalog Number CFP08235-CDR.

15
- 0004236492
- 3rd edn, The Johns Hopkins University Press: Baltimore, MD
- Golub GH, Van Loan CF. Matrix Computations (3rd edn). The Johns Hopkins University Press: Baltimore, MD, 1996.
- (1996) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

16
- 0039435412
- FLAME: Formal linear algebra methods environment
- Gunnels JA, Gustavson FG, Henry GM, van de Geijn RA. FLAME: Formal linear algebra methods environment. ACM Transactions on Mathematical Software 2001; 27(4): 422-455.
- (2001) ACM Transactions on Mathematical Software , vol.27 , Issue.4 , pp. 422-455
- Gunnels, J.A.¹ Gustavson, F.G.² Henry, G.M.³ van de Geijn, R.A.⁴

17
- 17644412337
- The science of deriving dense linear algebra algorithms
- Bientinesi P, Gunnels JA, Myers ME, Quintana-Ortí ES, van de Geijn RA. The science of deriving dense linear algebra algorithms. ACM Transactions on Mathematical Software 2005; 31(1): 1-26.
- (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 1-26
- Bientinesi, P.¹ Gunnels, J.A.² Myers, M.E.³ Quintana-Ortí, E.S.⁴ van de Geijn, R.A.⁵

18
- 17644370328
- Representing linear algebra algorithms in code: The FLAME application programming interfaces
- Bientinesi P, Quintana-Ortí ES, van de Geijn RA. Representing linear algebra algorithms in code: The FLAME application programming interfaces. ACM Transactions on Mathematical Software 2005; 31(1): 27-59.
- (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 27-59
- Bientinesi, P.¹ Quintana-Ortí, E.S.² van de Geijn, R.A.³

19
- 65849272637
- A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization
- Technical Report TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia
- Strazdins P. A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Technical Report TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia, 1998.
- (1998)
- Strazdins, P.¹

20
- 0032155271
- GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark
- Kågström B, Ling P, Loan CV. GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark. ACM Transactions on Mathematical Software 1998; 24(3): 268-302.
- (1998) ACM Transactions on Mathematical Software , vol.24 , Issue.3 , pp. 268-302
- Kågström, B.¹ Ling, P.² Loan, C.V.³

21
- 0032155342
- Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues
- Kågström B, Ling P, Loan CV. Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues. ACM Transactions on Mathematical Software 1998; 24(3): 303-316.
- (1998) ACM Transactions on Mathematical Software , vol.24 , Issue.3 , pp. 303-316
- Kågström, B.¹ Ling, P.² Loan, C.V.³

22
- 17644368925
- Parallel out-of-core computation and updating the QR factorization
- Gunter BC, van de Geijn RA. Parallel out-of-core computation and updating the QR factorization. ACM Transactions on Mathematical Software 2005; 31(1): 60-78.
- (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 60-78
- Gunter, B.C.¹ van de Geijn, R.A.²

23
- 33745328323
- Rapid development of high-performance out-of-core solvers
- Proceedings of PARA 2004, Springer: Berlin, Heidelberg
- Joffrain T, Quintana-Ortí ES, van de Geijn RA. Rapid development of high-performance out-of-core solvers. Proceedings of PARA 2004, Lecture Notes in Computer Science, vol. 3732. Springer: Berlin, Heidelberg, 2005 ; 413-422.
- (2005) Lecture Notes in Computer Science , vol.3732 , pp. 413-422
- Joffrain, T.¹ Quintana-Ortí, E.S.² van de Geijn, R.A.³

24
- 85121159302
- Quintana-Ortí ES, van de Geijn R. Updating an LU factorization with pivoting. ACM Transactions on Mathematical Software 2008; 35(2): 11:1-11:16.

25
- 73349124198
- Gustavson FG. New generalized matrix data structures lead to a variety of high-performance algorithms. The Architecture of Scientific Software, Boisvert RF, Tang PTP (eds.), vol. 188 of IFIP Conference Proceedings. Kluwer: Dordrecht, 2000; 211-234.

26
- 0042235298
- Tiling, block data layout, and memory hierarchy performance
- Park N, Hong B, Prasanna VK. Tiling, block data layout, and memory hierarchy performance. IEEE Transactions on Parallel and Distributed Systems 2003; 14(7): 640-654.
- (2003) IEEE Transactions on Parallel and Distributed Systems , vol.14 , Issue.7 , pp. 640-654
- Park, N.¹ Hong, B.² Prasanna, V.K.³

27
- 35248873682
- PhD Thesis, Polytechnic University of Catalonia, Spain
- Herrero JR. A framework for efficient execution of matrix computations. PhD Thesis, Polytechnic University of Catalonia, Spain, 2006.
- (2006) A framework for efficient execution of matrix computations
- Herrero, J.R.¹

28
- 47349106165
- An API for manipulating matrices stored by blocks
- Technical Report TR-2004-15, Department of Computer Sciences, The University of Texas at Austin, May
- Low TM, van de Geijn R. An API for manipulating matrices stored by blocks. Technical Report TR-2004-15, Department of Computer Sciences, The University of Texas at Austin, May 2004.
- (2004)
- Low, T.M.¹ van de Geijn, R.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.