-
1
-
-
0003706460
-
-
SIAM: Philadelphia
-
Anderson E, Bai Z, Demmel J, Dongarra JE, DuCroz J, Greenbaum A, Hammarling S, McKenney AE, Ostrouchov S, Sorensen D. LAPACK Users' Guide. SIAM: Philadelphia, 1992.
-
(1992)
LAPACK Users' Guide
-
-
Anderson, E.1
Bai, Z.2
Demmel, J.3
Dongarra, J.E.4
DuCroz, J.5
Greenbaum, A.6
Hammarling, S.7
McKenney, A.E.8
Ostrouchov, S.9
Sorensen, D.10
-
2
-
-
50249105132
-
Parallel tiled QR factorization for multicore architectures
-
Buttari A, Langou J, Kurzak J, Dongarra J. Parallel tiled QR factorization for multicore architectures. Concurrency and Computation: Practice and Experience 2008; 20(13): 1573-1590.
-
(2008)
Concurrency and Computation: Practice and Experience
, vol.20
, Issue.13
, pp. 1573-1590
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.4
-
3
-
-
58149269099
-
A class of parallel tiled linear algebra algorithms for multicore architectures
-
Buttari A, Langou J, Kurzak J, Dongarra J. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing 2009; 35(1): 38-53.
-
(2009)
Parallel Computing
, vol.35
, Issue.1
, pp. 38-53
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.4
-
4
-
-
35248843628
-
SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
-
San Diego, CA, U.S.A, 9-11 June
-
Chan E, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R. SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. Proceedings of the Nineteenth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), San Diego, CA, U.S.A., 9-11 June 2007; 116-125.
-
(2007)
Proceedings of the Nineteenth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007)
, pp. 116-125
-
-
Chan, E.1
Quintana-Ortí, E.S.2
Quintana-Ortí, G.3
van de Geijn, R.4
-
5
-
-
51049099053
-
Satisfying your dependencies with SuperMatrix
-
September
-
Chan E, Van Zee FG, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R. Satisfying your dependencies with SuperMatrix. Proceedings of IEEE Cluster Computing 2007, September 2007; 91-99.
-
(2007)
Proceedings of IEEE Cluster Computing 2007
, pp. 91-99
-
-
Chan, E.1
Van Zee, F.G.2
Quintana-Ortí, E.S.3
Quintana-Ortí, G.4
van de Geijn, R.5
-
6
-
-
67650063143
-
Design of scalable dense linear algebra libraries for multithreaded architectures: The LU factorization
-
CD-ROM
-
Quintana-Ortí G, Quintana-Ortí ES, Chan E, van de Geijn R, Van Zee FG. Design of scalable dense linear algebra libraries for multithreaded architectures: the LU factorization. Workshop on Multithreaded Architectures and Applications - MTAAP 2008, 2008. CD-ROM.
-
(2008)
Workshop on Multithreaded Architectures and Applications - MTAAP 2008
-
-
Quintana-Ortí, G.1
Quintana-Ortí, E.S.2
Chan, E.3
van de Geijn, R.4
Van Zee, F.G.5
-
7
-
-
47349122478
-
Scheduling of QR factorization algorithms on SMP and multi-core architectures
-
El Baz FSD, Bourgeois J eds
-
Quintana-Ortí G, Quintana-Ortí ES, Chan E, Van Zee FG, van de Geijn RA. Scheduling of QR factorization algorithms on SMP and multi-core architectures. 16th Euromicro International Conference on Parallel, Distributed and Network-based Processing - PDP 2008, El Baz FSD, Bourgeois J (eds.). 2008; 301-310.
-
(2008)
16th Euromicro International Conference on Parallel, Distributed and Network-based Processing - PDP 2008
, pp. 301-310
-
-
Quintana-Ortí, G.1
Quintana-Ortí, E.S.2
Chan, E.3
Van Zee, F.G.4
van de Geijn, R.A.5
-
8
-
-
67650056933
-
SuperMatrix: A multithreaded runtime scheduling system for algorithms-by-blocks
-
Chan E, Van Zee FG, Bientinesi P, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R. SuperMatrix: A multithreaded runtime scheduling system for algorithms-by-blocks. ACM SIGPLAN 2008 Symposium on Principles and Practice of Parallel Programming (PPoPP'08), 2008; 123-132.
-
(2008)
ACM SIGPLAN 2008 Symposium on Principles and Practice of Parallel Programming (PPoPP'08)
, pp. 123-132
-
-
Chan, E.1
Van Zee, F.G.2
Bientinesi, P.3
Quintana-Ortí, E.S.4
Quintana-Ortí, G.5
van de Geijn, R.6
-
9
-
-
73349130534
-
-
Quintana-Ortí G, Quintana-Ortí ES, Remón A, van de Geijn R. SuperMatrix for the factorization of band matrices. FLAME Working Note #27 TR-07-51, The University of Texas at Austin, Department of Computer Sciences, September 2007.
-
-
-
-
10
-
-
0030601279
-
Cilk: An efficient multithreaded runtime system
-
Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou Y. Cilk: An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing 1996; 37(1): 55-69.
-
(1996)
Journal of Parallel and Distributed Computing
, vol.37
, Issue.1
, pp. 55-69
-
-
Blumofe, R.D.1
Joerg, C.F.2
Kuszmaul, B.C.3
Leiserson, C.E.4
Randall, K.H.5
Zhou, Y.6
-
11
-
-
34548265764
-
CellSs: A programming model for the Cell BE architecture
-
ACM Press: New York, NY, U.S.A
-
Bellens P, Pérez JM, Badia RM, Labarta J. CellSs: A programming model for the Cell BE architecture. SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. ACM Press: New York, NY, U.S.A., 2006; 86.
-
(2006)
SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
, pp. 86
-
-
Bellens, P.1
Pérez, J.M.2
Badia, R.M.3
Labarta, J.4
-
13
-
-
70350666900
-
A flexible and portable programming model for SMP and multi-cores
-
Technical Report 03/2007, Barcelona Supercomputing Center - Centro Nacional de Supercomputacion, Barcelona, Spain
-
Pérez JM, Badia RM, Labarta J. A flexible and portable programming model for SMP and multi-cores. Technical Report 03/2007, Barcelona Supercomputing Center - Centro Nacional de Supercomputacion, Barcelona, Spain, 2007.
-
(2007)
-
-
Pérez, J.M.1
Badia, R.M.2
Labarta, J.3
-
14
-
-
57949083229
-
-
Pérez JM, Badia RM, Labarta J. A dependency-aware task-based programming environment for multi-core architectures. Proceedings of the 2008 IEEE International Conference on Cluster Computing, Causal Productions (ed.). September 2008; 142-151. IEEE Catalog Number CFP08235-CDR.
-
-
-
-
15
-
-
0004236492
-
-
3rd edn, The Johns Hopkins University Press: Baltimore, MD
-
Golub GH, Van Loan CF. Matrix Computations (3rd edn). The Johns Hopkins University Press: Baltimore, MD, 1996.
-
(1996)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
16
-
-
0039435412
-
FLAME: Formal linear algebra methods environment
-
Gunnels JA, Gustavson FG, Henry GM, van de Geijn RA. FLAME: Formal linear algebra methods environment. ACM Transactions on Mathematical Software 2001; 27(4): 422-455.
-
(2001)
ACM Transactions on Mathematical Software
, vol.27
, Issue.4
, pp. 422-455
-
-
Gunnels, J.A.1
Gustavson, F.G.2
Henry, G.M.3
van de Geijn, R.A.4
-
17
-
-
17644412337
-
The science of deriving dense linear algebra algorithms
-
Bientinesi P, Gunnels JA, Myers ME, Quintana-Ortí ES, van de Geijn RA. The science of deriving dense linear algebra algorithms. ACM Transactions on Mathematical Software 2005; 31(1): 1-26.
-
(2005)
ACM Transactions on Mathematical Software
, vol.31
, Issue.1
, pp. 1-26
-
-
Bientinesi, P.1
Gunnels, J.A.2
Myers, M.E.3
Quintana-Ortí, E.S.4
van de Geijn, R.A.5
-
19
-
-
65849272637
-
A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization
-
Technical Report TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia
-
Strazdins P. A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Technical Report TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia, 1998.
-
(1998)
-
-
Strazdins, P.1
-
20
-
-
0032155271
-
GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark
-
Kågström B, Ling P, Van Loan C. GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark. ACM Transactions on Mathematical Software 1998; 24(3): 268-302.
-
(1998)
ACM Transactions on Mathematical Software
, vol.24
, Issue.3
, pp. 268-302
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
21
-
-
0032155342
-
Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues
-
Kågström B, Ling P, Van Loan C. Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues. ACM Transactions on Mathematical Software 1998; 24(3): 303-316.
-
(1998)
ACM Transactions on Mathematical Software
, vol.24
, Issue.3
, pp. 303-316
-
-
Kågström, B.1
Ling, P.2
Van Loan, C.3
-
23
-
-
33745328323
-
Rapid development of high-performance out-of-core solvers
-
Proceedings of PARA 2004, Springer: Berlin, Heidelberg
-
Joffrain T, Quintana-Ortí ES, van de Geijn RA. Rapid development of high-performance out-of-core solvers. Proceedings of PARA 2004, Lecture Notes in Computer Science, vol. 3732. Springer: Berlin, Heidelberg, 2005; 413-422.
-
(2005)
Lecture Notes in Computer Science
, vol.3732
, pp. 413-422
-
-
Joffrain, T.1
Quintana-Ortí, E.S.2
van de Geijn, R.A.3
-
24
-
-
85121159302
-
-
Quintana-Ortí ES, van de Geijn R. Updating an LU factorization with pivoting. ACM Transactions on Mathematical Software 2008; 35(2): 11:1-11:16.
-
-
-
-
25
-
-
73349124198
-
-
Gustavson FG. New generalized matrix data structures lead to a variety of high-performance algorithms. The Architecture of Scientific Software, Boisvert RF, Tang PTP (eds.), vol. 188 of IFIP Conference Proceedings. Kluwer: Dordrecht, 2000; 211-234.
-
-
-
-
26
-
-
0042235298
-
Tiling, block data layout, and memory hierarchy performance
-
Park N, Hong B, Prasanna VK. Tiling, block data layout, and memory hierarchy performance. IEEE Transactions on Parallel and Distributed Systems 2003; 14(7): 640-654.
-
(2003)
IEEE Transactions on Parallel and Distributed Systems
, vol.14
, Issue.7
, pp. 640-654
-
-
Park, N.1
Hong, B.2
Prasanna, V.K.3
-
28
-
-
47349106165
-
An API for manipulating matrices stored by blocks
-
Technical Report TR-2004-15, Department of Computer Sciences, The University of Texas at Austin, May
-
Low TM, van de Geijn R. An API for manipulating matrices stored by blocks. Technical Report TR-2004-15, Department of Computer Sciences, The University of Texas at Austin, May 2004.
-
(2004)
-
-
Low, T.M.1
van de Geijn, R.2