-
1
-
-
33745318358
-
A parallel implementation of matrix multiplication and LU factorization on the IBM 3090
-
Palo Alto, CA, August
-
R. C. Agarwal and F. G. Gustavson, A parallel implementation of matrix multiplication and LU factorization on the IBM 3090, in: Proceedings of the IFIP WG 2.5 Working Conference on Aspects of Computation on Asynchronous Parallel Processors, Palo Alto, CA, August 1988, pp. 217-221.
-
(1988)
Proceedings of the IFIP WG 2.5 Working Conference on Aspects of Computation on Asynchronous Parallel Processors
, pp. 217-221
-
-
Agarwal, R.C.1
Gustavson, F.G.2
-
2
-
-
0024891893
-
Vector and parallel algorithms for Cholesky factorization on IBM 3090
-
Reno, NV, November
-
R. C. Agarwal and F. G. Gustavson, Vector and parallel algorithms for Cholesky factorization on IBM 3090, in: Proceedings of the 1989 ACM/IEEE Conference on Supercomputing, Reno, NV, November 1989, pp. 225-233.
-
(1989)
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing
, pp. 225-233
-
-
Agarwal, R.C.1
Gustavson, F.G.2
-
3
-
-
0003706460
-
-
3rd edn, SIAM, Philadelphia, PA, USA
-
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney and D. Sorensen, LAPACK Users' Guide, 3rd edn, SIAM, Philadelphia, PA, USA, 1999.
-
(1999)
LAPACK Users' Guide
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Blackford, S.4
Demmel, J.5
Dongarra, J.6
Du Croz, J.7
Greenbaum, A.8
Hammarling, S.9
McKenney, A.10
Sorensen, D.11
-
4
-
-
12444316073
-
A new stable bidiagonal reduction algorithm
-
DOI 10.1016/j.laa.2004.09.019, PII S0024379504004276
-
J. L. Barlow, N. Bosner and Z. Drmač, A new stable bidiagonal reduction algorithm, Linear Algebra Appl. 397 (1) (2005), 35-84. (Pubitemid 40146312)
-
(2005)
Linear Algebra and Its Applications
, vol.397
, Issue.1-3
, pp. 35-84
-
-
Barlow, J.L.1
Bosner, N.2
Drmač, Z.3
-
5
-
-
34548265764
-
CellSs: A programming model for the Cell BE architecture
-
Tampa, FL, November 11-17
-
P. Bellens, J. M. Perez, R. M. Badia and J. Labarta, CellSs: A programming model for the Cell BE architecture, in: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, Tampa, FL, November 11-17, 2006, p. 86.
-
(2006)
Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
, pp. 86
-
-
Bellens, P.1
Perez, J.M.2
Badia, R.M.3
Labarta, J.4
-
6
-
-
48249107440
-
Block and parallel versions of one-sided bidiagonalization
-
N. Bosner and J. L. Barlow, Block and parallel versions of one-sided bidiagonalization, SIAM J. Matrix Anal. Appl. 29 (3) (2007), 927-953.
-
(2007)
SIAM J. Matrix Anal. Appl
, vol.29
, Issue.3
, pp. 927-953
-
-
Bosner, N.1
Barlow, J.L.2
-
7
-
-
77951890128
-
Multithreading for synchronization tolerance in matrix factorization
-
Boston, MA, IOP Publishing, June 24-28, 2007. (J. Phys.: Conference Series 78 012-028.)
-
A. Buttari, J. J. Dongarra, P. Husbands, J. Kurzak and K. Yelick, Multithreading for synchronization tolerance in matrix factorization, in: Scientific Discovery Through Advanced Computing, SciDAC 2007, Boston, MA, IOP Publishing, June 24-28, 2007. (J. Phys.: Conference Series 78 012-028.)
-
(2007)
Scientific Discovery Through Advanced Computing, SciDAC
-
-
Buttari, A.1
Dongarra, J.J.2
Husbands, P.3
Kurzak, J.4
Yelick, K.5
-
8
-
-
51049083291
-
Parallel tiled QR factorization for multicore architectures
-
July 2007
-
A. Buttari, J. Langou, J. Kurzak and J. Dongarra, Parallel tiled QR factorization for multicore architectures, LAPACK Working Note 191, July 2007.
-
LAPACK Working Note
, vol.191
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.4
-
9
-
-
50249105132
-
Parallel tiled QR factorization for multicore architectures
-
A. Buttari, J. Langou, J. Kurzak and J. J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency Comput. Pract. Exp. 20 (13) (2008), 1573-1590.
-
(2008)
Concurrency Comput. Pract. Exp.
, vol.20
, Issue.13
, pp. 1573-1590
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.J.4
-
10
-
-
58149269099
-
A class of parallel tiled linear algebra algorithms for multicore architectures
-
A. Buttari, J. Langou, J. Kurzak and J. J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput. Syst. Appl. 35 (2009), 38-53.
-
(2009)
Parallel Comput. Syst. Appl
, vol.35
, pp. 38-53
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.J.4
-
11
-
-
0030564728
-
ScaLAPACK: A portable linear algebra library for distributed memory computers - Design issues and performance
-
PII S0010465596000173
-
J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker and R. C. Whaley, ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance, Comput. Phys. Comm. 97 (1-2) (1996), 1-15. (Pubitemid 126387751)
-
(1996)
Computer Physics Communications
, vol.97
, Issue.1-2
, pp. 1-15
-
-
Choi, J.1
Demmel, J.2
Dhillon, I.3
Dongarra, J.4
Ostrouchov, S.5
Petitet, A.6
Stanley, K.7
Walker, D.8
Whaley, R.C.9
-
12
-
-
33847379878
-
Estimating and correcting global weather model error
-
DOI 10.1175/MWR3289.1
-
C. M. Danforth, E. Kalnay and T. Miyoshi, Estimating and correcting global weather model error, Mon. Weather Rev. 135 (2) (2007), 281-299. (Pubitemid 46344360)
-
(2007)
Monthly Weather Review
, vol.135
, Issue.2
, pp. 281-299
-
-
Danforth, C.M.1
Kalnay, E.2
Miyoshi, T.3
-
13
-
-
84947936389
-
New serial and parallel recursive QR factorization algorithms for SMP systems
-
Springer-Verlag, Berlin
-
E. Elmroth and F. G. Gustavson, New serial and parallel recursive QR factorization algorithms for SMP systems, in: Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 1541, Springer-Verlag, Berlin, 1998, pp. 120-128.
-
(1998)
Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA, Lecture Notes in Computer Science
, vol.1541
, pp. 120-128
-
-
Elmroth, E.1
Gustavson, F.G.2
-
14
-
-
0034224207
-
Applying recursion to serial and parallel QR factorization leads to better performance
-
E. Elmroth and F. G. Gustavson, Applying recursion to serial and parallel QR factorization leads to better performance, IBM J. Res. Dev. 44 (4) (2000), 605-624.
-
(2000)
IBM J. Res. Dev.
, vol.44
, Issue.4
, pp. 605-624
-
-
Elmroth, E.1
Gustavson, F.G.2
-
15
-
-
84957033906
-
High-performance library software for QR factorization
-
Springer-Verlag, Berlin/Heidelberg
-
E. Elmroth and F. G. Gustavson, High-performance library software for QR factorization, in: Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 1947, Springer-Verlag, Berlin/Heidelberg, 2000, pp. 53-63.
-
(2000)
Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA, Lecture Notes in Computer Science
, vol.1947
, pp. 53-63
-
-
Elmroth, E.1
Gustavson, F.G.2
-
16
-
-
1842832833
-
Recursive blocked algorithms and hybrid data structures for dense matrix library software
-
E. Elmroth, F. G. Gustavson, I. Jonsson and B. Kågström, Recursive blocked algorithms and hybrid data structures for dense matrix library software, SIAM Rev. 46 (1) (2004), 3-45.
-
(2004)
SIAM Rev
, vol.46
, Issue.1
, pp. 3-45
-
-
Elmroth, E.1
Gustavson, F.G.2
Jonsson, I.3
Kågström, B.4
-
17
-
-
0004236492
-
-
3rd edn, Johns Hopkins University Press, Baltimore, MD
-
G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edn, Johns Hopkins University Press, Baltimore, MD, 1996.
-
(1996)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
18
-
-
17644368925
-
Parallel out-of-core computation and updating of the QR factorization
-
B. C. Gunter and R. A. van de Geijn, Parallel out-of-core computation and updating of the QR factorization, ACM Trans. Math. Software 31 (1) (2005), 60-78.
-
(2005)
ACM Trans. Math. Software
, vol.31
, Issue.1
, pp. 60-78
-
-
Gunter, B.C.1
Van de Geijn, R.A.2
-
19
-
-
84901913528
-
New generalized matrix data structures lead to a variety of high-performance algorithms
-
Kluwer Academic, Deventer, The Netherlands
-
F. G. Gustavson, New generalized matrix data structures lead to a variety of high-performance algorithms, in: Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications, Kluwer Academic, Deventer, The Netherlands, 2000, pp. 211-234.
-
(2000)
Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications
, pp. 211-234
-
-
Gustavson, F.G.1
-
20
-
-
38049054439
-
Minimal data copy for dense linear algebra factorization
-
Springer-Verlag, Berlin/Heidelberg
-
F. G. Gustavson, J. A. Gunnels and J. C. Sexton, Minimal data copy for dense linear algebra factorization, in: Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 4699, Springer-Verlag, Berlin/Heidelberg, 2006, pp. 540-549.
-
(2006)
Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science
, vol.4699
, pp. 540-549
-
-
Gustavson, F.G.1
Gunnels, J.A.2
Sexton, J.C.3
-
21
-
-
0033297112
-
A parallel algorithm for the reduction to tridiagonal form for eigendecomposition
-
M. Hegland, M. Kahn and M. Osborne, A parallel algorithm for the reduction to tridiagonal form for eigendecomposition, SIAM J. Sci. Comput. 21 (3) (1999), 987-1005.
-
(1999)
SIAM J. Sci. Comput
, vol.21
, Issue.3
, pp. 987-1005
-
-
Hegland, M.1
Kahn, M.2
Osborne, M.3
-
22
-
-
49349111725
-
Solving systems of linear equations on the CELL processor using Cholesky factorization
-
J. Kurzak, A. Buttari and J. J. Dongarra, Solving systems of linear equations on the CELL processor using Cholesky factorization, IEEE Trans. Parallel Distrib. Syst. 19 (9) (2008), 1175-1186.
-
(2008)
IEEE Trans. Parallel Distrib. Syst
, vol.19
, Issue.9
, pp. 1175-1186
-
-
Kurzak, J.1
Buttari, A.2
Dongarra, J.J.3
-
23
-
-
38049005629
-
Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
-
Springer-Verlag, Berlin, June
-
J. Kurzak and J. J. Dongarra, Implementing linear algebra routines on multi-core processors with pipelining and a look ahead, in: Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science, Vol. 4699, Springer-Verlag, Berlin, June 2006, pp. 147-156.
-
(2006)
Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA, Lecture Notes in Computer Science
, vol.4699
, pp. 147-156
-
-
Kurzak, J.1
Dongarra, J.J.2
-
24
-
-
74549205359
-
QR factorization for the CELL processor
-
May
-
J. Kurzak and J. Dongarra, QR Factorization for the CELL processor, LAPACK Working Note 201, May 2008.
-
(2008)
LAPACK Working Note
, vol.201
-
-
Kurzak, J.1
Dongarra, J.2
-
26
-
-
0020593101
-
Solving linear algebraic equations on an MIMD computer
-
DOI 10.1145/322358.322366
-
R. E. Lord, J. S. Kowalik and S. P. Kumar, Solving linear algebraic equations on an MIMD computer, J. ACM 30 (1) (1983), 103-117. (Pubitemid 13504813)
-
(1983)
Journal of the ACM
, vol.30
, Issue.1
, pp. 103-117
-
-
Lord, R.E.1
Kowalik, J.S.2
Kumar, S.P.3
-
27
-
-
24644482622
-
Analysis of memory hierarchy performance of block data layout
-
IEEE Computer Society, Washington, DC
-
N. Park, B. Hong and V. K. Prasanna, Analysis of memory hierarchy performance of block data layout, in: Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02, IEEE Computer Society, Washington, DC, 2002, pp. 35-44.
-
(2002)
Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02
, pp. 35-44
-
-
Park, N.1
Hong, B.2
Prasanna, V.K.3
-
28
-
-
0042235298
-
Tiling, block data layout, and memory hierarchy performance
-
N. Park, B. Hong and V. K. Prasanna, Tiling, block data layout, and memory hierarchy performance, IEEE Trans. Parallel Distrib. Syst. 14 (7) (2003), 640-654.
-
(2003)
IEEE Trans. Parallel Distrib. Syst
, vol.14
, Issue.7
, pp. 640-654
-
-
Park, N.1
Hong, B.2
Prasanna, V.K.3
-
29
-
-
57949083229
-
A dependency-aware task-based programming environment for multi-core architectures
-
Piscataway, NJ
-
J. M. Pérez, R. M. Badia and J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, in: CLUSTER, IEEE, Piscataway, NJ, 2008, pp. 142-151.
-
(2008)
CLUSTER, IEEE
, pp. 142-151
-
-
Pérez, J.M.1
Badia, R.M.2
Labarta, J.3
-
30
-
-
35649006026
-
CellSs: Making it easier to program the cell broadband engine processor
-
DOI 10.1147/rd.515.0593
-
J. M. Perez, P. Bellens, R. M. Badia and J. Labarta, CellSs: making it easier to program the Cell Broadband Engine processor, IBM J. Res. Dev. 51 (5) (2007), 593-604. (Pubitemid 350031358)
-
(2007)
IBM Journal of Research and Development
, vol.51
, Issue.5
, pp. 593-604
-
-
Perez, J.M.1
Bellens, P.2
Badia, R.M.3
Labarta, J.4
-
31
-
-
85021253844
-
-
PIRO-BAND: PIpelined ROtations for BAnd Reduction, available at
-
PIRO-BAND: PIpelined ROtations for BAnd Reduction, available at: http://www.cise.ufl.edu/~srajaman/.
-
-
-
-
32
-
-
47349122478
-
Scheduling of QR factorization algorithms on SMP and multi-core architectures
-
Los Alamitos, CA
-
G. Quintana-Ortí, E. S. Quintana-Ortí, E. Chan, R. A. van de Geijn and F. G. Van Zee, Scheduling of QR factorization algorithms on SMP and multi-core architectures, in: PDP, IEEE Computer Society, Los Alamitos, CA, 2008, pp. 301-310.
-
(2008)
PDP, IEEE Computer Society
, pp. 301-310
-
-
Quintana-Ortí, G.1
Quintana-Ortí, E.S.2
Chan, E.3
Van de Geijn, R.A.4
Van Zee, F.G.5
-
33
-
-
0003078924
-
A storage efficient WY representation for products of Householder transformations
-
R. Schreiber and C. Van Loan, A storage efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput. 10 (1989), 53-57.
-
(1989)
SIAM J. Sci. Statist. Comput
, vol.10
, pp. 53-57
-
-
Schreiber, R.1
Van Loan, C.2
-
34
-
-
85021229732
-
-
SMP Superscalar (SMPSs) User's Manual, Version 2.0, Barcelona Supercomputing Center
-
SMP Superscalar (SMPSs) User's Manual, Version 2.0, Barcelona Supercomputing Center, 2008.
-
(2008)
-
-
-
35
-
-
0004554167
-
Numerical linear algebra
-
Philadelphia, PA
-
L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997.
-
(1997)
SIAM
-
-
Trefethen, L.N.1
Bau, D.2