-
1
-
-
69149088136
-
IEEE standard for floating-point arithmetic
-
IEEE standard for floating-point arithmetic, IEEE Std. 754-2008, (2008), pp. 1-58.
-
(2008)
IEEE Std.
, vol.754-2008
, pp. 1-58
-
-
-
2
-
-
0024082546
-
The input/output complexity of sorting and related problems
-
A. Aggarwal and J. S. Vitter, The input/output complexity of sorting and related problems, Commun. ACM, 31 (1988), pp. 1116-1127.
-
(1988)
Commun. ACM
, vol.31
, pp. 1116-1127
-
-
Aggarwal, A.1
Vitter, J.S.2
-
3
-
-
84937408012
-
Automatic generation of block-recursive codes
-
London, UK, Springer-Verlag
-
N. Ahmed and K. Pingali, Automatic generation of block-recursive codes, in Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, London, UK, 2000, Springer-Verlag, pp. 368-378.
-
(2000)
Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing
, pp. 368-378
-
-
Ahmed, N.1
Pingali, K.2
-
4
-
-
18044400448
-
A recursive formulation of Cholesky factorization of a matrix in packed storage format
-
B. S. Andersen, F. G. Gustavson, and J. Wasniewski, A recursive formulation of Cholesky factorization of a matrix in packed storage format, ACM Trans. Math. Software, 27 (2001), pp. 214-244.
-
(2001)
ACM Trans. Math. Software
, vol.27
, pp. 214-244
-
-
Andersen, B.S.1
Gustavson, F.G.2
Wasniewski, J.3
-
5
-
-
0003706460
-
-
3rd ed. SIAM, Philadelphia
-
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users Guide, 3rd ed., SIAM, Philadelphia, 1999; also available from . org/lapack/.
-
(1999)
LAPACK Users Guide
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Demmel, J.4
Dongarra, J.5
Du Croz, J.6
Greenbaum, A.7
Hammarling, S.8
McKenney, A.9
Ostrouchov, S.10
Sorensen, D.11
-
6
-
-
45449120592
-
Hardware-oriented implementation of cache oblivious matrix operations based on space-filling curves
-
Parallel Processing and Applied Mathematics, 7th International Conference, PPAM, Springer-Verlag, New York
-
M. Bader, R. Franz, S. Guenther, and A. Heinecke, Hardware-oriented implementation of cache oblivious matrix operations based on space-filling curves, in Parallel Processing and Applied Mathematics, 7th International Conference, PPAM 2007, Lecture Notes in Comput. Sci. 4967, Springer-Verlag, New York, 2008, pp. 628-638.
-
(2007)
Lecture Notes in Comput. Sci.
, vol.4967
, Issue.2008
, pp. 628-638
-
-
Bader, M.1
Franz, R.2
Guenther, S.3
Heinecke, A.4
-
7
-
-
79251547215
-
Minimizing communication in linear algebra
-
Submitted
-
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing communication in linear algebra, SIAM J. Matrix Anal. Appl., submitted; also available at .
-
SIAM J. Matrix Anal. Appl.
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Schwartz, O.4
-
8
-
-
70449623419
-
Communication-optimal parallel and sequential Cholesky decomposition
-
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Communication-optimal parallel and sequential Cholesky decomposition, in SPAA '09: Proceedings of the 21st ACM Symposium on Parallelism in Algorithms and Architectures, 2009, pp. 245-252.
-
(2009)
SPAA '09: Proceedings of the 21st ACM Symposium on Parallelism in Algorithms and Architectures
, pp. 245-252
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Schwartz, O.4
-
9
-
-
77956611313
-
Optimal sparse matrix dense vector multiplication in the I/O-model
-
M. A. Bender, G. S. Brodal, R. Fagerberg, R. Jacob, and E. Vicari, Optimal sparse matrix dense vector multiplication in the I/O-model, Theoret. Comput. Sys., 47 (2010), pp. 934-962.
-
(2010)
Theoret. Comput. Sys.
, vol.47
, pp. 934-962
-
-
Bender, M.A.1
Brodal, G.S.2
Fagerberg, R.3
Jacob, R.4
Vicari, E.5
-
10
-
-
70449440599
-
Out-of-core implementations of Cholesky factorization: Loop-based versus recursive algorithms
-
N. Béreux, Out-of-core implementations of Cholesky factorization: Loop-based versus recursive algorithms, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1302-1319.
-
(2008)
SIAM J. Matrix Anal. Appl.
, vol.30
, pp. 1302-1319
-
-
Béreux, N.1
-
11
-
-
0003615167
-
-
SIAM, Philadelphia
-
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK Users' Guide, SIAM, Philadelphia, 1997; also available from .
-
(1997)
ScaLAPACK Users' Guide
-
-
Blackford, L.S.1
Choi, J.2
Cleary, A.3
D'Azevedo, E.4
Demmel, J.5
Dhillon, I.6
Dongarra, J.7
Hammarling, S.8
Henry, G.9
Petitet, A.10
Stanley, K.11
Walker, D.12
Whaley, R.C.13
-
12
-
-
33244497406
-
-
New York, ACM
-
R. A. Chowdhury and V. Ramachandran, Cache-oblivious dynamic programming, in SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, New York, 2006, ACM, pp. 591-600.
-
(2006)
SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
, pp. 591-600
-
-
Chowdhury, R.A.1
Ramachandran, V.2
-
13
-
-
77953980008
-
Communication-optimal parallel and sequential QR and LU factorizations
-
Technical report EECS-2008-89, University of California Berkeley, Berkeley, CA, submitted
-
J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal Parallel and Sequential QR and LU Factorizations, Technical report EECS-2008-89, University of California Berkeley, Berkeley, CA, 2008, SIAM J. Sci. Comput., submitted.
-
(2008)
SIAM J. Sci. Comput.
-
-
Demmel, J.1
Grigori, L.2
Hoemmen, M.3
Langou, J.4
-
14
-
-
85140867620
-
Implementing communication-optimal parallel and sequential QR and LU factorizations
-
submitted
-
J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Implementing communication-optimal parallel and sequential QR and LU factorizations, SIAM J. Sci. Comput., submitted.
-
SIAM J. Sci. Comput.
-
-
Demmel, J.1
Grigori, L.2
Hoemmen, M.3
Langou, J.4
-
16
-
-
1842832833
-
Recursive blocked algorithms and hybrid data structures for dense matrix library software
-
E. Elmroth, F. Gustavson, I. Jonsson, and B. Kågström, Recursive blocked algorithms and hybrid data structures for dense matrix library software, SIAM Rev., 46 (2004), pp. 3-45.
-
(2004)
SIAM Rev.
, vol.46
, pp. 3-45
-
-
Elmroth, E.1
Gustavson, F.2
Jonsson, I.3
Kågström, B.4
-
17
-
-
0033350255
-
Cache-oblivious algorithms
-
Washington, DC, IEEE Computer Society
-
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, Cache-oblivious algorithms, in FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, Washington, DC, 1999, IEEE Computer Society, pp. 285-297.
-
(1999)
FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science
, pp. 285-297
-
-
Frigo, M.1
Leiserson, C.E.2
Prokop, H.3
Ramachandran, S.4
-
18
-
-
79251582362
-
Getting up to speed: The future of supercomputing
-
The National Academies Press, Washington, D.C.
-
S. L. Graham, M. Snir, and C. A. Patterson, eds., Getting up to Speed: The Future of Supercomputing, Report of the National Research Council of the National Academies of Sciences, The National Academies Press, Washington, D.C., 2004; also available online from .
-
(2004)
Report of the National Research Council of the National Academies of Sciences
-
-
Graham, S.L.1
Snir, M.2
Patterson, C.A.3
-
19
-
-
79251581739
-
-
Personal communication
-
L. Grigori, Personal communication, 2009.
-
(2009)
-
-
Grigori, L.1
-
20
-
-
0031273280
-
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
-
F. G. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM J. Res. Dev., 41 (1997), pp. 737-756.
-
(1997)
IBM J. Res. Dev.
, vol.41
, pp. 737-756
-
-
Gustavson, F.G.1
-
21
-
-
84956987224
-
High performance Cholesky factorization via blocking and recursion that uses minimal storage
-
New Paradigms for HPC in Industry and Academia, London, UK Springer-Verlag
-
F. G. Gustavson and I. Jonsson, High performance Cholesky factorization via blocking and recursion that uses minimal storage, in PARA '00: Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, London, UK, 2001, Springer-Verlag, pp. 82-91.
-
(2001)
PARA '00: Proceedings of the 5th International Workshop on Applied Parallel Computing
, pp. 82-91
-
-
Gustavson, F.G.1
Jonsson, I.2
-
23
-
-
84971853043
-
I/O complexity: The red-blue pebble game
-
New York, ACM
-
J. W. Hong and H. T. Kung, I/O complexity: The red-blue pebble game, in STOC '81: Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, New York, 1981, ACM, pp. 326-333.
-
(1981)
STOC '81: Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing
, pp. 326-333
-
-
Hong, J.W.1
Kung, H.T.2
-
25
-
-
10844258198
-
Communication lower bounds for distributed-memory matrix multiplication
-
D. Irony, S. Toledo, and A. Tiskin, Communication lower bounds for distributed-memory matrix multiplication, J. Parallel Distrib. Comput., 64 (2004), pp. 1017-1026.
-
(2004)
J. Parallel Distrib. Comput.
, vol.64
, pp. 1017-1026
-
-
Irony, D.1
Toledo, S.2
Tiskin, A.3
-
26
-
-
84957579840
-
Extending the Hong-Kung model to memory hierarchies
-
J. E. Savage, Extending the Hong-Kung model to memory hierarchies, in COCOON, 1995, pp. 270-281.
-
(1995)
COCOON
, pp. 270-281
-
-
Savage, J.E.1
-
27
-
-
10044286066
-
Analytical model for analysis of cache behavior during Cholesky factorization and its variants
-
Washington, DC, IEEE Computer Society
-
I. Simecek and P. Tvrdik, Analytical model for analysis of cache behavior during Cholesky factorization and its variants, in ICPPW '04: Proceedings of the 2004 International Conference on Parallel Processing Workshops, Washington, DC, 2004, IEEE Computer Society, pp. 190-197.
-
(2004)
ICPPW '04: Proceedings of the 2004 International Conference on Parallel Processing Workshops
, pp. 190-197
-
-
Simecek, I.1
Tvrdik, P.2
-
28
-
-
0031496750
-
Locality of reference in LU decomposition with partial pivoting
-
S. Toledo, Locality of reference in LU decomposition with partial pivoting, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 1065-1081.
-
(1997)
SIAM J. Matrix Anal. Appl.
, vol.18
, pp. 1065-1081
-
-
Toledo, S.1
-
29
-
-
0010020992
-
Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free
-
London, UK, Springer-Verlag
-
D. Wise, Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free, in Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, London, UK, 2000, Springer-Verlag, pp. 774-783.
-
(2000)
Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing
, pp. 774-783
-
-
Wise, D.1