-
1
-
-
27344435504
-
The design and implementation of a first-generation CELL processor
-
Pham, D., Asano, S., Bolliger, M., Day, M.N., Hofstee, H.P., Johns, C., Kahle, J., Kameyama, A., Keaty, J., Masubuchi, Y., Riley, M., Shippy, D., Stasiak, D., Suzuoki, M., Wang, M., Warnock, J., Weitzel, S., Wendel, D., Yamazaki, T., Yazawa, K.: The design and implementation of a first-generation CELL processor. In: IEEE International Solid-State Circuits Conference, pp. 184-185 (2005)
-
(2005)
IEEE International Solid-State Circuits Conference
, pp. 184-185
-
-
Pham, D.1
Asano, S.2
Bolliger, M.3
Day, M.N.4
Hofstee, H.P.5
Johns, C.6
Kahle, J.7
Kameyama, A.8
Keaty, J.9
Masubuchi, Y.10
Riley, M.11
Shippy, D.12
Stasiak, D.13
Suzuoki, M.14
Wang, M.15
Warnock, J.16
Weitzel, S.17
Wendel, D.18
Yamazaki, T.19
Yazawa, K.20
-
2
-
-
45449102400
-
-
Teraflops research chip
-
Teraflops research chip, http://www.intel.com/research/platform/terascale/teraflops.htm
-
-
-
-
3
-
-
0003706460
-
-
3rd edn. SIAM, Philadelphia
-
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Croz, J.D., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK User's Guide, 3rd edn. SIAM, Philadelphia (1999)
-
(1999)
LAPACK User's Guide
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Blackford, S.4
Demmel, J.5
Dongarra, J.6
Croz, J.D.7
Greenbaum, A.8
Hammarling, S.9
McKenney, A.10
Sorensen, D.11
-
4
-
-
0030564728
-
ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance
-
also as LAPACK Working Note #95
-
Choi, J., Demmel, J., Dhillon, I., Dongarra, J., Ostrouchov, S., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance. Computer Physics Communications 97, 1-15 (1996), (also as LAPACK Working Note #95)
-
(1996)
Computer Physics Communications
, vol.97
, pp. 1-15
-
-
Choi, J.1
Demmel, J.2
Dhillon, I.3
Dongarra, J.4
Ostrouchov, S.5
Petitet, A.6
Stanley, K.7
Walker, D.8
Whaley, R.C.9
-
5
-
-
35248868578
-
Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
-
Also available as UT-CS-06-581, September
-
Kurzak, J., Dongarra, J.: Implementing linear algebra routines on multi-core processors with pipelining and a look ahead. LAPACK Working Note 178 (September 2006), Also available as UT-CS-06-581
-
(2006)
LAPACK Working Note
, vol.178
-
-
Kurzak, J.1
Dongarra, J.2
-
6
-
-
38049058008
-
-
Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Luszczek, P., Tomov, S.: The impact of multicore on math software. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 1-10. Springer, Heidelberg (2007)
-
-
-
-
7
-
-
35248843628
-
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
-
ACM Press, New York
-
Chan, E., Quintana-Orti, E.S., Quintana-Orti, G., van de Geijn, R.: Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In: SPAA 2007: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pp. 116-125. ACM Press, New York (2007)
-
(2007)
SPAA 2007: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
, pp. 116-125
-
-
Chan, E.1
Quintana-Orti, E.S.2
Quintana-Orti, G.3
van de Geijn, R.4
-
8
-
-
1842832833
-
Recursive blocked algorithms and hybrid data structures for dense matrix library software
-
Elmroth, E., Gustavson, F., Jonsson, I., Kågström, B.: Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review 46(1), 3-45 (2004)
-
(2004)
SIAM Review
, vol.46
, Issue.1
, pp. 3-45
-
-
Elmroth, E.1
Gustavson, F.2
Jonsson, I.3
Kågström, B.4
-
9
-
-
38049087210
-
-
Gustavson, F., Karlsson, L., Kågström, B.: Three algorithms for Cholesky factorization on distributed memory using packed storage. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 550-559. Springer, Heidelberg (2007)
-
-
-
-
10
-
-
45449118422
-
-
Kurzak, J., Buttari, A., Dongarra, J.: Solving systems of linear equations on the CELL processor using Cholesky factorization. Technical Report UT-CS-07-596, Innovative Computing Laboratory, University of Tennessee Knoxville (April 2007)
-
-
-
-
11
-
-
0020593101
-
Solving linear algebraic equations on an MIMD computer
-
Lord, R.E., Kowalik, J.S., Kumar, S.P.: Solving linear algebraic equations on an MIMD computer. J. ACM 30(1), 103-117 (1983)
-
(1983)
J. ACM
, vol.30
, Issue.1
, pp. 103-117
-
-
Lord, R.E.1
Kowalik, J.S.2
Kumar, S.P.3
-
14
-
-
45449117612
-
-
Agarwal, R.C., Gustavson, F.G.: A parallel implementation of matrix multiplication and LU factorization on the IBM 3090. In: Proceedings of the IFIP WG 2.5 Working Group on Aspects of Computation on Asynchronous Parallel Processors, Stanford CA, August 22-26, 1988, North Holland, Amsterdam (1988)
-
-
-
-
15
-
-
0034224207
-
Applying recursion to serial and parallel QR factorization leads to better performance
-
Elmroth, E., Gustavson, F.G.: Applying recursion to serial and parallel QR factorization leads to better performance. IBM Journal of Research and Development 44(4), 605 (2000)
-
(2000)
IBM Journal of Research and Development
, vol.44
, Issue.4
, pp. 605
-
-
Elmroth, E.1
Gustavson, F.G.2
-
16
-
-
0004236492
-
-
3rd edn. Johns Hopkins University Press, Baltimore
-
Golub, G., Van Loan, C.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
-
(1996)
Matrix Computations
-
-
Golub, G.1
Van Loan, C.2
-
17
-
-
0004094905
-
-
1st edn, SIAM, Philadelphia
-
Stewart, G.W.: Matrix Algorithms, 1st edn., vol. 1. SIAM, Philadelphia (1998)
-
(1998)
Matrix Algorithms
, vol.1
-
-
Stewart, G.W.1
-
18
-
-
45449092245
-
FORTRAN Subroutines for Out-of-Core Solutions of Large Complex Linear Systems
-
Technical Report CR-159142, NASA November
-
Yip, E.L.: FORTRAN Subroutines for Out-of-Core Solutions of Large Complex Linear Systems. Technical Report CR-159142, NASA (November 1979)
-
(1979)
-
-
Yip, E.L.1
-
19
-
-
45449110534
-
Updating an LU factorization with pivoting
-
Technical Report TR-2006-42, The University of Texas at Austin, Department of Computer Sciences, FLAME Working Note 21
-
Quintana-Orti, E., van de Geijn, R.: Updating an LU factorization with pivoting, Technical Report TR-2006-42, The University of Texas at Austin, Department of Computer Sciences (2006), FLAME Working Note 21
-
(2006)
-
-
Quintana-Orti, E.1
van de Geijn, R.2
-
20
-
-
17644368925
-
Parallel out-of-core computation and updating of the QR factorization
-
Gunter, B.C., van de Geijn, R.A.: Parallel out-of-core computation and updating of the QR factorization. ACM Trans. Math. Softw. 31(1), 60-78 (2005)
-
(2005)
ACM Trans. Math. Softw
, vol.31
, Issue.1
, pp. 60-78
-
-
Gunter, B.C.1
van de Geijn, R.A.2
-
21
-
-
0029358998
-
A parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form
-
Berry, M.W., Dongarra, J.J., Kim, Y.: A parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form. Parallel Comput. 21(8), 1189-1211 (1995)
-
(1995)
Parallel Comput
, vol.21
, Issue.8
, pp. 1189-1211
-
-
Berry, M.W.1
Dongarra, J.J.2
Kim, Y.3
-
22
-
-
84947583789
-
-
Gustavson, F.G.: New generalized data structures for matrices lead to a variety of high performance algorithms. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds.) PPAM 2001. LNCS, vol. 2328, pp. 418-436. Springer, Heidelberg (2002)
-
-
-
-
23
-
-
0001951009
-
The WY representation for products of Householder matrices
-
Bischof, C., van Loan, C.: The WY representation for products of Householder matrices. SIAM J. Sci. Stat. Comput. 8(1), 2-13 (1987)
-
(1987)
SIAM J. Sci. Stat. Comput
, vol.8
, Issue.1
, pp. 2-13
-
-
Bischof, C.1
van Loan, C.2
-
24
-
-
0003078924
-
A storage-efficient WY representation for products of Householder transformations
-
Schreiber, R., van Loan, C.: A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Stat. Comput. 10(1), 53-57 (1989)
-
(1989)
SIAM J. Sci. Stat. Comput
, vol.10
, Issue.1
, pp. 53-57
-
-
Schreiber, R.1
van Loan, C.2
-
-
-
45449098829
-
-
Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: Parallel Tiled QR Factorization for Multicore Architectures. Technical Report UT-CS-07-598, University of Tennessee (2007), LAPACK Working Note 190
-
-
-