-
1
-
-
84864149419
-
LU factorization for accelerator-based systems
-
University of Tennessee
-
E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, and S. Tomov. LU factorization for accelerator-based systems. ICL Technical Report ICL-UT-10-05, Innovative Computing Laboratory, University of Tennessee, 2010.
-
(2010)
ICL Technical Report ICL-UT-10-05, Innovative Computing Laboratory
-
-
Agullo, E.1
Augonnet, C.2
Dongarra, J.3
Faverge, M.4
Langou, J.5
Ltaief, H.6
Tomov, S.7
-
2
-
-
80053251324
-
QR factorization on a multicore node enhanced with multiple GPU accelerators
-
Alaska, USA
-
E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, and S. Tomov. QR factorization on a multicore node enhanced with multiple GPU accelerators. In IPDPS 2011, Alaska, USA, 2011.
-
(2011)
IPDPS 2011
-
-
Agullo, E.1
Augonnet, C.2
Dongarra, J.3
Faverge, M.4
Ltaief, H.5
Thibault, S.6
Tomov, S.7
-
3
-
-
84655172176
-
Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators
-
Knoxville, USA
-
E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, J. Roman, S. Thibault, and S. Tomov. Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators. In Symposium on Application Accelerators in High Performance Computing (SAAHPC), Knoxville, USA, 2010.
-
(2010)
Symposium on Application Accelerators in High Performance Computing (SAAHPC)
-
-
Agullo, E.1
Augonnet, C.2
Dongarra, J.3
Ltaief, H.4
Namyst, R.5
Roman, J.6
Thibault, S.7
Tomov, S.8
-
4
-
-
77953999902
-
PLASMA users' guide
-
E. Agullo, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, J. Langou, H. Ltaief, P. Luszczek, and A. YarKhan. PLASMA Users' Guide. Technical report, ICL, UTK, 2011.
-
(2011)
Technical Report, ICL, UTK
-
-
Agullo, E.1
Dongarra, J.2
Hadri, B.3
Kurzak, J.4
Langou, J.5
Langou, J.6
Ltaief, H.7
Luszczek, P.8
YarKhan, A.9
-
5
-
-
0003706460
-
-
SIAM
-
E. Anderson, Z. Bai, C. Bischof, L. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, 1992.
-
(1992)
LAPACK Users' Guide
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Blackford, L.4
Demmel, J.5
Dongarra, J.6
Du Croz, J.7
Greenbaum, A.8
Hammarling, S.9
McKenney, A.10
Sorensen, D.11
-
6
-
-
78651103346
-
StarPU: A unified platform for task scheduling on heterogeneous multicore architectures
-
Feb.
-
C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier. StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput.: Pract. Exper., Special Issue: Euro-Par 2009, 23:187-198, Feb. 2011.
-
(2011)
Concurr. Comput.: Pract. Exper., Special Issue: Euro-Par 2009
, vol.23
, pp. 187-198
-
-
Augonnet, C.1
Thibault, S.2
Namyst, R.3
Wacrenier, P.-A.4
-
7
-
-
70350635626
-
An extension of the StarSs programming model for platforms with multiple GPUs
-
Springer-Verlag
-
E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo, and E. S. Quintana-Ortí. An extension of the StarSs programming model for platforms with multiple GPUs. In Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pages 851-862. Springer-Verlag, 2009.
-
(2009)
Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09
, pp. 851-862
-
-
Ayguadé, E.1
Badia, R.M.2
Igual, F.D.3
Labarta, J.4
Mayo, R.5
Quintana-Ortí, E.S.6
-
8
-
-
70449623419
-
Communication-optimal parallel and sequential Cholesky decomposition
-
ACM
-
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Communication-optimal parallel and sequential Cholesky decomposition. In Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '09, pages 245-252. ACM, 2009.
-
(2009)
Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '09
, pp. 245-252
-
-
Ballard, G.1
Demmel, J.2
Holtz, O.3
Schwartz, O.4
-
9
-
-
0035481895
-
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
-
DOI 10.1109/12.956091
-
O. Beaumont, V. Boudet, A. Petitet, F. Rastello, and Y. Robert. A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers). IEEE Transactions on Computers, 50:1052-1070, 2001. (Pubitemid 33048369)
-
(2001)
IEEE Transactions on Computers
, vol.50
, Issue.10
, pp. 1052-1070
-
-
Beaumont, O.1
Boudet, V.2
Petitet, A.3
Rastello, F.4
Robert, Y.5
-
10
-
-
0003615167
-
-
SIAM
-
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. Whaley. ScaLAPACK Users' Guide. SIAM, 1997.
-
(1997)
ScaLAPACK Users' Guide
-
-
Blackford, L.S.1
Choi, J.2
Cleary, A.3
D'Azevedo, E.4
Demmel, J.5
Dhillon, I.6
Dongarra, J.7
Hammarling, S.8
Henry, G.9
Petitet, A.10
Stanley, K.11
Walker, D.12
Whaley, R.13
-
11
-
-
0032648736
-
Static tiling for heterogeneous computing platforms
-
P. Boulet, J. Dongarra, Y. Robert, and F. Vivien. Static tiling for heterogeneous computing platforms. Parallel Computing, 25(5):547-568, 1999.
-
(1999)
Parallel Computing
, vol.25
, Issue.5
, pp. 547-568
-
-
Boulet, P.1
Dongarra, J.2
Robert, Y.3
Vivien, F.4
-
13
-
-
0042674307
-
The LINPACK Benchmark: Past, present, and future
-
J. J. Dongarra, P. Luszczek, and A. Petitet. The LINPACK Benchmark: past, present, and future. Concurrency and Computation: Practice and Experience, 15:803-820, 2003.
-
(2003)
Concurrency and Computation: Practice and Experience
, vol.15
, pp. 803-820
-
-
Dongarra, J.J.1
Luszczek, P.2
Petitet, A.3
-
16
-
-
77953483096
-
CULA: Hybrid GPU accelerated linear algebra routines
-
April
-
J. R. Humphrey, D. K. Price, K. E. Spagnoli, A. L. Paolini, and E. J. Kelmelis. CULA: Hybrid GPU accelerated linear algebra routines. In SPIE Defense and Security Symposium (DSS), April 2010.
-
(2010)
SPIE Defense and Security Symposium (DSS)
-
-
Humphrey, J.R.1
Price, D.K.2
Spagnoli, K.E.3
Paolini, A.L.4
Kelmelis, E.J.5
-
17
-
-
36248980362
-
Data distribution for dense factorization on computers with memory heterogeneity
-
DOI 10.1016/j.parco.2007.06.001, PII S0167819107000762
-
A. Lastovetsky and R. Reddy. Data distribution for dense factorization on computers with memory heterogeneity. Parallel Comput., 33:757-779, December 2007. (Pubitemid 350122765)
-
(2007)
Parallel Computing
, vol.33
, Issue.12
, pp. 757-779
-
-
Lastovetsky, A.1
Reddy, R.2
-
18
-
-
77954725202
-
Overlapping communication and computation by using a hybrid MPI/SMPSs approach
-
ACM
-
V. Marjanović, J. Labarta, E. Ayguadé, and M. Valero. Overlapping communication and computation by using a hybrid MPI/SMPSs approach. In Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, pages 5-16. ACM, 2010.
-
(2010)
Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10
, pp. 5-16
-
-
Marjanović, V.1
Labarta, J.2
Ayguadé, E.3
Valero, M.4
-
19
-
-
84864153899
-
-
CUDA Toolkit 4.0 CUBLAS Library
-
NVIDIA. CUDA Toolkit 4.0 CUBLAS Library, 2011.
-
(2011)
NVIDIA
-
-
-
20
-
-
67650021816
-
Solving dense linear systems on platforms with multiple hardware accelerators
-
ACM
-
G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn. Solving dense linear systems on platforms with multiple hardware accelerators. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09, pages 121-130. ACM, 2009.
-
(2009)
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09
, pp. 121-130
-
-
Quintana-Ortí, G.1
Igual, F.D.2
Quintana-Ortí, E.S.3
van de Geijn, R.A.4
-
21
-
-
82655162782
-
PTask: Operating system abstractions to manage GPUs as compute devices
-
ACM
-
C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 233-248. ACM, 2011.
-
(2011)
Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11
, pp. 233-248
-
-
Rossbach, C.J.1
Currey, J.2
Silberstein, M.3
Ray, B.4
Witchel, E.5
-
23
-
-
84863925917
-
Efficient support for matrix computations on heterogeneous multi-core and multi-GPU architectures
-
June
-
F. Song, S. Tomov, and J. Dongarra. Efficient support for matrix computations on heterogeneous multi-core and multi-GPU architectures. LAPACK Working Note 250, UTK, June 2011.
-
(2011)
LAPACK Working Note 250, UTK
-
-
Song, F.1
Tomov, S.2
Dongarra, J.3
-
24
-
-
84863667764
-
MAGMA users' guide
-
S. Tomov, R. Nath, P. Du, and J. Dongarra. MAGMA Users' Guide. Technical report, ICL, UTK, 2011.
-
(2011)
Technical Report, ICL, UTK
-
-
Tomov, S.1
Nath, R.2
Du, P.3
Dongarra, J.4
-
25
-
-
80052312080
-
Keeneland: Bringing heterogeneous GPU computing to the computational science community
-
Sept.-Oct.
-
J. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili. Keeneland: Bringing heterogeneous GPU computing to the computational science community. Computing in Science & Engineering, 13(5):90-95, Sept.-Oct. 2011.
-
(2011)
Computing in Science & Engineering
, vol.13
, Issue.5
, pp. 90-95
-
-
Vetter, J.1
Glassbrook, R.2
Dongarra, J.3
Schwan, K.4
Loftis, B.5
McNally, S.6
Meredith, J.7
Rogers, J.8
Roth, P.9
Spafford, K.10
Yalamanchili, S.11