-
1
-
-
77954995885
-
Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
-
June 19-23, ACM, New York 2010
-
V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, P. Dubey: Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. Proc. 37th Ann. Int. Symposium on Computer Architecture (ISCA'10) Saint-Malo (France), June 19-23, 2010, ACM, New York 2010, 451-460.
-
(2010)
Proc. 37th Ann. Int. Symposium on Computer Architecture (ISCA'10) Saint-Malo (France)
, pp. 451-460
-
-
Lee, V.W.1
Kim, C.2
Chhugani, J.3
Deisher, M.4
Kim, D.5
Nguyen, A.D.6
Satish, N.7
Smelyanskiy, M.8
Chennupaty, S.9
Hammarlund, P.10
Singhal, R.11
Dubey, P.12
-
2
-
-
84857837437
-
-
NVIDIA Corporation, May
-
NVIDIA Corporation, CUDA CUBLAS library, PG-00000-002 V3.1, May 2010, http://developer.download.nvidia.com/compute/cuda/3-1/toolkit/docs/CUBLAS-Library-3.1.pdf.
-
(2010)
CUDA CUBLAS Library, PG-00000-002 V3.1
-
-
-
3
-
-
33845468997
-
LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware
-
DOI 10.1109/SC.2005.42, Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
-
N. Galoppo, N. K. Govindaraju, M. Henson, D. Manocha: LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware. Proceedings ACM/IEEE SC'05, Conference of Supercomputing, Nov. 12-18, 2005, Seattle (USA), doi: 10.1109/SC.2005.42. (Pubitemid 44902346)
-
(2005)
Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
, vol.2005
, pp. 1559955
-
-
Galoppo, N.1
Govindaraju, N.K.2
Henson, M.3
Manocha, D.4
-
4
-
-
67650056991
-
LU, QR and Cholesky factorizations using vector capabilities of GPUs
-
Electrical Engineering and Computer Sciences, University of California, Berkeley
-
V. Volkov, J. Demmel: LU, QR and Cholesky factorizations using vector capabilities of GPUs. Techn. Rep. UCB/EECS-2008-49, Electrical Engineering and Computer Sciences, University of California, Berkeley, 2008.
-
(2008)
Techn. Rep. UCB/EECS-2008-49
-
-
Volkov, V.1
Demmel, J.2
-
5
-
-
70350368872
-
Efficient sparse matrix-vector multiplication on CUDA
-
NVIDIA Corporation
-
N. Bell, M. Garland: Efficient sparse matrix-vector multiplication on CUDA. Techn. Rep. NVR-2008-004, NVIDIA Corporation 2008.
-
(2008)
Techn. Rep. NVR-2008-004
-
-
Bell, N.1
Garland, M.2
-
7
-
-
55849145179
-
Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression
-
Sept. 8-12, Portland (USA)
-
K. Kourtis, G. Goumas, N. Koziris: Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression. Proc. 37th International Conference on Parallel Processing, Sept. 8-12, 2008, Portland (USA), 511-519.
-
(2008)
Proc. 37th International Conference on Parallel Processing
, pp. 511-519
-
-
Kourtis, K.1
Goumas, G.2
Koziris, N.3
-
9
-
-
77957679421
-
Model-driven autotuning of sparse matrix-vector multiply on GPUs
-
Bangalore (India), Jan. 9-14, (R. Govindarajan, D. A. Padua, M. W. Hall, eds.), ACM 2010
-
J. W. Choi, A. Singh, R. Vuduc: Model-driven autotuning of sparse matrix-vector multiply on GPUs. Proc. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010), Bangalore (India), Jan. 9-14, 2010 (R. Govindarajan, D. A. Padua, M. W. Hall, eds.), ACM 2010, 37-48.
-
(2010)
Proc. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010)
, pp. 37-48
-
-
Choi, J.W.1
Singh, A.2
Vuduc, R.3
-
10
-
-
70350356359
-
Implementing blocked sparse matrix-vector multiplication on NVIDIA GPUs
-
July 20-23, (K. Bertels, N. J. Dimopoulos, C. Silvano, S. Wong, eds.), Springer, Berlin 2009
-
A. Monakov, A. Avetisyan: Implementing blocked sparse matrix-vector multiplication on NVIDIA GPUs. Proc. 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos (Greece), July 20-23, 2009 (K. Bertels, N. J. Dimopoulos, C. Silvano, S. Wong, eds.), Springer, Berlin 2009, 289-297.
-
(2009)
Proc. 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos (Greece)
, pp. 289-297
-
-
Monakov, A.1
Avetisyan, A.2
-
11
-
-
0013269731
-
The University of Florida sparse matrix collection
-
T. A. Davis, Y. Hu: The University of Florida sparse matrix collection. NA Digest 92 (42), http://www.cise.ufl.edu/research/sparse/matrices/.
-
NA Digest
, vol.92
, Issue.42
-
-
Davis, T.A.1
Hu, Y.2
-
12
-
-
0004071611
-
-
release 1, Techn. Rep., University of Kentucky
-
Z. Bai, D. Day, J. Demmel, J. Dongarra: Test matrix collection (non-Hermitian eigenvalue problems). release 1, Techn. Rep., University of Kentucky, 1996, http://math.nist.gov/MatrixMarket/
-
(1996)
Test Matrix Collection (Non-Hermitian Eigenvalue Problems)
-
-
Bai, Z.1
Day, D.2
Demmel, J.3
Dongarra, J.4
-
13
-
-
79953817719
-
-
NVIDIA Corporation
-
NVIDIA CUDA Programming Guide 3.0. NVIDIA Corporation, 2010, http://developer.download.nvidia.com/compute/cuda/3-0/toolkit/docs/NVIDIA-CUDA-ProgrammingGuide.pdf.
-
(2010)
NVIDIA CUDA Programming Guide 3.0
-
-
-
14
-
-
77952611196
-
Concurrent number cruncher: A GPU implementation of a general sparse linear solver
-
L. Buatois, G. Caumon, B. Levy: Concurrent number cruncher: a GPU implementation of a general sparse linear solver. Int. J. Parallel Emerg. Distrib. Syst. 24 (2009), 205-223.
-
(2009)
Int. J. Parallel Emerg. Distrib. Syst.
, vol.24
, pp. 205-223
-
-
Buatois, L.1
Caumon, G.2
Levy, B.3
-
16
-
-
84857874326
-
-
Master's thesis, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
-
J. Vacata: GPGPU: General purpose computation on GPUs. Master's thesis, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, 2008.
-
(2008)
GPGPU: General Purpose Computation on GPUs
-
-
Vacata, J.1
-
17
-
-
77949577730
-
Automatically tuning sparse matrix-vector multiplication for GPU architectures
-
Pisa (Italy), Jan. 25-27, (Y. N. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, X. Martorell, eds.), Springer, Berlin 2010
-
A. Monakov, A. Lokhmotov, A. Avetisyan: Automatically tuning sparse matrix-vector multiplication for GPU architectures. Proc. 5th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2010), Pisa (Italy), Jan. 25-27, 2010 (Y. N. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, X. Martorell, eds.), Springer, Berlin 2010, 111-125.
-
(2010)
Proc. 5th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2010)
, pp. 111-125
-
-
Monakov, A.1
Lokhmotov, A.2
Avetisyan, A.3
-
18
-
-
84857885283
-
-
Nvidia, Cusp 0.1.1. http://code.google.com/p/cusp-library/, 2010.
-
(2010)
Nvidia, Cusp 0.1.1
-
-
-
19
-
-
34547744862
-
When cache blocking of sparse matrix vector multiply works and why
-
DOI 10.1007/s00200-007-0038-9
-
R. Nishtala, R. W. Vuduc, J. W. Demmel, K. A. Yelick: When cache blocking of sparse matrix vector multiply works and why. Appl. Algebra Eng. Commun. Comput. 18 (2007), 297-311. (Pubitemid 47224626)
-
(2007)
Applicable Algebra in Engineering, Communications and Computing
, vol.18
, Issue.3
, pp. 297-311
-
-
Nishtala, R.1
Vuduc, R.W.2
Demmel, J.W.3
Yelick, K.A.4