-
1
-
-
74049143158
-
Implementing sparse matrix-vector multiplication on throughput-oriented processors
-
N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," in SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, New York, NY, USA, 2009, pp. 1-11.
-
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, New York, NY, USA, 2009
, pp. 1-11
-
-
Bell, N.1
Garland, M.2
-
3
-
-
0242533311
-
Sparse matrix solvers on the GPU: Conjugate gradients and multigrid
-
J. Bolz, I. Farmer, E. Grinspun, and P. Schroder, "Sparse matrix solvers on the GPU: conjugate gradients and multigrid," ACM Trans. Graph., vol. 22, no. 3, pp. 917-924, 2003.
-
(2003)
ACM Trans. Graph.
, vol.22
, Issue.3
, pp. 917-924
-
-
Bolz, J.1
Farmer, I.2
Grinspun, E.3
Schroder, P.4
-
5
-
-
60649099576
-
Optimizing matrix multiplication for a short-vector SIMD architecture-Cell processor
-
J. Kurzak, W. Alvaro, and J. Dongarra, "Optimizing matrix multiplication for a short-vector SIMD architecture-Cell processor," Parallel Comput., vol. 35, no. 3, pp. 138-150, 2009.
-
(2009)
Parallel Comput.
, vol.35
, Issue.3
, pp. 138-150
-
-
Kurzak, J.1
Alvaro, W.2
Dongarra, J.3
-
6
-
-
20744452904
-
Self-adapting linear algebra algorithms and software
-
J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. Yelick, "Self-adapting linear algebra algorithms and software," Proceedings of the IEEE, vol. 93, no. 2, pp. 293-312, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 293-312
-
-
Demmel, J.1
Dongarra, J.2
Eijkhout, V.3
Fuentes, E.4
Petitet, A.5
Vuduc, R.6
Whaley, R.C.7
Yelick, K.8
-
7
-
-
1542501019
-
Sparsity: Optimization framework for sparse matrix kernels
-
E.-J. Im, K. Yelick, and R. Vuduc, "Sparsity: Optimization framework for sparse matrix kernels," Int. J. High Perform. Comput. Appl., vol. 18, no. 1, pp. 135-158, 2004.
-
(2004)
Int. J. High Perform. Comput. Appl.
, vol.18
, Issue.1
, pp. 135-158
-
-
Im, E.-J.1
Yelick, K.2
Vuduc, R.3
-
8
-
-
72849129747
-
-
Research Report RC24704, IBM TJ Watson Research Center, Tech. Rep., Dec. 2008
-
M. M. Baskaran and R. Bordawekar, "Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies," Research Report RC24704, IBM TJ Watson Research Center, Tech. Rep., Dec. 2008.
-
(2008)
Optimizing Sparse Matrix-vector Multiplication on GPUs Using Compile-time and Run-time Strategies
-
-
Baskaran, M.M.1
Bordawekar, R.2
-
9
-
-
79952428965
-
Auto-tuning CUDA parameters for sparse matrix-vector multiplication on GPUs
-
Proceedings of the 2010 International Conference on Computational and Information Sciences, ser. ICCIS '10. Washington, DC, USA: IEEE Computer Society
-
P. Guo and L. Wang, "Auto-tuning CUDA parameters for sparse matrix-vector multiplication on GPUs," in Proceedings of the 2010 International Conference on Computational and Information Sciences, ser. ICCIS '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1154-1157.
-
(2010)
ICCIS '10, Washington, DC, USA
, pp. 1154-1157
-
-
Guo, P.1
Wang, L.2
-
10
-
-
74049114159
-
Auto-tuning 3-D FFT library for CUDA GPUs
-
A. Nukada and S. Matsuoka, "Auto-tuning 3-D FFT library for CUDA GPUs," in SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, New York, NY, USA, 2009, pp. 1-10.
-
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, New York, NY, USA, 2009
, pp. 1-10
-
-
Nukada, A.1
Matsuoka, S.2
-
11
-
-
78249244772
-
Improving the performance of the sparse matrix vector product with GPUs
-
Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology, ser. CIT '10. Washington, DC, USA: IEEE Computer Society
-
F. Vazquez, G. Ortega, J. J. Fernandez, and E. M. Garzon, "Improving the performance of the sparse matrix vector product with GPUs," in Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology, ser. CIT '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1146-1151.
-
(2010)
CIT '10
, pp. 1146-1151
-
-
Vazquez, F.1
Ortega, G.2
Fernandez, J.J.3
Garzon, E.M.4
-
12
-
-
77957679421
-
Model-driven autotuning of sparse matrix-vector multiply on GPUs
-
J. W. Choi, A. Singh, and R. W. Vuduc, "Model-driven autotuning of sparse matrix-vector multiply on GPUs," in PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, 2010, pp. 115-126.
-
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, 2010
, pp. 115-126
-
-
Choi, J.W.1
Singh, A.2
Vuduc, R.W.3
-
13
-
-
84886723259
-
Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform
-
S. Xu, W. Xue, and H. Lin, "Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform," The Journal of Supercomputing, pp. 1-12, 2011.
-
(2011)
The Journal of Supercomputing
, pp. 1-12
-
-
Xu, S.1
Xue, W.2
Lin, H.3
-
14
-
-
79955921273
-
A quantitative performance analysis model for GPU architectures
-
Y. Zhang and J. Owens, "A quantitative performance analysis model for GPU architectures," in High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, Feb. 2011, pp. 382-393.
-
High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, Feb. 2011
, pp. 382-393
-
-
Zhang, Y.1
Owens, J.2
-
15
-
-
77749337497
-
An adaptive performance modeling tool for GPU architectures
-
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '10. New York, NY, USA: ACM
-
S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-m. W. Hwu, "An adaptive performance modeling tool for GPU architectures," in Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '10. New York, NY, USA: ACM, 2010, pp. 105-114.
-
(2010)
PPoPP '10
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.W.5
-
16
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY, USA: ACM
-
S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY, USA: ACM, 2009, pp. 152-163.
-
(2009)
ISCA '09
, pp. 152-163
-
-
Hong, S.1
Kim, H.2
-
17
-
-
77952204218
-
A performance prediction model for the CUDA GPGPU platform
-
K. Kothapalli, R. Mukherjee, M. Rehman, S. Patidar, P. Narayanan, and K. Srinathan, "A performance prediction model for the CUDA GPGPU platform," in High Performance Computing (HiPC), 2009 International Conference on, Dec. 2009, pp. 463-472.
-
High Performance Computing (HiPC), 2009 International Conference on, Dec. 2009
, pp. 463-472
-
-
Kothapalli, K.1
Mukherjee, R.2
Rehman, M.3
Patidar, S.4
Narayanan, P.5
Srinathan, K.6
-
18
-
-
56749158843
-
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
-
S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of sparse matrix-vector multiplication on emerging multicore platforms," in Proc. 2007 ACM/IEEE Conference on Supercomputing, 2007.
-
Proc. 2007 ACM/IEEE Conference on Supercomputing, 2007
-
-
Williams, S.1
Oliker, L.2
Vuduc, R.3
Shalf, J.4
Yelick, K.5
Demmel, J.6