[3] W. Yang, K. Li, Y. Liu, L. Shi, and C. Wang, "Optimization of Quasi Diagonal Matrix-Vector Multiplication on GPU," Int'l J. High Performance Computing Applications, first published on September 2, 2013, doi:10.1177/1094342013501126, http://hpc.sagepub.com/content/early/2013/09/02/1094342013501126.full.pdf
[4] J. Bolz, I. Farmer, E. Grinspun, and P. Schroder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," ACM Trans. Graphics, vol. 22, no. 3, pp. 917-924, July 2003.
[6] F. Vazquez, G. Ortega, J.J. Fernandez, and E.M. Garzon, "Improving the Performance of the Sparse Matrix Vector Product with GPUs," Proc. IEEE 10th Int'l Conf. Computer and Information Technology (CIT '10), pp. 1146-1151, 2010.
[8] J.C. Pichel, F.F. Rivera, M. Fernandez, and A. Rodriguez, "Optimization of Sparse Matrix-Vector Multiplication Using Reordering Techniques on GPUs," Microprocessors and Microsystems, vol. 36, no. 2, pp. 65-77, 2012.
[10] A.-J.N. Yzelman and D. Roose, "High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication," IEEE Trans. Parallel and Distributed Systems, vol. 25, no. 1, pp. 116-125, http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.31, Jan. 2014.
[11] J.W. Choi, A. Singh, and R.W. Vuduc, "Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 115-126, 2010.
[12] V. Karakasis, T. Gkountouvas, K. Kourtis, G. Goumas, and N. Koziris, "An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication," IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 10, pp. 1930-1940, Sept. 2013.
[13] J. Kurzak, W. Alvaro, and J. Dongarra, "Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture-Cell Processor," Parallel Computing, vol. 35, no. 3, pp. 138-150, 2009.
[14] E.-J. Im, K. Yelick, and R. Vuduc, "Sparsity: Optimization Framework for Sparse Matrix Kernels," Int'l J. High Performance Computing Applications, vol. 18, no. 1, pp. 135-158, 2004.
[17] J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R.C. Whaley, and K. Yelick, "Self-Adapting Linear Algebra Algorithms and Software," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.
[18] A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures," Proc. Fifth Int'l Conf. High Performance Embedded Architectures and Compilers, pp. 111-125, 2010.
[19] Z. Wang, X. Xu, W. Zhao, Y. Zhang, and S. He, "Optimizing Sparse Matrix-Vector Multiplication on CUDA," Proc. Second Int'l Conf. Education Technology and Computer (ICETC), vol. 4, pp. 54109-54113, June 2010.
[20] X. Yang, S. Parthasarathy, and P. Sadayappan, "Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining," Proc. VLDB Endowment, vol. 4, no. 4, pp. 231-242, Jan. 2011.
[21] A.H. El Zein and A.P. Rendell, "Generating Optimal CUDA Sparse Matrix-Vector Product Implementations for Evolving GPU Hardware," Concurrency and Computation: Practice and Experience, vol. 24, no. 1, pp. 3-13, 2012.
[23] S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, and W.-M.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," Proc. Sixth Ann. IEEE/ACM Int'l Symp. Code Generation and Optimization (CGO '08), pp. 195-204, 2008.
[25] H. Pabst, B. Bachmayer, and M. Klemm, "Performance of a Structure-Detecting SpMV Using the CSR Matrix Representation," Proc. IEEE 11th Int'l Symp. Parallel and Distributed Computing (ISPDC), pp. 3-10, 2012.
[26] M.M. Dehnavi, D. Fernandez, and J.L. Gaudiot, "Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units," IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 9, pp. 1852-1862, Sept. 2013.
[27] S.S. Baghsorkhi, M. Delahaye, S.J. Patel, W.D. Gropp, and W.-M.W. Hwu, "An Adaptive Performance Modeling Tool for GPU Architectures," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 105-114, 2010.
[28] S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness," Proc. 36th Ann. Int'l Symp. Computer Architecture (ISCA '09), pp. 152-163, 2009.
[29] K. Kothapalli, R. Mukherjee, M. Rehman, S. Patidar, P. Narayanan, and K. Srinathan, "A Performance Prediction Model for the CUDA GPGPU Platform," Proc. Int'l Conf. High Performance Computing (HiPC), pp. 463-472, Dec. 2009.
[30] J. Li, G. Tan, M. Chen, and N. Sun, "SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication," Proc. 34th ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 117-126, 2013.
[32] P. Guo, L. Wang, and P. Chen, "A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs," IEEE Trans. Parallel and Distributed Systems, vol. 25, no. 5, pp. 1112-1123, 2014.
[33] J.C. Pichel and F.F. Rivera, "Sparse Matrix Vector Multiplication on the Single-Chip Cloud Computer Many-Core Processor," J. Parallel and Distributed Computing, vol. 73, no. 12, pp. 1539-1550, 2013.
[35] N.L. Johnson, S. Kotz, and A. Kemp, Univariate Discrete Distributions, second ed., p. 36, John Wiley & Sons, 1993.