SCOPUS Information Search Platform

IEEE Transactions on Parallel and Distributed Systems

Volume 26, Issue 1, 2015, Pages 196-205

Performance analysis and optimization for SpMV on GPU using probabilistic modeling

(3) Li, Kenli a,b; Yang, Wangdong a,b; Li, Keqin a,b,c

a HUNAN UNIVERSITY (China)

b National Supercomputing Center in Changsha (China)

c STATE UNIVERSITY OF NEW YORK (United States)

Author keywords

GPU; performance modeling; probability mass function; sparse matrix vector multiplication

Indexed keywords

COMPUTER HARDWARE; FUNCTIONS; GRAPHICS PROCESSING UNIT;

COMPRESSION EFFICIENCY; DISTRIBUTION PATTERNS; PERFORMANCE ANALYSIS AND OPTIMIZATIONS; PERFORMANCE MODEL; PERFORMANCE MODELING AND ANALYSIS; PROBABILISTIC MODELING; PROBABILITY MASS FUNCTION; SPARSE MATRIX-VECTOR MULTIPLICATION;

MATRIX ALGEBRA;

EID: 84919470072 PISSN: 10459219 EISSN: None Source Type: Journal
DOI: 10.1109/TPDS.2014.2308221 Document Type: Article

Times cited : (214)

References (36)

1
- 74049143158
- Implementing sparse matrix-vector multiplication on throughput-oriented processors
- N. Bell and M. Garland, "Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors," Proc. Conf. High Performance Computing Networking, Storage and Analysis, pp. 1-11, 2009.
- (2009) Proc. Conf. High Performance Computing Networking, Storage and Analysis , pp. 1-11
- Bell, N.¹ Garland, M.²

2
- 84900558239
- NVIDIA
- The NVIDIA CUDA Sparse Matrix Library (cuSPARSE), second ed., NVIDIA, http://docs.nvidia.com/cuda/cusparse/index.html, 2012.
- (2012) The NVIDIA CUDA Sparse Matrix Library (cuSPARSE) second ed

3
- 84900536807
- Optimization of quasi diagonal matrix-vector multiplication on GPU
- first published on September
- W. Yang, K. Li, Y. Liu, L. Shi, and C. Wang, Optimization of Quasi Diagonal Matrix-Vector Multiplication on GPU, Int'l J. High Performance Computing Applications, first published on September 2, 2013, doi:10.1177/1094342013501126, http://hpc.sagepub.com/content/early/2013/09/02/1094342013501126.full.pdf
- (2013) Int'l J. High Performance Computing Applications , vol.2
- Yang, W.¹ Li, K.² Liu, Y.³ Shi, L.⁴ Wang, C.⁵

4
- 0242533311
- Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid
- July
- J. Bolz, I. Farmer, E. Grinspun, and P. Schroder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," ACM Trans. Graphics, vol. 22, no. 3, pp. 917-924, July 2003.
- (2003) ACM Trans. Graphics , vol.22 , Issue.3 , pp. 917-924
- Bolz, J.¹ Farmer, I.² Grinspun, E.³ Schroder, P.⁴

5
- 84877717135
- May
- NVIDIA CUDA C Programming Guide, Version 5.0, May 2012.
- (2012) NVIDIA CUDA C Programming Guide, Version 5.0

6
- 78249244772
- Improving the performance of the sparse matrix vector product with GPUs
- F. Vazquez, G. Ortega, J.J. Fernandez, and E.M. Garzon, "Improving the Performance of the Sparse Matrix Vector Product with GPUs," Proc. IEEE 10th Int'l Conf. Computer and Information Technology (CIT '10), pp. 1146-1151, 2010.
- (2010) Proc. IEEE 10th Int'l Conf. Computer and Information Technology (CIT '10) , pp. 1146-1151
- Vazquez, F.¹ Ortega, G.² Fernandez, J.J.³ Garzon, E.M.⁴

7
- 84864039129
- Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation
- D. Grewe and A. Lokhmotov, "Automatically Generating and Tuning GPU Code for Sparse Matrix-Vector Multiplication from a High-Level Representation," Proc. Fourth Workshop General Purpose Processing on Graphics Processing Units (GPGPU-4), article 12, 2011.
- (2011) Proc. Fourth Workshop General Purpose Processing on Graphics Processing Units (GPGPU-4)
- Grewe, D.¹ Lokhmotov, A.²

8
- 84857332778
- Optimization of Sparse Matrix-Vector Multiplication Using Reordering Techniques on GPUs
- J.C. Pichel, F.F. Rivera, M. Fernandez, and A. Rodriguez, "Optimization of Sparse Matrix-Vector Multiplication Using Reordering Techniques on GPUs," Microprocessors and Microsystems, vol. 36, no. 2, pp. 65-77, 2012.
- (2012) Microprocessors and Microsystems , vol.36 , Issue.2 , pp. 65-77
- Pichel, J.C.¹ Rivera, F.F.² Fernandez, M.³ Rodriguez, A.⁴

9
- 84886723333
- arXiv preprint arXiv:1012.2270
- T. Oberhuber, A. Suzuki, and J. Vacata, "New Row-Grouped CSR Format for Storing the Sparse Matrices on GPU with Implementation in CUDA," arXiv preprint arXiv:1012.2270, 2010.
- (2010) New Row-Grouped CSR Format for Storing the Sparse Matrices on GPU with Implementation in CUDA
- Oberhuber, T.¹ Suzuki, A.² Vacata, J.³

10
- 84919494711
- High-level strategies for parallel shared-memory sparse matrix-vector multiplication
- Jan.
- A.-J.N. Yzelman and D. Roose, "High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication," IEEE Trans. Parallel and Distributed Systems, vol. 25, no. 1, pp. 116-125, http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.31, Jan. 2014.
- (2014) IEEE Trans. Parallel and Distributed Systems , vol.25 , Issue.1 , pp. 116-125
- Yzelman, A.-J.N.¹ Roose, D.²

11
- 77957679421
- Model-driven autotuning of sparse matrix-vector multiply on GPUs
- J.W. Choi, A. Singh, and R.W. Vuduc, "Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 115-126, 2010.
- (2010) Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10) , pp. 115-126
- Choi, J.W.¹ Singh, A.² Vuduc, R.W.³

12
- 84883314318
- An extended compression format for the optimization of sparse matrix-vector multiplication
- Sept.
- V. Karakasis, T. Gkountouvas, K. Kourtis, G. Goumas, and N. Koziris, "An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication," IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 10, pp. 1930-1940, Sept. 2013.
- (2013) IEEE Trans. Parallel and Distributed Systems , vol.24 , Issue.10 , pp. 1930-1940
- Karakasis, V.¹ Gkountouvas, T.² Kourtis, K.³ Goumas, G.⁴ Koziris, N.⁵

13
- 60649099576
- Optimizing matrix multiplication for a short-vector SIMD architecture-Cell processor
- J. Kurzak, W. Alvaro, and J. Dongarra, "Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture-Cell Processor," Parallel Computing, vol. 35, no. 3, pp. 138-150, 2009.
- (2009) Parallel Computing , vol.35 , Issue.3 , pp. 138-150
- Kurzak, J.¹ Alvaro, W.² Dongarra, J.³

14
- 1542501019
- Sparsity: Optimization framework for sparse matrix kernels
- E.-J. Im, K. Yelick, and R. Vuduc, "Sparsity: Optimization Framework for Sparse Matrix Kernels," Int'l J. High Performance Computing Applications, vol. 18, no. 1, pp. 135-158, 2004.
- (2004) Int'l J. High Performance Computing Applications , vol.18 , Issue.1 , pp. 135-158
- Im, E.-J.¹ Yelick, K.² Vuduc, R.³

15
- 74049163483
- Optimizing sparse matrix-vector multiplication on GPUs
- Dec.
- M.M. Baskaran and R. Bordawekar, "Optimizing Sparse Matrix-Vector Multiplication on GPUs," Technical Report RC24704, IBM TJ Watson Research Center, Dec. 2008.
- (2008) Technical Report RC24704 IBM TJ Watson Research Center
- Baskaran, M.M.¹ Bordawekar, R.²

16
- 84919495323
- JANUARY 2015
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 1, JANUARY 2015
- IEEE Transactions on Parallel and Distributed Systems , vol.26 , Issue.1

17
- 20744452904
- Self-adapting linear algebra algorithms and software
- Feb.
- J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R.C. Whaley, and K. Yelick, "Self-Adapting Linear Algebra Algorithms and Software," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.
- (2005) Proc. IEEE , vol.93 , Issue.2 , pp. 293-312
- Demmel, J.¹ Dongarra, J.² Eijkhout, V.³ Fuentes, E.⁴ Petitet, A.⁵ Vuduc, R.⁶ Whaley, R.C.⁷ Yelick, K.⁸

18
- 77949577730
- Automatically tuning sparse matrix-vector multiplication for GPU architectures
- A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures," Proc. Fifth Int'l Conf. High Performance Embedded Architectures and Compilers, pp. 111-125, 2010.
- (2010) Proc. Fifth Int'l Conf. High Performance Embedded Architectures and Compilers , pp. 111-125
- Monakov, A.¹ Lokhmotov, A.² Avetisyan, A.³

19
- 77956072107
- Optimizing sparse matrix-vector multiplication on CUDA
- June
- Z. Wang, X. Xu, W. Zhao, Y. Zhang, and S. He, "Optimizing Sparse Matrix-Vector Multiplication on CUDA," Proc. Second Int'l Conf. Education Technology and Computer (ICETC), vol. 4, pp. 54109-54113, June 2010.
- (2010) Proc. Second Int'l Conf. Education Technology and Computer (ICETC) , vol.4 , pp. 54109-54113
- Wang, Z.¹ Xu, X.² Zhao, W.³ Zhang, Y.⁴ He, S.⁵

20
- 84862123284
- Fast sparse matrix-vector multiplication on GPUs: Implications for graph mining
- Jan.
- X. Yang, S. Parthasarathy, and P. Sadayappan, "Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining," Proc. VLDB Endowment, vol. 4, no. 4, pp. 231-242, Jan. 2011.
- (2011) Proc. VLDB Endowment , vol.4 , Issue.4 , pp. 231-242
- Yang, X.¹ Parthasarathy, S.² Sadayappan, P.³

21
- 84855223315
- Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware
- A.H. El Zein and A.P. Rendell, "Generating Optimal CUDA Sparse Matrix-Vector Product Implementations for Evolving GPU Hardware," Concurrency and Computation: Practice and Experience, vol. 24, no. 1, pp. 3-13, 2012.
- (2012) Concurrency and Computation: Practice and Experience , vol.24 , Issue.1 , pp. 3-13
- El Zein, A.H.¹ Rendell, A.P.²

22
- 79960328560
- Optimization of sparse matrix-vector multiplication by auto selecting storage schemes on GPU
- Y. Kubota and D. Takahashi, "Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU," Proc. Int'l Conf. Computational Science and Its Applications (ICCSA '11), pp. 547-561, 2011.
- (2011) Proc. Int'l Conf. Computational Science and Its Applications (ICCSA '11) , pp. 547-561
- Kubota, Y.¹ Takahashi, D.²

23
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-M.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," Proc. Sixth Ann. IEEE/ACM Int'l Symp. Code Generation and Optimization (CGO '08), pp. 195-204, 2008.
- (2008) Proc. Sixth Ann. IEEE/ACM Int'l Symp. Code Generation and Optimization (CGO '08) , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.W.⁷

24
- 79955921273
- A quantitative performance analysis model for GPU architectures
- Feb.
- Y. Zhang and J. Owens, "A Quantitative Performance Analysis Model for GPU Architectures," Proc. IEEE 17th Int'l Symp. High Performance Computer Architecture (HPCA '11), pp. 382-393, Feb. 2011.
- (2011) Proc. IEEE 17th Int'l Symp. High Performance Computer Architecture (HPCA '11) , pp. 382-393
- Zhang, Y.¹ Owens, J.²

25
- 84870731723
- Performance of a structure-detecting SpMV using the CSR matrix representation
- H. Pabst, B. Bachmayer, and M. Klemm, "Performance of a Structure-Detecting SpMV Using the CSR Matrix Representation," Proc. IEEE 11th Int'l Symp. Parallel and Distributed Computing (ISPDC), pp. 3-10, 2012.
- (2012) Proc. IEEE 11th Int'l Symp. Parallel and Distributed Computing (ISPDC) , pp. 3-10
- Pabst, H.¹ Bachmayer, B.² Klemm, M.³

26
- 84881061313
- Parallel sparse approximate inverse preconditioning on graphic processing units
- Sept.
- M.M. Dehnavi, D. Fernandez, and J.L. Gaudiot, "Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units," IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 9, pp. 1852-1862, Sept. 2013.
- (2013) IEEE Trans. Parallel and Distributed Systems , vol.24 , Issue.9 , pp. 1852-1862
- Dehnavi, M.M.¹ Fernandez, D.² Gaudiot, J.L.³

27
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- S.S. Baghsorkhi, M. Delahaye, S.J. Patel, W.D. Gropp, and W.-m. W. Hwu, "An Adaptive Performance Modeling Tool for GPU Architectures," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 105-114, 2010.
- (2010) Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10) , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

28
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness," Proc. 36th Ann. Int'l Symp. Computer Architecture (ISCA '09), pp. 152-163, 2009.
- (2009) Proc. 36th Ann. Int'l Symp. Computer Architecture (ISCA '09) , pp. 152-163
- Hong, S.¹ Kim, H.²

29
- 77952204218
- A performance prediction model for the CUDA GPGPU platform
- Dec.
- K. Kothapalli, R. Mukherjee, M. Rehman, S. Patidar, P. Narayanan, and K. Srinathan, "A Performance Prediction Model for the CUDA GPGPU Platform," Proc. Int'l Conf. High Performance Computing (HiPC), pp. 463-472, Dec. 2009.
- (2009) Proc. Int'l Conf. High Performance Computing (HiPC) , pp. 463-472
- Kothapalli, K.¹ Mukherjee, R.² Rehman, M.³ Patidar, S.⁴ Narayanan, P.⁵ Srinathan, K.⁶

30
- 84883120855
- SMAT: An input adaptive auto-tuner for sparse matrix-vector multiplication
- J. Li, G. Tan, M. Chen, and N. Sun, "SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication," Proc. 34th ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 117-126, 2013.
- (2013) Proc. 34th ACM SIGPLAN Conf. Programming Language Design and Implementation , pp. 117-126
- Li, J.¹ Tan, G.² Chen, M.³ Sun, N.⁴

31
- 70449793037
- Exploring the multiple-GPU design space
- May
- D. Schaa and D. Kaeli, "Exploring the Multiple-GPU Design Space," Proc. IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS '09), pp. 1-12, May 2009.
- (2009) Proc. IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS '09) , pp. 1-12
- Schaa, D.¹ Kaeli, D.²

32
- 84898682038
- A performance modeling and optimization analysis tool for sparse matrix-vector multiplication on GPUs
- P. Guo, L. Wang, and P. Chen, "A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs," IEEE Trans. Parallel and Distributed Systems, vol. 25, no. 5, pp. 1112-1123, 2014.
- (2014) IEEE Trans. Parallel and Distributed Systems , vol.25 , Issue.5 , pp. 1112-1123
- Guo, P.¹ Wang, L.² Chen, P.³

33
- 84885948161
- Sparse matrix vector multiplication on the single-chip cloud computer many-core processor
- J.C. Pichel and F.F. Rivera, "Sparse Matrix Vector Multiplication on the Single-Chip Cloud Computer Many-Core Processor," J. Parallel and Distributed Computing, vol. 73, no. 12, pp. 1539-1550, 2013.
- (2013) J. Parallel and Distributed Computing , vol.73 , Issue.12 , pp. 1539-1550
- Pichel, J.C.¹ Rivera, F.F.²

34
- 85011418031
- Architecture- and workload-aware heterogeneous algorithms for sparse matrix vector multiplication
- Dec.
- S.B. Indarapu, M. Maramreddy, and K. Kothapalli, "Architecture- and Workload-Aware Heterogeneous Algorithms for Sparse Matrix Vector Multiplication," Proc. Int'l Conf. Parallel and Distributed Systems (ICPADS), Dec. 2013, http://cstar.iiit.ac.in/kkishore/spmv2.pdf.
- (2013) Proc. Int'l Conf. Parallel and Distributed Systems (ICPADS)
- Indarapu, S.B.¹ Maramreddy, M.² Kothapalli, K.³

35
- 0003763748
- John Wiley & Sons
- N.L. Johnson, S. Kotz, and A. Kemp, Univariate Discrete Distributions, second ed., p. 36, John Wiley & Sons, 1993.
- (1993) Univariate Discrete Distributions Second Ed. , pp. 36
- Johnson, N.L.¹ Kotz, S.² Kemp, A.³

36
- 0012453312
- T.A. Davis and Y. Hu, University of Florida Sparse Matrix Collection, 2009.
- (2009) University of Florida Sparse Matrix Collection
- Davis, T.A.¹ Hu, Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.