SCOPUS Information Retrieval Platform

IEEE Transactions on Parallel and Distributed Systems

Volume 25, Issue 5, 2014, Pages 1112-1123

A performance modeling and optimization analysis tool for sparse matrix-vector multiplication on GPUs

Authors (3): Guo, Ping (a); Wang, Liqiang (a); Chen, Po (a)

a UNIVERSITY OF WYOMING (United States)

Author keywords

CUDA; GPU; Performance modeling; sparse matrix vector multiplication

Indexed keywords

EXPERIMENTS; OPTIMIZATION; PROGRAM PROCESSORS; TOOLS;

AVERAGE DIFFERENCE; CUDA; GPU; OPTIMAL SOLUTIONS; PERFORMANCE MODEL; SPARSE MATRICES; SPARSE MATRIX-VECTOR MULTIPLICATION; STORAGE FORMATS;

OPTIMAL SYSTEMS;

EID: 84898682038 PISSN: 10459219 EISSN: None Source Type: Journal
DOI: 10.1109/TPDS.2013.123 Document Type: Article

Times cited: 71

References (27)

1
- 74049143158
- N. Bell and M. Garland, "Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors," Proc. Conf. High Performance Computing Networking, Storage and Analysis (SC '09), pp. 1-11, 2009.

2
- 84867015272
- A. Resios and V. Holdermans, "GPU Performance Prediction Using Parametrized Models," Master's thesis, Utrecht Univ., 2011.

3
- 84866980089
- P. Guo and L. Wang, "Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication," Proc. IEEE Int'l Conf. High Performance Computing and Simulation (HPCS '12), pp. 496-502, July 2012.

4
- 80052311496
- P. Guo, H. Huang, Q. Chen, L. Wang, E.-J. Lee, and P. Chen, "A Model-Driven Partitioning and Auto-Tuning Integrated Framework for Sparse Matrix-Vector Multiplication on GPUs," Proc. TeraGrid Conf. Extreme Digital Discovery (TG '11), pp. 2:1-2:8, 2011.

5
- 0242533311
- J. Bolz, I. Farmer, E. Grinspun, and P. Schroder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," ACM Trans. Graphics, vol. 22, no. 3, pp. 917-924, 2003.

6
- 84898683364
- NVIDIA CUDA C Programming Guide, Version 4.0, May 2011.

7
- 60649099576
- J. Kurzak, W. Alvaro, and J. Dongarra, "Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture-Cell Processor," J. Parallel Computing, vol. 35, no. 3, pp. 138-150, 2009.

8
- 1542501019
- E.-J. Im, K. Yelick, and R. Vuduc, "Sparsity: Optimization Framework for Sparse Matrix Kernels," Int'l J. High Performance Computing Applications, vol. 18, no. 1, pp. 135-158, 2004.

9
- 74049163483
- M.M. Baskaran and R. Bordawekar, "Optimizing Sparse Matrix-Vector Multiplication on GPUs," Research Report RC24704, IBM T.J. Watson Research Center, Dec. 2008.

10
- 20744452904
- J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R.C.W.R. Vuduc, and K. Yelick, "Self-Adapting Linear Algebra Algorithms and Software," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.

11
- 79952428965
- P. Guo and L. Wang, "Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs," Proc. Int'l Conf. Computational and Information Sciences (ICCIS '10), pp. 1154-1157, 2010.

12
- 78249244772
- F. Vazquez, G. Ortega, J.J. Fernandez, and E.M. Garzon, "Improving the Performance of the Sparse Matrix Vector Product with GPUs," Proc. 10th IEEE Int'l Conf. Computer and Information Technology (CIT '10), pp. 1146-1151, 2010.

13
- 77949577730
- A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures," Proc. Fifth Int'l Conf. High Performance Embedded Architectures and Compilers (HiPEAC '10), pp. 111-125, 2010.

14
- 79955053359
- D. Grewe and A. Lokhmotov, "Automatically Generating and Tuning GPU Code for Sparse Matrix-Vector Multiplication from a High-Level Representation," Proc. ACM Fourth Workshop General Purpose Processing on Graphics Processing Units (GPGPU-4), pp. 12:1-12:8, 2011.

15
- 77956072107
- Z. Wang, X. Xu, W. Zhao, Y. Zhang, and S. He, "Optimizing Sparse Matrix-Vector Multiplication on CUDA," Proc. Second Int'l Conf. Education Technology and Computer (ICETC '10), vol. 4, pp. V4-109-V4-113, June 2010.

16
- 84857332778
- J.C. Pichel, F.F. Rivera, M. Fernandez, and A. Rodriguez, "Optimization of Sparse Matrix-Vector Multiplication Using Reordering Techniques on GPUs," Microprocessors and Microsystems, vol. 36, no. 2, pp. 65-77, 2012.

17
- 84862123284
- X. Yang, S. Parthasarathy, and P. Sadayappan, "Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining," Proc. VLDB Endowment, vol. 4, no. 4, pp. 231-242, Jan. 2011.

18
- 43449094719
- S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, and W.-M.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," Proc. ACM Sixth Ann. IEEE/ACM Int'l Symp. Code Generation and Optimization (CGO '08), pp. 195-204, 2008. DOI: 10.1145/1356058.1356084.

19
- 77957679421
- J.W. Choi, A. Singh, and R.W. Vuduc, "Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 115-126, 2010.

20
- 70449793037
- D. Schaa and D. Kaeli, "Exploring the Multiple-GPU Design Space," Proc. IEEE Int'l Parallel & Distributed Processing Symp. (IPDPS '09), pp. 1-12, May 2009.

21
- 84886727304
- S. Xu, W. Xue, and H. Lin, "Performance Modeling and Optimization of Sparse Matrix-Vector Multiplication on NVIDIA CUDA Platform," J. Supercomputing, vol. 63, pp. 710-721, 2013.

22
- 79955921273
- Y. Zhang and J. Owens, "A Quantitative Performance Analysis Model for GPU Architectures," Proc. IEEE 17th Int'l Symp. High Performance Computer Architecture (HPCA '11), pp. 382-393, Feb. 2011.

23
- 77957561221
- S.S. Baghsorkhi, M. Delahaye, S.J. Patel, W.D. Gropp, and W.-M.W. Hwu, "An Adaptive Performance Modeling Tool for GPU Architectures," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 105-114, 2010.

24
- 70450231944
- S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness," Proc. 36th ACM Ann. Int'l Symp. Computer Architecture (ISCA '09), pp. 152-163, 2009.

25
- 77952204218
- K. Kothapalli, R. Mukherjee, M. Rehman, S. Patidar, P. Narayanan, and K. Srinathan, "A Performance Prediction Model for the CUDA GPGPU Platform," Proc. Int'l Conf. High Performance Computing (HiPC '09), pp. 463-472, Dec. 2009.

26
- 81355161778
- T.A. Davis and Y. Hu, "The University of Florida Sparse Matrix Collection," ACM Trans. Math. Software, vol. 38, no. 1, pp. 1:1-1:25, 2011.

27
- 56749158843
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms," Proc. ACM/IEEE Conf. Supercomputing, 2007.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.