SCOPUS Information Search Platform

IEEE Transactions on Computers

Volume 64, Issue 9, 2015, Pages 2623-2636

Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs

(4) Yang, Wangdong a,b; Li, Kenli a,b; Mo, Zeyao c; Li, Keqin a,b,d

a HUNAN UNIVERSITY (China)

b National Supercomputing Center in Changsha (China)

c INSTITUTE OF APPLIED PHYSICS AND COMPUTATIONAL MATHEMATICS (China)

d STATE UNIVERSITY OF NEW YORK (United States)

Author keywords

GPU; matrix partition; multicore CPU; probability distribution; sparse matrix vector multiplication

Indexed keywords

PROBABILITY; PROBABILITY DISTRIBUTIONS; PROGRAM PROCESSORS;

GPU; HETEROGENEOUS PROCESSORS; MATRIX PARTITIONS; MULTI-CORE CPUS; PARALLEL COMPUTING MODELS; PERFORMANCE OPTIMIZATIONS; PROBABILITY MASS FUNCTION; SPARSE MATRIX-VECTOR MULTIPLICATION;

MATRIX ALGEBRA;

EID: 84939230567 PISSN: 00189340 EISSN: None Source Type: Journal
DOI: 10.1109/TC.2014.2366731 Document Type: Article

Times cited : (122)

References (36)

1
- 51649124194
- Efficient breadth-first search on the cell/BE processor
- D. P. Scarpazza, O. Villa, and F. Petrini, "Efficient breadth-first search on the cell/BE processor," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 10, pp. 1381-1395, Oct. 2008.
- (2008) IEEE Trans. Parallel Distrib. Syst. , vol.19 , Issue.10 , pp. 1381-1395
- Scarpazza, D.P.¹ Villa, O.² Petrini, F.³

2
- 77955747336
- FPGA and GPU implementation of large scale SpMV
- Y. Shan, W. Tianji, Y. Wang, B. Wang, Z. Wang, N. Xu, and H. Yang, "FPGA and GPU implementation of large scale SpMV," in Proc. 8th Symp. Appl. Sp., 2010, pp. 64-70.
- Proc. 8th Symp. Appl. Sp. 2010 , pp. 64-70
- Shan, Y.¹ Tianji, W.² Wang, Y.³ Wang, B.⁴ Wang, Z.⁵ Xu, N.⁶ Yang, H.⁷

3
- 77954995885
- Debunking the 100X GPU versus CPU myth: An evaluation of throughput computing on CPU and GPU
- V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, "Debunking the 100X GPU versus CPU myth: An evaluation of throughput computing on CPU and GPU," in Proc. Int. Symp. Comput. Arch., vol. 38, no. 3, 2010, pp. 451-460.
- (2010) Proc. Int. Symp. Comput. Arch. , vol.38 , Issue.3 , pp. 451-460
- Lee, V.W.¹ Kim, C.² Chhugani, J.³ Deisher, M.⁴ Kim, D.⁵ Nguyen, A.D.⁶ Satish, N.⁷ Smelyanskiy, M.⁸ Chennupaty, S.⁹ Hammarlund, P.¹⁰ Singhal, R.¹¹ Dubey, P.¹²

4
- 84869388261
- Codesign tradeoffs for high-performance, low-power linear algebra architectures
- A. Pedram, R. A. van de Geijn, and A. Gerstlauer, "Codesign tradeoffs for high-performance, low-power linear algebra architectures," IEEE Trans. Comput., vol. 61, no. 12, pp. 1724-1736, Oct. 2012.
- (2012) IEEE Trans. Comput. , vol.61 , Issue.12 , pp. 1724-1736
- Pedram, A.¹ Van De Geijn, R.A.² Gerstlauer, A.³

5
- 84939240853
- NVIDIA. (2012). The NVIDIA CUDA Sparse Matrix library (cuSPARSE), 2nd ed. [Online]. Available: http://docs.nvidia.com/cuda/cusparse/index.html
- (2012) The NVIDIA CUDA Sparse Matrix Library (cuSPARSE), 2nd Ed. [Online]

6
- 77954073240
- J. Gustafson and B. Greer, "Clearspeed whitepaper: Accelerating the intel math kernel library," Tech. Rep., 2007. [Online]. Available: http://www.clearspeed.com/docs/resources/ClearSpeedIntelWhitepaperFeb07.pdf
- (2007) Clearspeed Whitepaper: Accelerating the Intel Math Kernel Library
- Gustafson, J.¹ Greer, B.²

7
- 84900536807
- Optimization of Quasi diagonal matrix vector multiplication on GPU
- W. Yang, K. Li, Y. Liu, L. Shi, and L. Wan, "Optimization of Quasi diagonal matrix vector multiplication on GPU," Int. J. High Performance Comput. Appl., vol. 28, no. 2, pp. 183-195, 2014.
- (2014) Int. J. High Performance Comput. Appl. , vol.28 , Issue.2 , pp. 183-195
- Yang, W.¹ Li, K.² Liu, Y.³ Shi, L.⁴ Wan, L.⁵

8
- 0242533311
- Sparse matrix solvers on the GPU: Conjugate gradients and multigrid
- J. Bolz, I. Farmer, E. Grinspun, and P. Schroder, "Sparse matrix solvers on the GPU: Conjugate gradients and multigrid," ACM Trans. Graph., vol. 22, no. 3, pp. 917-924, 2003.
- (2003) ACM Trans. Graph. , vol.22 , Issue.3 , pp. 917-924
- Bolz, J.¹ Farmer, I.² Grinspun, E.³ Schroder, P.⁴

9
- 84877717135
- NVIDIA CUDA C Programming Guide, Version 5.0, May 2012.
- (2012) NVIDIA CUDA C Programming Guide, Version 5.0

10
- 84884657209
- Architecting the finite element method pipeline for the GPU
- Z. Fu, T. J. Lewis, R. M. Kirby, and R. T. Whitaker, "Architecting the finite element method pipeline for the GPU," J. Comput. Appl. Math., vol. 257, pp. 195-211, Feb. 2014.
- (2014) J. Comput. Appl. Math. , vol.257 , pp. 195-211
- Fu, Z.¹ Lewis, T.J.² Kirby, R.M.³ Whitaker, R.T.⁴

11
- 74049143158
- Implementing sparse matrix-vector multiplication on throughput-oriented processors
- N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," in Proc. Conf. High Performance Comput. Netw., Storage Anal., 2009, p. 18.
- Proc. Conf. High Performance Comput. Netw., Storage Anal., 2009 , pp. 18
- Bell, N.¹ Garland, M.²

12
- 84899694907
- Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes
- W. T. Tang, W. J. Tan, R. Ray, Y. W. Wong, W. Chan, S. H. Kuo, R. S. M. Goh, S. J. Turner, and W. F. Wong, "Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes," in Proc. Int. Conf. High Performance Comput., Netw., Storage Anal., 2013.
- Proc. Int. Conf. High Performance Comput., Netw., Storage Anal., 2013
- Tang, W.T.¹ Tan, W.J.² Ray, R.³ Wong, Y.W.⁴ Chan, W.⁵ Kuo, S.H.⁶ Goh, R.S.M.⁷ Turner, S.J.⁸ Wong, W.F.⁹

13
- 78249244772
- Improving the performance of the sparse matrix vector product with GPUs
- F. Vazquez, G. Ortega, J. J. Fernandez, and E. M. Garzon, "Improving the performance of the sparse matrix vector product with GPUs," in Proc. 10th IEEE Int. Conf. Comput. Inform. Technol., ser. CIT, 2010, pp. 1146-1151.
- (2010) CIT , pp. 1146-1151
- Vazquez, F.¹ Ortega, G.² Fernandez, J.J.³ Garzon, E.M.⁴

14
- 67650998701
- Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms," J. Parallel Distrib. Comput., vol. 69, no. 9, pp. 762-777, 2009.
- (2009) J. Parallel Distrib. Comput. , vol.69 , Issue.9 , pp. 762-777
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

15
- 77956238872
- Exact sparse matrix vector multiplication on GPU's and multicore architectures
- B. Boyer, J. G. Dumas, and P. Giorgi, "Exact sparse matrix vector multiplication on GPU's and multicore architectures," in Proc. 4th Int. Workshop Parallel Symbolic Comput., Jul. 2010, pp. 80-88.
- Proc. 4th Int. Workshop Parallel Symbolic Comput., Jul. 2010 , pp. 80-88
- Boyer, B.¹ Dumas, J.G.² Giorgi, P.³

16
- 80053263342
- Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication
- A. Buluc, S. Williams, L. Oliker, and J. Demmel, "Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication," in Proc. IEEE Int. Parallel Distrib. Process. Symp., pp. 721-733, May 2009.
- (2009) Proc. IEEE Int. Parallel Distrib. Process. Symp. , pp. 721-733
- Buluc, A.¹ Williams, S.² Oliker, L.³ Demmel, J.⁴

17
- 84855652802
- An I/O bandwidth-sensitive sparse matrix vector multiplication engine on FPGAs
- S. Sun, M. Monga, P. H. Jones, and J. Zambreno, "An I/O bandwidth-sensitive sparse matrix vector multiplication engine on FPGAs," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 59, no. 1, pp. 113-123, Jan. 2012.
- (2012) IEEE Trans. Circuits Syst. I: Reg. Papers , vol.59 , Issue.1 , pp. 113-123
- Sun, S.¹ Monga, M.² Jones, P.H.³ Zambreno, J.⁴

18
- 84885948161
- Sparse matrix-vector multiplication on the single-chip cloud computer many-core processor
- J. C. Pichel and F. F. Rivera, "Sparse matrix-vector multiplication on the single-chip cloud computer many-core processor," J. Parallel Distrib. Comput., vol. 73, no. 12, pp. 1539-1550, 2013.
- (2013) J. Parallel Distrib. Comput. , vol.73 , Issue.12 , pp. 1539-1550
- Pichel, J.C.¹ Rivera, F.F.²

19
- 77952611196
- Concurrent number cruncher: A GPU implementation of a general sparse linear solver
- L. Buatois, G. Caumon, and B. Levy, "Concurrent number cruncher: A GPU implementation of a general sparse linear solver," Int. J. Parallel Emerg. Distrib. Syst., vol. 24, no. 3, pp. 205-223, 2009.
- (2009) Int. J. Parallel Emerg. Distrib. Syst. , vol.24 , Issue.3 , pp. 205-223
- Buatois, L.¹ Caumon, G.² Levy, B.³

20
- 84886723333
- T. Oberhuber, A. Suzuki, and J. Vacata, "New row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA," arXiv preprint arXiv:1012.2270, 2010.
- (2010) New Row-Grouped CSR Format for Storing the Sparse Matrices on GPU with Implementation in CUDA
- Oberhuber, T.¹ Suzuki, A.² Vacata, J.³

21
- 78650279432
- Pattern-based sparse matrix representation for memory-efficient SMVM kernels
- M. Belgin, G. Back, and C. J. Ribbens, "Pattern-based sparse matrix representation for memory-efficient SMVM kernels," in Proc. 23rd Int. Conf. Supercomput., Jun. 2009, pp. 100-109.
- Proc. 23rd Int. Conf. Supercomput., Jun. 2009 , pp. 100-109
- Belgin, M.¹ Back, G.² Ribbens, C.J.³

22
- 60949098907
- Optimization of sparse matrix vector multiplication on emerging multicore platforms
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of sparse matrix vector multiplication on emerging multicore platforms," Parallel Comput., vol. 35, no. 3, pp. 178-194, 2009.
- (2009) Parallel Comput. , vol.35 , Issue.3 , pp. 178-194
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

23
- 77949577730
- Automatically tuning sparse matrix vector multiplication for GPU architectures
- A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix vector multiplication for GPU architectures," High Performance Embedded Architectures and Compilers. Berlin, Germany: Springer, 2010, pp. 111-125.
- (2010) High Performance Embedded Architectures and Compilers , pp. 111-125
- Monakov, A.¹ Lokhmotov, A.² Avetisyan, A.³

24
- 77957679421
- Model-driven autotuning of sparse matrix vector multiply on GPUs
- J. W. Choi, A. Singh, and R. W. Vuduc, "Model-driven autotuning of sparse matrix vector multiply on GPUs," in Proc. 15th ACM SIGPLAN Symp. Principles Practice Parallel Programming, 2010, pp. 115-126.
- Proc. 15th ACM SIGPLAN Symp. Principles Practice Parallel Programming, 2010 , pp. 115-126
- Choi, J.W.¹ Singh, A.² Vuduc, R.W.³

25
- 84856613262
- Optimization of sparse matrix vector multiplication with variant CSR on GPUs
- X. Feng, H. Jin, R. Zheng, K. Hu, J. Zeng, and Z. Shao, "Optimization of sparse matrix vector multiplication with variant CSR on GPUs," in Proc. IEEE 17th Int. Conf. Parallel Distrib. Syst., 2011, pp. 165-172.
- Proc. IEEE 17th Int. Conf. Parallel Distrib. Syst., 2011 , pp. 165-172
- Feng, X.¹ Jin, H.² Zheng, R.³ Hu, K.⁴ Zeng, J.⁵ Shao, Z.⁶

26
- 84867417216
- Sparse matrix vector multiplication on GPGPU clusters: A new storage format and a scalable implementation
- M. Kreutzer, G. Hager, G. Wellein, H. Fehske, A. Basermann, and A. R. Bishop, "Sparse matrix vector multiplication on GPGPU clusters: A new storage format and a scalable implementation," in Proc. IEEE 26th Int. Parallel Distrib. Process. Symp. Workshops PhD Forum, May 2012, pp. 1696-1702.
- Proc. IEEE 26th Int. Parallel Distrib. Process. Symp. Workshops PhD Forum, May 2012 , pp. 1696-1702
- Kreutzer, M.¹ Hager, G.² Wellein, G.³ Fehske, H.⁴ Basermann, A.⁵ Bishop, A.R.⁶

27
- 81355148805
- Two-dimensional cache-oblivious sparse matrix vector multiplication
- A. N. Yzelman and R. H. Bisseling, "Two-dimensional cache-oblivious sparse matrix vector multiplication," Parallel Comput., vol. 37, no. 12, pp. 806-819, 2011.
- (2011) Parallel Comput. , vol.37 , Issue.12 , pp. 806-819
- Yzelman, A.N.¹ Bisseling, R.H.²

28
- 84883314318
- An extended compression format for the optimization of sparse matrix vector multiplication
- V. Karakasis, T. Gkountouvas, K. Kourtis, G. Goumas, and N. Koziris, "An extended compression format for the optimization of sparse matrix vector multiplication," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 10, pp. 1930-1940, Oct. 2013.
- (2013) IEEE Trans. Parallel Distrib. Syst. , vol.24 , Issue.10 , pp. 1930-1940
- Karakasis, V.¹ Gkountouvas, T.² Kourtis, K.³ Goumas, G.⁴ Koziris, N.⁵

29
- 84898682038
- A performance modeling and optimization analysis tool for sparse matrix vector multiplication on GPUs
- P. Chen, "A performance modeling and optimization analysis tool for sparse matrix vector multiplication on GPUs," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 5, pp. 1112-1123, May 2014.
- (2014) IEEE Trans. Parallel Distrib. Syst. , vol.25 , Issue.5 , pp. 1112-1123
- Chen, P.¹

30
- 84874116376
- Iterative sparse matrix vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems
- B. Schmidt, H. Aribowo, and H.-V. Dang, "Iterative sparse matrix vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems," Concurrency Comput.: Practice Experience, vol. 25, no. 4, pp. 586-603, 2013.
- (2013) Concurrency Comput.: Practice Experience , vol.25 , Issue.4 , pp. 586-603
- Schmidt, B.¹ Aribowo, H.² Dang, H.-V.³

31
- 84878396949
- Improved three-way split formulas for binary polynomial and Toeplitz matrix vector products
- M. Cenk, C. Negre, and M. A. Hasan, "Improved three-way split formulas for binary polynomial and Toeplitz matrix vector products," IEEE Trans. Comput., vol. 62, no. 7, pp. 1345-1361, Jul. 2013.
- (2013) IEEE Trans. Comput. , vol.62 , Issue.7 , pp. 1345-1361
- Cenk, M.¹ Negre, C.² Hasan, M.A.³

32
- 84878402645
- Multiway splitting method for Toeplitz matrix vector product
- M. A. Hasan and C. Negre, "Multiway splitting method for Toeplitz matrix vector product," IEEE Trans. Comput., vol. 62, no. 7, pp. 1467-1471, May 2013.
- (2013) IEEE Trans. Comput. , vol.62 , Issue.7 , pp. 1467-1471
- Hasan, M.A.¹ Negre, C.²

33
- 84919494711
- High-level strategies for parallel shared-memory sparse matrix vector multiplication
- A.-J. N. Yzelman and D. Roose, "High-level strategies for parallel shared-memory sparse matrix vector multiplication," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 116-125, Jan. 2014.
- (2014) IEEE Trans. Parallel Distrib. Syst. , vol.25 , Issue.1 , pp. 116-125
- Yzelman, A.-J.N.¹ Roose, D.²

34
- 84919470072
- Performance analysis and optimization for SpMV on GPU using probabilistic modeling
- K. Li, W. Yang, and K. Li, "Performance analysis and optimization for SpMV on GPU using probabilistic modeling," IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 1, pp. 196-205, Jan. 2015.
- (2015) IEEE Trans. Parallel Distrib. Syst. , vol.26 , Issue.1 , pp. 196-205
- Li, K.¹ Yang, W.² Li, K.³

35
- 0003763748
- N. L. Johnson, S. Kotz, and A. Kemp, Univariate Discrete Distributions, 2nd ed. New York, NY, USA: Wiley, ISBN 0-471-54897-9, 1993, p. 36.
- (1993) Univariate Discrete Distributions. 2nd Ed. , pp. 36
- Johnson, N.L.¹ Kotz, S.² Kemp, A.³

36
- 0012453312
- T. A. Davis and Y. Hu, University of Florida sparse matrix collection[J], 2009.
- (2009) University of Florida Sparse Matrix Collection[J]
- Davis, T.A.¹ Hu, Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.