



Volume 64, Issue 9, 2015, Pages 2623-2636

Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs

Author keywords

GPU; matrix partition; multicore CPU; probability distribution; sparse matrix vector multiplication

Indexed keywords

PROBABILITY; PROBABILITY DISTRIBUTIONS; PROGRAM PROCESSORS;

EID: 84939230567     PISSN: 00189340     EISSN: None     Source Type: Journal    
DOI: 10.1109/TC.2014.2366731     Document Type: Article
Times cited : (122)

References (36)
  • 1
    • EID: 51649124194
    • D. P. Scarpazza, O. Villa, and F. Petrini, "Efficient breadth-first search on the Cell/BE processor," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 10, pp. 1381-1395, Oct. 2008.
  • 4
    • EID: 84869388261
    • A. Pedram, R. A. van de Geijn, and A. Gerstlauer, "Codesign tradeoffs for high-performance, low-power linear algebra architectures," IEEE Trans. Comput., vol. 61, no. 12, pp. 1724-1736, Oct. 2012.
  • 7
    • EID: 84900536807
    • W. Yang, K. Li, Y. Liu, L. Shi, and L. Wan, "Optimization of Quasi diagonal matrix vector multiplication on GPU," Int. J. High Performance Comput. Appl., vol. 28, no. 2, pp. 183-195, 2014.
  • 8
    • EID: 0242533311
    • J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, "Sparse matrix solvers on the GPU: Conjugate gradients and multigrid," ACM Trans. Graph., vol. 22, no. 3, pp. 917-924, 2003.
  • 10
    • EID: 84884657209
    • Z. Fu, T. J. Lewis, R. M. Kirby, and R. T. Whitaker, "Architecting the finite element method pipeline for the GPU," J. Comput. Appl. Math., vol. 257, pp. 195-211, Feb. 2014.
  • 13
    • EID: 78249244772
    • F. Vazquez, G. Ortega, J. J. Fernandez, and E. M. Garzon, "Improving the performance of the sparse matrix vector product with GPUs," in Proc. 10th IEEE Int. Conf. Comput. Inform. Technol., ser. CIT, 2010, pp. 1146-1151.
  • 14
    • EID: 67650998701
    • S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms," J. Parallel Distrib. Comput., vol. 69, no. 9, pp. 762-777, 2009.
  • 17
    • EID: 84855652802
    • S. Sun, M. Monga, P. H. Jones, and J. Zambreno, "An I/O bandwidth-sensitive sparse matrix vector multiplication engine on FPGAs," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 59, no. 1, pp. 113-123, Jan. 2012.
  • 18
    • EID: 84885948161
    • J. C. Pichel and F. F. Rivera, "Sparse matrix-vector multiplication on the single-chip cloud computer many-core processor," J. Parallel Distrib. Comput., vol. 73, no. 12, pp. 1539-1550, 2013.
  • 19
    • EID: 77952611196
    • L. Buatois, G. Caumon, and B. Levy, "Concurrent number cruncher: A GPU implementation of a general sparse linear solver," Int. J. Parallel Emerg. Distrib. Syst., vol. 24, no. 3, pp. 205-223, 2009.
  • 22
    • EID: 60949098907
    • S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of sparse matrix vector multiplication on emerging multicore platforms," Parallel Comput., vol. 35, no. 3, pp. 178-194, 2009.
  • 23
    • EID: 77949577730
    • A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix vector multiplication for GPU architectures," in High Performance Embedded Architectures and Compilers. Berlin, Germany: Springer, 2010, pp. 111-125.
  • 27
    • EID: 81355148805
    • A. N. Yzelman and R. H. Bisseling, "Two-dimensional cache-oblivious sparse matrix vector multiplication," Parallel Comput., vol. 37, no. 12, pp. 806-819, 2011.
  • 28
    • EID: 84883314318
    • V. Karakasis, T. Gkountouvas, K. Kourtis, G. Goumas, and N. Koziris, "An extended compression format for the optimization of sparse matrix vector multiplication," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 10, pp. 1930-1940, Oct. 2013.
  • 29
    • EID: 84898682038
    • P. Chen, "A performance modeling and optimization analysis tool for sparse matrix vector multiplication on GPUs," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 5, pp. 1112-1123, May 2014.
  • 30
    • EID: 84874116376
    • B. Schmidt, H. Aribowo, and H.-V. Dang, "Iterative sparse matrix vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems," Concurrency Comput.: Practice Experience, vol. 25, no. 4, pp. 586-603, 2013.
  • 31
    • EID: 84878396949
    • M. Cenk, C. Negre, and M. A. Hasan, "Improved three-way split formulas for binary polynomial and Toeplitz matrix vector products," IEEE Trans. Comput., vol. 62, no. 7, pp. 1345-1361, Jul. 2013.
  • 32
    • EID: 84878402645
    • M. A. Hasan and C. Negre, "Multiway splitting method for Toeplitz matrix vector product," IEEE Trans. Comput., vol. 62, no. 7, pp. 1467-1471, May 2013.
  • 33
    • EID: 84919494711
    • A.-J. N. Yzelman and D. Roose, "High-level strategies for parallel shared-memory sparse matrix vector multiplication," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 116-125, Jan. 2014.
  • 34
    • EID: 84919470072
    • K. Li, W. Yang, and K. Li, "Performance analysis and optimization for SpMV on GPU using probabilistic modeling," IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 1, pp. 196-205, Jan. 2015.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.