2012, Pages 1696-1702

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Author keywords

CUDA; GPGPU; Sparse matrices

Indexed keywords

CUDA; DISTRIBUTED MEMORY; GPGPU; MATRIX STRUCTURE; MEMORY FOOTPRINT; MEMORY OVERHEADS; PARALLELIZATIONS; PERFORMANCE BOTTLENECKS; PERFORMANCE MODEL; PERFORMANCE PROPERTIES; SCALABLE IMPLEMENTATION; SPARSE MATRICES; SPARSE MATRIX-VECTOR MULTIPLICATION; SPARSE SOLVERS; SPARSITY PATTERNS; STORAGE FORMATS; TEST SCENARIO

EID: 84867417216     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPSW.2012.211     Document Type: Conference Paper
Times cited: 54

References (13)
  • 1
    • Scopus ID: 74049143158
    • N. Bell and M. Garland: Implementing sparse matrix-vector multiplication on throughput-oriented processors. Proc. SC'09. DOI:10.1145/1654059.1654078
  • 2
    • Scopus ID: 77749340082
    • J.W. Choi, A. Singh, and R.W. Vuduc: Model-driven autotuning of sparse matrix-vector multiply on GPUs. Proc. PPoPP'10. DOI:10.1145/1693453.1693471
  • 4
    • Scopus ID: 80052903010
    • G. Schubert, G. Hager, H. Fehske, and G. Wellein: Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems. Parallel Processing Letters 21(3), 339-358 (2011). DOI:10.1142/S0129626411000254
  • 5
    • Scopus ID: 84883091562
    • J. Habich, C. Feichtinger, H. Köstler, G. Hager, and G. Wellein: Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results. Accepted for publication in Computers & Fluids. Preprint: http://arxiv.org/abs/1112.0850
  • 6
    • Scopus ID: 84855246970
    • K. Stüben: An Introduction to Algebraic Multigrid. In: U. Trottenberg et al. (Eds.): Multigrid: Basics, Parallelism and Adaptivity, Academic Press (2000).
  • 7
    • Scopus ID: 84867433749
    • http://www.scai.fraunhofer.de/en/business-research-areas/numerical-software/products/samg.html
  • 8
    • Scopus ID: 83455182306
    • G. Schubert, G. Hager and H. Fehske: Performance limitations for sparse matrix-vector multiplications on current multicore environments. In: S. Wagner et al., High Performance Computing in Science and Engineering, Garching/Munich 2009. Springer, ISBN 978-3642138713 (2010), 13-26. DOI:10.1007/978-3-642-13872-0_2
  • 9
    • Scopus ID: 80052898254
    • A. Basermann et al.: HICFD - Highly Efficient Implementation of CFD Codes for HPC Many-Core Architectures. In: Proceedings of CiHPC, Springer 2011 [in print]
  • 10
    • Scopus ID: 73349098372
    • R. Grimes, D. Kincaid, and D. Young: ITPACK User's Guide. Technical Report CNA-150, Center for Numerical Analysis, University of Texas, Aug. 1979. http://rene.ma.utexas.edu/CNA/ITPACK/
  • 11
    • Scopus ID: 21144451281
    • G. Wellein, G. Hager, A. Basermann, and H. Fehske: Fast sparse matrix-vector multiplication for TFlop/s computers. In: J. Palma, J. Dongarra (Ed.): High Performance Computing for Computational Science - VECPAR2002, LNCS 2565, Springer Berlin (2003). DOI:10.1007/3-540-36569-9_18
  • 12
    • Scopus ID: 77949577730
    • A. Monakov, A. Lokhmotov, A. Avetisyan: Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. In: Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, X. Martorell (Eds.): Lecture Notes in Computer Science, Springer, ISBN 978-3-642-11514-1 (2010), 111-125. DOI:10.1007/978-3-642-11515-8_10
  • 13
    • Scopus ID: 79958091044
    • A. Dziekonski, A. Lamecki, M. Mrozowski: A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU. Progress In Electromagnetics Research 116, 49-63 (2011). DOI:10.2528/PIER11031607


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.