Volume 38, Issue 8, 2012, Pages 408-420

Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach

Author keywords

GPU computing; GPU performance modeling; Sparse matrix vector product

Indexed keywords

ACCELERATION FACTORS; AUTOMATIC TUNING; AUTOTUNING; COMPARATIVE ANALYSIS; EVALUATION RESULTS; GPU COMPUTING; GRAPHICS PROCESSING UNITS; MEMORY OPERATIONS; OPTIMUM SELECTION; PERFORMANCE MODELING; SPARSE MATRICES; TEST MATRIX; TWO PARAMETER

EID: 84862637273     PISSN: 01678191     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.parco.2011.08.003     Document Type: Article
Times cited: (35)

References (21)
  • 1
    • 77950518538
    • A matrix approach to tomographic reconstruction and its implementation on GPUs
    • F. Vázquez, E.M. Garzón, and J.J. Fernández A matrix approach to tomographic reconstruction and its implementation on GPUs Journal of Structural Biology 170 2010 146 151
    • (2010) Journal of Structural Biology , vol.170 , pp. 146-151
    • Vázquez, F.1    Garzón, E.M.2    Fernández, J.J.3
  • 3
    • 0031269220
    • Improving the memory-system performance of sparse-matrix vector multiplication
    • S. Toledo Improving memory-system performance of sparse matrix-vector multiplication IBM Journal of Research and Development 41 6 1997 711 725 (Pubitemid 127557044)
    • (1997) IBM Journal of Research and Development , vol.41 , Issue.6 , pp. 711-725
    • Toledo, S.1
  • 4
    • 60949098907
    • Optimization of sparse matrix-vector multiplication on emerging multicore platforms
    • S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel Optimization of sparse matrix-vector multiplication on emerging multicore platforms Parallel Computing 35 3 2009 178 194
    • (2009) Parallel Computing , vol.35 , Issue.3 , pp. 178-194
    • Williams, S.1    Oliker, L.2    Vuduc, R.3    Shalf, J.4    Yelick, K.5    Demmel, J.6
  • 5
  • 11
    • 77949577730
    • Automatically tuning sparse matrix-vector multiplication for GPU architectures
    • LNCS 5952
    • A. Monakov, A. Lokhmotov, A. Avetisyan, Automatically tuning sparse matrix-vector multiplication for GPU architectures, in: Proceedings of HiPEAC 2010, LNCS 5952, 2010, pp. 111-125.
    • (2010) Proceedings of HiPEAC 2010 , pp. 111-125
    • Monakov, A.1    Lokhmotov, A.2    Avetisyan, A.3
  • 13
    • 78249244772
    • Improving the performance of the sparse matrix vector product with GPUs
    • IEEE Computer Society
    • F. Vázquez, G. Ortega, J.J. Fernández, E.M. Garzón, Improving the performance of the sparse matrix vector product with GPUs, in: 10th IEEE International Conference on Computer and Information Technology. CIT 2010, IEEE Computer Society, 2010, pp. 1146-1151.
    • (2010) 10th IEEE International Conference on Computer and Information Technology. CIT 2010 , pp. 1146-1151
    • Vázquez, F.1
  • 15
    • 70450231944
    • An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
    • ACM, New York, NY, USA
    • S. Hong, H. Kim, An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, in: SIGARCH Comput. Archit. News 37, vol. 3, ACM, New York, NY, USA, 2009, pp. 152-163.
    • (2009) SIGARCH Comput. Archit. News , vol.3 , pp. 152-163
    • Hong, S.1    Kim, H.2
  • 17
    • 84862692992
    • Next generation CUDA architecture
    • NVIDIA
    • NVIDIA, Next generation CUDA architecture. Fermi Architecture, 2010.
    • (2010) Fermi Architecture
  • 18
    • 84862677879
    • CUDA C programming
    • NVIDIA CUDA Toolkit 2.3, July
    • NVIDIA, CUDA C programming. Best practices guide. CUDA Toolkit 2.3, July 2009.
    • (2009) Best Practices Guide
  • 19
    • 60649099576
    • Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor
    • Revolutionary Technologies for Acceleration of Emerging Petascale Applications
    • J. Kurzak, W. Alvaro, and J. Dongarra Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor Parallel Computing 35 3 2009 138 150 Revolutionary Technologies for Acceleration of Emerging Petascale Applications
    • (2009) Parallel Computing , vol.35 , Issue.3 , pp. 138-150
    • Kurzak, J.1    Alvaro, W.2    Dongarra, J.3
  • 20
    • 70349247125
    • INTEL, Math kernel library
    • INTEL, Math kernel library. reference manual, 2009.
    • (2009) Reference Manual


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.