Volume 45, Issue 5, 2010, Pages 115-125

Model-driven autotuning of sparse matrix-vector multiply on GPUs

Author keywords

GPU; Performance modeling; Sparse matrix vector multiplication

Indexed keywords

AUTOTUNING; COMPRESSED SPARSE ROW; DIRECTLY MODEL; EXHAUSTIVE SEARCH; GPU; GRAPHICS PROCESSING UNIT; INPUT MATRICES; MODEL-DRIVEN; MULTITHREADED; OFFLINE; PARAMETER-TUNING; PERFORMANCE LIMITATIONS; PERFORMANCE MODEL; PERFORMANCE MODELING; PERFORMANCE TUNING; RUNTIMES; SPARSE MATRICES; SPARSE MATRIX-VECTOR MULTIPLICATION; STORAGE FORMATS; VECTOR PROCESSORS;

EID: 77957679421    PISSN: 15232867    EISSN: None    Source Type: Journal
DOI: 10.1145/1837853.1693471    Document Type: Conference Paper
Times cited: 195

References (22)
  • 1
    • NVIDIA CUDA (Compute Unified Device Architecture): Programming Guide, Version 2.1, December 2008.
    • Scopus EID: 77957660323
  • 3
    • Nathan Bell and Michael Garland. Efficient sparse matrix-vector multiplication on CUDA. In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009. (to appear).
    • Scopus EID: 77956260008
  • 6
    • Eduardo F. D'Azevedo, Mark R. Fahey, and Richard T. Mills. Vectorized sparse matrix multiply for compressed row storage. In Proc. Int'l. Conf. Computational Science (ICCS), volume 3514/2005 of LNCS, pages 99-106. Springer Berlin / Heidelberg, 2005. doi: http://dx.doi.org/10.1007/11428831_13.
    • Scopus EID: 25144499116
  • 7
    • James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick. Self-adapting linear algebra algorithms and software. Proc. IEEE, 93(2):293-312, February 2005. doi: http://dx.doi.org/10.1109/JPROC.2004.840848.
    • Scopus EID: 20744452904
  • 8
    • Michael Garland. Sparse matrix computations on manycore GPUs. In Proc. ACM/IEEE Design Automation Conf. (DAC), pages 2-6, Anaheim, CA, USA, 2008. doi: http://dx.doi.org/10.1145/1391469.1391473.
    • Scopus EID: 51549093017
  • 9
    • Roman Geus and Stefan Röllin. Towards a fast sparse symmetric matrix-vector multiplication. Parallel Computing, 27(7):883-896, June 2001. doi: http://dx.doi.org/10.1016/S0167-8191(01)00073-74
    • Scopus EID: 0035370546
  • 10
    • Sunpyo Hong and Hyesoon Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In Proc. ACM Int'l. Symp. Comp. Arch. (ISCA), pages 152-163, Austin, TX, USA, June 2009. doi: http://dx.doi.org/10.1145/1555815.1555775.
    • Scopus EID: 70450231944
  • 11
    • Eun-Jin Im, Katherine Yelick, and Richard Vuduc. SPARSITY: Optimization framework for sparse matrix kernels. Int'l J. of High Performance Computing Applications (IJHPCA), 18(1):135-158, February 2004. doi: http://dx.doi.org/10.1177/1094342004041296.
    • Scopus EID: 1542501019
  • 12
    • Hiroshi Okuda, Kengo Nakajima, Mikio Iizuka, Li Chen, and Hisashi Nakamura. Parallel finite element analysis platform for the Earth Simulator: GeoFEM. In Proc. Int'l. Conf. Computational Science (ICCS), volume 2659 of LNCS, pages 773-780. Springer, 2003. doi: http://dx.doi.org/10.1007/3-540-44863-2_75.
    • Scopus EID: 35248834555
  • 13
    • Ali Pinar and Michael T. Heath. Improving performance of sparse matrix-vector multiplication. In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, 1999. doi: http://dx.doi.org/10.1145/331532.331562.
    • Scopus EID: 85031264203
  • 18
    • Richard Vuduc, James W. Demmel, and Katherine A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. SciDAC, J. Phys.: Conf. Series, volume 16, pages 521-530, 2005. doi: http://dx.doi.org/10.1088/1742-6596/16/1/071.
    • Scopus EID: 24344485098
  • 20
    • Richard W. Vuduc and Hyun-Jin Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In Proc. High-Performance Computing and Communications Conf., volume LNCS 3726/2005, pages 807-816, Sorrento, Italy, September 2005. Springer. doi: http://dx.doi.org/10.1007/11557654_91.
    • Scopus EID: 33646389518
  • 21
    • Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and James Demmel. Optimizing sparse matrix-vector multiply on emerging multicore platforms. Journal of Parallel Computing, 35(3):178-194, March 2009. doi: http://dx.doi.org/10.1016/j.parco.2008.12.006.
    • Scopus EID: 60949098907
  • 22
    • Kamen Yotov, Xiaoming Li, Gang Ren, María Jesús Garzarán, David Padua, Keshav Pingali, and Paul Stodghill. Is search really necessary to generate high-performance BLAS? Proc. IEEE, 93(2):358-386, February 2005.
    • Scopus EID: 20744459570


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.