Volume 24, Issue 1, 2012, Pages 3-13

Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware

Author keywords

CUDA; Fermi; GPU; matrix vector; NVIDIA; S2050; sparse

Indexed keywords

COMPUTER GRAPHICS; COMPUTER HARDWARE; MATRIX ALGEBRA; PROGRAM PROCESSORS

EID: 84855223315
PISSN: 15320626
EISSN: 15320634
Source Type: Journal
DOI: 10.1002/cpe.1732
Document Type: Article
Times cited: 25

References (16)
  • 1. NVIDIA. NVIDIA CUDA Programming Guide (2nd edn). NVIDIA Corporation, July 2008.
  • 2. Im EJ, Yelick KA, Vuduc RW. Sparsity: Optimization framework for sparse matrix kernels. IJHPCA 2004; 18(1): 135-158.
  • 3. Vuduc R, Demmel JW, Yelick KA. OSKI: A library of automatically tuned sparse matrix kernels. Proceedings of SciDAC'05, Journal of Physics: Conference Series. Institute of Physics Publishing: San Francisco, CA, U.S.A., 2005.
  • 5. Zein AHE, Rendell AP. From sparse matrix to optimal GPU CUDA sparse matrix vector product implementation. CCGRID. IEEE: Washington, DC, U.S.A., 2010; 808-813.
  • 6. Zein AE, McCreath E, Rendell AP, Smola AJ. Performance evaluation of the NVIDIA GeForce 8800 GTX GPU for machine learning. ICCS (1) (Lecture Notes in Computer Science, vol. 5101), Bubak M, van Albada GD, Dongarra J, Sloot PMA (eds.). Springer: Berlin, 2008; 466-475.
  • 7. Davis TA. University of Florida sparse matrix collection. NA Digest 1994; 92.
  • 8. Tesla S2050 GPU computing system. Available at: http://www.nvidia.com/object/product-tesla-S2050-us.html [August 2010].
  • 9. NVIDIA. NVIDIA's next generation CUDA compute architecture: Fermi. White Paper, June 2010. Available at: http://www.nvidia.com/content/PDF/fermi-white-papers/NVIDIA-Fermi-Compute-Architecture-Whitepaper.pdf [August 2010].
  • 11. Baskaran MM, Bordawekar R. Optimizing sparse matrix-vector multiplication on GPUs. IBM Research Report 2009; RC24704.
  • 14. Sengupta S, Harris M, Zhang Y, Owens JD. Scan primitives for GPU computing. Graphics Hardware, Segal M, Aila T (eds.). Eurographics Association: Aire-la-Ville, Switzerland, 2007; 97-106.
  • 15. Armstrong W, Rendell AP. Reinforcement learning for automated performance tuning: Initial evaluation for sparse matrix format selection. CLUSTER. IEEE: New York, 2008; 411-420.
  • 16. Lee VW, Kim C, Chhugani J, Deisher M, Kim D, Nguyen AD, Satish N, Smelyanskiy M, Chennupaty S, Hammarlund P, et al. Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. ISCA, Seznec A, Weiser UC, Ronen R (eds.). ACM: New York, 2010; 451-460.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.