SCOPUS Information Retrieval Platform - View Article

Concurrency and Computation: Practice and Experience

Volume 24, Issue 1, 2012, Pages 3-13

Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware

Authors (2): El Zein, Ahmed H. (a); Rendell, Alistair P. (a)

(a) Australian National University (Australia)

Author keywords

CUDA; Fermi; GPU; matrix vector; NVIDIA; S2050; sparse

Indexed keywords

COMPUTER GRAPHICS; COMPUTER HARDWARE; MATRIX ALGEBRA; PROGRAM PROCESSORS;

CUDA; FERMI; MATRIX-VECTOR; NVIDIA; S2050; SPARSE;

GRAPHICS PROCESSING UNIT;

EID: 84855223315 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.1732 Document Type: Article

Times cited : (25)

References (16)

1
- 35948991669
- NVIDIA. (2nd edn). NVIDIA Corporation, July.
- NVIDIA. NVIDIA CUDA Programming Guide (2nd edn). NVIDIA Corporation, July 2008.
- (2008) NVIDIA CUDA Programming Guide

2
- 1542501019
- Sparsity: Optimization framework for sparse matrix kernels
- Im EJ, Yelick KA, Vuduc RW. Sparsity: Optimization framework for sparse matrix kernels. IJHPCA 2004; 18 (1): 135-158.
- (2004) IJHPCA , vol.18 , Issue.1 , pp. 135-158
- Im, E.J.¹ Yelick, K.A.² Vuduc, R.W.³

3
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- Journal of Physics: Conference Series, Institute of Physics Publishing: San Francisco, CA, U.S.A.
- Vuduc R, Demmel JW, Yelick KA. OSKI: A library of automatically tuned sparse matrix kernels. Proceedings of SciDAC'05. Journal of Physics: Conference Series, Institute of Physics Publishing: San Francisco, CA, U.S.A., 2005.
- (2005) Proceedings of SciDAC'05
- Vuduc, R.¹ Demmel, J.W.² Yelick, K.A.³

4
- 35549013711
- Performance optimization and modeling of blocked sparse kernels
- DOI 10.1177/1094342007083801
- Buttari A, Eijkhout V, Langou J, Filippone S. Performance optimization and modeling of blocked sparse kernels. IJHPCA 2007; 21 (4): 467-484. (Pubitemid 350011340)
- (2007) International Journal of High Performance Computing Applications , vol.21 , Issue.4 , pp. 467-484
- Buttari, A.¹ Eijkhout, V.² Langou, J.³ Filippone, S.⁴

5
- 77954926909
- From sparse matrix to optimal GPU CUDA sparse matrix vector product implementation
- Washington, DC, U.S.A. IEEE
- Zein AHE, Rendell AP. From sparse matrix to optimal GPU CUDA sparse matrix vector product implementation. CCGRID, Washington, DC, U.S.A. IEEE, 2010; 808-813.
- (2010) CCGRID , pp. 808-813
- Zein, A.H.E.¹ Rendell, A.P.²

6
- 47749154455
- Performance evaluation of the NVIDIA GeForce 8800 GTX GPU for machine learning
- (Lecture Notes in Computer Science, 5101), Bubak M., van Albada G.D., Dongarra J., Sloot P.M.A. (eds.). Springer: Berlin.
- Zein AE, McCreath E, Rendell AP, Smola AJ. Performance evaluation of the NVIDIA GeForce 8800 GTX GPU for machine learning. ICCS (1) (Lecture Notes in Computer Science, vol. 5101), Bubak M, van Albada GD, Dongarra J, Sloot PMA (eds.). Springer: Berlin, 2008; 466-475.
- (2008) ICCS (1) , pp. 466-475
- Zein, A.E.¹ McCreath, E.² Rendell, A.P.³ Smola, A.J.⁴

7
- 0013269731
- University of Florida sparse matrix collection
- Davis TA. University of Florida sparse matrix collection. NA Digest 1994; 92.
- (1994) NA Digest , vol.92
- Davis, T.A.¹

8
- 84855218113
- [August]
- Tesla S2050 GPU computing system. Available at: http://www.nvidia.com/object/product-tesla-S2050-us.html [August 2010].
- (2010) Tesla S2050 GPU Computing System

9
- 84859261309
- NVIDIA's next generation CUDA compute architecture: Fermi
- June 2010. [August]
- NVIDIA. NVIDIA's next generation CUDA compute architecture: Fermi. White Paper, June 2010. Available at: http://www.nvidia.com/content/PDF/fermi-white-papers/NVIDIA-Fermi-Compute-Architecture-Whitepaper.pdf [August 2010].
- (2010) White Paper

10
- 77956260008
- SC ACM: New York.
- Bell N, Garland M. Implementing sparse matrix-vector multiplication on throughput-oriented processors. SC. ACM: New York, 2009.
- (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors
- Bell, N.¹ Garland, M.²

11
- 74049163483
- Optimizing sparse matrix-vector multiplication on GPUs
- RC24704
- Baskaran MM, Bordawekar R. Optimizing sparse matrix-vector multiplication on GPUs. IBM Research Report 2009; RC24704.
- (2009) IBM Research Report
- Baskaran, M.M.¹ Bordawekar, R.²

12
- 84855210918
- SC Verastegui B. (ed.). ACM Press: New York.
- Williams S, Oliker L, Vuduc RW, Shalf J, Yelick KA, Demmel J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. SC, Verastegui B (ed.). ACM Press: New York, 2007; 38.
- (2007) Optimization of sparse matrix-vector multiplication on emerging multicore platforms , pp. 38
- Williams, S.¹ Oliker, L.² Vuduc, R.W.³ Shalf, J.⁴ Yelick, K.A.⁵ Demmel, J.⁶

13
- 84855233088
- [August]
- CUDPP: CUDA Data Parallel Primitives library. Available at: http://www.gpgpu.org/developer/cudpp/ [August 2010].
- (2010) CUDPP: CUDA Data Parallel Primitives Library

14
- 78651284120
- Scan primitives for GPU computing
- Segal M., Aila T. (eds.). Eurographics Association: Aire-la-Ville, Switzerland.
- Sengupta S, Harris M, Zhang Y, Owens JD. Scan primitives for GPU computing. Graphics Hardware, Segal M, Aila T (eds.). Eurographics Association: Aire-la-Ville, Switzerland, 2007; 97-106.
- (2007) Graphics Hardware , pp. 97-106
- Sengupta, S.¹ Harris, M.² Zhang, Y.³ Owens, J.D.⁴

15
- 57949097109
- Reinforcement learning for automated performance tuning: Initial evaluation for sparse matrix format selection
- IEEE: New York.
- Armstrong W, Rendell AP. Reinforcement learning for automated performance tuning: Initial evaluation for sparse matrix format selection. CLUSTER. IEEE: New York, 2008; 411-420.
- (2008) CLUSTER , pp. 411-420
- Armstrong, W.¹ Rendell, A.P.²

16
- 77954995885
- Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
- Seznec A., Weiser U.C., Ronen R. (eds.). ACM: New York.
- Lee VW, Kim C, Chhugani J, Deisher M, Kim D, Nguyen AD, Satish N, Smelyanskiy M, Chennupaty S, Hammarlund P, et al. Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. ISCA, Seznec A, Weiser UC, Ronen R (eds.). ACM: New York, 2010; 451-460.
- (2010) ISCA , pp. 451-460
- Lee, V.W.¹ Kim, C.² Chhugani, J.³ Deisher, M.⁴ Kim, D.⁵ Nguyen, A.D.⁶ Satish, N.⁷ Smelyanskiy, M.⁸ Chennupaty, S.⁹ Hammarlund, P.¹⁰

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.