SCOPUS Information Search Platform

Acta Technica CSAV (Ceskoslovenska Akademie Ved)

Volume 56, Issue 4, 2011, Pages 447-466

New Row-grouped CSR format for storing sparse matrices on GPU with implementation in CUDA

(3) Oberhuber, Tomáš ᵃ; Suzuki, Atsushi ᵇ; Vacata, Jan ᵃ

a CZECH TECHNICAL UNIVERSITY IN PRAGUE (Czech Republic)

b UNIVERSITÉ PIERRE ET MARIE CURIE (France)

Author keywords

CUDA; GPU; Parallel computing; Sparse matrices; SpMV; Thread computing

Indexed keywords

CUDA; GPU; SPARSE MATRICES; SPMV; THREAD COMPUTING;

ELECTRICAL ENGINEERING; MECHANICAL PROPERTIES; PARALLEL PROCESSING SYSTEMS;

PARALLEL ARCHITECTURES;

EID: 84857836454
PISSN: 00017043
EISSN: None
Source Type: Journal
DOI: None
Document Type: Article

Times cited : (17)

References (20)

1
- 77954995885
- Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
- June 19-23, ACM, New York 2010
- V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, P. Dubey: Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. Proc. 37th Ann. Int. Symposium on Computer Architecture (ISCA'10) Saint-Malo (France), June 19-23, 2010, ACM, New York 2010, 451-460.
- (2010) Proc. 37th Ann. Int. Symposium on Computer Architecture (ISCA'10) Saint-Malo (France) , pp. 451-460
- Lee, V.W.¹ Kim, C.² Chhugani, J.³ Deisher, M.⁴ Kim, D.⁵ Nguyen, A.D.⁶ Satish, N.⁷ Smelyanskiy, M.⁸ Chennupaty, S.⁹ Hammarlund, P.¹⁰ Singhal, R.¹¹ Dubey, P.¹²

2
- 84857837437
- NVIDIA Corporation, May
- NVIDIA Corporation, CUDA CUBLAS library, PG-00000-002 V3.1, May 2010, http://developer.download.nvidia.com/compute/cuda/3-1/toolkit/docs/CUBLAS-Library-3.1.pdf.
- (2010) CUDA CUBLAS Library, PG-00000-002 V3.1

3
- 33845468997
- LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware
- DOI 10.1109/SC.2005.42, Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
- N. Galoppo, N. K. Govindaraju, M. Henson, D. Manocha: LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware. Proceedings ACM/IEEE SC'05, Conference of Supercomputing, Nov. 12-18, 2005, Seattle (USA), doi: 10.1109/SC.2005.42. (Pubitemid 44902346)
- (2005) Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05 , vol.2005 , pp. 1559955
- Galoppo, N.¹ Govindaraju, N.K.² Henson, M.³ Manocha, D.⁴

4
- 67650056991
- LU, QR and Cholesky factorizations using vector capabilities of GPUs
- Electrical Engineering and Computer Sciences, University of California, Berkeley
- V. Volkov, J. Demmel: LU, QR and Cholesky factorizations using vector capabilities of GPUs. Techn. Rep. UCB/EECS-2008-49, Electrical Engineering and Computer Sciences, University of California, Berkeley, 2008.
- (2008) Techn. Rep. UCB/EECS-2008-49
- Volkov, V.¹ Demmel, J.²

5
- 70350368872
- Efficient sparse matrix-vector multiplication on CUDA
- NVIDIA Corporation
- N. Bell, M. Garland: Efficient sparse matrix-vector multiplication on CUDA. Techn. Rep. NVR-2008-004, NVIDIA Corporation 2008.
- (2008) Techn. Rep. NVR-2008-004
- Bell, N.¹ Garland, M.²

6
- 0004972603
- PhD thesis, Rep. UCB/CSD-00-1104, University of California, Berkeley
- E.-J. Im: Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, Rep. UCB/CSD-00-1104, University of California, Berkeley, 2000.
- (2000) Optimizing the Performance of Sparse Matrix-vector Multiplication
- Im, E.-J.¹

7
- 55849145179
- Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression
- Sept. 8-12, Portland (USA)
- K. Kourtis, G. Goumas, N. Koziris: Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression. Proc. 37th International Conference on Parallel Processing, Sept. 8-12, 2008, Portland (USA), 511-519.
- (2008) Proc. 37th International Conference on Parallel Processing , pp. 511-519
- Kourtis, K.¹ Goumas, G.² Koziris, N.³

8
- 74049163483
- Techn. Rep. RC24704(W0812-047), IBM
- M. M. Baskaran, R. Bordawekar: Optimizing sparse matrix-vector multiplication on GPUs. Techn. Rep. RC24704(W0812-047), IBM 2008, http://domino.watson.ibm.com/library/CyberDig.nsf/papers/1D32F6D23B99F7898525752200618339/$File/rc24704.pdf.
- (2008) Optimizing Sparse Matrix-vector Multiplication on GPUs
- Baskaran, M.M.¹ Bordawekar, R.²

9
- 77957679421
- Model-driven autotuning of sparse matrix-vector multiply on GPUs
- Bangalore (India), Jan. 9-14, (R. Govindarajan, D. A. Padua, M. W. Hall, eds.), ACM 2010
- J. W. Choi, A. Singh, R. Vuduc: Model-driven autotuning of sparse matrix-vector multiply on GPUs. Proc. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010), Bangalore (India), Jan. 9-14, 2010 (R. Govindarajan, D. A. Padua, M. W. Hall, eds.), ACM 2010, 37-48.
- (2010) Proc. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010) , pp. 37-48
- Choi, J.W.¹ Singh, A.² Vuduc, R.³

10
- 70350356359
- Implementing blocked sparse matrix-vector multiplication on NVIDIA GPUs
- July 20-23, (K. Bertels, N. J. Dimopoulos, C. Silvano, S. Wong, eds.), Springer, Berlin 2009
- A. Monakov, A. Avetisyan: Implementing blocked sparse matrix-vector multiplication on NVIDIA GPUs. Proc. 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos (Greece), July 20-23, 2009 (K. Bertels, N. J. Dimopoulos, C. Silvano, S. Wong, eds.) Springer, Berlin 2009, 289-297.
- (2009) Proc. 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos (Greece) , pp. 289-297
- Monakov, A.¹ Avetisyan, A.²

11
- 0013269731
- The University of Florida sparse matrix collection
- T. A. Davis, Y. Hu: The University of Florida sparse matrix collection. NA Digest 92 (42), http://www.cise.ufl.edu/research/sparse/matrices/.
- NA Digest , vol.92 , Issue.42
- Davis, T.A.¹ Hu, Y.²

12
- 0004071611
- release 1, Techn. Rep., University of Kentucky
- Z. Bai, D. Day, J. Demmel, J. Dongarra: Test matrix collection (non-Hermitian eigenvalue problems). release 1, Techn. Rep., University of Kentucky, 1996, http://math.nist.gov/MatrixMarket/
- (1996) Test Matrix Collection (Non-Hermitian Eigenvalue Problems)
- Bai, Z.¹ Day, D.² Demmel, J.³ Dongarra, J.⁴

13
- 79953817719
- NVIDIA Corporation
- NVIDIA CUDA Programming Guide 3.0. NVIDIA Corporation, 2010, http://developer.download.nvidia.com/compute/cuda/3-0/toolkit/docs/NVIDIA-CUDA-ProgrammingGuide.pdf.
- (2010) NVIDIA CUDA Programming Guide 3.0

14
- 77952611196
- Concurrent number cruncher: A GPU implementation of a general sparse linear solver
- L. Buatois, G. Caumon, B. Levy: Concurrent number cruncher: a GPU implementation of a general sparse linear solver. Int. J. Parallel Emerg. Distrib. Syst. 24 (2009), 205-223.
- (2009) Int. J. Parallel Emerg. Distrib. Syst. , vol.24 , pp. 205-223
- Buatois, L.¹ Caumon, G.² Levy, B.³

15
- 80053929621
- Techn. Rep., AMD Press Release
- AMD "close to metal" technology unleashes the power of stream computing. Techn. Rep., AMD Press Release 2006.
- (2006) AMD "Close to Metal" Technology Unleashes the Power of Stream Computing

16
- 84857874326
- Master's thesis, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
- J. Vacata: GPGPU: General purpose computation on GPUs. Master's thesis, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, 2008.
- (2008) GPGPU: General Purpose Computation on GPUs
- Vacata, J.¹

17
- 77949577730
- Automatically tuning sparse matrix-vector multiplication for GPU architectures
- Pisa (Italy), Jan. 25-27, (Y. N. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, X. Martorell, eds.), Springer, Berlin 2010
- A. Monakov, A. Lokhmotov, A. Avetisyan: Automatically tuning sparse matrix-vector multiplication for GPU architectures. Proc. 5th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2010), Pisa (Italy), Jan. 25-27, 2010 (Y. N. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, X. Martorell, eds.), Springer, Berlin 2010, 111-125.
- (2010) Proc. 5th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC 2010) , pp. 111-125
- Monakov, A.¹ Lokhmotov, A.² Avetisyan, A.³

18
- 84857885283
- Nvidia, Cusp 0.1.1. http://code.google.com/p/cusp-library/, 2010.
- (2010) Nvidia, Cusp 0.1.1

19
- 34547744862
- When cache blocking of sparse matrix vector multiply works and why
- DOI 10.1007/s00200-007-0038-9
- R. Nishtala, R. W. Vuduc, J. W. Demmel, K. A. Yelick: When cache blocking of sparse matrix vector multiply works and why. Appl. Algebra Eng. Commun. Comput. 18 (2007), 297-311. (Pubitemid 47224626)
- (2007) Applicable Algebra in Engineering, Communications and Computing , vol.18 , Issue.3 , pp. 297-311
- Nishtala, R.¹ Vuduc, R.W.² Demmel, J.W.³ Yelick, K.A.⁴

20
- 0030491606
- An approximate minimum degree ordering algorithm
- P. Amestoy, T. A. Davis, I. S. Duff: An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Appl. 17 (1996), 886-905.
- (1996) SIAM J. Matrix Anal. Appl. , vol.17 , pp. 886-905
- Amestoy, P.¹ Davis, T.A.² Duff, I.S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.