SCOPUS Information Search Platform

IEEE Transactions on Parallel and Distributed Systems

Volume 25, Issue 1, 2014, Pages 116-125

High-level strategies for parallel shared-memory sparse matrix-vector multiplication

(2) Yzelman, Albert Jan Nicholas a,b Roose, Dirk b

Author keywords

cache oblivious; high performance computing; Hilbert space filling curve; matrix reordering; NUMA architectures; shared memory parallelism; sparse matrix partitioning; Sparse matrix vector multiplication

Indexed keywords

COMPUTER ARCHITECTURE; MEMORY ARCHITECTURE; PARALLEL ARCHITECTURES; VECTOR SPACES;

CACHE-OBLIVIOUS; HIGH PERFORMANCE COMPUTING; HILBERT SPACE FILLING CURVES; MATRIX REORDERING; NUMA ARCHITECTURES; SHARED MEMORY PARALLELISM; SPARSE MATRICES; SPARSE MATRIX-VECTOR MULTIPLICATION;

CACHE MEMORY;

EID: 84919494711 PISSN: 10459219 EISSN: None Source Type: Journal
DOI: 10.1109/TPDS.2013.31 Document Type: Article

Times cited : (42)

References (40)

1
- 0000135303
- Methods of conjugate gradients for solving linear systems
- M.R. Hestenes and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems, " J. Research Nat'l Bureau of Standards, vol. 49, pp. 409-436, 1952.
- (1952) J. Research Nat'l Bureau of Standards , vol.49 , pp. 409-436
- Hestenes, M.R.¹ Stiefel, E.²

2
- 0000048673
- GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems
- Y. Saad and M. Schultz, "GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, " SIAM J. Scientific and Statistical Computation, vol. 7, pp. 856-869, 1986.
- (1986) SIAM J. Scientific and Statistical Computation , vol.7 , pp. 856-869
- Saad, Y.¹ Schultz, M.²

3
- 0000005482
- BiCGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems
- H. van der Vorst, "BiCGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems, " SIAM J. Scientific and Statistical Computation, vol. 13, pp. 631-644, 1992.
- (1992) SIAM J. Scientific and Statistical Computation , vol.13 , pp. 631-644
- Van der Vorst, H.¹

4
- 67649522218
- IDR(s): A family of simple and fast algorithms for solving large nonsymmetric linear systems
- P. Sonneveld and M.B. van Gijzen, "IDR(s): A Family of Simple and Fast Algorithms for Solving Large Nonsymmetric Linear Systems, " SIAM J. Scientific Computing, vol. 31, no. 2, pp. 1035-1062, 2008.
- (2008) SIAM J. Scientific Computing , vol.31 , Issue.2 , pp. 1035-1062
- Sonneveld, P.¹ Van Gijzen, M.B.²

5
- 0034207349
- A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems
- G.L.G. Sleijpen and H.A. van der Vorst, "A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems, " SIAM Rev., vol. 42, no. 2, pp. 267-293, 2000.
- (2000) SIAM Rev , vol.42 , Issue.2 , pp. 267-293
- Sleijpen, G.L.G.¹ Van der Vorst, H.A.²

6
- 84966231631
- A look-ahead Lanczos algorithm for unsymmetric matrices
- B.N. Parlett, D. Taylor, and Z. Liu, "A Look-Ahead Lanczos Algorithm for Unsymmetric Matrices, " Math. of Computation, vol. 44, pp. 105-124, 1985.
- (1985) Math. of Computation , vol.44 , pp. 105-124
- Parlett, B.N.¹ Taylor, D.² Liu, Z.³

7
- 0039943513
- LSQR: An algorithm for sparse linear equations and sparse least squares
- C.C. Paige and M.A. Saunders, "LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares, " ACM Trans. Math. Software, vol. 8, pp. 43-71, 1982.
- (1982) ACM Trans. Math. Software , vol.8 , pp. 43-71
- Paige, C.C.¹ Saunders, M.A.²

8
- 0038589165
- The anatomy of a large-scale hypertextual web search engine
- S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine, " Comput. Netw. ISDN Systems, vol. 30, pp. 107-117, 1998.
- (1998) Comput. Netw. ISDN Systems , vol.30 , pp. 107-117
- Brin, S.¹ Page, L.²

9
- 0031269220
- Improving the memory-system performance of sparse-matrix vector multiplication
- S. Toledo, "Improving the Memory-System Performance of Sparse-Matrix Vector Multiplication, " IBM J. Research and Development, vol. 41, no. 6, pp. 711-725, 1997.
- (1997) IBM J. Research and Development , vol.41 , Issue.6 , pp. 711-725
- Toledo, S.¹

10
- 84949647432
- Optimizing sparse matrix-vector multiplication for register reuse in SPARSITY
- E.-J. Im and K.A. Yelick, "Optimizing Sparse Matrix-Vector Multiplication for Register Reuse in SPARSITY, " Proc. Int'l Conf. Computational Science, Part I, pp. 127-136, 2001.
- (2001) Proc. Int'l Conf. Computational Science , pp. 127-136
- Im, E.-J.¹ Yelick, K.A.²

11
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- R. Vuduc, J.W. Demmel, and K.A. Yelick, "OSKI: A Library of Automatically Tuned Sparse Matrix Kernels, " J. Physics Conf. Series, vol. 16, pp. 521-530, 2005.
- (2005) J. Physics Conf. Series , vol.16 , pp. 521-530
- Vuduc, R.¹ Demmel, J.W.² Yelick, K.A.³

12
- 18744388753
- Templates for the solution of algebraic eigenvalue problems: A practical guide
- Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, 2000.
- (2000) SIAM
- Bai, Z.¹ Demmel, J.² Dongarra, J.³ Ruhe, A.⁴ Van der Vorst, H.⁵

13
- 17444432688
- Master's thesis, Dept. of Math., Utrecht Univ., July
- J. Koster, "Parallel Templates for Numerical Linear Algebra, A High-Performance Computation Library, " master's thesis, Dept. of Math., Utrecht Univ., July 2002.
- (2002) Parallel Templates for Numerical Linear Algebra, A High-Performance Computation Library
- Koster, J.¹

14
- 81355148805
- Two-dimensional cache-oblivious sparse matrix-vector multiplication
- A.N. Yzelman and R.H. Bisseling, "Two-Dimensional Cache-Oblivious Sparse Matrix-Vector Multiplication, " Parallel Computing, vol. 37, no. 12, pp. 806-819, http://www.sciencedirect.com/science/article/pii/S0167819111001062, 2011.
- (2011) Parallel Computing , vol.37 , Issue.12 , pp. 806-819
- Yzelman, A.N.¹ Bisseling, R.H.²

15
- 60949098907
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms, " Parallel Computing, vol. 35, no. 3, pp. 178-194, http://www.sciencedirect.com/science/article/pii/S0167819108001403, 2009.
- (2009) Parallel Computing , vol.35 , Issue.3 , pp. 178-194
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

16
- 17444414573
- A two-dimensional data distribution method for parallel sparse matrix-vector multiplication
- B. Vastenhouw and R.H. Bisseling, "A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication, " SIAM Rev., vol. 47, no. 1, pp. 67-95, 2005.
- (2005) SIAM Rev , vol.47 , Issue.1 , pp. 67-95
- Vastenhouw, B.¹ Bisseling, R.H.²

17
- 33847119013
- Parallel hypergraph partitioning for scientific computing
- K.D. Devine, E.G. Boman, R.T. Heaphy, R.H. Bisseling, and Ü.V. Çatalyürek, "Parallel Hypergraph Partitioning for Scientific Computing, " Proc. IEEE Int'l Parallel and Distributed Processing Symp., 2006.
- (2006) Proc. IEEE Int'l Parallel and Distributed Processing Symp.
- Devine, K.D.¹ Boman, E.G.² Heaphy, R.T.³ Bisseling, R.H.⁴ Çatalyürek, Ü.V.⁵

18
- 84944061403
- Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs
- Springer
- F. Pellegrini and J. Roman, "Scotch: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs, " High-Performance Computing and Networking, pp. 493-498, Springer, 1996.
- (1996) High-Performance Computing and Networking , pp. 493-498
- Pellegrini, F.¹ Roman, J.²

19
- 19644382744
- A parallel algorithm for multilevel k-way hypergraph partitioning
- A. Trifunovic and W.J. Knottenbelt, "A Parallel Algorithm for Multilevel k-Way Hypergraph Partitioning, " Proc. IEEE Third Int'l Symp. Parallel and Distributed Computing, pp. 114-121, 2004.
- (2004) Proc. IEEE Third Int'l Symp. Parallel and Distributed Computing , pp. 114-121
- Trifunovic, A.¹ Knottenbelt, W.J.²

20
- 35048838799
- A fine-grain hypergraph model for 2D decomposition of sparse matrices
- Ü.V. Çatalyürek and C. Aykanat, "A Fine-Grain Hypergraph Model for 2D Decomposition of Sparse Matrices, " Proc. IEEE Eighth Int'l Workshop Solving Irregularly Structured Problems in Parallel, p. 118, 2001.
- (2001) Proc. IEEE Eighth Int'l Workshop Solving Irregularly Structured Problems in Parallel , p. 118
- Çatalyürek, Ü.V.¹ Aykanat, C.²

21
- 0003828819
- John Wiley and Sons
- T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout. John Wiley and Sons, 1990.
- (1990) Combinatorial Algorithms for Integrated Circuit Layout
- Lengauer, T.¹

22
- 0033360524
- Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication
- July
- Ü.V. Çatalyürek and C. Aykanat, "Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication, " IEEE Trans. Parallel Distributed Systems, vol. 10, no. 7, pp. 673-693, July 1999.
- (1999) IEEE Trans. Parallel Distributed Systems , vol.10 , Issue.7 , pp. 673-693
- Çatalyürek, Ü.V.¹ Aykanat, C.²

23
- 84906683087
- Two-dimensional approaches to sparse matrix partitioning
- U. Naumann and O. Schenk, eds. Chapman & Hall/CRC Press
- R.H. Bisseling, B.O. Fagginger Auer, A.N. Yzelman, T. van Leeuwen, and Ü.V. Çatalyürek, "Two-Dimensional Approaches to Sparse Matrix Partitioning, " Combinatorial Scientific Computing, U. Naumann and O. Schenk, eds., pp. 321-349, Chapman & Hall/CRC Press, 2012.
- (2012) Combinatorial Scientific Computing , pp. 321-349
- Bisseling, R.H.¹ Fagginger Auer, B.O.² Yzelman, A.N.³ Van Leeuwen, T.⁴ Çatalyürek, Ü.V.⁵

24
- 0031120395
- Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines
- D.A. Burgess and M.B. Giles, "Renumbering Unstructured Grids to Improve the Performance of Codes on Hierarchical Memory Machines, " Advances in Eng. Software, vol. 28, no. 3, pp. 189-201, 1997.
- (1997) Advances in Eng. Software , vol.28 , Issue.3 , pp. 189-201
- Burgess, D.A.¹ Giles, M.B.²

25
- 0031364322
- On improving the performance of sparse matrix-vector multiplication
- J.B. White, III and P. Sadayappan, "On Improving the Performance of Sparse Matrix-Vector Multiplication, " Proc. IEEE Fourth Int'l Conf. High-Performance Computing, pp. 66-71, 1997.
- (1997) Proc. IEEE Fourth Int'l Conf. High-Performance Computing , pp. 66-71
- White, J.B.¹ Sadayappan, P.²

26
- 77954707501
- Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods
- A.N. Yzelman and R.H. Bisseling, "Cache-Oblivious Sparse Matrix-Vector Multiplication by Using Sparse Matrix Partitioning Methods, " SIAM J. Scientific Computing, vol. 31, no. 4, pp. 3128-3154, 2009.
- (2009) SIAM J. Scientific Computing , vol.31 , Issue.4 , pp. 3128-3154
- Yzelman, A.N.¹ Bisseling, R.H.²

27
- 34250347767
- A Hilbert-order multiplication scheme for unstructured sparse matrices
- G. Haase, M. Liebmann, and G. Plank, "A Hilbert-Order Multiplication Scheme for Unstructured Sparse Matrices, " Int'l J. Parallel, Emergent and Distributed Systems, vol. 22, no. 4, pp. 213-220, 2007.
- (2007) Int'l J. Parallel, Emergent and Distributed Systems , vol.22 , Issue.4 , pp. 213-220
- Haase, G.¹ Liebmann, M.² Plank, G.³

28
- 84930675361
- A cache-oblivious sparse matrix-vector multiplication scheme based on the Hilbert curve
- M. Günther, A. Bartel, M. Brunk, S. Schöps, and M. Striebel, eds. Springer
- A.N. Yzelman and R.H. Bisseling, "A Cache-Oblivious Sparse Matrix-Vector Multiplication Scheme Based on the Hilbert Curve, " Progress in Industrial Mathematics at ECMI 2010, M. Günther, A. Bartel, M. Brunk, S. Schöps, and M. Striebel, eds., pp. 627-634, http://www.springer.com/Math./applications/book/978-3-642-25099-6, Springer, 2012.
- (2012) Progress in Industrial Mathematics at ECMI 2010 , pp. 627-634
- Yzelman, A.N.¹ Bisseling, R.H.²

29
- 85031264203
- Improving performance of sparse matrix-vector multiplication
- A. Pinar and M.T. Heath, "Improving Performance of Sparse Matrix-Vector Multiplication, " Proc. IEEE ACM Supercomputing Conf., Article 30, 1999.
- (1999) Proc. IEEE ACM Supercomputing Conf.
- Pinar, A.¹ Heath, M.T.²

30
- 33646389518
- Fast sparse matrix-vector multiplication by exploiting variable block structure
- R. Vuduc and H.J. Moon, "Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure, " Proc. First Int'l Conf. High Performance Computing and Comm. (HPCC '05), pp. 807-816, 2005.
- (2005) Proc. First Int'l Conf. High Performance Computing and Comm. (HPCC '05) , pp. 807-816
- Vuduc, R.¹ Moon, H.J.²

31
- 1542501019
- Sparsity: Optimization framework for sparse matrix kernels
- E.-J. Im, K. Yelick, and R. Vuduc, "Sparsity: Optimization Framework for Sparse Matrix Kernels, " Int'l J. High Performance Computing Applications, vol. 18, no. 1, pp. 135-158, 2004.
- (2004) Int'l J. High Performance Computing Applications , vol.18 , Issue.1 , pp. 135-158
- Im, E.-J.¹ Yelick, K.² Vuduc, R.³

32
- 84858077252
- An object-oriented bulk synchronous parallel library for multicore programming
- A.N. Yzelman and R.H. Bisseling, "An Object-Oriented Bulk Synchronous Parallel Library for Multicore Programming, " Concurrency and Computation: Practice and Experience, vol. 24, no. 5, pp. 533-553, http://dx.doi.org/10.1002/cpe.1843, 2012.
- (2012) Concurrency and Computation: Practice and Experience , vol.24 , Issue.5 , pp. 533-553
- Yzelman, A.N.¹ Bisseling, R.H.²

33
- 84871135791
- PhD dissertation Utrecht Univ.
- A.N. Yzelman, "Fast Sparse Matrix-Vector Multiplication by Partitioning and Reordering, " PhD dissertation, Utrecht Univ., 2011.
- (2011) Fast Sparse Matrix-Vector Multiplication by Partitioning and Reordering
- Yzelman, A.N.¹

34
- 0003460690
- Technical report, IBM, Mar.
- G. Morton, "A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing, " technical report, IBM, Mar. 1966.
- (1966) A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing
- Morton, G.¹

35
- 70449690102
- Analyzing block locality in Morton-order and Morton-hybrid matrices
- K.P. Lorton and D.S. Wise, "Analyzing Block Locality in Morton-Order and Morton-Hybrid Matrices, " ACM SIGARCH Computer Architecture News, vol. 35, no. 4, pp. 6-12, 2007.
- (2007) ACM SIGARCH Computer Architecture News , vol.35 , Issue.4 , pp. 6-12
- Lorton, K.P.¹ Wise, D.S.²

36
- 79551511651
- Utilizing recursive storage in sparse matrix-vector multiplication-preliminary considerations
- M. Martone, S. Filippone, S. Tucci, M. Paprzycki, and M. Ganzha, "Utilizing Recursive Storage in Sparse Matrix-Vector Multiplication-Preliminary Considerations, " Proc. ISCA 25th Int'l Conf. Computers and Their Applications (CATA '10), pp. 300-305, 2010.
- (2010) Proc. ISCA 25th Int'l Conf. Computers and Their Applications (CATA '10) , pp. 300-305
- Martone, M.¹ Filippone, S.² Tucci, S.³ Paprzycki, M.⁴ Ganzha, M.⁵

37
- 70449629588
- Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks
- A. Buluç, J.T. Fineman, M. Frigo, J.R. Gilbert, and C.E. Leiserson, "Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks, " Proc. 21st Ann. Symp. Parallelism in Algorithms and Architectures (SPAA '09), pp. 233-244, 2009.
- (2009) Proc. 21st Ann. Symp. Parallelism in Algorithms and Architectures (SPAA '09) , pp. 233-244
- Buluç, A.¹ Fineman, J.T.² Frigo, M.³ Gilbert, J.R.⁴ Leiserson, C.E.⁵

38
- 0029191296
- Cilk: An efficient multithreaded runtime system
- Aug.
- R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, and Y. Zhou, "Cilk: An Efficient Multithreaded Runtime System, " ACM SIGPLAN Notices, vol. 30, no. 8, pp. 207-216, http://doi.acm.org/10.1145/209937.209958, Aug. 1995.
- (1995) ACM SIGPLAN Notices , vol.30 , Issue.8 , pp. 207-216
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

39
- 84957597788
- Berkeley Benchmarking and Optimization Group
- Berkeley Benchmarking and Optimization Group, "pOSKI: Parallel Optimized Sparse Kernel Interface, " http://bebop.cs.berkeley.edu/poski/index.php, 2012.
- (2012) pOSKI: Parallel Optimized Sparse Kernel Interface

40
- 80053263342
- Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication
- A. Buluç, S. Williams, L. Oliker, and J. Demmel, "Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication, " Proc. IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS '11), pp. 721-733, 2011.
- (2011) Proc. IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS '11) , pp. 721-733
- Buluç, A.¹ Williams, S.² Oliker, L.³ Demmel, J.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.