Volume 25, Issue 1, 2014, Pages 116-125

High-level strategies for parallel shared-memory sparse matrix-vector multiplication

Author keywords

cache oblivious; high performance computing; Hilbert space filling curve; matrix reordering; NUMA architectures; shared memory parallelism; sparse matrix partitioning; Sparse matrix vector multiplication

Indexed keywords

COMPUTER ARCHITECTURE; MEMORY ARCHITECTURE; PARALLEL ARCHITECTURES; VECTOR SPACES;

EID: 84919494711     PISSN: 10459219     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPDS.2013.31     Document Type: Article
Times cited : (42)

References (40)
  • 1
    • 0000135303
    • Methods of conjugate gradients for solving linear systems
    • M.R. Hestenes and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. Research Nat'l Bureau of Standards, vol. 49, pp. 409-436, 1952.
    • (1952) J. Research Nat'l Bureau of Standards , vol.49 , pp. 409-436
    • Hestenes, M.R.1    Stiefel, E.2
  • 2
    • 0000048673
    • GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems
    • Y. Saad and M. Schultz, "GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems," SIAM J. Scientific and Statistical Computation, vol. 7, pp. 856-869, 1986.
    • (1986) SIAM J. Scientific and Statistical Computation , vol.7 , pp. 856-869
    • Saad, Y.1    Schultz, M.2
  • 3
    • 0000005482
    • BiCGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems
    • H. van der Vorst, "BiCGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Scientific and Statistical Computation, vol. 13, pp. 631-644, 1992.
    • (1992) SIAM J. Scientific and Statistical Computation , vol.13 , pp. 631-644
    • van der Vorst, H.1
  • 4
    • 67649522218
    • IDR(s): A family of simple and fast algorithms for solving large nonsymmetric linear systems
    • P. Sonneveld and M.B. van Gijzen, "IDR(s): A Family of Simple and Fast Algorithms for Solving Large Nonsymmetric Linear Systems," SIAM J. Scientific Computing, vol. 31, no. 2, pp. 1035-1062, 2008.
    • (2008) SIAM J. Scientific Computing , vol.31 , Issue.2 , pp. 1035-1062
    • Sonneveld, P.1    Van Gijzen, M.B.2
  • 5
    • 0034207349
    • A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems
    • G.L.G. Sleijpen and H.A. van der Vorst, "A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems," SIAM Rev., vol. 42, no. 2, pp. 267-293, 2000.
    • (2000) SIAM Rev , vol.42 , Issue.2 , pp. 267-293
    • Sleijpen, G.L.G.1    van der Vorst, H.A.2
  • 6
    • 84966231631
    • A look-ahead Lanczos algorithm for unsymmetric matrices
    • B.N. Parlett, D. Taylor, and Z. Liu, "A Look-Ahead Lanczos Algorithm for Unsymmetric Matrices," Math. of Computation, vol. 44, pp. 105-124, 1985.
    • (1985) Math. of Computation , vol.44 , pp. 105-124
    • Parlett, B.N.1    Taylor, D.2    Liu, Z.3
  • 7
    • 0039943513
    • LSQR: An algorithm for sparse linear equations and sparse least squares
    • C.C. Paige and M.A. Saunders, "LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares," ACM Trans. Math. Software, vol. 8, pp. 43-71, 1982.
    • (1982) ACM Trans. Math. Software , vol.8 , pp. 43-71
    • Paige, C.C.1    Saunders, M.A.2
  • 8
    • 0038589165
    • The anatomy of a large-scale hypertextual web search engine
    • S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Comput. Netw. ISDN Systems, vol. 30, pp. 107-117, 1998.
    • (1998) Comput. Netw. ISDN Systems , vol.30 , pp. 107-117
    • Brin, S.1    Page, L.2
  • 9
    • 0031269220
    • Improving the memory-system performance of sparse-matrix vector multiplication
    • S. Toledo, "Improving the Memory-System Performance of Sparse-Matrix Vector Multiplication," IBM J. Research and Development, vol. 41, no. 6, pp. 711-725, 1997.
    • (1997) IBM J. Research and Development , vol.41 , Issue.6 , pp. 711-725
    • Toledo, S.1
  • 10
    • 84949647432
    • Optimizing sparse matrix-vector multiplication for register reuse in SPARSITY
    • E.-J. Im and K.A. Yelick, "Optimizing Sparse Matrix-Vector Multiplication for Register Reuse in SPARSITY," Proc. Int'l Conf. Computational Science, Part I, pp. 127-136, 2001.
    • (2001) Proc. Int'l Conf. Computational Science , pp. 127-136
    • Im, E.-J.1    Yelick, K.A.2
  • 11
    • 24344485098
    • OSKI: A library of automatically tuned sparse matrix kernels
    • R. Vuduc, J.W. Demmel, and K.A. Yelick, "OSKI: A Library of Automatically Tuned Sparse Matrix Kernels," J. Physics Conf. Series, vol. 16, pp. 521-530, 2005.
    • (2005) J. Physics Conf. Series , vol.16 , pp. 521-530
    • Vuduc, R.1    Demmel, J.W.2    Yelick, K.A.3
  • 12
    • 18744388753
    • Templates for the solution of algebraic eigenvalue problems: A practical guide
    • Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, 2000.
    • (2000) SIAM
    • Bai, Z.1    Demmel, J.2    Dongarra, J.3    Ruhe, A.4    van der Vorst, H.5
  • 14
    • 81355148805
    • Two-dimensional cache-oblivious sparse matrix-vector multiplication
    • A.N. Yzelman and R.H. Bisseling, "Two-Dimensional Cache-Oblivious Sparse Matrix-Vector Multiplication," Parallel Computing, vol. 37, no. 12, pp. 806-819, http://www.sciencedirect.com/science/article/pii/S0167819111001062, 2011.
    • (2011) Parallel Computing , vol.37 , Issue.12 , pp. 806-819
    • Yzelman, A.N.1    Bisseling, R.H.2
  • 15
    • 60949098907
    • Optimization of sparse matrix-vector multiplication on emerging multicore platforms
    • S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms," Parallel Computing, vol. 35, no. 3, pp. 178-194, http://www.sciencedirect.com/science/article/pii/S0167819108001403, 2009.
    • (2009) Parallel Computing , vol.35 , Issue.3 , pp. 178-194
    • Williams, S.1    Oliker, L.2    Vuduc, R.3    Shalf, J.4    Yelick, K.5    Demmel, J.6
  • 16
    • 17444414573
    • A two-dimensional data distribution method for parallel sparse matrix-vector multiplication
    • B. Vastenhouw and R.H. Bisseling, "A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication," SIAM Rev., vol. 47, no. 1, pp. 67-95, 2005.
    • (2005) SIAM Rev , vol.47 , Issue.1 , pp. 67-95
    • Vastenhouw, B.1    Bisseling, R.H.2
  • 18
    • 84944061403
    • Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs
    • Springer
    • F. Pellegrini and J. Roman, "Scotch: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs," High-Performance Computing and Networking, pp. 493-498, Springer, 1996.
    • (1996) High-Performance Computing and Networking , pp. 493-498
    • Pellegrini, F.1    Roman, J.2
  • 22
    • 0033360524
    • Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication
    • July
    • Ü.V. Çatalyürek and C. Aykanat, "Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication," IEEE Trans. Parallel Distributed Systems, vol. 10, no. 7, pp. 673-693, July 1999.
    • (1999) IEEE Trans. Parallel Distributed Systems , vol.10 , Issue.7 , pp. 673-693
    • Çatalyürek, Ü.V.1    Aykanat, C.2
  • 24
    • 0031120395
    • Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines
    • D.A. Burgess and M.B. Giles, "Renumbering Unstructured Grids to Improve the Performance of Codes on Hierarchical Memory Machines," Advances in Eng. Software, vol. 28, no. 3, pp. 189-201, 1997.
    • (1997) Advances in Eng. Software , vol.28 , Issue.3 , pp. 189-201
    • Burgess, D.A.1    Giles, M.B.2
  • 26
    • 77954707501
    • Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods
    • A.N. Yzelman and R.H. Bisseling, "Cache-Oblivious Sparse Matrix-Vector Multiplication by Using Sparse Matrix Partitioning Methods," SIAM J. Scientific Computing, vol. 31, no. 4, pp. 3128-3154, 2009.
    • (2009) SIAM J. Scientific Computing , vol.31 , Issue.4 , pp. 3128-3154
    • Yzelman, A.N.1    Bisseling, R.H.2
  • 27
  • 28
    • 84930675361
    • A cache-oblivious sparse matrix-vector multiplication scheme based on the Hilbert curve
    • M. Günther, A. Bartel, M. Brunk, S. Schöps, and M. Striebel, eds. Springer
    • A.N. Yzelman and R.H. Bisseling, "A Cache-Oblivious Sparse Matrix-Vector Multiplication Scheme Based on the Hilbert Curve," Progress in Industrial Mathematics at ECMI 2010, M. Günther, A. Bartel, M. Brunk, S. Schöps, and M. Striebel, eds., pp. 627-634, http://www.springer.com/Math./applications/book/978-3-642-25099-6, Springer, 2012.
    • (2012) Progress in Industrial Mathematics at ECMI 2010 , pp. 627-634
    • Yzelman, A.N.1    Bisseling, R.H.2
  • 29
    • 85031264203
    • Improving performance of sparse matrix-vector multiplication
    • A. Pinar and M.T. Heath, "Improving Performance of Sparse Matrix-Vector Multiplication," Proc. IEEE ACM Supercomputing Conf., Article 30, 1999.
    • (1999) Proc. IEEE ACM Supercomputing Conf.
    • Pinar, A.1    Heath, M.T.2
  • 32
    • 84858077252
    • An object-oriented bulk synchronous parallel library for multicore programming
    • A.N. Yzelman and R.H. Bisseling, "An Object-Oriented Bulk Synchronous Parallel Library for Multicore Programming," Concurrency and Computation: Practice and Experience, vol. 24, no. 5, pp. 533-553, http://dx.doi.org/10.1002/cpe.1843, 2012.
    • (2012) Concurrency and Computation: Practice and Experience , vol.24 , Issue.5 , pp. 533-553
    • Yzelman, A.N.1    Bisseling, R.H.2
  • 35
    • 70449690102
    • Analyzing block locality in Morton-order and Morton-hybrid matrices
    • K.P. Lorton and D.S. Wise, "Analyzing Block Locality in Morton-Order and Morton-Hybrid Matrices," ACM SIGARCH Computer Architecture News, vol. 35, no. 4, pp. 6-12, 2007.
    • (2007) ACM SIGARCH Computer Architecture News , vol.35 , Issue.4 , pp. 6-12
    • Lorton, K.P.1    Wise, D.S.2
  • 39
    • 84957597788
    • Berkeley Benchmarking and Optimization Group
    • Berkeley Benchmarking and Optimization Group, "pOSKI: Parallel Optimized Sparse Kernel Interface," http://bebop.cs.berkeley.edu/poski/index.php, 2012.
    • (2012) POSKI: Parallel Optimized Sparse Kernel Interface


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.