SCOPUS Information Search Platform

Proceedings of the International Conference on Supercomputing

Volume 2002-November, 2002

Performance optimizations and bounds for sparse matrix-vector multiply

(6) Vuduc, Richardᵃ; Demmel, James W.ᵃ; Yelick, Katherine A.ᵃ; Kamil, Shoaibᵃ; Nishtala, Rajeshᵃ; Lee, Benjaminᵃ

ᵃ UNIVERSITY OF CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

CACHE MEMORY; MATRIX ALGEBRA; OPTIMIZATION; VECTORS;

BLOCK SIZES; COMPUTATIONAL KERNELS; PERFORMANCE; PERFORMANCE BOUNDS; PERFORMANCE OPTIMIZATIONS; PERFORMANCE TUNING; SCIENTIFIC APPLICATIONS; SPARSE MATRICES; SPARSE MATRIX-VECTOR MULTIPLY; STRUCTURE REORGANIZATIONS;

FINITE ELEMENT METHOD;

EID: 84990830919 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SC.2002.10025 Document Type: Conference Paper

Times cited : (75)

References (32)

1
- 0003473816
- R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia, PA, 1994.
- (1994) Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
- Barrett, R.¹ Berry, M.² Chan, T.F.³ Demmel, J.⁴ Donato, J.⁵ Dongarra, J.⁶ Eijkhout, V.⁷ Pozo, R.⁸ Romine, C.⁹ van der Vorst, H.¹⁰

2
- 0032606267
- Automatic nonzero structure analysis
- A. J. C. Bik and H. A. G. Wijshoff. Automatic nonzero structure analysis. SIAM Journal on Computing, 28(5):1576-1587, 1999.
- (1999) SIAM Journal on Computing , vol.28 , Issue.5 , pp. 1576-1587
- Bik, A.J.C.¹ Wijshoff, H.A.G.²

3
- 0030661485
- Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
- J. Bilmes, K. Asanović, C. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In Proceedings of the International Conference on Supercomputing, Vienna, Austria, July 1997. ACM SIGARCH. See http://www.icsi.berkeley.edu/~bilmes/phipac.
- (1997) Proceedings of the International Conference on Supercomputing
- Bilmes, J.¹ Asanović, K.² Chin, C.³ Demmel, J.⁴

4
- 84891471315
- S. Blackford, G. Corliss, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, C. Hu, W. Kahan, L. Kaufman, B. Kearfott, F. Krogh, X. Li, Z. Maany, A. Petitet, R. Pozo, K. Remington, W. Walster, C. Whaley, and J. W. von Gudenberg. Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum, 2001. www.netlib.org/blast.
- (2001) Document for the Basic Linear Algebra Subprograms (BLAS) Standard: BLAS Technical Forum
- Blackford, S.¹ Corliss, G.² Demmel, J.³ Dongarra, J.⁴ Duff, I.⁵ Hammarling, S.⁶ Henry, G.⁷ Heroux, M.⁸ Hu, C.⁹ Kahan, W.¹⁰ Kaufman, L.¹¹ Kearfott, B.¹² Krogh, F.¹³ Li, X.¹⁴ Maany, Z.¹⁵ Petitet, A.¹⁶ Pozo, R.¹⁷ Remington, K.¹⁸ Walster, W.¹⁹ Whaley, C.²⁰ von Gudenberg, J.W.²¹

5
- 0000488282
- The matrix market: A web resource for test matrix collections
- R. F. Boisvert, R. Pozo, K. Remington, R. Barrett, and J. J. Dongarra. The Matrix Market: A web resource for test matrix collections. In R. F. Boisvert, editor, Quality of Numerical Software, Assessment and Enhancement, pages 125-137, London, 1997. Chapman and Hall. math.nist.gov/MatrixMarket.
- (1997) Quality of Numerical Software, Assessment and Enhancement , pp. 125-137
- Boisvert, R.F.¹ Pozo, R.² Remington, K.³ Barrett, R.⁴ Dongarra, J.J.⁵

6
- 12844276307
- A scalable cross-platform infrastructure for application performance tuning using hardware counters
- S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.
- (2000) Proceedings of Supercomputing
- Browne, S.¹ Dongarra, J.² Garner, N.³ London, K.⁴ Mucci, P.⁵

7
- 84964748976
- Compiler blockability of numerical algorithms
- S. Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Proceedings of Supercomputing, pages 114-124, 1992.
- (1992) Proceedings of Supercomputing , pp. 114-124
- Carr, S.¹ Kennedy, K.²

8
- 0034832018
- Exact analysis of the cache behavior of nested loops
- S. Chatterjee, E. Parker, P. J. Hanlon, and A. R. Lebeck. Exact analysis of the cache behavior of nested loops. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, pages 286-297, Snowbird, UT, USA, June 2001.
- (2001) Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation , pp. 286-297
- Chatterjee, S.¹ Parker, E.² Hanlon, P.J.³ Lebeck, A.R.⁴

9
- 85040122690
- T. Davis. UF Sparse Matrix Collection. www.cise.ufl.edu/research/sparse/matrices.
- UF Sparse Matrix Collection
- Davis, T.¹

10
- 0033189408
- Memory hierarchy performance prediction for sparse blocked algorithms
- B. B. Fraguela, R. Doallo, and E. L. Zapata. Memory hierarchy performance prediction for sparse blocked algorithms. Parallel Processing Letters, 9(3), March 1999.
- (1999) Parallel Processing Letters , vol.9 , Issue.3
- Fraguela, B.B.¹ Doallo, R.² Zapata, E.L.³

11
- 0001714824
- Cache miss equations: A compiler framework for analyzing and tuning memory behavior
- S. Ghosh, M. Martonosi, and S. Malik. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems, 21(4):703-746, 1999.
- (1999) ACM Transactions on Programming Languages and Systems , vol.21 , Issue.4 , pp. 703-746
- Ghosh, S.¹ Martonosi, M.² Malik, S.³

12
- 0005271318
- Towards realistic bounds for implicit CFD codes
- W. D. Gropp, D. K. Kaushik, D. E. Keyes, and B. F. Smith. Towards realistic bounds for implicit CFD codes. In Proceedings of Parallel Computational Fluid Dynamics, pages 241-248, 1999.
- (1999) Proceedings of Parallel Computational Fluid Dynamics , pp. 241-248
- Gropp, W.D.¹ Kaushik, D.K.² Keyes, D.E.³ Smith, B.F.⁴

13
- 34547734670
- Fracture mechanics on the Intel Itanium architecture: A case study
- G. Heber, A. J. Dolgert, M. Alt, K. A. Mazurkiewicz, and L. Stringer. Fracture mechanics on the Intel Itanium architecture: A case study. In Workshop on EPIC Architectures and Compiler Technology (ACM MICRO 34), Austin, TX, December 2001.
- (2001) Workshop on EPIC Architectures and Compiler Technology (ACM MICRO 34)
- Heber, G.¹ Dolgert, A.J.² Alt, M.³ Mazurkiewicz, K.A.⁴ Stringer, L.⁵

14
- 35448943938
- Flexible, high-performance matrix multiply via a self-modifying runtime code
- G. M. Henry. Flexible, high-performance matrix multiply via a self-modifying runtime code. Technical Report TR-2001-46, University of Texas at Austin, December 2001.
- (2001) Technical Report TR-2001-46
- Henry, G.M.¹

15
- 22644452418
- Modeling and improving locality for irregular problems: Sparse matrix-vector product on cache memories as a case study
- D. B. Heras, V. B. Perez, J. C. C. Dominguez, and F. F. Rivera. Modeling and improving locality for irregular problems: sparse matrix-vector product on cache memories as a case study. In HPCN Europe, pages 201-210, 1999.
- (1999) HPCN Europe , pp. 201-210
- Heras, D.B.¹ Perez, V.B.² Dominguez, J.C.C.³ Rivera, F.F.⁴

16
- 0004972603
- E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, May 2000.
- (2000) Optimizing the Performance of Sparse Matrix-Vector Multiplication
- Im, E.-J.¹

17
- 84949647432
- Optimizing sparse matrix computations for register reuse in SPARSITY
- E.-J. Im and K. A. Yelick. Optimizing sparse matrix computations for register reuse in SPARSITY. In Proceedings of the International Conference on Computational Science, volume 2073 of LNCS, pages 127-136. Springer, May 2001.
- (2001) Proceedings of the International Conference on Computational Science , vol.2073 , pp. 127-136
- Im, E.-J.¹ Yelick, K.A.²

18
- 1142294743
- Intel. Intel Itanium processor reference manual for software optimization, November 2001.
- (2001) Intel Itanium Processor Reference Manual for Software Optimization

19
- 0031364322
- On improving the performance of sparse matrix-vector multiplication
- J. B. White, III and P. Sadayappan. On improving the performance of sparse matrix-vector multiplication. In Proceedings of the International Conference on High-Performance Computing, 1997.
- (1997) Proceedings of the International Conference on High-Performance Computing
- White, J.B., III¹ Sadayappan, P.²

20
- 0038998034
- Memory bandwidth and machine balance in current high performance computers
- J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. Newsletter of the IEEE Technical Committee on Computer Architecture, December 1995. http://tab.computer.org/tcca/NEWS/DEC95/DEC95.HTM.
- (1995) Newsletter of the IEEE Technical Committee on Computer Architecture
- McCalpin, J.D.¹

21
- 0345025793
- J. D. McCalpin. STREAM: Measuring sustainable memory bandwidth in high performance computers, 1995. http://www.cs.virginia.edu/stream.
- (1995) STREAM: Measuring Sustainable Memory Bandwidth in High Performance Computers
- McCalpin, J.D.¹

22
- 0030190854
- Improving data locality with loop transformations
- K. S. McKinley, S. Carr, and C.-W. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424-453, July 1996.
- (1996) ACM Transactions on Programming Languages and Systems , vol.18 , Issue.4 , pp. 424-453
- McKinley, K.S.¹ Carr, S.² Tseng, C.-W.³

23
- 0029713939
- Algorithms for sparse matrix computations on high-performance workstations
- J. J. Navarro, E. García, J. L. Larriba-Pey, and T. Juan. Algorithms for sparse matrix computations on high-performance workstations. In Proceedings of the 10th ACM International Conference on Supercomputing, pages 301-308, Philadelphia, PA, USA, May 1996.
- (1996) Proceedings of the 10th ACM International Conference on Supercomputing , pp. 301-308
- Navarro, J.J.¹ García, E.² Larriba-Pey, J.L.³ Juan, T.⁴

24
- 3042576437
- Improving performance of sparse matrix-vector multiplication
- A. Pinar and M. Heath. Improving performance of sparse matrix-vector multiplication. In Proceedings of Supercomputing, 1999.
- (1999) Proceedings of Supercomputing
- Pinar, A.¹ Heath, M.²

25
- 84951088709
- Generation of efficient code for sparse matrix computations
- W. Pugh and T. Shpeisman. Generation of efficient code for sparse matrix computations. In Proceedings of the 11th Workshop on Languages and Compilers for Parallel Computing, LNCS, August 1998.
- (1998) Proceedings of the 11th Workshop on Languages and Compilers for Parallel Computing
- Pugh, W.¹ Shpeisman, T.²

26
- 0041391794
- K. Remington and R. Pozo. NIST Sparse BLAS: User's Guide. Technical report, NIST, 1996. gams.nist.gov/spblas.
- (1996) NIST Sparse BLAS: User's Guide
- Remington, K.¹ Pozo, R.²

27
- 1842793644
- R. H. Saavedra-Barrera. CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking. PhD thesis, University of California, Berkeley, February 1992.
- (1992) CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Benchmarking
- Saavedra-Barrera, R.H.¹

28
- 1542601850
- P. Stodghill. A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes. PhD thesis, Cornell University, August 1997.
- (1997) A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes
- Stodghill, P.¹

29
- 0000541430
- Characterizing the behavior of sparse algorithms on caches
- O. Temam and W. Jalby. Characterizing the behavior of sparse algorithms on caches. In Proceedings of Supercomputing'92, 1992.
- (1992) Proceedings of Supercomputing'92
- Temam, O.¹ Jalby, W.²

30
- 0039958691
- Improving memory-system performance of sparse matrix-vector multiplication
- S. Toledo. Improving memory-system performance of sparse matrix-vector multiplication. In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing, March 1997.
- (1997) Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing
- Toledo, S.¹

31
- 84943297310
- Automatically tuned linear algebra software
- C. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proc. of Supercomp., 1998.
- (1998) Proc. Of Supercomp.
- Whaley, C.¹ Dongarra, J.²

32
- 84976827033
- A data locality optimizing algorithm
- M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, June 1991.
- (1991) Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation
- Wolf, M.E.¹ Lam, M.S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.