SCOPUS Information Search Platform

Concurrency and Computation: Practice and Experience

Volume 21, Issue 15, 2009, Pages 1838-1856

Increasing data reuse of sparse algebra codes on simultaneous multithreading architectures

(4) Pichel, J.C. (a); Heras, D.B. (b); Cabaleiro, J.C. (b); Rivera, F.F. (b)

a UNIVERSIDAD CARLOS III DE MADRID (Spain)

b UNIVERSITY OF SANTIAGO DE COMPOSTELA (Spain)

Author keywords

Data reuse; Irregular codes; Locality; Multithreading; Sparse algebra codes; Sparse matrix

Indexed keywords

MATRIX ALGEBRA; MEMORY ARCHITECTURE; MULTITASKING;

DATA REUSE; IRREGULAR CODES; LOCALITY; MULTI-THREADING; SPARSE ALGEBRA CODES; SPARSE MATRICES;

CODES (SYMBOLS);

EID: 70349124898 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.1404 Document Type: Article

Times cited: (5)

References (35)

1
- 0029666641
- Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor
- Philadelphia, U.S.A
- Tullsen DM, Eggers SJ, Emer JS, Levy HM, Lo JL, Stamm RL. Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. ISCA, Philadelphia, U.S.A., 1996; 191-202.
- (1996) ISCA , pp. 191-202
- Tullsen, D.M.¹ Eggers, S.J.² Emer, J.S.³ Levy, H.M.⁴ Lo, J.L.⁵ Stamm, R.L.⁶

2
- 0031199614
- Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
- Lo JL, Emer JS, Levy HM, Stamm RL, Tullsen DM. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading. ACM Transactions on Computer Systems 1997; 15(3):322-354.
- (1997) ACM Transactions on Computer Systems , vol.15 , Issue.3 , pp. 322-354
- Lo, J.L.¹ Emer, J.S.² Levy, H.M.³ Stamm, R.L.⁴ Tullsen, D.M.⁵

3
- 0029200683
- Simultaneous multithreading: Maximizing on-chip parallelism
- Santa Margherita Ligure, Italy
- Tullsen DM, Eggers S, Levy HM. Simultaneous multithreading: Maximizing on-chip parallelism. Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, 1995; 392-403.
- (1995) Proceedings of the 22nd Annual International Symposium on Computer Architecture , pp. 392-403
- Tullsen, D.M.¹ Eggers, S.² Levy, H.M.³

4
- 21244499927
- Standard memory hierarchy does not fit Simultaneous Multithreading
- Las Vegas, U.S.A
- Hily S, Seznec A. Standard memory hierarchy does not fit Simultaneous Multithreading. Proceedings of the Workshop on Multithreaded Execution Architecture and Compilation (with HPCA-4), Las Vegas, U.S.A., 1998.
- (1998) Proceedings of the Workshop on Multithreaded Execution Architecture and Compilation (With HPCA-4)
- Hily, S.¹ Seznec, A.²

5
- 27144443419
- A technique for accelerating the convergence of restarted GMRES
- Baker AH, Jessup ER, Manteuffel T. A technique for accelerating the convergence of restarted GMRES. SIAM Journal on Matrix Analysis and Applications 2005; 26(4):962-984.
- (2005) SIAM Journal on Matrix Analysis and Applications , vol.26 , Issue.4 , pp. 962-984
- Baker, A.H.¹ Jessup, E.R.² Manteuffel, T.³

6
- 5144222846
- The block Lanczos method for linear systems with multiple right-hand sides
- Guennounia AE, Jbilou K, Sadok H. The block Lanczos method for linear systems with multiple right-hand sides. Applied Numerical Mathematics 2004; 51(2-3):243-256.
- (2004) Applied Numerical Mathematics , vol.51 , Issue.2-3 , pp. 243-256
- Guennounia, A.E.¹ Jbilou, K.² Sadok, H.³

7
- 1842829625
- SIAM: Philadelphia, PA
- Saad Y. Iterative Methods for Sparse Linear Systems. SIAM: Philadelphia, PA, 2003.
- (2003) Iterative Methods for Sparse Linear Systems
- Saad, Y.¹

8
- 0001087280
- Hyper-Threading technology architecture and microarchitecture
- Marr DT, Binns F, Hill DL, Hinton G, Koufaty DA, Miller JA, Upton M. Hyper-Threading technology architecture and microarchitecture. Intel Technology Journal Q1 2002; 6(1):4-15.
- (2002) Intel Technology Journal Q1 , vol.6 , Issue.1 , pp. 4-15
- Marr, D.T.¹ Binns, F.² Hill, D.L.³ Hinton, G.⁴ Koufaty, D.A.⁵ Miller, J.A.⁶ Upton, M.⁷

9
- 0001803542
- Several strategies for reducing the bandwidth of matrices
- Rose DJ, Willoughby RA (eds.). Plenum Press: New York
- Cuthill E, McKee J. Several strategies for reducing the bandwidth of matrices. Sparse Matrices and their Applications, Rose DJ, Willoughby RA (eds.). Plenum Press: New York, 1972.
- (1972) Sparse Matrices and Their Applications
- Cuthill, E.¹ McKee, J.²

10
- 0030491606
- An approximate minimum degree ordering algorithm
- Amestoy PR, Davis TA, Duff IS. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications 1996; 17(4):886-905.
- (1996) SIAM Journal on Matrix Analysis and Applications , vol.17 , Issue.4 , pp. 886-905
- Amestoy, P.R.¹ Davis, T.A.² Duff, I.S.³

11
- 0036734103
- Effects of ordering strategies and programming paradigms on sparse matrix computations
- Oliker L, Li X, Husbands P, Biswas R. Effects of ordering strategies and programming paradigms on sparse matrix computations. SIAM Review 2002; 44(3):373-393.
- (2002) SIAM Review , vol.44 , Issue.3 , pp. 373-393
- Oliker, L.¹ Li, X.² Husbands, P.³ Biswas, R.⁴

12
- 0033189408
- Memory hierarchy performance prediction for blocked sparse algorithms
- Fraguela BB, Doallo R, Zapata EL. Memory hierarchy performance prediction for blocked sparse algorithms. Parallel Processing Letters 1999; 9(3):347-360.
- (1999) Parallel Processing Letters , vol.9 , Issue.3 , pp. 347-360
- Fraguela, B.B.¹ Doallo, R.² Zapata, E.L.³

13
- 0029713939
- Block algorithms for sparse matrix computations on high performance workstations
- Philadelphia, U.S.A
- Navarro JJ, García E, Larriba-Pey JL, Juan T. Block algorithms for sparse matrix computations on high performance workstations. Proceedings of the IEEE International Conference on Supercomputing (ICS '96), Philadelphia, U.S.A., 1996; 301-309.
- (1996) Proceedings of the IEEE International Conference on Supercomputing (ICS '96) , pp. 301-309
- Navarro, J.J.¹ García, E.² Larriba-Pey, J.L.³ Juan, T.⁴

14
- 0039958691
- Improving memory-system performance of sparse matrix-vector multiplication
- Minneapolis, U.S.A
- Toledo S. Improving memory-system performance of sparse matrix-vector multiplication. Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, Minneapolis, U.S.A., 1997.
- (1997) Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing
- Toledo, S.¹

15
- 3042576437
- Improving performance of sparse matrix-vector multiplication
- Portland, OR
- Pinar A, Heath M. Improving performance of sparse matrix-vector multiplication. Proceedings of Supercomputing, Portland, OR, 1999.
- (1999) Proceedings of Supercomputing
- Pinar, A.¹ Heath, M.²

16
- 38149066662
- Optimizing sparse matrix vector multiplication on SMPs
- San Antonio, U.S.A
- Im EJ, Yelick K. Optimizing sparse matrix vector multiplication on SMPs. Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, U.S.A., 1999.
- (1999) Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing
- Im, E.J.¹ Yelick, K.²

17
- 25644439819
- Performance optimization of irregular codes based on the combination of reordering and blocking techniques
- Pichel JC, Heras DB, Cabaleiro JC, Rivera FF. Performance optimization of irregular codes based on the combination of reordering and blocking techniques. Parallel Computing 2005; 31(8-9):858-876.
- (2005) Parallel Computing , vol.31 , Issue.8-9 , pp. 858-876
- Pichel, J.C.¹ Heras, D.B.² Cabaleiro, J.C.³ Rivera, F.F.⁴

18
- 1542710739
- Sparse tiling for stationary iterative methods
- Strout MM, Carter L, Ferrante J, Kreaseck B. Sparse tiling for stationary iterative methods. International Journal of High Performance Computing Applications 2004; 18(1):95-114.
- (2004) International Journal of High Performance Computing Applications , vol.18 , Issue.1 , pp. 95-114
- Strout, M.M.¹ Carter, L.² Ferrante, J.³ Kreaseck, B.⁴

19
- 3042573689
- Dynamic cache partitioning for simultaneous multithreading systems
- Anaheim, CA, U.S.A
- Suh G, Devadas S, Rudolph L. Dynamic cache partitioning for simultaneous multithreading systems. Proceeding of the 13th IASTED International Conference on Parallel and Distributed Computing System, Anaheim, CA, U.S.A., 2001.
- (2001) Proceeding of the 13th IASTED International Conference on Parallel and Distributed Computing System
- Suh, G.¹ Devadas, S.² Rudolph, L.³

20
- 0242370926
- Code and data transformations for improving shared cache performance on SMT processors
- Tokyo-Odaiba, Japan
- Nikolopoulos DS. Code and data transformations for improving shared cache performance on SMT processors. International Symposium on High Performance Computing, Tokyo-Odaiba, Japan, 2003; 54-69.
- (2003) International Symposium on High Performance Computing , pp. 54-69
- Nikolopoulos, D.S.¹

21
- 72649092106
- Maximizing TLP with loop-parallelization on SMT
- Austin, U.S.A
- Puppin D, Tullsen DM. Maximizing TLP with loop-parallelization on SMT. Fifth Workshop on Multithreaded Execution, Architecture, and Compilation, Austin, U.S.A., 2001.
- (2001) Fifth Workshop on Multithreaded Execution, Architecture, and Compilation
- Puppin, D.¹ Tullsen, D.M.²

22
- 56749158843
- Optimization of sparse matrix-vector multiply on emerging multicore platforms
- Reno, U.S.A
- Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of sparse matrix-vector multiply on emerging multicore platforms. Proceedings of Supercomputing (SC), Reno, U.S.A., 2007.
- (2007) Proceedings of Supercomputing (SC)
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

23
- 0035370397
- Modeling data locality for the sparse matrix-vector product using distance measures
- Heras DB, Cabaleiro JC, Rivera FF. Modeling data locality for the sparse matrix-vector product using distance measures. Parallel Computing 2001; 27:897-912.
- (2001) Parallel Computing , vol.27 , pp. 897-912
- Heras, D.B.¹ Cabaleiro, J.C.² Rivera, F.F.³

24
- 17044433896
- A quantitative analysis of loop nest locality
- Cambridge, MA, U.S.A
- McKinley KS, Temam O. A quantitative analysis of loop nest locality. Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, U.S.A., 1996.
- (1996) Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems
- McKinley, K.S.¹ Temam, O.²

25
- 0031364101
- Tuning compiler optimizations for simultaneous multithreading
- Research Triangle Park, North Carolina, U.S.A
- Lo JL, Eggers SJ, Levy HM, Parekh SS, Tullsen DM. Tuning compiler optimizations for simultaneous multithreading. International Symposium on Microarchitecture, Research Triangle Park, North Carolina, U.S.A., 1997; 114-124.
- (1997) International Symposium on Microarchitecture , pp. 114-124
- Lo, J.L.¹ Eggers, S.J.² Levy, H.M.³ Parekh, S.S.⁴ Tullsen, D.M.⁵

26
- 0042415671
- An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum
- Duff I, Heroux M, Pozo R. An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum. ACM Transactions on Mathematical Software 2002; 28(2):239-267.
- (2002) ACM Transactions on Mathematical Software , vol.28 , Issue.2 , pp. 239-267
- Duff, I.¹ Heroux, M.² Pozo, R.³

27
- 1542501019
- SPARSITY: Framework for optimizing sparse matrix-vector multiply
- Im EJ, Yelick KA, Vuduc R. SPARSITY: Framework for optimizing sparse matrix-vector multiply. International Journal of High Performance Computing Applications 2004; 18(1):135-158.
- (2004) International Journal of High Performance Computing Applications , vol.18 , Issue.1 , pp. 135-158
- Im, E.J.¹ Yelick, K.A.² Vuduc, R.³

28
- 3042618790
- Improving the locality of the sparse matrix-vector product on shared memory multiprocessors
- PDP2004, A Coruna, Galicia, Spain
- Pichel JC, Heras DB, Cabaleiro JC, Rivera FF. Improving the locality of the sparse matrix-vector product on shared memory multiprocessors. Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP2004, A Coruna, Galicia, Spain, 2004; 66-71.
- (2004) Euromicro Conference on Parallel, Distributed and Network-based Processing , pp. 66-71
- Pichel, J.C.¹ Heras, D.B.² Cabaleiro, J.C.³ Rivera, F.F.⁴

29
- 0035450031
- Modelling and improving locality for the sparse matrix-vector product on cache memories
- Heras DB, Blanco V, Cabaleiro JC, Rivera FF. Modelling and improving locality for the sparse matrix-vector product on cache memories. Future Generation Computer Systems. Special Issue on High Performance Numerical Methods and Application 2001; 18(1):55-67.
- (2001) Future Generation Computer Systems. Special Issue on High Performance Numerical Methods and Application , vol.18 , Issue.1 , pp. 55-67
- Heras, D.B.¹ Blanco, V.² Cabaleiro, J.C.³ Rivera, F.F.⁴

30
- 84884063278
- Princeton University Press: Princeton, NJ, U.S.A
- Applegate D, Bixby R, Chvatal V, Cook W. The Traveling Salesman Problem: A Computational Study. Princeton University Press: Princeton, NJ, U.S.A., 2006.
- (2006) The Traveling Salesman Problem: A Computational Study
- Applegate, D.¹ Bixby, R.² Chvatal, V.³ Cook, W.⁴

31
- 0003197949
- University of Florida sparse matrix collection
- 15 October 2007
- Davis T. University of Florida Sparse Matrix Collection. NA Digest 1997; 97(23). http://www.cise.ufl.edu/research/sparse/matrices [15 October 2007].
- (1997) NA Digest , vol.97 , pp. 23
- Davis, T.¹

32
- 0003734628
- Department of Computer Science, University of Minnesota
- Karypis G, Kumar V. METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. Department of Computer Science, University of Minnesota, 1997.
- (1997) METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-reducing Orderings of Sparse Matrices
- Karypis, G.¹ Kumar, V.²

33
- 0003278283
- The microarchitecture of the Pentium 4 processor
- Hinton G, Sager D, Upton M, Boggs D, Carmean D, Kyker A, Roussel P. The microarchitecture of the Pentium 4 processor. Intel Technology Journal Q1 2001; 1-13.
- (2001) Intel Technology Journal Q1 , pp. 1-13
- Hinton, G.¹ Sager, D.² Upton, M.³ Boggs, D.⁴ Carmean, D.⁵ Kyker, A.⁶ Roussel, P.⁷

34
- 34547715870
- Initial observations of the simultaneous multithreading Pentium 4 processor. PACT '03
- IEEE Computer Society: Washington, DC
- Tuck N, Tullsen DM. Initial observations of the simultaneous multithreading Pentium 4 processor. PACT '03: Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society: Washington, DC, 2003.
- (2003) Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
- Tuck, N.¹ Tullsen, D.M.²

35
- 0034268943
- A portable programming interface for performance evaluation on modern processors
- Browne S, Dongarra J, Garner N, Ho G, Mucci P. A portable programming interface for performance evaluation on modern processors. International Journal of High Performance Computing Applications 2000; 14(3):189-204.
- (2000) International Journal of High Performance Computing Applications , vol.14 , Issue.3 , pp. 189-204
- Browne, S.¹ Dongarra, J.² Garner, N.³ Ho, G.⁴ Mucci, P.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.