SCOPUS Information Retrieval Platform

Parallel Computing

Volume 35, Issue 3, 2009, Pages 138-150

Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor

(3) Kurzak, Jakub a; Alvaro, Wesley a; Dongarra, Jack a,b,c

a UNIVERSITY OF TENNESSEE (United States)

b OAK RIDGE NATIONAL LABORATORY (United States)

c UNIVERSITY OF MANCHESTER (United Kingdom)

Author keywords

Instruction level parallelism; Loop optimizations; Single Instruction Multiple Data; Synergistic Processing Element; Vectorization

Indexed keywords

CELLS; COMPUTER ARCHITECTURE; COMPUTER GRAPHICS; CYTOLOGY; DATA HANDLING; DIGITAL ARITHMETIC; DIGITAL STORAGE; EIGENVALUES AND EIGENFUNCTIONS; GRAPHICS PROCESSING UNIT; LINEAR SYSTEMS; PROGRAM COMPILERS;

INSTRUCTION LEVEL PARALLELISM; LOOP OPTIMIZATIONS; PROCESSING ELEMENTS; SINGLE INSTRUCTION MULTIPLE DATA; VECTORIZATION;

MATRIX ALGEBRA;

EID: 60649099576 PISSN: 01678191 EISSN: None Source Type: Journal
DOI: 10.1016/j.parco.2008.12.010 Document Type: Article

Times cited : (54)

References (49)

1
- 60649118939
- IBM Corporation, November
- IBM Corporation, Cell BE Programming Tutorial, November 2007.
- (2007) Cell BE Programming Tutorial

2
- 60649094533
- IBM Corporation, Cell Broadband Engine Programming Handbook, Version 1.1, April 2007.

3
- 0032592096
- Design challenges of technology scaling
- Borkar S. Design challenges of technology scaling. IEEE Micro 19 4 (1999) 23-29
- (1999) IEEE Micro , vol.19 , Issue.4 , pp. 23-29
- Borkar, S.¹

4
- 20344401552
- Industry trends: chip makers turn to multicore processors
- Geer D. Industry trends: chip makers turn to multicore processors. Computer 38 5 (2005) 11-13
- (2005) Computer , vol.38 , Issue.5 , pp. 11-13
- Geer, D.¹

5
- 34548083281
- The free lunch is over: A fundamental turn toward concurrency in software
- H. Sutter, The free lunch is over: a fundamental turn toward concurrency in software, Dr. Dobb's J. 30(3).
- Dr. Dobb's J , vol.30 , Issue.3
- Sutter, H.¹

6
- 60649085163
- K. Asanovic, R. Bodik, B.C. Catanzaro, J.J. Gebis, P. Husbands, K. Keutzer, D.A. Patterson, W.L. Plishker, J. Shalf, S.W. Williams, K.A. Yelick, The Landscape of Parallel Computing Research: A View from Berkeley, Tech. Rep. UCB/EECS-2006-183, Electrical Engineering and Computer Sciences Department, University of California at Berkeley, 2006.

7
- 0003851784
- SIAM
- Dongarra J.J., Duff I.S., Sorensen D.C., and van der Vorst H.A. Numerical Linear Algebra for High-performance Computers (1998), SIAM
- (1998) Numerical Linear Algebra for High-performance Computers
- Dongarra, J.J.¹ Duff, I.S.² Sorensen, D.C.³ van der Vorst, H.A.⁴

8
- 0003424372
- SIAM
- Demmel J.W. Applied Numerical Linear Algebra (1997), SIAM
- (1997) Applied Numerical Linear Algebra
- Demmel, J.W.¹

9
- 0003706460
- SIAM
- Anderson E., Bai Z., Bischof C., Blackford L.S., Demmel J.W., Dongarra J.J., Du Croz J., Greenbaum A., Hammarling S., McKenney A., and Sorensen D. LAPACK Users' Guide (1992), SIAM
- (1992) LAPACK Users' Guide
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, L.S.⁴ Demmel, J.W.⁵ Dongarra, J.J.⁶ Du Croz, J.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ McKenney, A.¹⁰ Sorensen, D.¹¹

10
- 0003615167
- SIAM
- Blackford L.S., Choi J., Cleary A., D'Azevedo E., Demmel J., Dhillon I., Dongarra J.J., Hammarling S., Henry G., Petitet A., Stanley K., Walker D., and Whaley R.C. ScaLAPACK Users' Guide (1997), SIAM
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹ Choi, J.² Cleary, A.³ D'Azevedo, E.⁴ Demmel, J.⁵ Dhillon, I.⁶ Dongarra, J.J.⁷ Hammarling, S.⁸ Henry, G.⁹ Petitet, A.¹⁰ Stanley, K.¹¹ Walker, D.¹² Whaley, R.C.¹³

11
- 60649093894
- Basic Linear Algebra Technical Forum, Basic Linear Algebra Technical Forum Standard, August 2001.

12
- 0032155271
- GEMM-based Level 3 BLAS: high-performance model implementations and performance evaluation Benchmark
- Kågström B., Ling P., and van Loan C. GEMM-based Level 3 BLAS: high-performance model implementations and performance evaluation Benchmark. ACM Trans. Math. Soft. 24 3 (1998) 268-302
- (1998) ACM Trans. Math. Soft. , vol.24 , Issue.3 , pp. 268-302
- Kågström, B.¹ Ling, P.² van Loan, C.³

13
- 60649086375
- ATLAS
- ATLAS. .

14
- 60649096937
- GotoBLAS
- GotoBLAS. .

15
- 35248843628
- E. Chan, E.S. Quintana-Orti, G. Gregorio Quintana-Orti, R. van de Geijn, Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, in: Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures SPAA'07, 2007, pp. 116-125.

16
- 60649083594
- LAPACK Working Note 178: Implementing Linear Algebra Routines on Multi-Core Processors
- Tech. Rep. CS-07-581, Electrical Engineering and Computer Science Department, University of Tennessee
- J. Kurzak, J.J. Dongarra, LAPACK Working Note 178: Implementing Linear Algebra Routines on Multi-Core Processors, Tech. Rep. CS-07-581, Electrical Engineering and Computer Science Department, University of Tennessee, 2006.
- (2006)
- Kurzak, J.¹ Dongarra, J.J.²

17
- 24644482622
- Analysis of memory hierarchy performance of block data layout
- N. Park, B. Hong, V.K. Prasanna, Analysis of memory hierarchy performance of block data layout, in: International Conference on Parallel Processing, 2002.
- (2002) International Conference on Parallel Processing
- Park, N.¹ Hong, B.² Prasanna, V.K.³

18
- 0042235298
- Tiling, block data layout, and memory hierarchy performance
- Park N., Hong B., and Prasanna V.K. Tiling, block data layout, and memory hierarchy performance. IEEE Trans. Parallel Distrib. Syst. 14 7 (2003) 640-654
- (2003) IEEE Trans. Parallel Distrib. Syst. , vol.14 , Issue.7 , pp. 640-654
- Park, N.¹ Hong, B.² Prasanna, V.K.³

19
- 60649095348
- Using nonlinear array layouts in dense matrix operations
- J.R. Herrero, J.J. Navarro, Using nonlinear array layouts in dense matrix operations, in: Workshop on State-of-the-Art in Scientific and Parallel Computing PARA'06, 2006.
- (2006) Workshop on State-of-the-Art in Scientific and Parallel Computing PARA'06
- Herrero, J.R.¹ Navarro, J.J.²

20
- 51049083291
- LAPACK Working Note 190: Parallel Tiled QR Factorization for Multicore Architectures
- Tech. Rep. CS-07-598, Electrical Engineering and Computer Science Department, University of Tennessee
- A. Buttari, J. Langou, J. Kurzak, J.J. Dongarra, LAPACK Working Note 190: Parallel Tiled QR Factorization for Multicore Architectures, Tech. Rep. CS-07-598, Electrical Engineering and Computer Science Department, University of Tennessee, 2007.
- (2007)
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

21
- 60649086938
- LAPACK Working Note 191: A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
- Tech. Rep. CS-07-600, Electrical Engineering and Computer Science Department, University of Tennessee
- A. Buttari, J. Langou, J. Kurzak, J.J. Dongarra, LAPACK Working Note 191: A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, Tech. Rep. CS-07-600, Electrical Engineering and Computer Science Department, University of Tennessee, 2007.
- (2007)
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

22
- 34548753903
- November 2005
- T. Chen, R. Raghavan, J. Dale, E. Iwata, Cell Broadband Engine Architecture and its First Implementation, A Performance View, November 2005. .
- Cell Broadband Engine Architecture and its First Implementation, A Performance View
- Chen, T.¹ Raghavan, R.² Dale, J.³ Iwata, E.⁴

23
- 34547360464
- Implementation of mixed precision in solving systems of linear equations on the CELL processor
- Kurzak J., and Dongarra J.J. Implementation of mixed precision in solving systems of linear equations on the CELL processor. Concurrency Comput. Pract. Exper. 19 10 (2007) 1371-1385
- (2007) Concurrency Comput. Pract. Exper. , vol.19 , Issue.10 , pp. 1371-1385
- Kurzak, J.¹ Dongarra, J.J.²

24
- 49349111725
- Solving systems of linear equations on the CELL processor using Cholesky factorization
- Kurzak J., Buttari A., and Dongarra J.J. Solving systems of linear equations on the CELL processor using Cholesky factorization. IEEE Trans. Parallel Distrib. Syst. 19 9 (2008) 1175-1186
- (2008) IEEE Trans. Parallel Distrib. Syst. , vol.19 , Issue.9 , pp. 1175-1186
- Kurzak, J.¹ Buttari, A.² Dongarra, J.J.³

25
- 60649091697
- February
- D. Hackenberg, Einsatz und Leistungsanalyse der Cell Broadband Engine, Institut für Technische Informatik, Fakultät Informatik, Technische Universität Dresden, Großer Beleg, February 2007.
- (2007) Einsatz und Leistungsanalyse der Cell Broadband Engine, Institut für Technische Informatik, Fakultät Informatik, Technische Universität Dresden, Großer Beleg
- Hackenberg, D.¹

26
- 70350386483
- July 2007
- D. Hackenberg, Fast Matrix Multiplication on CELL Systems, July 2007. .
- Fast Matrix Multiplication on CELL Systems
- Hackenberg, D.¹

27
- 60649103784
- IBM Corporation, November
- IBM Corporation, ALF for Cell BE Programmer's Guide and API Reference, November 2007.
- (2007) ALF for Cell BE Programmer's Guide and API Reference

28
- 60649083592
- M. Pepe, Multi-Core Framework (MCF), Mercury Computer Systems, Version 0.4.4, October 2006.

29
- 60649095841
- Mercury Computer Systems, Inc
- Mercury Computer Systems, Inc., Scientific Algorithm Library (SAL) Data Sheet, 2006. .
- (2006) Scientific Algorithm Library (SAL) Data Sheet

30
- 60649110446
- IBM Corporation, November
- IBM Corporation, SIMD Math Library API Reference Manual, November 2007.
- (2007) SIMD Math Library API Reference Manual

31
- 60649102034
- IBM Corporation, Mathematical Acceleration Subsystem-product Overview, March 2007. .

32
- 60649120425
- Mercury Computer Systems, Inc
- TM) Data Sheet, 2006. .
- (2006) TM) Data Sheet

33
- 60649113854
- European Center for Parallelism of Barcelona, Technical University of Catalonia, Version 3.1, October
- European Center for Parallelism of Barcelona, Technical University of Catalonia, Paraver, Parallel Program Visualization and Analysis Tool Reference Manual, Version 3.1, October 2001.
- (2001) Paraver, Parallel Program Visualization and Analysis Tool Reference Manual

34
- 60649100652
- IBM Corporation, Software Development Kit 2.1 Programmer's Guide, Version 2.1, March 2007.

35
- 34250487811
- Gaussian elimination is not optimal
- Strassen V. Gaussian elimination is not optimal. Numer. Math. 13 (1969) 354-356
- (1969) Numer. Math. , vol.13 , pp. 354-356
- Strassen, V.¹

36
- 85023205150
- Matrix multiplication via arithmetic progressions
- Coppersmith D., and Winograd S. Matrix multiplication via arithmetic progressions. J. Symbol. Comput. 9 3 (1990) 251-280
- (1990) J. Symbol. Comput. , vol.9 , Issue.3 , pp. 251-280
- Coppersmith, D.¹ Winograd, S.²

37
- 0035023971
- Emmerald: A fast matrix-matrix multiply using Intel's SSE instructions
- Aberdeen D., and Baxter J. Emmerald: A fast matrix-matrix multiply using Intel's SSE instructions. Concurrency Comput. Pract. Exper. 13 2 (2001) 103-119
- (2001) Concurrency Comput. Pract. Exper. , vol.13 , Issue.2 , pp. 103-119
- Aberdeen, D.¹ Baxter, J.²

38
- 34247349114
- The potential of the cell processor for scientific computing
- S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, K. Yelick, The potential of the cell processor for scientific computing, in: ACM International Conference on Computing Frontiers, 2006.
- (2006) ACM International Conference on Computing Frontiers
- Williams, S.¹ Shalf, J.² Oliker, L.³ Kamil, S.⁴ Husbands, P.⁵ Yelick, K.⁶

39
- 34250216007
- Scientific computing kernels on the cell processor
- Williams S., Shalf J., Oliker L., Kamil S., Husbands P., and Yelick K. Scientific computing kernels on the cell processor. Int. J. Parallel Prog. 35 3 (2007) 263-298
- (2007) Int. J. Parallel Prog. , vol.35 , Issue.3 , pp. 263-298
- Williams, S.¹ Shalf, J.² Oliker, L.³ Kamil, S.⁴ Husbands, P.⁵ Yelick, K.⁶

40
- 60649092533
- IBM Corporation, November
- IBM Corporation, Basic Linear Algebra Subprograms Programmer's Guide and API Reference, November 2007.
- (2007) Basic Linear Algebra Subprograms Programmer's Guide and API Reference

41
- 27644524078
- A streaming processing unit for a CELL processor
- B. Flachs, S. Asano, S.H. Dhong, P. Hofstee, G. Gervais, R. Kim, T. Le, P. Liu, J. Leenstra, J. Liberty, B. Michael, H. Oh, S.M. Mueller, O. Takahashi, A. Hatakeyama, Y. Watanabe, N. Yano, A streaming processing unit for a CELL processor, in: IEEE International Solid-State Circuits Conference, 2005, pp. 134-135.
- (2005) IEEE International Solid-State Circuits Conference , pp. 134-135
- Flachs, B.¹ Asano, S.² Dhong, S.H.³ Hofstee, P.⁴ Gervais, G.⁵ Kim, R.⁶ Le, T.⁷ Liu, P.⁸ Leenstra, J.⁹ Liberty, J.¹⁰ Michael, B.¹¹ Oh, H.¹² Mueller, S.M.¹³ Takahashi, O.¹⁴ Hatakeyama, A.¹⁵ Watanabe, Y.¹⁶ Yano, N.¹⁷

42
- 31344445939
- The microarchitecture of the synergistic processor for a cell processor
- Flachs B., Asano S., Dhong S.H., Hofstee H.P., Gervais G., Roy K., Le T., Peichun L., Leenstra J., Liberty J., Michael B., Hwa-Joon O., Mueller S.M., Takahashi O., Hatakeyama A., Watanabe Y., Yano N., Brokenshire D.A., Peyravian M., Vandung T., and Iwata E. The microarchitecture of the synergistic processor for a cell processor. IEEE J. Solid-State Circ. 41 1 (2006) 63-70
- (2006) IEEE J. Solid-State Circ. , vol.41 , Issue.1 , pp. 63-70
- Flachs, B.¹ Asano, S.² Dhong, S.H.³ Hofstee, H.P.⁴ Gervais, G.⁵ Roy, K.⁶ Le, T.⁷ Peichun, L.⁸ Leenstra, J.⁹ Liberty, J.¹⁰ Michael, B.¹¹ Hwa-Joon, O.¹² Mueller, S.M.¹³ Takahashi, O.¹⁴ Hatakeyama, A.¹⁵ Watanabe, Y.¹⁶ Yano, N.¹⁷ Brokenshire, D.A.¹⁸ Peyravian, M.¹⁹ Vandung, T.²⁰ Iwata, E.²¹

43
- 33646015987
- Synergistic processing in cell's multicore architecture
- Gschwind M., Hofstee H.P., Flachs B., Hopkins M., Watanabe Y., and Yamazaki T. Synergistic processing in cell's multicore architecture. IEEE Micro 26 2 (2006) 10-24
- (2006) IEEE Micro , vol.26 , Issue.2 , pp. 10-24
- Gschwind, M.¹ Hofstee, H.P.² Flachs, B.³ Hopkins, M.⁴ Watanabe, Y.⁵ Yamazaki, T.⁶

44
- 0004302191
- Hennessy J.L., and Patterson D.A. Computer Architecture: A Quantitative Approach. fourth ed. (2006)
- (2006) Computer Architecture: A Quantitative Approach. fourth ed.
- Hennessy, J.L.¹ Patterson, D.A.²

45
- 0003502903
- Morgan Kaufmann
- Muchnick S. Advanced Compiler Design and Implementation (1997), Morgan Kaufmann
- (1997) Advanced Compiler Design and Implementation
- Muchnick, S.¹

46
- 60649098320
- IBM Corporation, Preventing synergistic processor element indefinite stalls resulting from instruction depletion in the Cell Broadband Engine Processor for CMOS SOI 90 nm, Applications Note, Version 1.0, November 2007.

47
- 0042674307
- The LINPACK Benchmark: past, present and future
- Dongarra J.J., Luszczek P., and Petitet A. The LINPACK Benchmark: past, present and future. Concurrency Comput. Pract. Exper. 15 9 (2003) 803-820
- (2003) Concurrency Comput. Pract. Exper. , vol.15 , Issue.9 , pp. 803-820
- Dongarra, J.J.¹ Luszczek, P.² Petitet, A.³

48
- 84870399830
- TOP500 Supercomputing Sites. .
- TOP500 Supercomputing Sites

49
- 2942741324
- Exploiting superword-level locality in multimedia extension architectures
- Shin J., Chame J., and Hall M.W. Exploiting superword-level locality in multimedia extension architectures. J. Instr. Level Parallel. 5 (2003) 1-28
- (2003) J. Instr. Level Parallel. , vol.5 , pp. 1-28
- Shin, J.¹ Chame, J.² Hall, M.W.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.