Volume 35, Issue 3, 2009, Pages 138-150

Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor

Author keywords

Instruction level parallelism; Loop optimizations; Single Instruction Multiple Data; Synergistic Processing Element; Vectorization

Indexed keywords

CELLS; COMPUTER ARCHITECTURE; COMPUTER GRAPHICS; CYTOLOGY; DATA HANDLING; DIGITAL ARITHMETIC; DIGITAL STORAGE; EIGENVALUES AND EIGENFUNCTIONS; GRAPHICS PROCESSING UNIT; LINEAR SYSTEMS; PROGRAM COMPILERS

EID: 60649099576     PISSN: 01678191     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.parco.2008.12.010     Document Type: Article
Times cited: 54

References (49)
  • 2
    • EID: 60649094533
    • IBM Corporation, Cell Broadband Engine Programming Handbook, Version 1.1, April 2007.
  • 3
    • EID: 0032592096
    • Borkar S. Design challenges of technology scaling. IEEE Micro 19(4) (1999) 23-29.
  • 4
    • EID: 20344401552
    • Geer D. Industry trends: chip makers turn to multicore processors. Computer 38(5) (2005) 11-13.
  • 5
    • EID: 34548083281
    • H. Sutter, The free lunch is over: a fundamental turn toward concurrency in software, Dr. Dobb's J. 30(3).
  • 6
    • EID: 60649085163
    • K. Asanovic, R. Bodik, B.C. Catanzaro, J.J. Gebis, P. Husbands, K. Keutzer, D.A. Patterson, W.L. Plishker, J. Shalf, S.W. Williams, K.A. Yelick, The Landscape of Parallel Computing Research: A View from Berkeley, Tech. Rep. UCB/EECS-2006-183, Electrical Engineering and Computer Sciences Department, University of California at Berkeley, 2006.
  • 11
    • EID: 60649093894
    • Basic Linear Algebra Technical Forum, Basic Linear Algebra Technical Forum Standard, August 2001.
  • 12
    • EID: 0032155271
    • Kågström B., Ling P., and van Loan C. GEMM-based Level 3 BLAS: high-performance model implementations and performance evaluation benchmark. ACM Trans. Math. Soft. 24(3) (1998) 268-302.
  • 13
    • EID: 60649086375
    • ATLAS.
  • 14
    • EID: 60649096937
    • GotoBLAS.
  • 15
    • EID: 35248843628
    • E. Chan, E.S. Quintana-Orti, G. Quintana-Orti, R. van de Geijn, Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, in: Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures SPAA'07, 2007, pp. 116-125.
  • 16
    • EID: 60649083594
    • J. Kurzak, J.J. Dongarra, LAPACK Working Note 178: Implementing Linear Algebra Routines on Multi-Core Processors, Tech. Rep. CS-07-581, Electrical Engineering and Computer Science Department, University of Tennessee, 2006.
  • 18
    • EID: 0042235298
    • Park N., Hong B., and Prasanna V.K. Tiling, block data layout, and memory hierarchy performance. IEEE Trans. Parallel Distrib. Syst. 14(7) (2003) 640-654.
  • 20
    • EID: 51049083291
    • A. Buttari, J. Langou, J. Kurzak, J.J. Dongarra, LAPACK Working Note 190: Parallel Tiled QR Factorization for Multicore Architectures, Tech. Rep. CS-07-598, Electrical Engineering and Computer Science Department, University of Tennessee, 2007.
  • 21
    • EID: 60649086938
    • A. Buttari, J. Langou, J. Kurzak, J.J. Dongarra, LAPACK Working Note 191: A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, Tech. Rep. CS-07-600, Electrical Engineering and Computer Science Department, University of Tennessee, 2007.
  • 23
    • EID: 34547360464
    • Kurzak J., and Dongarra J.J. Implementation of mixed precision in solving systems of linear equations on the CELL processor. Concurrency Comput. Pract. Exper. 19(10) (2007) 1371-1385.
  • 24
    • EID: 49349111725
    • Kurzak J., Buttari A., and Dongarra J.J. Solving systems of linear equations on the CELL processor using Cholesky factorization. IEEE Trans. Parallel Distrib. Syst. 19(9) (2008) 1175-1186.
  • 28
    • EID: 60649083592
    • M. Pepe, Multi-Core Framework (MCF), Mercury Computer Systems, Version 0.4.4, October 2006.
  • 31
    • EID: 60649102034
    • IBM Corporation, Mathematical Acceleration Subsystem - Product Overview, March 2007.
  • 32
    • EID: 60649120425
    • Mercury Computer Systems, Inc. … TM) Data Sheet, 2006.
  • 33
    • EID: 60649113854
    • European Center for Parallelism of Barcelona, Technical University of Catalonia, Paraver, Parallel Program Visualization and Analysis Tool Reference Manual, Version 3.1, October 2001.
  • 34
    • EID: 60649100652
    • IBM Corporation, Software Development Kit 2.1 Programmer's Guide, Version 2.1, March 2007.
  • 35
    • EID: 34250487811
    • Strassen V. Gaussian elimination is not optimal. Numer. Math. 13 (1969) 354-356.
  • 36
    • EID: 85023205150
    • Coppersmith D., and Winograd S. Matrix multiplication via arithmetic progressions. J. Symbol. Comput. 9(3) (1990) 251-280.
  • 37
    • EID: 0035023971
    • Aberdeen D., and Baxter J. Emmerald: A fast matrix-matrix multiply using Intel's SSE instructions. Concurrency Comput. Pract. Exper. 13(2) (2001) 103-119.
  • 46
    • EID: 60649098320
    • IBM Corporation, Preventing synergistic processor element indefinite stalls resulting from instruction depletion in the Cell Broadband Engine Processor for CMOS SOI 90 nm, Applications Note, Version 1.0, November 2007.
  • 49
    • EID: 2942741324
    • Shin J., Chame J., and Hall M.W. Exploiting superword-level locality in multimedia extension architectures. J. Instr. Level Parallel. 5 (2003) 1-28.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.