Volume 22, Issue 1, 2010, Pages 15-44

Scheduling dense linear algebra operations on multicore processors

Author keywords

Cholesky; Directed acyclic graph; Dynamic scheduling; Factorization; Linear algebra; LU; Matrix factorization; Multicore; QR; Scheduling; Task graph

Indexed keywords

DATA FLOW ANALYSIS; FACTORIZATION; GRAPH THEORY; LINEAR ALGEBRA; LUTETIUM; PARALLEL PROCESSING SYSTEMS; SCHEDULING; SOFTWARE ARCHITECTURE;

EID: 73149105729     PISSN: 15320626     EISSN: 15320634     Source Type: Journal    
DOI: 10.1002/cpe.1467     Document Type: Article
Times cited: 75

References (46)
  • 3
    • 73149094576
    • Co-Array Fortran. Available at: http://www.co-array.org/ [2 June 2009].
  • 4
    • 73149125407
    • The Berkeley Unified Parallel C (UPC) project. Available at: http://upc.lbl.gov/ [2 June 2009].
  • 5
    • 73149112464
    • Titanium project home page. Available at: http://titanium.cs.berkeley.edu/ [2 June 2009].
  • 6
    • 73149087967
    • Cray, Inc. Chapel Language Specification 0.775. Available at: http://chapel.cs.washington.edu/spec-0.775.pdf [2 June 2009].
    • Chapel Language Specification 0.775
  • 7
    • 73149094353
    • Sun Microsystems Inc. The Fortress Language Specification, Version 1.0, 2008. Available at: http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf [2 June 2009].
  • 8
    • 73149098525
    • Saraswat V, Nystrom N. Report on the Experimental Language X10, Version 1.7, 2008. Available at: http://dist.codehaus.org/x10/documentation/languagespec/x10-170.pdf [2 June 2009].
  • 10
    • 35448932427
    • Intel Threading Building Blocks. Available at: http://www.threadingbuildingblocks.org/ [2 June 2009].
    • Intel Threading Building Blocks
  • 11
    • 73149089278
    • Reinders J. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O'Reilly Media, Inc., 2007. Available at: http://www.amazon.com/exec/obidos/ASIN/0596514808/ ISBN: 0596514808 [2 June 2009].
  • 12
    • 73149120555
    • OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0, 2008. Available at: http://www.openmp.org/mp-documents/spec30.pdf [2 June 2009].
  • 14
    • 48949090561
    • A proposal for task parallelism in OpenMP
    • A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP 2007 , Beijing, China. Springer: Berlin, 3-7 June, DOI: 10.1007/978-3-540-69303-1.1
    • Ayguadé E, Copty N, Duran A, Hoeflinger J, Lin Y, Massaioli F, Su E, Unnikrishnan P, Zhang G. A proposal for task parallelism in OpenMP. A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP 2007 (Lecture Notes in Computer Science, vol. 4935), Beijing, China. Springer: Berlin, 3-7 June 2007; 1-12. DOI: 10.1007/978-3-540-69303-1.1.
    • (2007) Lecture Notes in Computer Science , vol.4935 , pp. 1-12
    • Ayguadé, E.1    Copty, N.2    Duran, A.3    Hoeflinger, J.4    Lin, Y.5    Massaioli, F.6    Su, E.7    Unnikrishnan, P.8    Zhang, G.9
  • 15
    • 67650056929
    • Extending the OpenMP tasking model to allow dependent tasks
    • OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP 2008 , West Lafayette, IN. Springer: Berlin, 12-14 May, DOI: 10.1007/978-3-540-79561-2.10
    • Duran A, Perez JM, Ayguadé E, Badia RM, Labarta J. Extending the OpenMP tasking model to allow dependent tasks. OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP 2008 (Lecture Notes in Computer Science, vol. 5004), West Lafayette, IN. Springer: Berlin, 12-14 May 2008; 111-122. DOI: 10.1007/978-3-540-79561-2.10.
    • (2008) Lecture Notes in Computer Science , vol.5004 , pp. 111-122
    • Duran, A.1    Perez, J.M.2    Ayguadé, E.3    Badia, R.M.4    Labarta, J.5
  • 16
    • 73149093256
    • Barcelona Supercomputing Center. SMP Superscalar (SMPSs) User's Manual, Version 2.0, 2008. Available at: http://www.bsc.es/media/1002.pdf [2 June 2009].
  • 17
    • 73149085347
    • Supercomputing Technologies Group. Cilk 5.4.6 Reference Manual, MIT Laboratory for Computer Science, 1998. Available at: http://supertech.csail.mit.edu/cilk/manual-5.4.6.pdf [2 June 2009].
  • 19
    • 35649006026
    • CellSs: Making it easier to program the Cell Broadband Engine processor
    • DOI: 10.1147/rd.515.0593
    • Perez JM, Bellens P, Badia RM, Labarta J. CellSs: Making it easier to program the Cell Broadband Engine processor. IBM Journal of Research and Development 2007; 51(5): 593-604. DOI: 10.1147/rd.515.0593.
    • (2007) IBM Journal of Research and Development , vol.51 , Issue.5 , pp. 593-604
    • Perez, J.M.1    Bellens, P.2    Badia, R.M.3    Labarta, J.4
  • 20
    • 0029531029
    • The microarchitecture of superscalar processors
    • Smith JE, Sohi GS. The microarchitecture of superscalar processors. Proceedings of the IEEE 1995; 83(12): 1609-1624.
    • (1995) Proceedings of the IEEE , vol.83 , Issue.12 , pp. 1609-1624
    • Smith, J.E.1    Sohi, G.S.2
  • 22
    • 49349111725
    • Solving systems of linear equation on the CELL processor using Cholesky factorization
    • DOI: TPDS.2007.70813
    • Kurzak J, Buttari A, Dongarra JJ. Solving systems of linear equation on the CELL processor using Cholesky factorization. IEEE Transactions on Parallel and Distributed Systems 2008; 19(19): 1175-1186. DOI: TPDS.2007.70813.
    • (2008) IEEE Transactions on Parallel and Distributed Systems , vol.19 , Issue.19 , pp. 1175-1186
    • Kurzak, J.1    Buttari, A.2    Dongarra, J.J.3
  • 23
    • 60649117581
    • QR factorization for the CELL processor
    • DOI: 10.3233/SPR-2009-0268
    • Kurzak J, Dongarra JJ. QR factorization for the CELL processor. Scientific Programming 2009; 17(1-2): 31-42. DOI: 10.3233/SPR-2009-0268.
    • (2009) Scientific Programming , vol.17 , Issue.1-2 , pp. 31-42
    • Kurzak, J.1    Dongarra, J.J.2
  • 24
    • 0020593101
    • Solving linear algebraic equations on an MIMD computer
    • DOI: 10.1145/322358.322366
    • Lord RE, Kowalik JS, Kumar SP. Solving linear algebraic equations on an MIMD computer. Journal of the ACM 1983; 30(1): 103-117. DOI: 10.1145/322358.322366.
    • (1983) Journal of the ACM , vol.30 , Issue.1 , pp. 103-117
    • Lord, R.E.1    Kowalik, J.S.2    Kumar, S.P.3
  • 26
    • 0024891893
    • Vector and parallel algorithms for Cholesky factorization on IBM 3090
    • Reno, NV. ACM: New York, 13-17 November, DOI: 10.1145/76263.76287
    • Agarwal RC, Gustavson FG. Vector and parallel algorithms for Cholesky factorization on IBM 3090. Proceedings of the 1989 ACM/IEEE Conference on Supercomputing, Reno, NV. ACM: New York, 13-17 November 1989; 225-233. DOI: 10.1145/76263.76287.
    • (1989) Proceedings of the 1989 ACM/IEEE Conference on Supercomputing , pp. 225-233
    • Agarwal, R.C.1    Gustavson, F.G.2
  • 27
    • 38049005629
    • Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
    • Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006 , Umeå, Sweden. Springer: Berlin, 18-21 June, DOI: 10.1007/978-3-540-75755-9-18
    • Kurzak J, Dongarra JJ. Implementing linear algebra routines on multi-core processors with pipelining and a look ahead. Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006 (Lecture Notes in Computer Science, vol. 4699), Umeå, Sweden. Springer: Berlin, 18-21 June 2006; 147-156. DOI: 10.1007/978-3-540-75755-9-18.
    • (2006) Lecture Notes in Computer Science , vol.4699 , pp. 147-156
    • Kurzak, J.1    Dongarra, J.J.2
  • 28
    • 36048997493
    • Buttari A, Dongarra JJ, Husbands P, Kurzak J, Yelick K. Multithreading for synchronization tolerance in matrix factorization. Scientific Discovery Through Advanced Computing, SciDAC 2007 (Journal of Physics: Conference Series, vol. 78: 012028), Boston, MA. IOP Publishing: Bristol, U.K., 24-28 June 2007. DOI: 10.1088/1742-6596/78/1/012028.
  • 30
    • 58149269099
    • A class of parallel tiled linear algebra algorithms for multicore architectures
    • DOI: 10.1016/j.parco.2008.10.002
    • Buttari A, Langou J, Kurzak J, Dongarra JJ. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing: Systems and Applications 2009; 35: 38-53. DOI: 10.1016/j.parco.2008.10.002.
    • (2009) Parallel Computing: Systems and Applications , vol.35 , pp. 38-53
    • Buttari, A.1    Langou, J.2    Kurzak, J.3    Dongarra, J.J.4
  • 33
    • 0003078924
    • A storage-efficient WY representation for products of Householder transformations
    • Schreiber R, van Loan C. A storage-efficient WY representation for products of Householder transformations. Journal on Scientific and Statistical Computing 1991; 10: 53-57.
    • (1991) Journal on Scientific and Statistical Computing , vol.10 , pp. 53-57
    • Schreiber, R.1    van Loan, C.2
  • 34
    • 0034224207
    • Applying recursion to serial and parallel QR factorization leads to better performance
    • Elmroth E, Gustavson FG. Applying recursion to serial and parallel QR factorization leads to better performance. IBM Journal of Research and Development 2000; 44(4): 605-624.
    • (2000) IBM Journal of Research and Development , vol.44 , Issue.4 , pp. 605-624
    • Elmroth, E.1    Gustavson, F.G.2
  • 35
    • 84957033906
    • High-performance library software for QR factorization
    • Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA 2000 , Bergen, Norway. Springer: Berlin, DOI: 10.1007/3-540-70734-4.9
    • Elmroth E, Gustavson FG. High-performance library software for QR factorization. Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA 2000 (Lecture Notes in Computer Science, vol. 1947), Bergen, Norway. Springer: Berlin, 18-20 2000; 53-63. DOI: 10.1007/3-540-70734-4.9.
    • (2000) Lecture Notes in Computer Science , vol.1947
    • Elmroth, E.1    Gustavson, F.G.2
  • 36
    • 84947936389
    • New serial and parallel recursive QR factorization algorithms for SMP systems
    • Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA'98 , Umeå, Sweden. Springer: Berlin, 14-17 June, DOI: 10.1007/BFb0095328
    • Elmroth E, Gustavson FG. New serial and parallel recursive QR factorization algorithms for SMP systems. Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA'98 (Lecture Notes in Computer Science, vol. 1541), Umeå, Sweden. Springer: Berlin, 14-17 June 1998; 120-128. DOI: 10.1007/BFb0095328. Available at: http://dx.doi.org/10.1007/BFb0095328 [2 June 2009].
    • (1998) Lecture Notes in Computer Science , vol.1541 , pp. 120-128
    • Elmroth, E.1    Gustavson, F.G.2
  • 38
    • 73149115815
    • LAPACK working note 68: A highly parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form
    • Technical Report UT-CS-94-221, Computer Science Department, University of Tennessee, 1994
    • Berry MW, Dongarra JJ, Kim Y. LAPACK working note 68: A highly parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form. Technical Report UT-CS-94-221, Computer Science Department, University of Tennessee, 1994. Available at: http://www.netlib.org/lapack/lawnspdf/lawn68.pdf [2 June 2009].
    • (2009)
    • Berry, M.W.1    Dongarra, J.J.2    Kim, Y.3
  • 39
    • 0031273280
    • Recursion leads to automatic variable blocking for dense linear-algebra algorithms
    • DOI: 10.1147/rd.416.0737
    • Gustavson FG. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development 1997; 41(6): 737-756. DOI: 10.1147/rd.416.0737.
    • (1997) IBM Journal of Research and Development , vol.41 , Issue.6 , pp. 737-756
    • Gustavson, F.G.1
  • 41
    • 38049054439
    • Minimal data copy for dense linear algebra factorization
    • Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006 , Umeå, Sweden. Springer: Berlin, 18-21 June, DOI: 10.1007/978-3-540-75755-9.66
    • Gustavson FG, Gunnels JA, Sexton JC. Minimal data copy for dense linear algebra factorization. Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006 (Lecture Notes in Computer Science, vol. 4699), Umeå, Sweden. Springer: Berlin, 18-21 June 2006; 540-549. DOI: 10.1007/978-3-540-75755-9.66.
    • (2006) Lecture Notes in Computer Science , vol.4699 , pp. 540-549
    • Gustavson, F.G.1    Gunnels, J.A.2    Sexton, J.C.3
  • 42
    • 1842832833
    • Recursive blocked algorithms and hybrid data structures for dense matrix library software
    • DOI: 10.1137/S0036144503428693
    • Elmroth E, Gustavson FG, Jonsson I, Kågström B. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review 2004; 46(1): 3-45. DOI: 10.1137/S0036144503428693.
    • (2004) SIAM Review , vol.46 , Issue.1 , pp. 3-45
    • Elmroth, E.1    Gustavson, F.G.2    Jonsson, I.3    Kågström, B.4
  • 43
    • 17644368925
    • Parallel out-of-core computation and updating the QR factorization
    • DOI: 10.1145/1055531.1055534
    • Gunter BC, van de Geijn RA. Parallel out-of-core computation and updating the QR factorization. ACM Transactions on Mathematical Software 2005; 31(1): 60-78. DOI: 10.1145/1055531.1055534.
    • (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 60-78
    • Gunter, B.C.1    van de Geijn, R.A.2
  • 44
    • 48849086742
    • Updating an LU factorization with pivoting
    • DOI: 10.1145/1377612.1377615
    • Quintana-Ortí ES, van de Geijn RA. Updating an LU factorization with pivoting. ACM Trans. Math. Softw 2008; 35(2): 11. DOI: 10.1145/1377612.1377615.
    • (2008) ACM Trans. Math. Softw , vol.35 , Issue.2 , pp. 11
    • Quintana-Ortí, E.S.1    van de Geijn, R.A.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.