Volume 18, 2009, Pages 3-26

Scheduling for numerical linear algebra library at scale

Author keywords

Cholesky; linear algebra; matrix factorization; multicore; QR; scheduling; task graph

EID: 84906552224     PISSN: 09275452     EISSN: None     Source Type: Book Series    
DOI: 10.3233/978-1-60750-073-5-3     Document Type: Conference Paper
Times cited : (2)

References (42)
  • 3
    • 84906552659
    • Co-Array Fortran
    • Co-Array Fortran. http://www.co-array.org/
  • 4
    • 84906552650
    • The Berkeley Unified Parallel C (UPC) project
    • The Berkeley Unified Parallel C (UPC) project. http://upc.lbl.gov/
  • 5
    • 84906552651
    • Titanium project home page
    • Titanium project home page. http://titanium.cs.berkeley.edu/.
  • 6
    • 84906552652
    • Cray Inc. Chapel Language Specification 0.775
    • Cray, Inc. Chapel Language Specification 0.775. http://chapel.cs.washington.edu/spec-0.775.pdf.
  • 7
    • 84906552653
    • Sun Microsystems, Inc. The Fortress Language Specification, Version 1.0
    • Sun Microsystems, Inc. The Fortress Language Specification, Version 1.0, 2008. http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf.
    • (2008)
  • 10
    • 84906552654
    • Intel Threading Building Blocks
    • Intel Threading Building Blocks. http://www.threadingbuildingblocks.org/.
  • 12
    • 84906552655
    • OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0
    • OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0, 2008. http://www.openmp.org/mp-documents/spec30.pdf
    • (2008)
  • 13
    • 84906552596
    • The community of OpenMP users, researchers, tool developers and providers
    • The community of OpenMP users, researchers, tool developers and providers. http://www.compunity.org/.
  • 15
    • 67650056929
    • Extending the OpenMP tasking model to allow dependent tasks
    • West Lafayette, IN, May 12-14 2008. Lecture Notes in Computer Science, DOI: 10.1007/978-3-540-79561-2-10
    • A. Duran, J. M. Perez, E. Ayguadé, R. M. Badia, and J. Labarta. Extending the OpenMP tasking model to allow dependent tasks. In OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP 2008, West Lafayette, IN, May 12-14 2008. Lecture Notes in Computer Science 5004:111-122. DOI: 10.1007/978-3-540-79561-2-10.
    • (2008) OpenMP in A New Era of Parallelism, 4th International Workshop, IWOMP , vol.5004 , pp. 111-122
    • Duran, A.1    Perez, J.M.2    Ayguadé, E.3    Badia, R.M.4    Labarta, J.5
  • 16
    • 84906561407
    • Barcelona Supercomputing Center. SMP Superscalar (SMPSs)
    • Barcelona Supercomputing Center. SMP Superscalar (SMPSs) User's Manual, Version 2.0, 2008. http://www.bsc.es/media/1002.pdf.
    • (2008) User's Manual, Version 2.0
  • 17
    • 84906552598
    • Supercomputing Technologies Group. MIT Laboratory for Computer Science. Cilk 5.4.6 Reference Manual
    • Supercomputing Technologies Group, MIT Laboratory for Computer Science. Cilk 5.4.6 Reference Manual, 1998. http://supertech.csail.mit.edu/cilk/manual-5.4.6.pdf.
    • (1998)
  • 19
    • 35649006026
    • CellSs: Making it easier to program the Cell Broadband Engine processor
    • DOI: 10.1147/rd.515.0593
    • J. M. Perez, P. Bellens, R. M. Badia, and J. Labarta. CellSs: Making it easier to program the Cell Broadband Engine processor. IBM J. Res. & Dev., 51(5):593-604, 2007. DOI: 10.1147/rd.515.0593.
    • (2007) IBM J. Res. & Dev. , vol.51 , Issue.5 , pp. 593-604
    • Perez, J.M.1    Bellens, P.2    Badia, R.M.3    Labarta, J.4
  • 20
    • 0029531029
    • The microarchitecture of superscalar processors
    • J. E. Smith and G. S. Sohi. The microarchitecture of superscalar processors. Proceedings of the IEEE, 83(12):1609-1624, 1995.
    • (1995) Proceedings of the IEEE , vol.83 , Issue.12 , pp. 1609-1624
    • Smith, J.E.1    Sohi, G.S.2
  • 22
    • 49349111725
    • Solving systems of linear equations on the CELL processor using Cholesky factorization
    • DOI: TPDS.2007.70813
    • J. Kurzak, A. Buttari, and J. J. Dongarra. Solving systems of linear equation on the CELL processor using Cholesky factorization. Trans. Parallel Distrib. Syst., 19(9):1175-1186, 2008. DOI: TPDS.2007.70813.
    • (2008) Trans. Parallel Distrib. Syst. , vol.19 , Issue.9 , pp. 1175-1186
    • Kurzak, J.1    Buttari, A.2    Dongarra, J.J.3
  • 24
    • 0020593101
    • Solving linear algebraic equations on an MIMD computer
    • DOI: 10.1145/322358.322366
    • R. E. Lord, J. S. Kowalik, and S. P. Kumar. Solving linear algebraic equations on an MIMD computer. J. ACM, 30(1):103-117, 1983. DOI: 10.1145/322358.322366.
    • (1983) J. ACM , vol.30 , Issue.1 , pp. 103-117
    • Lord, R.E.1    Kowalik, J.S.2    Kumar, S.P.3
  • 26
    • 0024891893
    • Vector and parallel algorithms for Cholesky factorization on IBM 3090
    • Reno, NV, November 13-17 1989. ACM. DOI: 10.1145/76263.76287
    • R. C. Agarwal and F. G. Gustavson. Vector and parallel algorithms for Cholesky factorization on IBM 3090. In Proceedings of the 1989 ACM/IEEE conference on Supercomputing, pages 225-233, Reno, NV, November 13-17 1989. ACM. DOI: 10.1145/76263.76287.
    • Proceedings of the 1989 ACM/IEEE Conference on Supercomputing , pp. 225-233
    • Agarwal, R.C.1    Gustavson, F.G.2
  • 27
    • 38049005629
    • Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
    • Umeå, Sweden, June 18-21 2006. Lecture Notes in Computer Science, DOI: 10.1007/978-3-540-75755-9-18
    • J. Kurzak and J. J. Dongarra. Implementing linear algebra routines on multi-core processors with pipelining and a look ahead. In Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006, Umeå, Sweden, June 18-21 2006. Lecture Notes in Computer Science 4699:147-156. DOI: 10.1007/978-3-540-75755-9-18.
    • (2006) Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA , vol.4699 , pp. 147-156
    • Kurzak, J.1    Dongarra, J.J.2
  • 28
    • 36048997493
    • Multithreading for synchronization tolerance in matrix factorization
    • Boston, MA, June 24-28 2007. Journal of Physics: Conference Series 78: 012028, IOP Publishing. DOI: 10.1088/1742-6596/78/1/012028
    • A. Buttari, J. J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick. Multithreading for synchronization tolerance in matrix factorization. In Scientific Discovery through Advanced Computing, SciDAC 2007, Boston, MA, June 24-28 2007. Journal of Physics: Conference Series 78:012028, IOP Publishing. DOI: 10.1088/1742-6596/78/1/012028.
    • (2007) Scientific Discovery Through Advanced Computing, SciDAC
    • Buttari, A.1    Dongarra, J.J.2    Husbands, P.3    Kurzak, J.4    Yelick, K.5
  • 29
    • 50249105132
    • Parallel tiled QR factorization for multicore architectures
    • DOI: 10.1002/cpe.1301
    • A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra. Parallel tiled QR factorization for multicore architectures. Concurrency Computat.: Pract. Exper., 20(13):1573-1590, 2008. DOI: 10.1002/cpe.1301.
    • (2008) Concurrency Computat.: Pract. Exper. , vol.20 , Issue.13 , pp. 1573-1590
    • Buttari, A.1    Langou, J.2    Kurzak, J.3    Dongarra, J.J.4
  • 31
    • 1642372163
    • Parallel and fully recursive multi-frontal sparse Cholesky
    • DOI: 10.1016/j.future.2003.07.007
    • D. Irony, G. Shklarski, and S. Toledo. Parallel and fully recursive multi-frontal sparse Cholesky. Future Gener. Comput. Syst., 20(3):425-440, 2004. DOI: 10.1016/j.future.2003.07.007.
    • (2004) Future Gener. Comput. Syst. , vol.20 , Issue.3 , pp. 425-440
    • Irony, D.1    Shklarski, G.2    Toledo, S.3
  • 33
    • 0034224207
    • Applying recursion to serial and parallel QR factorization leads to better performance
    • E. Elmroth and F. G. Gustavson. Applying recursion to serial and parallel QR factorization leads to better performance. IBM J. Res. & Dev., 44(4):605-624, 2000.
    • (2000) IBM J. Res. & Dev. , vol.44 , Issue.4 , pp. 605-624
    • Elmroth, E.1    Gustavson, F.G.2
  • 35
    • 17644368925
    • Parallel out-of-core computation and updating the QR factorization
    • DOI: 10.1145/1055531.1055534
    • B. C. Gunter and R. A. van de Geijn. Parallel out-of-core computation and updating the QR factorization. ACM Transactions on Mathematical Software, 31(1):60-78, 2005. DOI: 10.1145/1055531.1055534.
    • (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 60-78
    • Gunter, B.C.1    van de Geijn, R.A.2
  • 38
    • 1842832833
    • Recursive blocked algorithms and hybrid data structures for dense matrix library software
    • DOI: 10.1137/S0036144503428693
    • E. Elmroth, F. G. Gustavson, I. Jonsson, and B. Kågström. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review, 46(1):3-45, 2004. DOI: 10.1137/S0036144503428693.
    • (2004) SIAM Review , vol.46 , Issue.1 , pp. 3-45
    • Elmroth, E.1    Gustavson, F.G.2    Jonsson, I.3    Kågström, B.4
  • 39
    • 24644482622
    • Analysis of memory hierarchy performance of block data layout
    • Vancouver, Canada, August 18-21 2002. IEEE Computer Society. DOI: 10.1109/ICPP.2002.1040857
    • N. Park, B. Hong, and V. K. Prasanna. Analysis of memory hierarchy performance of block data layout. In Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02, pages 35-44, Vancouver, Canada, August 18-21 2002. IEEE Computer Society. DOI: 10.1109/ICPP.2002.1040857.
    • Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02 , pp. 35-44
    • Park, N.1    Hong, B.2    Prasanna, V.K.3
  • 40
    • 0042235298
    • Tiling, block data layout, and memory hierarchy performance
    • DOI: 10.1109/TPDS.2003.1214317
    • N. Park, B. Hong, and V. K. Prasanna. Tiling, block data layout, and memory hierarchy performance. IEEE Trans. Parallel Distrib. Syst., 14(7):640-654, 2003. DOI: 10.1109/TPDS.2003.1214317.
    • (2003) IEEE Trans. Parallel Distrib. Syst. , vol.14 , Issue.7 , pp. 640-654
    • Park, N.1    Hong, B.2    Prasanna, V.K.3
  • 41
    • 0001951009
    • The WY representation for products of Householder matrices
    • C. Bischof and C. van Loan. The WY representation for products of Householder matrices. J. Sci. Stat. Comput., 8:2-13, 1987.
    • (1987) J. Sci. Stat. Comput. , vol.8 , pp. 2-13
    • Bischof, C.1    Van Loan, C.2
  • 42
    • 0003078924
    • A storage-efficient WY representation for products of Householder transformations
    • R. Schreiber and C. van Loan. A storage-efficient WY representation for products of Householder transformations. J. Sci. Stat. Comput., 10:53-57, 1991.
    • (1991) J. Sci. Stat. Comput. , vol.10 , pp. 53-57
    • Schreiber, R.1    Van Loan, C.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.