SCOPUS Information Retrieval Platform

Advances in Parallel Computing

Volume 18, 2009, Pages 3-26

Scheduling for numerical linear algebra library at scale

(4) Kurzak, Jakub a; Ltaief, Hatem a; Dongarra, Jack J a,b,c; Badia, Rosa M d

a UNIVERSITY OF TENNESSEE (United States)

b OAK RIDGE NATIONAL LABORATORY (United States)

c UNIVERSITY OF MANCHESTER (United Kingdom)

d BARCELONA SUPERCOMPUTING CENTER (Spain)

Author keywords

Cholesky; linear algebra; matrix factorization; multicore; QR; scheduling; task graph

Indexed keywords

EID: 84906552224 PISSN: 09275452 EISSN: None Source Type: Book Series
DOI: 10.3233/978-1-60750-073-5-3 Document Type: Conference Paper

Times cited : (2)

References (42)

1
- 2142827336
- SIAM, Philadelphia, PA
- E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. W. Demmel, J. J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, PA, 1992. http://www.netlib.org/lapack/lug/.
- (1992) LAPACK Users' Guide
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, L.S.⁴ Demmel, J.W.⁵ Dongarra, J.J.⁶ Du Croz, J.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ McKenney, A.¹⁰ Sorensen, D.¹¹

2
- 0003615167
- SIAM Philadelphia, PA
- L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, Philadelphia, PA, 1997. http://www.netlib.org/scalapack/slug/.
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹ Choi, J.² Cleary, A.³ D'Azevedo, E.⁴ Demmel, J.⁵ Dhillon, I.⁶ Dongarra, J.J.⁷ Hammarling, S.⁸ Henry, G.⁹ Petitet, A.¹⁰ Stanley, K.¹¹ Walker, D.¹² Whaley, R.C.¹³

3
- 84906552659
- Co-Array Fortran
- Co-Array Fortran. http://www.co-array.org/

4
- 84906552650
- The Berkeley Unified Parallel C (UPC) project
- The Berkeley Unified Parallel C (UPC) project. http://upc.lbl.gov/

5
- 84906552651
- Titanium project home page
- Titanium project home page. http://titanium.cs.berkeley.edu/.

6
- 84906552652
- Cray Inc. Chapel Language Specification 0.775
- Cray, Inc. Chapel Language Specification 0.775. http://chapel.cs.washington.edu/spec-0.775.pdf.

7
- 84906552653
- Sun Microsystems, Inc. The Fortress Language Specification, Version 1.0
- Sun Microsystems, Inc. The Fortress Language Specification, Version 1.0, 2008. http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf.
- (2008)

8
- 85015476921
- V. Saraswat and N. Nystrom. Report on the Experimental Language X10, Version 1.7, 2008. http://dist.codehaus.org/x10/documentation/languagespec/x10-170.pdf.
- (2008) Report on the Experimental Language X10, Version 1.7
- Saraswat, V.¹ Nystrom, N.²

9
- 0029191296
- Cilk: An efficient multithreaded runtime system
- Santa Barbara, CA, July 19-21, ACM. DOI: 10.1145/209936.209958
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Principles and Practice of Parallel Programming, Proceedings of the fifth ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPOPP'95, pages 207-216, Santa Barbara, CA, July 19-21 1995. ACM. DOI: 10.1145/209936.209958
- (1995) Principles and Practice of Parallel Programming, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP'95 , pp. 207-216
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

10
- 84906552654
- Intel Threading Building Blocks
- Intel Threading Building Blocks. http://www.threadingbuildingblocks.org/.

11
- 43149087461
- O'Reilly Media, Inc., ISBN: 0596514808
- J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O'Reilly Media, Inc., 2007. ISBN: 0596514808.
- (2007) Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism
- Reinders, J.¹

12
- 84906552655
- OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0
- OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0, 2008. http://www.openmp.org/mp-documents/spec30.pdf
- (2008)

13
- 84906552596
- The community of OpenMP users, researchers, tool developers and providers
- The community of OpenMP users, researchers, tool developers and providers. http://www.compunity.org/.

14
- 48949090561
- A proposal for task parallelism in OpenMP
- Beijing, China, June 3-7 2007. Lecture Notes in Computer Science, DOI: 10.1007/978-3-540-69303-1-1
- E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, E. Su, P. Unnikrishnan, and G. Zhang. A proposal for task parallelism in OpenMP. In A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP 2007, Beijing, China, June 3-7 2007. Lecture Notes in Computer Science 4935:1-12. DOI: 10.1007/978-3-540-69303-1-1.
- (2007) A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP , vol.4935 , pp. 1-12
- Ayguadé, E.¹ Copty, N.² Duran, A.³ Hoeflinger, J.⁴ Lin, Y.⁵ Massaioli, F.⁶ Su, E.⁷ Unnikrishnan, P.⁸ Zhang, G.⁹

15
- 67650056929
- Extending the OpenMP tasking model to allow dependent tasks
- West Lafayette, IN, May 12-14 2008. Lecture Notes in Computer Science, DOI: 10.1007/978-3-540-79561-2-10
- A. Duran, J. M. Perez, E. Ayguadé, R. M. Badia, and J. Labarta. Extending the OpenMP tasking model to allow dependent tasks. In OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP 2008, West Lafayette, IN, May 12-14 2008. Lecture Notes in Computer Science 5004:111-122. DOI: 10.1007/978-3-540-79561-2-10.
- (2008) OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP , vol.5004 , pp. 111-122
- Duran, A.¹ Perez, J.M.² Ayguadé, E.³ Badia, R.M.⁴ Labarta, J.⁵

16
- 84906561407
- Barcelona Supercomputing Center. SMP Superscalar (SMPSs)
- Barcelona Supercomputing Center. SMP Superscalar (SMPSs) User's Manual, Version 2.0, 2008. http://www.bsc.es/media/1002.pdf.
- (2008) User's Manual, Version 2.0

17
- 84906552598
- Supercomputing Technologies Group. MIT Laboratory for Computer Science. Cilk 5.4.6 Reference Manual
- Supercomputing Technologies Group, MIT Laboratory for Computer Science. Cilk 5.4.6 Reference Manual, 1998. http://supertech.csail.mit.edu/cilk/manual-5.4.6.pdf.
- (1998)

18
- 34548265764
- CellSs: A programming model for the Cell BE architecture
- Tampa, Florida, November 11-17 2006. ACM. DOI: 10.1145/1188455.1188546
- P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta. CellSs: A programming model for the Cell BE architecture. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, Tampa, Florida, November 11-17 2006. ACM. DOI: 10.1145/1188455.1188546.
- Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
- Bellens, P.¹ Perez, J.M.² Badia, R.M.³ Labarta, J.⁴

19
- 35649006026
- CellSs: Making it easier to program the Cell Broadband Engine processor
- DOI: 10.1147/rd.515.0593
- J. M. Perez, P. Bellens, R. M. Badia, and J. Labarta. CellSs: Making it easier to program the Cell Broadband Engine processor. IBM J. Res. & Dev., 51(5):593-604, 2007. DOI: 10.1147/rd.515.0593.
- (2007) IBM J. Res. & Dev. , vol.51 , Issue.5 , pp. 593-604
- Perez, J.M.¹ Bellens, P.² Badia, R.M.³ Labarta, J.⁴

20
- 0029531029
- The microarchitecture of superscalar processors
- J. E. Smith and G. S. Sohi. The microarchitecture of superscalar processors. Proceedings of the IEEE, 83(12):1609-1624, 1995.
- (1995) Proceedings of the IEEE , vol.83 , Issue.12 , pp. 1609-1624
- Smith, J.E.¹ Sohi, G.S.²

21
- 85027612984
- Dependence graphs and compiler optimizations
- Williamsburg, VA, January 1981. ACM. DOI: 10.1145/209936.209958
- D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe. Dependence graphs and compiler optimizations. In Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, pages 207-218, Williamsburg, VA, January 1981. ACM. DOI: 10.1145/209936.209958.
- Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages , pp. 207-218
- Kuck, D.J.¹ Kuhn, R.H.² Padua, D.A.³ Leasure, B.⁴ Wolfe, M.⁵

22
- 49349111725
- Solving systems of linear equation on the CELL processor using Cholesky factorization
- DOI: 10.1109/TPDS.2007.70813
- J. Kurzak, A. Buttari, and J. J. Dongarra. Solving systems of linear equation on the CELL processor using Cholesky factorization. Trans. Parallel Distrib. Syst., 19(9):1175-1186, 2008. DOI: 10.1109/TPDS.2007.70813.
- (2008) Trans. Parallel Distrib. Syst. , vol.19 , Issue.9 , pp. 1175-1186
- Kurzak, J.¹ Buttari, A.² Dongarra, J.J.³

23
- 84906552600
- QR factorization for the CELL processor
- accepted
- J. Kurzak and J. J. Dongarra. QR factorization for the CELL processor. Scientific Programming. (accepted).
- Scientific Programming
- Kurzak, J.¹ Dongarra, J.J.²

24
- 0020593101
- Solving linear algebraic equations on an MIMD computer
- DOI: 10.1145/322358.322366
- R. E. Lord, J. S. Kowalik, and S. P. Kumar. Solving linear algebraic equations on an MIMD computer. J. ACM, 30(1):103-117, 1983. DOI: 10.1145/322358.322366.
- (1983) J. ACM , vol.30 , Issue.1 , pp. 103-117
- Lord, R.E.¹ Kowalik, J.S.² Kumar, S.P.³

25
- 33745318358
- A parallel implementation of matrix multiplication and LU factorization on the IBM 3090
- Stanford, CA, August 22-25, 1988. North-Holland Publishing Company. ISBN: 0444873104
- R. C. Agarwal and F. G. Gustavson. A parallel implementation of matrix multiplication and LU factorization on the IBM 3090. In Proceedings of the IFIP WG 2.5 Working Conference on Aspects of Computation on Asynchronous Parallel Processors, pages 217-221, Stanford, CA, August 22-25 1988. North-Holland Publishing Company. ISBN: 0444873104.
- Proceedings of the IFIP WG 2.5 Working Conference on Aspects of Computation on Asynchronous Parallel Processors , pp. 217-221
- Agarwal, R.C.¹ Gustavson, F.G.²

26
- 0024891893
- Vector and parallel algorithms for Cholesky factorization on IBM 3090
- Reno, NV, November 13-17 1989. ACM. DOI: 10.1145/76263.76287
- R. C. Agarwal and F. G. Gustavson. Vector and parallel algorithms for Cholesky factorization on IBM 3090. In Proceedings of the 1989 ACM/IEEE conference on Supercomputing, pages 225-233, Reno, NV, November 13-17 1989. ACM. DOI: 10.1145/76263.76287.
- Proceedings of the 1989 ACM/IEEE Conference on Supercomputing , pp. 225-233
- Agarwal, R.C.¹ Gustavson, F.G.²

27
- 38049005629
- Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
- Umeå, Sweden, June 18-21 2006. Lecture Notes in Computer Science, DOI: 10.1007/978-3-540-75755-9-18
- J. Kurzak and J. J. Dongarra. Implementing linear algebra routines on multi-core processors with pipelining and a look ahead. In Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006, Umeå, Sweden, June 18-21 2006. Lecture Notes in Computer Science 4699:147-156. DOI: 10.1007/978-3-540-75755-9-18.
- (2006) Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA , vol.4699 , pp. 147-156
- Kurzak, J.¹ Dongarra, J.J.²

28
- 36048997493
- Multithreading for synchronization tolerance in matrix factorization
- Boston, MA, June 24-28 2007. Journal of Physics: Conference Series 78: 012028, IOP Publishing. DOI: 10.1088/1742-6596/78/1/012028
- A. Buttari, J. J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick. Multithreading for synchronization tolerance in matrix factorization. In Scientific Discovery through Advanced Computing, SciDAC 2007, Boston, MA, June 24-28 2007. Journal of Physics: Conference Series 78:012028, IOP Publishing. DOI: 10.1088/1742-6596/78/1/012028.
- (2007) Scientific Discovery Through Advanced Computing, SciDAC
- Buttari, A.¹ Dongarra, J.J.² Husbands, P.³ Kurzak, J.⁴ Yelick, K.⁵

29
- 50249105132
- Parallel tiled QR factorization for multicore architectures
- DOI: 10.1002/cpe.1301
- A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra. Parallel tiled QR factorization for multicore architectures. Concurrency Computat.: Pract. Exper., 20(13):1573-1590, 2008. DOI: 10.1002/cpe.1301.
- (2008) Concurrency Computat.: Pract. Exper. , vol.20 , Issue.13 , pp. 1573-1590
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

30
- 84906552602
- A class of parallel tiled linear algebra algorithms for multicore architectures
- accepted
- A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. Syst. Appl. (accepted).
- Parallel Comput. Syst. Appl
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.J.⁴

31
- 1642372163
- Parallel and fully recursive multi-frontal sparse Cholesky
- DOI: 10.1016/j.future.2003.07.007
- D. Irony, G. Shklarski, and S. Toledo. Parallel and fully recursive multi-frontal sparse Cholesky. Future Gener. Comput. Syst., 20(3):425-440, 2004. DOI: 10.1016/j.future.2003.07.007.
- (2004) Future Gener. Comput. Syst. , vol.20 , Issue.3 , pp. 425-440
- Irony, D.¹ Shklarski, G.² Toledo, S.³

32
- 84947936389
- New serial and parallel recursive QR factorization algorithms for SMP systems
- Umeå, Sweden, June 14-17, 1998. Lecture Notes in Computer Science, DOI: 10.1007/BFb0095328
- E. Elmroth and F. G. Gustavson. New serial and parallel recursive QR factorization algorithms for SMP systems. In Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA'98, Umeå, Sweden, June 14-17 1998. Lecture Notes in Computer Science 1541:120-128. DOI: 10.1007/BFb0095328.
- Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA'98 , vol.1541 , pp. 120-128
- Elmroth, E.¹ Gustavson, F.G.²

33
- 0034224207
- Applying recursion to serial and parallel QR factorization leads to better performance
- E. Elmroth and F. G. Gustavson. Applying recursion to serial and parallel QR factorization leads to better performance. IBM J. Res. & Dev., 44(4):605-624, 2000.
- (2000) IBM J. Res. & Dev. , vol.44 , Issue.4 , pp. 605-624
- Elmroth, E.¹ Gustavson, F.G.²

34
- 84957033906
- High-performance library software for QR factorization
- Bergen, Norway, June 18-20 2000. Lecture Notes in Computer Science, DOI: 10.1007/3-540-70734-4-9
- E. Elmroth and F. G. Gustavson. High-performance library software for QR factorization. In Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA 2000, Bergen, Norway, June 18-20 2000. Lecture Notes in Computer Science 1947:53-63. DOI: 10.1007/3-540-70734-4-9.
- (2000) Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA , vol.1947 , pp. 53-63
- Elmroth, E.¹ Gustavson, F.G.²

35
- 17644368925
- Parallel out-of-core computation and updating the QR factorization
- DOI: 10.1145/1055531.1055534
- B. C. Gunter and R. A. van de Geijn. Parallel out-of-core computation and updating the QR factorization. ACM Transactions on Mathematical Software, 31(1):60-78, 2005. DOI: 10.1145/1055531.1055534.
- (2005) ACM Transactions on Mathematical Software , vol.31 , Issue.1 , pp. 60-78
- Gunter, B.C.¹ van de Geijn, R.A.²

36
- 84901913528
- New generalized matrix data structures lead to a variety of high-performance algorithms
- Ottawa, Canada, October 2-4, 2000. Kluwer Academic Publishers. ISBN: 0792373391
- F. G. Gustavson. New generalized matrix data structures lead to a variety of high-performance algorithms. In Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications, pages 211-234, Ottawa, Canada, October 2-4 2000. Kluwer Academic Publishers. ISBN: 0792373391.
- Proceedings of the IFIP WG 2.5 Working Conference on Software Architectures for Scientific Computing Applications , pp. 211-234
- Gustavson, F.G.¹

37
- 38049054439
- Minimal data copy for dense linear algebra factorization
- Umeå, Sweden, June 18-21 2006. Lecture Notes in Computer Science, DOI: 10.1007/978-3-540-75755-9-66
- F. G. Gustavson, J. A. Gunnels, and J. C. Sexton. Minimal data copy for dense linear algebra factorization. In Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006, Umeå, Sweden, June 18-21 2006. Lecture Notes in Computer Science 4699:540-549. DOI: 10.1007/978-3-540-75755-9-66.
- (2006) Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA , vol.4699 , pp. 540-549
- Gustavson, F.G.¹ Gunnels, J.A.² Sexton, J.C.³

38
- 1842832833
- Recursive blocked algorithms and hybrid data structures for dense matrix library software
- DOI: 10.1137/S0036144503428693
- E. Elmroth, F. G. Gustavson, I. Jonsson, and B. Kågström. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review, 46(1):3-45, 2004. DOI: 10.1137/S0036144503428693.
- (2004) SIAM Review , vol.46 , Issue.1 , pp. 3-45
- Elmroth, E.¹ Gustavson, F.G.² Jonsson, I.³ Kågström, B.⁴

39
- 24644482622
- Analysis of memory hierarchy performance of block data layout
- Vancouver, Canada, August 18-21 2002. IEEE Computer Society. DOI: 10.1109/ICPP.2002.1040857
- N. Park, B. Hong, and V. K. Prasanna. Analysis of memory hierarchy performance of block data layout. In Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02, pages 35-44, Vancouver, Canada, August 18-21 2002. IEEE Computer Society. DOI: 10.1109/ICPP.2002.1040857.
- Proceedings of the 2002 International Conference on Parallel Processing, ICPP'02 , pp. 35-44
- Park, N.¹ Hong, B.² Prasanna, V.K.³

40
- 0042235298
- Tiling, block data layout, and memory hierarchy performance
- DOI: 10.1109/TPDS.2003.1214317
- N. Park, B. Hong, and V. K. Prasanna. Tiling, block data layout, and memory hierarchy performance. IEEE Trans. Parallel Distrib. Syst., 14(7):640-654, 2003. DOI: 10.1109/TPDS.2003.1214317.
- (2003) IEEE Trans. Parallel Distrib. Syst. , vol.14 , Issue.7 , pp. 640-654
- Park, N.¹ Hong, B.² Prasanna, V.K.³

41
- 0001951009
- The WY representation for products of Householder matrices
- C. Bischof and C. van Loan. The WY representation for products of Householder matrices. J. Sci. Stat. Comput., 8:2-13, 1987.
- (1987) J. Sci. Stat. Comput. , vol.8 , pp. 2-13
- Bischof, C.¹ Van Loan, C.²

42
- 0003078924
- A storage-efficient WY representation for products of Householder transformations
- R. Schreiber and C. van Loan. A storage-efficient WY representation for products of Householder transformations. J. Sci. Stat. Comput., 10:53-57, 1991.
- (1991) J. Sci. Stat. Comput. , vol.10 , pp. 53-57
- Schreiber, R.¹ Van Loan, C.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.