-
1
-
-
0003706460
-
-
SIAM: Philadelphia, PA, Available at:, 2 June 2009
-
Anderson E, Bai Z, Bischof C, Blackford LS, Demmel JW, Dongarra JJ, Du Croz J, Greenbaum A, Hammarling S, McKenney A, Sorensen D. LAPACK Users' Guide. SIAM: Philadelphia, PA, 1992. Available at: http://www.netlib.org/lapack/lug/ [2 June 2009].
-
(1992)
LAPACK Users' Guide
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Blackford, L.S.4
Demmel, J.W.5
Dongarra, J.J.6
Du Croz, J.7
Greenbaum, A.8
Hammarling, S.9
McKenney, A.10
Sorensen, D.11
-
2
-
-
0003615167
-
-
SIAM: Philadelphia, PA, Available at:, 2 June 2009
-
Blackford LS, Choi J, Cleary A, D'Azevedo E, Demmel J, Dhillon I, Dongarra JJ, Hammarling S, Henry G, Petitet A, Stanley K, Walker D, Whaley RC. ScaLAPACK Users' Guide. SIAM: Philadelphia, PA, 1997. Available at: http://www.netlib.org/scalapack/slug/ [2 June 2009].
-
(1997)
ScaLAPACK Users' Guide
-
-
Blackford, L.S.1
Choi, J.2
Cleary, A.3
D'Azevedo, E.4
Demmel, J.5
Dhillon, I.6
Dongarra, J.J.7
Hammarling, S.8
Henry, G.9
Petitet, A.10
Stanley, K.11
Walker, D.12
Whaley, R.C.13
-
3
-
-
73149094576
-
-
Co-Array Fortran. Available at:, 2 June 2009
-
Co-Array Fortran. Available at: http://www.co-array.org/ [2 June 2009].
-
-
-
-
4
-
-
73149125407
-
-
The Berkeley Unified Parallel C (UPC) project. Available at:, 2 June 2009
-
The Berkeley Unified Parallel C (UPC) project. Available at: http://upc.lbl.gov/ [2 June 2009].
-
-
-
-
5
-
-
73149112464
-
-
Titanium project home page. Available at:, 2 June 2009
-
Titanium project home page. Available at: http://titanium.cs.berkeley.edu/ [2 June 2009].
-
-
-
-
6
-
-
73149087967
-
-
Cray, Inc, Available at:, 2 June 2009
-
Cray, Inc. Chapel Language Specification 0.775. Available at: http://chapel.cs.washington.edu/spec-0.775.pdf [2 June 2009].
-
Chapel Language Specification 0.775
-
-
-
7
-
-
73149094353
-
-
Sun Microsystems Inc. The Fortress Language Specification, Version 1.0, 2008. Available at: http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf [2 June 2009].
-
Sun Microsystems Inc. The Fortress Language Specification, Version 1.0, 2008. Available at: http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf [2 June 2009].
-
-
-
-
8
-
-
73149098525
-
-
Saraswat V, Nystrom N. Report on the Experimental Language X10, Version 1.7, 2008. Available at:, 2 June 2009
-
Saraswat V, Nystrom N. Report on the Experimental Language X10, Version 1.7, 2008. Available at: http://dist.codehaus.org/x10/documentation/languagespec/x10-170.pdf [2 June 2009].
-
-
-
-
9
-
-
0029191296
-
Cilk: An efficient multithreaded runtime system
-
ACM: Santa Barbara, CA, 19-21 July, DOI: 10.1145/209936.209958
-
Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou Y. Cilk: An efficient multithreaded runtime system. Principles and Practice of Parallel Programming, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP'95. ACM: Santa Barbara, CA, 19-21 July 1995; 207-216. DOI: 10.1145/209936.209958.
-
(1995)
Principles and Practice of Parallel Programming, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP'95
, pp. 207-216
-
-
Blumofe, R.D.1
Joerg, C.F.2
Kuszmaul, B.C.3
Leiserson, C.E.4
Randall, K.H.5
Zhou, Y.6
-
10
-
-
35448932427
-
-
Available at:, 2 June 2009
-
Intel Threading Building Blocks. Available at: http://www.threadingbuildingblocks.org/ [2 June 2009].
-
Intel Threading Building Blocks
-
-
-
11
-
-
73149089278
-
-
Reinders J. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O'Reilly Media, Inc., 2007. Available at: http://www.amazon.com/exec/obidos/ASIN/0596514808/ ISBN: 0596514808 [2 June 2009].
-
Reinders J. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O'Reilly Media, Inc., 2007. Available at: http://www.amazon.com/exec/obidos/ASIN/0596514808/ ISBN: 0596514808 [2 June 2009].
-
-
-
-
12
-
-
73149120555
-
-
OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0, 2008. Available at:, 2 June 2009
-
OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 3.0, 2008. Available at: http://www.openmp.org/mp-documents/spec30.pdf [2 June 2009].
-
-
-
-
14
-
-
48949090561
-
A proposal for task parallelism in OpenMP
-
A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP 2007, Beijing, China. Springer: Berlin, 3-7 June, DOI: 10.1007/978-3-540-69303-1_1
-
Ayguadé E, Copty N, Duran A, Hoeflinger J, Lin Y, Massaioli F, Su E, Unnikrishnan P, Zhang G. A proposal for task parallelism in OpenMP. A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, IWOMP 2007 (Lecture Notes in Computer Science, vol. 4935), Beijing, China. Springer: Berlin, 3-7 June 2007; 1-12. DOI: 10.1007/978-3-540-69303-1_1.
-
(2007)
Lecture Notes in Computer Science
, vol.4935
, pp. 1-12
-
-
Ayguadé, E.1
Copty, N.2
Duran, A.3
Hoeflinger, J.4
Lin, Y.5
Massaioli, F.6
Su, E.7
Unnikrishnan, P.8
Zhang, G.9
-
15
-
-
67650056929
-
Extending the OpenMP tasking model to allow dependent tasks
-
OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP 2008, West Lafayette, IN. Springer: Berlin, 12-14 May, DOI: 10.1007/978-3-540-79561-2_10
-
Duran A, Perez JM, Ayguadé E, Badia RM, Labarta J. Extending the OpenMP tasking model to allow dependent tasks. OpenMP in a New Era of Parallelism, 4th International Workshop, IWOMP 2008 (Lecture Notes in Computer Science, vol. 5004), West Lafayette, IN. Springer: Berlin, 12-14 May 2008; 111-122. DOI: 10.1007/978-3-540-79561-2_10.
-
(2008)
Lecture Notes in Computer Science
, vol.5004
, pp. 111-122
-
-
Duran, A.1
Perez, J.M.2
Ayguadé, E.3
Badia, R.M.4
Labarta, J.5
-
16
-
-
73149093256
-
-
Barcelona Supercomputing Center. SMP Superscalar (SMPSs) User's Manual, Version 2.0, 2008. Available at: http://www.bsc.es/media/1002.pdf [2 June 2009].
-
Barcelona Supercomputing Center. SMP Superscalar (SMPSs) User's Manual, Version 2.0, 2008. Available at: http://www.bsc.es/media/1002.pdf [2 June 2009].
-
-
-
-
17
-
-
73149085347
-
-
Supercomputing Technologies Group. Cilk 5.4.6 Reference Manual, MIT Laboratory for Computer Science, 1998. Available at: http://supertech.csail.mit.edu/cilk/manual-5.4.6.pdf [2 June 2009].
-
Supercomputing Technologies Group. Cilk 5.4.6 Reference Manual, MIT Laboratory for Computer Science, 1998. Available at: http://supertech.csail.mit.edu/cilk/manual-5.4.6.pdf [2 June 2009].
-
-
-
-
18
-
-
34548265764
-
CellSs: A programming model for the Cell BE architecture
-
Tampa, FL. ACM: New York, 11-17 November, DOI: 10.1145/1188455.1188546
-
Bellens P, Perez JM, Badia RM, Labarta J. CellSs: A programming model for the Cell BE architecture. Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. Tampa, FL. ACM: New York, 11-17 November 2006; DOI: 10.1145/1188455.1188546.
-
(2006)
Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
-
-
Bellens, P.1
Perez, J.M.2
Badia, R.M.3
Labarta, J.4
-
19
-
-
35649006026
-
CellSs: Making it easier to program the Cell Broadband Engine processor
-
DOI: 10.1147/rd.515.0593
-
Perez JM, Bellens P, Badia RM, Labarta J. CellSs: Making it easier to program the Cell Broadband Engine processor. IBM Journal of Research and Development 2007; 51(5): 593-604. DOI: 10.1147/rd.515.0593.
-
(2007)
IBM Journal of Research and Development
, vol.51
, Issue.5
, pp. 593-604
-
-
Perez, J.M.1
Bellens, P.2
Badia, R.M.3
Labarta, J.4
-
20
-
-
0029531029
-
The microarchitecture of superscalar processors
-
Smith JE, Sohi GS. The microarchitecture of superscalar processors. Proceedings of the IEEE 1995; 83(12): 1609-1624.
-
(1995)
Proceedings of the IEEE
, vol.83
, Issue.12
, pp. 1609-1624
-
-
Smith, J.E.1
Sohi, G.S.2
-
21
-
-
85027612984
-
Dependence graphs and compiler optimizations
-
ACM: Williamsburg, VA, January, DOI: 10.1145/567532.567555
-
Kuck DJ, Kuhn RH, Padua DA, Leasure B, Wolfe M. Dependence graphs and compiler optimizations. Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM: Williamsburg, VA, January 1981; 207-218. DOI: 10.1145/567532.567555.
-
(1981)
Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
, pp. 207-218
-
-
Kuck, D.J.1
Kuhn, R.H.2
Padua, D.A.3
Leasure, B.4
Wolfe, M.5
-
22
-
-
49349111725
-
Solving systems of linear equations on the CELL processor using Cholesky factorization
-
DOI: 10.1109/TPDS.2007.70813
-
Kurzak J, Buttari A, Dongarra JJ. Solving systems of linear equations on the CELL processor using Cholesky factorization. IEEE Transactions on Parallel and Distributed Systems 2008; 19(9): 1175-1186. DOI: 10.1109/TPDS.2007.70813.
-
(2008)
IEEE Transactions on Parallel and Distributed Systems
, vol.19
, Issue.9
, pp. 1175-1186
-
-
Kurzak, J.1
Buttari, A.2
Dongarra, J.J.3
-
23
-
-
60649117581
-
QR factorization for the CELL processor
-
DOI: 10.3233/SPR-2009-0268
-
Kurzak J, Dongarra JJ. QR factorization for the CELL processor. Scientific Programming 2009; 17(1-2): 31-42. DOI: 10.3233/SPR-2009-0268.
-
(2009)
Scientific Programming
, vol.17
, Issue.1-2
, pp. 31-42
-
-
Kurzak, J.1
Dongarra, J.J.2
-
24
-
-
0020593101
-
Solving linear algebraic equations on an MIMD computer
-
DOI: 10.1145/322358.322366
-
Lord RE, Kowalik JS, Kumar SP. Solving linear algebraic equations on an MIMD computer. Journal of the ACM 1983; 30(1): 103-117. DOI: 10.1145/322358.322366.
-
(1983)
Journal of the ACM
, vol.30
, Issue.1
, pp. 103-117
-
-
Lord, R.E.1
Kowalik, J.S.2
Kumar, S.P.3
-
26
-
-
0024891893
-
Vector and parallel algorithms for Cholesky factorization on IBM 3090
-
Reno, NV. ACM: New York, 13-17 November, DOI: 10.1145/76263.76287
-
Agarwal RC, Gustavson FG. Vector and parallel algorithms for Cholesky factorization on IBM 3090. Proceedings of the 1989 ACM/IEEE Conference on Supercomputing, Reno, NV. ACM: New York, 13-17 November 1989; 225-233. DOI: 10.1145/76263.76287.
-
(1989)
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing
, pp. 225-233
-
-
Agarwal, R.C.1
Gustavson, F.G.2
-
27
-
-
38049005629
-
Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
-
Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006, Umeå, Sweden. Springer: Berlin, 18-21 June, DOI: 10.1007/978-3-540-75755-9_18
-
Kurzak J, Dongarra JJ. Implementing linear algebra routines on multi-core processors with pipelining and a look ahead. Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006 (Lecture Notes in Computer Science, vol. 4699), Umeå, Sweden. Springer: Berlin, 18-21 June 2006; 147-156. DOI: 10.1007/978-3-540-75755-9_18.
-
(2006)
Lecture Notes in Computer Science
, vol.4699
, pp. 147-156
-
-
Kurzak, J.1
Dongarra, J.J.2
-
28
-
-
36048997493
-
-
Buttari A, Dongarra JJ, Husbands P, Kurzak J, Yelick K. Multithreading for synchronization tolerance in matrix factorization. Scientific Discovery Through Advanced Computing, SciDAC 2007 (Journal of Physics: Conference Series, 78: 012028), Boston, MA. IOP Publishing: Bristol, U.K., 24-28 June 2007. DOI: 10.1088/1742-6596/78/1/012028.
-
Buttari A, Dongarra JJ, Husbands P, Kurzak J, Yelick K. Multithreading for synchronization tolerance in matrix factorization. Scientific Discovery Through Advanced Computing, SciDAC 2007 (Journal of Physics: Conference Series, vol. 78: 012028), Boston, MA. IOP Publishing: Bristol, U.K., 24-28 June 2007. DOI: 10.1088/1742-6596/78/1/012028.
-
-
-
-
29
-
-
50249105132
-
Parallel tiled QR factorization for multicore architectures
-
DOI: 10.1002/cpe.1301
-
Buttari A, Langou J, Kurzak J, Dongarra JJ. Parallel tiled QR factorization for multicore architectures. Concurrency and Computation: Practice and Experience 2008; 20(13): 1573-1590. DOI: 10.1002/cpe.1301.
-
(2008)
Concurrency and Computation: Practice and Experience
, vol.20
, Issue.13
, pp. 1573-1590
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.J.4
-
30
-
-
58149269099
-
A class of parallel tiled linear algebra algorithms for multicore architectures
-
DOI: 10.1016/j.parco.2008.10.002
-
Buttari A, Langou J, Kurzak J, Dongarra JJ. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing: Systems and Applications 2009; 35: 38-53. DOI: 10.1016/j.parco.2008.10.002.
-
(2009)
Parallel Computing: Systems and Applications
, vol.35
, pp. 38-53
-
-
Buttari, A.1
Langou, J.2
Kurzak, J.3
Dongarra, J.J.4
-
33
-
-
0003078924
-
A storage-efficient WY representation for products of Householder transformations
-
Schreiber R, van Loan C. A storage-efficient WY representation for products of Householder transformations. Journal on Scientific and Statistical Computing 1991; 10: 53-57.
-
(1991)
Journal on Scientific and Statistical Computing
, vol.10
, pp. 53-57
-
-
Schreiber, R.1
van Loan, C.2
-
34
-
-
0034224207
-
Applying recursion to serial and parallel QR factorization leads to better performance
-
Elmroth E, Gustavson FG. Applying recursion to serial and parallel QR factorization leads to better performance. IBM Journal of Research and Development 2000; 44(4): 605-624.
-
(2000)
IBM Journal of Research and Development
, vol.44
, Issue.4
, pp. 605-624
-
-
Elmroth, E.1
Gustavson, F.G.2
-
35
-
-
84957033906
-
High-performance library software for QR factorization
-
Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA 2000, Bergen, Norway. Springer: Berlin, DOI: 10.1007/3-540-70734-4_9
-
Elmroth E, Gustavson FG. High-performance library software for QR factorization. Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, 5th International Workshop, PARA 2000 (Lecture Notes in Computer Science, vol. 1947), Bergen, Norway. Springer: Berlin, 18-20 2000; 53-63. DOI: 10.1007/3-540-70734-4_9.
-
(2000)
Lecture Notes in Computer Science
, vol.1947
-
-
Elmroth, E.1
Gustavson, F.G.2
-
36
-
-
84947936389
-
New serial and parallel recursive QR factorization algorithms for SMP systems
-
Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA'98, Umea, Sweden. Springer: Berlin, 14-17 June, DOI: 10.1007/BFb0095328. Available at: http://dx.doi.org/10.1007/BFb0095328 [2 June 2009].
-
Elmroth E, Gustavson FG. New serial and parallel recursive QR factorization algorithms for SMP systems. Applied Parallel Computing, Large Scale Scientific and Industrial Problems, 4th International Workshop, PARA'98 (Lecture Notes in Computer Science, vol. 1541), Umea, Sweden. Springer: Berlin, 14-17 June 1998; 120-128. DOI: 10.1007/BFb0095328. Available at: http://dx.doi.org/10.1007/BFb0095328 [2 June 2009].
-
(1998)
Lecture Notes in Computer Science
, vol.1541
, pp. 120-128
-
-
Elmroth, E.1
Gustavson, F.G.2
-
37
-
-
84966204836
-
Methods for modifying matrix factorizations
-
Gill PE, Golub GH, Murray WA, Saunders MA. Methods for modifying matrix factorizations. Mathematics of Computation 1974; 28(126): 505-535.
-
(1974)
Mathematics of Computation
, vol.28
, Issue.126
, pp. 505-535
-
-
Gill, P.E.1
Golub, G.H.2
Murray, W.A.3
Saunders, M.A.4
-
38
-
-
73149115815
-
LAPACK working note 68: A highly parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form
-
Technical Report UT-CS-94-221, Computer Science Department, University of Tennessee, 1994. Available at:, 2 June
-
Berry MW, Dongarra JJ, Kim Y. LAPACK working note 68: A highly parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form. Technical Report UT-CS-94-221, Computer Science Department, University of Tennessee, 1994. Available at: http://www.netlib.org/lapack/lawnspdf/lawn68.pdf [2 June 2009].
-
(1994)
-
-
Berry, M.W.1
Dongarra, J.J.2
Kim, Y.3
-
39
-
-
0031273280
-
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
-
DOI: 10.1147/rd.416.0737
-
Gustavson FG. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development 1997; 41(6): 737-756. DOI: 10.1147/rd.416.0737.
-
(1997)
IBM Journal of Research and Development
, vol.41
, Issue.6
, pp. 737-756
-
-
Gustavson, F.G.1
-
41
-
-
38049054439
-
Minimal data copy for dense linear algebra factorization
-
Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006, Umeå, Sweden. Springer: Berlin, 18-21 June, DOI: 10.1007/978-3-540-75755-9_66
-
Gustavson FG, Gunnels JA, Sexton JC. Minimal data copy for dense linear algebra factorization. Applied Parallel Computing, State of the Art in Scientific Computing, 8th International Workshop, PARA 2006 (Lecture Notes in Computer Science, vol. 4699), Umeå, Sweden. Springer: Berlin, 18-21 June 2006; 540-549. DOI: 10.1007/978-3-540-75755-9_66.
-
(2006)
Lecture Notes in Computer Science
, vol.4699
, pp. 540-549
-
-
Gustavson, F.G.1
Gunnels, J.A.2
Sexton, J.C.3
-
42
-
-
1842832833
-
Recursive blocked algorithms and hybrid data structures for dense matrix library software
-
DOI: 10.1137/S0036144503428693
-
Elmroth E, Gustavson FG, Jonsson I, Kågström B. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review 2004; 46(1): 3-45. DOI: 10.1137/S0036144503428693.
-
(2004)
SIAM Review
, vol.46
, Issue.1
, pp. 3-45
-
-
Elmroth, E.1
Gustavson, F.G.2
Jonsson, I.3
Kågström, B.4
-
43
-
-
17644368925
-
Parallel out-of-core computation and updating the QR factorization
-
DOI: 10.1145/1055531.1055534
-
Gunter BC, van de Geijn RA. Parallel out-of-core computation and updating the QR factorization. ACM Transactions on Mathematical Software 2005; 31(1): 60-78. DOI: 10.1145/1055531.1055534.
-
(2005)
ACM Transactions on Mathematical Software
, vol.31
, Issue.1
, pp. 60-78
-
-
Gunter, B.C.1
van de Geijn, R.A.2
-
44
-
-
48849086742
-
Updating an LU factorization with pivoting
-
DOI: 10.1145/1377612.1377615
-
Quintana-Ortí ES, van de Geijn RA. Updating an LU factorization with pivoting. ACM Transactions on Mathematical Software 2008; 35(2): 11. DOI: 10.1145/1377612.1377615.
-
(2008)
ACM Transactions on Mathematical Software
, vol.35
, Issue.2
, pp. 11
-
-
Quintana-Ortí, E.S.1
van de Geijn, R.A.2