[1] M. Aboelaze, N. Chrisochoides, and E. Houstis, "The Parallelization of Level 2 and 3 BLAS Operations on Distributed-Memory Machines," Technical Report CSD-TR-91-007, Purdue Univ., West Lafayette, Ind., 1991.

[2] R. Agarwal, F. Gustavson, and M. Zubair, "A High Performance Matrix Multiplication Algorithm on a Distributed-Memory Parallel Computer, Using Overlapped Communication," IBM J. Research and Development, vol. 38, no. 6, pp. 673-681, 1994.

[3] T. Agerwala, J. Martin, J. Mirza, D. Sadler, D. Dias, and M. Snir, "SP2 System Architecture," IBM Systems J., vol. 34, no. 2, pp. 153-184, 1995.

[4] C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell, "A Linear Algebra Framework for Static HPF Code Distribution," Technical Report A-278-CRI, CRI-Ecole des Mines, Fontainebleau, France, 1995. (Available at http://www.cri.ensmp.fr.)

[5] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide. Philadelphia, Penn.: SIAM, 1995.

[6] C. Ashcraft, "The Distributed Solution of Linear Systems Using the Torus-Wrap Data Mapping," Technical Report ECA-TR-147, Boeing Computer Services, Seattle, Wash., 1990.

[8] J. Bilmes, K. Asanovic, J. Demmel, D. Lam, and C. Chin, "Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology," Technical Report UT CS-96-326, LAPACK Working Note 111, Univ. Tennessee, 1996.

[9] R. Bisseling and J. van der Vorst, "Parallel LU Decomposition on a Transputer Network," Lecture Notes in Computer Science, G. van Zee and J. van der Vorst, eds., vol. 384, pp. 61-77, 1989.

[11] L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley, ScaLAPACK Users' Guide. Philadelphia, Penn.: SIAM, 1997.

[12] R. Brent and P. Strazdins, "Implementation of BLAS Level 3 and LINPACK Benchmark on the AP1000," Fujitsu Scientific and Technical J., vol. 5, no. 1, pp. 61-70, 1993.

[13] S. Chatterjee, J. Gilbert, F. Long, R. Schreiber, and S. Tseng, "Generating Local Addresses and Communication Sets for Data Parallel Programs," J. Parallel and Distributed Computing, vol. 26, pp. 72-84, 1995.

[15] J. Choi, J. Dongarra, and D. Walker, "PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed-Memory Concurrent Computers," Concurrency: Practice and Experience, vol. 6, no. 7, pp. 543-570, 1994.

[16] J. Choi, J. Dongarra, and D. Walker, "PB-BLAS: A Set of Parallel Block Basic Linear Algebra Subroutines," Concurrency: Practice and Experience, vol. 8, no. 7, pp. 517-535, 1996.

[17] A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt, and R. van de Geijn, "Parallel Implementation of BLAS: General Techniques for Level 3 BLAS," Concurrency: Practice and Experience, vol. 9, no. 9, pp. 837-857, 1997.

[18] E. Chu and A. George, "QR Factorization of a Dense Matrix on a Hypercube Multiprocessor," SIAM J. Scientific and Statistical Computing, vol. 11, pp. 990-1,028, 1990.

[19] M. Daydé, I. Duff, and A. Petitet, "A Parallel Block Implementation of Level 3 BLAS for MIMD Vector Processors," ACM Trans. Mathematical Software, vol. 20, no. 2, pp. 178-193, 1994.

[20] F. Desprez, J. Dongarra, A. Petitet, C. Randriamaro, and Y. Robert, "Scheduling Block-Cyclic Array Redistribution," IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 2, pp. 192-205, 1998.

[21] J. Dongarra, R. van de Geijn, and D. Walker, "Scalability Issues in the Design of a Library for Dense Linear Algebra," J. Parallel and Distributed Computing, vol. 22, no. 3, pp. 523-537, 1994.

[22] J. Dongarra and D. Walker, "Software Libraries for Linear Algebra Computations on High Performance Computers," SIAM Review, vol. 37, no. 2, pp. 151-180, 1995.

[23] J. Dongarra and R.C. Whaley, "A User's Guide to the BLACS v1.0," Technical Report UT CS-95-281, LAPACK Working Note 94, Univ. Tennessee, 1995. (http://www.netlib.org/blacs/)

[24] G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, Solving Problems on Concurrent Processors. Englewood Cliffs, N.J.: Prentice Hall, 1988.

[25] G. Fox, S. Otto, and A. Hey, "Matrix Algorithms on a Hypercube I: Matrix Multiplication," Parallel Computing, vol. 3, pp. 17-31, 1987.

[26] G. Geist and C. Romine, "LU Factorization Algorithms on Distributed-Memory Multiprocessor Architectures," SIAM J. Scientific and Statistical Computing, vol. 9, pp. 639-649, 1988.

[27] M. Heath and C. Romine, "Parallel Solution of Triangular Systems on Distributed-Memory Multiprocessors," SIAM J. Scientific and Statistical Computing, vol. 9, pp. 558-588, 1988.

[29] B. Hendrickson and D. Womble, "The Torus-Wrap Mapping for Dense Matrix Calculations on Massively Parallel Computers," J. Scientific and Statistical Computing, vol. 15, no. 5, pp. 1,201-1,226, Sept. 1994.

[31] S. Huss-Lederman, E. Jacobson, A. Tsao, and G. Zhang, "Matrix Multiplication on the Intel Touchstone DELTA," Concurrency: Practice and Experience, vol. 6, no. 7, pp. 571-594, 1994.

[32] B. Kågström, P. Ling, and C. van Loan, "GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark," Technical Report UMINF 95-18, Dept. Computing Science, Umeå Univ., 1995.

[33] E. Kalns and L. Ni, "Processor Mapping Techniques towards Efficient Data Redistribution," IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 12, pp. 1,234-1,247, 1995.

[34] K. Kennedy, N. Nedeljković, and A. Sethi, "A Linear-Time Algorithm for Computing the Memory Access Sequence in Data Parallel Programs," Proc. Fifth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, 1995.

[35] C. Koelbel, D. Loveman, R. Schreiber, G. Steele, and M. Zosel, The High Performance Fortran Handbook. Cambridge, Mass.: MIT Press, 1994.

[36] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing. Redwood City, Calif.: Benjamin/Cummings Publishing Company, Inc., 1994.

[37] G. Li and T. Coleman, "A New Method for Solving Triangular Systems on Distributed-Memory Message-Passing Multiprocessors," SIAM J. Scientific and Statistical Computing, vol. 10, no. 2, pp. 382-396, 1989.

[38] W. Lichtenstein and S.L. Johnsson, "Block-Cyclic Dense Linear Algebra," SIAM J. Scientific and Statistical Computing, vol. 14, no. 6, pp. 1,259-1,288, 1993.

[39] Y. Lim, P. Bhat, and V. Prasanna, "Efficient Algorithms for Block-Cyclic Redistribution of Arrays," Technical Report CENG 97-10, Dept. Electrical Engineering-Systems, Univ. Southern California, Los Angeles, Calif., 1997.

[40] K. Mathur and S.L. Johnsson, "Multiplication of Matrices of Arbitrary Shapes on a Data Parallel Computer," Parallel Computing, vol. 20, pp. 919-951, 1994.

[44] P. Strazdins and H. Koesmarno, "A High Performance Version of Parallel LAPACK: Preliminary Report," Proc. Sixth Parallel Computing Workshop, Fujitsu Parallel Computing Center, 1996.

[45] C. Stunkel, D. Shea, B. Abali, M. Atkins, C. Bender, D. Grice, P. Hochschild, D. Joseph, B. Nathanson, R. Swetz, R. Stucke, M. Tsao, and P. Varker, "The SP2 High-Performance Switch," IBM Systems J., vol. 34, no. 2, pp. 185-204, 1995.

[47] R. van de Geijn and J. Watts, "SUMMA: Scalable Universal Matrix Multiplication Algorithm," Concurrency: Practice and Experience, vol. 9, no. 4, pp. 255-274, 1997.

[48] E. van de Velde, "Experiments with Multicomputer LU Decomposition," Concurrency: Practice and Experience, vol. 2, pp. 1-26, 1990.

[49] D. Walker and S. Otto, "Redistribution of Block-Cyclic Data Distributions Using MPI," Concurrency: Practice and Experience, vol. 8, no. 9, pp. 707-728, 1996.

[50] L. Wang, J. Stichnoth, and S. Chatterjee, "Runtime Performance of Parallel Array Assignment: An Empirical Study," Proc. Supercomputing, 1996. (http://www.supercomp.org/sc96/proceedings/)