-
1
-
-
7444229864
-
The cascade high productivity language
-
The cascade high productivity language. HIPS, pages 52-60, 2004.
-
(2004)
HIPS
, pp. 52-60
-
-
-
2
-
-
0028757636
-
A high performance parallel algorithm for 1-D FFT
-
Los Alamitos, CA, USA, IEEE Computer Society Press
-
R. C. Agarwal, F. G. Gustavson, and M. Zubair. A high performance parallel algorithm for 1-D FFT. In Supercomputing '94: Proceedings of the 1994 conference on Supercomputing, pages 34-40, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.
-
(1994)
Supercomputing '94: Proceedings of the 1994 Conference on Supercomputing
, pp. 34-40
-
-
Agarwal, R.C.1
Gustavson, F.G.2
Zubair, M.3
-
3
-
-
33646421297
-
-
Sun Microsystems, Inc., 1.0α edition, Sept.
-
E. Allen, D. Chase, J. Hallett, V. Luchangco, J.-W. Maessen, S. Ryu, G. L. Steele Jr., and S. Tobin-Hochstadt. The Fortress Language Specification. Sun Microsystems, Inc., 1.0α edition, Sept. 2006.
-
(2006)
The Fortress Language Specification.
-
-
Allen, E.1
Chase, D.2
Hallett, J.3
Luchangco, V.4
Maessen, J.-W.5
Ryu, S.6
Steele Jr., G.L.7
Tobin-Hochstadt, S.8
-
4
-
-
21044456455
-
Design and implementation of message-passing services for the Blue Gene/L supercomputer
-
March/May
-
G. Almási, C. Archer, J. G. Castaños, J. A. Gunnels, C. C. Erway, P. Heidelberger, X. Martorell, J. E. Moreira, K. Pinnow, J. Ratterman, B. Steinmacher-Burow, W. Gropp, and B. Toonen. Design and implementation of message-passing services for the Blue Gene/L supercomputer. IBM Journal of Research and Development, 49(2/3): 393-406, March/May 2005. Available at http://www.research.ibm.com/journal/rd49-23.html.
-
(2005)
IBM Journal of Research and Development
, vol.49
, Issue.2-3
, pp. 393-406
-
-
Almási, G.1
Archer, C.2
Castaños, J.G.3
Gunnels, J.A.4
Erway, C.C.5
Heidelberger, P.6
Martorell, X.7
Moreira, J.E.8
Pinnow, K.9
Ratterman, J.10
Steinmacher-Burow, B.11
Gropp, W.12
Toonen, B.13
-
5
-
-
32844464238
-
Optimization of MPI collective communication on BlueGene/L systems
-
New York, NY, USA, ACM Press
-
G. Almási, P. Heidelberger, C. J. Archer, X. Martorell, C. C. Erway, J. E. Moreira, B. Steinmacher-Burow, and Y. Zheng. Optimization of MPI collective communication on BlueGene/L systems. In ICS '05: Proceedings of the 19th annual international conference on Supercomputing, pages 253-262, New York, NY, USA, 2005. ACM Press.
-
(2005)
ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing
, pp. 253-262
-
-
Almási, G.1
Heidelberger, P.2
Archer, C.J.3
Martorell, X.4
Erway, C.C.5
Moreira, J.E.6
Steinmacher-Burow, B.7
Zheng, Y.8
-
6
-
-
34548784885
-
Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI
-
P. Balaji, D. Buntinas, S. Balay, B. Smith, R. Thakur, and W. Gropp. Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI. In IEEE Parallel and Distributed Processing Symposium (IPDPS), 2006.
-
(2006)
IEEE Parallel and Distributed Processing Symposium (IPDPS)
-
-
Balaji, P.1
Buntinas, D.2
Balay, S.3
Smith, B.4
Thakur, R.5
Gropp, W.6
-
7
-
-
0003660984
-
PETSc users manual
-
Argonne National Laboratory
-
S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 2.1.5, Argonne National Laboratory, 2004.
-
(2004)
Technical Report ANL-95/11 - Revision 2.1.5
-
-
Balay, S.1
Buschelman, K.2
Eijkhout, V.3
Gropp, W.D.4
Kaushik, D.5
Knepley, M.G.6
McInnes, L.C.7
Smith, B.F.8
Zhang, H.9
-
8
-
-
33746070421
-
Shared memory programming for large scale machines
-
Ottawa, Canada
-
C. Barton, C. Caşcaval, G. Almási, Y. Zheng, M. Farreras, S. Chatterjee, and J. N. Amaral. Shared memory programming for large scale machines. In Programming Language Design and Implementation (PLDI), Ottawa, Canada, 2006.
-
(2006)
Programming Language Design and Implementation (PLDI)
-
-
Barton, C.1
Caşcaval, C.2
Almási, G.3
Zheng, Y.4
Farreras, M.5
Chatterjee, S.6
Amaral, J.7
-
9
-
-
79959417706
-
Multidimensional blocking in UPC
-
IBM, July
-
C. Barton, C. Cascaval, G. Almasi, R. Garg, and J. N. Amaral. Multidimensional blocking in UPC. Technical Report RC24305, IBM, July 2007.
-
(2007)
Technical Report RC24305
-
-
Barton, C.1
Cascaval, C.2
Almasi, G.3
Garg, R.4
Amaral, J.5
-
11
-
-
79959386767
-
-
The Berkeley UPC Compiler
-
The Berkeley UPC Compiler, 2002. http://upc.lbl.gov.
-
(2002)
-
-
-
12
-
-
79959485086
-
-
BLAS Home Page
-
BLAS Home Page, http://www.netlib.org/blas/.
-
-
-
-
16
-
-
0009930394
-
ZPL: A machine independent programming language for parallel computers
-
B. L. Chamberlain, S.-E. Choi, E. C. Lewis, C. Lin, L. Snyder, and D. Weathersby. ZPL: A machine independent programming language for parallel computers. Software Engineering, 26(3): 197-211, 2000.
-
(2000)
Software Engineering
, vol.26
, Issue.3
, pp. 197-211
-
-
Chamberlain, B.L.1
Choi, S.-E.2
Lewis, E.C.3
Lin, C.4
Snyder, L.5
Weathersby, D.6
-
17
-
-
1142293067
-
A performance analysis of the Berkeley UPC compiler
-
June
-
W. Chen, D. Bonachea, J. Duell, P. Husband, C. Iancu, and K. Yelick. A Performance Analysis of the Berkeley UPC Compiler. In Proc. of Int'l Conference on Supercomputing (ICS), June 2003.
-
(2003)
Proc. of Int'l Conference on Supercomputing (ICS)
-
-
Chen, W.1
Bonachea, D.2
Duell, J.3
Husband, P.4
Iancu, C.5
Yelick, K.6
-
18
-
-
84947808952
-
A proposal for a set of parallel basic linear algebra subprograms
-
London, UK, Springer-Verlag
-
J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. W. Walker, and R. C. Whaley. A proposal for a set of parallel basic linear algebra subprograms. In PARA '95: Proceedings of the Second International Workshop on Applied Parallel Computing, Computations in Physics, Chemistry and Engineering Science, pages 107-114, London, UK, 1996. Springer-Verlag.
-
(1996)
PARA '95: Proceedings of the Second International Workshop on Applied Parallel Computing, Computations in Physics, Chemistry and Engineering Science
, pp. 107-114
-
-
Choi, J.1
Dongarra, J.2
Ostrouchov, S.3
Petitet, A.4
Walker, D.W.5
Whaley, R.C.6
-
19
-
-
31844441256
-
An evaluation of global address space languages: Co-Array Fortran and Unified Parallel C
-
New York, NY, USA, ACM Press
-
C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanti, Y. Yao, and D. Chavarría-Miranda. An evaluation of global address space languages: Co-Array Fortran and Unified Parallel C. In PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 36-47, New York, NY, USA, 2005. ACM Press.
-
(2005)
PPoPP '05: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 36-47
-
-
Coarfa, C.1
Dotsenko, Y.2
Mellor-Crummey, J.3
Cantonnet, F.4
El-Ghazawi, T.5
Mohanti, A.6
Yao, Y.7
Chavarría-Miranda, D.8
-
20
-
-
0009346826
-
LogP: Towards a realistic model of parallel computation
-
D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a realistic model of parallel computation. In Proc. 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1-12, 1993.
-
(1993)
Proc. 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 1-12
-
-
Culler, D.E.1
Karp, R.M.2
Patterson, D.A.3
Sahay, A.4
Schauser, K.E.5
Santos, E.6
Subramonian, R.7
Von Eicken, T.8
-
22
-
-
80052802178
-
UPC performance and potential: An NPB experimental study
-
Los Alamitos, CA, USA, IEEE Computer Society Press
-
T. El-Ghazawi and F. Cantonnet. UPC performance and potential: An NPB experimental study. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1-26, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.
-
(2002)
Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing
, pp. 1-26
-
-
El-Ghazawi, T.1
Cantonnet, F.2
-
23
-
-
54249097779
-
-
ESSL User Guide. http://www-03.ibm.com/systems/p/software/essl.html.
-
ESSL User Guide.
-
-
-
24
-
-
27144559253
-
ScaLAPACK: A linear algebra library for message-passing computers
-
Minneapolis, MN, (electronic), Philadelphia, PA, USA, 1997. Society for Industrial and Applied Mathematics
-
L. S. B. et al. ScaLAPACK: A linear algebra library for message-passing computers. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing (Minneapolis, MN, 1997), page 15 (electronic), Philadelphia, PA, USA, 1997. Society for Industrial and Applied Mathematics.
-
(1997)
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing
, pp. 15
-
-
Sanoj, L.S.1
-
25
-
-
20744449792
-
The design and implementation of FFTW3
-
DOI 10.1109/JPROC.2004.840301, Program Generation, Optimization and Platform Adaptation
-
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2): 216-231, 2005. Special issue on "Program Generation, Optimization, and Platform Adaptation".
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 216-231
-
-
Frigo, M.1
Johnson, S.G.2
-
26
-
-
21044437801
-
Overview of the BlueGene/L system architecture
-
A. Gara, M. A. Blumrich, D. Chen, G. L.-T. Chiu, P. Coteus, M. Giampapa, R. A. Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht, B. D. Steinmacher-Burow, T. Takken, and P. Vranas. Overview of the BlueGene/L system architecture. IBM Journal of Research and Development, 49(2/3): 195-212, 2005.
-
(2005)
IBM Journal of Research and Development
, vol.49
, Issue.2-3
, pp. 195-212
-
-
Gara, A.1
Blumrich, M.A.2
Chen, D.3
Chiu, G.L.-T.4
Coteus, P.5
Giampapa, M.6
Haring, R.A.7
Heidelberger, P.8
Hoenicke, D.9
Kopcsay, G.V.10
Liebsch, T.A.11
Ohmacht, M.12
Steinmacher-Burow, B.D.13
Takken, T.14
Vranas, P.15
-
28
-
-
1142307058
-
-
Tech Report UCB/CSD-01-1163, U.C. Berkeley, November
-
P. Hilfinger, D. Bonachea, D. Gay, S. Graham, B. Liblit, G. Pike, and K. Yelick. Titanium language reference manual. Tech Report UCB/CSD-01-1163, U.C. Berkeley, November 2001.
-
(2001)
Titanium Language Reference Manual
-
-
Hilfinger, P.1
Bonachea, D.2
Gay, D.3
Graham, S.4
Liblit, B.5
Pike, G.6
Yelick, K.7
-
31
-
-
0004235292
-
-
The MathWorks
-
The MathWorks. Using MATLAB, 1997.
-
(1997)
Using Matlab
-
-
-
32
-
-
79959416586
-
-
Message Passing Interface
-
Message Passing Interface. http://www.mpi-forum.org/docs/docs.html.
-
-
-
-
33
-
-
22144436121
-
The Cholesky decomposition
-
chapter 7, Bristol, England: Adam Hilger, 2nd edition
-
J. C. Nash. "The Cholesky Decomposition." In Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, chapter 7, pages 84-93. Bristol, England: Adam Hilger, 2nd edition, 1990.
-
(1990)
Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation
, pp. 84-93
-
-
Nash, J.C.1
-
34
-
-
0002081678
-
Co-Array Fortran for parallel programming
-
R. W. Numrich and J. Reid. Co-Array Fortran for parallel programming. ACM Fortran Forum, 17(2): 1-31, 1998.
-
(1998)
ACM Fortran Forum
, vol.17
, Issue.2
, pp. 1-31
-
-
Numrich, R.W.1
Reid, J.2
-
35
-
-
0002081678
-
Co-Array Fortran for parallel programming
-
R. W. Numrich and J. Reid. Co-Array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2): 1-31, 1998.
-
(1998)
SIGPLAN Fortran Forum
, vol.17
, Issue.2
, pp. 1-31
-
-
Numrich, R.W.1
Reid, J.2
-
37
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2): 232-275, 2005.
-
(2005)
Proceedings of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.1
Moura, J.M.F.2
Johnson, J.3
Padua, D.4
Veloso, M.5
Singer, B.W.6
Xiong, J.7
Franchetti, F.8
Gačić, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
38
-
-
33847138695
-
Efficient RDMA-based multi-port collectives on multi-rail QsNetII clusters
-
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006)
-
Y. Qian and A. Afsahi. Efficient RDMA-based multi-port collectives on multi-rail QsNetII clusters. In The 6th Workshop on Communication Architecture for Clusters (CAC 2006), in Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006.
-
(2006)
The 6th Workshop on Communication Architecture for Clusters (CAC 2006)
-
-
Qian, Y.1
Afsahi, A.2
-
39
-
-
79959485085
-
A specification of the extensions to the collective operations of Unified Parallel C
-
Michigan Technological University, Department of Computer Science
-
Z. Ryne and S. Seidel. A specification of the extensions to the collective operations of Unified Parallel C. Technical Report 05-08, Michigan Technological University, Department of Computer Science, 2005.
-
(2005)
Technical Report 05-08
-
-
Ryne, Z.1
Seidel, S.2
-
40
-
-
33746613581
-
Co-array collectives: Refined semantics for Co-Array Fortran
-
V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, Springer
-
M. J. Sottile, C. E. Rasmussen, and R. L. Graham. Co-array collectives: Refined semantics for Co-Array Fortran. In V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, International Conference on Computational Science (2), volume 3992 of Lecture Notes in Computer Science, pages 945-952. Springer, 2006.
-
(2006)
International Conference on Computational Science (2), Volume 3992 of Lecture Notes in Computer Science
, pp. 945-952
-
-
Sottile, M.J.1
Rasmussen, C.E.2
Graham, R.L.3
-
42
-
-
4344655318
-
Performance modeling for self adapting collective communications for MPI
-
S. S. Vadhiyar, G. E. Fagg, and J. J. Dongarra. Performance modeling for self adapting collective communications for MPI. In LACSI Symposium, 2001.
-
(2001)
LACSI Symposium
-
-
Vadhiyar, S.S.1
Fagg, G.E.2
Dongarra, J.J.3
-
44
-
-
24344485098
-
OSKI: A library of automatically tuned sparse matrix kernels
-
San Francisco, CA, USA, June 2005. Institute of Physics Publishing
-
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series, San Francisco, CA, USA, June 2005. Institute of Physics Publishing.
-
Proceedings of SciDAC 2005, Journal of Physics: Conference Series
-
-
Vuduc, R.1
Demmel, J.W.2
Yelick, K.A.3
-
45
-
-
0343462141
-
Automated empirical optimizations of software and the ATLAS project
-
DOI 10.1016/S0167-8191(00)00087-9
-
R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1-2): 3-35, 2001.
-
(2001)
Parallel Computing
, vol.27
, Issue.1-2
, pp. 3-35
-
-
Whaley, R.C.1
Petitet, A.2
Dongarra, J.J.3
-
48
-
-
0032155556
-
Titanium: A high-performance Java dialect
-
September-November
-
K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: A high-performance Java dialect. Concurrency: Practice and Experience, 10(11-13), September-November 1998.
-
(1998)
Concurrency: Practice and Experience
, vol.10
, Issue.11-13
-
-
Yelick, K.1
Semenzato, L.2
Pike, G.3
Miyamoto, C.4
Liblit, B.5
Krishnamurthy, A.6
Hilfinger, P.7
Graham, S.8
Gay, D.9
Colella, P.10
Aiken, A.11