[3] Micah Beck, Jack J. Dongarra, and Graham E. Fagg. Harness: A Next Generation Distributed Virtual Machine. Future Generation Computer Systems, 15(5-6):571-582, 1999.
[4] George Bosilca, Aurelien Bouteiller, Samir Djilali, Gilles Fedak, Cecile Germain, Thomas Herault, Vincent Neri, and Anton Selikhov. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes. In Supercomputing, pages 1-18, 2002.
[6] J. Duell, P. Hargrove, and E. Roman. The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart. Technical Report LBNL-54941, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, 2002.
[7] J. Hursey, J. M. Squyres, T. I. Mattox, and A. Lumsdaine. The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI. In 12th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems, March 2007.
[9] Michael Litzkow, Todd Tannenbaum, Jim Basney, and Miron Livny. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System. Technical Report UW-CS-TR-1346, Computer Sciences Department, University of Wisconsin-Madison, April 1997.
[10] Paul H. Hargrove and Jason C. Duell. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters. In SciDAC, June 2006.
[11] James S. Plank, Micah Beck, Gerry Kingsley, and Kai Li. Libckpt: Transparent Checkpointing under UNIX. Technical report, Knoxville, TN, USA, 1994.
[12] Q. Gao, W. Huang, M. Koop, and D. K. Panda. Group-Based Coordinated Checkpointing for MPI: A Case Study on InfiniBand. In Int'l Conference on Parallel Processing (ICPP), Xi'an, China, September 2007.
[14] S. Sankaran, J. M. Squyres, B. Barrett, et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In LACSI, October 2003.
[15] V. Strumpen. Compiler Technology for Portable Checkpoints. Submitted for publication (http://theory.lcs.mit.edu/strumpen/porch.ps.gz), 1998. citeseer.ist.psu.edu/strumpen98compiler.html.
[16] Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance. In IPDPS, pages 1-10, 2007.
[18] Frederick C. Wong, Richard P. Martin, et al. Architectural Requirements and Scalability of the NAS Parallel Benchmarks. In Supercomputing, page 41, 1999.