-
1
-
-
0029408206
-
The Totem single-ring ordering and membership protocol
-
Nov
-
Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, and P. Ciarfella. The Totem single-ring ordering and membership protocol. ACM Transactions on Computer Systems, 13(4):311-342, Nov. 1995.
-
(1995)
ACM Transactions on Computer Systems
, vol.13
, Issue.4
, pp. 311-342
-
-
Amir, Y.1
Moser, L.E.2
Melliar-Smith, P.M.3
Agarwal, D.A.4
Ciarfella, P.5
-
2
-
-
84945903089
-
Scalable fault tolerant protocol for parallel runtime environments
-
T. Angskun, G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra. Scalable fault tolerant protocol for parallel runtime environments. In Euro PVM/MPI, 2006.
-
(2006)
Euro PVM/MPI
-
-
Angskun, T.1
Fagg, G.2
Bosilca, G.3
Pjesivac-Grbovic, J.4
Dongarra, J.5
-
3
-
-
34548797982
-
-
T. Angskun, G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra. Self-healing network for scalable fault tolerant runtime environments. In Austrian-Hungarian Workshop on Distributed and Parallel Systems, 2006.
-
-
-
-
4
-
-
12444268370
-
Architecture of LA-MPI, a network-fault-tolerant MPI
-
R. T. Aulwes, D. J. Daniel, N. N. Desai, R. L. Graham, L. D. Risinger, M. A. Taylor, T. S. Woodall, and M. W. Sukalski. Architecture of LA-MPI, a network-fault-tolerant MPI. In Int'l Parallel and Distributed Processing Symposium, 2004.
-
(2004)
Int'l Parallel and Distributed Processing Symposium
-
-
Aulwes, R.T.1
Daniel, D.J.2
Desai, N.N.3
Graham, R.L.4
Risinger, L.D.5
Taylor, M.A.6
Woodall, T.S.7
Sukalski, M.W.8
-
5
-
-
0032203011
-
Coyote: A system for constructing fine-grain configurable communication services
-
N. T. Bhatti, M. A. Hiltunen, R. D. Schlichting, and W. Chiu. Coyote: a system for constructing fine-grain configurable communication services. ACM Trans. Comput. Syst., 16(4):321-366, 1998.
-
(1998)
ACM Trans. Comput. Syst
, vol.16
, Issue.4
, pp. 321-366
-
-
Bhatti, N.T.1
Hiltunen, M.A.2
Schlichting, R.D.3
Chiu, W.4
-
6
-
-
0038194608
-
MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
-
Nov
-
G. Bosilca, A. Boutellier, and F. Cappello. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Supercomputing, Nov. 2002.
-
(2002)
Supercomputing
-
-
Bosilca, G.1
Boutellier, A.2
Cappello, F.3
-
9
-
-
84957017252
-
A scalable process-management environment for parallel programs
-
R. Butler, W. Gropp, and E. L. Lusk. A scalable process-management environment for parallel programs. In Euro PVM/MPI, pages 168-175, 2000.
-
(2000)
Euro PVM/MPI
, pp. 168-175
-
-
Butler, R.1
Gropp, W.2
Lusk, E.L.3
-
12
-
-
34548769215
-
-
J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart. Technical report, Lawrence Berkeley National Laboratory, 2000.
-
-
-
-
13
-
-
84940567900
-
FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world
-
G. E. Fagg and J. J. Dongarra. FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world. In Euro PVM/MPI User's Group Meeting, Lecture Notes in Computer Science, volume 1908, pages 346-353, 2000.
-
(2000)
Euro PVM/MPI User's Group Meeting, Lecture Notes in Computer Science
, vol.1908
, pp. 346-353
-
-
Fagg, G.E.1
Dongarra, J.J.2
-
14
-
-
0003487248
-
Strong and weak virtual synchrony in Horus
-
Technical Report TR95-1537, Cornell University, Computer Science Department, Aug. 24
-
R. Friedman and R. van Renesse. Strong and weak virtual synchrony in Horus. Technical Report TR95-1537, Cornell University, Computer Science Department, Aug. 24, 1995.
-
(1995)
-
-
Friedman, R.1
van Renesse, R.2
-
15
-
-
34548755483
-
A checkpoint and restart service specification for Open MPI
-
Technical report, Indiana University, Computer Science Department
-
J. Hursey, J. M. Squyres, and A. Lumsdaine. A checkpoint and restart service specification for Open MPI. Technical report, Indiana University, Computer Science Department, 2006.
-
(2006)
-
-
Hursey, J.1
Squyres, J.M.2
Lumsdaine, A.3
-
16
-
-
34548033627
-
Personal communications. Ruud Haring
-
July
-
IBM T.J. Watson. Personal communications. Ruud Haring, July 2005.
-
(2005)
-
-
IBM T.J. Watson1
-
18
-
-
0002695959
-
Remote unix - turning idle workstations into cycle servers
-
M. Litzkow. Remote unix - turning idle workstations into cycle servers. In Usenix Summer Conference, pages 381-384, 1987.
-
(1987)
Usenix Summer Conference
, pp. 381-384
-
-
Litzkow, M.1
-
19
-
-
0003912256
-
Checkpoint and migration of UNIX processes in the Condor distributed processing system
-
Technical Report UW-CS-TR-1346, University of Wisconsin, Madison Computer Sciences Department, April
-
M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical Report UW-CS-TR-1346, University of Wisconsin - Madison Computer Sciences Department, April 1997.
-
(1997)
-
-
Litzkow, M.1
Tannenbaum, T.2
Basney, J.3
Livny, M.4
-
22
-
-
85084159983
-
-
J. S. Plank, M. Beck, G. Kingsley, and K. Li. Libckpt: Transparent checkpointing under Unix. In Usenix Winter Technical Conference, pages 213-223, January 1995.
-
-
-
-
24
-
-
20444444457
-
The LAM/MPI checkpoint/restart framework: System-initiated checkpointing
-
Santa Fe, New Mexico, USA, October
-
S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman. The LAM/MPI checkpoint/restart framework: System-initiated checkpointing. In Proceedings, LACSI Symposium, Santa Fe, New Mexico, USA, October 2003.
-
(2003)
Proceedings, LACSI Symposium
-
-
Sankaran, S.1
Squyres, J.M.2
Barrett, B.3
Lumsdaine, A.4
Duell, J.5
Hargrove, P.6
Roman, E.7
-
25
-
-
27844564536
-
Request progression interface (RPI) system services interface (SSI) modules for LAM/MPI
-
Technical Report TR579, Indiana University, Computer Science Department
-
J. M. Squyres, B. Barrett, and A. Lumsdaine. Request progression interface (RPI) system services interface (SSI) modules for LAM/MPI. Technical Report TR579, Indiana University, Computer Science Department, 2003.
-
(2003)
-
-
Squyres, J.M.1
Barrett, B.2
Lumsdaine, A.3
-
26
-
-
35248827046
-
A component architecture for LAM/MPI
-
European PVM/MPI Users' Group Meeting, number 2840 in Lecture Notes in Computer Science, Springer-Verlag, Sep/Oct
-
J. M. Squyres and A. Lumsdaine. A component architecture for LAM/MPI. In European PVM/MPI Users' Group Meeting, number 2840 in Lecture Notes in Computer Science, pages 379-387. Springer-Verlag, Sep/Oct 2003.
-
(2003)
Lecture Notes in Computer Science
, vol.2840
, pp. 379-387
-
-
Squyres, J.M.1
Lumsdaine, A.2
-
27
-
-
0029713612
-
CoCheck: Checkpointing and process migration for MPI
-
G. Stellner. CoCheck: checkpointing and process migration for MPI. In IEEE, editor, International Parallel Processing Symposium, pages 526-531, 1996.
-
(1996)
International Parallel Processing Symposium
, pp. 526-531
-
-
Stellner, G.1
-
29
-
-
34547440282
-
Scalable, fault-tolerant membership for MPI tasks on HPC systems
-
June
-
J. Varma, C. Wang, F. Mueller, C. Engelmann, and S. L. Scott. Scalable, fault-tolerant membership for MPI tasks on HPC systems. In International Conference on Supercomputing, pages 219-228, June 2006.
-
(2006)
International Conference on Supercomputing
, pp. 219-228
-
-
Varma, J.1
Wang, C.2
Mueller, F.3
Engelmann, C.4
Scott, S.L.5
-
30
-
-
80052332150
-
Large scale parallel structured AMR calculations using the SAMRAI framework
-
Nov
-
A. Wissink, R. Hornung, S. Kohn, and S. Smith. Large scale parallel structured AMR calculations using the SAMRAI framework. In Supercomputing, Nov. 2001.
-
(2001)
Supercomputing
-
-
Wissink, A.1
Hornung, R.2
Kohn, S.3
Smith, S.4
-
32
-
-
84976846528
-
A first order approximation to the optimum checkpoint interval
-
J. W. Young. A first order approximation to the optimum checkpoint interval. Commun. ACM, 17(9):530-531, 1974.
-
(1974)
Commun. ACM
, vol.17
, Issue.9
, pp. 530-531
-
-
Young, J.W.1