[1] G. Stellner. CoCheck: Checkpointing and process migration for MPI. In Proceedings of the International Parallel Processing Symposium, pages 526-531, Los Alamitos, CA, USA, April 1996. IEEE Computer Society Press.
[3] S. Rao, L. Alvisi, and H. Vin. Egida: An extensible toolkit for low-overhead fault-tolerance. In Proceedings of IEEE Fault-Tolerant Computing Symposium (FTCS-29), Madison, WI, June 1999.
[4] G. Fagg and J. Dongarra. FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In Euro PVM/MPI User's Group Meeting 2000, pages 346-353, Berlin, Germany, 2000.
[5] S. Louca, N. Neophytou, A. Lachanas, and P. Evripidou. MPI-FT: Portable fault tolerance scheme for MPI. Parallel Processing Letters, 10(4):371-382, 2000.
[6] G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, and A. Selikhov. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Proceedings of SuperComputing 2002 (SC2002), November 2002.
[7] S. Sankaran, J.M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman. The LAM/MPI checkpoint/restart framework: System-initiated checkpointing. In Proceedings of the LACSI Symposium, Santa Fe, NM, USA, October 2003.
[8] R.T. Aulwes, D.J. Daniel, N.N. Desai, R.L. Graham, L.D. Risinger, M.A. Taylor, T.S. Woodall, and M.W. Sukalski. Architecture of LA-MPI, a network-fault-tolerant MPI. In Proceedings of the 18th International Parallel and Distributed Processing Symposium. IEEE, April 2004.
[9] R. Batchu, J. Neelamegam, Z. Cui, M. Beddhua, et al. MPI/FT: Architecture and taxonomies for fault-tolerant, message-passing middleware for performance portable parallel computing. In Proceedings of the 1st IEEE International Symposium on Cluster Computing and the Grid, Melbourne, Australia, 2001.
[10] E.N. Elnozahy, L. Alvisi, Y.M. Wang, and D.B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 34(3):375-408, 2002.