[2] L. Alvisi. Understanding the message logging paradigm for masking process crashes. Technical Report TR96-1577, 1996.
[3] L. Alvisi, E. N. Elnozahy, S. Rao, S. A. Husain, and A. De Mel. An analysis of communication induced checkpointing. In Symposium on Fault-Tolerant Computing, pages 242-249, 1999.
[5] A. Borg, W. Blau, W. Graetsch, F. Herrmann, and W. Oberle. Fault tolerance under UNIX. ACM Transactions on Computer Systems, pages 1-24, February 1989.
[6] G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, and A. Selikhov. Toward a scalable fault tolerant MPI for volatile nodes. In Proceedings of SC 2002. IEEE, 2002.
[7] A. Bouteiller, F. Cappello, T. Herault, G. Krawezik, P. Lemarinier, and F. Magniette. MPICH-V2: A fault tolerant MPI for volatile nodes based on pessimistic sender based message logging. In SC 2003.
[9] D. Briatico, A. Ciuffoletti, and L. Simoncini. A distributed domino-effect free recovery algorithm. In IEEE International Symposium on Reliability, Distributed Software, and Databases, pages 207-215, December 1984.
[11] K. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, February 1985.
[14] M. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message passing systems. Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, October 1996.
[15] G. Fagg and J. Dongarra. FT-MPI: Fault tolerant MPI, supporting dynamic applications in dynamic world. In Euro PVM/MPI User's Group Meeting, pages 346-353. Springer-Verlag, Berlin, Germany, 2000.
[16] C. Huang, O. Lawlor, and L. V. Kalé. Adaptive MPI. In LCPC, College Station, Texas, October 2003.
[18] L. V. Kalé and S. Krishnan. Charm++: Parallel programming with message-driven objects. In G. V. Wilson and P. Lu, editors, Parallel Programming Using C++, pages 175-213. MIT Press, 1996.
[19] R. D. Schlichting and F. B. Schneider. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems, 1(3):222-238, 1983.
[20] G. Stellner. CoCheck: Checkpointing and process migration for MPI. In Proceedings of the 10th IPPS, Honolulu, Hawaii, 1996.
[21] R. Strom and S. Yemini. Optimistic recovery in distributed systems. ACM Transactions on Computer Systems, 3(3):204-226, 1985.
[24] G. Zheng, G. Kakulapati, and L. V. Kalé. BigSim: A parallel simulator for performance prediction of extremely large parallel machines. In 2004 IPDPS Conference, Santa Fe, New Mexico, April 2004.