-
4
-
-
60449096682
-
MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging
-
Phoenix, USA
-
A. Bouteiller, F. Cappello, T. Hérault, P. Lemarinier, G. Krawezik, and F. Magniette. MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging. In SuperComputing, Phoenix, USA, 2003.
-
(2003)
SuperComputing
-
-
Bouteiller, A.1
Cappello, F.2
Hérault, T.3
Lemarinier, P.4
Krawezik, G.5
Magniette, F.6
-
5
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
E. N. Mootaz Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv., 34(3):375-408, 2002.
-
(2002)
ACM Comput. Surv.
, vol.34
, Issue.3
, pp. 375-408
-
-
Elnozahy, E.N.M.1
Alvisi, L.2
Wang, Y.-M.3
Johnson, D.B.4
-
6
-
-
20444463494
-
FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
-
San Diego, CA, September
-
G. Zheng, L. Shi, and L. V. Kale. FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI. In 2004 IEEE International Conference on Cluster Computing, San Diego, CA, September 2004.
-
(2004)
2004 IEEE International Conference on Cluster Computing
-
-
Kale, L.V.1
Zheng, G.2
Shi, L.3
-
7
-
-
84908538816
-
Athapascan-1: On-line building data flow graph in a parallel language
-
IEEE, editor, Paris, France, October
-
F. Galilée, J.-L. Roch, G. Cavalheiro, and M. Doreille. Athapascan-1: On-line building data flow graph in a parallel language. In IEEE, editor, PACT'98, pages 88-95, Paris, France, October 1998.
-
(1998)
PACT'98
, pp. 88-95
-
-
Galilée, F.1
Roch, J.-L.2
Cavalheiro, G.3
Doreille, M.4
-
8
-
-
0014477093
-
Bounds on multiprocessing timing anomalies
-
Ronald L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416-429, 1969.
-
(1969)
SIAM Journal on Applied Mathematics
, vol.17
, Issue.2
, pp. 416-429
-
-
Graham, R.L.1
-
10
-
-
27144462717
-
Using data-flow analysis for resilience and result checking in peer-to-peer computations
-
Zaragoza, Spain, August
-
S. Jafar, S. Varrette, and J.-L. Roch. Using data-flow analysis for resilience and result checking in peer-to-peer computations. In IEEE DEXA'2004, Zaragoza, Spain, August 2004.
-
(2004)
IEEE DEXA'2004
-
-
Jafar, S.1
Varrette, S.2
Roch, J.-L.3
-
11
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63-75, 1985.
-
(1985)
ACM Trans. Comput. Syst.
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, K.M.1
Lamport, L.2
-
12
-
-
0032000230
-
Message logging: Pessimistic, optimistic, causal and optimal
-
Transactions on Software Engineering
-
L. Alvisi and K. Marzullo. Message logging: Pessimistic, optimistic, causal and optimal. IEEE Transactions on Software Engineering (TSE), 24(2):149-159, 1998.
-
(1998)
TSE
, vol.24
, Issue.2
, pp. 149-159
-
-
Marzullo, K.1
Alvisi, L.2
-
13
-
-
0003912256
-
Checkpoint and migration of UNIX processes in the Condor distributed processing system
-
Univ. Wisconsin, Madison
-
M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical Report CS-TR-97-1346, Univ. Wisconsin, Madison, 1997.
-
(1997)
Technical Report
, vol.CS-TR-97-1346
-
-
Litzkow, M.1
Tannenbaum, T.2
Basney, J.3
Livny, M.4
-
16
-
-
0022112420
-
Optimistic recovery in distributed systems
-
R. Strom and S. Yemini. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst., 3(3):204-226, 1985.
-
(1985)
ACM Trans. Comput. Syst.
, vol.3
, Issue.3
, pp. 204-226
-
-
Yemini, S.1
Strom, R.2
-
18
-
-
12444281734
-
A fault tolerant protocol for massively parallel machines
-
IEEE Press
-
S. Chakravorty and L. V. Kale. A fault tolerant protocol for massively parallel machines. In FT-PDS Workshop for IPDPS 2004. IEEE Press, 2004.
-
(2004)
FT-PDS Workshop for IPDPS 2004
-
-
Kale, L.V.1
Chakravorty, S.2
-
20
-
-
0012243052
-
Compiler technology for portable checkpoints
-
MIT Laboratory for Computer Science, Cambridge
-
V. Strumpen. Compiler technology for portable checkpoints. Technical Report MA-02139, MIT Laboratory for Computer Science, Cambridge, 1998.
-
(1998)
Technical Report
, vol.MA-02139
-
-
Strumpen, V.1