-
2
-
-
0029237761
-
Message logging: Pessimistic, optimistic, and causal
-
IEEE CS Press
-
Alvisi, L. and Marzullo, K. (1995) ‘Message logging: Pessimistic, optimistic, and causal’, in Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS 1995), IEEE CS Press, May–June, pp.229–236.
-
(1995)
Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS 1995)
, vol.May–June
, pp. 229-236
-
-
Alvisi, L.1
Marzullo, K.2
-
3
-
-
0032597670
-
An analysis of communication induced checkpointing
-
IEEE CS Press
-
Alvisi, L., Elnozahy, E., Rao, S., Husain, S.A. and De Mel, A. (1999) ‘An analysis of communication induced checkpointing’, in Proceedings of the 29th Symposium on Fault-Tolerant Computing (FTCS’99), IEEE CS Press.
-
(1999)
Proceedings of the 29th Symposium on Fault-Tolerant Computing (FTCS’99)
-
-
Alvisi, L.1
Elnozahy, E.2
Rao, S.3
Husain, S.A.4
De Mel, A.5
-
4
-
-
0003605996
-
-
Numerical Aerodynamic Simulation Facility, NASA Ames Research Center, Report NAS-95-020
-
Bailey, D., Harris, T., Saphir, W., Wijngaart, R.V.D., Woo, A. and Yarrow, M. (1995) ‘The NAS Parallel Benchmarks 2.0’, Numerical Aerodynamic Simulation Facility, NASA Ames Research Center, Report NAS-95-020.
-
(1995)
The NAS Parallel Benchmarks 2.0
-
-
Bailey, D.1
Harris, T.2
Saphir, W.3
Wijngaart, R.V.D.4
Woo, A.5
Yarrow, M.6
-
5
-
-
84884662651
-
MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
-
Baltimore, USA, IEEE/ACM
-
Bosilca, G., Bouteiller, A., Cappello, F., Djilali, S., Fédak, G., Germain, C., Hérault, T., Lemarinier, P., Lodygensky, O., Magniette, F., Néri, V. and Selikhov, A. (2002) ‘MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes’, in SC2002: High Performance Networking and Computing (SC2002), Baltimore, USA, IEEE/ACM.
-
(2002)
SC2002: High Performance Networking and Computing (SC2002)
-
-
Bosilca, G.1
Bouteiller, A.2
Cappello, F.3
Djilali, S.4
Fédak, G.5
Germain, C.6
Hérault, T.7
Lemarinier, P.8
Lodygensky, O.9
Magniette, F.10
Néri, V.11
Selikhov, A.12
-
6
-
-
60449096682
-
MPICH-V2: a fault tolerant MPI for volatile nodes based on pessimistic sender based message logging
-
Phoenix, USA, IEEE/ACM
-
Bouteiller, A., Cappello, F., Hérault, T., Krawezik, G., Lemarinier, P. and Magniette, F. (2003) ‘MPICH-V2: a fault tolerant MPI for volatile nodes based on pessimistic sender based message logging’, High Performance Networking and Computing (SC2003), Phoenix, USA, IEEE/ACM.
-
(2003)
High Performance Networking and Computing (SC2003)
-
-
Bouteiller, A.1
Cappello, F.2
Hérault, T.3
Krawezik, G.4
Lemarinier, P.5
Magniette, F.6
-
7
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
February ACM
-
Chandy, K.M. and Lamport, L. (1985) ‘Distributed snapshots: Determining global states of distributed systems’, Transactions on Computer Systems, February, Vol. 3, No. 1, ACM, pp.63–75.
-
(1985)
Transactions on Computer Systems
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, K.M.1
Lamport, L.2
-
8
-
-
12244312401
-
Compiler support for automatic checkpointing
-
Canada, IEEE
-
Choi, S.-E. and Deitz, S.J. (2002) ‘Compiler support for automatic checkpointing’, in 16th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2002), Canada, IEEE, June, pp.213.
-
(2002)
16th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2002)
, vol.June
, pp. 213
-
-
Choi, S.-E.1
Deitz, S.J.2
-
9
-
-
0026867749
-
Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output
-
May
-
Elnozahy, E.N. and Zwaenepoel, W. (1992) ‘Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output’, IEEE Transactions on Computers, May, Vol. 41, No. 5.
-
(1992)
IEEE Transactions on Computers
, vol.41
, Issue.5
-
-
Elnozahy, E.N.1
Zwaenepoel, W.2
-
10
-
-
0004096191
-
-
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, Technical Report CMU-CS-96-181
-
Elnozahy, M., Alvisi, L., Wang, Y.M. and Johnson, D.B. (1996) ‘A survey of rollback-recovery protocols in message passing systems’, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, Technical Report CMU-CS-96-181.
-
(1996)
A survey of rollback-recovery protocols in message passing systems
-
-
Elnozahy, M.1
Alvisi, L.2
Wang, Y.M.3
Johnson, D.B.4
-
11
-
-
0035480335
-
HARNESS and fault tolerant MPI
-
October
-
Fagg, G.E., Bukovsky, A. and Dongarra, J.J. (2001) ‘HARNESS and fault tolerant MPI’, Parallel Computing, October, Vol. 27, No. 11, pp.1479–1495.
-
(2001)
Parallel Computing
, vol.27
, Issue.11
, pp. 1479-1495
-
-
Fagg, G.E.1
Bukovsky, A.2
Dongarra, J.J.3
-
12
-
-
0036374186
-
A network-failure-tolerant message-passing system for terascale clusters
-
New York, USA, ACM
-
Graham, R.L., Choi, S.-E., Daniel, D.J., Desai, N.N., Minnich, R.G., Rasmussen, C.E., Risinger, L.D. and Sukalski, M.W. (2002) ‘A network-failure-tolerant message-passing system for terascale clusters’, in International Conference on Supercomputing (ICS’02), New York, USA, ACM, June, pp.77–83.
-
(2002)
International Conference on Supercomputing (ICS’02)
, vol.June
, pp. 77-83
-
-
Graham, R.L.1
Choi, S.-E.2
Daniel, D.J.3
Desai, N.N.4
Minnich, R.G.5
Rasmussen, C.E.6
Risinger, L.D.7
Sukalski, M.W.8
-
14
-
-
0030243005
-
A high-performance, portable implementation of the MPI message passing interface standard
-
September
-
Gropp, W., Lusk, E., Doss, N. and Skjellum, A. (1996) ‘A high-performance, portable implementation of the MPI message passing interface standard’, Parallel Computing, September, Vol. 22, No. 6, pp.789–828.
-
(1996)
Parallel Computing
, vol.22
, Issue.6
, pp. 789-828
-
-
Gropp, W.1
Lusk, E.2
Doss, N.3
Skjellum, A.4
-
15
-
-
0003912256
-
-
University of Wisconsin-Madison, Technical Report 1346
-
Litzkow, M., Tannenbaum, T., Basney, J. and Livny, M. (1997) ‘Checkpoint and migration of UNIX processes in the Condor distributed processing system’, University of Wisconsin-Madison, Technical Report 1346.
-
(1997)
Checkpoint and migration of UNIX processes in the Condor distributed processing system
-
-
Litzkow, M.1
Tannenbaum, T.2
Basney, J.3
Livny, M.4
-
16
-
-
85014175705
-
Experimental assessment of workstation failures and their impact on checkpointing systems
-
IEEE CS Press
-
Plank, J.S. and Elwasif, W.R. (1998) ‘Experimental assessment of workstation failures and their impact on checkpointing systems’, in 28th Symposium on Fault-Tolerant Computing (FTCS’98), IEEE CS Press, June, pp.48–57.
-
(1998)
28th Symposium on Fault-Tolerant Computing (FTCS’98)
, vol.June
, pp. 48-57
-
-
Plank, J.S.1
Elwasif, W.R.2
-
18
-
-
0032597696
-
Egida: an extensible toolkit for low-overhead fault-tolerance
-
IEEE CS Press
-
Rao, S., Alvisi, L. and Vin, H.M. (1999) ‘Egida: an extensible toolkit for low-overhead fault-tolerance’, in Proceedings of the 29th Symposium on Fault-Tolerant Computing (FTCS’99), IEEE CS Press, pp.48–55.
-
(1999)
Proceedings of the 29th Symposium on Fault-Tolerant Computing (FTCS’99)
, pp. 48-55
-
-
Rao, S.1
Alvisi, L.2
Vin, H.M.3
-
19
-
-
0032317801
-
The cost of recovery in message logging protocols
-
IEEE CS Press
-
Rao, S., Alvisi, L. and Vin, H.M. (1998) ‘The cost of recovery in message logging protocols’, in Proceedings of the 17th Symposium on Reliable Distributed Systems (SRDS), IEEE CS Press, October, pp.10–18.
-
(1998)
Proceedings of the 17th Symposium on Reliable Distributed Systems (SRDS)
, vol.October
, pp. 10-18
-
-
Rao, S.1
Alvisi, L.2
Vin, H.M.3
-
21
-
-
0022112420
-
Optimistic recovery in distributed systems
-
August ACM
-
Strom, R.E. and Yemini, S. (1985) ‘Optimistic recovery in distributed systems’, Transactions on Computer Systems, August, Vol. 3, No. 3, ACM, pp.204–226.
-
(1985)
Transactions on Computer Systems
, vol.3
, Issue.3
, pp. 204-226
-
-
Strom, R.E.1
Yemini, S.2