-
1
-
-
0032597670
-
An analysis of communication induced checkpointing
-
FTCS-29
-
L. Alvisi, E. N. Elnozahy, S. Rao, S. A. Husain, and A. De Mel: An Analysis of Communication Induced Checkpointing. FTCS-29, The 29th International Symposium on Fault-Tolerant Computing, pp. 242-249
-
The 29th International Symposium on Fault-Tolerant Computing
, pp. 242-249
-
-
Alvisi, L.1
Elnozahy, E.N.2
Rao, S.3
Husain, S.A.4
De Mel, A.5
-
4
-
-
84944901411
-
Coordinated check-point versus message log for fault-tolerant MPI
-
December
-
A. Bouteiller, P. Lemarinier, G. Krawezik, and F. Cappello: Coordinated check-point versus message log for fault-tolerant MPI. In Proceedings of Cluster 2003, pp. 242-250, December 2003.
-
(2003)
In Proceedings of Cluster 2003
, pp. 242-250
-
-
Bouteiller, A.1
Lemarinier, P.2
Krawezik, G.3
Cappello, F.4
-
5
-
-
0022020346
-
Distributed snapshots: Determining global states of distributed systems
-
Aug.
-
K. M. Chandy and L. Lamport: Distributed snapshots: Determining global states of distributed systems. ACM Trans. on Computer Systems, vol. 3, no. 1, pp. 63-75, Aug. 1985.
-
(1985)
ACM Trans. on Computer Systems
, vol.3
, Issue.1
, pp. 63-75
-
-
Chandy, K.M.1
Lamport, L.2
-
6
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
E. N. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson: A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.
-
(2002)
ACM Computing Surveys
, vol.34
, Issue.3
, pp. 375-408
-
-
Elnozahy, E.N.1
Alvisi, L.2
Wang, Y.-M.3
Johnson, D.B.4
-
7
-
-
0030243005
-
A high-performance, portable implementation of the MPI Message Passing Interface standard
-
W. Gropp, E. Lusk, N. Doss, and A. Skjellum: A high-performance, portable implementation of the MPI Message Passing Interface Standard. Parallel Computing, vol. 22, no. 6, pp. 789-828, 1996.
-
(1996)
Parallel Computing
, vol.22
, Issue.6
, pp. 789-828
-
-
Gropp, W.1
Lusk, E.2
Doss, N.3
Skjellum, A.4
-
8
-
-
0742293840
-
MPICH-G2: A grid-enabled implementation of the message passing interface
-
May
-
N. T. Karonis, B. Toonen, and I. Foster: MPICH-G2: A grid-enabled implementation of the message passing interface. Journal of Parallel and Distributed Computing, vol. 63, no. 5, pp. 551-563, May 2003.
-
(2003)
Journal of Parallel and Distributed Computing
, vol.63
, Issue.5
, pp. 551-563
-
-
Karonis, N.T.1
Toonen, B.2
Foster, I.3
-
9
-
-
0003610530
-
-
NASA Ames Research Center: NAS Parallel Benchmarks. Technical report, http://science.nas.nasa.gov/Software/NPB/, 1997.
-
(1997)
Technical Report
-
-
-
10
-
-
84888898496
-
RENEW: A tool for fast and efficient implementation of checkpoint protocols
-
N. Neves and W. K. Fuchs: RENEW: A tool for fast and efficient implementation of checkpoint protocols. Symp. on Fault-Tolerant Computing, pp. 58-67, 1998.
-
(1998)
Symp. on Fault-Tolerant Computing
, pp. 58-67
-
-
Neves, N.1
Fuchs, W.K.2
-
12
-
-
23044532594
-
Application recovery in parallel programming environment
-
G. T. Nguyen, V. D. Tran, and M. Kotocová: Application recovery in parallel programming environment. European PVM/MPI, pp. 234-242, 2002.
-
(2002)
European PVM/MPI
, pp. 234-242
-
-
Nguyen, G.T.1
Tran, V.D.2
Kotocová, M.3
-
14
-
-
0032179680
-
Diskless checkpointing
-
J. S. Plank, K. Li, and M. A. Puening: Diskless checkpointing. IEEE Trans. on Parallel and Distributed Systems, vol. 9, no. 10, pp. 972-986, 1998.
-
(1998)
IEEE Trans. on Parallel and Distributed Systems
, vol.9
, Issue.10
, pp. 972-986
-
-
Plank, J.S.1
Li, K.2
Puening, M.A.3
-
15
-
-
0033721199
-
The cost of recovery in message logging protocols
-
Mar./Apr.
-
S. Rao, L. Alvisi, and H. M. Vin: The cost of recovery in message logging protocols. IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 2, pp. 160-173, Mar./Apr. 2000.
-
(2000)
IEEE Transactions on Knowledge and Data Engineering
, vol.12
, Issue.2
, pp. 160-173
-
-
Rao, S.1
Alvisi, L.2
Vin, H.M.3
-
17
-
-
0032202258
-
The Hector distributed run-time environment
-
Nov.
-
S. H. Russ, J. Robinson, B. K. Flachs, and B. Heckel: The Hector distributed run-time environment. IEEE Trans. on Parallel and Distributed Systems, vol. 9, no. 11, pp. 1102-1114, Nov. 1998.
-
(1998)
IEEE Trans. on Parallel and Distributed Systems
, vol.9
, Issue.11
, pp. 1102-1114
-
-
Russ, S.H.1
Robinson, J.2
Flachs, B.K.3
Heckel, B.4
-
18
-
-
0029713612
-
CoCheck: Checkpointing and process migration for MPI
-
Apr.
-
G. Stellner: CoCheck: Checkpointing and process migration for MPI. Proc. of the Int'l Parallel Processing Symp., pp. 526-531, Apr. 1996.
-
(1996)
Proc. of the Int'l Parallel Processing Symp.
, pp. 526-531
-
-
Stellner, G.1
-
19
-
-
24944544064
-
-
ckpt library
-
V. Zandy: ckpt library, http://www.cs.wisc.edu/zandy/ckpt/.
-
-
-
Zandy, V.1
-
20
-
-
0026867749
-
Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit
-
May
-
W. Zwaenepoel and E. N. Elnozahy: Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit. IEEE Transactions on Computers, Vol. C-41, No. 5, pp. 526-531, May 1992.
-
(1992)
IEEE Transactions on Computers
, vol.C-41
, Issue.5
, pp. 526-531
-
-
Zwaenepoel, W.1
Elnozahy, E.N.2