-
1
-
-
20444444457
-
The LAM/MPI checkpoint/restart framework: System-initiated checkpointing
-
Santa Fe, New Mexico, USA, October
-
S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman, "The LAM/MPI checkpoint/restart framework: System-initiated checkpointing," in Proceedings, LACSI Symposium, Santa Fe, New Mexico, USA, October 2003.
-
(2003)
Proceedings, LACSI Symposium
-
-
Sankaran, S.1
Squyres, J.M.2
Barrett, B.3
Lumsdaine, A.4
Duell, J.5
Hargrove, P.6
Roman, E.7
-
2
-
-
60449096682
-
MPICH-V2: A fault tolerant MPI for volatile nodes based on pessimistic sender based message logging
-
Phoenix USA. IEEE/ACM, November
-
A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier, and F. Magniette, "MPICH-V2: a fault tolerant MPI for volatile nodes based on pessimistic sender based message logging," in High Performance Networking and Computing (SC2003), Phoenix USA. IEEE/ACM, November 2003.
-
(2003)
High Performance Networking and Computing (SC2003)
-
-
Bouteiller, A.1
Cappello, F.2
Hérault, T.3
Krawezik, G.4
Lemarinier, P.5
Magniette, F.6
-
3
-
-
84940567900
-
FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world
-
Balatonfüred, Hungary: Springer-Verlag Heidelberg, September
-
G. Fagg and J. Dongarra, "FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world," in 7th Euro PVM/MPI User's Group Meeting 2000, vol. 1908 / 2000. Balatonfüred, Hungary: Springer-Verlag Heidelberg, September 2000.
-
(2000)
7th Euro PVM/MPI User's Group Meeting 2000
, vol.1908-2000
-
-
Fagg, G.1
Dongarra, J.2
-
4
-
-
84944901411
-
Coordinated checkpoint versus message log for fault tolerant MPI
-
IEEE CS Press, December
-
A. Bouteiller, P. Lemarinier, G. Krawezik, and F. Cappello, "Coordinated checkpoint versus message log for fault tolerant MPI," in IEEE International Conference on Cluster Computing (Cluster 2003). IEEE CS Press, December 2003, pp. 242-250.
-
(2003)
IEEE International Conference on Cluster Computing (Cluster 2003)
, pp. 242-250
-
-
Bouteiller, A.1
Lemarinier, P.2
Krawezik, G.3
Cappello, F.4
-
5
-
-
20444435911
-
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
-
IEEE CS Press, September
-
P. Lemarinier, A. Bouteiller, T. Hérault, G. Krawezik, and F. Cappello, "Improved message logging versus improved coordinated checkpointing for fault tolerant MPI," in IEEE International Conference on Cluster Computing (Cluster 2004). IEEE CS Press, September 2004.
-
(2004)
IEEE International Conference on Cluster Computing (Cluster 2004)
-
-
Lemarinier, P.1
Bouteiller, A.2
Hérault, T.3
Krawezik, G.4
Cappello, F.5
-
6
-
-
0026867749
-
Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output
-
May
-
E. N. Elnozahy and W. Zwaenepoel, "Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output," IEEE Transactions on Computers, vol. 41, no. 5, May 1992.
-
(1992)
IEEE Transactions on Computers
, vol.41
, Issue.5
-
-
Elnozahy, E.1
Zwaenepoel, W.2
-
7
-
-
0032311702
-
An efficient algorithm for causal message logging
-
IEEE CS Press, October
-
B. Lee, T. Park, H. Y. Yeom, and Y. Cho, "An efficient algorithm for causal message logging," in 17th Symposium on Reliable Distributed Systems (SRDS 1998). IEEE CS Press, October 1998, pp. 19-25.
-
(1998)
17th Symposium on Reliable Distributed Systems (SRDS 1998)
, pp. 19-25
-
-
Lee, B.1
Park, T.2
Yeom, H.Y.3
Cho, Y.4
-
9
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
September
-
M. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson, "A survey of rollback-recovery protocols in message-passing systems," ACM Computing Surveys (CSUR), vol. 34, no. 3, pp. 375-408, September 2002.
-
(2002)
ACM Computing Surveys (CSUR)
, vol.34
, Issue.3
, pp. 375-408
-
-
Elnozahy, M.1
Alvisi, L.2
Wang, Y.M.3
Johnson, D.B.4
-
10
-
-
0029237761
-
Message logging: Pessimistic, optimistic, and causal
-
IEEE CS Press, May-June
-
L. Alvisi and K. Marzullo, "Message logging: Pessimistic, optimistic, and causal," in Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS 1995). IEEE CS Press, May-June 1995, pp. 229-236.
-
(1995)
Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS 1995)
, pp. 229-236
-
-
Alvisi, L.1
Marzullo, K.2
-
12
-
-
0032313590
-
The relative overhead of piggybacking in causal message logging protocols
-
IEEE CS Press
-
K. Bhatia, K. Marzullo, and L. Alvisi, "The relative overhead of piggybacking in causal message logging protocols," in 17th Symposium on Reliable Distributed Systems (SRDS'98). IEEE CS Press, 1998, pp. 348-353.
-
(1998)
17th Symposium on Reliable Distributed Systems (SRDS'98)
, pp. 348-353
-
-
Bhatia, K.1
Marzullo, K.2
Alvisi, L.3
-
13
-
-
84884662651
-
MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
-
Baltimore USA: IEEE/ACM, November
-
G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fédak, C. Germain, T. Hérault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Néri, and A. Selikhov, "MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes," in High Performance Networking and Computing (SC2002). Baltimore USA: IEEE/ACM, November 2002.
-
(2002)
High Performance Networking and Computing (SC2002)
-
-
Bosilca, G.1
Bouteiller, A.2
Cappello, F.3
Djilali, S.4
Fédak, G.5
Germain, C.6
Hérault, T.7
Lemarinier, P.8
Lodygensky, O.9
Magniette, F.10
Néri, V.11
Selikhov, A.12
-
14
-
-
0032317801
-
The cost of recovery in message logging protocols
-
IEEE CS Press, October
-
S. Rao, L. Alvisi, and H. M. Vin, "The cost of recovery in message logging protocols," in 17th Symposium on Reliable Distributed Systems (SRDS). IEEE CS Press, October 1998, pp. 10-18.
-
(1998)
17th Symposium on Reliable Distributed Systems (SRDS)
, pp. 10-18
-
-
Rao, S.1
Alvisi, L.2
Vin, H.M.3
-
15
-
-
0030243005
-
High-performance, portable implementation of the MPI message passing interface standard
-
September
-
W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "High-performance, portable implementation of the MPI message passing interface standard," Parallel Computing, vol. 22, no. 6, pp. 789-828, September 1996.
-
(1996)
Parallel Computing
, vol.22
, Issue.6
, pp. 789-828
-
-
Gropp, W.1
Lusk, E.2
Doss, N.3
Skjellum, A.4
-
17
-
-
0003605996
-
The NAS parallel benchmarks 2.0
-
NASA Ames Research Center, Report NAS-95-020
-
D. Bailey, T. Harris, W. Saphir, R. V. D. Wijngaart, A. Woo, and M. Yarrow, "The NAS Parallel Benchmarks 2.0," Numerical Aerodynamic Simulation Facility, NASA Ames Research Center, Report NAS-95-020, 1995.
-
(1995)
Numerical Aerodynamic Simulation Facility
-
-
Bailey, D.1
Harris, T.2
Saphir, W.3
Wijngaart, R.V.D.4
Woo, A.5
Yarrow, M.6