-
1
-
-
33646420251
-
Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations
-
July
-
A. Agbaria and R. Friedman. Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations. Cluster Computing, 6(3):227-236, July 2003.
-
(2003)
Cluster Computing
, vol.6
, Issue.3
, pp. 227-236
-
-
Agbaria, A.1
Friedman, R.2
-
2
-
-
0032597670
-
An analysis of communication induced checkpointing
-
L. Alvisi, E. N. Elnozahy, S. Rao, S. A. Husain, and A. D. Mel. An analysis of communication induced checkpointing. In Symposium on Fault-Tolerant Computing, pages 242-249, 1999.
-
(1999)
Symposium on Fault-Tolerant Computing
, pp. 242-249
-
-
Alvisi, L.1
Elnozahy, E.N.2
Rao, S.3
Husain, S.A.4
Mel, A.D.5
-
3
-
-
0024606852
-
Fault tolerance under UNIX
-
February
-
A. Borg, W. Blau, W. Graetsch, F. Herrmann, and W. Oberle. Fault tolerance under UNIX. In ACM Transactions on Computer Systems, pages 1-24, February 1989.
-
(1989)
ACM Transactions on Computer Systems
, pp. 1-24
-
-
Borg, A.1
Blau, W.2
Graetsch, W.3
Herrmann, F.4
Oberle, W.5
-
4
-
-
4344718367
-
Toward a scalable fault tolerant MPI for volatile nodes
-
IEEE
-
G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, and A. Selikhov. Toward a scalable fault tolerant MPI for volatile nodes. In Proceedings of SC 2002. IEEE, 2002.
-
(2002)
Proceedings of SC 2002
-
-
Bosilca, G.1
Bouteiller, A.2
Cappello, F.3
Djilali, S.4
Fedak, G.5
Germain, C.6
Herault, T.7
Lemarinier, P.8
Lodygensky, O.9
Magniette, F.10
Neri, V.11
Selikhov, A.12
-
5
-
-
79961061539
-
MPICH-V2: A fault tolerant MPI for volatile nodes based on pessimistic sender based message logging
-
November
-
A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier, and F. Magniette. MPICH-V2: A fault tolerant MPI for volatile nodes based on pessimistic sender based message logging. In Proceedings of SC'03, November 2003.
-
(2003)
Proceedings of SC'03
-
-
Bouteiller, A.1
Cappello, F.2
Hérault, T.3
Krawezik, G.4
Lemarinier, P.5
Magniette, F.6
-
6
-
-
33746310123
-
Impact of event logger on causal message logging protocols for fault tolerant MPI
-
A. Bouteiller, B. Collin, T. Herault, P. Lemarinier, and F. Cappello. Impact of event logger on causal message logging protocols for fault tolerant MPI. In IPDPS'05, page 97, 2005.
-
(2005)
IPDPS'05
, pp. 97
-
-
Bouteiller, A.1
Collin, B.2
Herault, T.3
Lemarinier, P.4
Cappello, F.5
-
7
-
-
0021538527
-
A distributed domino-effect free recovery algorithm
-
December
-
D. Briatico, A. Ciuffoletti, and L. Simoncini. A distributed domino-effect free recovery algorithm. In IEEE International Symposium on Reliability, Distributed Software, and Databases, pages 207-215, December 1984.
-
(1984)
IEEE International Symposium on Reliability, Distributed Software, and Databases
, pp. 207-215
-
-
Briatico, D.1
Ciuffoletti, A.2
Simoncini, L.3
-
9
-
-
12444281734
-
A fault tolerant protocol for massively parallel machines
-
Santa Fe, NM, April, IEEE Press
-
S. Chakravorty and L. V. Kalé. A fault tolerant protocol for massively parallel machines. In FTPDS Workshop at IPDPS'2004, Santa Fe, NM, April 2004. IEEE Press.
-
(2004)
FTPDS Workshop at IPDPS'2004
-
-
Chakravorty, S.1
Kalé, L.V.2
-
10
-
-
84900298636
-
-
Y. Chen, J. S. Plank, and K. Li. Clip: A checkpointing tool for message-passing parallel programs. In Proc. of the 1997 ACM/IEEE conference on Supercomputing, pages 1-11, 1997.
-
-
-
-
11
-
-
34548773972
-
-
W. E. Cohen, R. K. Gaede, and W. D. Garrett. Interconnection network independent characterization of communication traffic in the NAS benchmarks via processor performance monitoring hardware.
-
-
-
-
12
-
-
0004096191
-
A survey of rollback-recovery protocols in message passing systems
-
Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, Oct
-
E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message passing systems. Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, Oct. 1996.
-
(1996)
-
-
Elnozahy, E.N.1
Alvisi, L.2
Wang, Y.M.3
Johnson, D.B.4
-
13
-
-
0026867749
-
Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit
-
E. N. Elnozahy and W. Zwaenepoel. Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit. IEEE Transactions on Computers, 41(5):526-531, 1992.
-
(1992)
IEEE Transactions on Computers
, vol.41
, Issue.5
, pp. 526-531
-
-
Elnozahy, E.N.1
Zwaenepoel, W.2
-
14
-
-
34548782726
-
Scalable cosmology simulations on parallel machines
-
July
-
F. Gioachin, A. Sharma, S. Chakravorty, C. Mendes, L. V. Kalé, and T. R. Quinn. Scalable cosmology simulations on parallel machines. In 7th International Meeting on High Performance Computing for Computational Science (VECPAR), July 2006.
-
(2006)
7th International Meeting on High Performance Computing for Computational Science (VECPAR)
-
-
Gioachin, F.1
Sharma, A.2
Chakravorty, S.3
Mendes, C.4
Kalé, L.V.5
Quinn, T.R.6
-
15
-
-
33646395818
-
-
Master's thesis, Dep. of Computer Science, University of Illinois, Urbana, IL
-
C. Huang. System support for checkpoint and restart of Charm++ and AMPI applications. Master's thesis, Dep. of Computer Science, University of Illinois, Urbana, IL, 2004.
-
(2004)
System support for checkpoint and restart of Charm++ and AMPI applications
-
-
Huang, C.1
-
16
-
-
12444260048
-
Adaptive MPI
-
College Station, TX, October
-
C. Huang, O. Lawlor, and L. V. Kalé. Adaptive MPI. In Proceedings of LCPC 03, College Station, TX, October 2003.
-
(2003)
Proceedings of LCPC 03
-
-
Huang, C.1
Lawlor, O.2
Kalé, L.V.3
-
19
-
-
0002479236
-
Charm++: Parallel programming with message-driven objects
-
G. V. Wilson and P. Lu, editors, MIT Press
-
L. V. Kalé and S. Krishnan. Charm++: Parallel programming with message-driven objects. In G. V. Wilson and P. Lu, editors, Parallel Programming using C++, pages 175-213. MIT Press, 1996.
-
(1996)
Parallel Programming using C++
, pp. 175-213
-
-
Kalé, L.V.1
Krishnan, S.2
-
20
-
-
35048847069
-
A lightweight message logging scheme for fault tolerant MPI
-
I. Lee, H. Y. Yeom, T. Park, and H.-W. Park. A lightweight message logging scheme for fault tolerant MPI. In PPAM, pages 397-404, 2003.
-
(2003)
PPAM
, pp. 397-404
-
-
Lee, I.1
Yeom, H.Y.2
Park, T.3
Park, H.-W.4
-
21
-
-
85114705648
-
NAMD: Biomolecular simulation on thousands of processors
-
Baltimore, MD, September
-
J. C. Phillips, G. Zheng, S. Kumar, and L. V. Kalé. NAMD: Biomolecular simulation on thousands of processors. In Proceedings of SC 2002, Baltimore, MD, September 2002.
-
(2002)
Proceedings of SC 2002
-
-
Phillips, J.C.1
Zheng, G.2
Kumar, S.3
Kalé, L.V.4
-
22
-
-
84976815497
-
Fail-stop processors: An approach to designing fault-tolerant computing systems
-
R. D. Schlichting and F. B. Schneider. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems, 1(3):222-238, 1983.
-
(1983)
ACM Transactions on Computer Systems
, vol.1
, Issue.3
, pp. 222-238
-
-
Schlichting, R.D.1
Schneider, F.B.2