1. Schroeder, B., Gibson, G.A.: Understanding failures in petascale computers. Journal of Physics: Conference Series 78, 012022, 11 (2007)
2. Duarte, A., Rexachs, D., Luque, E.: Increasing the cluster availability using RADIC. In: IEEE International Conference on Cluster Computing, 2006, pp. 1-8 (2006)
3. Elnozahy, E.N.M., Alvisi, L., Wang, Y.M., Johnson, D.B.: A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34(3), 375-408 (2002)
4. Gropp, W., Lusk, E., Skjellum, A.: Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge (1999); LCCN: QA76.642 G76 1999
5. Jalote, P.: Reliable, Atomic and Causal Broadcast. In: Fault Tolerance in Distributed Systems, vol. 1, p. 142. PTR Prentice Hall, USA (1994)
6. Duarte, A., Rexachs, D., Luque, E.: An intelligent management of fault tolerance in cluster using RADICMPI. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds.) PVM/MPI 2006. LNCS, vol. 4192, pp. 150-157. Springer, Heidelberg (2006)
7. Bouteiller, A., Herault, T., Krawezik, G., Lemarinier, P., Cappello, F.: MPICH-V project: A multiprotocol automatic fault-tolerant MPI. International Journal of High Performance Computing Applications 20(3), 319 (2006)
8. Li, Y., Lan, Z.: Exploit failure prediction for adaptive fault-tolerance in cluster computing. In: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2006), May 16-19, 2006, vol. 1, pp. 531-538 (2006)
9. Kondo, M., Hayashida, T., Imai, M., Nakamura, H., Nanya, T., Hori, A.: Evaluation of checkpointing mechanism on SCore cluster system. IEICE Transactions on Information and Systems 86(12), 2553-2562 (2003)