-
1
-
-
80455150839
-
-
sysstat. Available at: http://sebastien.godard.pagesperso-orange.fr/.
-
-
-
-
2
-
-
40849089513
-
Model-based performance evaluation of distributed checkpointing protocols
-
DOI 10.1016/j.peva.2007.09.001, PII S0166531607001009
-
A. Agbaria and R. Friedman. Model-based performance evaluation of distributed checkpointing protocols. Performance Evaluation, 65(5):345-365, 2008. (Pubitemid 351400683)
-
(2008)
Performance Evaluation
, vol.65
, Issue.5
, pp. 345-365
-
-
Agbaria, A.1
Friedman, R.2
-
4
-
-
74049111423
-
Compiler-enhanced incremental checkpointing for OpenMP applications
-
G. Bronevetsky, D. J. Marques, K. K. Pingali, R. Rugina, and S. A. McKee. Compiler-enhanced incremental checkpointing for OpenMP applications. In Proc. of ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2008.
-
(2008)
Proc. of ACM Symposium on Principles and Practice of Parallel Programming (PPoPP)
-
-
Bronevetsky, G.1
Marques, D.J.2
Pingali, K.K.3
Rugina, R.4
McKee, S.A.5
-
8
-
-
0042078549
-
A survey of rollback-recovery protocols in message-passing systems
-
E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 34(3):375-408, 2002.
-
(2002)
ACM Computing Surveys
, vol.34
, Issue.3
, pp. 375-408
-
-
Elnozahy, E.N.M.1
Alvisi, L.2
Wang, Y.-M.3
Johnson, D.B.4
-
10
-
-
76849100508
-
Failure-aware resource management for high-availability computing clusters with distributed virtual machines
-
S. Fu. Failure-aware resource management for high-availability computing clusters with distributed virtual machines. Journal of Parallel and Distributed Computing, 70(4):384-393, 2010.
-
(2010)
Journal of Parallel and Distributed Computing
, vol.70
, Issue.4
, pp. 384-393
-
-
Fu, S.1
-
14
-
-
77956227790
-
Quantifying event correlations for proactive failure management in networked computing systems
-
S. Fu and C. Xu. Quantifying event correlations for proactive failure management in networked computing systems. Journal of Parallel and Distributed Computing, 70(11):1100-1109, 2010.
-
(2010)
Journal of Parallel and Distributed Computing
, vol.70
, Issue.11
, pp. 1100-1109
-
-
Fu, S.1
Xu, C.2
-
15
-
-
33947184459
-
Analytical models for architecture-based software reliability prediction: A unification framework
-
DOI 10.1109/TR.2006.884587
-
S. S. Gokhale and K. S. Trivedi. Analytical models for architecture-based software reliability prediction: A unification framework. IEEE Transactions on Reliability, 55(4):578-590, 2006. (Pubitemid 46405748)
-
(2006)
IEEE Transactions on Reliability
, vol.55
, Issue.4
, pp. 578-590
-
-
Gokhale, S.S.1
Trivedi, K.S.2
-
22
-
-
33845589803
-
BlueGene/L failure analysis and prediction models
-
Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, and R. K. Sahoo. BlueGene/L failure analysis and prediction models. In Proc. of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2006.
-
(2006)
Proc. of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
-
-
Liang, Y.1
Zhang, Y.2
Sivasubramaniam, A.3
Jette, M.4
Sahoo, R.K.5
-
23
-
-
27544497222
-
Filtering failure logs for a BlueGene/L prototype
-
Y. Liang, Y. Zhang, A. Sivasubramaniam, R. Sahoo, J. Moreira, and M. Gupta. Filtering failure logs for a BlueGene/L prototype. In Proc. of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2005.
-
(2005)
Proc. of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
-
-
Liang, Y.1
Zhang, Y.2
Sivasubramaniam, A.3
Sahoo, R.4
Moreira, J.5
Gupta, M.6
-
31
-
-
77950267881
-
A survey of online failure prediction methods
-
F. Salfner, M. Lenk, and M. Malek. A survey of online failure prediction methods. ACM Computing Surveys, 42:10:1-10:42, 2010.
-
(2010)
ACM Computing Surveys
, vol.42
, pp. 10:1-10:42
-
-
Salfner, F.1
Lenk, M.2
Malek, M.3
-
38
-
-
67650672322
-
Beyond availability: Towards a deeper understanding of machine failure characteristics in large distributed systems
-
P. Yalagandula, S. Nath, H. Yu, P. B. Gibbons, and S. Sesha. Beyond availability: Towards a deeper understanding of machine failure characteristics in large distributed systems. In Proc. of USENIX WORLDS, 2004.
-
(2004)
Proc. of USENIX WORLDS
-
-
Yalagandula, P.1
Nath, S.2
Yu, H.3
Gibbons, P.B.4
Sesha, S.5
-
40
-
-
79551557730
-
A hierarchical failure management framework for dependability assurance in compute clusters
-
Z. Zhang and S. Fu. A hierarchical failure management framework for dependability assurance in compute clusters. International Journal of Computational Science, 4(4):313-326, 2010.
-
(2010)
International Journal of Computational Science
, vol.4
, Issue.4
, pp. 313-326
-
-
Zhang, Z.1
Fu, S.2