[1] sysstat. Available at: http://sebastien.godard.pagesperso-orange.fr/.
[2] A. Agbaria and R. Friedman. Model-based performance evaluation of distributed checkpointing protocols. Performance Evaluation, 65(5):345-365, 2008. DOI 10.1016/j.peva.2007.09.001.
[3] G. Bronevetsky, D. J. Marques, K. K. Pingali, R. Rugina, and S. A. McKee. Compiler-enhanced incremental checkpointing for OpenMP applications. In Proceedings of ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2008.
[8] E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 34(3):375-408, 2002.
[10] S. Fu. Failure-aware resource management for high-availability computing clusters with distributed virtual machines. Journal of Parallel and Distributed Computing, 70(4):384-393, 2010.
[14] S. Fu and C. Xu. Quantifying event correlations for proactive failure management in networked computing systems. Journal of Parallel and Distributed Computing, 70(11):1100-1109, 2010.
[15] S. S. Gokhale and K. S. Trivedi. Analytical models for architecture-based software reliability prediction: A unification framework. IEEE Transactions on Reliability, 55(4):578-590, 2006. DOI 10.1109/TR.2006.884587.
[16] J. Gu, Z. Zheng, Z. Lan, J. White, E. Hocks, and B.-H. Park. Dynamic meta-learning for failure prediction in large-scale systems: A case study. In Proceedings of IEEE International Conference on Parallel Processing (ICPP), 2008.
[21] V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, 22:85-126, 2004.
[25] Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, and R. K. Sahoo. BlueGene/L failure analysis and prediction models. In Proceedings of International Conference on Dependable Systems and Networks (DSN), 2006.
[26] Y. Liang, Y. Zhang, A. Sivasubramaniam, R. Sahoo, J. Moreira, and M. Gupta. Filtering failure logs for a BlueGene/L prototype. In Proceedings of Conference on Dependable Systems and Networks (DSN), 2005.
[30] S. Muppala and X. Zhou. Coordinated session-based admission control with statistical learning for multi-tier internet applications. Journal of Network and Computer Applications, Elsevier, 34(1):20-29, 2011.
[41] P. Yalagandula, S. Nath, H. Yu, P. B. Gibbons, and S. Sesha. Beyond availability: Towards a deeper understanding of machine failure characteristics in large distributed systems. In Proceedings of USENIX WORLDS, 2004.
[44] Z. Zhang and S. Fu. A hierarchical failure management framework for dependability assurance in compute clusters. International Journal of Computational Science, 4(4):313-326, 2010.