[1] Elmootazbellah N. Elnozahy and James S. Plank. Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery. IEEE Trans. Dependable Secur. Comput., 1(2):97-108, 2004.
[3] Bianca Schroeder and Garth A. Gibson. Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you? Trans. Storage, 3(3):8, 2007.
[4] Daniel D. Deavours, Graham Clark, Tod Courtney, David Daly, Salem Derisavi, Jay M. Doyle, William H. Sanders, and Patrick G. Webster. The Möbius framework and its implementation. IEEE Trans. Softw. Eng., 28(10):956-969, 2002.
[5] Weining Gu, Zbigniew Kalbarczyk, and Ravishankar K. Iyer. Error sensitivity of the Linux kernel executing on PowerPC G4 and Pentium 4 processors. In DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks, page 887, Washington, DC, USA, 2004. IEEE Computer Society.
[6] Weining Gu, Zbigniew Kalbarczyk, Ravishankar K. Iyer, and Zhenyu Yang. Characterization of Linux kernel behavior under errors. In DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks, page 459, Washington, DC, USA, 2004. IEEE Computer Society.
[7] Nitin H. Vaidya. Impact of checkpoint latency on overhead ratio of a checkpointing scheme. IEEE Trans. Comput., 46(8):942-947, 1997.
[8] John Daly. A model for predicting the optimum checkpoint interval for restart dumps. Computational Science - ICCS 2003, (8):724, 2003.
[10] G. Kavanaugh and W. Sanders. Performance analysis of two time-based coordinated checkpointing protocols. In PRFTS '97: Proceedings of the 1997 Pacific Rim International Symposium on Fault-Tolerant Systems, page 194, Washington, DC, USA, 1997. IEEE Computer Society.
[11] Yinglung Liang, Yanyong Zhang, Anand Sivasubramaniam, Morris Jette, and Ramendra Sahoo. BlueGene/L failure analysis and prediction models. In DSN '06: Proceedings of the International Conference on Dependable Systems and Networks, pages 425-434, Washington, DC, USA, 2006. IEEE Computer Society.
[12] Catello Di Martino, Domenico Cotroneo, Zbigniew Kalbarczyk, and Ravishankar K. Iyer. A framework for the assessment of the dependability of supercomputers via log analysis. In DSN '08: Supplemental Volume of Proceedings of the 2008 International Conference on Dependable Systems and Networks, Washington, DC, USA, 2008. IEEE Computer Society.
[13] A. J. Oliner, R. K. Sahoo, J. E. Moreira, and M. Gupta. Performance implications of periodic checkpointing on large-scale cluster systems. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18, page 299.2, Washington, DC, USA, 2005. IEEE Computer Society.
[16] John W. Young. A first order approximation to the optimum checkpoint interval. Commun. ACM, 17(9):530-531, 1974.
[17] Long Wang, Karthik Pattabiraman, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Lawrence Votta, and Alan Wood. Modeling coordinated checkpointing for large-scale supercomputers. In DSN '05: Proceedings of the 2005 International Conference on Dependable Systems and Networks, pages 812-821, Washington, DC, USA, 2005. IEEE Computer Society.