2007, Pages 452-457

A reliability-aware approach for an optimal checkpoint/restart model in HPC environments

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SYSTEMS; FAULT TOLERANCE; FAULT TOLERANT COMPUTER SYSTEMS; POISSON DISTRIBUTION; QUALITY ASSURANCE

EID: 50649087193; PISSN: 1552-5244; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/CLUSTR.2007.4629264; Document Type: Conference Paper
Times cited: 27

References (21)
  • 1
    • K.M. Chandy, "A survey of analytic models of roll-back and recovery strategies," Computer, vol. 8, no. 5, pp. 40-47, May 1975.
  • 2
    • K.M. Chandy, J.C. Browne, C.W. Dissly, and W.R. Uhrig, "Analytic models for rollback and recovery strategies in data base systems," IEEE Trans. Software Eng., vol. SE-1, pp. 100-110, March 1975.
  • 3
    • J.T. Daly, "A Model for Predicting the Optimum Checkpoint Interval for Restart Dumps," in Proc. ICCS 2003, LNCS 2660, Part IV, pp. 3-12, 2003.
  • 4
    • J.T. Daly, "A Higher Order Estimate of the Optimum Checkpoint Interval for Restart Dumps," Future Generation Computer Systems, Elsevier, Amsterdam, 2004.
  • 5
    • E. Elnozahy and J. Plank, "Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery," IEEE Trans. Dependable Sec. Comput., vol. 1, no. 2, pp. 97-108, 2004.
  • 6
    • R. Geist, R. Reynolds, and J. Westall, "Selection of a checkpoint interval in a critical-task environment," IEEE Trans. Reliability, vol. 37, no. 4, pp. 395-400, 1988.
  • 8
    • V.F. Nicola, "Checkpointing and the Modeling of Program Execution Time," in Software Fault Tolerance, M.R. Lyu, ed., John Wiley & Sons, 1995, pp. 167-188.
  • 10
    • J.S. Plank and M.A. Thomason, "The Average Availability of Parallel Checkpointing Systems and Its Importance in Selecting Runtime Parameters," in Proc. IEEE Int'l Symp. on Fault-Tolerant Computing, 1999.
  • 11
    • S.M. Ross, Stochastic Processes, 2nd ed., Wiley, January 1995, ISBN-10: 0471120626.
  • 12
    • A.J. Oliner, "Performance implications of periodic checkpointing on large-scale cluster systems," in Proc. 19th IEEE International Parallel and Distributed Processing Symposium, 2005, p. 299b.
  • 15
    • M. Treaster, "A survey of fault-tolerance and fault-recovery techniques in parallel systems," Technical Report cs.DC/0501002, ACM Computing Research Repository (CoRR), January 2005.
  • 16
    • K.F. Wong and M.A. Franklin, "Distributed Computing Systems and Checkpointing," in Proc. HPDC 1993, pp. 224-233.
  • 20
    • Y. Ling, J. Mi, and X. Lin, "A Variational Calculus Approach to Optimal Checkpoint Placement," IEEE Trans. Computers, vol. 50, no. 7, pp. 699-707, July 2001.
  • 21
    • T. Ozaki, T. Dohi, and H. Okamura, "Distribution-Free Checkpoint Placement Algorithms Based on Min-Max Principle," IEEE Trans. Dependable and Secure Computing, vol. 3, no. 2, pp. 130-140, April 2006.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.