



Volume , Issue , 2011, Pages

Checkpointing strategies for parallel jobs

Author keywords

Checkpointing; Fault tolerance; Parallel job; Sequential job

Indexed keywords

CHECK POINTING; DYNAMIC PROGRAMMING ALGORITHM; EXPECTED EXECUTION TIME; EXTENSIVE SIMULATIONS; INTER-ARRIVAL TIME; JOB EXECUTION; JOB PARALLELISM; OPTIMAL SOLUTIONS; PARALLEL JOB; PARALLEL JOBS; PERIODIC CHECKPOINTING; PROCESSOR FAILURES; REAL-WORLD SYSTEM; SEQUENTIAL JOB; SIMULATION EXPERIMENTS; WEIBULL;

EID: 83155184556     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2063384.2063428     Document Type: Conference Paper
Times cited : (67)

References (31)
  • 1
    • 85060036181
    • The validity of the single processor approach to achieving large scale computing capabilities
    • AFIPS Press
    • G. Amdahl. The validity of the single processor approach to achieving large scale computing capabilities. In AFIPS Conference Proceedings, volume 30, pages 483-485. AFIPS Press, 1967.
    • (1967) AFIPS Conference Proceedings , vol.30 , pp. 483-485
    • Amdahl, G.1
  • 6
    • 77955097389
    • A flexible checkpoint/restart model in distributed systems
    • volume 6067 of LNCS
    • M.-S. Bouguerra, T. Gautier, D. Trystram, and J.-M. Vincent. A flexible checkpoint/restart model in distributed systems. In PPAM, volume 6067 of LNCS, pages 206-215, 2010.
    • (2010) PPAM , pp. 206-215
    • Bouguerra, M.-S.1    Gautier, T.2    Trystram, D.3    Vincent, J.-M.4
  • 7
    • 83455190312
    • An optimal algorithm for scheduling checkpoints with variable costs
    • Oct.
    • M. S. Bouguerra, D. Trystram, and F. Wagner. An optimal algorithm for scheduling checkpoints with variable costs. Technical report, INRIA, Oct. 2010.
    • (2010) Technical Report, INRIA
    • Bouguerra, M.S.1    Trystram, D.2    Wagner, F.3
  • 8
    • 78649559128
    • Checkpointing vs. Migration for post-petascale supercomputers
    • IEEE Computer Society Press
    • F. Cappello, H. Casanova, and Y. Robert. Checkpointing vs. migration for post-petascale supercomputers. In ICPP'2010. IEEE Computer Society Press, 2010.
    • (2010) ICPP'2010
    • Cappello, F.1    Casanova, H.2    Robert, Y.3
  • 10
    • 28044460018
    • A higher order estimate of the optimum checkpoint interval for restart dumps
    • DOI 10.1016/j.future.2004.11.016, PII S0167739X04002213
    • J. T. Daly. A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems, 22(3):303-312, 2006. (Pubitemid 41689812)
    • (2006) Future Generation Computer Systems , vol.22 , Issue.3 , pp. 303-312
    • Daly, J.T.1
  • 12
  • 13
    • 0036041277
    • Improving cluster availability using workstation validation
    • T. Heath, R. P. Martin, and T. D. Nguyen. Improving cluster availability using workstation validation. SIGMETRICS Perf. Eval. Rev., 30(1):217-227, 2002. (Pubitemid 35009524)
    • (2002) Performance Evaluation Review , vol.30 , Issue.1 , pp. 217-227
    • Heath, T.1    Martin, R.P.2    Nguyen, T.D.3
  • 15
    • 78650009816
    • Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
    • ACM
    • W. Jones, J. Daly, and N. DeBardeleben. Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters. In HPDC'10, pages 276-279. ACM, 2010.
    • (2010) HPDC'10 , pp. 276-279
    • Jones, W.1    Daly, J.2    DeBardeleben, N.3
  • 16
    • 0028994247
    • Software rejuvenation: Analysis, module and applications
    • Washington, DC, USA, IEEE CS
    • N. Kolettis and N. D. Fulton. Software rejuvenation: Analysis, module and applications. In FTCS'95, page 381, Washington, DC, USA, 1995. IEEE CS.
    • (1995) FTCS'95 , pp. 381
    • Kolettis, N.1    Fulton, N.D.2
  • 18
    • 0023995854
    • Computing optimal checkpointing strategies for rollback and recovery systems
    • P. L'Ecuyer and J. Malenfant. Computing optimal checkpointing strategies for rollback and recovery systems. IEEE Transactions on Computers, 37(4):491-496, 1988.
    • (1988) IEEE Transactions on Computers , vol.37 , Issue.4 , pp. 491-496
    • L'Ecuyer, P.1    Malenfant, J.2
  • 19
    • 0035390088
    • A variational calculus approach to optimal checkpoint placement
    • DOI 10.1109/12.936236
    • Y. Ling, J. Mi, and X. Lin. A variational calculus approach to optimal checkpoint placement. IEEE Transactions on Computers, 50(7):699-708, 2001. (Pubitemid 32720123)
    • (2001) IEEE Transactions on Computers , vol.50 , Issue.7 , pp. 699-708
    • Ling, Y.1    Mi, J.2    Lin, X.3
  • 20
    • 51049108820
    • An optimal checkpoint/restart model for a large scale high performance computing system
    • IEEE
    • Y. Liu, R. Nassar, C. Leangsuksun, N. Naksinehaboon, M. Paun, and S. Scott. An optimal checkpoint/restart model for a large scale high performance computing system. In IPDPS 2008, pages 1-9. IEEE, 2008.
    • (2008) IPDPS 2008 , pp. 1-9
    • Liu, Y.1    Nassar, R.2    Leangsuksun, C.3    Naksinehaboon, N.4    Paun, M.5    Scott, S.6
  • 23
    • 33646721605
    • Distribution-free checkpoint placement algorithms based on min-max principle
    • T. Ozaki, T. Dohi, H. Okamura, and N. Kaio. Distribution-free checkpoint placement algorithms based on min-max principle. IEEE TDSC, pages 130-140, 2006.
    • (2006) IEEE TDSC , pp. 130-140
    • Ozaki, T.1    Dohi, T.2    Okamura, H.3    Kaio, N.4
  • 27
    • 84976696875
    • Performance analysis of checkpointing strategies
    • A. Tantawi and M. Ruschitzka. Performance analysis of checkpointing strategies. ACM TOCS, 2(2):123-144, 1984.
    • (1984) ACM TOCS , vol.2 , Issue.2 , pp. 123-144
    • Tantawi, A.1    Ruschitzka, M.2
  • 28
    • 0021473687
    • On the optimum checkpoint selection problem
    • S. Toueg and O. Babaoglu. On the optimum checkpoint selection problem. SIAM J. Computing, 13(3):630-649, 1984.
    • (1984) SIAM J. Computing , vol.13 , Issue.3 , pp. 630-649
    • Toueg, S.1    Babaoglu, O.2
  • 29
    • 83155195315
    • Analysis of dependencies of checkpoint cost and checkpoint interval of fault tolerant MPI applications
    • K. Venkatesh. Analysis of Dependencies of Checkpoint Cost and Checkpoint Interval of Fault Tolerant MPI Applications. Analysis, 2(08):2690-2697, 2010.
    • (2010) Analysis , vol.2 , Issue.8 , pp. 2690-2697
    • Venkatesh, K.1
  • 31
    • 84976846528
    • A first order approximation to the optimum checkpoint interval
    • J. W. Young. A first order approximation to the optimum checkpoint interval. Communications of the ACM, 17(9):530-531, 1974.
    • (1974) Communications of the ACM , vol.17 , Issue.9 , pp. 530-531
    • Young, J.W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.