Year: 2012 (volume, issue, and pages not listed)

On the complexity of scheduling checkpoints for computational workflows

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL WORKFLOWS; DYNAMIC PROGRAMMING ALGORITHM; EXPECTED EXECUTION TIME; FAILURE DISTRIBUTIONS; ROLLBACK AND RECOVERY; SCHEDULING PROBLEM; SCHEDULING STRATEGIES; THEORETICAL FOUNDATIONS;

EID: 84880916044     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/DSNW.2012.6264675     Document Type: Conference Paper
Times cited: 10

References (29)
  • 1
    • 0028994247
    • Software rejuvenation: Analysis, module and applications
    • Washington, DC, USA: IEEE CS
    • N. Kolettis and N. D. Fulton, "Software rejuvenation: Analysis, module and applications," in FTCS '95. Washington, DC, USA: IEEE CS, 1995, p. 381.
    • (1995) FTCS '95 , pp. 381
    • Kolettis, N.1    Fulton, N.D.2
  • 3
    • 0033894885
    • Heuristic algorithm for mapping communicating tasks on heterogeneous resources
    • K. Taura and A. A. Chien, "A heuristic algorithm for mapping communicating tasks on heterogeneous resources," in Heterogeneous Computing Workshop. IEEE Computer Society Press, 2000, pp. 102-115.
    • (2000) Proceedings of the Heterogeneous Computing Workshop, HCW , pp. 102-115
    • Taura, K.1    Chien, A.A.2
  • 4
    • 60649109658
    • Supporting distributed application workflows in heterogeneous computing environments
    • IEEE Computer Society Press
    • Q. Wu and Y. Gu, "Supporting distributed application workflows in heterogeneous computing environments," in 14th Int. Conf. on Parallel and Distributed Systems (ICPADS). IEEE Computer Society Press, 2008.
    • (2008) 14th Int. Conf. on Parallel and Distributed Systems (ICPADS)
    • Wu, Q.1    Gu, Y.2
  • 5
    • 0020765766
    • Effects of checkpointing on program execution time
    • DOI 10.1016/0020-0190(83)90093-5
    • A. Duda, "The effects of checkpointing on program execution time," Inf. Processing Letters, Vol. 16, no. 5, pp. 221-229, 1983.
    • (1983) Information Processing Letters , vol.16 , Issue.5 , pp. 221-229
    • Duda, A.1
  • 6
    • 28044460018
    • A higher order estimate of the optimum checkpoint interval for restart dumps
    • DOI 10.1016/j.future.2004.11.016, PII S0167739X04002213
    • J. T. Daly, "A higher order estimate of the optimum checkpoint interval for restart dumps," Future Generation Computer Systems, Vol. 22, no. 3, pp. 303-312, 2006.
    • (2006) Future Generation Computer Systems , vol.22 , Issue.3 , pp. 303-312
    • Daly, J.T.1
  • 7
    • 0036041277
    • Improving cluster availability using workstation validation
    • T. Heath, R. P. Martin, and T. D. Nguyen, "Improving cluster availability using workstation validation," SIGMETRICS Perf. Eval. Rev., Vol. 30, no. 1, pp. 217-227, 2002.
    • (2002) Performance Evaluation Review , vol.30 , Issue.1 , pp. 217-227
    • Heath, T.1    Martin, R.P.2    Nguyen, T.D.3
  • 9
    • 51049108820
    • An optimal checkpoint/restart model for a large scale high performance computing system
    • IEEE
    • Y. Liu, R. Nassar, C. Leangsuksun, N. Naksinehaboon, M. Paun, and S. Scott, "An optimal checkpoint/restart model for a large scale high performance computing system," in IPDPS 2008. IEEE, 2008, pp. 1-9.
    • (2008) IPDPS 2008 , pp. 1-9
    • Liu, Y.1    Nassar, R.2    Leangsuksun, C.3    Naksinehaboon, N.4    Paun, M.5    Scott, S.6
  • 11
    • 77955097389
    • A flexible checkpoint/restart model in distributed systems
    • http://dx.doi.org/10.1007/978-3-642-14390-8_22
    • M.-S. Bouguerra, T. Gautier, D. Trystram, and J.-M. Vincent, "A flexible checkpoint/restart model in distributed systems," in PPAM, ser. LNCS, Vol. 6067, 2010, pp. 206-215. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-14390-8_22
    • (2010) PPAM, ser. LNCS , vol.6067 , pp. 206-215
    • Bouguerra, M.-S.1    Gautier, T.2    Trystram, D.3    Vincent, J.-M.4
  • 13
    • 85060036181
    • The validity of the single processor approach to achieving large scale computing capabilities
    • AFIPS Press
    • G. Amdahl, "The validity of the single processor approach to achieving large scale computing capabilities," in AFIPS Conference Proceedings, Vol. 30. AFIPS Press, 1967, pp. 483-485.
    • (1967) AFIPS Conference Proceedings , vol.30 , pp. 483-485
    • Amdahl, G.1
  • 15
    • 84867631517
    • Using group replication for resilience on exascale systems
    • Research report RR-7876, February
    • M. Bougeret, H. Casanova, Y. Robert, F. Vivien, and D. Zaidouni, "Using group replication for resilience on exascale systems," INRIA, Research report RR-7876, February 2012. [Online]. Available: http://hal.inria.fr/hal-00668016
    • (2012) INRIA
    • Bougeret, M.1    Casanova, H.2    Robert, Y.3    Vivien, F.4    Zaidouni, D.5
  • 19
    • 84880864185
    • Complexity analysis of checkpoint scheduling with variable costs
    • IEEE Transactions on Computers
    • M.-S. Bouguerra, D. Trystram, and F. Wagner, "Complexity analysis of checkpoint scheduling with variable costs," IEEE Transactions on Computers, 2012.
    • (2012) IEEE Transactions on Computers
    • Bouguerra, M.-S.1    Trystram, D.2    Wagner, F.3
  • 20
    • 77954903245
    • The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems
    • IEEE International Symposium on Cluster Computing and the Grid
    • D. Kondo, B. Javadi, A. Iosup, and D. Epema, "The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems," in IEEE International Symposium on Cluster Computing and the Grid, 2010, pp. 398-407.
    • (2010) Cluster Computing and the Grid , pp. 398-407
    • Kondo, D.1    Javadi, B.2    Iosup, A.3    Epema, D.4
  • 21
    • 84976846528
    • A first order approximation to the optimum checkpoint interval
    • J. W. Young, "A first order approximation to the optimum checkpoint interval," Communications of the ACM, Vol. 17, no. 9, pp. 530-531, 1974.
    • (1974) Communications of the ACM , vol.17 , Issue.9 , pp. 530-531
    • Young, J.W.1
  • 22
    • 78650009816
    • Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
    • ACM
    • W. Jones, J. Daly, and N. DeBardeleben, "Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters," in HPDC'10. ACM, 2010, pp. 276-279.
    • (2010) HPDC'10 , pp. 276-279
    • Jones, W.1    Daly, J.2    DeBardeleben, N.3
  • 23
    • 83155195315
    • Analysis of dependencies of checkpoint cost and checkpoint interval of fault tolerant MPI applications
    • K. Venkatesh, "Analysis of Dependencies of Checkpoint Cost and Checkpoint Interval of Fault Tolerant MPI Applications," Analysis, Vol. 2, no. 08, pp. 2690-2697, 2010.
    • (2010) Analysis , vol.2 , Issue.8 , pp. 2690-2697
    • Venkatesh, K.1
  • 24
    • 84976696875
    • Performance analysis of checkpointing strategies
    • A. Tantawi and M. Ruschitzka, "Performance analysis of checkpointing strategies," ACM TOCS, Vol. 2, no. 2, pp. 123-144, 1984.
    • (1984) ACM TOCS , vol.2 , Issue.2 , pp. 123-144
    • Tantawi, A.1    Ruschitzka, M.2
  • 25
    • 35248884762
    • Bi-objective scheduling algorithms for optimizing makespan and reliability on heterogeneous systems
    • DOI 10.1145/1248377.1248423, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
    • J. Dongarra, E. Jeannot, E. Saule, and Z. Shi, "Bi-objective scheduling algorithms for optimizing makespan and reliability on heterogeneous systems," in ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). ACM Press, 2007, pp. 280-288.
    • (2007) Annual ACM Symposium on Parallelism in Algorithms and Architectures , pp. 280-288
    • Dongarra, J.J.1    Jeannot, E.2    Saule, E.3    Shi, Z.4
  • 26
    • 0036504529
    • Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing
    • DOI 10.1109/71.993209
    • A. Dogan and F. Özgüner, "Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing," IEEE Trans. Parallel Distributed Systems, Vol. 13, no. 3, pp. 308-323, 2002.
    • (2002) IEEE Transactions on Parallel and Distributed Systems , vol.13 , Issue.3 , pp. 308-323
    • Dogan, A.1    Ozguner, F.2
  • 27
    • 59149105005
    • Reliability versus performance for critical applications
    • A. Girault, E. Saule, and D. Trystram, "Reliability versus performance for critical applications," J. Parallel Distributed Computing, Vol. 69, no. 3, pp. 326-336, 2009.
    • (2009) J. Parallel Distributed Computing , vol.69 , Issue.3 , pp. 326-336
    • Girault, A.1    Saule, E.2    Trystram, D.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.