2017, Pages 285-320

A Taxonomy and Survey of Fault-Tolerant Workflow Management Systems in Cloud and Distributed Computing Environments

Author keywords

Algorithms; Checkpointing; Cloud computing; Distributed systems; Fault tolerance; Task duplication; Task retry; Workflows


EID: 85179296455    Source Type: Book
DOI: 10.1016/B978-0-12-805467-3.00015-6    Document Type: Chapter
Times cited: 39

References (113)
  • 1
    • EID: 78651350373
    • G. Juve and E. Deelman (2010) Scientific workflows and clouds. Crossroads 16(3), 14-18.
  • 4
    • EID: 31444436554
    • J. Yu and R. Buyya (2005) A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec. 34(3), 44-49.
  • 6
    • EID: 85017313616
    • M.A. Vouk (2008) Cloud computing – issues, research and implementations. CIT, J. Comput. Inf. Technol. 16(4), 235-246.
  • 8
    • EID: 0002050141
    • Y. Kwok and I. Ahmad (1999) Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406-471.
  • 12
    • EID: 0028499648
    • V.J. Leon, S.D. Wu and R.H. Storer (1994) Robustness measures and robust scheduling for job shops. IIE Trans. 26(5), 32-43.
  • 13
    • EID: 13444269168
    • W. Herroelen and R. Leus (2005) Project scheduling under uncertainty: survey and research potentials. Eur. J. Oper. Res. 165(2), 289-306.
  • 18
    • EID: 84865067611
    • B. Javadi, J. Abawajy and R. Buyya (2012) Failure-aware resource provisioning for hybrid cloud infrastructure. J. Parallel Distrib. Comput. 72(10), 1318-1331.
  • 19
    • EID: 0345415768
    • F. Gärtner (1999) Fundamentals of fault-tolerant distributed computing in asynchronous environments. ACM Comput. Surv. 31(1), 1-26.
  • 21
    • EID: 84870530759
    • A. Benoit, L.-C. Canon, E. Jeannot and Y. Robert (2012) Reliability of task graph schedules with transient and fail-stop failures: complexity and algorithms. J. Sched. 15(5), 615-627.
  • 22
    • EID: 84976815497
    • R.D. Schlichting and F.B. Schneider (1983) Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1(3), 222-238.
  • 23
    • EID: 67949091205
    • C. Dabrowski (2009) Reliability in grid computing systems. Concurr. Comput., Pract. Exp. 21(8), 927-959.
  • 24
    • EID: 33947211821
    • W. Cirne, F. Brasileiro, D. Paranhos, L. Goes and W. Voorsluys (2007) On the efficacy, efficiency and emergent behavior of task replication in large distributed systems. Parallel Comput. 33(3), 213-234.
  • 29
    • EID: 0036522785
    • K. Hashimoto, T. Tsuchiya and T. Kikuno (2002) Effective scheduling of duplicated tasks for fault tolerance in multiprocessor systems. IEICE Trans. Inf. Syst. 85(3), 525-534.
  • 34
    • EID: 76849102536
    • X. Tang, X. Li, G. Liao and R. Li (2010) List scheduling with duplication for heterogeneous computing systems. J. Parallel Distrib. Comput. 70(4), 323-329.
  • 35
    • EID: 84904340844
    • R. Calheiros and R. Buyya (2013) Meeting deadlines of scientific workflows in public clouds with tasks replication. IEEE Trans. Parallel Distrib. Syst. PP(99), 1.
  • 50
    • EID: 84859734439
    • K. Plankensteiner and R. Prodan (2012) Meeting soft deadlines in scientific workflows using resubmission impact. IEEE Trans. Parallel Distrib. Syst. 23(5), 890-901.
  • 51
    • EID: 12944263565
    • R. Sakellariou and H. Zhao (2004) A low-cost rescheduling policy for efficient mapping of workflows on grid systems. Sci. Program. 12(4), 253-262.
  • 53
    • EID: 33646390933
    • R. Duan, R. Prodan and T. Fahringer (2005) DEE: a distributed fault tolerant workflow enactment engine for grid computing. In L. Yang, O. Rana, B. Di Martino, J. Dongarra (Eds) High Performance Computing and Communications, Lecture Notes in Computer Science vol. 3726, 704-716.
  • 54
    • EID: 0042078549
    • E.N. Elnozahy, L. Alvisi, Y. Wang and D.B. Johnson (2002) A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. 34(3), 375-408.
  • 55
    • EID: 34250348415
    • J. Chen and Y. Yang (2007) Adaptive selection of necessary and sufficient checkpoints for dynamic verification of temporal constraints in grid workflow systems. ACM Trans. Auton. Adapt. Syst. 2(6).
  • 56
    • EID: 84893755528
    • M.A. Salehi, A.N. Toosi and R. Buyya (2014) Contention management in federated virtualized distributed systems: implementation and evaluation. Softw. Pract. Exp. 44(3), 353-368.
  • 57
    • EID: 84881374819
    • I. Egwutuoha, D. Levy, B. Selic and S. Chen (2013) A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems. J. Supercomput. 65(3), 1302-1326.
  • 58
    • EID: 77955307144
    • R. Tolosana-Calasanz, J. Bañares, P. Álvarez, J. Ezpeleta and O. Rana (2010) An uncoordinated asynchronous checkpointing model for hierarchical scientific workflows. J. Comput. Syst. Sci. 76(6), 403-415.
  • 59
    • EID: 84892524778
    • M.A. Salehi, B. Javadi and R. Buyya (2014) Resource provisioning based on preempting virtual machines in distributed systems. Concurr. Comput., Pract. Exp. 26(2), 412-433.
  • 61
    • EID: 84892308818
    • G. von Laszewski, M. Hategan and D. Kodeboyina (2007) Java CoG Kit workflow. In I. Taylor, E. Deelman, D. Gannon, M. Shields (Eds) Workflows for e-Science, 340-356.
  • 62
    • EID: 78951488046
    • X. Liu, D. Yuan, G. Zhang, J. Chen and Y. Yang (2010) SwinDeW-C: a peer-to-peer based cloud workflow system. In B. Furht, A. Escalante (Eds) Handbook of Cloud Computing, 309-332.
  • 63
    • EID: 31444456909
    • Y.L. Simmhan, B. Plale and D. Gannon (2005) A survey of data provenance in e-science. SIGMOD Rec. 34(3), 31-36.
  • 68
    • EID: 70349916404
    • M. Wang, K. Ramamohanarao and J. Chen (2009) Trust-based robust scheduling and runtime adaptation of scientific workflow. Concurr. Comput., Pract. Exp. 21(16), 1982-1998.
  • 70
    • EID: 84906789337
    • W. Li, J. Wu, Q. Zhang, K. Hu and J. Li (2014) Trust-driven and QoS demand clustering analysis based cloud workflow scheduling strategies. Clust. Comput. 17(3), 1013-1030.
  • 71
    • EID: 84907600373
    • W. Tan, Y. Sun, L.X. Li, G. Lu and T. Wang (2014) A trust service-oriented scheduling model for workflow applications in cloud computing. IEEE Syst. J. 8(3), 868-878.
  • 72
    • EID: 84875862310
    • A. Benoit, M. Hakem and Y. Robert (2010) Multi-criteria scheduling of precedence task graphs on heterogeneous platforms. Comput. J. 53(6), 772-785. http://comjnl.oxfordjournals.org/content/53/6/772.full.pdf+html
  • 74
    • EID: 33748541058
    • A. Litke, D. Skoutas, K. Tserpes and T. Varvarigou (2007) Efficient task replication and management for adaptive fault tolerance in mobile grid environments. Future Gener. Comput. Syst. 23(2), 163-178.
  • 75
    • EID: 78649488801
    • M. Rahman, R. Ranjan and R. Buyya (2010) Reputation-based dependable scheduling of workflow applications in peer-to-peer grids. Comput. Netw. 54(18), 3341-3359.
  • 76
    • EID: 79960440687
    • X. Wang, C.S. Yeo, R. Buyya and J. Su (2011) Optimizing the makespan and reliability for workflow applications with reputation and a look-ahead genetic algorithm. Future Gener. Comput. Syst. 27(8), 1124-1134.
  • 77
    • EID: 77649275942
    • L. Canon and E. Jeannot (2010) Evaluation and optimization of the robustness of DAG schedules in heterogeneous environments. IEEE Trans. Parallel Distrib. Syst. 21(4), 532-546.
  • 78
    • EID: 1642268665
    • L. Bölöni and D.C. Marinescu (2002) Robust scheduling of metaprograms. J. Sched. 5(5), 395-412.
  • 80
    • EID: 84865048353
    • S.K. Garg, S. Versteeg and R. Buyya (2013) A framework for ranking of cloud computing services. Future Gener. Comput. Syst. 29(4), 1012-1023. Special Section: Utility and Cloud Computing.
  • 81
    • EID: 84893686761
    • S. Adabi, A. Movaghar and A.M. Rahmani (2014) Bi-level fuzzy based advanced reservation of cloud workflow applications on distributed grid resources. J. Supercomput. 67(1), 175-218.
  • 83
    • EID: 85179270514
    • Pegasus workflow management system. https://pegasus.isi.edu/ [Online; accessed 01 December 2014]
  • 85
    • EID: 79951848895
    • I. Taylor, M. Shields, I. Wang and A. Harrison (2007) The Triana workflow environment: architecture and applications. In I. Taylor, E. Deelman, D. Gannon, M. Shields (Eds) Workflows for e-Science, 320-339.
  • 90
    • EID: 71749103907
    • R. Buyya, S. Pandey and C. Vecchiola (2009) Cloudbus toolkit for market-oriented cloud computing. In M. Jaatun, G. Zhao, C. Rong (Eds) Cloud Computing, Lecture Notes in Computer Science vol. 5931, 24-44.
  • 97
    • EID: 37549003336
    • J. Dean and S. Ghemawat (2008) MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107-113.
  • 98
    • EID: 78650843050
    • C. Lam (2010) Hadoop in Action, 1st edition. Greenwich, CT, USA.
  • 107
    • EID: 45449086850
    • E. Elmroth, F. Hernández and J. Tordsson (2008) A light-weight grid workflow execution engine enabling client and middleware independence. In R. Wyrzykowski, J. Dongarra, K. Karczewski, J. Wasniewski (Eds) Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science vol. 4967, 754-761.
  • 108
    • EID: 33244457031
    • P. Kacsuk and G. Sipos (2005) Multi-grid, multi-user workflows in the P-GRADE grid portal. J. Grid Comput. 3(3–4), 221-238.
  • 112
    • EID: 84876574500
    • Pegasus workflow generator. https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator/2014 [Online; accessed 5 December 2014]


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.