



Volume 29, Issue 4, 2014, Pages 363-378

Cost-oriented proactive fault tolerance approach to high performance computing (HPC) in the cloud

Author keywords

Cloud computing; computation-intensive; HaaS; HPC; proactive fault tolerance

Indexed keywords

ALGORITHMS; CLOCKS; CLOUD COMPUTING; FAULT TOLERANCE;

EID: 84898914916     PISSN: 17445760     EISSN: 17445779     Source Type: Journal    
DOI: 10.1080/17445760.2013.803686     Document Type: Article
Times cited: 8

References (33)
  • 1
    • 84898855433
    • Amazon. [Online]. Available at http://aws.amazon.com/ec2/
    • Amazon
  • 2
    • 84898923825
    • Baremetalcloud. [Online]. Available at http://baremetalcloud.com/index.php/en/
    • Baremetalcloud
  • 4
    • 84898916874
    • Nicholas Carr. [Online]. Available at http://www.roughtype.com/p279
  • 5
    • 84898837506
    • CFDR. Available at http://cfdr.usenix.org (2012).
    • (2012)
  • 6
    • 78149470110
    • A large-scale study of failures in high performance computing systems
    • B. Schroeder and G.A. Gibson, A large-scale study of failures in high performance computing systems, IEEE Trans. Dependable Secure Comput. 7(4) (2010), pp. 337-351.
    • (2010) IEEE Trans. Dependable Secure Comput., vol. 7, Issue 4, pp. 337-351
    • Schroeder, B.; Gibson, G.A.
  • 7
    • 12444258147
    • Development of naturally fault tolerant algorithms for computing on 100,000 processors
    • Al Geist and Christian Engelmann, Development of naturally fault tolerant algorithms for computing on 100,000 processors, J. Parallel Distributed Comput. (2002). Available at www.csm.ornl.gov/~geist
    • (2002) J. Parallel Distributed Comput.
    • Geist, A.; Engelmann, C.
  • 9
    • 67349137506
    • Cost-oriented task allocation and hardware redundancy policies in heterogeneous distributed computing systems considering software reliability
    • Huajun Hu, Suchang Guo, and Bo Yang, Cost-oriented task allocation and hardware redundancy policies in heterogeneous distributed computing systems considering software reliability, Comput. Ind. Eng. 56(4) (2009), pp. 1687-1696.
    • (2009) Comput. Ind. Eng., vol. 56, Issue 4, pp. 1687-1696
    • Hu, H.; Guo, S.; Yang, B.
  • 10
    • 0037410519
    • Optimal task allocation and hardware redundancy policies in distributed computing systems
    • C.-C. Hsieh, Optimal task allocation and hardware redundancy policies in distributed computing systems, Eur. J. Operational Res. 147(2) (2003), pp. 430-447.
    • (2003) Eur. J. Operational Res., vol. 147, Issue 2, pp. 430-447
    • Hsieh, C.-C.
  • 11
    • 68249127079
    • Fault tolerance in petascale/exascale systems: Current knowledge, challenges and research opportunities
    • F. Cappello, Fault tolerance in petascale/exascale systems: Current knowledge, challenges and research opportunities, Int. J. High Perform. Comput. Appl. 23(3) (2009), pp. 212-226.
    • (2009) Int. J. High Perform. Comput. Appl., vol. 23, Issue 3, pp. 212-226
    • Cappello, F.
  • 13
    • 78649985381
    • Cloud computing for parallel scientific HPC applications: Feasibility of running coupled atmosphere-ocean climate models on Amazon's EC2
    • C. Evangelinos and C.N. Hill, Cloud computing for parallel scientific HPC applications: Feasibility of running coupled atmosphere-ocean climate models on Amazon's EC2, in Cloud Computing and Its Applications 2008 (CCA-08), October 2008, Chicago, IL, ACM.
    • (2008) Cloud Computing and Its Applications
    • Evangelinos, C.; Hill, C.N.
  • 14
    • 84863649355
    • A fault tolerance framework for high performance computing in cloud
    • I.P. Egwutuoha, S. Chen, D. Levy, and B. Selic, A fault tolerance framework for high performance computing in cloud, in Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium, Ottawa, Canada, IEEE, 2012, pp. 709-710.
    • (2012) 2012 12th IEEE/ACM International Symposium, pp. 709-710
    • Egwutuoha, I.P.; Chen, S.; Levy, D.; Selic, B.
  • 18
    • 84898908341
    • Lm-sensors. [Online]. Available at http://lm-sensors.org/wiki/Documentation
  • 21
    • 84898918714
    • Open-iscsi, 2013. Available at http://www.open-iscsi.org
    • (2013)
  • 22
    • 84898845581
    • Xen, Xen hypervisor. [Online]. Available at http://www.xen.org/products/xenhyp.html
  • 24
    • 0042078549
    • A survey of rollback-recovery protocols in message-passing systems
    • E.N.M. Elnozahy, L. Alvisi, Y.M. Wang, and D.B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Comput. Surv. (CSUR) 34(3) (2002), pp. 375-408.
    • (2002) ACM Comput. Surv. (CSUR), vol. 34, Issue 3, pp. 375-408
    • Elnozahy, E.N.M.; Alvisi, L.; Wang, Y.M.; Johnson, D.B.
  • 25
    • 84898921532
    • Checkpointing.org. [Online]. Available at http://checkpointing.org/
  • 26
    • 84897731347
    • A brief review of cloud computing, challenges and potential solutions
    • I.P. Egwutuoha, D. Schragl, and R. Calvo, A brief review of cloud computing, challenges and potential solutions, J. Parallel Cloud Comput. 2(1) (2013).
    • (2013) J. Parallel Cloud Comput., vol. 2, Issue 1
    • Egwutuoha, I.P.; Schragl, D.; Calvo, R.
  • 27
    • 0029633168
    • GROMACS: A message-passing parallel molecular dynamics implementation
    • H.J. Berendsen, D. van der Spoel, and R. van Drunen, GROMACS: A message-passing parallel molecular dynamics implementation, Comput. Phys. Commun. 91(1) (1995), pp. 43-56.
    • (1995) Comput. Phys. Commun., vol. 91, Issue 1, pp. 43-56
    • Berendsen, H.J.; Van Der Spoel, D.; Van Drunen, R.
  • 30
    • 84881374819
    • A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems
    • 10.1007/s11227-013-0884-0
    • I.P. Egwutuoha, D. Levy, B. Selic, and S. Chen, A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems, J. Supercomput. (2013). 10.1007/s11227-013-0884-0
    • (2013) J. Supercomput.
    • Egwutuoha, I.P.; Levy, D.; Selic, B.; Chen, S.
  • 31
    • 28044460018
    • A higher order estimate of the optimum checkpoint interval for restart dumps
    • J.T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Comput. Syst. 22 (2006), pp. 303-312.
    • (2006) Future Generation Comput. Syst., vol. 22, pp. 303-312
    • Daly, J.T.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.