



Volume 25, Issue 4, 2014, Pages 1034-1043

Reliability of heterogeneous distributed computing systems in the presence of correlated failures

Author keywords

Distributed computing; load balancing; non-Markovian process; reliability; shared risk group; spatially correlated failures

Indexed keywords

DISTRIBUTED COMPUTER SYSTEMS; RESOURCE ALLOCATION

EID: 84896377371     PISSN: 10459219     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPDS.2013.78     Document Type: Article
Times cited: 16

References (31)
  • 2
    • 0035390490
    • Maximizing reliability of distributed computing system with task allocation using simple genetic algorithm
    • PII S1383762101000133
    • D. Vidyarthi et al., "Maximizing Reliability of a Distributed Computing System with Task Allocation Using Simple Genetic Algorithm," J. Systems Architecture, vol. 47, pp. 549-554, 2001. (Pubitemid 32768279)
    • (2001) Journal of Systems Architecture , vol.47 , Issue.7 , pp. 549-554
    • Vidyarthi, D.P.1    Tripathi, A.K.2
  • 3
    • 33845570456
    • Empirical and analytical evaluation of systems with multiple unreliable servers
    • DOI 10.1109/DSN.2006.32, 1633540, Proceedings - DSN 2006: 2006 International Conference on Dependable Systems and Networks
    • J. Palmer et al., "Empirical and Analytical Evaluation of Systems with Multiple Unreliable Servers," Proc. Int'l Conf. Dependable Systems and Networks, pp. 517-525, 2006. (Pubitemid 44930452)
    • (2006) Proceedings of the International Conference on Dependable Systems and Networks , vol.2006 , pp. 517-525
    • Palmer, J.1    Mitrani, I.2
  • 4
    • 33845593340
    • A large-scale study of failures in high-performance computing systems
    • DOI 10.1109/DSN.2006.5, 1633514, Proceedings - DSN 2006: 2006 International Conference on Dependable Systems and Networks
    • B. Schroeder et al., "A Large-Scale Study of Failures in High-Performance Computing Systems," Proc. Int'l Conf. Dependable Systems and Networks, pp. 249-258, 2006. (Pubitemid 44930426)
    • (2006) Proceedings of the International Conference on Dependable Systems and Networks , vol.2006 , pp. 249-258
    • Schroeder, B.1    Gibson, G.A.2
  • 6
    • 78349257397
    • A model for space-correlated failures in large-scale distributed systems
    • M. Gallet et al., "A Model for Space-Correlated Failures in Large-Scale Distributed Systems," Proc. 16th Euro-Par Conf. Parallel Processing, pp. 88-100, 2010.
    • (2010) Proc. 16th Euro-Par Conf. Parallel Processing , pp. 88-100
    • Gallet, M.1
  • 7
    • 81455156173
    • Prefail: A programmable failure-injection framework
    • EECS Dept., Univ. of California, Berkeley Apr.
    • P. Joshi et al., "Prefail: A Programmable Failure-Injection Framework," Technical Report UCB/EECS-2011-30, EECS Dept., Univ. of California, Berkeley, http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-30.html, Apr. 2011.
    • (2011) Technical Report UCB/EECS-2011-30
    • Joshi, P.1
  • 9
    • 84861726834
    • Performance and Reliability of Non-Markovian Heterogeneous Distributed Computing Systems
    • July
    • J.E. Pezoa et al., "Performance and Reliability of Non-Markovian Heterogeneous Distributed Computing Systems," IEEE Trans. Parallel and Distributed Systems, vol. 23, no. 7, pp. 1288-1301, July 2012.
    • (2012) IEEE Trans. Parallel and Distributed Systems , vol.23 , Issue.7 , pp. 1288-1301
    • Pezoa, J.E.1
  • 10
    • 38849123050
    • Tolerating correlated failures in wide-area monitoring services
    • Intel Research
    • S. Nath et al., "Tolerating Correlated Failures in Wide-Area Monitoring Services," technical report, Intel Research, 2004.
    • (2004) Technical Report
    • Nath, S.1
  • 12
    • 79551631221
    • Assessing the impact of geographically correlated failures on overlay-based data dissemination
    • K. Kim et al., "Assessing the Impact of Geographically Correlated Failures on Overlay-Based Data Dissemination," Proc. IEEE GLOBECOM, pp. 1-5, 2010.
    • (2010) Proc. IEEE GLOBECOM , pp. 1-5
    • Kim, K.1
  • 14
    • 0036445057
    • Introspective failure analysis: Avoiding correlated failures in peer-to-peer systems
    • H. Weatherspoon et al., "Introspective Failure Analysis: Avoiding Correlated Failures in Peer-to-Peer Systems," Proc. IEEE Symp. Reliable Distributed Systems, 2002.
    • (2002) Proc. IEEE Symp. Reliable Distributed Systems
    • Weatherspoon, H.1
  • 15
    • 0034155447
    • Failure Correlation in Software Reliability Model
    • Mar
    • K. Goseva-Popstojanova et al., "Failure Correlation in Software Reliability Model," IEEE Trans. Reliability, vol. 49, pp. 37-48, Mar. 2000.
    • (2000) IEEE Trans. Reliability , vol.49 , pp. 37-48
    • Goseva-Popstojanova, K.1
  • 16
    • 15544381240
    • Modeling and analysis of correlated software failures of multiple types
    • DOI 10.1109/TR.2004.841709
    • Y. Dai et al., "Modeling and Analysis of Correlated Software Failures of Multiple Types," IEEE Trans. Reliability, vol. 54, pp. 100-106, Mar. 2005. (Pubitemid 40400318)
    • (2005) IEEE Transactions on Reliability , vol.54 , Issue.1 , pp. 100-106
    • Dai, Y.-S.1    Xie, M.2    Poh, K.-L.3
  • 17
    • 34547865669
    • Performance and reliability of tree-structured grid services considering data dependence and failure correlation
    • DOI 10.1109/TC.2007.1018
    • Y.-S. Dai et al., "Performance and Reliability of Tree-Structured Grid Services Considering Data Dependence and Failure Correlation," IEEE Trans. Computers, vol. 56, pp. 925-936, July 2007. (Pubitemid 47249807)
    • (2007) IEEE Transactions on Computers , vol.56 , Issue.7 , pp. 925-936
    • Dai, Y.-S.1    Levitin, G.2    Trivedi, K.S.3
  • 18
    • 78651554943
    • Estimating system reliability with correlated component failures
    • L. Fiondella et al., "Estimating System Reliability with Correlated Component Failures," Int'l J. Reliability and Safety, vol. 4, no. 2/3, pp. 188-205, 2010.
    • (2010) Int'l J. Reliability and Safety , vol.4 , Issue.2-3 , pp. 188-205
    • Fiondella, L.1
  • 19
    • 77954903245
    • The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems
    • D. Kondo et al., "The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems," Proc. 10th IEEE/ACM Int'l Conf. Cluster, Cloud and Grid Computing, pp. 398-407, 2010.
    • (2010) Proc. 10th IEEE/ACM Int'l Conf. Cluster, Cloud and Grid Computing , pp. 398-407
    • Kondo, D.1
  • 21
    • 70450001349
    • Robust sequential resource allocation in heterogeneous distributed systems with random compute node failures
    • V. Shestak et al., "Robust Sequential Resource Allocation in Heterogeneous Distributed Systems with Random Compute Node Failures," Proc. IEEE Int'l Symp. Parallel and Distributed Processing, 2009.
    • (2009) Proc. IEEE Int'l Symp. Parallel and Distributed Processing
    • Shestak, V.1
  • 23
    • 0347598689
    • Inference of shared risk link groups
    • D. Papadimitriou et al., "Inference of Shared Risk Link Groups," IETF Internet Draft, http://www3.tools.ietf.org/html/draft-many-inference-srlg-02, 2002.
    • (2002) Internet Draft
    • Papadimitriou, D.1
  • 24
    • 80051789606
    • Cross-layer survivability in WDM-based networks
    • Aug
    • K. Lee et al., "Cross-Layer Survivability in WDM-Based Networks," IEEE/ACM Trans. Networking, vol. 19, no. 4, pp. 1000-1013, Aug. 2011.
    • (2011) IEEE/ACM Trans. Networking , vol.19 , Issue.4 , pp. 1000-1013
    • Lee, K.1
  • 25
    • 84896368626
    • The failure trace archive
    • "The Failure Trace Archive," INRIA, http://fta.inria.fr, 2012.
    • (2012) INRIA
  • 27
    • 84655169956
    • Assessing the vulnerability of the fiber infrastructure to disasters
    • Dec
    • S. Neumayer et al., "Assessing the Vulnerability of the Fiber Infrastructure to Disasters," IEEE/ACM Trans. Networking, vol. 19, no. 6, pp. 1610-1623, Dec. 2011.
    • (2011) IEEE/ACM Trans. Networking , vol.19 , Issue.6 , pp. 1610-1623
    • Neumayer, S.1
  • 28
    • 77956170257
    • Maximizing Service Reliability in Distributed Computing Systems with Random Failures: Theory and Implementation
    • Oct
    • J.E. Pezoa et al., "Maximizing Service Reliability in Distributed Computing Systems with Random Failures: Theory and Implementation," IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 10, pp. 1531-1544, Oct. 2010.
    • (2010) IEEE Trans. Parallel and Distributed Systems , vol.21 , Issue.10 , pp. 1531-1544
    • Pezoa, J.E.1
  • 29
    • 84896383282
    • The Grid5000
    • "The Grid5000," INRIA, http://www.grid5000.fr, 2012.
    • (2012) INRIA
  • 30
    • 84896371556
    • The grid workload archive
    • "The Grid Workload Archive," PDS Group, TU Delft, http://gwa.ewi.tudelft.nl, 2012.
    • (2012) PDS Group, TU Delft


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.