Volume , Issue , 2006, Pages 531-538

Exploit failure prediction for adaptive fault-tolerance in cluster computing

Author keywords

[No Author keywords available]

Indexed keywords

CLUSTER COMPUTING; DYNAMIC DECISION; EXECUTION TIME; FAILURE PREDICTION;

EID: 33751082401     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/CCGRID.2006.45     Document Type: Conference Paper
Times cited: 51

References (25)
  • 3
    • 33751116008
    • Coordinated checkpoint versus message logging for fault tolerance MPI
    • A. Bouteiller et al., "Coordinated Checkpoint versus Message Logging for Fault Tolerance MPI", Proc. of IEEE Cluster03, 2003
    • (2003) Proc. of IEEE Cluster03
    • Bouteiller, A.1
  • 6
    • 0025414335
    • Optimal strategies for scheduling checkpoints and preventive maintenance
    • E.G. Coffman and E.N. Gilbert, "Optimal Strategies for Scheduling Checkpoints and Preventive Maintenance", IEEE Trans. Reliability, vol. 39, no. 1, pp. 9-18, Apr. 1990.
    • (1990) IEEE Trans. Reliability , vol.39 , Issue.1 , pp. 9-18
    • Coffman, E.G.1    Gilbert, E.N.2
  • 7
    • 0020765766
    • The effects of checkpointing on program execution time
    • A. Duda, "The Effects of Checkpointing on Program Execution Time", Information Processing Letters, vol. 16, no. 5, pp. 221-229, June 1983.
    • (1983) Information Processing Letters , vol.16 , Issue.5 , pp. 221-229
    • Duda, A.1
  • 10
    • 33751118952
    • Ph.D. thesis, University of Illinois at Urbana-Champaign
    • Charng-Da Lu, Ph.D. thesis, University of Illinois at Urbana-Champaign, 2005
    • (2005)
    • Lu, C.-D.1
  • 11
    • 33746286070
    • Performance implications of periodic checkpointing on large-scale cluster systems
    • A. Oliner, Ramendra K. Sahoo, José E. Moreira, Meeta S. Gupta, "Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems", IPDPS 2005
    • IPDPS 2005
    • Oliner, A.1    Sahoo, R.K.2    Moreira, J.E.3    Gupta, M.S.4
  • 14
    • 4544342875
    • Min-max checkpoint placement under incomplete failure information
    • Tatsuya Ozaki, Tadashi Dohi, Hiroyuki Okamura, Naoto Kaio, "Min-Max Checkpoint Placement under Incomplete Failure Information", DSN 2004: 721-730
    • DSN 2004 , pp. 721-730
    • Ozaki, T.1    Dohi, T.2    Okamura, H.3    Kaio, N.4
  • 15
    • 4544336600
    • Proactive recovery in distributed CORBA applications
    • Soila M. Pertet, Priya Narasimhan, "Proactive Recovery in Distributed CORBA Applications", DSN 2004: 357-366
    • DSN 2004 , pp. 357-366
    • Pertet, S.M.1    Narasimhan, P.2
  • 16
    • 2642553870
    • A dynamic checkpointing scheme based on reinforcement learning
    • Hiroyuki Okamura, Yuki Nishimura, Tadashi Dohi: A Dynamic Checkpointing Scheme Based on Reinforcement Learning. PRDC 2004: 151-158
    • PRDC 2004 , pp. 151-158
    • Okamura, H.1    Nishimura, Y.2    Dohi, T.3
  • 17
    • 0002991145
    • Ickp: A consistent checkpointer for multicomputers
    • J. S. Plank and K. Li, "Ickp: a consistent checkpointer for multicomputers", IEEE Parallel & Distributed Technology, 2 (2):62-67, Summer 1994.
    • (1994) IEEE Parallel & Distributed Technology , vol.2 , Issue.2 , pp. 62-67
    • Plank, J.S.1    Li, K.2
  • 20
    • 77952378080
    • Critical event prediction for proactive management in large-scale computer clusters
    • Ramendra K. Sahoo, A. Oliner, Irina Rish, Manish Gupta, José E. Moreira, Sheng Ma, Ricardo Vilalta, Anand Sivasubramaniam: Critical event prediction for proactive management in large-scale computer clusters. KDD 2003: 426-435
    • KDD 2003 , pp. 426-435
    • Sahoo, R.K.1    Oliner, A.2    Rish, I.3    Gupta, M.4    Moreira, J.E.5    Ma, S.6    Vilalta, R.7    Sivasubramaniam, A.8
  • 21
    • 27544438268
    • A performability-oriented software rejuvenation framework for distributed applications
    • Ann T. Tai, Kam S. Tso, William H. Sanders, Savio N. Chau: A Performability-Oriented Software Rejuvenation Framework for Distributed Applications. DSN 2005: 570-579.
    • DSN 2005 , pp. 570-579
    • Tai, A.T.1    Tso, K.S.2    Sanders, W.H.3    Chau, S.N.4
  • 22
    • 78149354391
    • Predicting rare events in temporal domains
    • Ricardo Vilalta, Sheng Ma: Predicting Rare Events In Temporal Domains. ICDM 2002: 474-481
    • ICDM 2002 , pp. 474-481
    • Vilalta, R.1    Ma, S.2
  • 23
    • 85166352696
    • Learning to predict rare events in event sequences
    • Gary M. Weiss, Haym Hirsh: Learning to Predict Rare Events in Event Sequences. KDD 1998: 359-363
    • KDD 1998 , pp. 359-363
    • Weiss, G.M.1    Hirsh, H.2
  • 24
    • 33750743393
    • Learning to predict rare events in categorical time-series data
    • Gary M. Weiss, Haym Hirsh, Learning to Predict Rare Events in Categorical Time-Series Data, AAAI Workshop, 1998
    • (1998) AAAI Workshop
    • Weiss, G.M.1    Hirsh, H.2
  • 25
    • 84976846528
    • A first order approximation to the optimal checkpoint interval
    • John W. Young, "A First Order Approximation to the Optimal Checkpoint Interval", Comm. ACM 17(9): 530-531 (1974)
    • (1974) Comm. ACM , vol.17 , Issue.9 , pp. 530-531
    • Young, J.W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.