2008, Pages 157-164

Dynamic meta-learning for failure prediction in large-scale systems: A case study

Author keywords

[No Author keywords available]

Indexed keywords

CASE STUDIES; COMPONENT RELIABILITIES; FAILURE PATTERNS; FAILURE PREDICTIONS; FALSE ALARM RATES; FAULT MANAGEMENTS; HIGH PERFORMANCE COMPUTING; OPEN PROBLEMS; PREDICTION ACCURACIES; PREDICTION ENGINES; RELIABLE COMPONENTS; SYSTEM OPERATIONS; SYSTEM SIZES; TRAINING SETS;

EID: 55849147399     PISSN: 01903918     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICPP.2008.17     Document Type: Conference Paper
Times cited : (47)

References (35)
  • 1
    • 55849143236
    • Fast Algorithms for Mining Association Rules
    • R. Agrawal, R. Srikant, "Fast Algorithms for Mining Association Rules", VLDB, Sep 12-15 1994, Chile, pp. 487-499
  • 4
    • 9144223280
    • Checkpointing for PetaScale Systems: A Look into the Future of Practical Rollback-Recovery
    • E. Elnozahy and J. S. Plank, "Checkpointing for PetaScale Systems: A Look into the Future of Practical Rollback-Recovery", IEEE Transactions on Dependable and Secure Computing, Volume 1, Number 2, 2004, pp. 97-108.
    • (2004) IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 2, pp. 97-108
    • Elnozahy, E.; Plank, J.S.
  • 5
    • 56749178938
    • Exploring Event Correlation for Failure Prediction in Coalitions of Clusters
    • S. Fu, C. Z. Xu, "Exploring Event Correlation for Failure Prediction in Coalitions of Clusters", Proc. of SC 2007, 2007.
    • (2007) Proc. of SC 2007
    • Fu, S.; Xu, C.Z.
  • 6
    • 21044437801
    • Overview of the Blue Gene/L System Architecture
    • A. Gara, M. A. Blumrich et al., "Overview of the Blue Gene/L System Architecture", IBM J. Res. & Dev. 49, No. 2/3, 195-212, 2005.
    • (2005) IBM J. Res. & Dev., vol. 49, no. 2-3, pp. 195-212
    • Gara, A.; Blumrich, M.A.; et al.
  • 8
    • 33845434226
    • Transparent Incremental Checkpointing at Kernel Level: A Foundation for Fault Tolerance for Parallel Computers
    • R. Gioiosa, J. Sancho, S. Jiang, F. Petrini, K. Davis, "Transparent Incremental Checkpointing at Kernel Level: A Foundation for Fault Tolerance for Parallel Computers", Proc. of SC2005, 2005.
    • (2005) Proc. of SC2005
    • Gioiosa, R.; Sancho, J.; Jiang, S.; Petrini, F.; Davis, K.
  • 12
    • 0012253727
    • Bayesian Approaches to Failure Prediction for Disk Drives
    • G. Hamerly and C. Elkan, "Bayesian Approaches to Failure Prediction for Disk Drives", Proc. of ICML, 2001.
    • (2001) Proc. of ICML
    • Hamerly, G.; Elkan, C.
  • 16
    • 33751082401
    • Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
    • Y. Li, Z. Lan, "Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing", Proc. of IEEE CCGrid'06, 2006.
    • (2006) Proc. of IEEE CCGrid'06
    • Li, Y.; Lan, Z.
  • 19
    • 36049013419
    • What Supercomputers Say: A Study of Five System Logs
    • A. Oliner and J. Stearley, "What Supercomputers Say: A Study of Five System Logs", Proc. of DSN, 2007.
    • (2007) Proc. of DSN
    • Oliner, A.; Stearley, J.
  • 20
    • 12444257746
    • Fault-Aware Job Scheduling for BlueGene/L Systems
    • A. Oliner, Ramendra K. Sahoo, José E. Moreira, Manish Gupta, Anand Sivasubramaniam, "Fault-Aware Job Scheduling for BlueGene/L Systems", IPDPS, 2004.
  • 21
    • 33748611921
    • Ensemble Based Systems in Decision Making
    • R. Polikar, "Ensemble Based Systems in Decision Making", IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
    • (2006) IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45
    • Polikar, R.
  • 23
    • 77952378080
    • Critical event prediction for proactive management in large-scale computer clusters
    • R.K. Sahoo, A.J. Oliner et al., "Critical event prediction for proactive management in large-scale computer clusters", Proc. of KDD, 2003, pp. 426-435.
    • (2003) Proc. of KDD, pp. 426-435
    • Sahoo, R.K.; Oliner, A.J.
  • 24
    • 47249157799
    • Advanced Failure Prediction in Complex Software Systems
    • Hoffmann, Salfner et al., "Advanced Failure Prediction in Complex Software Systems", Proc. of SRDS, 2004.
    • (2004) Proc. of SRDS
    • Hoffmann; Salfner; et al.
  • 25
    • 84934312471
    • Implementation and Evaluation of a Scalable Application-level Checkpoint-Recovery Scheme for MPI Programs
    • M. Schulz, G. Bronevetsky, R. Fernandes, D. Marques, K. Pingali, P. Stodghill, "Implementation and Evaluation of a Scalable Application-level Checkpoint-Recovery Scheme for MPI Programs", Supercomputing 2004, November 6-12, 2004.
  • 29
    • 33644804204
    • MSET Performance Optimization for Detection of Software Aging
    • K. Vaidyanathan and K. Gross, "MSET Performance Optimization for Detection of Software Aging", Proc. of ISSRE, 2003.
    • (2003) Proc. of ISSRE
    • Vaidyanathan, K.; Gross, K.
  • 31
    • 34548768671
    • A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance
    • C. Wang, F. Mueller, C. Engelmann, and S. Scott, "A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance", Proc. of IPDPS, 2007.
    • (2007) Proc. of IPDPS
    • Wang, C.; Mueller, F.; Engelmann, C.; Scott, S.
  • 34
    • 47249110817
    • SDSC Blue Gene/L Homepage
    • SDSC Blue Gene/L Homepage. www.sdsc.edu/us/resources/bluegene


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.